acoustic-carpenter-78188
11/02/2023, 9:25 PM
Pickle an @task-decorated task in a Jupyter notebook, and be able to register and run it from the same Jupyter notebook.
Details
This is part of the data-scientist-first story we are pursuing in H2 2022: continuing to tie together the local and back-end execution experiences as seamlessly as possible, in support of a data-science-first iteration cycle.
This should probably happen through the FlyteRemote experience. We are thinking of using cloudpickle to pickle the task and ship it off to S3, much like how script mode works.
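A minimal sketch of the pickling half of this idea (the actual S3 upload/download plumbing is elided; any bucket or key names would be assumptions, so the round trip is just simulated in memory here):

```python
import cloudpickle

# A task body as it might be defined in a notebook cell.
def t1(x: int) -> int:
    return x + 1

# Serialize by value; this bytes blob is what would be shipped to S3
# (e.g. via the same upload path that script mode uses).
blob = cloudpickle.dumps(t1)

# On the execution side, the resolver would download the blob and unpickle it.
restored = cloudpickle.loads(blob)
print(restored(1))  # 2
```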
Some questions to think about
• Does cloudpickle actually work for this? What are its limitations, and how can we detect them or prevent the user from hitting them? (For example, cloudpickle probably doesn't work if you're pickling a function that has an import statement inside it.) We should at least list out all these limitations and make the user aware of them.
• What will the task command be, and specifically, how will the task resolver bit work? Unlike script mode, which merely updated the code and relied on the usual python-module/task-finding mechanisms of a regular non-fast run, this will need not only to download the pickled code file from S3, but also to unpickle it as part of the resolver process. There's already a cloudpickle resolver; can we use that?
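As a quick probe of the import-inside-a-function concern above: in my understanding, cloudpickle serializes the function body (including the inner import statement) by value without issue; the real constraint is that the imported module must be importable in whatever environment unpickles and calls the function. A small sketch:

```python
import cloudpickle

def task_fn() -> float:
    # The import statement itself pickles fine as part of the function body;
    # what matters is that `math` is importable where the function eventually runs.
    import math
    return math.sqrt(9.0)

blob = cloudpickle.dumps(task_fn)
restored = cloudpickle.loads(blob)
print(restored())  # 3.0
```

So rather than a hard failure at pickling time, this class of problem would surface as an ImportError at execution time in the target environment, which is worth calling out in the limitations list.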
Playing around with cloudpickle
Testing notes
Cursory testing was done by going back and forth between a Jupyter notebook with one virtual environment and code run in PyCharm with another.
The version of Python has to match (per the docs). The version of cloudpickle also needs to match: there were issues between v2.0 and v2.1 of cloudpickle where something written by 2.1 was not readable by 2.0. The other direction was not tried, but it is probably best not to assume it will always work. We should aim for at least minor-version matching.
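One way to enforce the minor-version-matching rule would be to record the writer's cloudpickle version next to the blob and check it before unpickling. A hypothetical helper (the function names here are illustrative, not anything that exists in flytekit):

```python
def minor_version(version: str) -> tuple:
    # "2.1.0" -> (2, 1); only major.minor participate in the compatibility check.
    major, minor = version.split(".")[:2]
    return (int(major), int(minor))

def versions_compatible(writer: str, reader: str) -> bool:
    # Writer version would be stored alongside the pickled blob; reader
    # version would come from cloudpickle.__version__ at unpickling time.
    return minor_version(writer) == minor_version(reader)
```

Patch-level differences are allowed through; anything else would be rejected before we even attempt `cloudpickle.loads`.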
If, in the Jupyter notebook, I did `from dyn_task import t1`, used `t1` inside a dynamic task, and then pickled the dynamic task, both the Jupyter notebook and the PyCharm instance were able to unpickle and run it. If instead I did `import dyn_task` and referenced `dyn_task.t1` in the dynamic task before pickling it, it only worked in Jupyter, not in PyCharm. If you then add `cloudpickle.register_pickle_by_value(dyn_task)`, both work again.
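The `register_pickle_by_value` behavior can be reproduced in a single process by faking the `dyn_task` module and then removing it before unpickling, which simulates the PyCharm environment not having the module importable (the module contents below are a stand-in, not the actual test code):

```python
import sys
import types

import cloudpickle

# Stand-in for the notebook-only `dyn_task` module (hypothetical body;
# in the real test this was an actual file imported by both environments).
dyn_task = types.ModuleType("dyn_task")
exec("def t1(x):\n    return x * 2", dyn_task.__dict__)
sys.modules["dyn_task"] = dyn_task

# By default, cloudpickle serializes dyn_task.t1 by reference (module name
# plus attribute name), so an environment that cannot import dyn_task fails
# to unpickle it. Registering the module switches its members to by-value.
cloudpickle.register_pickle_by_value(dyn_task)

blob = cloudpickle.dumps(dyn_task.t1)

# Simulate the other environment lacking the module entirely.
del sys.modules["dyn_task"]
restored = cloudpickle.loads(blob)
print(restored(3))  # 6
```

This matches the observed behavior: by-reference pickling only works where the module resolves, while by-value pickling carries the code itself inside the blob.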
Resources
• GH repo for cloudpickle. The serialize by value or reference discussion is relevant.
• The background section of this stackoverflow thread is a good short read.
Misc
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes