# flyte-support
f
Hi! I have a project directory with the following structure:
```
├── conf
├── data
└── src
```
When I register my workflow (I want to use fast registration), I need the files in `conf` and `data` to be present during execution. The workflow itself is defined under `src/my_package/workflows/workflow.py`. At execution time in Flyte, it seems like only files from `src` are present in the container. How can I include additional files outside of the `src` folder? They also change during iterations on the workflow. I expected this to be a basic, supported scenario that enables a fast development cycle (as stated here https://docs.flyte.org/en/latest/flyte_fundamentals/registering_workflows.html#fast-registration ). Right now I'm using `pyflyte register src/my_package/workflows` to register the workflow. When I use ``pyflyte register `pwd` ``, the CLI logs show that the wrong root is detected (the parent folder of my current one, dunno why).
t
yeah, it’s trying to be intelligent, but maybe that’s not enough. when pyflyte runs to pick up/detect all your tasks/workflows, it has to load your code, and it also needs to assign unique and relatively correct names to all those tasks/workflows. the way it does that today is by using the first folder, going up from your code, that doesn’t have an `__init__.py` file (i’m assuming that’s `src` here).
once it finds that magic root folder, that’s where naming starts.
that folder is also assumed to be the only thing you need copied, which is why you’re running into this issue.
basically, “where naming starts is where copying starts, is how the in-container import is run.”
that’s the cleanest semantic we could come up with.
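The root-detection rule described above can be sketched in a few lines of Python. This is a minimal illustration of the rule as stated in this thread (keep walking up while the folder has an `__init__.py`), not flytekit's actual implementation; `detect_root` is a hypothetical name.

```python
from pathlib import Path


def detect_root(source_path: Path) -> Path:
    """Return the first folder, walking upward, that is not a Python package.

    Hypothetical sketch of the rule described in this thread, not flytekit code.
    """
    current = source_path.resolve()
    if current.is_file():
        # Start from the folder that contains the module file.
        current = current.parent
    # Keep climbing while the current folder is itself a package.
    while (current / "__init__.py").exists():
        current = current.parent
    return current
```

For the layout in the question, calling it on `src/my_package/workflows/workflow.py` would return `src`, assuming both `my_package/` and `workflows/` contain an `__init__.py`.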
basically you have this, right?
```
                      Local                         Container
                      ======================        =====================
                (cwd) .                            /root  .................... (container workdir)
                      ├── conf/                       ├── my.img
    (detected root)...└── src                         ├── resources
                          ├── my.img                  │   └── file.txt
                          ├── resources               └── work ......(entity names are basically the in-container import path)
                          │   └── file.txt                ├── __init__.py
  (naming starts here)....└── work                        ├── tt.py
                              ├── __init__.py             └── wfs.py.........(has `from work.tt import ...`)
                              ├── tt.py
                              └── wfs.py
```
if you register this, you get
```
[✔] Registration work.tt.create_large_list type TASK successful with version XW5O98buTy-eqfLoLSwGFA
[✔] Registration work.wfs.make_large_list type WORKFLOW successful with version XW5O98buTy-eqfLoLSwGFA
[✔] Registration work.wfs.make_large_list type LAUNCH_PLAN successful with version XW5O98buTy-eqfLoLSwGFA
```
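The names in that output follow the "naming starts at the root" rule. Below is a small sketch of how a module path relative to the detected root maps to a dotted entity name; `entity_name` is a hypothetical helper for illustration, not a flytekit API.

```python
from pathlib import PurePath


def entity_name(root: PurePath, module_file: PurePath, obj_name: str) -> str:
    """Join the module's import path (relative to root) with the object name."""
    # e.g. src/work/tt.py relative to src -> work/tt -> ("work", "tt")
    relative = module_file.relative_to(root).with_suffix("")
    return ".".join((*relative.parts, obj_name))


print(entity_name(PurePath("src"), PurePath("src/work/tt.py"), "create_large_list"))
# prints work.tt.create_large_list
```

The same dotted path is what the container would use to import the module, which is why, under the current rule, the copied folder has to start at that root.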
if we were to copy everything from $(cwd) and extract it into $(workdir) (effectively mirroring the structure), you’d basically have to `export PYTHONPATH=/root/src` inside the container, and flytekit would probably have to be the one to do that.
which means keeping track, relative to cwd, of where to find the user code.
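To make the `PYTHONPATH=/root/src` idea concrete: setting that variable has the same effect as putting the folder on `sys.path` before any imports run. A minimal sketch, assuming the whole project is mirrored into the workdir; `enable_src_imports` is a hypothetical helper, and flytekit does not currently do this for you.

```python
import importlib
import sys
from pathlib import Path


def enable_src_imports(workdir: Path, src_dir: str = "src") -> None:
    """Mimic `export PYTHONPATH=<workdir>/src` from inside the Python process.

    Hypothetical helper: with conf/, data/ and src/ all copied to the workdir,
    packages under src/ only become importable once src/ is on sys.path.
    """
    path = str(workdir / src_dir)
    if path not in sys.path:
        sys.path.insert(0, path)
    # Clear cached finder state so newly added folders are picked up.
    importlib.invalidate_caches()
```

In the container this would be called with `Path("/root")` before importing `my_package`; setting `PYTHONPATH` in the image, as done later in this thread, achieves the same without code.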
cc @flaky-parrot-42438 re dest dir discussion
as a workaround for now @flat-waiter-82487, maybe just add symlinks to the conf/ and data/ folders you want inside src? and then add `--deref-symlinks` to your register command?
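A sketch of the symlink workaround, assuming the `conf/`, `data/` and `src/` layout from the question; `link_into_src` is a hypothetical helper. The links use relative targets so they keep working if the project folder moves, and, as suggested above, registering with `--deref-symlinks` should then package the linked contents.

```python
import os
from pathlib import Path


def link_into_src(project_root: Path, folders=("conf", "data"), src="src"):
    """Create src/<name> -> ../<name> symlinks for each sibling folder.

    Hypothetical helper; folder names are taken from this thread.
    """
    created = []
    for name in folders:
        link = project_root / src / name
        if link.is_symlink() or link.exists():
            continue  # don't clobber an existing file, folder, or link
        # Relative target, resolved from inside src/, so the project can move.
        os.symlink(os.path.join("..", name), link, target_is_directory=True)
        created.append(link)
    return created
```

Run it once from the project root (e.g. `link_into_src(Path("."))`), then register with something like `pyflyte register --deref-symlinks src/my_package/workflows`.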
f
Thanks for your response @thankful-minister-83577! It would be nice to at least allow power users to override the magic behaviour, WDYT? 🤔 Symlinks are not a good option for me at the moment. What I've come up with is to fool pyflyte into thinking I have workflows in multiple dirs:
```
pyflyte register src data conf
```
Then the effective `cwd` is copied into the container, and I add `/root/src` to `PYTHONPATH` in the Dockerfile beforehand. It seems to work that way. 🤷
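With that workaround, `conf/` and `data/` should sit next to `src/` both locally and under `/root` in the container, so task code can locate them by searching upward from its own module instead of hard-coding either path. A minimal sketch; `find_project_dir` is a hypothetical helper, and the container layout is assumed from the description above.

```python
from pathlib import Path


def find_project_dir(name: str, start: Path) -> Path:
    """Return the nearest folder called `name` found in `start` or its ancestors.

    Hypothetical helper: locally `start` might be src/my_package/workflows,
    in the container /root/src/my_package/workflows; both lead to <root>/conf.
    """
    base = start.resolve()
    for parent in (base, *base.parents):
        candidate = parent / name
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(f"no '{name}' directory above {start}")
```

Inside `workflow.py` this could be used as `CONF_DIR = find_project_dir("conf", Path(__file__).parent)`.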
f
@flat-waiter-82487 Does your workflow involve running Python code locally that is in `src`?
f
Of course
f
For local execution, what do the Python import statements look like in `src/my_package/workflows/workflow.py`?
f
`from my_package.sub_package import my_function`
Any more hints @thankful-minister-83577 / @flaky-parrot-42438?
t
can you cut an issue for this? we’ll have to think about this more. i feel like it’s one of those features that’s kinda fraught with edge cases, and it’s maybe a bit troublesome that it basically requires a PYTHONPATH setting inside the container beyond just the current dir
you’re basically asking flytekit to keep track of one more folder
@flat-waiter-82487 what do you think about something like this? https://github.com/flyteorg/flytekit/pull/2715
you’ll have to switch to using `ImageSpec` if you aren’t already, but i think that’s okay
it doesn’t quite do what you want, but it does allow you to explicitly copy in additional files/folders you want
thank you @glamorous-carpet-83516 for the idea
f
I have my own CI for building the Docker images, so I guess not
t
i see, okay, i’ll keep thinking about it