# ask-the-community
n
Hello dear Flyte community! I am starting with Flyte and have experimented a bit with it. There are still some blurry parts for me though. Maybe you will be able to help me 🙂
The steps from development to workflow registration are, for me:
• Python venv with all necessary dependencies
• create the Python file with the workflow, defining the needed Docker images
• register the workflow, specifying the Docker image if necessary
The problem here is that there is duplicated work between the Docker image (which includes all the dependencies) and the Python venv (which also needs to include all the dependencies). Is there a way to do this work only once? I was thinking of building the Docker image and registering the workflow from inside the Docker container, but there are some errors, even with host networking. Does anyone have a better way to develop workflows? Or maybe you can explain why the venv+container way of developing is the best way to do so ahah 😉 Thank you! 🙂
k
you could use image spec, a new feature in flytekit that lets you build a docker image without a dockerfile. https://docs.flyte.org/projects/cookbook/en/latest/auto_examples/image_spec/image_spec.html
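To make the image spec suggestion concrete, here is a minimal sketch of what such a definition might look like. The image name, package list and Python version are placeholders (assumptions, not values from this thread); `localhost:30000` is the registry that appears in the commands further down. Depending on the flytekit version, building the image may require an extra builder plugin, so the linked docs are the reference here.

```python
from flytekit import ImageSpec, task, workflow

# All values below are placeholders for illustration.
example_image = ImageSpec(
    name="flyte-example",        # hypothetical image name
    python_version="3.10",       # assumed Python version
    packages=["pandas"],         # pip packages to install in the image
    registry="localhost:30000",  # registry the built image is pushed to
)


@task(container_image=example_image)
def double(n: int) -> int:
    # This task runs in the image described by example_image
    return 2 * n


@workflow
def wf(n: int = 3) -> int:
    return double(n=n)
```

The dependency list then lives next to the workflow code instead of in a separate dockerfile.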
n
But I still need to work from an environment with all the dependencies, right?
k
yes. I guess you want flyte to somehow copy the dependencies from the local venv to the container, so you don’t need to specify all the dependencies in the dockerfile or image spec?
n
I am just used to workflow engines where you work inside a container and just tell the workflow engine to use that container. You kind of have the same environment twice in Flyte, if I understand correctly how it works
But I see, I will try this image spec functionality, it might save a bit of time 🙂
And above all, if you install something from source, in a more complex way than just a pip install or similar, you have to do it in the dockerfile AND in the development environment you register from, which is a bit time-consuming
y
you can register from within the container yes.
we used to do this at lyft… but it didn’t match our users’ workflow too well. image building took too long, and almost everyone had a local version of dependencies installed.
and for large workflows with different requirements, users could then split up their workflows so that different tasks ran with different images.
if you want to do this you just have to figure out the networking. what’s the network error?
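As a rough sketch of what "different tasks ran with different images" can look like in flytekit: each task takes its own `container_image`. The image tags below are hypothetical. Putting the heavy imports inside the task bodies is an additional assumption, not something stated in this thread: those imports only run when the task executes in its own image, so the registration environment may not need both libraries installed. This does not help if library types appear in the task signatures.

```python
from flytekit import task, workflow

# Hypothetical image tags; replace with images that exist in your registry.
TF_IMAGE = "localhost:30000/tf-image:latest"
NUMPY_IMAGE = "localhost:30000/numpy-image:latest"


@task(container_image=TF_IMAGE)
def extract(n: int) -> int:
    # Imported inside the task body, so it is only needed where the task runs
    import tensorflow as tf

    return int(tf.constant(n).numpy())


@task(container_image=NUMPY_IMAGE)
def summarize(n: int) -> int:
    # Same idea: this import is resolved inside NUMPY_IMAGE at execution time
    import numpy as np

    return int(np.square(n))


@workflow
def wf(n: int = 3) -> int:
    return summarize(n=extract(n=n))
```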
n
Hi @Yee I'm going to try some more with the docker container used to run the entire workflow, and tell you if I encounter any difficulties. When you say that large workflows use different images, how would you do it? When registering the workflow, you need to do it in an environment containing allll dependencies. What if one task running with a specific image has a conflict with another task running with a different image?
When registering the workflow, it seems that pyflyte struggles with symlinks. It says that a file does not exist, but it does, through a symlink. Have you experienced that? Is there a way to overcome that issue? I might post it outside this thread as it may help others
y
symlink follow doesn’t exist i don’t think. it’s something we need to add
“environment containing allll dependencies” yeah exactly. it was just that in our experience users tend to have a giant local env with everything.
conflict - split the registration flow into two steps/based on two envs? or maybe if you can stand the cognitive dissonance and the registration part works fine, then register with the wrong version for one of the tasks (knowing that at execution time, the correct version will be used for both)
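Until symlink following exists, one possible workaround for the symlink problem is to register from a copy of the source tree in which links have been replaced by real files. This is only a sketch using the Python standard library; the function name and the staging directory are hypothetical, and it has not been checked against pyflyte itself.

```python
import shutil
from pathlib import Path


def materialize_symlinks(src: str, dst: str) -> Path:
    """Copy src to dst, replacing symlinks with the files they point to."""
    # symlinks=False (the default) copies the content behind each link
    # instead of recreating the link itself. dst must not exist yet.
    return Path(shutil.copytree(src, dst, symlinks=False))
```

Calling `materialize_symlinks("extractWf", "/tmp/extractWf_staged")` and then pointing `pyflyte register` at the staged copy would be the intended usage (both paths are examples only).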
n
Thank you for your answers!
Concerning the use of the docker image itself to register the workflow, the error I am facing is:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses; last error: UNAVAILABLE: ipv4:127.0.0.1:30080: recvmsg:Connection reset by peer"
debug_error_string = "UNKNOWN:failed to connect to all addresses; last error: UNAVAILABLE: ipv4:127.0.0.1:30080: recvmsg:Connection reset by peer
I enter the running docker container (sharing host networking with the container) with:
docker run -it --net=host --mount type=bind,source=/home/<user>,target=/home/<user> localhost:30000/<image> bash
And register the workflow with:
pyflyte --verbose register /home/<user>/Documents/data/Flyte/flytesnacks/venvtensorflow/extractWf/WorkflowExtraction.py --image localhost:30000/<image>:latest
I don't know if you remember how you set up the networking when using the docker container to register the workflow? Thank you again 🙂
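A quick way to narrow down this kind of error is to check, from inside the container, whether the endpoint in the error message accepts TCP connections at all. The helper below is a minimal standard-library sketch (the function name is made up for illustration); it only tests the TCP layer and says nothing about gRPC or client configuration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False
```

Running `can_connect("127.0.0.1", 30080)` inside the container tests the address from the error above. If it returns True, the port is reachable and the reset probably comes from a higher layer, which this check cannot see.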