# ask-the-community
Hello, I have some common libraries (including pyspark) from both public and private repos that I would like all my Flyte workflows to share. To do that, I would like to build a custom Flyte Docker image. However, I can't find clear documentation on how to build it or how to reference the custom image when running or registering my workflows. Could you please help? Thanks a lot.
@Frank Shen, first of all, sorry for the confusion in the docs. You have a few options for specifying the image during registration. If you're using one of our CLIs, you can follow https://docs.flyte.org/projects/cookbook/en/latest/auto/deployment/deploying_workflows.html#register-your-workflows-and-tasks
But if you're taking a more programmatic approach (in Python), you can use FlyteRemote to register your workflows and tasks: https://docs.flyte.org/projects/flytekit/en/latest/design/control_plane.html#registering-entities
@Eduardo Apolinario (eapolinario), thank you. I need to figure out how to build a custom flyte docker image. Are you aware of any git repo to start the process?
if you follow https://docs.flyte.org/projects/cookbook/en/latest/auto/larger_apps/index.html you'll notice that you can use
pyflyte init
to get scaffolding for a flyte project, including a dockerfile
Then it's a matter of building an image and pushing it to a Docker registry accessible to your Flyte deployment.
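For anyone following along, the build-and-push step can be sketched as a small dry-run helper that only prints the commands. The registry host, image name, and tag are placeholders. The `workflows` directory is an assumption about what the `pyflyte init` scaffolding creates. The `pyflyte register --image` call assumes a flytekit release that ships that subcommand, so check `pyflyte --help` in your environment first.

```python
import shlex
from typing import List, Tuple


def image_commands(registry: str, name: str, tag: str,
                   context: str = ".") -> Tuple[str, List[str]]:
    """Return the full image reference and the shell commands that use it."""
    image = f"{registry}/{name}:{tag}"
    commands = [
        # Build from the Dockerfile in the scaffolded project directory.
        ["docker", "build", "-t", image, context],
        # Push to a registry the Flyte deployment can pull from.
        ["docker", "push", image],
        # Assumption: `workflows` is the package directory in the scaffolding.
        ["pyflyte", "register", "--image", image, "workflows"],
    ]
    return image, [shlex.join(cmd) for cmd in commands]


if __name__ == "__main__":
    # Hypothetical registry and image name, for illustration only.
    _, cmds = image_commands("registry.example.com/my-team",
                             "flyte-spark-base", "0.1.0")
    for cmd in cmds:
        print(cmd)
```

Nothing is executed here. Paste the printed commands into a shell once the names match your setup.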
Thanks @Eduardo Apolinario (eapolinario). I noticed that Spark jobs cannot run in the standard Flyte image because the Flyte Spark modules are missing. Is that correct?
If so, what spark dependencies do I need to specify in the requirements.txt? e.g. flytekitplugins-spark==1.2.1
yes, the flytekit plugin packages don’t exist in the default image.
@Eduardo Apolinario (eapolinario), Do I need flytekit==1.1.3 in the requirements.txt?
flytekit 1.2.1 is the latest, @Frank Shen. And yes, flytekitplugins-spark==1.2.1 should suffice.
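To confirm that a custom image actually contains the packages pinned in requirements.txt (for example `flytekit==1.2.1` and `flytekitplugins-spark==1.2.1`, as discussed above), a quick check like the following can be run inside the container. This is only a sketch using the standard library. The helper name `installed_versions` is made up for illustration.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for pkg in packages:
        try:
            found[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[pkg] = None
    return found


if __name__ == "__main__":
    report = installed_versions(["flytekit", "flytekitplugins-spark"])
    for pkg, version in report.items():
        print(f"{pkg}: {version or 'NOT INSTALLED'}")
```

In the default image, the plugin line would be expected to show as not installed, which matches the missing Spark modules mentioned earlier in the thread.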