#ask-the-community

Alykhan Tejani

11/30/2023, 6:05 PM
Hi all, I am trying to use kftensorflow's TfJob with a custom image. However, I'm not sure what exactly gets executed. Here is a rough outline of the code:
@task(
    task_config=TfJob(
        chief=Chief(replicas=1, image=<some_image>),
        ps=PS(replicas=0, image=<some_image>),
        worker=Worker(replicas=1, image=<some_image>),
    ),
)
def func(some_arg: int):
    ...  # do something

Kevin Su

11/30/2023, 6:13 PM
"Does the image <some_image> get run on each pod?"
Yes, all the pods will use the same image. The worker and master will run the same code: https://www.tensorflow.org/guide/keras/distributed_training
You can pass an input from the UI or use flytekit remote,
or create a workflow, like
@workflow
def wf():
    x = create_output()
    training(x=x)

Alykhan Tejani

11/30/2023, 6:16 PM
Thanks for your responses. My current setup builds the image for the workers/chief etc. and then spawns a TFJob so that those images get executed and TF_CONFIG is set properly by Flyte. This is one step of my workflow. The problem I'm running into is that the entrypoint of the image run for the chief/worker expects command-line args. How can I pass these in via the definition of the TFJob task?
Thanks in advance for any help
So I'm using absl flags and seeing:
Unknown command line flag 'inputs'
which makes me think Flyte is already passing in some flags.
OK, so I think what is happening is that Flyte creates a container from this image and then runs pyflyte-execute.
So my image can't have an entrypoint that's an executable.
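For anyone following along, here is a minimal sketch of why a strict flag parser would reject the arguments in this situation. It uses the standard library's `argparse` as a stand-in for absl, and the argument list, bucket path, and `--learning_rate` flag are hypothetical, chosen only to mirror the `inputs` flag from the error above.

```python
import argparse
from typing import List, Tuple

# Hypothetical argument list, for illustration only: the real arguments that
# Flyte passes will differ, but the error above shows an 'inputs' flag.
FLYTE_STYLE_ARGV = ["pyflyte-execute", "--inputs", "s3://example-bucket/inputs.pb"]


def build_parser() -> argparse.ArgumentParser:
    """Parser for a hypothetical training entrypoint with one flag of its own."""
    parser = argparse.ArgumentParser(prog="trainer")
    parser.add_argument("--learning_rate", type=float, default=0.01)
    return parser


def strict_parse_succeeds(argv: List[str]) -> bool:
    """Return False if the parser rejects argv (argparse exits on unknown args)."""
    try:
        build_parser().parse_args(argv)
        return True
    except SystemExit:
        return False


def tolerant_parse(argv: List[str]) -> Tuple[argparse.Namespace, List[str]]:
    """Parse the flags we know and hand back everything else untouched."""
    return build_parser().parse_known_args(argv)


if __name__ == "__main__":
    print("strict parse ok:", strict_parse_succeeds(FLYTE_STYLE_ARGV))
    namespace, extras = tolerant_parse(FLYTE_STYLE_ARGV)
    print("known flags:", namespace)
    print("unrecognized:", extras)
```

This only illustrates where the error message comes from. Ignoring unknown flags would not make the image's own executable run the task function, which is consistent with the diagnosis that Flyte expects to run pyflyte-execute in the container.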

Kevin Su

11/30/2023, 6:51 PM
yes, you can’t override the entrypoint

Alykhan Tejani

11/30/2023, 6:51 PM
ok trying to figure out how to do this in my toolchain (bazel docker io)

Kevin Su

11/30/2023, 6:51 PM
what is absl? what do you want to run? download data?

Alykhan Tejani

11/30/2023, 6:52 PM
absl is a command line parser

Kevin Su

11/30/2023, 6:55 PM
you want to use it in the pod running TF training?

Alykhan Tejani

11/30/2023, 6:55 PM
yes

Kevin Su

11/30/2023, 6:56 PM

Alykhan Tejani

11/30/2023, 7:06 PM
would this init container then run my custom image?
with args passed in

Kevin Su

11/30/2023, 7:10 PM
yes, you can use any image and args

Alykhan Tejani

11/30/2023, 7:15 PM
Alternatively, can I not specify a custom image but make the TFJob decorate a container task?
In the running container, will TF_CONFIG be set?
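One way to answer the TF_CONFIG question empirically is to inspect the environment variable from inside the running container. The sketch below assumes the standard TensorFlow layout for TF_CONFIG (a JSON object with `cluster` and `task` keys); the helper name `describe_tf_config` and the host names in the sample are hypothetical.

```python
import json
import os
from typing import Mapping, Optional


def describe_tf_config(environ: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Summarize TF_CONFIG, or return None if it is not set in the environment."""
    raw = environ.get("TF_CONFIG")
    if not raw:
        return None
    config = json.loads(raw)
    task = config.get("task", {})
    cluster = config.get("cluster", {})
    return {
        "task_type": task.get("type"),
        "task_index": task.get("index"),
        # Number of addresses listed for each role, e.g. chief, worker, ps.
        "replicas": {role: len(addresses) for role, addresses in cluster.items()},
    }


if __name__ == "__main__":
    # Hypothetical sample value; host names are placeholders.
    sample_env = {
        "TF_CONFIG": json.dumps(
            {
                "cluster": {"chief": ["host0:2222"], "worker": ["host1:2222"]},
                "task": {"type": "worker", "index": 0},
            }
        )
    }
    print(describe_tf_config(sample_env))
    print(describe_tf_config())  # reads the real environment of this process
```

Logging the result at the start of the task body would show, per pod, whether TF_CONFIG was set and which role that pod was assigned.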