Hi In a model pipeline the tasks are taking longer time to r Flyte #flyte-support

Hi, In a model pipeline, the tasks are taking lon...

nice-judge-43061

11/17/2022, 1:58 PM

Hi, In a model pipeline, the tasks are taking longer time to run than usual. A func. which would take ~2 mins to run is taking 35-40 min when using flyte. When I checked the cloudwatch logs, I see that there is a dormant period in every task which is taking up >98% of run time. In the image attached below, after 18 min (@ 17:27), the actual process is running and it takes seconds to complete. In between the initialization and the actual code run, when I checked the logs, these are available:

flyteadmin-c9cccb94d-fcq5s_flyte_flyteadmin-51bb75904a0581e506a547fc4e434e29b564ec2bcf9d41c343575ffdb29c5bc4

flytepropeller-cbf5cb9f-9kn76_flyte_flytepropeller-b5c4de694b13a1687857b30012166b8218f8ddd20e141f94492b441fa7fdf425

flyteconsole-5c7b456999-jqt42_flyte_flyteconsole-7f461bce1be5bcf45010474a20c7b6602ed0892374d968838ca2ed72f9c1d878

Are these contributing to the increased runtime ?

hallowed-mouse-14616

11/17/2022, 3:29 PM

@nice-judge-43061 this is very unusual. Flyte does have some unavoidable overhead to ensure reproducability, scalability, etc but these are typically minimal and account for a relatively small portion of the overall runtime. Do you have access to the k8s Pod information that Flyte uses to execute the task? I'm interested in the Pods start / end times and then the container start / end times as well.

hallowed-mouse-14616

11/17/2022, 3:31 PM

Also, how big is the data sent as inputs to the task and what is the type?

glamorous-carpet-83516

11/18/2022, 12:47 AM

Does the task produce a dataframe? if so, it might be the flyte deck issue.

nice-judge-43061

11/18/2022, 5:00 AM

@hallowed-mouse-14616 The shape of the dataframe is (25000,12000). I don't have the exact timestamps of the pods. But as soon as the task starts running, I checked using the

kubectl get pods

and it shows the respective pod is running and on describing the pod

kubectl describe pod

it is showing this:

nice-judge-43061

11/18/2022, 5:00 AM

@glamorous-carpet-83516 Yes,The output of a task is returned as dataframe & is given as input to the next task. Can you elaborate on the flyte deck issue ?

glamorous-carpet-83516

11/18/2022, 5:07 AM

The problem is that flytekit will create a very big HTML file that renders a dataframe table. you can disable it by

@task(disable_deck=True)

, or you can render top 10 rows only

Copy code

@task
def t1() -> typing.Annotated[pd.DataFrame, TopFrameRenderer(10)]:

glamorous-carpet-83516

11/18/2022, 5:08 AM

we have disabled flyte deck by default since flytekit v1.2.3.

nice-judge-43061

11/18/2022, 9:16 AM

Ok.

hallowed-mouse-14616

11/23/2022, 4:39 PM

@glamorous-carpet-83516 / @thankful-minister-83577 / @high-accountant-32689 we have been diving into this a bit and just had a few flytekit questions. Basic context: There is a task which has about 1G - 2G data on the low-end for input / output values. Flytepropeller is executing the task and the container is immediately created to start running user code. This runs for 10+ minutes (which is longer than expected - locally around 2-3min). Once the container completes the cloudwatch logs, which are output from the task, are all dumped with the same timestamp. A few questions here: (1) are the cloudwatch logs captured in real-time or once the task completes they are all gathered and written? (2) how can we better profile the execution duration from a flytekit-side? users tried to manually write the dataframe inputs / outputs to S3 and then pass file paths between tasks with some performance improvements, but still not as fast as hoped.

thankful-minister-83577

11/23/2022, 5:27 PM

1. so cloudwatch logs are just a capture of the output from the container. there’s some agent on the nodes that will take container logs and push them up to cloudwatch. these agents work in real time afaik, with some delay. so it should be as the logging happens. everything with the same time stamp is a bit weird. can you try again @nice-judge-43061 with kubectl logs? if k logs is showing incremental text, but cloudwatch doesn’t then it’s something with the agent. if it’s also nothing until done then this is a deeper issue that we should look at. how many log statements are there? maybe flushing of the buffer needs to happen more frequently.

thankful-minister-83577

11/23/2022, 5:30 PM

2. i think we should talk about this @hallowed-mouse-14616 - maybe there’s something we can tie into the stuff you’re doing on the propeller side. in general i just increase the verbosity of the logging by setting

FLYTE_SDK_LOGGING_LEVEL=10

and looking at logging levels. but if you’re saying they’re all the same timestamp then that’s not really helpful. be nice to have a timeline in flyte deck though right?

💯 1

high-accountant-32689

11/23/2022, 6:16 PM

also, re 1. how are the logs being generated? flytekit's loggers are not buffered, but I've seen

print

statements get buffered.

nice-judge-43061

11/30/2022, 4:52 AM

@thankful-minister-83577 The flushing of the buffer is currently setup to happen once in 5 minutes. @high-accountant-32689 Yes, print statements are being used in this case. And they are reflected in the cloudwatch log of the respective pod with the same timestamp.

thankful-minister-83577

11/30/2022, 4:55 PM

what version of flytekit is this?

thankful-minister-83577

11/30/2022, 4:56 PM

and is it every task in the workflow?

thankful-minister-83577

11/30/2022, 4:56 PM

or only some of them?

182 Views

Open in Slack

Previous Next