Matheus Moreno
09/19/2022, 8:33 PM
return statement confirm that), but it hangs at the end and never actually finishes. We were able to reproduce it on our remote server and locally. On the remote server, none of the prints (or logs) are being shown on Stackdriver. What could be happening?
flytekit~=1.0.0, so I believe we're using the latest version if compatible release is to be trusted
pyflyte run. This 7 None happens right before the return statement (None is the value of that statement)
Ketan (kumare3)
Matheus Moreno
09/19/2022, 9:08 PM
Ketan (kumare3)
Matheus Moreno
09/19/2022, 9:09 PM
None, and that does not send its return value to another variable in the workflow spec. The workflow also returns None.
Eduardo Apolinario (eapolinario)
09/19/2022, 9:09 PM
Matheus Moreno
09/19/2022, 9:09 PM
pyflyte run
Yee
return 5 and make the signature an int?
Matheus Moreno
09/19/2022, 9:16 PM
@extended_task(integrations=['gcloud'], requests=Resources(mem='4Gi'))
def update_bq_table(
    amnt_dataframe: pd.DataFrame,
    gcs_config_path: str
) -> None:
    config_dict = read_file(gcs_config_path)
    update_gbq_table(  # Function that calls pandas_gbq.to_gbq()
        amnt_dataframe,
        config_dict['table_schema'],
        config_dict['table_destination']
    )
@extended_task is a special decorator we use that does some pre- and post-processing on tasks. Other tasks with it are running fine; the 7 None in my screenshot above is printed in the wrapper, after output = task_func(*args, **kwargs) and before return output.
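The wrapper pattern described here can be sketched roughly as follows. This is a minimal illustration, not the team's actual extended_task: the decorator name extended and the hooks pre_process and post_process are hypothetical, and the real decorator also takes arguments and wraps a Flyte task. The debug print sits between the call and the return, which is where a line like 7 None would come from.

```python
import functools


def pre_process(name):
    # Hypothetical pre-processing hook (assumption, not from the thread).
    print(f"starting {name}")


def post_process(name):
    # Hypothetical post-processing hook (assumption, not from the thread).
    print(f"finished {name}")


def extended(task_func):
    """Wrap task_func with pre- and post-processing (illustrative only)."""

    @functools.wraps(task_func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        pre_process(task_func.__name__)
        output = task_func(*args, **kwargs)
        post_process(task_func.__name__)
        print(7, output)  # debug print placed right before the return
        return output

    return wrapper


@extended
def update_table(rows: int) -> None:
    # Stand-in for a task that returns nothing.
    print(f"uploading {rows} rows")


result = update_table(3)
```

Because update_table returns nothing, the debug line shows None as the output. Seeing that line suggests the wrapped function body has finished, so a hang would have to occur after the wrapper returns.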
Yee
Matheus Moreno
09/19/2022, 9:18 PM
@workflow
def main_workflow(
    hotel_amnt_sql_path: str,
    config_path: str,
    config_pre_process_path: str,
    model_config_path: str
) -> None:
    preview_amnt = ...
    # Some other tasks, all with <output> = <function call>
    update_bq_table(
        amnt_dataframe=hotel_topics,
        gcs_config_path=config_path
    )
Sérgio de Melo Barreto Junior
09/20/2022, 4:05 PM
Yee
Matheus Moreno
09/20/2022, 5:09 PM
Yee
print(amnt_dataframe.describe().to_html()) and keep all the print(7)s
Sérgio de Melo Barreto Junior
09/20/2022, 8:03 PM
print(amnt_dataframe.describe().to_html())
Yee
.to_html()?
Sérgio de Melo Barreto Junior
09/20/2022, 8:42 PM
Eduardo Apolinario (eapolinario)
09/20/2022, 8:45 PM
Yee
Sérgio de Melo Barreto Junior
09/20/2022, 8:46 PM
Yee
Sérgio de Melo Barreto Junior
09/20/2022, 8:48 PM
Yee
Sérgio de Melo Barreto Junior
09/20/2022, 8:53 PM
Yee
from flytekit.deck.renderer import TopFrameRenderer
from typing_extensions import Annotated
and then make the task like
@task
def mytask() -> Annotated[pd.DataFrame, TopFrameRenderer(10)]: ...
That should make the renderer grab just the first 10 rows.
Sérgio de Melo Barreto Junior
09/22/2022, 7:32 PM
Yee
Eduardo Apolinario (eapolinario)
09/23/2022, 3:02 AM
❯ ipython
Python 3.8.13 (default, Mar 28 2022, 11:38:47)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.5.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import pandas as pd
In [2]: pd.read_parquet("/home/eduardo/Downloads/amnt_dataframe.parquet.gzip")
Out[2]:
generic_sku topics
0 HT-0008-0-0-0-0-0-0-0-0-0 [ST5, ST6, ST2, ST13, ST4, ST3]
1 HT-000M-0-0-0-0-0-0-0-0-0 [ST5, ST6, ST2, ST7, ST13, ST4, ST3]
2 HT-000W-0-0-0-0-0-0-0-0-0 [ST5, ST6, ST12, ST10, ST7, ST4, ST1]
3 HT-000X-0-0-0-0-0-0-0-0-0 [ST5, ST6, ST13, ST4, ST1]
4 HT-000Z-0-0-0-0-0-0-0-0-0 [ST5, ST10, ST13, ST4, ST1, ST3]
... ... ...
237852 HT-ZZY9-0-0-0-0-0-0-0-0-0 [ST1, ST4]
237853 HT-ZZYC-0-0-0-0-0-0-0-0-0 [ST5, ST6, ST2, ST12, ST7, ST13, ST4, ST1, ST3]
237854 HT-ZZYZ-0-0-0-0-0-0-0-0-0 [ST5, ST10, ST13, ST4, ST1]
237855 HT-ZZZ2-0-0-0-0-0-0-0-0-0 [ST5, ST12, ST7, ST4, ST1, ST3]
237856 HT-ZZZJ-0-0-0-0-0-0-0-0-0 [ST4, ST5, ST6, ST2]
[237857 rows x 2 columns]
In [3]: df = pd.read_parquet("/home/eduardo/Downloads/amnt_dataframe.parquet.gzip")
In [4]: df.describe()
Matheus Moreno
09/23/2022, 2:00 PM
.describe()?
Ketan (kumare3)
Eduardo Apolinario (eapolinario)
09/23/2022, 5:04 PM
TopFrameRenderer) does not run describe; instead, it turns a fixed number of rows directly into HTML: https://github.com/flyteorg/flytekit/blob/3cf063955907957de65b035066fe415503a9bd65/flytekit/deck/renderer.py#L17-L27
Rahul Mehta
10/12/2022, 9:23 PM
DataFrame in a task. We were previously able to run this in ~2 mins on flytekit 1.1.x, but since upgrading, this stage has been stalling for 2-3 hrs or more. It also takes ~90 seconds to read this dataframe in a Jupyter notebook.
We're noticing an interesting memory usage pattern here as well, with memory inching upwards as the task executes. The CPU (currently 1) is maxed out towards the start of execution.
Any thoughts on what might have caused this? We're also about to try rolling back flytekit to see if that resolves things.
Eduardo Apolinario (eapolinario)
10/12/2022, 9:57 PM
describe takes a long time to run). Can you say more about what you're seeing (in a separate thread)?
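To make the distinction discussed in this thread concrete, here is a small sketch of the two rendering paths: summarizing with describe() before converting to HTML, versus converting only a fixed number of top rows. It is an illustration under assumptions, not flytekit's code: the helper names describe_html and top_rows_html are hypothetical, the data is synthetic, and head(n) stands in for whatever row limit the renderer applies.

```python
import pandas as pd


def describe_html(df: pd.DataFrame) -> str:
    # Summarizes every column first, so the work grows with the number of rows.
    return df.describe().to_html()


def top_rows_html(df: pd.DataFrame, n: int = 10) -> str:
    # Renders only the first n rows, so the work is bounded by n.
    return df.head(n).to_html()


# Synthetic stand-in for the frame in the thread (values are made up).
df = pd.DataFrame(
    {
        "generic_sku": [f"SKU-{i:04d}" for i in range(100)],
        "topics": [f"ST{i % 13 + 1}" for i in range(100)],
    }
)

summary = describe_html(df)
preview = top_rows_html(df, n=3)
```

The preview contains only the first three SKUs, while the summary contains aggregate rows such as the unique count. On a frame with hundreds of thousands of rows, only the first path has to scan the whole frame.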