sparse-window-1536
09/19/2022, 8:33 PM
`return` statement confirm that), but it hangs at the end and never actually finishes. We were able to reproduce it on our remote server and locally. On the remote server, none of the prints (or logs) are being shown on Stackdriver. What could be happening?
sparse-window-1536
09/19/2022, 8:34 PM
`flytekit~=1.0.0`, so I believe we're using the latest version, if compatible release is to be trusted
sparse-window-1536
09/19/2022, 8:36 PM
`pyflyte run`. This `7 None` happens right before the return statement (`None` is the value of that statement)
freezing-airport-6809
freezing-airport-6809
sparse-window-1536
09/19/2022, 9:08 PM
freezing-airport-6809
sparse-window-1536
09/19/2022, 9:09 PM
`None`, and that does not send its return value to another variable in the workflow spec. The workflow also returns `None`.
high-accountant-32689
09/19/2022, 9:09 PM
sparse-window-1536
09/19/2022, 9:09 PM
`pyflyte run`
thankful-minister-83577
`return 5` and make the signature an `int`?
thankful-minister-83577
sparse-window-1536
09/19/2022, 9:16 PM
@extended_task(integrations=['gcloud'], requests=Resources(mem='4Gi'))
def update_bq_table(
    amnt_dataframe: pd.DataFrame,
    gcs_config_path: str
) -> None:
    config_dict = read_file(gcs_config_path)
    update_gbq_table(  # Function that calls pandas_gbq.to_gbq()
        amnt_dataframe,
        config_dict['table_schema'],
        config_dict['table_destination']
    )
`@extended_task` is a special decorator we use that does some pre- and post-processing on tasks. Other tasks with it are running fine; the `7 None` on my screenshot above is being called in the wrapper, after an `output = task_func(*args, **kwargs)` and before a `return output`.
thankful-minister-83577
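For readers following along, the wrapper described above can be sketched roughly as below. This is a minimal illustration only: `extended_task_sketch` and `update_table_stub` are hypothetical names, the real `@extended_task` also accepts arguments such as `integrations` and `requests`, and its actual pre- and post-processing is not shown anywhere in the thread.

```python
import functools


def extended_task_sketch(task_func):
    """Hypothetical stand-in for the thread's @extended_task decorator."""

    @functools.wraps(task_func)
    def wrapper(*args, **kwargs):
        # Pre-processing would happen here (assumed; not shown in the thread).
        output = task_func(*args, **kwargs)
        # Debug print placed between the call and the return, as described.
        print(7, output)
        # Post-processing would happen here (assumed; not shown in the thread).
        return output

    return wrapper


@extended_task_sketch
def update_table_stub(rows: list) -> None:
    """Stand-in for update_bq_table: does some work and returns None."""
    _ = len(rows)


result = update_table_stub([1, 2, 3])
```

With a task that returns `None`, the debug line prints `7 None`. If that line shows up, the wrapped function body has already returned, which would suggest that whatever stalls happens after the task's own code.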
thankful-minister-83577
sparse-window-1536
09/19/2022, 9:18 PM
@workflow
def main_workflow(
    hotel_amnt_sql_path: str,
    config_path: str,
    config_pre_process_path: str,
    model_config_path: str
) -> None:
    preview_amnt = ...
    # Some other tasks, all with <output> = <function call>
    update_bq_table(
        amnt_dataframe=hotel_topics,
        gcs_config_path=config_path
    )
sparse-window-1536
09/19/2022, 9:18 PM
abundant-night-96152
09/20/2022, 4:05 PM
thankful-minister-83577
sparse-window-1536
09/20/2022, 5:09 PM
thankful-minister-83577
thankful-minister-83577
thankful-minister-83577
thankful-minister-83577
thankful-minister-83577
`print(amnt_dataframe.describe().to_html())` and keep all the `print(7)`s
abundant-night-96152
09/20/2022, 8:03 PM
abundant-night-96152
09/20/2022, 8:11 PM
`print(amnt_dataframe.describe().to_html())`
thankful-minister-83577
thankful-minister-83577
`.to_html()`?
thankful-minister-83577
abundant-night-96152
09/20/2022, 8:42 PM
high-accountant-32689
09/20/2022, 8:45 PM
thankful-minister-83577
abundant-night-96152
09/20/2022, 8:46 PM
thankful-minister-83577
abundant-night-96152
09/20/2022, 8:48 PM
thankful-minister-83577
thankful-minister-83577
abundant-night-96152
09/20/2022, 8:53 PM
thankful-minister-83577
thankful-minister-83577
thankful-minister-83577
thankful-minister-83577
from flytekit.deck.renderer import TopFrameRenderer
from typing_extensions import Annotated
and then make the task like
@task
def mytask() -> Annotated[pd.DataFrame, TopFrameRenderer(10)]: ...
that should make it so that the renderer used just grabs the first 10 rows
thankful-minister-83577
thankful-minister-83577
abundant-night-96152
09/20/2022, 11:15 PM
abundant-night-96152
09/22/2022, 7:32 PM
thankful-minister-83577
thankful-minister-83577
high-accountant-32689
09/23/2022, 3:02 AM
❯ ipython
Python 3.8.13 (default, Mar 28 2022, 11:38:47)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.5.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import pandas as pd

In [2]: pd.read_parquet("/home/eduardo/Downloads/amnt_dataframe.parquet.gzip")
Out[2]:
                      generic_sku                                           topics
0       HT-0008-0-0-0-0-0-0-0-0-0                  [ST5, ST6, ST2, ST13, ST4, ST3]
1       HT-000M-0-0-0-0-0-0-0-0-0             [ST5, ST6, ST2, ST7, ST13, ST4, ST3]
2       HT-000W-0-0-0-0-0-0-0-0-0            [ST5, ST6, ST12, ST10, ST7, ST4, ST1]
3       HT-000X-0-0-0-0-0-0-0-0-0                       [ST5, ST6, ST13, ST4, ST1]
4       HT-000Z-0-0-0-0-0-0-0-0-0                 [ST5, ST10, ST13, ST4, ST1, ST3]
...                           ...                                              ...
237852  HT-ZZY9-0-0-0-0-0-0-0-0-0                                       [ST1, ST4]
237853  HT-ZZYC-0-0-0-0-0-0-0-0-0  [ST5, ST6, ST2, ST12, ST7, ST13, ST4, ST1, ST3]
237854  HT-ZZYZ-0-0-0-0-0-0-0-0-0                      [ST5, ST10, ST13, ST4, ST1]
237855  HT-ZZZ2-0-0-0-0-0-0-0-0-0                  [ST5, ST12, ST7, ST4, ST1, ST3]
237856  HT-ZZZJ-0-0-0-0-0-0-0-0-0                             [ST4, ST5, ST6, ST2]

[237857 rows x 2 columns]

In [3]: df = pd.read_parquet("/home/eduardo/Downloads/amnt_dataframe.parquet.gzip")

In [4]: df.describe()
sparse-window-1536
09/23/2022, 2:00 PM
sparse-window-1536
09/23/2022, 2:01 PM
`.describe()`?
freezing-airport-6809
high-accountant-32689
09/23/2022, 5:04 PM
`TopFrameRenderer`) does not run `describe`; instead, it turns a fixed number of rows directly into HTML: https://github.com/flyteorg/flytekit/blob/3cf063955907957de65b035066fe415503a9bd65/flytekit/deck/renderer.py#L17-L27
elegant-australia-91422
10/12/2022, 9:23 PM
`DataFrame` in a task. We were previously able to run this in ~2 mins on flytekit 1.1.x, but since upgrading, this stage is stalling for over 2-3 hrs. It also takes ~90 seconds to read this dataframe in a Jupyter notebook.
We're noticing an interesting memory usage pattern here as well, with memory inching upwards as the task executes. The CPU (currently 1) is maxed out towards the start of execution.
Any thoughts on what might have caused this? We're also about to try rolling back flytekit to see if that resolves things.
high-accountant-32689
10/12/2022, 9:57 PM
`describe` takes a long time to run). Can you say more about what you're seeing (in a separate thread)?