:wave: when using `pyflyte serialize workflows` is...
# flyte-support
g
πŸ‘‹ when using `pyflyte serialize workflows`, is it possible to constrain its memory utilization? we're seeing this OOM quite frequently
f
how many workflows are you folks serializing?
also oom - do you run this in a container?
g
cc @thankful-journalist-40373 is it caused by serializing the workflow in parallel?
t
Hey, thanks for reporting. Perhaps you can create an issue on GitHub with the number of entities you're trying to serialize and your machine setup. @glamorous-carpet-83516 From what I recall, serialization doesn't involve any non-blocking mechanism; only registration does.
g
hmm it should only be one workflow - we only set one `workflow_packages` entry in the config file, and we run this on a devbox EC2 instance which has about 32 GiB of memory
πŸ‘€ 1
f
What, it’s taking more than 32G? This is not right
g
hmm actually it looks like available memory sits around 16 GiB
> serialization doesn't involve any non-blocking mechanism
from the logs it always gets killed at the same spot btw, these are the final logs
```
from pyarrow import HadoopFileSystem
278147
Killed
```
above this it's all normal flytekit warnings, like the one about turning off postponed annotations, etc.
f
What!!!! Ohh man arrow dependency
πŸ‘€ 1
g
when this step works, it looks like this
```
from pyarrow import HadoopFileSystem
196688
17639897
===== experiment_0 =====
              merchant_id
```
it prints out two numbers and then some tabulated experiment data, which I assume is specific to our workflow πŸ€”
oh yeah this is the log right before the arrow thing
```
papermill/papermill/papermill/iorw.py:50: FutureWarning: pyarrow.HadoopFileSystem is deprecated as of 2.0.0, please use pyarrow.fs.HadoopFileSystem instead.
```
looks like it comes from papermill
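(side note, just for reference: the swap that warning asks for is only the import path, taken straight from the warning text itself, assuming pyarrow >= 2.0)
```python
# deprecated form that papermill's iorw.py still uses:
# from pyarrow import HadoopFileSystem

# replacement named in the FutureWarning (pyarrow >= 2.0):
from pyarrow.fs import HadoopFileSystem
```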
I'm kinda curious what step of serialization this could be πŸ˜…
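one cheap check I could do is import the module directly and look at peak RSS, just to see if a plain import already reproduces the spike (stdlib only; the module name below is a made-up placeholder for ours):
```python
import importlib
import resource

# serialization has to import the workflow module to find the entities,
# so any module-scope work runs at this point too
importlib.import_module("myproject.workflows.experiment")  # hypothetical module path

peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss  # KiB on Linux
print(f"peak RSS after import: {peak_kb / 1024:.0f} MiB")
```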
f
Is it papermill or your data? Are you loading something at module level?
πŸ‘€ 1
g
ah, your suspicion is that code outside the workflow/task decorators is being run here and bloating up memory?
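i.e. something like this? (a made-up sketch of the pattern, not our actual code - file and dataset names are hypothetical)
```python
# hypothetical experiment.py inside the package that workflow_packages points at,
# so `pyflyte serialize` imports it
import pandas as pd
from flytekit import task, workflow

# module-scope statement: this executes on import, i.e. during serialization,
# not just when the workflow actually runs
merchants = pd.read_parquet("s3://bucket/merchants.parquet")

@task
def score(merchant_id: str) -> float:
    return 0.0

@workflow
def wf(merchant_id: str) -> float:
    return score(merchant_id=merchant_id)
```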
hmm we don't really have code like that
f
Just checking
πŸ‘ 1
g
so it turns out Ketan was right: our user actually did have a script with code outside any function/class context that was causing the bloat πŸ˜…
βœ… 2
we wrapped it in `if __name__ == "__main__"` and that worked, thank you for the insights!
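for anyone who hits this later, a minimal sketch of the shape of the fix (the dataset path below is made up, not our real code):
```python
import pandas as pd

def run_report() -> None:
    # the heavy data load now happens only inside a function...
    experiments = pd.read_parquet("s3://bucket/experiments.parquet")
    print(experiments.head())

# ...and the guard keeps it from running when flytekit merely imports
# this module during `pyflyte serialize`
if __name__ == "__main__":
    run_report()
```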
πŸ‘ 3