Frank Shen

11/22/2022, 6:19 PM
I am using Ray with Modin to process a large dataset in my workflow, so I use modin.pandas.DataFrame and modin.pandas.Series instead of the pandas DataFrame and Series in my tasks' input params and return values. However, the data serialization error messages below suggest that modin.pandas.DataFrame and modin.pandas.Series are not supported by Flyte yet. Am I correct? I think Ray with Modin is a high-impact feature, since the Flyte team wants to support Ray. What is the process for submitting a change request? Thanks. CC: @Kevin Su @Eduardo Apolinario (eapolinario)
flytekit.exceptions.scopes.FlyteScopedUserException: Could not find a renderer for <class 'modin.pandas.dataframe.DataFrame'>
...
  File ".../flytekit/types/structured/structured_dataset.py", line 699, in to_html
    raise NotImplementedError(f"Could not find a renderer for {type(df)} in {self.Renderers}")
NotImplementedError: Could not find a renderer for <class 'modin.pandas.dataframe.DataFrame'> in {<class 'pandas.core.frame.DataFrame'>: <flytekit.deck.renderer.TopFrameRenderer object at 0x

Eduardo Apolinario (eapolinario)

11/22/2022, 6:54 PM
@Frank Shen, can you confirm which version of flytekit you're running? From the stack trace it looks like this is specific to Flyte Decks, but those should have been disabled as of flytekit 1.2.3.
That doesn't mean we shouldn't support decks for modin dataframes of course, only that this should be tracked separately.

Frank Shen

11/22/2022, 6:58 PM
Hi @Eduardo Apolinario (eapolinario), I am using flytekit 1.2.0

Eduardo Apolinario (eapolinario)

11/22/2022, 6:59 PM
ok, to unblock yourself you can do one of two things:
1. set disable_deck=True in the task definition, or
2. update to flytekit 1.2.3 and try again.

Frank Shen

11/22/2022, 7:00 PM
@Eduardo Apolinario (eapolinario), do you mean modin DataFrame, etc. has already been supported by Flyte?

Kevin Su

11/22/2022, 7:01 PM
It seems we forgot to register a renderer in the modin plugin; I'll create a PR shortly.
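The deck error earlier in the thread comes from a lookup keyed on type(df) in a Renderers dict. A minimal, hypothetical sketch of such a registry in plain Python shows why an unregistered type fails and what registering a renderer for it changes. FakePandasFrame, FakeModinFrame, register_renderer and to_html are illustrative stand-ins, not flytekit's actual classes or signatures.

```python
# Hypothetical, simplified model of a type -> renderer registry. It mirrors the
# shape of the error in this thread (lookup by type(df) in a Renderers dict),
# but it is not flytekit's actual implementation.
from typing import Any, Callable, Dict, Type


class FakePandasFrame:
    """Stand-in for pandas.DataFrame (no real pandas needed for the sketch)."""


class FakeModinFrame:
    """Stand-in for modin.pandas.DataFrame."""


Renderers: Dict[Type[Any], Callable[[Any], str]] = {}


def register_renderer(python_type: Type[Any], renderer: Callable[[Any], str]) -> None:
    Renderers[python_type] = renderer


def to_html(df: Any) -> str:
    # Exact-type lookup: a type that was never registered has no renderer.
    renderer = Renderers.get(type(df))
    if renderer is None:
        raise NotImplementedError(f"Could not find a renderer for {type(df)} in {Renderers}")
    return renderer(df)


register_renderer(FakePandasFrame, lambda df: "<table>pandas</table>")

try:
    to_html(FakeModinFrame())
except NotImplementedError as exc:
    print("before:", exc)

# The kind of fix described above: the plugin registers a renderer for its own type.
register_renderer(FakeModinFrame, lambda df: "<table>modin</table>")
print("after:", to_html(FakeModinFrame()))
```

With this design the plugin that introduces a type is responsible for registering its renderer, which is why a missing registration surfaces only when a deck is rendered.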

Eduardo Apolinario (eapolinario)

11/22/2022, 7:05 PM
@Frank Shen, just want to flag in case it wasn't very clear, but https://docs.flyte.org/projects/cookbook/en/latest/auto/integrations/flytekit_plugins/modin_examples/knn_classifier.html#knn-classifier is an example of using ray and modin.

Frank Shen

11/22/2022, 7:08 PM
Oh, I haven't installed flytekitplugins-modin yet. Thanks.
@Kevin Su, @Eduardo Apolinario (eapolinario) installing flytekitplugins-modin is causing flytekit to be downgraded from 1.2.0 to 0.32.6. What have I done wrong?
Successfully uninstalled flytekit-1.2.0
Successfully installed checksumdir-1.2.0 flytekit-0.32.6 flytekitplugins-modin-0.31.0

Eduardo Apolinario (eapolinario)

11/22/2022, 10:03 PM
@Frank Shen, how did you install it?

Frank Shen

11/22/2022, 10:06 PM
pip install flytekitplugins-modin

Eduardo Apolinario (eapolinario)

11/22/2022, 10:09 PM
@Frank Shen, can you force a version? Something like
pip install flytekitplugins-modin==1.2.4 flytekit==1.2.4

Frank Shen

11/22/2022, 10:11 PM
@Eduardo Apolinario (eapolinario), that won't work because we also use flytekitplugins-snowflake, which requires flytekit>=1.1.0b0,<1.2.0:
The conflict is caused by:
    The user requested flytekit>=1.2.3
    flytekitplugins-snowflake 1.1.1 depends on flytekit<1.2.0 and >=1.1.0b0
    The user requested flytekit>=1.2.3
    flytekitplugins-snowflake 1.1.0 depends on flytekit<1.2.0 and >=1.1.0b0
    The user requested flytekit>=1.2.3
    flytekitplugins-snowflake 1.0.5 depends on flytekit<1.2.0 and >=1.0.0b3
    The user requested flytekit>=1.2.3
    flytekitplugins-snowflake 1.0.4 depends on flytekit<1.2.0 and >=1.1.0b0
    The user requested flytekit>=1.2.3
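The resolver output boils down to two version ranges with no common member: the requested flytekit>=1.2.3 and the flytekit<1.2.0 cap in every flytekitplugins-snowflake release listed. A minimal sketch of that overlap check, assuming plain three-part release numbers compared as integer tuples; pre-release tags such as the b0 in >=1.1.0b0 are ignored, so that lower bound is approximated as 1.1.0. parse and overlaps are hypothetical helpers, not pip's resolver.

```python
# Minimal sketch: two version constraints can be satisfied together only if
# their ranges overlap. Versions are compared as integer tuples; pre-release
# tags (e.g. "b0") are not handled, and all versions are assumed to have the
# same number of parts.
from typing import Optional, Tuple

Version = Tuple[int, ...]


def parse(v: str) -> Version:
    return tuple(int(part) for part in v.split("."))


def overlaps(lo_a: Version, hi_a: Optional[Version],
             lo_b: Version, hi_b: Optional[Version]) -> bool:
    """Each range is half-open, lo <= v < hi; hi=None means no upper bound."""
    lo = max(lo_a, lo_b)
    highs = [h for h in (hi_a, hi_b) if h is not None]
    return not highs or lo < min(highs)


# Requested flytekit>=1.2.3 vs. the snowflake plugin's flytekit>=1.1.0,<1.2.0.
print(overlaps(parse("1.2.3"), None, parse("1.1.0"), parse("1.2.0")))  # False
```

The same check on pandas>=1.0.0,<2.0.0 against >=1.5.1 returns True, so a pure range conflict is not the only way a resolver can fail.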

Eduardo Apolinario (eapolinario)

11/22/2022, 10:13 PM
can you also force the snowflake plugin to the same version?

Frank Shen

11/22/2022, 10:13 PM
Like flytekitplugins-snowflake==1.2.4?

Eduardo Apolinario (eapolinario)

11/22/2022, 10:13 PM
yeah, something like
pip install flytekitplugins-modin==1.2.4 flytekit==1.2.4 flytekitplugins-snowflake==1.2.4

Frank Shen

11/22/2022, 10:14 PM
I think the highest flytekitplugins-snowflake version is 1.1.1, am I wrong? How can I confirm whether flytekitplugins-snowflake 1.2.4 exists?

Eduardo Apolinario (eapolinario)

11/22/2022, 10:15 PM

Frank Shen

11/22/2022, 10:18 PM
I see. I will try right now. Thank you @Eduardo Apolinario (eapolinario)!
@Eduardo Apolinario (eapolinario), I'm getting the conflict shown below, but it doesn't make sense to me. Could you tell me where the conflict is? Thanks.
The conflict is caused by:
    flytekit 1.2.4 depends on pandas<2.0.0 and >=1.0.0
    modin 0.17.0 depends on pandas==1.5.1
    flytekit 1.2.4 depends on pandas<2.0.0 and >=1.0.0
    modin 0.16.2 depends on pandas==1.5.1
    flytekit 1.2.4 depends on pandas<2.0.0 and >=1.0.0
    modin 0.16.1 depends on pandas==1.5.0
    flytekit 1.2.4 depends on pandas<2.0.0 and >=1.0.0
    modin 0.16.0 depends on pandas==1.5.0
Since 1.0.0 < 1.5.1 < 2.0.0, I don't see any conflict, so I don't know how to fix it.
My requirements.txt looks like this:
flytekit==1.2.4
flytekitplugins-snowflake==1.2.4
flytekitplugins-spark==1.2.4
flytekitplugins-modin==1.2.4
xgboost
ray
modin
xgboost_ray
scikit-learn

Eduardo Apolinario (eapolinario)

11/23/2022, 12:12 AM
interesting. I just tried in a brand new venv and it worked. Can you paste the full stacktrace of the error you're seeing, @Frank Shen?

Frank Shen

11/23/2022, 12:21 AM
It worked once for me, then it kept failing repeatedly, each time with a different conflict.

Eduardo Apolinario (eapolinario)

11/23/2022, 12:21 AM
can you say more? What do you mean by "it kept failing multiple times with various conflicting reasons"?

Frank Shen

11/23/2022, 12:23 AM
@Eduardo Apolinario (eapolinario)

Eduardo Apolinario (eapolinario)

11/23/2022, 12:29 AM
@Frank Shen, I see this line in the logs:
Collecting pandas<2.0.0,>=1.0.0
  Using cached <https://maven.homebox.com/repository/max-pypi-releases/packages/pandas/1.3.5/pandas-1.3.5-cp37-cp37m-macosx_10_9_x86_64.whl> (11.0 MB)
can you add pandas==1.5.1 to your requirements file?

Frank Shen

11/23/2022, 12:41 AM
new error:
ERROR: Ignored the following versions that require a different python version: 1.4.0 Requires-Python >=3.8; 1.4.0rc0 Requires-Python >=3.8; 1.4.1 Requires-Python >=3.8; 1.4.2 Requires-Python >=3.8; 1.4.3 Requires-Python >=3.8; 1.4.4 Requires-Python >=3.8; 1.5.0 Requires-Python >=3.8; 1.5.0rc0 Requires-Python >=3.8; 1.5.1 Requires-Python >=3.8
ERROR: Could not find a version that satisfies the requirement pandas==1.5.1 (from versions: 0.1, 0.2, 0.3.0, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 0.5.0, 0.6.0, 0.6.1, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.8.0, 0.8.1, 0.9.0, 0.9.1, 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.15.2, 0.16.0, 0.16.1, 0.16.2, 0.17.0, 0.17.1, 0.18.0, 0.18.1, 0.19.0, 0.19.1, 0.19.2, 0.20.0, 0.20.1, 0.20.2, 0.20.3, 0.21.0, 0.21.1, 0.22.0, 0.23.0, 0.23.1, 0.23.2, 0.23.3, 0.23.4, 0.24.0, 0.24.1, 0.24.2, 0.25.0, 0.25.1, 0.25.2, 0.25.3, 1.0.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4, 1.0.5, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5, 1.2.0, 1.2.1, 1.2.2, 1.2.3, 1.2.4, 1.2.5, 1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.3.4, 1.3.5)
ERROR: No matching distribution found for pandas==1.5.1

Eduardo Apolinario (eapolinario)

11/23/2022, 12:43 AM
ok, so the package index you're installing from (https://maven.homebox.com/repository/max-pypi-releases/) doesn't have the latest version of pandas

Frank Shen

11/23/2022, 12:50 AM
it does. It’s the list above that doesn’t include it.
1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.3.4, 1.3.5)

Eduardo Apolinario (eapolinario)

11/23/2022, 12:51 AM
oooh, pandas dropped support for python 3.7 in 1.5.0: https://github.com/pandas-dev/pandas/releases/tag/v1.5.0
is there any way you can use a Python version >=3.8?
@Frank Shen ^
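The 12:41 AM log shows the mechanism: pip first drops every pandas release whose Requires-Python excludes the running interpreter (3.7 here), so modin's pin pandas==1.5.1 has no candidate left, even though 1.5.1 sits inside flytekit's pandas<2.0.0,>=1.0.0 range. A minimal sketch of that filtering order, using a few releases from the log; REQUIRES_PYTHON is a simplified, hypothetical stand-in for the index metadata, and 1.3.5 is treated as installable on 3.7 because it appears in the log's candidate list.

```python
# Minimal sketch: candidates whose Requires-Python excludes the interpreter are
# discarded before the version pin is matched. The table below is a simplified,
# hypothetical stand-in for package-index metadata.
from typing import Dict, List, Tuple

Version = Tuple[int, ...]

# pandas release -> minimum Python version it accepts.
REQUIRES_PYTHON: Dict[str, Version] = {
    "1.3.5": (3, 7),
    "1.4.0": (3, 8),
    "1.5.0": (3, 8),
    "1.5.1": (3, 8),
}


def installable(python: Version) -> List[str]:
    """Releases the installer would still consider on this interpreter."""
    return [v for v, min_py in REQUIRES_PYTHON.items() if python >= min_py]


def resolve_pin(pin: str, python: Version) -> str:
    if pin not in installable(python):
        raise LookupError(f"No matching distribution found for pandas=={pin}")
    return pin


print(installable((3, 7)))  # ['1.3.5']
print(resolve_pin("1.5.1", (3, 8)))  # 1.5.1
```

On (3, 7), resolve_pin("1.5.1", ...) raises, which is why moving to Python 3.8 or newer was the fix rather than changing version pins.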

Frank Shen

11/23/2022, 5:40 AM
Got it. I am working on it. Thank you @Eduardo Apolinario (eapolinario)
Hi @Eduardo Apolinario (eapolinario), it worked! Thank you so much!

Eduardo Apolinario (eapolinario)

11/23/2022, 6:07 PM
amazing! Let us know how it goes.

Frank Shen

11/23/2022, 8:25 PM
Hi @Eduardo Apolinario (eapolinario), I'm having trouble passing a bool input to the workflow on the command line.
@workflow
def test_wf_train(use_ray: bool = False):
    ...
When I run this on the command line, it doesn't work:
pyflyte run tests/test_xgboost.py test_wf_train --use_ray True
Do you know how to pass a bool input at the workflow level?

Eduardo Apolinario (eapolinario)

11/23/2022, 8:31 PM
oh, just drop the value True

Frank Shen

11/23/2022, 8:36 PM
You mean:
pyflyte run tests/test_xgboost.py test_wf_train --use_ray
Or can't I use a bool as an input at all?

Kevin Su

11/23/2022, 8:46 PM
Yeah. If you pass the flag --use_ray, the value of use_ray is True:
pyflyte run tests/test_xgboost.py test_wf_train --use_ray
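Kevin's point is that a bool input with a default of False behaves as a bare on/off flag: present means True, absent means the default. The sketch below reproduces that convention with Python's standard argparse purely as an analogy; pyflyte builds its CLI with click, and parse_wf_args is a hypothetical helper, not part of flytekit.

```python
# Analogy only: pyflyte uses click, not argparse, but the flag-style handling
# of a bool input with default False follows the same idea.
import argparse
from typing import List, Optional


def parse_wf_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="test_wf_train")
    # A bare flag: present -> True, absent -> the default (False).
    parser.add_argument("--use_ray", action="store_true", default=False)
    return parser.parse_args(argv)


print(parse_wf_args(["--use_ray"]).use_ray)  # True
print(parse_wf_args([]).use_ray)  # False
```

In this argparse version, a trailing value as in ["--use_ray", "True"] is rejected as an unrecognized argument, because a store_true flag takes no value.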

Frank Shen

11/23/2022, 8:48 PM
Thanks @Kevin Su
Hi @Kevin Su @Eduardo Apolinario (eapolinario), I installed flytekitplugins-modin as advised. I am using an input param use_ray: bool to choose between Ray with a modin DataFrame and a plain pandas.DataFrame. When use_ray is True, the task returns a modin DataFrame; when it is False, it returns a pandas.DataFrame. Therefore I defined the task's return type as -> Union[pd.DataFrame, modin_pd.DataFrame].
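from typing import Union

import flytekitplugins.modin  # noqa: F401  (imported for its side effect of enabling modin support)
import modin.pandas as modin_pd
import pandas as pd
import ray
from flytekit import task

@task
def preprocess(df: pd.DataFrame,
               use_ray: bool
               ) -> Union[pd.DataFrame, modin_pd.DataFrame]:
    if use_ray:
        ray.init()
        # Convert to a modin DataFrame so the work below runs on Ray.
        df = modin_pd.DataFrame(df)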
....
However, I still get this error:
{"asctime": "2022-11-23 13:10:46,651", "name": "flytekit", "levelname": "ERROR", "message": "Failed to convert return value for var o0 with error <class 'TypeError'>: Ambiguous choice of variant for union type"}
Traceback (most recent call last):
  File "/Users/fshen/.pyenv/versions/3.8.7/lib/python3.8/site-packages/flytekit/core/base_task.py", line 522, in dispatch_execute
    literals[k] = TypeEngine.to_literal(exec_ctx, v, py_type, literal_type)
  File "/Users/fshen/.pyenv/versions/3.8.7/lib/python3.8/site-packages/flytekit/core/type_engine.py", line 752, in to_literal
    lv = transformer.to_literal(ctx, python_val, python_type, expected)
  File "/Users/fshen/.pyenv/versions/3.8.7/lib/python3.8/site-packages/flytekit/core/type_engine.py", line 1060, in to_literal
    raise TypeError("Ambiguous choice of variant for union type")
TypeError: Ambiguous choice of variant for union type
I am using flytekit==1.2.4 and flytekitplugins-modin==1.2.4.
Is this because returning a Union type is not supported?

Eduardo Apolinario (eapolinario)

11/23/2022, 9:35 PM
cc: @Kevin Su

Kevin Su

11/24/2022, 9:59 PM
It seems a modin DataFrame can also be serialized to a Flyte literal by the pandas transformer, because the modin DataFrame inherits from the pandas DataFrame.
I'm working on it.
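Kevin's diagnosis can be illustrated with a simplified model of union handling: convert the value with each variant's transformer and refuse to choose if more than one succeeds. Everything below (PandasLikeFrame, ModinLikeFrame, the converters, union_to_literal) is a hypothetical stand-in, not flytekit's UnionTransformer; the subclass relation only serves to make the pandas converter accept a modin-like value, as Kevin describes.

```python
# Hypothetical, simplified model of union-variant selection: try every
# variant's converter and require exactly one to succeed. The error text
# matches the log above; the classes and converters are stand-ins.
from typing import Any, Callable, List, Tuple


class PandasLikeFrame:
    """Stand-in for pandas.DataFrame."""


class ModinLikeFrame(PandasLikeFrame):
    """Stand-in for modin.pandas.DataFrame, accepted by the pandas converter too."""


def pandas_converter(value: Any) -> str:
    if not isinstance(value, PandasLikeFrame):
        raise TypeError("not a pandas-like frame")
    return "pandas-literal"


def modin_converter(value: Any) -> str:
    if not isinstance(value, ModinLikeFrame):
        raise TypeError("not a modin-like frame")
    return "modin-literal"


def union_to_literal(value: Any, variants: List[Tuple[str, Callable[[Any], str]]]) -> str:
    found: List[Tuple[str, str]] = []
    for name, convert in variants:
        try:
            found.append((name, convert(value)))
        except TypeError:
            continue  # this variant cannot represent the value
    if len(found) > 1:
        raise TypeError("Ambiguous choice of variant for union type")
    if not found:
        raise TypeError(f"Cannot convert {value!r} to any union variant")
    return found[0][1]


VARIANTS = [("pandas", pandas_converter), ("modin", modin_converter)]

print(union_to_literal(PandasLikeFrame(), VARIANTS))  # pandas-literal
try:
    union_to_literal(ModinLikeFrame(), VARIANTS)
except TypeError as exc:
    print(exc)  # Ambiguous choice of variant for union type
```

In this model, a plain pandas-like value resolves to one variant, but the modin-like value matches both, which is the situation the task hits when use_ray is True.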