When trying to register a workflow using the spark...
# ask-the-community
b
When trying to register a workflow using the Spark plugin, I get this error:
Traceback (most recent call last):
  File "/usr/local/bin/pyflyte", line 5, in <module>
    from flytekit.clis.sdk_in_container.pyflyte import main
  File "/usr/local/lib/python3.11/site-packages/flytekit/__init__.py", line 305, in <module>
    load_implicit_plugins()
  File "/usr/local/lib/python3.11/site-packages/flytekit/__init__.py", line 301, in load_implicit_plugins
    p.load()
  File "/usr/local/lib/python3.11/importlib/metadata/__init__.py", line 202, in load
    module = import_module(match.group('module'))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/flytekitplugins/spark/__init__.py", line 20, in <module>
    from .agent import DatabricksAgent
  File "/usr/local/lib/python3.11/site-packages/flytekitplugins/spark/agent.py", line 12, in <module>
    from flytekit.extend.backend.base_agent import AgentBase, AgentRegistry, convert_to_flyte_state, get_agent_secret
ImportError: cannot import name 'convert_to_flyte_state' from 'flytekit.extend.backend.base_agent' (/usr/local/lib/python3.11/site-packages/flytekit/extend/backend/base_agent.py)
Looking at the underlying code, it seems those imports are missing. I'm using the following versions in my project:
flytekit = "1.10.7"
flytekitplugins-async-fsspec = "1.10.7"
flytekitplugins-duckdb = "1.10.7"
flytekitplugins-deck-standard = "1.10.7"
flytekitplugins-polars = "1.10.7"
flytekitplugins-spark  = "1.10.7"
flytekitplugins-pod = "1.10.7"
Is there something I'm missing?
k
Could you try flytekit==1.11.0 and flytekitplugins-spark==1.11.0?
b
That's actually what I had prior to downgrading to 1.10.7. Same issue.
k
File "/usr/local/lib/python3.11/site-packages/flytekitplugins/spark/agent.py", line 12, in <module>
    from flytekit.extend.backend.base_agent import AgentBase, AgentRegistry, convert_to_flyte_state, get_agent_secret
Hmm, but we already removed convert_to_flyte_state from the Spark agent as of flytekit==1.11.0: https://github.com/flyteorg/flytekit/pull/2123
b
I can try 1.11.0 again. Maybe my Poetry lock file wasn't updated; let me take a look.
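One way to check the stale-lock-file theory is to compare what is actually installed in the environment, rather than what the project file pins. The sketch below uses only the standard library; the helper names `installed_flytekit_versions` and `find_mismatches` are made up for illustration, and it assumes the convention used in this thread of keeping flytekit and every flytekitplugins-* package on the same release.

```python
from importlib import metadata


def installed_flytekit_versions() -> dict[str, str]:
    """Map each installed distribution whose name starts with 'flytekit' to its version."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name and name.lower().startswith("flytekit"):
            versions[name] = dist.version
    return versions


def find_mismatches(versions: dict[str, str], core: str = "flytekit") -> dict[str, str]:
    """Return the packages whose version differs from the core package's version."""
    core_version = versions.get(core)
    if core_version is None:
        # Core package not found, so there is nothing to compare against.
        return {}
    return {name: v for name, v in versions.items() if v != core_version}


found = installed_flytekit_versions()
for name, version in sorted(found.items()):
    print(f"{name}=={version}")
for name, version in sorted(find_mismatches(found).items()):
    print(f"MISMATCH: {name}=={version} differs from flytekit=={found['flytekit']}")
```

Run inside the container image that produces the traceback; a mismatch line would suggest the lock file and the installed packages have drifted apart.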
k
@Kevin Su if we make a change like this, we have to change the min version pin, right?
b
@Kevin Su going back to 1.11.0 fixed the issue. But I haven't been able to tell what is causing this error:
[1/1] currentAttempt done. Last Error: USER::The node was low on resource: ephemeral-storage. Threshold quantity: 2146223340, available: 1752248Ki. 
[flytesnacks-dev] terminated with ExitCode 0.
[primary] terminated with exit code (1). Reason [Error]. Message:
I mean, I get what it's saying, but I'm not sure what ephemeral-storage means in the context of the plugin or Flyte. Any thoughts?
Is this related to the size of the image?
k
This is a known issue in Flyte 1.11.0: we accidentally set the default ephemeralStorage to 20 MB. https://github.com/flyteorg/flyte/pull/4929/files#diff-33b4463f6057591a533425d1f947752711a81da1952ff745ed9fae049e155995L181
You can fix that by updating the default.
b
Gotcha, okay. So do I need to change this in my task decorator, or is this at the Kubernetes cluster config level?
k
Wait, sorry, I was wrong. 1752248Ki is more than 20 MB.
Could you try increasing the storage in the task decorator?
It might be related to the size of the image.
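To make the arithmetic explicit, here is a small sketch that converts the two quantities from the error message into bytes. The helper `quantity_to_bytes` is a made-up name and handles only the binary suffixes needed here; 20Mi is used as a stand-in for the "20 MB" default mentioned above.

```python
# Binary suffixes used by Kubernetes resource quantities (subset).
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a plain integer or a Ki/Mi/Gi/Ti quantity string to bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


threshold = quantity_to_bytes("2146223340")  # "Threshold quantity" from the error
available = quantity_to_bytes("1752248Ki")   # "available" from the error
twenty_mib = quantity_to_bytes("20Mi")       # stand-in for the 20 MB default

print(available)               # 1794301952
print(available > twenty_mib)  # True
print(available < threshold)   # True
```

So the node had roughly 1.8 GB of ephemeral storage free, well above 20 MB but below the roughly 2.1 GB threshold in the message, which is consistent with the node itself running low rather than the 20 MB default being hit.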
b
Gotcha, so would that be in the spark_conf property or in the resources property?
spark_conf={
            "spark.driver.memory": "5000M",
            "spark.executor.memory": "5000M",
            "spark.executor.cores": "2",
            "spark.executor.instances": "2",
            "spark.driver.cores": "2",
            "ephemeralStorage": "1000Mi",
            #"spark.jars": "https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop3-latest.jar",
        }
k
resources property
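As a rough sketch of what that could look like: flytekit's `Resources` accepts an `ephemeral_storage` field, which goes on the task decorator's `requests` and `limits` rather than inside `spark_conf`. The task name `spark_job` and the storage sizes are illustrative assumptions, and this thread does not confirm how these values reach the Spark driver and executor pods.

```python
from flytekit import Resources, task
from flytekitplugins.spark import Spark


@task(
    task_config=Spark(
        # Spark settings only; values copied from the snippet above.
        spark_conf={
            "spark.driver.memory": "5000M",
            "spark.executor.memory": "5000M",
            "spark.executor.cores": "2",
            "spark.executor.instances": "2",
            "spark.driver.cores": "2",
        }
    ),
    # Storage sizes are illustrative assumptions, not recommendations.
    requests=Resources(ephemeral_storage="2Gi"),
    limits=Resources(ephemeral_storage="4Gi"),
)
def spark_job() -> int:
    # Hypothetical placeholder body; a real task would use the Spark session here.
    return 0
```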
b
limits=Resources(mem="2000M"),
gotcha
Okay, let me give that a try! Thanks so much for your help!
Well, I got past that issue, but now I've been stuck on this odd error:
[1/1] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes.
[flytesnacks-dev] terminated with ExitCode 0.
[primary] terminated with exit code (1). Reason [Error]. Message: 
.
Any thoughts on this?
k
Are you able to share the code snippet?