# flyte-support
h
When trying to register a workflow that uses the spark plugin, I get this error:
Traceback (most recent call last):
  File "/usr/local/bin/pyflyte", line 5, in <module>
    from flytekit.clis.sdk_in_container.pyflyte import main
  File "/usr/local/lib/python3.11/site-packages/flytekit/__init__.py", line 305, in <module>
    load_implicit_plugins()
  File "/usr/local/lib/python3.11/site-packages/flytekit/__init__.py", line 301, in load_implicit_plugins
    p.load()
  File "/usr/local/lib/python3.11/importlib/metadata/__init__.py", line 202, in load
    module = import_module(match.group('module'))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/flytekitplugins/spark/__init__.py", line 20, in <module>
    from .agent import DatabricksAgent
  File "/usr/local/lib/python3.11/site-packages/flytekitplugins/spark/agent.py", line 12, in <module>
    from flytekit.extend.backend.base_agent import AgentBase, AgentRegistry, convert_to_flyte_state, get_agent_secret
ImportError: cannot import name 'convert_to_flyte_state' from 'flytekit.extend.backend.base_agent' (/usr/local/lib/python3.11/site-packages/flytekit/extend/backend/base_agent.py)
Looking at the underlying code, it seems those imports are missing. I'm using the following versions in my project:
flytekit = "1.10.7"
flytekitplugins-async-fsspec = "1.10.7"
flytekitplugins-duckdb = "1.10.7"
flytekitplugins-deck-standard = "1.10.7"
flytekitplugins-polars = "1.10.7"
flytekitplugins-spark  = "1.10.7"
flytekitplugins-pod = "1.10.7"
Is there something I'm missing?
g
could you try flytekit==1.11.0 and flytekitplugins-spark==1.11.0?
h
That's actually what I had prior to downgrading to 1.10.7. Same issue.
g
File "/usr/local/lib/python3.11/site-packages/flytekitplugins/spark/agent.py", line 12, in <module>
    from flytekit.extend.backend.base_agent import AgentBase, AgentRegistry, convert_to_flyte_state, get_agent_secret
hmm, but we already removed convert_to_flyte_state from the spark agent as of flytekit==1.11.0: https://github.com/flyteorg/flytekit/pull/2123
h
I can try 1.11.0 again. Maybe my poetry lock file wasn't updated, let me take a look.
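A stale lock file can leave flytekit and the spark plugin at different versions, which would be one possible explanation for the ImportError above. A minimal sketch for checking what is actually installed in the image, using only the standard library. The package list is copied from the versions posted earlier; the helper names installed_versions and find_mismatches are hypothetical, not part of flytekit:

```python
from importlib import metadata

# Package names taken from the dependency list posted earlier in the thread.
PACKAGES = [
    "flytekit",
    "flytekitplugins-async-fsspec",
    "flytekitplugins-duckdb",
    "flytekitplugins-deck-standard",
    "flytekitplugins-polars",
    "flytekitplugins-spark",
    "flytekitplugins-pod",
]


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def find_mismatches(versions, core="flytekit"):
    """Return installed packages whose version differs from the core package."""
    expected = versions.get(core)
    return {
        name: version
        for name, version in versions.items()
        if version is not None and version != expected
    }


if __name__ == "__main__":
    versions = installed_versions(PACKAGES)
    for name, version in versions.items():
        print(f"{name}: {version or 'not installed'}")
    print("mismatched:", find_mismatches(versions))
```

Running this inside the container image (rather than the local virtualenv) shows the versions that pyflyte actually imports, which may differ from what pyproject.toml requests if the lock file was not regenerated.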
f
@glamorous-carpet-83516 if we make a change like that, we have to bump the min version pin, right?
h
@glamorous-carpet-83516 going back to 1.11.0 fixed the issue. But I haven't been able to tell what is causing this error:
[1/1] currentAttempt done. Last Error: USER::The node was low on resource: ephemeral-storage. Threshold quantity: 2146223340, available: 1752248Ki. 
[flytesnacks-dev] terminated with ExitCode 0.
[primary] terminated with exit code (1). Reason [Error]. Message:
I mean, I get what it's saying, but I'm not sure what ephemeral-storage means in the context of the plugin or flyte. Any thoughts?
Is this related to the size of the image?
g
this is a known issue in flyte 1.11.0. we accidentally set the default ephemeralStorage to 20 MB https://github.com/flyteorg/flyte/pull/4929/files#diff-33b4463f6057591a533425d1f947752711a81da1952ff745ed9fae049e155995L181
you can fix that by updating the default.
h
Gotcha, okay. So do I need to change this in my task decorator, or is this at the Kubernetes cluster config level?
g
wait, sorry. I was wrong. 1752248Ki is more than 20MB.
could you try to increase storage in the task decorator
might relate to the size of the image
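The comparison behind that correction can be checked by converting the figures from the error message to bytes. A minimal sketch, assuming the values follow the usual Kubernetes quantity suffixes (Ki, Mi, Gi are powers of 1024; k, M, G are powers of 1000) and are whole numbers; quantity_to_bytes is a hypothetical helper, not a flytekit or Kubernetes API:

```python
# Suffix multipliers for Kubernetes-style resource quantities.
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def quantity_to_bytes(quantity: str) -> int:
    """Convert an integer quantity such as '1752248Ki' to bytes."""
    # Check two-character suffixes first so 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * _SUFFIXES[suffix]
    return int(quantity)  # no suffix: already in bytes


available = quantity_to_bytes("1752248Ki")   # from the error message
threshold = quantity_to_bytes("2146223340")  # from the error message
twenty_mb = quantity_to_bytes("20M")         # the default mentioned above

print(available)              # 1794301952
print(available > twenty_mb)  # True
print(available < threshold)  # True
```

Under these assumptions the available ephemeral storage (about 1.79 GB) is far above 20 MB but still below the reported threshold (about 2.15 GB), which is consistent with the "node was low on resource" message.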
h
gotcha, so would that be in the spark_conf property or in the resources property?
spark_conf={
            "spark.driver.memory": "5000M",
            "spark.executor.memory": "5000M",
            "spark.executor.cores": "2",
            "spark.executor.instances": "2",
            "spark.driver.cores": "2",
            "ephemeralStorage":"1000Mi"
            #"spark.jars": "https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop3-latest.jar",
        }
g
resources property
h
limits=Resources(mem="2000M"),
gotcha
okay, let me give that a try!!! thanks so much for your help!!!
Well, I got past that issue, but now I've been stuck on this odd error:
[1/1] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes.
[flytesnacks-dev] terminated with ExitCode 0.
[primary] terminated with exit code (1). Reason [Error]. Message: 
.
any thoughts on this?
g
are you able to share the code snippet?