# flyte-support
c
🧵 Hello. I have a rough Armada plugin that is able to successfully run basic tasks in Armada (i.e. hello world). I decided to take it a step further and use a couple of these `ArmadaTasks` that I have defined in a workflow, and I seem to have run into an issue related to (de)serialization. Wanted to know if there is something obvious I'm missing before I start digging into the code.
My code
Copy code
from flytekit import dynamic, task, workflow
from flytekit.core.array_node_map_task import map_task
from flytekitplugins.armada.task import ArmadaConfig


@task(task_config=ArmadaConfig(queue="compute"), container_image="<redacted>")
def say_hello(name: str) -> str:
    print(f"Hello, {name}!")
    return f"Hello, {name}!"


@task(task_config=ArmadaConfig(queue="compute"), container_image="<redacted>")
def simple_map_task(a: int) -> int:  # noqa: D103
    return a * a


@task(task_config=ArmadaConfig(queue="compute"), container_image="<redacted>")
def simple_reduce(b: list[int]) -> int:  # noqa: D103
    return sum(b)


@task(task_config=ArmadaConfig(queue="compute"), container_image="<redacted>")
def to_list(size: int) -> list[int]:  # noqa: D103
    return list(range(size))


@workflow
def map_reduce(size: int) -> int:
    """Simple workflow to illustrate a large fan out and fan in."""
    input_array = to_list(size=size)
    output = map_task(simple_map_task)(a=input_array)
    return simple_reduce(b=output)
Run with
pyflyte run --remote example.py map_reduce --size 10
and it seems to break on the ArrayNodeTask with the following errors.
Copy code
[1]: failed at Node[n1]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [agent-service]: [CorruptedPluginState] Failed to unmarshal custom state in Handle, caused by: gob: wrong type (webapi.Phase) for received field PluginState.Phase
[2]: failed at Node[n2]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [agent-service]: [CorruptedPluginState] Failed to unmarshal custom state in Handle, caused by: gob: wrong type (webapi.Phase) for received field PluginState.Phase
[3]: failed at Node[n3]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [agent-service]: [CorruptedPluginState] Failed to unmarshal custom state in Handle, caused by: gob: wrong type (webapi.Phase) for received field PluginState.Phase
[4]: failed at Node[n4]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [agent-service]: [CorruptedPluginState] Failed to unmarshal custom state in Handle, caused by: gob: wrong type (webapi.Phase) for received field PluginState.Phase
[5]: failed at Node[n5]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [agent-service]: [CorruptedPluginState] Failed to unmarshal custom state in Handle, caused by: gob: wrong type (webapi.Phase) for received field PluginState.Phase
[6]: failed at Node[n6]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [agent-service]: [CorruptedPluginState] Failed to unmarshal custom state in Handle, caused by: gob: wrong type (webapi.Phase) for received field PluginState.Phase
[7]: failed at Node[n7]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [agent-service]: [CorruptedPluginState] Failed to unmarshal custom state in Handle, caused by: gob: wrong type (webapi.Phase) for received field PluginState.Phase
[8]: failed at Node[n8]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [agent-service]: [CorruptedPluginState] Failed to unmarshal custom state in Handle, caused by: gob: wrong type (webapi.Phase) for received field PluginState.Phase
[9]: failed at Node[n9]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [agent-service]: [CorruptedPluginState] Failed to unmarshal custom state in Handle, caused by: gob: wrong type (webapi.Phase) for received field PluginState.Phase
I just tried to create a workflow with only `ArmadaTasks` and that seems to work fine, so there seems to be some sort of issue plumbing data between `ArmadaTasks` and non-`ArmadaTasks` for some reason.
f
The Armada plugin will not work with array node today.
We are working on generalized support for all task types. This is because the plugin state can be large, and supporting that in array node would cause a problem.
Any reason why you are looking into Armada? Is Flyte not scaling for something?
Also, is this a backend plugin in Go?
Ohh nm, this is an agent plugin. How were you able to use array node for this? cc @flat-area-42876 this should have just failed, right?
c
This is a Python plugin since we are prototyping. I believe our team (Stack) has spoken to you. We are running on prem and we don’t really have the ability to autoscale due to fixed capacity so we are looking to use Armada for queueing workloads instead of scheduling directly into k8s and having tons of pending workloads. We also rely on gang scheduling and I’m not sure that is supported.
f
ya, makes sense
I would love to use a gang scheduler like batch/Kueue, but Armada sounds cool
and an agent would be loved by the community
But this is interesting: array node should not work with agents today.
Can you simply use `@dynamic` for now?
Array node support for agents, as well as any task type, is coming soon.
c
Yeah, I can give that a shot. By "coming soon", is that work that is publicly in progress or something you'll roll into open source later?
f
We will upstream it from Union, but it is shipping to Union soon.
We will upstream soon though.
👍 1
c
I see, the reason why this code didn't short-circuit is probably because the `ArmadaTask` extends `PythonFunctionTask`.
Copy code
class ArmadaTask(AsyncAgentExecutorMixin, PythonFunctionTask[ArmadaConfig]):
    """Wrapper around PythonFunctionTask which includes details on how to run on Armada."""
👍 1
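For reference, plugin function tasks like this usually declare their own task type in the constructor and register themselves against their config class. A minimal sketch, assuming a hypothetical _ARMADA_TASK_TYPE constant and the standard PythonFunctionTask/TaskPlugins hooks (the real ArmadaTask constructor is not shown in this thread):
Copy code
from flytekit.core.python_function_task import PythonFunctionTask
from flytekit.extend import TaskPlugins
from flytekit.extend.backend.base_agent import AsyncAgentExecutorMixin
from flytekitplugins.armada.task import ArmadaConfig  # the plugin config used earlier in this thread


class ArmadaTask(AsyncAgentExecutorMixin, PythonFunctionTask[ArmadaConfig]):
    """Sketch: wrapper around PythonFunctionTask which includes details on how to run on Armada."""

    _ARMADA_TASK_TYPE = "armada"  # hypothetical constant; must match the agent's task_type_name

    def __init__(self, task_config: ArmadaConfig, task_function, **kwargs):
        super().__init__(
            task_config=task_config,
            task_function=task_function,
            task_type=self._ARMADA_TASK_TYPE,  # otherwise the task serializes as a plain "python-task"
            **kwargs,
        )


# Registering against the config class is what makes @task(task_config=ArmadaConfig(...))
# resolve to ArmadaTask instead of a plain PythonFunctionTask.
TaskPlugins.register_pythontask_plugin(ArmadaConfig, ArmadaTask)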
f
Hmm, this is interesting. That should be OK, but still, what is the task-type?
c
Not sure what you are referring to
f
Can you share your `init` for the agent executor?
c
Copy code
def __init__(self) -> None:
    """Constructor for ArmadaAgent."""
    super().__init__(task_type_name="armada", metadata_type=ArmadaMetadata)
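For context, a heavily hedged skeleton of what the rest of such an async agent typically looks like, assuming the AsyncAgentBase create/get/delete interface that matches the __init__ above; the ArmadaMetadata field is made up and all method bodies are omitted:
Copy code
from dataclasses import dataclass
from typing import Optional

from flytekit.extend.backend.base_agent import AgentRegistry, AsyncAgentBase, Resource, ResourceMeta
from flytekit.models.literals import LiteralMap
from flytekit.models.task import TaskTemplate


@dataclass
class ArmadaMetadata(ResourceMeta):
    job_id: str  # hypothetical: whatever is needed to find the Armada job again


class ArmadaAgent(AsyncAgentBase):
    """Sketch only; the real ArmadaAgent implementation is not shown in this thread."""

    def __init__(self) -> None:
        super().__init__(task_type_name="armada", metadata_type=ArmadaMetadata)

    def create(self, task_template: TaskTemplate, inputs: Optional[LiteralMap] = None, **kwargs) -> ArmadaMetadata:
        # Submit the job to Armada and return enough info to poll it later.
        ...

    def get(self, resource_meta: ArmadaMetadata, **kwargs) -> Resource:
        # Poll Armada and map its job state to a Flyte phase.
        ...

    def delete(self, resource_meta: ArmadaMetadata, **kwargs) -> None:
        # Cancel the Armada job.
        ...


AgentRegistry.register(ArmadaAgent())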
dynamic (that works normally with regular `@task`) fails with
Copy code
[1/1] currentAttempt done. Last Error: USER::
[f91a21ca9a5bc4234935-n0-0] terminated with exit code (1). Reason [Error]. Message: 
y", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/flytekit/bin/entrypoint.py", line 506, in execute_task_cmd
    _execute_task(
  File "/usr/local/lib/python3.11/site-packages/flytekit/exceptions/scopes.py", line 148, in f
    return outer_f(inner_f, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/flytekit/exceptions/scopes.py", line 178, in system_entry_point
    return wrapped(*args, **kwargs)
Copy code
@dynamic
def dynamic_task(size: int) -> int:
    """DAG shape is not known at compile time so we need to use @dynamic."""
    values = []
    for i in range(size):
        value = simple_map_task(a=i)
        values.append(value)

    return simple_reduce(b=values)


@workflow
def map_reduce_dynamic(size: int) -> int:
    """Simple workflow to illustrate a large fan-out/fan-in with tasks."""
    return dynamic_task(size=size)
pyflyte run --remote example.py map_reduce_dynamic --size 10
We can probably wait for the generic task handling effort to land
f
@freezing-airport-6809 right now there's just a check for whether the passed-in subnode is a TaskNode. We should probably add a check for the task type as well, to have this fail more gracefully.
*on the propeller end. Flytekit has an instance-type check:
Copy code
if not (isinstance(actual_task, PythonFunctionTask) or isinstance(actual_task, PythonInstanceTask)):
    raise ValueError("Only PythonFunctionTask and PythonInstanceTask are supported in map tasks.")
f
interesting
c
I am back again trying to use `@dynamic`, and it runs the code as a Python task and not the Armada task. I'm guessing this is expected until some changes land? I may have updated flytekit since the last time I tried to do this.
Copy code
pyflyte run --remote example.py dynamic_task --size 10
Copy code
@task(task_config=ArmadaConfig(queue="compute"), container_image="<>")
def simple_map_task(a: int) -> int:  # noqa: D103
    return a * a


@task(task_config=ArmadaConfig(queue="compute"), container_image="<>")
def simple_reduce(b: list[int]) -> int:  # noqa: D103
    return sum(b)


@dynamic(container_image="<>")
def dynamic_task(size: int) -> int:
    """DAG shape is not known at compile time so we need to use @dynamic."""
    values = []
    for i in range(size):
        value = simple_map_task(a=i)
        values.append(value)

    return simple_reduce(b=values)
f
Dynamic should use the agent, as a dynamic is literally just a wf.
Does a regular task work with Armada?
c
Yeah, if I execute a single task (or even a chain in a workflow) it runs on Armada. But dynamic throws it all into `PythonTask` and runs it on the local data plane.
f
That is odd.
I will share an example that runs; you can try that too?
c
Sure!
f
OK let me try.
@damp-lion-88352 can you please help verify whether dynamic works with agents (I feel it does)? I haven't had a chance to get to this.
d
No problem
Doing it now
Yes we can
Copy code
from flytekit import dynamic, task, workflow
from flytekit.sensor.file_sensor import FileSensor

sensor = FileSensor(name="test_file_sensor")

@dynamic
def dyanmic_sensor():
    for _ in range(2):
        sensor(path="s3://my-s3-bucket")

@workflow
def wf():
    return dyanmic_sensor()
c
I am able to reproduce what you've done.
But when I run it with my task it doesn't recognize the task type.
Copy code
@task(task_config=ArmadaConfig(queue="compute"), container_image="<>")
def say_hello(name: str) -> str:
    print(f"Hello, {name}!")
    return f"Hello, {name}!"

@dynamic(container_image="<>")
def dyanmic_sensor():
    for _ in range(2):
        say_hello(name="test")


@workflow()
def wf():
    return dyanmic_sensor()
`BaseSensor` is a `PythonTask` but my `ArmadaTask` is a `PythonFunctionTask`, maybe that's related..
d
I can help you take a look tomorrow morning
it's midnight now
There's a way to check whether this is normal or not: if running the PythonFunctionTask shows it as a plain PythonFunctionTask on the Flyte console, then we can't run a dynamic PythonFunctionTask on the agent.
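A quick way to make a similar check locally, as a sketch (example.py and say_hello come from the code shared earlier in this thread): the @task-decorated object is the task instance itself, so its task_type attribute shows what type will be serialized.
Copy code
from example import say_hello  # the ArmadaConfig task defined earlier in this thread

# "armada" (or whatever the plugin sets) means the custom task type made it through;
# "python-task" means it fell back to a plain PythonFunctionTask.
print(say_hello.task_type)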
But actually, PythonFunctionTask inherits from PythonTask, I guess?
c
Yeah it does..
When I run my task directly it comes up correctly as Armada.
When I run it inside a workflow it comes up correctly too.
d
Ok
Maybe there’s some improvement we can do
Thank you
c
The code is not in a state that I can share yet, but hopefully I can soon so y'all can reproduce the issue.
f
cc @glamorous-carpet-83516 if you know by chance?
d
I've tried it with a Kubeflow PyTorch task
it works
it's weird that the agent doesn't work
Copy code
from flytekit import Resources, task, workflow, dynamic


from flytekitplugins.kfpytorch import PyTorch, Worker

# %%
cpu_request = "500m"
mem_request = "500Mi"
gpu_request = "0"
mem_limit = "500Mi"
gpu_limit = "0"


# %%
@task(
    task_config=PyTorch(worker=Worker(replicas=2)),
    retries=2,
    # cache=True,
    # cache_version="0.1",
    requests=Resources(cpu=cpu_request, mem=mem_request, gpu=gpu_request),
    limits=Resources(mem=mem_limit, gpu=gpu_limit),
)
def mnist_pytorch_job() -> str:
    return "Hi"
    
@dynamic
def d_wf():
    for _ in range(2):
        mnist_pytorch_job()

@workflow
def wf():
    d_wf()
Kubeflow PyTorch is also a `PythonFunctionTask`
🙇 2
I got some setup problems with the only agent that uses `PythonFunctionTask`, the Databricks agent.
Will try it after I fix that.
f
@damp-lion-88352 you rock!
🙏 1
c
Circling back to this. So it seems like this might be an issue with the agent vs. the plugin. It seems that, if we are willing to, we should probably just develop a Go-based plugin instead.
g
Hey @clean-glass-36808, what's the current issue you run into with the agent? You can't use it in the map task?
c
I can't use agent-based tasks with dynamic workflows and map tasks. They seem to end up running as a generic `PythonTask` on the Flyte data plane, which is not what we want. i.e.
Copy code
@workflow
def map_reduce(size: int) -> int:
    """Simple workflow to illustrate a large fan out and fan in."""
    input_array = to_list(size=size)
    output = map_task(simple_map_task)(a=input_array)
    return simple_reduce(b=output)


@workflow
def map_reduce_dynamic(size: int) -> int:
    """Simple workflow to illustrate a large fan-out/fan-in with tasks."""
    return dynamic_task(size=size)
I think for dynamic it will run as a PythonTask, but for map_task there was some serialization-related issue.
g
Regarding the dynamic, what does the task look like?
Copy code
@dynamic
def d1():
  agent_task(...)
We are going to test it. Using a Go plugin may have the same issue.
c
It seems that @damp-lion-88352 has shown that it works with a Go plugin, since he tested with PyTorch? I only see a task defined for PyTorch, and it looks like there is some operator implementation in the main flyte repo in Go.
Copy code
@task(task_config=ArmadaConfig(queue="compute"), container_image="<>")
def simple_map_task(a: int) -> int:  # noqa: D103
    return a * a


@task(task_config=ArmadaConfig(queue="compute"), container_image="<>")
def simple_reduce(b: list[int]) -> int:  # noqa: D103
    return sum(b)


@dynamic(container_image="<>")
def dynamic_task(size: int) -> int:
    """DAG shape is not known at compile time so we need to use @dynamic."""
    values = []
    for i in range(size):
        value = simple_map_task(a=i)
        values.append(value)

    return simple_reduce(b=values)
d
Kevin and I found the root cause
We currently do not support using PythonFunctionTasks (except for pod tasks) in map tasks.
So if a task is created by a K8s plugin or agent, it will probably fail
Will discuss with Kevin how to fix it later, thank you for catching the bug
c
Thanks for looking into it
🙏 1
f
I am confused: is the problem with map task or dynamic? If it's map, we already discussed that map is not supported today for anything but simple tasks and pods. More exotic support is in testing right now.
d
dynamic works for agent tasks and K8s plugin tasks
map tasks will fail for agent tasks and K8s plugin tasks
c
I still haven’t got dynamic to work for an agent plugin task. You tested against PyTorch which wasn’t an agent plugin?
d
Yes it's not
dynamic should work
f
That is odd
g
@freezing-airport-6809 do we support a list of promises?
Copy code
@dynamic(container_image="<>")
def dynamic_task(size: int) -> int:
    """DAG shape is not known at compile time so we need to use @dynamic."""
    values = []
    for i in range(size):
        value = simple_map_task(a=i)
        values.append(value)

    return simple_reduce(b=values)  # <- values here is [promise, promise, promise]
I remember we can’t do that
d
Copy code
from flytekit import task, dynamic

@task()
def simple_map_task(a: int) -> int:  # noqa: D103
    return a * a


@task()
def simple_reduce(b: list[int]) -> int:  # noqa: D103
    return sum(b)


@dynamic
def dynamic_task(size: int) -> int:
    """DAG shape is not known at compile time so we need to use @dynamic."""
    values = []
    for i in range(size):
        value = simple_map_task(a=i)
        values.append(value)

    return simple_reduce(b=values)
f
Yes we do
d
We have an error in local execution
I just removed the config
f
What? OK, I will talk in a few minutes, with kids now.
Something is wrong in your code
d
Copy code
from flytekit import task, dynamic, workflow

@task()
def simple_map_task(a: int) -> int:  # noqa: D103
    return a * a


@task()
def simple_reduce(b: list[int]) -> int:  # noqa: D103
    return sum(b)


@dynamic
def dynamic_task(size: int) -> int:
    """DAG shape is not known at compile time so we need to use @dynamic."""
    values = []
    for i in range(size):
        value = simple_map_task(a=i)
        values.append(value)

    return simple_reduce(b=values)

@workflow
def wf(size: int) -> int:
    return dynamic_task(size=size)

if __name__ == "__main__":
    print(f"Running {__file__} main...")
    print(f"Output: {wf(size=5)}")
This will work
@clean-glass-36808 Would you give it a try with your Armada config?
c
Sure
f
I have a guess
The armada config task type is not available at runtime
Is this defined locally?
c
The issue is that it runs on Flyte, but it runs the code as a Python task and not as a custom task. There is no issue with pyflyte-execute actually running
g
So the code is running in the new container as a regular Python task, not being sent to the agent?
c
Correct
f
Ya, my guess is that the task type is not set correctly.
If you look at the UI, what do you see as the task type?
Happy to hop on a quick call in a few minutes
c
I'm running @damp-lion-88352’s code (which doesn't seem much different than my original code?). Can show you in a sec
🙏 1
My code.
Copy code
@task(task_config=ArmadaConfig(queue="compute"), container_image="redacted")
def simple_map_task(a: int) -> int:  # noqa: D103
    return a * a


@task(task_config=ArmadaConfig(queue="compute"), container_image="redacted")
def simple_reduce(b: list[int]) -> int:  # noqa: D103
    return sum(b)


@dynamic(container_image="redacted")
def dynamic_task(size: int) -> int:
    """DAG shape is not known at compile time so we need to use @dynamic."""
    values = []
    for i in range(size):
        value = simple_map_task(a=i)
        values.append(value)

    return simple_reduce(b=values)


@workflow
def wf(size: int) -> int:
    return dynamic_task(size=size)
Not recognized as armada tasks
g
If you pyflyte run `simple_map_task`, does it show Armada on the UI?
c
Now if I just run the simple_map_task it detects correctly as Armada (and blows up because the Armada cluster is unstable)
g
Got it, thanks for sharing. Something is wrong during serialization. Looking at it.
🙏 1
f
@glamorous-carpet-83516 / @clean-glass-36808 did you folks figure it out
happy to hop on a call now
d
Databricks agent, PythonFunctionTask works?
f
d
Kevin and I are in this meeting
c
I’m AFK, be back in 10
d
f
@clean-glass-36808 so let me summarize what I learnt. The agent (for example, the Databricks agent) works with dynamic. I am pretty sure there is a bug in your agent code.
d
This databricks agent example works for me.
Copy code
from flytekit import task, dynamic, workflow
from flytekitplugins.spark import Databricks

# pyflyte run --remote --image localhost:30000/databricks-map:0531 databricks_example.py wf --size 3

@task(task_config=Databricks(
        spark_conf={
            "spark.driver.memory": "1000M",
            "spark.executor.memory": "1000M",
            "spark.executor.cores": "1",
            "spark.executor.instances": "2",
            "spark.driver.cores": "1",
        },
        databricks_conf={
            "run_name": "flytekit databricks plugin example",
            "new_cluster": {
                "spark_version": "11.0.x-scala2.12",
                "node_type_id": "r3.xlarge",
                "aws_attributes": {
                    "availability": "ON_DEMAND",
                    "instance_profile_arn": "arn:aws:iam::<AWS_ACCOUNT_ID_DATABRICKS>:instance-profile/databricks-flyte-integration",
                },
                "num_workers": 4,
            },
            "timeout_seconds": 3600,
            "max_retries": 1,
        },
    ),
)
def simple_map_task(a: int):
    return


@task(task_config=Databricks(
        spark_conf={
            "spark.driver.memory": "1000M",
            "spark.executor.memory": "1000M",
            "spark.executor.cores": "1",
            "spark.executor.instances": "2",
            "spark.driver.cores": "1",
        },
        databricks_conf={
            "run_name": "flytekit databricks plugin example",
            "new_cluster": {
                "spark_version": "11.0.x-scala2.12",
                "node_type_id": "r3.xlarge",
                "aws_attributes": {
                    "availability": "ON_DEMAND",
                    "instance_profile_arn": "arn:aws:iam::<AWS_ACCOUNT_ID_DATABRICKS>:instance-profile/databricks-flyte-integration",
                },
                "num_workers": 4,
            },
            "timeout_seconds": 3600,
            "max_retries": 1,
        },
    ),
)
def simple_reduce(b: list[int]):  # noqa: D103
    return


@dynamic
def dynamic_task(size: int):
    """DAG shape is not known at compile time so we need to use @dynamic."""
    values = []
    for i in range(size):
        simple_map_task(a=i)
        values.append(i)

    simple_reduce(b=values)

    return 

@workflow
def wf(size: int):
    dynamic_task(size=size)

if __name__ == "__main__":
    print(f"Running {__file__} main...")
    print(f"Output: {wf(size=5)}")