Tarik
06/27/2022, 9:18 PM

Matheus Moreno
06/29/2022, 4:39 PM
File "/home/flyte/.venv/lib/python3.9/site-packages/flytekit/core/data_persistence.py", line 447, in put_data
raise FlyteAssertion(
flytekit.exceptions.user.FlyteAssertion: Failed to put data from /tmp/flyte-9ibktn0s/sandbox/local_flytekit/engine_dir to <s3://my-s3-bucket/metadata/propeller/search-filter-improvement-by-semantic-search-development-alqv5l4nv4kms759s7k2/n0/data/0> (recursive=True).
Original exception: Called process exited with error code: 1. Stderr dump:
b'upload failed: ../../tmp/flyte-9ibktn0s/sandbox/local_flytekit/engine_dir/error.pb to <s3://my-s3-bucket/metadata/propeller/search-filter-improvement-by-semantic-search-development-alqv5l4nv4kms759s7k2/n0/data/0/error.pb> An error occurred (AccessDenied) when calling the PutObject operation: Access Denied\n'
Why is this happening?
Sandbox image I'm using: cr.flyte.org/flyteorg/flyte-sandbox:dind-106a8147446a6f0221162f47a9260ea0a764426e
Signature of the task:
def check_new_amnt(sql_path: str, config_path: str) -> pd.DataFrame:
Edgar Trujillo
06/30/2022, 1:04 AM
flytectl?
I have 3 workflow executions that have been stuck in ABORTING for the past ~2 hours.

Tim Bauer
07/14/2022, 2:39 PM
tar_strip_file_attributes functions. What is the purpose of that? When unpacking fast-registered archives it messes up the permissions:
d--------- 3 root root 4096 Jul 14 14:04 unpacked
-rw-r--r-- 1 root root 335 Jul 14 13:59 scriptmode.tar.gz
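Not flytekit's actual code, but a hypothetical stdlib sketch (the function name is made up) of how stripped modes like the d--------- above could be normalized at unpack time so non-root users can still traverse and read:

```python
import io
import tarfile

def extract_with_sane_modes(archive: bytes, dest: str) -> None:
    """Unpack a tarball whose member modes were stripped (all zeros),
    restoring world-readable defaults."""
    with tarfile.open(fileobj=io.BytesIO(archive)) as tf:
        for member in tf.getmembers():
            # Directories need the execute bit to be enterable.
            member.mode = 0o755 if member.isdir() else 0o644
            tf.extract(member, dest)
```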
It doesn't matter if you run as root, but we deliberately don't in our containers, so everything fast-registered with the new flytekit currently breaks.

Robert Everson
07/14/2022, 10:05 PM

Matheus Moreno
07/20/2022, 8:24 PM
FlyteRemote and not flytectl, that is. My use case is that I want to develop a Python script that executes the entire deployment process, but I do not want to call subprocesses, since that would require the user to install flytectl (and some members of our team use Windows!).
I know that FlyteRemote can register individual tasks and workflows, but is there a way to register an entire package with it?

Ketan (kumare3)
NamedTuple vs dataclass support.
• @workflow and @dynamic seem to have some special Promise handling for NamedTuple types specifically
◦ Within a @workflow I can construct NamedTuple instances, including passing promise objects into their attributes
◦ Within a @workflow I can access attributes on a NamedTuple instance and get a promise object back which resolves to a field in that record
◦ The return types of these seem equivalent to the original NamedTuple definition, but not the same as, i.e. isinstance(output, MyNamedTuple) == False on a task output from a task that returns a MyNamedTuple, even though the object in all other respects pretends to be a MyNamedTuple instance.
• None of this behavior is supported for dataclass types
◦ Including the wonky isinstance stuff
Is this special-cased logic for NamedTuple?
• We’d really love to use exactly one of NamedTuple or dataclass for all of our structures in our Flyte workflows
• The above behaviors are very desirable (in fact I think I asked for exactly this ~6 months ago! very excited to see it)
• But the mismatched support of features for NamedTuple and dataclass is making it hard for us to put a stake in the ground and just pick one.
• Can we either implement the above behaviors (minus the isinstance bit) for dataclass, or allow NamedTuples as task inputs?
----------
Let me explain what NamedTuples are meant to be.
When you have a multi-valued return from a Python function, i.e.,
def foo() -> (int, int, int):
    return 10, 10, 10
the int, int, int is actually a tuple in the Python world, since you can receive it as
x = foo()
and x would be a tuple.
But Python does special handling for tuples, as the following is also valid:
m, n, o = foo()
Flyte actually supports multiple values as outputs. These are not Python-specific but work across arbitrary languages.
By default, flytekit names all the return values o1, o2, …, on. This is because every returned value in Flyte is actually bound by its name (o1, etc.) in the downstream consumer.
Flyte also supports naming the outputs, e.g. x, y, z instead of o1….
But in Python there is no easy way to name the outputs, so we decided to use NamedTuples.
Rationale: NamedTuples behave the same as tuples, but allow naming.
NamedTuples are not handled specially in flytekit, except for the naming.
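A plain-Python sketch of the distinction described above (nothing flytekit-specific; Point, Config, and foo are made-up names):

```python
from dataclasses import dataclass
from typing import NamedTuple

class Point(NamedTuple):
    x: int
    y: int

@dataclass
class Config:
    lr: float
    epochs: int

def foo() -> Point:
    return Point(x=10, y=20)

# A NamedTuple is still a tuple: it unpacks positionally, so each
# field can be bound by name (x, y) instead of o1, o2 downstream.
x, y = foo()
assert (x, y) == (10, 20)
assert isinstance(foo(), tuple)
assert foo().x == 10  # ...while also allowing access by name

# A dataclass is a single opaque value, not a tuple.
cfg = Config(lr=0.1, epochs=3)
assert not isinstance(cfg, tuple)
```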
On the other hand, dataclasses are a very Python-native thing. We wanted users to be able to return arbitrarily complex objects as a single value, so in the example above:
def foo() -> (int, dataclass, ...):
    ...
Dataclasses are sent through Flyte as opaque JSON objects, and the engine tries not to load them, to keep its security high.
Thus dataclasses are more like int.
Tuples and NamedTuples, however, cannot be returned today as an individual object:
def foo() -> (int, tuple):
    ...
is not valid.

Marcin Zieminski
07/28/2022, 11:14 AM
pyflyte lp -p {{ your project }} -d {{ your domain }} activate-all
The thing is, the lp subcommand does not exist at all, so the command above fails. Ideally I would like to do it programmatically from Python, having a handle to a launch plan object. Is this possible?

Zachary Carrico
07/28/2022, 6:10 PM
remote.execute(
    workflow,
    inputs=inputs,
    options=Options(labels=common_models.Labels({"flyte.user": os.getenv("USER", "na")})),
)
where remote is of type FlyteRemote.
I can see that the workflow’s pod has the expected label, but none of the subworkflows do. Is there a way to have all pod labels be passed to all child pods of a workflow pod?

Dylan Wilder
07/28/2022, 8:49 PM

Rahul Mehta
07/29/2022, 1:00 AM
@flytekit.dynamic & with_overrides? Curious if we could pass a resource configuration object to a workflow at runtime and have the @dynamic step specify overrides at runtime.

Dylan Wilder
07/29/2022, 1:35 AM

Ketan (kumare3)
Dylan Wilder
08/01/2022, 3:40 PM
lv and lv.scalar and lv.scalar.schema. Looking at the inputs, it seems that old versions of the task don't have the format field in the Flyte literal, but now they do. It also looks like this code returns an empty dataframe rather than erroring. Any context on these changes that might help point to the behavior we're seeing?

Dylan Wilder
08/01/2022, 3:45 PM

Dylan Wilder
08/01/2022, 8:20 PM

Dylan Wilder
08/03/2022, 4:03 PM
Annotated[StructuredDataset, subset_cols]
varsha Parthasarathy
08/03/2022, 11:48 PM

Abdullah Mobeen
08/09/2022, 1:28 PM
@task(task_config=MyContainerExecutionTask(
    plugin_specific_config_a=...,
    plugin_specific_config_b=...,
    ...
))
def foo(...) -> ...:
...
And
query_task = SnowflakeTask(
    query="Select * from x where x.time < {{.inputs.time}}",
    inputs=kwtypes(time=datetime),
    output_schema_type=pandas.DataFrame,
)
@workflow
def my_wf(t: datetime) -> ...:
    df = query_task(time=t)
    return process(df=df)
I'm trying to find examples/tutorials/how-tos on the first approach, but I can't find any. Can someone point me to an example?

Brandon Segal
08/16/2022, 2:48 PM
class GenerateOutputs(NamedTuple):
    uri: str

@reference_task(
    ...
)
def generate(
    input: str
) -> GenerateOutputs:
    ...

@workflow
def my_wf(
    input="default_input"
) -> str:
    output = generate(
        input=input
    )
    return output.uri
However anytime I try to mock this out with something like
class TestNamedTuple(TestCase):
    def test_run_workflow(self):
        with task_mock(generate) as fake_generate:
            fake_generate.return_value = GenerateOutputs(uri="test_uri")
            uri = my_wf()
            self.assertIsNotNone(uri)
I get an error like: AttributeError: 'Promise' object has no attribute 'uri'
How do I properly mock a task that has a NamedTuple output with only one attribute?

Dylan Wilder
08/19/2022, 2:21 PM
open().write()
)
• We have an issue setting the BigQuery project for perms for bq URIs; there are a couple of approaches to fixing this we can consider
• There doesn't appear to be any type checking or casting of types based on the schema. This could be pretty valuable.

Rahul Mehta
08/19/2022, 3:20 PM
docker to allow v6? flytekit is currently blocking us from upgrading beyond v5.x

Evan Sadler
08/22/2022, 6:16 PM

Alex Pozimenko
08/22/2022, 11:06 PM
flytectl get execution execid --details --nodeID n0 -o yaml
execution = flyte_remote.fetch_execution(name=execution_id)
print(execution.node_executions)
After executing the above, execution.node_executions is not populated. I also tried exec = flyte_remote.sync(execution), but it returns the same.

Nicholas LoFaso
08/23/2022, 2:50 PM
@dataclass_json class Foo. It said I needed types.FooSchema, not Foo.
I assume there is some way to transform Foo into FooSchema using either the dataclass_json API or the Flyte API, but I'm not sure how to do it. Could you direct me?
Exception and sample code in thread
Thanks!

Evan Sadler
08/23/2022, 2:55 PM

Anna Cunningham
08/23/2022, 11:25 PMFlyteRemote.sync
but the response is too large:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.RESOURCE_EXHAUSTED
details = "Received message larger than max (4762259 vs. 4194304)"
debug_error_string = "UNKNOWN:Error received from peer ipv4:192.168.3.75:81 {grpc_message:"Received message larger than max (4762259 vs. 4194304)", grpc_status:8, created_time:"2022-08-23T23:23:18.247266112+00:00"}"
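For reference, the 4194304 in the error above is gRPC's default 4 MiB receive limit. As a hypothetical sketch (whether and how FlyteRemote exposes these options is not confirmed here), the standard channel arguments that raise the limit look like this:

```python
# gRPC caps inbound messages at 4 MiB (4194304 bytes) by default; the
# 4762259-byte response above exceeds that. These are the standard
# channel arguments for raising the cap. Threading them through
# FlyteRemote's channel setup is an assumption, not confirmed here.
MAX_MSG_BYTES = 64 * 1024 * 1024  # 64 MiB

channel_options = [
    ("grpc.max_receive_message_length", MAX_MSG_BYTES),
    ("grpc.max_send_message_length", MAX_MSG_BYTES),
]

# e.g. grpc.insecure_channel(endpoint, options=channel_options)
```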
Is there a way for me to adjust the max amount or somehow still get my synced execution data?

Dylan Wilder
08/29/2022, 6:10 PM

Rahul Mehta
08/29/2022, 7:31 PM
TypeTransformer w/ the TypeEngine, does the module need to be explicitly imported by one of the files containing workflows or tasks in order for it to be picked up? Based on the example of the NumpyArrayTransformer it seemed like it, but I'm not sure where that needs to be declared.

Varun Kulkarni
08/29/2022, 7:54 PM
FlyteSchema object that also includes metadata on whether a particular column is nullable vs required, along with its type?