GitHub
11/02/2023, 9:23 PMalb.ingress.kubernetes.io/target-type: 'ip' in the ingress annotation of value-eks.yaml
@brucearctor
@EngHabu
RE: #2566
Originally posted by @jw0515 in #2566 (comment)
flyteorg/flyteGitHub
11/02/2023, 9:23 PMSendgrid sent email {\"errors\":[{\"message\":\"The from email does not contain a valid address
Expected behavior
It should be easy to pinpoint invalid notification emails from logs that report email failures
Additional context to reproduce
No response
Screenshots
No response
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyteGitHub
11/02/2023, 9:23 PMpip install flytekitplugins-mlflow
API Proposal 1: Decorator Plugin
Use the task decorator and/or workflow decorator pattern to create a more seamless experience. This would introduce a new plugin pattern in flytekit, which modifies the underlying function wrapped by @task and @workflow.
Example
from typing import List

import mlflow
import flytekitplugins.mlflow
from flytekit import task, dynamic

@dynamic
@flytekitplugins.mlflow.experiment(
    # args to https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.create_experiment
    name=None,  # defaults to "{workflow_name}-{execution_id}" ?
    artifact_location=None,  # defaults to the flytekit location?
    tags=...,
)
def model_experiment(hyperparameter_grid: List[dict]):
    models = []
    data = ...
    for hyperparameters in hyperparameter_grid:
        models.append(train_model(hyperparameters=hyperparameters, data=data))
    ...

@task
@flytekitplugins.mlflow.run(
    # by default, this run will use the parent workflow's mlflow experiment config
    params="hyperparameters",  # log config parameters automatically
    autolog=True,  # enable autologging, could also be a dict of mlflow.autolog args: https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.autolog
)
def train_model(hyperparameters: dict, data: ...):
    model = MySklearnModel(**hyperparameters)
    ...  # fit
    # without autolog=True, users can manually log here
    mlflow.log_metric("key", value)
    return model
API Proposal 2: extend @task and @workflow arguments
Task config plugins don't really make sense for MLflow experiment tracking/logging, since the task_config argument is typically used for task types that have specific backend resource requirements (e.g. Spark, Ray, MPI tasks) and is orthogonal to configuring experiments and logging metrics.
Therefore, to support similar functionality to proposal 1, we could introduce additional arguments to the @task and @workflow decorators, e.g.
from typing import List

import mlflow
from flytekitplugins.mlflow import RunConfig, ExperimentConfig
from flytekit import dynamic, task

@dynamic(..., logging_config=ExperimentConfig(name=..., artifact_location=..., tags=...))
def model_experiment(hyperparameter_grid: List[dict]):
    models = []
    data = ...
    for hyperparameters in hyperparameter_grid:
        models.append(train_model(hyperparameters=hyperparameters, data=data))
    ...

@task(..., logging_config=RunConfig(experiment=..., params=..., autolog=...))
def train_model(hyperparameters: dict, data: ...):
    model = MySklearnModel(**hyperparameters)
    ...  # fit
    # without autolog=True, users can manually log here
    mlflow.log_metric("key", value)
    return model
flyteorg/flyteGitHub
11/02/2023, 9:23 PM@task
def t1() -> Annotated[pd.DataFrame, kwtypes(a=int)]:
    ...
@task
def t2(df: Annotated[pd.DataFrame, kwtypes(a=float)]):
    ...
It's conceivable that sometimes you want this type conversion (from int to float) to happen and sometimes you do not.
What is the correct way of allowing the user to control this behavior?
Also where should this be enforced? Enforcing this at run-time is likely best to give as much flexibility to the plugin author as possible.
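To make the question concrete, here is a minimal sketch of what a run-time check with a user-controlled switch could look like. Everything in it is hypothetical and for illustration only: `SAFE_WIDENINGS`, `check_columns`, and the `allow_widening` flag are not flytekit names, and the widening table is just one possible policy.

```python
from typing import Dict

# Hypothetical policy table: a produced column type may be read as any listed type.
SAFE_WIDENINGS = {int: {int, float}, float: {float}, str: {str}, bool: {bool}}


def check_columns(produced: Dict[str, type], expected: Dict[str, type], allow_widening: bool) -> None:
    """Raise TypeError if a produced column cannot satisfy the expected schema."""
    for name, want in expected.items():
        if name not in produced:
            raise TypeError(f"missing column {name!r}")
        have = produced[name]
        if have is want:
            continue  # exact match is always accepted
        if allow_widening and want in SAFE_WIDENINGS.get(have, set()):
            continue  # e.g. int -> float, only when the user opted in
        raise TypeError(f"column {name!r}: produced {have.__name__}, expected {want.__name__}")


# t1 produces a=int, t2 expects a=float
check_columns({"a": int}, {"a": float}, allow_widening=True)
```

With `allow_widening=False` the same call raises `TypeError`, so the int-to-float conversion becomes an explicit choice rather than a silent default.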
Implementation
?
Misc
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyteGitHub
11/02/2023, 9:23 PMpyflyte-fast-execute in that case, the problem is that by default we add the argument --no-sign-request when using the AWS CLI for S3, which makes it impossible to download content from private S3 buckets.
Expected behavior
Requests should not be anonymous by default: we shouldn't add the --no-sign-request argument unless it is explicitly requested.
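A minimal sketch of the expected behavior, assuming the download command is assembled as a list of arguments the way the logged command suggests. The function name `build_s3_cp_command` and the `anonymous` parameter are hypothetical and are not flytekit's actual implementation.

```python
from typing import List


def build_s3_cp_command(source: str, dest: str, anonymous: bool = False) -> List[str]:
    """Build an `aws s3 cp` command; requests are signed unless anonymous=True."""
    cmd = ["aws"]
    if anonymous:
        # Only skip request signing when the caller explicitly asks for it.
        cmd.append("--no-sign-request")
    cmd += ["s3", "cp", source, dest]
    return cmd
```

Called with the defaults, this yields `aws s3 cp <source> <dest>`, which is the form that succeeds against a private bucket in the reproduction steps.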
Additional context to reproduce
1. Run `pyflyte-fast-execute` with the needed arguments
❯ pyflyte-fast-execute --additional-distribution s3://<private-bucket-name>/ea/<project-name>/<domain>/path_to.tar.gz --dest-dir /src
{"asctime": "2022-08-16 14:21:55,941", "name": "flytekit", "levelname": "ERROR", "message": "Error from command '['aws', '--no-sign-request', 's3', 'cp', 's3://<private-bucket-name>/ea/<project-name>/<domain>/path_to.tar.gz', '/src']':\nb'fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden\\n'\n"}
{"asctime": "2022-08-16 14:21:55,941", "name": "flytekit", "levelname": "ERROR", "message": "Exception when trying to execute ['aws', 's3', 'cp', '3://<private-bucket-name>/ea/<project-name>/<domain>/path_to.tar.gz', '/src'], reason: Called process exited with error code: 1. Stderr dump:\n\nb'fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden\\n'"}
2. It can also be reproduced by using the AWS CLI commands directly
When using --no-sign-request
❯ aws --no-sign-request s3 cp s3://<private-bucket-name>/ea/<project-name>/<domain>/path_to.tar.gz .
fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden
When not using --no-sign-request
❯ aws s3 cp s3://<private-bucket-name>/ea/<project-name>/<domain>/path_to.tar.gz .
download: s3://<private-bucket-name>/ea/<project-name>/<domain>/path_to.tar.gz to ./fast14de81dc9a6b0f459a1ab49e1d871e01.tar.gz
Screenshots
No response
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyteGitHub
11/02/2023, 9:23 PMdf.export(...) as explained in the docs.
We would ideally also like to support cases where, if the dataframe is large, we can export the chunks in parallel to multiple parts using df.export_many(...), as demonstrated in the vaex API docs: https://vaex.readthedocs.io/en/docs/api.html#vaex.dataframe.DataFrameLocal.export_many
Goal: What should the final outcome look like, ideally?
If the df chunksize is greater than some threshold (1M ?), then we should serialise to blob using df.export_many(...); otherwise default to the current implementation of df.export
https://github.com/flyteorg/flytekit/blob/8ae879eb379acf2e0b4923f1b0c855d01a1f14e5/plugins/flytekit-vaex/flytekitplugins/vaex/sd_transformers.py#L29-L38
When decoding, use df.open_many(...) if the dir has multiple files/parts (use a glob pattern), or df.open(...) (as currently implemented) if there is just a single part.
https://github.com/flyteorg/flytekit/blob/8ae879eb379acf2e0b4923f1b0c855d01a1f14e5/plugins/flytekit-vaex/flytekitplugins/vaex/sd_transformers.py#L51-L57
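The branching described above could be sketched roughly as follows. This deliberately avoids calling vaex and only returns the name of the method to use, so it runs without the plugin installed. `ROW_THRESHOLD`, the use of row count as the size measure, the `*.parquet` pattern, and all three function names are assumptions made for illustration.

```python
import glob
import os
from typing import List

# Hypothetical threshold based on the "1M ?" suggestion; not a vaex or flytekit constant.
ROW_THRESHOLD = 1_000_000


def choose_export(row_count: int, threshold: int = ROW_THRESHOLD) -> str:
    """Return the name of the export method to call for a dataframe of this size."""
    return "export_many" if row_count > threshold else "export"


def find_parts(directory: str, pattern: str = "*.parquet") -> List[str]:
    """List part files in a directory, sorted so parts are read in a stable order."""
    return sorted(glob.glob(os.path.join(directory, pattern)))


def choose_open(parts: List[str]) -> str:
    """Return the name of the open function to use for the files that were found."""
    if not parts:
        raise FileNotFoundError("no dataframe parts found")
    return "open_many" if len(parts) > 1 else "open"
```

Keeping the decision separate from the I/O makes the threshold easy to test and to tune later.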
Describe alternatives you've considered
N/A
Propose: Link/Inline OR Additional context
See discussion in this PR flyteorg/flytekit#1230 (comment)
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyteGitHub
11/02/2023, 9:23 PMfrom flytekit import workflow
from flytekitplugins.dbt.task import DBTRun, DBTRunInput
dbt_task = DBTRun(name="the_name_of_the_task")
@workflow
def my_wf() -> None:
    input = DBTRunInput(
        project_dir="dbt_project_dir",
        profiles_dir="dbt_project_dir/docker-context",
        profile="default",
        select=["some_model"]
    )
    dbt_task(input=input)
Screenshots
No response
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyteGitHub
11/02/2023, 9:23 PMpyflyte run. We should add support for launch plans as well.
Goal: What should the final outcome look like, ideally?
We should be able to use the same experience currently offered in pyflyte run to kick off launch plans, i.e. users should be able to execute:
> pyflyte run --remote path/to/my/file.py my_launchplan
Go to http://flyte.example.com/console/projects/flytesnacks/domains/development/executions/f6e7113930e3043e79cc to see execution in the console.
...
Describe alternatives you've considered
Launch plans are an indispensable entity type in the Flyte programming model, so they should be supported in the CLIs.
Propose: Link/Inline OR Additional context
We should add launch plans to the list of entities handled by pyflyte run in here.
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyteGitHub
11/02/2023, 9:23 PMmake download_tooling or make generate in order to compile protobufs.
Using WSL2, I get this error:
Makefile:17: warning: overriding recipe for target 'generate'
boilerplate/flyte/golang_test_targets/Makefile:13: warning: ignoring old recipe for target 'generate'
Makefile:30: warning: overriding recipe for target 'test_unit'
boilerplate/flyte/golang_test_targets/Makefile:38: warning: ignoring old recipe for target 'test_unit'
make: boilerplate/flyte/golang_test_targets/download_tooling.sh: Command not found
make: *** [boilerplate/flyte/golang_test_targets/Makefile:9: download_tooling] Error 127
I also tried running the make commands in a Windows shell with WSL, in a WSL shell directly, and in Git Bash, with no results.
What if we do not do this?
Some Windows developers might not be able to help extend or add protobufs if needed
Related component(s)
flyteorg/flyteidl#331
#2911 <-- this might help my issue
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyteGitHub
11/02/2023, 9:23 PM<- button, the UI of some components appears to overlap.
Expected behavior
The UI should remain consistent. After the user refreshes the page, the UI becomes consistent again.
Additional context to reproduce
No response
Screenshots
Screenshot 2022-10-14 at 10 31 23
Screenshot 2022-10-14 at 10 31 36
GitHub
11/02/2023, 9:24 PMfrom typing import Dict, List, NamedTuple
from flytekit import task, workflow
class OpenFlightsData(NamedTuple):
    routes: List[Dict[str, str]]
    airlines: Dict[str, str]
    airports: Dict[str, Dict[str, str]]
@task()
def extract_reference_data() -> OpenFlightsData:
    pass
Fails with
[1/1] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes.
[fb5562eecbc2f40f9b0d-n0-0] terminated with exit code (137). Reason [Error]. Message:
.
Expected behavior
No failure
Additional context to reproduce
flytekit 1.2.1
flyte admin 1.1.46
Screenshots
Screen Shot 2022-10-13 at 10 36 58 pm▾
GitHub
11/02/2023, 9:24 PMFile "/Users/.../lib/python3.7/site-packages/flytekit/core/launch_plan.py", line 143, in create
native_types=workflow.python_interface.inputs,
AttributeError: 'NoneType' object has no attribute 'inputs'
Code snippet:
from flytekit import LaunchPlan
flyte_workflow = remote.fetch_workflow(
    name="my_workflow", version="v1", project="flytesnacks", domain="development"
)
launch_plan = LaunchPlan.get_or_create(name="my_launch_plan", workflow=flyte_workflow)
Goal: What should the final outcome look like, ideally?
Should be able to create a launch plan for FlyteWorkflow
Describe alternatives you've considered
NA
Propose: Link/Inline OR Additional context
https://flyte-org.slack.com/archives/CP2HDHKE1/p1665377001295529
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyteGitHub
11/02/2023, 9:24 PM--verbose flag of flytectl sandbox start/teardown and flytectl demo start/teardown only captures logs of helm/k8s commands. We would want to capture the logs of other components as well, like docker, k3s, etc. The above-mentioned PR already brings support for interfacing with docker.
I would be happy to send a PR for the same! :D
Provide a possible output or UX example
Kindly refer to flyteorg/flytectl#359 for more details (screenshots attached)
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyteGitHub
11/02/2023, 9:24 PM[1/1] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes.
[adbrq7s7x2kbwln2k5df-n0-0] terminated with exit code (1). Reason [Error]. Message:
a_worflows.py", line 211, in
    def process_data_frames(lc_df: pd.DataFrame=None, data_frame: pd.DataFrame=None)->pd.DataFrame:
  File "/usr/local/lib/python3.9/site-packages/flytekit/core/task.py", line 209, in task
    return wrapper(_task_function)
  File "/usr/local/lib/python3.9/site-packages/flytekit/core/task.py", line 193, in wrapper
    task_instance = TaskPlugins.find_pythontask_plugin(type(task_config))(
  File "/usr/local/lib/python3.9/site-packages/flytekit/core/tracker.py", line 30, in __call__
    o = super(InstanceTrackingMeta, cls).__call__(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/flytekit/core/python_function_task.py", line 118, in __init__
    super().__init__(
  File "/usr/local/lib/python3.9/site-packages/flytekit/core/python_auto_container.py", line 72, in __init__
    super().__init__(
  File "/usr/local/lib/python3.9/site-packages/flytekit/core/base_task.py", line 389, in __init__
    interface=transform_interface_to_typed_interface(interface),
  File "/usr/local/lib/python3.9/site-packages/flytekit/core/interface.py", line 210, in transform_interface_to_typed_interface
    inputs_map = transform_variable_map(interface.inputs, input_descriptions)
  File "/usr/local/lib/python3.9/site-packages/flytekit/core/interface.py", line 318, in transform_variable_map
    res[k] = transform_type(v, descriptions.get(k, k))
  File "/usr/local/lib/python3.9/site-packages/flytekit/core/interface.py", line 332, in transform_type
    return _interface_models.Variable(type=TypeEngine.to_literal_type(x), description=description)
  File "/usr/local/lib/python3.9/site-packages/flytekit/core/type_engine.py", line 611, in to_literal_type
    transformer = cls.get_transformer(python_type)
  File "/usr/local/lib/python3.9/site-packages/flytekit/core/type_engine.py", line 584, in get_transformer
    raise ValueError(f"Generic Type {python_type.__origin__} not supported currently in Flytekit.")
ValueError: Generic Type typing.Union not supported currently in Flytekit.
The data frames have columns with strings; consequently the dtypes for these columns were object.
Would it be possible for someone to let me know what the problem is, or whether I'm missing something? Or is this because I'm running it on a Windows machine?
Thanks and Regards
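One possible explanation, not confirmed in this thread: the task signature gives both `pd.DataFrame` parameters a default of `None`. On Python versions before 3.11, `typing.get_type_hints` reports such a parameter as `Optional[...]`, which is a `typing.Union`, and that matches the type named in the error. Whether flytekit resolves hints this way in the version shown is an assumption. The sketch below uses only the standard library to show the effect; `has_union_origin` and `f` are illustrative names.

```python
import typing
from typing import Optional, Union


def has_union_origin(tp) -> bool:
    """True if the type hint is a typing.Union, which includes Optional[...]."""
    return typing.get_origin(tp) is Union


def f(x: int = None) -> int:
    return 0


# An explicit Optional is always a Union under the hood.
explicit = Optional[int]

# Resolved hint for a parameter whose default is None:
# Optional[int] before Python 3.11, plain int from 3.11 onwards.
implicit = typing.get_type_hints(f)["x"]

print(has_union_origin(explicit), has_union_origin(implicit))
```

If this is the cause, removing the `=None` defaults from the task signature may be worth trying; the operating system would then be unrelated to the failure.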
flyteorg/flyteGitHub
11/02/2023, 9:24 PM~/dir/test.vcf and ~/dir/test.vcf.tbi).
• A task that takes a file family as input should take 1 input value, not N. In the user's mental model of computation to be done, the file family of N files is one entity, not N. Allowing the inputs/outputs to map to this model will improve the UX.
• Helper utilities to enforce presence of files in the family at runtime (e.g., if you create one of these objects and you're missing one of the components, you can have an error raised)
• It should be easy to extend these classes with additional methods/transformers to allow easy read out of the file families into memory. Typically this is done with another library (e.g., vcf + .vcf.tbi is loaded with pyvcf, a python library backing C-based htslib, a high performance library for loading/manipulating vcf files and their indices).
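The requirements above could be sketched as a small container type. This is only an illustration of the idea, not a proposed flytekit API: `FileFamily`, `companion_suffixes`, `members`, and `validate` are hypothetical names, and the sketch handles local paths only.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Tuple


@dataclass(frozen=True)
class FileFamily:
    """Hypothetical container: one primary file plus required companion suffixes."""

    primary: Path
    companion_suffixes: Tuple[str, ...] = ()

    def members(self) -> Dict[str, Path]:
        """Map each suffix ('' for the primary file) to its expected path."""
        paths = {"": self.primary}
        for suffix in self.companion_suffixes:
            paths[suffix] = Path(str(self.primary) + suffix)
        return paths

    def validate(self) -> None:
        """Raise FileNotFoundError listing every member missing on disk."""
        missing = [str(p) for p in self.members().values() if not p.is_file()]
        if missing:
            raise FileNotFoundError("missing family members: " + ", ".join(missing))
```

A task could then accept a single `FileFamily(Path("test.vcf"), (".tbi",))` value instead of two separate file inputs, and call `validate()` before handing the paths to a reader library.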
Describe alternatives you've considered
• FlyteDirectory
• Extend FlyteFile (suggested by Greg Gydush in slack https://flyte-org.slack.com/archives/CP2HDHKE1/p1663719669816229?thread_ts=1663692816.505609&cid=CP2HDHKE1)
• Compose multiple FlyteFiles into a composite
Propose: Link/Inline OR Additional context
No response
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyteGitHub
11/02/2023, 9:24 PMFastSerializationSettings as rendered in the flytekit repo.
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyteGitHub
11/02/2023, 9:24 PM