GitHub
05/11/2023, 1:58 PM
make html seems to execute notebooks as well.
Making a change and rerunning make html worked once for me, then it started inexplicably breaking with
Warning, treated as error:
/some/path/flytesnacks/cookbook/docs/getting_started/optimizing_tasks.md:225:py:class reference target not found: flytekitplugins.kfpytorch.PyTorch
make: *** [Makefile:7: html] Error 2
flyteorg/flytesnacks
GitHub Actions: Build & Push to GHCR (cookbook/integrations/flytekit_plugins/sql)
GitHub Actions: Build & Push to GHCR (cookbook/integrations/flytekit_plugins/papermilltasks)
GitHub Actions: Build & Push to GHCR (cookbook/integrations/flytekit_plugins/pandera_examples)
GitHub Actions: Build & Push to GHCR (cookbook/integrations/flytekit_plugins/onnx_examples)
GitHub Actions: Build & Push to GHCR (cookbook/integrations/flytekit_plugins/modin_examples)
GitHub Actions: Build & Push to GHCR (cookbook/integrations/flytekit_plugins/mlflow_example)
GitHub Actions: Build & Push to GHCR (cookbook/integrations/flytekit_plugins/greatexpectations)
GitHub Actions: Build & Push to GHCR (cookbook/integrations/flytekit_plugins/duckdb_examples)
GitHub Actions: Build & Push to GHCR (cookbook/integrations/flytekit_plugins/dolt)
GitHub Actions: Build & Push to GHCR (cookbook/integrations/flytekit_plugins/dbt_example)
GitHub Actions: Build & Push to GHCR (cookbook/integrations/external_services/databricks)
GitHub Actions: Build & Push to GHCR (cookbook/integrations/aws/sagemaker_training)
GitHub Actions: Build & Push to GHCR (cookbook/integrations/aws/sagemaker_pytorch)
GitHub Actions: Build & Push to GHCR (cookbook/integrations/aws/batch)
GitHub Actions: Build & Push to GHCR (cookbook/integrations/aws/athena)
GitHub Actions: Build & Push to GHCR (cookbook/core)
GitHub Actions: Build & Push to GHCR (cookbook/case_studies/ml_training/spark_horovod)
GitHub Actions: Build & Push to GHCR (cookbook/case_studies/ml_training/pima_diabetes)
GitHub Actions: Build & Push to GHCR (cookbook/case_studies/ml_training/nlp_processing)
GitHub Actions: Build & Push to GHCR (cookbook/case_studies/ml_training/mnist_classifier)
GitHub Actions: Build & Push to GHCR (cookbook/case_studies/ml_training/house_price_prediction)
GitHub Actions: Build & Push to GHCR (cookbook/case_studies/feature_engineering/feast_integration)
GitHub Actions: Build & Push to GHCR (cookbook/case_studies/feature_engineering/eda)
GitHub Actions: Build & Push to GHCR (cookbook/case_studies/bioinformatics/blast)
GitHub Actions: Docs Warnings
✅ 3 other checks have passed
3/28 successful checks
GitHub
05/11/2023, 2:39 PM
<https://github.com/flyteorg/flytepropeller/tree/master|master>
by hamersaw
<https://github.com/flyteorg/flytepropeller/commit/9a4ea000af6bb7b959daa00f26abea7c2e3262e7|9a4ea000>
- Use group attribute for KV version and add DB engine support (#539)
flyteorg/flytepropeller
GitHub
05/11/2023, 2:39 PM
group field to allow using both (and other Vault secret backends) dynamically.
This also allows using the Database Secrets Engine, which effectively means a no-op since we want to read the whole credential and not just a specific key.
Type
☐ Bug Fix
☑︎ Feature
☐ Plugin
Are all requirements met?
☑︎ Code completed
☑︎ Smoke tested
☑︎ Unit tests added
☐ Code documentation added
☐ Any pending items have an associated Issue
Complete description
The Secret protobuf defines a group field which we have previously ignored. Leveraging that field simplifies the secret manager config and supports multiple Vault secret backends dynamically by deciding which secret template to inject based on the optional group parameter.
This is a breaking change for people using the previous version of the Vault Secret Manager. To migrate, they will need to remove the kvVersion key from their propeller configs and specify the appropriate group version in their secret requests, for example:
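from flytekit import Secret, task

@task(
    secret_requests=[
        Secret(group="foo", key="bar", group_version="kv2"),
    ]
)
def my_task() -> None:  # placeholder task, not part of the original snippet
    ...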
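The template selection described above could look roughly like the sketch below. This is only an illustration in Python: the real secret manager lives in flytepropeller (Go), and the template strings, the TEMPLATES table, and the select_template name are assumptions loosely modeled on Vault Agent template syntax, not copied from the PR.

```python
# Hypothetical templates keyed by group version. The exact strings are an
# assumption for illustration and are not taken from flytepropeller.
TEMPLATES = {
    # KV v1 stores values directly under .Data
    "kv1": '{{- with secret "%(path)s" -}}{{ .Data.%(key)s }}{{- end -}}',
    # KV v2 nests values one level deeper, under .Data.data
    "kv2": '{{- with secret "%(path)s" -}}{{ .Data.data.%(key)s }}{{- end -}}',
    # Database engine: no key lookup, the whole credential is emitted
    "db": '{{- with secret "%(path)s" -}}{{ .Data | toJSON }}{{- end -}}',
}


def select_template(group_version: str, path: str, key: str) -> str:
    """Pick the template for a group version and fill in path and key."""
    try:
        template = TEMPLATES[group_version]
    except KeyError:
        raise ValueError(f"unsupported group version: {group_version!r}") from None
    # The "db" template ignores the key, which is the no-op mentioned above.
    return template % {"path": path, "key": key}


print(select_template("kv2", "foo", "bar"))
```

With a lookup table like this, supporting another backend means adding one entry instead of a new config flag, which seems to be the motivation for dropping kvVersion.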
Follow-up issue
NA
flyteorg/flytepropeller
GitHub Actions: Build & Push Flytepropeller Image
GitHub Actions: Goreleaser
GitHub Actions: Bump Version
✅ 11 other checks have passed
11/14 successful checks
GitHub
05/11/2023, 3:17 PM
GitHub
05/11/2023, 3:48 PM
<https://github.com/flyteorg/flyte/tree/master|master>
by jeevb
<https://github.com/flyteorg/flyte/commit/5f390819c1479572ef3b36ad21eaf5abc333c4b8|5f390819>
- Fix indentation in flyte-binary helm chart template (#3669)
flyteorg/flyte
GitHub
05/11/2023, 5:15 PM
<https://github.com/flyteorg/flytekit/tree/master|master>
by wild-endeavor
<https://github.com/flyteorg/flytekit/commit/4037fa0135b8faf078f0be652eb6abd486c570ce|4037fa01>
- Add support for using a list as an input for a subworkflow (#1605)
flyteorg/flytekit
GitHub
05/11/2023, 5:32 PM
GitHub
05/11/2023, 6:15 PM
@workflow decorator) and dynamic workflows (via the @dynamic decorator). As the names suggest, static workflows are created at compile time and registered to some target Flyte cluster. On the other hand, dynamic workflows are compiled at runtime so that they can materialize the inputs of the workflow and use them to influence the shape of the execution graph.
Problem Statement
Both static and dynamic workflows pose a problem. While they provide type safety (more so for static, although type errors will occur when dynamic workflows are created at runtime), they both suffer from inflexibility in expressing execution graphs that many Python flytekit users may be accustomed to. This is because @workflow and @dynamic function code is not actually Python code: it's a DSL for constructing execution graphs that suffers from the "uncanny valley" of looking like Python code without being Python code. For example:
• if... elif... else statements are not supported, and the equivalent syntax is cumbersome to write with conditionals.
• try... except statements are not supported.
• writing async code is not supported.
For Python users who come in with expectations of writing Python to compose their workflows, Flyte is surprising both in terms of (a) the lack of useful error messages when trying illegal combinations of Flyte and Python syntax and (b) the inability to compose tasks using the asyncio syntax. The scope of this RFC is to focus on the latter.
Proposal
This RFC proposes adding support for "eager workflows" indicated by the @eager decorator in a new subpackage flytekit.experimental, which will contain experimental features. This construct allows users to write workflows pretty much like how one would write asynchronous Python code. For example:
import asyncio
from typing import NamedTuple

import pandas as pd
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from flytekit import task
from flytekit.experimental import eager

class CustomException(Exception): ...

BestModel = NamedTuple("BestModel", model=LogisticRegression, metric=float)

@task
def get_data() -> pd.DataFrame:
    """Get the wine dataset."""
    return load_wine(as_frame=True).frame

@task
def process_data(data: pd.DataFrame) -> pd.DataFrame:
    """Simplify the task from a 3-class to a binary classification problem."""
    return data.assign(target=lambda x: x["target"].where(x["target"] == 0, 1))

@task
def train_model(data: pd.DataFrame, hyperparameters: dict) -> LogisticRegression:
    """Train a model on the wine dataset."""
    features = data.drop("target", axis="columns")
    target = data["target"]
    return LogisticRegression(max_iter=3000, **hyperparameters).fit(features, target)

@task
def evaluate_model(data: pd.DataFrame, model: LogisticRegression) -> float:
    """Evaluate a trained model's accuracy on the given data."""
    features = data.drop("target", axis="columns")
    target = data["target"]
    return float(accuracy_score(target, model.predict(features)))

@eager
async def main() -> BestModel:
    data = await get_data()
    processed_data = await process_data(data=data)

    # split the data
    try:
        train, test = train_test_split(processed_data, test_size=0.2)
    except Exception as exc:
        raise CustomException(str(exc)) from exc

    # train one model per regularization strength, concurrently
    models = await asyncio.gather(*[
        train_model(data=train, hyperparameters={"C": x})
        for x in [0.1, 0.01, 0.001, 0.0001, 0.00001]
    ])
    results = await asyncio.gather(*[
        evaluate_model(data=test, model=model) for model in models
    ])

    # pick the model with the highest accuracy
    best_model, best_result = None, float("-inf")
    for model, result in zip(models, results):
        if result > best_result:
            best_model, best_result = model, result

    assert best_model is not None, "model cannot be None!"
    return best_model, best_result
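The fan-out and selection logic in main is ordinary asyncio, so it can be tried without a Flyte cluster. Below is a minimal sketch using only the standard library. The score coroutine is a hypothetical stand-in for the train and evaluate tasks, and its formula is arbitrary.

```python
import asyncio


async def score(c: float) -> float:
    """Stand-in for a train-then-evaluate task (hypothetical, arbitrary formula)."""
    await asyncio.sleep(0)  # yield control, as awaiting a remote task would
    return 1.0 - abs(c - 0.01)


async def pick_best(candidates: list[float]) -> tuple[float, float]:
    # gather runs the coroutines concurrently and returns results in input order,
    # which is what makes the zip below line up with the candidates.
    results = await asyncio.gather(*[score(c) for c in candidates])
    best_c, best_result = None, float("-inf")
    for c, result in zip(candidates, results):
        if result > best_result:
            best_c, best_result = c, result
    assert best_c is not None, "candidates cannot be empty!"
    return best_c, best_result


best = asyncio.run(pick_best([0.1, 0.01, 0.001]))
print(best)
```

Here best is (0.01, 1.0), since that candidate scores exactly 1.0 under the stand-in formula.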
Trade-offs
At a high level, we can think of these three ways of writing workflows in terms of Flyte promises, and what data are accessible to the user in the workflow code:
Open Questions
• How to handle FlyteRemote configuration?
• Authentication: can flytepropeller pass in everything needed into eager workflows to execute tasks?
• Use flytepropeller's token to mint a new token with limited permissions, e.g. the eager workflow can only kick off new executions from the eager workflow's execution.
MVP
WIP PR: flyteorg/flytekit#1579
• Rely on secret requests to use client secrets to be able to authenticate
• We'll provide OSS users with instructions to use this feature, pushing the responsibility of creating the client secret to the user's platform team
• No backend changes: collect feedback first before investing in more changes.
• Eager workflows (tasks masquerading as workflows) also produce a Flyte Deck that shows the list of subtasks that are executed:
flytepropeller and hard-coded secret group/key:
SECRET_GROUP = "eager-mode"
SECRET_KEY = "client_secret"
flyteorg/flyte
GitHub
05/11/2023, 8:12 PM
GitHub
05/11/2023, 8:51 PM
GitHub
05/11/2023, 9:19 PM
GitHub
05/11/2023, 9:28 PM
<https://github.com/flyteorg/flytekit/tree/master|master>
by wild-endeavor
<https://github.com/flyteorg/flytekit/commit/993201f5b1fc4398943aef96413fc5850c0bee68|993201f5>
- Improve task decorator type hints with overload (#1631)
flyteorg/flytekit
GitHub
05/11/2023, 10:32 PM
node_executions has a self-referencing foreign key. The main id column of the table is a bigint whereas the self-foreign-key parent_id was an int. This was rooted in an early version of gorm and should not affect most users. Out of an abundance of caution, however, we are adding a migration to patch this issue in a manner that minimizes any locking.
To Deploy
When you deploy this release of Flyte, you should make sure that you have more than one pod for Admin running. (If you are running the flyte-binary helm chart, this patch release does not apply to you at all. All those deployments should already have the correct column type.) When the two new migrations that #554 added run, the first one may take an extended period of time (hours). However, this is entirely non-blocking as long as there is another Admin instance available to serve traffic.
The second migration is locking, but even on very large tables, this migration was over in ~5 seconds, so you should not see any noticeable downtime whatsoever.
The migration will also check to see that your database falls into this category before running (i.e., the parent_id and the id columns in node_executions are mismatched). You can also check this yourself using psql. If this migration is not needed, it will simply mark itself as complete and be a no-op.
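For the manual check mentioned above, one option is to compare the two column types reported by information_schema. A minimal sketch, assuming a PostgreSQL database: the query and the needs_migration helper are illustrative and are not taken from the migration code.

```python
# Query to run in psql (or any PostgreSQL client). This is an assumption about
# how one might check, not the exact check the migration performs.
CHECK_SQL = """
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'node_executions'
  AND column_name IN ('id', 'parent_id');
"""


def needs_migration(rows):
    """Return True when id and parent_id report different column types.

    rows is a list of (column_name, data_type) pairs, as returned by CHECK_SQL.
    """
    types = dict(rows)
    return types["id"] != types["parent_id"]


# Example rows for an affected database: bigint id, integer parent_id.
print(needs_migration([("id", "bigint"), ("parent_id", "integer")]))
```

If both rows report bigint, the helper returns False, matching the case where the migration marks itself complete without changing anything.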
flyteorg/flyte
GitHub
05/11/2023, 10:34 PM
<https://github.com/flyteorg/flyteadmin/tree/master|master>
by wild-endeavor
<https://github.com/flyteorg/flyteadmin/commit/7f622dd73f18bb5374ee49c1699ca2994ff89419|7f622dd7>
- Retract v1.1.94 from go.mod (#562)
flyteorg/flyteadmin
GitHub
05/11/2023, 10:54 PM
GitHub
05/11/2023, 11:16 PM
<https://github.com/flyteorg/flytekit/tree/master|master>
by pingsutw
<https://github.com/flyteorg/flytekit/commit/dd44bbaae99aebc35eb54a1ea6f4723606fd0cbc|dd44bbaa>
- add annotation option for serialization (#1615)
flyteorg/flytekit
GitHub
05/11/2023, 11:28 PM
GitHub
05/11/2023, 11:45 PM
<https://github.com/flyteorg/flyteconsole/tree/master|master>
by jsonporter
<https://github.com/flyteorg/flyteconsole/commit/b7c4b80319f9c7bb1287ecebdf7a984f35557bb6|b7c4b803>
- feat: added support for mapped task parent retryAttempt (#756)
flyteorg/flyteconsole
GitHub
05/11/2023, 11:49 PM
workflow decorator is hinted as always returning a WorkflowBase, which is not true when _workflow_function is `None`; this leads to a spurious type error depending on how workflow is called. Similar to #1631, I propose using typing.overload to differentiate the return type of workflow based on the value of _workflow_function.
Type
☑︎ Bug Fix
☐ Feature
☐ Plugin
Are all requirements met?
☑︎ Code completed
☐ Smoke tested
☐ Unit tests added
☐ Code documentation added
☐ Any pending items have an associated Issue
Complete description
Here is an example of the typing bug this fix addresses:
import flytekit
import flytekit.remote

@flytekit.workflow
def my_workflow1() -> int:
    return 0

@flytekit.workflow()
def my_workflow2() -> int:
    return 0

my_workflow3 = flytekit.workflow(lambda: 0)
remote = flytekit.remote.FlyteRemote(...)  # type: ignore

# before
reveal_type(my_workflow1)  # Type of "my_workflow1" is "WorkflowBase"
remote.register_workflow(my_workflow1)  # OK
reveal_type(my_workflow2)  # Type of "my_workflow2" is "Tuple[Promise] | Promise | VoidPromise | Tuple | None"
remote.register_workflow(my_workflow2)  # error: Argument of type "Tuple[Promise] | Promise | VoidPromise | Tuple | None" cannot be assigned to parameter "entity" of type "WorkflowBase" in function "register_workflow"
reveal_type(my_workflow3)  # Type of "my_workflow3" is "WorkflowBase"
remote.register_workflow(my_workflow3)  # OK

# after
reveal_type(my_workflow1)  # Type of "my_workflow1" is "PythonFunctionWorkflow"
remote.register_workflow(my_workflow1)  # OK
reveal_type(my_workflow2)  # Type of "my_workflow2" is "PythonFunctionWorkflow"
remote.register_workflow(my_workflow2)  # OK
reveal_type(my_workflow3)  # Type of "my_workflow3" is "PythonFunctionWorkflow"
remote.register_workflow(my_workflow3)  # OK
I am proposing to change the return type from WorkflowBase to PythonFunctionWorkflow. This is the class being instantiated and seems to match what is done by task, which returns PythonFunctionTask instead of the base class PythonTask which is expected by FlyteRemote.register_task.
I am also proposing we use the condition if _workflow_function is not None: instead of if _workflow_function:. This is a recommendation by the Google Python Style Guide: https://google.github.io/styleguide/pyguide.html#2144-decision
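To illustrate the typing.overload approach on its own, here is a minimal sketch with a stand-in decorator. The names wrap and Wrapped are hypothetical and this is not flytekit's actual code. It only shows how two overloads let a type checker tell apart the bare and the called forms of a decorator.

```python
from typing import Callable, Generic, TypeVar, overload

T = TypeVar("T")


class Wrapped(Generic[T]):
    """Hypothetical stand-in for the object the decorator returns."""

    def __init__(self, fn: Callable[..., T]) -> None:
        self.fn = fn

    def __call__(self, *args, **kwargs) -> T:
        return self.fn(*args, **kwargs)


@overload
def wrap(_fn: None = ...) -> Callable[[Callable[..., T]], Wrapped[T]]: ...
@overload
def wrap(_fn: Callable[..., T]) -> Wrapped[T]: ...


def wrap(_fn=None):
    def decorate(fn):
        return Wrapped(fn)

    # Compare against None explicitly instead of relying on truthiness.
    if _fn is not None:
        return decorate(_fn)  # used as @wrap
    return decorate  # used as @wrap()


@wrap
def one() -> int:
    return 1


@wrap()
def two() -> int:
    return 2


print(type(one).__name__, type(two).__name__, one(), two())
```

At runtime both forms already produce a Wrapped instance. The overloads only change what the type checker infers, which is the point of the fix described above.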
Tracking Issue
N/A
Follow-up issue
N/A
flyteorg/flytekit
✅ All checks have passed
2/2 successful checks
GitHub
05/11/2023, 11:56 PM
GitHub
05/12/2023, 12:19 AM
GitHub
05/12/2023, 12:56 PM
GitHub
05/12/2023, 2:13 PM
GitHub
05/12/2023, 3:42 PM
<https://github.com/flyteorg/flyte/tree/master|master>
by jeevb
<https://github.com/flyteorg/flyte/commit/41e2ef4537663cbefe8efb4c1da8d39c159b6c84|41e2ef45>
- Only login to GHCR when pushing a single binary image (#3674)
flyteorg/flyte
GitHub
05/12/2023, 4:02 PM
GitHub
05/12/2023, 4:22 PM
<https://github.com/flyteorg/flytekit/tree/master|master>
by wild-endeavor
<https://github.com/flyteorg/flytekit/commit/0a1f2897275a3f7fbc27bf54c3d74d139ff85763|0a1f2897>
- Delete removed data persistence classes from docs (#1633)
flyteorg/flytekit
GitHub
05/12/2023, 5:17 PM
GitHub
05/12/2023, 5:55 PM
<https://github.com/flyteorg/flyteadmin/tree/master|master>
by pingsutw
<https://github.com/flyteorg/flyteadmin/commit/2fdd3992430509acdc4ea1a9d8e58c032fb160c2|2fdd3992>
- Add environment variables to execution spec (#556)
flyteorg/flyteadmin
GitHub
05/12/2023, 6:37 PM
GitHub
05/12/2023, 6:45 PM
echo '{"client_id":"abc","client_secret":"def","url":"my-endpoint.com"}' | base64
FLYTE_CREDENTIALS_API_KEY=eyJjbGllbnRfaWQiOiJhYmMiLCJjbGllbnRfc2VjcmV0IjoiZGVmIiwidXJsIjoibXktZW5kcG9pbnQuY29tIn0K pyflyte run --remote my_wp.py wf
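The same value can be produced and inspected from Python. A minimal sketch, assuming the key is just base64-encoded JSON as the echo command suggests. The helper names are hypothetical, and how flytekit itself parses the variable is not shown here.

```python
import base64
import json


def encode_api_key(client_id: str, client_secret: str, url: str) -> str:
    """Build the base64 value the shell pipeline above would print."""
    payload = json.dumps(
        {"client_id": client_id, "client_secret": client_secret, "url": url},
        separators=(",", ":"),  # compact form, matching the echoed JSON
    )
    # echo appends a trailing newline, which ends up inside the encoded value.
    return base64.b64encode((payload + "\n").encode("utf-8")).decode("ascii")


def decode_api_key(api_key: str) -> dict:
    """Decode a key back into its client_id / client_secret / url fields."""
    return json.loads(base64.b64decode(api_key))


key = encode_api_key("abc", "def", "my-endpoint.com")
print(key)
print(decode_api_key(key))
```

Decoding a key this way is a quick check that the endpoint and client ID are what you expect before passing it to pyflyte.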
Minor fixes:
☑︎ Pass verify options (for SSL) to other authentication providers, not just PKCE
☑︎ If an APIKey is set as an env var and no config file exists, do not assume it's a sandbox cluster
Tracking Issue
https://github.com/flyteorg/flyte/issues/
Follow-up issue
NA
OR
https://github.com/flyteorg/flyte/issues/
flyteorg/flytekit
GitHub Actions: lint
✅ 29 other checks have passed
29/30 successful checks