GitHub
04/05/2023, 7:07 PM
rich
Tracking Issue
flyteorg/flyte#3506
Follow-up issue
NA
flyteorg/flytekit
✅ All checks have passed
2/2 successful checks
GitHub
04/05/2023, 7:10 PM
GitHub
04/05/2023, 7:53 PM
<https://github.com/flyteorg/community/tree/main|main>
by kumare3
<https://github.com/flyteorg/community/commit/f8c06448561a27b136c6a02faaa619d4d5fd35f1|f8c06448>
- Update ADOPTERS.md (#4)
flyteorg/community
GitHub
04/05/2023, 8:14 PM
from dataclasses import dataclass

import torch
from dataclasses_json import dataclass_json
from flytekit import dynamic, task, workflow
from flytekitplugins.kfpytorch import PyTorch

from .torch_elastic_task import Elastic


@dataclass_json
@dataclass
class Config:
    lr: float = 1e-5
    bs: int = 64
    name: str = "foo"


@task
def init_model() -> torch.nn.Module:
    model = torch.nn.Linear(11, 22)
    return model


"""
This doesn't start a Kubeflow PyTorch job yet, but a single Python task Pod which then
runs a local worker group in sub-processes.
The changes in the flyteidl protobuf definitions, the flytekit Python API, and
flytepropeller (the operator) that we need to actually make this distributed across
multiple nodes are easy (see the RFC document linked in the PR description).
"""


@task(
    task_config=Elastic(
        min_replicas=1,
        max_replicas=1,
        start_method="spawn",
    )
)
def train(config: Config, model: torch.nn.Module) -> tuple[str, Config, torch.nn.Module]:
    # local imports so the function is self-contained when run in spawned sub-processes
    import os

    import torch

    local_rank = os.environ["LOCAL_RANK"]
    out_model = torch.nn.Linear(1000, int(local_rank) * 2000 + 1)
    print(f"Training with config {config}")
    config.name = "modified"
    return f"result from local rank {local_rank}", config, out_model


@workflow
def wf(config: Config = Config()) -> tuple[str, Config, torch.nn.Module]:
    model = init_model()
    return train(config=config, model=model)


if __name__ == "__main__":
    print(wf(config=Config()))
Type
☐ Bug Fix
☐ Feature
☐ Plugin
Are all requirements met?
☐ Code completed
☐ Smoke tested
☐ Unit tests added
☐ Code documentation added
☐ Any pending items have an associated Issue
Complete description
How did you fix the bug, make the feature etc. Link to any design docs etc
Tracking Issue
https://github.com/flyteorg/flyte/issues/
Follow-up issue
NA
OR
https://github.com/flyteorg/flyte/issues/
flyteorg/flytekit
✅ All checks have passed
30/30 successful checks
GitHub
04/05/2023, 11:51 PM
image
GitHub
04/06/2023, 10:44 AM
AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
Essentially I want to do what k8s already nicely provides:
https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/#define-a-container-environment-variable-with-data-from-a-single-secret
Using Flyte secrets there is no way to control the env var name; for example,
Secret(group="minio-write-creds", key="AWS_ACCESS_KEY_ID", mount_requirement=Secret.MountType.ENV_VAR)
will inject the secret as _FSEC_MINIO_WRITE_CREDS_AWS_ACCESS_KEY_ID, and this prefix is currently hardcoded in flytepropeller.
This is sensible if you need to read the secret in Python code via the flytekit context/secrets manager, but not for raw containers.
Even for a Python task it would often be easier to just control the env var name directly (e.g. for S3 credentials that boto3 would then pick up automatically).
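For reference, this is roughly the pattern flytekit supports today (a minimal sketch; the group/key names are just the ones from this example):
from flytekit import Secret, current_context, task

@task(
    secret_requests=[
        Secret(
            group="minio-write-creds",
            key="AWS_ACCESS_KEY_ID",
            mount_requirement=Secret.MountType.ENV_VAR,
        )
    ]
)
def use_creds() -> str:
    # flytekit resolves the prefixed env var (_FSEC_...) behind the scenes;
    # the code never sees a plain AWS_ACCESS_KEY_ID variable.
    return current_context().secrets.get("minio-write-creds", "AWS_ACCESS_KEY_ID")
This works for Python tasks, but a raw container (or a library such as boto3 that only looks for the conventional env var names) has no equivalent hook.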
Goal: What should the final outcome look like, ideally?
Maybe something like
Secret(group="minio-write-creds", key="access_key", mount_requirement=Secret.MountType.ENV_VAR, name="AWS_ACCESS_KEY_ID")
to decouple the group/key name from the env var name, just as k8s itself does.
If name is not given, the current naming scheme (prefix + group + key) could remain the default.
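A rough sketch of how the proposed parameter might be used (hypothetical; the name argument does not exist in flytekit today):
from flytekit import Secret, task

@task(
    secret_requests=[
        # hypothetical: 'name' controls the injected env var, decoupled from group/key
        Secret(
            group="minio-write-creds",
            key="access_key",
            mount_requirement=Secret.MountType.ENV_VAR,
            name="AWS_ACCESS_KEY_ID",
        )
    ]
)
def upload() -> None:
    import os

    # with the proposed 'name', the plain env var exists, so boto3 (or any other
    # tool expecting the conventional variable name) would pick it up directly
    assert "AWS_ACCESS_KEY_ID" in os.environ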
Describe alternatives you've considered
Working around this by wrapping my container task in a shell script that pulls the secret out of the env var (or file) and populates the correct env var for the actual process.
But this is very cumbersome and actually requires the wrapper script to know the group name (i.e. the k8s secret name), which makes it awkward to switch credentials by simply pointing to a different secret.
The same goes for secrets injected as files...
Propose: Link/Inline OR Additional context
No response
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyte
GitHub
04/06/2023, 12:07 PM
from flytekit import ContainerTask, kwtypes, workflow, Resources

hello_task = ContainerTask(
    name="hello",
    image="ubuntu:20.04",
    requests=Resources(cpu="2", mem="1Gi"),
    limits=Resources(mem="2Gi"),
    command=["echo", "hello"],
)

Running this task results in a pod with the CPU limit also set to 2 (the same as the request), but there should be no CPU limit.
Expected behavior
If no CPU limit is specified, it should not be set implicitly, and no CPU limit should be applied in k8s.
So I propose copying the requests to the limits as a whole only if limits are completely unset, instead of filling each missing limit with the corresponding request value.
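A minimal sketch of the proposed rule (in Python purely for illustration; the actual merging happens in Flyte's Go plugins):
def merge_limits(requests: dict, limits: dict) -> dict:
    # Proposed: fall back to requests only when *no* limits are given at all,
    # instead of filling each missing limit from its request.
    return dict(requests) if not limits else dict(limits)

# request cpu=2/mem=1Gi with only a mem limit set:
print(merge_limits({"cpu": "2", "mem": "1Gi"}, {"mem": "2Gi"}))
# proposed -> {'mem': '2Gi'}: no implicit CPU limit
# today    -> {'cpu': '2', 'mem': '2Gi'}: CPU limit copied from the request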
Additional context to reproduce
No response
Screenshots
No response
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyte
GitHub
04/06/2023, 1:55 PM
GitHub
04/06/2023, 4:57 PM
<https://github.com/flyteorg/flyte/tree/master|master>
by wild-endeavor
<https://github.com/flyteorg/flyte/commit/bc521dac6f0dc6e1942efa6447611cd3354540c4|bc521dac>
- Update Flyte components (#3572)
flyteorg/flyte
GitHub
04/06/2023, 5:29 PM
noop) but please be aware that it will add a minute or so to the init container/command that runs the migrations in the default Helm charts. Notably, because these should be a no-op, they also do not come with any rollback commands.
If you experience any issues, please let us know.
Flytekit
Python 3.11 is now officially supported.
Revamped Data subsystem
The data persistence layer was completely revamped. We now rely exclusively on fsspec to handle IO.
Most users will benefit from a more performant IO subsystem; in other words, no change is needed in user code.
This change opened the door for flytekit to rely on fsspec's streaming capabilities. For example, if we want to stream a file, we can now do:
from flytekit import task
from flytekit.types.file import FlyteFile

@task
def copy_file(ff: FlyteFile) -> FlyteFile:
    new_file = FlyteFile.new_remote_file(ff.remote_path)
    with ff.open("r", cache_type="simplecache", cache_options={}) as r:
        with new_file.open("w") as w:
            w.write(r.read())
    return new_file
This feature is marked as experimental. We'd love feedback on the API!
Limited support for partial tasks
We can use functools.partial to "freeze" some task arguments. Let's look at an example where we partially fix a parameter of a task:
import functools

from flytekit import task, workflow

@task
def t1(a: int, b: str) -> str:
    return f"{a} -> {b}"

t1_fixed_b = functools.partial(t1, b="hello")

@workflow
def wf(a: int) -> str:
    return t1_fixed_b(a=a)
Notice how calls to t1_fixed_b do not need to specify the b parameter.
This also works for MapTasks in a limited capacity. For example:
from typing import List

from flytekit import task, workflow, partial, map_task

@task
def t1(x: int, y: float) -> float:
    return x + y

@workflow
def wf(y: List[float]) -> List[float]:
    partial_t1 = partial(t1, x=5)
    return map_task(partial_t1)(y=y)
We are currently seeking feedback on this feature, and as a result, it is labeled as experimental for now.
It's also worth mentioning that fixing parameters of type list is not currently supported. For example, if we try to register this workflow:
from functools import partial
from typing import List
from flytekit import task, workflow, map_task
@task
def t(a: int, xs: List[int]) -> str:
    return f"{a} {xs}"

@workflow
def wf():
    partial_t = partial(t, xs=[1, 2, 3])
    map_task(partial_t)(a=[1, 2])
We're going to see this error:
❯ pyflyte run workflows/example.py wf
Failed with Unknown Exception <class 'ValueError'> Reason: Map tasks do not support partial tasks with lists as inputs.
Map tasks do not support partial tasks with lists as inputs.
Flyteconsole
Multiple bug fixes around waiting for external inputs.
Better support for dataclasses in the launch form.
flyteorg/flyte
GitHub
04/06/2023, 5:30 PM
GitHub
04/06/2023, 5:56 PM
GitHub
04/06/2023, 6:16 PM
• FlyteFile compatible with Annotated[..., HashMethod] by @AdrianoKF in #1544
• move FlyteSchema deprecation warning to initialization method by @cosmicBboy in #1558
• add pod_template and pod_template_name arguments for ContainerTask by @flixr in #1515
• Pass locally defined scopes to RemoteClientConfigStore by @franco-bocci in #1553
• TypeTransformer for TensorFlow model by @samhita-alla in #1562
• Remove flytekit-fsspec plugin from default dockerfile by @wild-endeavor in #1561
• Device auth flow / Headless auth by @kumare3 in #1552
• support python 3.11 by @cosmicBboy in #1557
• url encode secret in client credentials flow by @wild-endeavor in #1566
• Python run multiple files by @pingsutw in #1559
• General Partial support in flytekit and multi-list support in flytekit by @kumare3 in #1556
• fix: Silence keyring warnings by changing to debug by @ggydush in #1568
• Support GCP secrets by @wild-endeavor in #1571
• Automatically remove unused import by @pingsutw in #1574
• Disallow partial lists in map tasks by @eapolinario in #1577
• Remove duplicate reporting logic by @wild-endeavor in #1578
• [Core feature] Convert List[Any] to a single pickle file by @Yicheng-Lu-llll in #1535
• Improve authoring structure documentation by @samhita-alla in #1572
New Contributors
• @bryanwweber made their first contribution in #1538
• @franco-bocci made their first contribution in #1553
• @ggydush made their first contribution in #1568
• @Yicheng-Lu-llll made their first contribution in #1535
Full Changelog: v1.4.2...v1.5.0
flyteorg/flytekit
GitHub
04/06/2023, 9:02 PM
automountServiceAccountToken: false set. Smoke tested in sandbox with and without automountServiceAccountToken: false, with both the default shared process namespace watcher and the kube-api watcher respectively.
Type
☑︎ Bug Fix
☐ Feature
☐ Plugin
Are all requirements met?
☑︎ Code completed
☑︎ Smoke tested
☐ Unit tests added
☐ Code documentation added
☐ Any pending items have an associated Issue
Complete description
Moved initialization of the k8s client from the copilot root command to the NewKubeAPIWatcher function. Now it is only initialized when the kube-api watcher is specified.
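Conceptually, the change looks something like this (a Python sketch for illustration only; flytecopilot is written in Go and all names here are made up):
class KubeAPIWatcher:
    def __init__(self, client):
        self.client = client

class ProcessNamespaceWatcher:
    pass

def build_kube_client():
    # stand-in for constructing a real kubernetes API client
    return object()

def new_watcher(watcher_type: str):
    if watcher_type == "kube-api":
        # the client is only built on this path now, instead of in the root command
        return KubeAPIWatcher(build_kube_client())
    # the default shared-process-namespace watcher needs no API client,
    # so pods with automountServiceAccountToken: false keep working
    return ProcessNamespaceWatcher()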
flyteorg/flytecopilot
✅ All checks have passed
2/2 successful checks
GitHub
04/06/2023, 9:37 PM
<https://github.com/flyteorg/flytecopilot/tree/master|master>
by jeevb
<https://github.com/flyteorg/flytecopilot/commit/12f9ff5370a929f2f2b986adbf203f90766dd50c|12f9ff53>
- Lazily initialize kubernetes client only when using kube api watcher (#56)
flyteorg/flytecopilot
GitHub
04/06/2023, 10:34 PM
GitHub
04/06/2023, 10:50 PM
GitHub
04/07/2023, 12:03 AM
GitHub
04/07/2023, 12:37 AM
GitHub
04/07/2023, 12:39 AM
ctx = contextutils.WithRequestID(ctx, "request-123")
logger.Infof(ctx, "this happened")
"this happened" will be tagged with the request id.
Type
☐ Bug Fix
☑︎ Feature
☐ Plugin
Are all requirements met?
☑︎ Code completed
☑︎ Smoke tested
☑︎ Unit tests added
☑︎ Code documentation added
☑︎ Any pending items have an associated Issue
Tracking Issue
fixes flyteorg/flyte#3577
flyteorg/flytestdlib
Codecov: 67.96% (-0.21%) compared to 0dbe3c2
✅ 6 other checks have passed
6/7 successful checks
GitHub
04/07/2023, 12:42 AM
GitHub
04/07/2023, 2:09 AM
<https://github.com/flyteorg/flytekit/tree/master|master>
by eapolinario
<https://github.com/flyteorg/flytekit/commit/e3cee8326c99ed34751752c922a5852ae38e43ac|e3cee832>
- Unify sqlalchemy Dockerfiles (#1585)
flyteorg/flytekit
GitHub
04/07/2023, 2:11 AM
TensorFlow 2.11.1
Release 2.11.1
Note: TensorFlow 2.10 was the last TensorFlow release that supported GPU on native-Windows. Starting with TensorFlow 2.11, you will need to install TensorFlow in WSL2, or install tensorflow-cpu and, optionally, try the TensorFlow-DirectML-Plugin.
• Security vulnerability fixes will no longer be patched to this TensorFlow version. The latest TensorFlow version includes the security vulnerability fixes. You can update to the latest version (recommended) or follow the documented steps to patch security vulnerabilities yourself. You can refer to the release notes of the latest TensorFlow version for a list of newly fixed vulnerabilities. If you have any questions, please create a GitHub issue to let us know.
This release also introduces several vulnerability fixes:
• Fixes an FPE in TFLite in conv kernel CVE-2023-27579
• Fixes a double free in Fractional(Max/Avg)Pool CVE-2023-25801
• Fixes a null dereference on ParallelConcat with XLA CVE-2023-25676
• Fixes a segfault in Bincount with XLA CVE-2023-25675
• Fixes an NPE in RandomShuffle with XLA enable CVE-2023-25674
• Fixes an FPE in TensorListSplit with XLA CVE-2023-25673
• Fixes segmentation fault in tfg-translate CVE-2023-25671
• Fixes an NPE in QuantizedMatMulWithBiasAndDequantize CVE-2023-25670
• Fixes an FPE in AvgPoolGrad with XLA CVE-2023-25669
• Fixes a heap out-of-buffer read vulnerability in the QuantizeAndDequantize operation CVE-2023-25668
• Fixes a segfault when opening multiframe gif CVE-2023-25667
• Fixes an NPE in SparseSparseMaximum CVE-2023-25665
• Fixes an FPE in AudioSpectrogram CVE-2023-25666
• Fixes a heap-buffer-overflow in AvgPoolGrad CVE-2023-25664
• Fixes a NPE in TensorArrayConcatV2 CVE-2023-25663
• Fixes a Integer overflow in EditDistance CVE-2023-25662
• Fixes a seg fault in tf.raw_ops.Print CVE-2023-25660
• Fixes a OOB read in DynamicStitch CVE-2023-25659
• Fixes a OOB Read in GRUBlockCellGrad CVE-2023-25658
TensorFlow 2.11.0
Release 2.11.0
Breaking Changes
• The tf.keras.optimizers.Optimizer base class now points to the new Keras optimizer, while the old optimizers have been moved to the tf.keras.optimizers.legacy namespace.
If you find your workflow failing due to this change, you may be facing one of the following issues:
• Checkpoint loading failure. The new optimizer handles optimizer state differently from the old optimizer, which simplifies the logic of checkpoint saving/loading, but at the cost of breaking checkpoint backward compatibility in some cases. If you want to keep using an old checkpoint, please change your optimizer to tf.keras.optimizer.legacy.XXX (e.g. tf.keras.optimizer.legacy.Adam); see the sketch after this list.
• TF1 compatibility. The new optimizer, tf.keras.optimizers.Optimizer, does not support TF1 any more, so please use the legacy optimizer tf.keras.optimizer.legacy.XXX. We highly recommend migrating your workflow to TF2 for stable support and new features.
• Old optimizer API not found. The new optimizer, tf.keras.optimizers.Optimizer, has a different set of public APIs from the old optimizer. These API changes are mostly related to getting rid of slot variables and TF1 support. Please check the API documentation to find alternatives to the missing API. If you must call the deprecated API, please change your optimizer to the legacy optimizer.
• Learning rate schedule access. When using a tf.keras.optimizers.schedules.LearningRateSchedule, the new optimizer's learning_rate property returns the current learning rate value instead of a LearningRateSchedule object as before. If you need to access the LearningRateSchedule object, please use optimizer._learning_rate.
• If you implemented a custom optimizer based on the old optimizer. Please set your optimizer to subclass tf.keras.optimizer.legacy.XXX. If you want to migrate to the new optimizer and find it does not support your optimizer, please file an issue in the Keras GitHub repo.
• Errors, such as Cannot recognize variable.... The new optimizer requires all optimizer variables to be created at the first apply_gradients() or minimize() call. If your workflow calls the optimizer to update different parts of the model in multiple stages, please call optimizer.build(model.trainable_variables) before the training loop.
• Timeout or performance loss. We don't anticipate this to happen, but if you see such issues, please use the legacy optimizer, and file an issue in the Keras GitHub repo.
The old Keras optimizer will never be deleted, but will not see any new feature additions. New optimizers (for example, tf.keras.optimizers.Adafactor) will only be implemented based on the new tf.keras.optimizers.Optimizer base class.
• tensorflow/python/keras code is a legacy copy of Keras since the TensorFlow v2.7 release, and will be deleted in the v2.12 release. Please remove any import of tensorflow.python.keras and use the public API with from tensorflow import keras or import tensorflow as tf; tf.keras.
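A minimal migration sketch for the optimizer change above (assumes TensorFlow 2.11 is installed; the model here is just a placeholder):
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

# new default optimizer base class in TF 2.11
new_opt = tf.keras.optimizers.Adam(learning_rate=1e-3)

# drop-in legacy optimizer for old checkpoints / TF1-style workflows
legacy_opt = tf.keras.optimizers.legacy.Adam(learning_rate=1e-3)

model.compile(optimizer=legacy_opt, loss="mse")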
Major Features and Improvements... (truncated)
Changelog
Sourced from tensorflow's changelog (repeats the release notes above; truncated).
flyteorg/flytekit
GitHub Actions: docs
GitHub Actions: Docs Warnings
✅ 2 other checks have passed
2/4 successful checks
GitHub
04/07/2023, 4:25 AM
GitHub
04/07/2023, 10:26 AM
<https://github.com/flyteorg/flytesnacks/tree/master|master>
by samhita-alla
<https://github.com/flyteorg/flytesnacks/commit/3bb8e2e1ce26a754568a2a8ab9d80fb972456b7b|3bb8e2e1>
- Add reference launch plan example (#977)
flyteorg/flytesnacks
GitHub
04/07/2023, 11:08 AM
GitHub
04/07/2023, 2:54 PM
<https://github.com/flyteorg/flyteplugins/tree/master|master>
by hamersaw
<https://github.com/flyteorg/flyteplugins/commit/a3f73343d883c8ae39b22757f90f4f3bb290d50a|a3f73343>
- Inject container resource during BuildRawContainer (#335)
flyteorg/flyteplugins
GitHub
04/07/2023, 2:55 PM
GitHub
04/07/2023, 5:22 PM
<https://github.com/flyteorg/flytepropeller/tree/master|master>
by hamersaw
<https://github.com/flyteorg/flytepropeller/commit/986f014ae69505353833c20a38d4a58094b800cc|986f014a>
- moved controller runtime start out of webhook Run function (#546)
flyteorg/flytepropeller
GitHub
04/07/2023, 6:03 PM