GitHub
05/02/2023, 9:39 PM
<https://github.com/flyteorg/community/tree/main|main>
by davidmirror-ops
<https://github.com/flyteorg/community/commit/1a98a02be4070478d7c98d62eadfde98f5263574|1a98a02b>
- Update adopters-meetup.md
flyteorg/community
GitHub
05/02/2023, 10:26 PM
<https://github.com/flyteorg/flytekit/tree/master|master>
by pingsutw
<https://github.com/flyteorg/flytekit/commit/c541be9c9a45119f2f62edb4ce3ec6e5be7aeb21|c541be9c>
- Skip the image building process if the check for its existence fails (#1614)
flyteorg/flytekit
GitHub
05/02/2023, 10:37 PM
GitHub
05/03/2023, 1:29 AM
<https://github.com/flyteorg/flytekit/tree/master|master>
by wild-endeavor
<https://github.com/flyteorg/flytekit/commit/baf0b894f6776b76aa2af4aeafb407a7a8198e0e|baf0b894>
- Enable torch elastic training (torchrun) (#1603)
flyteorg/flytekit
GitHub
05/03/2023, 4:47 AM
GitHub
05/03/2023, 1:29 PM
GitHub
05/03/2023, 3:55 PM
<https://github.com/flyteorg/flyteconsole/tree/master|master>
by ursucarina
<https://github.com/flyteorg/flyteconsole/commit/5b368890caf02ca8394432b828cf88db380aa32d|5b368890>
- chore: guard against /tasks failing (#750)
flyteorg/flyteconsole
GitHub
05/03/2023, 4:58 PM
24K ./lib/python3.9/site-packages/pyspark/sql/avro
12K ./lib/python3.9/site-packages/pyspark/sql/pandas/_typing/protocols
24K ./lib/python3.9/site-packages/pyspark/sql/pandas/_typing
104K ./lib/python3.9/site-packages/pyspark/sql/pandas/__pycache__
264K ./lib/python3.9/site-packages/pyspark/sql/pandas
1.6M ./lib/python3.9/site-packages/pyspark/sql
301M ./lib/python3.9/site-packages/pyspark
=====================================
• Add a docker file in /flytekitplugins-spark
• Update GH workflow to build spark image
• Use default executor path and applications path when using default spark image
Type
☐ Bug Fix
☑︎ Feature
☐ Plugin
Are all requirements met?
☑︎ Code completed
☑︎ Smoke tested
☐ Unit tests added
☐ Code documentation added
☐ Any pending items have an associated Issue
Complete description
import datetime
import random
from operator import add

import flytekit
from flytekit import Resources, task, workflow
from flytekitplugins.spark import Spark
from flytekit.image_spec.image_spec import ImageSpec

spark_image = ImageSpec(registry="pingsutw")


@task(
    task_config=Spark(
        # this configuration is applied to the spark cluster
        spark_conf={
            "spark.driver.memory": "1000M",
            "spark.executor.memory": "1000M",
            "spark.executor.cores": "1",
            "spark.executor.instances": "2",
            "spark.driver.cores": "1",
        }
    ),
    limits=Resources(mem="2000M"),
    cache_version="1",
    container_image=spark_image,
)
def hello_spark(partitions: int) -> float:
    print("Starting Spark with Partitions: {}".format(partitions))
    n = 100000 * partitions
    sess = flytekit.current_context().spark_session
    count = (
        sess.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
    )
    pi_val = 4.0 * count / n
    print("Pi val is :{}".format(pi_val))
    return pi_val


def f(_):
    x = random.random() * 2 - 1
    y = random.random() * 2 - 1
    return 1 if x**2 + y**2 <= 1 else 0


@task(cache_version="1", container_image=spark_image)
def print_every_time(value_to_print: float, date_triggered: datetime.datetime) -> int:
    print("My printed value: {} @ {}".format(value_to_print, date_triggered))
    return 1


@workflow
def wf(triggered_date: datetime.datetime = datetime.datetime.now()) -> float:
    """
    Using the workflow is still as any other workflow. As the image is a property
    of the task, the workflow does not care about how the image is configured.
    """
    pi = hello_spark(partitions=50)
    print_every_time(value_to_print=pi, date_triggered=triggered_date)
    return pi


if __name__ == "__main__":
    print(f"Running {__file__} main...")
    print(
        f"Running my_spark(triggered_date=datetime.datetime.now()) {wf(triggered_date=datetime.datetime.now())}"
    )
Tracking Issue
NA
Follow-up issue
NA
flyteorg/flytekit
Codecov: 54.54% of diff hit (target 70.10%)
Codecov: 70.10% (-0.01%) compared to baf0b89
✅ 28 other checks have passed
28/30 successful checks
GitHub
05/03/2023, 6:37 PM
_with_override(resource=... and @task(requests=...), they are injected as the node resource override in buildNodeSpec
Expected behavior
For the case of @task(requests=...), the resource override on the node should be empty, and the resource is injected from the task template at BuildRawContainer
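The expected split between node-level overrides and task-template resources can be modeled with a small plain-Python sketch. All class and function names here (NodeSpec, TaskTemplate, build_node_spec) are hypothetical stand-ins, not flytepropeller's actual Go types:

```python
# Sketch of the expected resource-injection behavior described above.
# NodeSpec / TaskTemplate / build_node_spec are illustrative stand-ins.
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskTemplate:
    # resources declared via @task(requests=...)
    requests: Optional[dict] = None


@dataclass
class NodeSpec:
    # resources set only via an explicit .with_overrides(...) on the node
    resource_override: Optional[dict] = None


def build_node_spec(template: TaskTemplate,
                    override: Optional[dict] = None) -> NodeSpec:
    # Expected: only an explicit override populates the node-level field;
    # @task(requests=...) stays on the task template and is injected later,
    # at BuildRawContainer time.
    return NodeSpec(resource_override=override)


template = TaskTemplate(requests={"cpu": "1"})
node = build_node_spec(template)                  # @task(requests=...) only
assert node.resource_override is None             # node override stays empty
node2 = build_node_spec(template, {"cpu": "2"})   # with_overrides(...) case
assert node2.resource_override == {"cpu": "2"}
```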
Additional context to reproduce
No response
Screenshots
No response
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyte
GitHub
05/03/2023, 6:39 PM
TaskExecutionMetadata (which may be set on ExecutionContext within flytepropeller) to support setting environment variables at workflow runtime.
Type
☐ Bug Fix
☑︎ Feature
☐ Plugin
Are all requirements met?
☑︎ Code completed
☑︎ Smoke tested
☐ Unit tests added
☑︎ Code documentation added
☐ Any pending items have an associated Issue
Complete description
^^^
Tracking Issue
flyteorg/flyte#2447
Follow-up issue
NA
flyteorg/flyteplugins
✅ All checks have passed
4/4 successful checks
GitHub
05/03/2023, 6:54 PM
EnvironmentVariables field to the ExecutionConfig, which propagates through to the taskExecutionMetadata so that variables defined here may be applied to all tasks Flyte starts.
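The propagation path described above can be sketched in plain Python. These are illustrative names only (the real implementation lives in Go), and the task-over-execution merge precedence shown is an assumption, not something confirmed by the PR:

```python
# Sketch of the propagation: ExecutionConfig -> taskExecutionMetadata ->
# container environment. Class and field names are illustrative.
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ExecutionConfig:
    environment_variables: Dict[str, str] = field(default_factory=dict)


@dataclass
class TaskExecutionMetadata:
    environment_variables: Dict[str, str] = field(default_factory=dict)


def build_task_metadata(cfg: ExecutionConfig) -> TaskExecutionMetadata:
    # Variables on the execution config flow through to every task's metadata.
    return TaskExecutionMetadata(
        environment_variables=dict(cfg.environment_variables)
    )


def container_env(meta: TaskExecutionMetadata,
                  task_env: Dict[str, str]) -> Dict[str, str]:
    # One plausible merge policy (assumption): execution-level variables
    # apply to all tasks, task-level variables win on conflict.
    merged = dict(meta.environment_variables)
    merged.update(task_env)
    return merged


cfg = ExecutionConfig(environment_variables={"FOO": "bar"})
meta = build_task_metadata(cfg)
assert container_env(meta, {}) == {"FOO": "bar"}
assert container_env(meta, {"FOO": "baz"}) == {"FOO": "baz"}
```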
Type
☐ Bug Fix
☑︎ Feature
☐ Plugin
Are all requirements met?
☑︎ Code completed
☑︎ Smoke tested
☐ Unit tests added
☑︎ Code documentation added
☐ Any pending items have an associated Issue
Complete description
^^^
Tracking Issue
flyteorg/flyte#2447
Follow-up issue
NA
flyteorg/flytepropeller
GitHub Actions: Build & Push Flytepropeller Image
GitHub Actions: Goreleaser
GitHub Actions: Bump Version
✅ 11 other checks have passed
11/14 successful checks
GitHub
05/03/2023, 7:54 PM
<https://github.com/flyteorg/flyteplugins/tree/master|master>
by hamersaw
<https://github.com/flyteorg/flyteplugins/commit/1e0faabcc0980c56b6679dafc00abac6783737a5|1e0faabc>
- Adding support for environment variables set on execution (#344)
flyteorg/flyteplugins
GitHub
05/03/2023, 7:56 PM
GitHub
05/03/2023, 8:23 PM
<https://github.com/flyteorg/flytepropeller/tree/master|master>
by hamersaw
<https://github.com/flyteorg/flytepropeller/commit/5b50d88ba825060d4201955348942800bac80f50|5b50d88b>
- Added support for EnvironmentVariables on ExecutionConfig (#558)
flyteorg/flytepropeller
GitHub
05/03/2023, 9:08 PM
GitHub
05/03/2023, 10:55 PM
GitHub
05/03/2023, 11:19 PM
GitHub
05/04/2023, 4:12 AM
pyflyte run --env {"foo": "bar"} ...
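One plausible way such an --env value could be parsed into a dict is sketched below. This is a plain-Python illustration; flytekit's actual CLI parsing may differ, and parse_env_option is a hypothetical helper:

```python
import json


def parse_env_option(value: str) -> dict:
    # Accept either a JSON object ('{"foo": "bar"}') or a KEY=VALUE pair.
    # Illustrative parser only, not flytekit's actual implementation.
    value = value.strip()
    if value.startswith("{"):
        parsed = json.loads(value)
        if not isinstance(parsed, dict):
            raise ValueError("--env JSON must be an object")
        return {str(k): str(v) for k, v in parsed.items()}
    key, sep, val = value.partition("=")
    if not sep:
        raise ValueError(f"invalid --env value: {value!r}")
    return {key: val}


assert parse_env_option('{"foo": "bar"}') == {"foo": "bar"}
assert parse_env_option("FOO=bar") == {"FOO": "bar"}
```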
Type
☐ Bug Fix
☑︎ Feature
☐ Plugin
Are all requirements met?
☐ Code completed
☐ Smoke tested
☐ Unit tests added
☐ Code documentation added
☐ Any pending items have an associated Issue
Complete description
^^^
Tracking Issue
flyteorg/flyte#2447
Follow-up issue
NA
flyteorg/flytekit
Codecov: 71.18% (-0.04%) compared to e44b802
Codecov: 21.73% of diff hit (target 71.22%)
✅ 28 other checks have passed
28/30 successful checks
GitHub
05/04/2023, 7:35 AM
DISABLE_AUTH=1.
Expected behavior
login can be disabled, either by setting an env var, or via some other mechanism.
Additional context to reproduce
1. Start flyteconsole locally connecting to flyteadmin
2. Try opening a project on flyteconsole
3. A login redirect is observed
Screenshots
No response
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyte
GitHub
05/04/2023, 8:20 AM
trimmedErrMessageLen so that the error message in the workflow execution list is sufficiently informative.
Type
☑︎ Bug Fix
☐ Feature
☐ Plugin
Are all requirements met?
☑︎ Code completed
Complete description
The current value of trimmedErrMessageLen limits the error message shown in the workflow execution list page so much that it becomes uninformative. Users have to go to the corresponding execution details to get the full error message, even for a short one.
This PR increases the limit on error message length to 10240 so that most error messages can be shown properly.
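The trimming behavior amounts to something like the following Python sketch of the Go logic. The 10240 limit comes from the PR description; the helper name and exact truncation details are assumptions:

```python
TRIMMED_ERR_MESSAGE_LEN = 10240  # limit raised by this PR


def trim_error_message(message: str,
                       limit: int = TRIMMED_ERR_MESSAGE_LEN) -> str:
    # Error messages longer than the limit are cut off before being shown
    # in the workflow execution list; shorter ones pass through unchanged.
    # Hypothetical helper mirroring the described behavior.
    if len(message) <= limit:
        return message
    return message[:limit]


short = "task failed: OOMKilled"
assert trim_error_message(short) == short
assert len(trim_error_message("x" * 20000)) == TRIMMED_ERR_MESSAGE_LEN
```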
Screen.Recording.2023-05-04.at.4.15.39.PM.mov
Follow-up issue
NA
flyteorg/flyteadmin
GitHub Actions: End2End Test / End to End tests
GitHub Actions: Integration Test / Integration tests
✅ 8 other checks have passed
8/10 successful checks
GitHub
05/04/2023, 11:33 AM
<https://github.com/flyteorg/flyte/tree/master|master>
by hamersaw
<https://github.com/flyteorg/flyte/commit/04bf39c13e18a3a04ab919d45ac56ac68933a151|04bf39c1>
- [RFC] ArrayNode (#3446)
flyteorg/flyte
GitHub
05/04/2023, 11:33 AM
GitHub
05/04/2023, 3:15 PM
GitHub
05/04/2023, 6:14 PM
Traceback (most recent call last):
File "/home/runner/.local/lib/python3.10/site-packages/grpc/_interceptor.py", line 274, in continuation
response, call = self._thunk(new_method).with_call(
File "/home/runner/.local/lib/python3.10/site-packages/grpc/_interceptor.py", line 301, in with_call
return self._with_call(request,
File "/home/runner/.local/lib/python3.10/site-packages/grpc/_interceptor.py", line 290, in _with_call
return call.result(), call
File "/home/runner/.local/lib/python3.10/site-packages/grpc/_channel.py", line 379, in result
raise self
File "/home/runner/.local/lib/python3.10/site-packages/grpc/_interceptor.py", line 274, in continuation
response, call = self._thunk(new_method).with_call(
File "/home/runner/.local/lib/python3.10/site-packages/grpc/_channel.py", line 1043, in with_call
return _end_unary_response_blocking(state, call, True, None)
File "/home/runner/.local/lib/python3.10/site-packages/grpc/_channel.py", line 910, in _end_unary_response_blocking
raise _InactiveRpcError(state) # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.NOT_FOUND
details = "missing entity of type TASK with identifier project:"flytetester" domain:"development" name:"workflows.bayesian_optimization_example.concurrent_trials" version:"cd16b9c4-27e6-44c1-9e8a-5542f5fa0ca4" "
debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"missing entity of type TASK with identifier project:\"flytetester\" domain:\"development\" name:\"workflows.bayesian_optimization_example.concurrent_trials\" version:\"cd16b9c4-27e6-44c1-9e8a-5542f5fa0ca4\" ", grpc_status:5, created_time:"2023-05-03T22:35:43.500233657+00:00"}"
>
The code that generated this error is in the following gist; registration does seem to work correctly, but execution fails when the workflow contains a dynamic task.
When viewing the executed workflow in the UI, the tasks actually do complete. However, when interacting with the admin API endpoint:
https://hosted_environment.example/v1/tasks/project_name/domain_name/workflows.bayesian_optimization_example.concurrent_trials/cd16b9c4-27e6-44c1-9e8a-5542f5fa0ca4
the following error message can be seen:
{
"error": "missing entity of type TASK with identifier project:\"flytetester\" domain:\"development\" name:\"workflows.bayesian_optimization_example.suggest_points\" version:\"cd16b9c4-27e6-44c1-9e8a-5542f5fa0ca4\" ",
"code": 5,
"message": "missing entity of type TASK with identifier project:\"flytetester\" domain:\"development\" name:\"workflows.bayesian_optimization_example.suggest_points\" version:\"cd16b9c4-27e6-44c1-9e8a-5542f5fa0ca4\" "
}
This remote workflow was registered to the flyte cluster using flytekit remote.
Expected behavior
All tasks are discoverable by the flyte cluster, and remote.wait() waits successfully until the execution completes, as would be the case with workflows that contain no dynamic tasks.
Additional context to reproduce
If you need a flytekit remote example, there's one in https://github.com/flyteorg/flytekit-python-template.
You will need a hosted flyte cluster, and an App (Operator) that has access to said cluster.
git clone <https://github.com/flyteorg/flytekit-python-template> && cd flytekit-python-template
git checkout zeryx/bug/dynamic_remote_execute_failure
pip install -r requirements.txt
python integration.py
python integration.py --host YOURCLUSTER.DOMAIN --client_id YOUR_APP_OPERATOR_ID --client_secret YOUR_APP_OPERATOR_SECRET --image-hostname <http://ghcr.io/zeryx/flytekit-python-template|ghcr.io/zeryx/flytekit-python-template> --image_suffix pr-1
This should attempt to register the bayesian-optimization workflow to your cluster and execute it.
If successful, you will receive the output response from the model; otherwise, an error.
Screenshots
No response
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyte
GitHub
05/04/2023, 7:48 PM
integration.py script and runIntegration.yaml CI workflow.
When a PR is made against main, the runIntegration workflow will trigger.
This workflow will automatically generate new images if the code has changed in any particular template.
It will then execute integration.py to automatically deploy and execute (with default inputs) all template workflows on a particular cluster.
If the registration and executions succeed, the workflow returns true: a new tool for end-to-end integration of flyte changes.
built originally on the following branch:
zeryx#1
And the action successfully triggered, as can be seen here: https://github.com/zeryx/flytekit-python-template/actions/runs/4886507399
The following ghcr.io image repository was leveraged for this personal fork testing:
https://github.com/users/zeryx/packages/container/package/flytekit-python-template
Considerations
• Dynamic Workflows cannot be executed with default parameters; since v1 of this tool only accepts default parameters, Dynamic Workflows are not supported.
• Dynamic Tasks have a bug associated with them (flyteorg/flyte#3639) that prevents remote executions of normal workflows containing dynamic tasks. Until that issue is resolved, Dynamic Tasks are not supported.
flyteorg/flytekit-python-template
✅ All checks have passed
4/4 successful checks
GitHub
05/04/2023, 8:02 PM
<https://github.com/flyteorg/flyteidl/tree/master|master>
by hamersaw
<https://github.com/flyteorg/flyteidl/commit/5a3a44f13415bf5ed8ac57cc5814b7277a85825f|5a3a44f1>
- Refactor kf-operator plugins configs and support setting different specs for different replica groups (#386)
flyteorg/flyteidl
GitHub
05/04/2023, 8:02 PM
PS:
  replicas: 1
  template:
    spec:
      containers:
        resources:
          limits:
            cpu: "1"
Worker:
  replicas: 2
  template:
    spec:
      containers:
        resources:
          limits:
            nvidia.com/gpu: 1
            memory: "10G"
However, this is not currently supported in Flyte.
Goal: What should the final outcome look like, ideally?
Users should be able to override the resources specified in the task definition by providing extra resource configs in the TfJob task config.
@task(
    task_config=TfJob(
        worker={num=2, requests=Resources(cpu="1", mem="2Gi"), limits=Resources(cpu="1", mem="4Gi")},
        ps={num=1, requests=Resources(cpu="1", mem="2Gi"), limits=Resources(cpu="1", mem="4Gi")},
    ),
    cache=True,
    cache_version="1.0",
)
def mnist_tensorflow_job(hyperparameters: Hyperparameters) -> training_outputs:
Describe alternatives you've considered
We could make the resources field in the task function accept a new type, TFJobResources, and implement different handling in the related backend plugins. However, this requires lots of code changes and undermines the consistency of task definitions.
Propose: Link/Inline OR Additional context
No response
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyte
GitHub
05/04/2023, 8:02 PM
flyteidl/plugins/ to flyteidl/plugins/kubeflow/
2. Create flyteidl/plugins/kubeflow/common.proto that contains the definitions of RestartPolicy, RunPolicy and CleanPodPolicy
3. Add xxxReplicaSpec in each job type to allow setting replicas, image, resources and restart_policy
Tracking Issue
fixes flyteorg/flyte#3308
Follow-up issue
flyteorg/flyteidl
✅ All checks have passed
13/13 successful checks
GitHub
05/04/2023, 8:09 PM
GitHub
05/04/2023, 8:39 PM
05/04/2023, 8:39 PM