GitHub
05/16/2023, 3:11 PM24h
default deadline on the node when constructing a default workflow. We recently set the default workflow and node deadlines to 0s
because users unexpectedly saw executions terminated based on deadlines. We should address this similarly.
Type
☑︎ Bug Fix
☐ Feature
☐ Plugin
Are all requirements met?
☑︎ Code completed
☑︎ Smoke tested
☐ Unit tests added
☑︎ Code documentation added
☐ Any pending items have an associated Issue
Complete description
^^^
Tracking Issue
flyteorg/flyte#3642
Follow-up issue
NA
flyteorg/flyteadmin
✅ All checks have passed
2/2 successful checksGitHub
05/16/2023, 3:47 PMremoteClusterConfig
connecting to a remote Ray cluster B.
During Propeller startup, we hit a problem:
ray: [PluginInitializationFailed] Error getting informer for %!s(<nil>), caused by: no matches for kind "RayJob" in version "ray.io/v1alpha1"
Installing the Ray CRD in cluster A solved this problem. However, we would expect installing it in cluster B alone to be enough.
After fixing that, we hit another issue during startup: the Propeller service account needs permission to list rayjobs in cluster A, although this permission should only be needed in cluster B.
W0516 13:51:19.779009 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.24.1/tools/cache/reflector.go:167: failed to list *v1alpha1.RayJob: rayjobs.ray.io is forbidden: User "<service-account>" cannot list resource "rayjobs" in API group "ray.io" at the cluster scope
E0516 13:51:19.779088 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.24.1/tools/cache/reflector.go:167: Failed to watch *v1alpha1.RayJob: failed to list *v1alpha1.RayJob: rayjobs.ray.io is forbidden: User "<service-account>" cannot list resource "rayjobs" in API group "ray.io" at the cluster scope
The problem seems to be that the plugin manager doesn't use the correct k8s client (e.g., the k8s client in the plugin).
Expected behavior
The plugin manager should use the custom kubernetes client from the plugin if it exists.
However, in many places it doesn't, e.g.:
1. https://github.com/flyteorg/flytepropeller/blob/9a4ea000af6bb7b959daa00f26abea7c2e3262e7/pkg/controller/nodes/task/k8s/plugin_manager.go#L654
2. https://github.com/flyteorg/flytepropeller/blob/9a4ea000af6bb7b959daa00f26abea7c2e3262e7/pkg/controller/nodes/task/k8s/plugin_manager.go#L546
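The expected behavior can be sketched as follows. This is a minimal illustration in Python rather than Propeller's Go, and every name in it (KubeClient, Plugin, select_client) is hypothetical; it only shows the "prefer the plugin's client, otherwise fall back" rule described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KubeClient:
    """Stand-in for a Kubernetes client bound to one cluster (hypothetical)."""
    cluster: str


@dataclass
class Plugin:
    """Stand-in for a k8s plugin that may carry its own client (hypothetical)."""
    name: str
    custom_client: Optional[KubeClient] = None


def select_client(plugin: Plugin, default_client: KubeClient) -> KubeClient:
    """Prefer the plugin's own client; fall back to the default one."""
    if plugin.custom_client is not None:
        return plugin.custom_client
    return default_client


default = KubeClient(cluster="A")
ray = Plugin(name="ray", custom_client=KubeClient(cluster="B"))
pod = Plugin(name="pod")

print(select_client(ray, default).cluster)  # B
print(select_client(pod, default).cluster)  # A
```

Under this rule, informers and list/watch calls for the Ray plugin would only ever touch cluster B, so neither the CRD nor the RBAC permission would be required in cluster A.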
Additional context to reproduce
No response
Screenshots
No response
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyteGitHub
05/16/2023, 4:48 PM<https://github.com/flyteorg/flyteadmin/tree/master|master>
by wild-endeavor
<https://github.com/flyteorg/flyteadmin/commit/a853dac16e057b5763ea7b5bd3eff85fa08053c6|a853dac1>
- Add oauth http proxy for external server & Extract email from azure claim (#553)
flyteorg/flyteadminGitHub
05/16/2023, 5:27 PMRUNNING
GitHub
05/16/2023, 5:51 PMGitHub
05/16/2023, 6:15 PMGitHub
05/16/2023, 6:43 PMimage▾
GitHub
05/16/2023, 6:57 PM<https://github.com/flyteorg/flytekit-python-template/tree/main|main>
by zeryx
<https://github.com/flyteorg/flytekit-python-template/commit/8009256ddf727be229f2c32b7d0e0760b33dc467|8009256d>
- fixed the mnist examples so that they execute correctly with integration.py
flyteorg/flytekit-python-templateGitHub
05/16/2023, 6:58 PM<https://github.com/flyteorg/flytekit-python-template/tree/main|main>
by zeryx
<https://github.com/flyteorg/flytekit-python-template/commit/80ca9258bb9dfc7f1b9715693b76c58c3eff16aa|80ca9258>
- renamed the mnist_training_example back to its original form
flyteorg/flytekit-python-templateGitHub
05/16/2023, 7:05 PMFLYTECTL_CONFIG
env var pointing to a config file made for the sandbox/demo; however, the teardown command does not tell users to unset the env var. As a result, users very often end up with FLYTECTL_CONFIG
pointing to an invalid config file, which prevents other flytectl commands from running correctly against a production setup.
Provide a possible output or UX example
The teardown command could print something like: please run unset FLYTECTL_CONFIG
to clean up the environment variable.
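A minimal sketch of that hint, written in Python for illustration only (flytectl itself is a Go CLI, and teardown_hint is a hypothetical helper, not an existing flytectl function):

```python
import os
from typing import Mapping, Optional


def teardown_hint(env: Mapping[str, str]) -> Optional[str]:
    """Return a reminder if FLYTECTL_CONFIG is still set, otherwise None."""
    value = env.get("FLYTECTL_CONFIG")
    if not value:
        return None
    return (
        f"FLYTECTL_CONFIG is still set to {value}. "
        "Please run `unset FLYTECTL_CONFIG` to clean up the environment variable."
    )


# Print the reminder only when the variable is present in the current shell.
hint = teardown_hint(os.environ)
if hint:
    print(hint)
```

Checking the variable first keeps the teardown output quiet for users who never exported it.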
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyteGitHub
05/16/2023, 7:27 PMGitHub
05/16/2023, 7:29 PMGitHub
05/16/2023, 9:54 PMGitHub
05/16/2023, 10:26 PMGitHub
05/16/2023, 10:33 PM<https://github.com/flyteorg/flytekit/tree/master|master>
by ByronHsu
<https://github.com/flyteorg/flytekit/commit/3e62aabba8ca4b1da3d8ce0e244fde5bb91ef6a7|3e62aabb>
- Improve variable names (#1642)
flyteorg/flytekitGitHub
05/16/2023, 11:40 PMGitHub
05/16/2023, 11:53 PMCLIENT_CREDENTIALS
, which remains as the default
flyteorg/flytekit-python-template
GitHub Actions: integration_tests
GitHub Actions: build_images
✅ 2 other checks have passed
2/4 successful checksGitHub
05/17/2023, 12:57 AMGitHub
05/17/2023, 3:01 AMRunning into a weird issue: when I submit a job using FlyteRemote that has optional args (in this example, Optional[str]) and then try to rerun it using the UI Rerun button, the field doesn't get populated.
But if I launch the job using the UI, it does get populated in the rerun panel.
From OSS slack: https://flyte-org.slack.com/archives/CP2HDHKE1/p1684277073202399
Expected behavior
On relaunch, the launch form should respect optional params
Additional context to reproduce
Unknown
Screenshots
image (7)▾
image (6)▾
GitHub
05/17/2023, 3:39 AM<https://github.com/flyteorg/flytekit/tree/master|master>
by wild-endeavor
<https://github.com/flyteorg/flytekit/commit/01830e4a2d52613947838fc0e6fe43325688d5f6|01830e4a>
- Address resolution (#1567)
flyteorg/flytekitGitHub
05/17/2023, 3:54 AMDict[str, Union[List[List[int]], np.ndarray]]
. The error message indicates that the Union type does not match with the List type.
AssertionError: this should be a list and it is not: <class 'list'> vs union_type {\n variants {\n collection_type {\n collection_type {\n simple: INTEGER\n }\n }\n structure {\n tag: \"Typed List\"\n }\n }\n variants {\n blob {\n format: \"NumpyArray\"\n }\n structure {\n tag: \"Numpy Array\"\n }\n }\n}
When I set the type to FlytePickle
to force flytekit to pickle the data, I get an error message saying that the dictionary should be specified as the type.
raise AssertionError(\nAssertionError: this should be a Dictionary type and it is not: <class 'dict'> vs blob {\n format: \"PythonPickle\"\n}
Expected behavior
Since the dictionary isn't JSON serializable, it has to be pickled by default.
Additional context to reproduce
I don't have a reproducible code snippet. The data I'm trying to annotate looks as follows:
{'stride': [[232751, 0, 0]], 'input_features': array([[[ 0.02830195, -0.06521094, -0.14700186, ..., -0.6676489 ,
-0.6676489 , -0.6676489 ],
[-0.18206358, -0.22672772, -0.35251796, ..., -0.6676489 ,
-0.6676489 , -0.6676489 ],
[-0.45926154, -0.4596578 , -0.60333264, ..., -0.6676489 ,
-0.6676489 , -0.6676489 ],
...,
[-0.6676489 , -0.6676489 , -0.6676489 , ..., -0.6676489 ,
-0.6676489 , -0.6676489 ],
[-0.6676489 , -0.6676489 , -0.6676489 , ..., -0.6676489 ,
-0.6676489 , -0.6676489 ],
[-0.6676489 , -0.6676489 , -0.6676489 , ..., -0.6676489 ,
-0.6676489 , -0.6676489 ]]], dtype=float32)}
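The claim that such a dictionary is not JSON serializable but can be pickled can be checked with a small stdlib-only sketch. It assumes array.array as a stand-in for np.ndarray so that it runs without NumPy, and it does not involve flytekit's type engine at all.

```python
import json
import pickle
from array import array

# array.array stands in for numpy.ndarray (an assumption made to avoid a NumPy dependency).
data = {"stride": [[232751, 0, 0]], "input_features": array("f", [0.5, -0.25])}


def is_json_serializable(value) -> bool:
    """Return True if json.dumps accepts the value, False on TypeError."""
    try:
        json.dumps(value)
        return True
    except TypeError:
        return False


print(is_json_serializable(data))  # False

# Pickle round-trips the same structure without any custom handling.
restored = pickle.loads(pickle.dumps(data))
print(restored == data)  # True
```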
Screenshots
No response
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyteGitHub
05/17/2023, 4:36 AMSeedProjects
is not set, there will be no default project in the console. This change may impact backward compatibility and could potentially create hurdles for new users. Specifically, those following the startup guide to build a local cluster and run sample files from the Flytesnacks repository may encounter difficulties.
For example, a new user might run:
# build the local cluster and start flyte
cd /home/ubuntu
# install jq
sudo apt-get -y install jq
# install flytekit
pip install flytekit
export PATH=$PATH:~/.local/bin
# install flytectl
git clone https://github.com/flyteorg/flytectl.git
cd flytectl
sudo bash ./install.sh -b /usr/local/bin v0.6.17
cd ..
export FLYTECTL_CONFIG=/root/.flyte/config-sandbox.yaml
# start k3scluster, create pod for postgres, minio and dashboard
flytectl demo start --dev --image pingsutw/sandbox-lite-test
# connect to k3scluster
export KUBECONFIG=$KUBECONFIG:/root/.flyte/k3s/k3s.yaml
kubectl get pod -n flyte # check
# build and run all the other flyte components
git clone https://github.com/flyteorg/flyte.git
cd flyte
# replace with your repo
# go mod edit -replace github.com/flyteorg/flyteplugins=github.com/flyteorg/flyteplugins@v1.0.37
go mod tidy
sudo make compile
flyte start --config flyte_local.yaml
Then, when the new user tries to test the sample files in the Flytesnacks repository after PR3631:
cd /home/ubuntu
export PATH=$PATH:~/.local/bin
export FLYTECTL_CONFIG=/root/.flyte/config-sandbox.yaml
export KUBECONFIG=$KUBECONFIG:/root/.kube/config:/root/.flyte/k3s/k3s.yaml
# run a sample to test
git clone https://github.com/flyteorg/flytesnacks
cd flytesnacks/cookbook
pip install -r core/requirements.txt
# python3 core/flyte_basics/hello_world.py
# pyflyte run core/flyte_basics/hello_world.py my_wf
pyflyte run --remote core/flyte_basics/hello_world.py my_wf
They will get an error because the flytesnacks project does not exist:
2023/05/17 04:24:07 /root/go/pkg/mod/gorm.io/gorm@v1.24.1-0.20221019064659-5dd2bb482755/callbacks.go:134 record not found
[0.817ms] [rows:0] SELECT * FROM "projects" WHERE "projects"."identifier" = 'flytesnacks' LIMIT 1
{"json":{"src":"task_manager.go:70"},"level":"debug","msg":"Task [resource_type:TASK project:\"flytesnacks\" domain:\"development\" name:\"core.flyte_basics.hello_world.say_hello\" version:\"0Dj6bF9kDPwqJuKCrijznA==\" ] failed validation with err:
So, this PR adds back the flytesnacks project if SeedProjects is not set.
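The fallback can be sketched as below. This is an illustration in Python, not the flyte Go code changed by the PR; resolve_seed_projects and DEFAULT_SEED_PROJECTS are hypothetical names.

```python
from typing import List, Optional

# Assumed default, taken from the project name the samples above expect.
DEFAULT_SEED_PROJECTS = ["flytesnacks"]


def resolve_seed_projects(configured: Optional[List[str]]) -> List[str]:
    """Use the configured projects, or fall back to the default when none are set."""
    if not configured:
        return list(DEFAULT_SEED_PROJECTS)
    return list(configured)


print(resolve_seed_projects(None))        # ['flytesnacks']
print(resolve_seed_projects(["my-team"]))  # ['my-team']
```

With this behavior, the `pyflyte run --remote` command above finds the flytesnacks project on a fresh local cluster, while explicit configuration still takes precedence.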
Check all the applicable boxes
☐ I updated the documentation accordingly.
☐ All new and existing tests passed.
☐ All commits are signed-off.
Screenshots
Checks: without setting `SeedProjects`:
image▾
image▾
GitHub
05/17/2023, 4:48 AMfromLink
to router state
to use existing logic to match other workflow tables.
Follow-up issue
NA
flyteorg/flyteconsole
✅ All checks have passed
2/2 successful checksGitHub
05/17/2023, 9:20 AMTypeError: MyException.__init__() missing 1 required positional argument: 'foo'
from flytekit import task

class MyException(Exception):
    def __init__(self, message, foo):
        pass

@task
def fail_task():
    raise MyException("a", "b")

if __name__ == "__main__":
    fail_task()
instead of correctly propagating the Exception
Expected behavior
The Exception should be propagated correctly
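One plausible cause, stated here as an assumption and not confirmed in this issue, is that some wrapper rebuilds the exception from a single message string. The sketch below reproduces the same TypeError without flytekit and shows an alternative that chains the original exception instead; rebuild_with_message and wrap_with_cause are hypothetical helpers.

```python
class MyException(Exception):
    def __init__(self, message, foo):
        pass


def rebuild_with_message(exc: Exception) -> Exception:
    # Hypothetical wrapper: constructs a new instance of the same type from one string.
    return type(exc)(f"Error while executing task: {exc}")


def wrap_with_cause(exc: Exception) -> Exception:
    # Alternative: never re-instantiate the user's type; chain it as the cause instead.
    wrapper = RuntimeError("Error while executing task")
    wrapper.__cause__ = exc
    return wrapper


original = MyException("a", "b")

try:
    rebuild_with_message(original)
except TypeError as err:
    # Fails because MyException requires two positional arguments.
    print(f"re-instantiation failed: {err}")

wrapped = wrap_with_cause(original)
print(type(wrapped.__cause__).__name__)  # MyException
```

Chaining keeps the user's exception instance intact regardless of its constructor signature.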
Additional context to reproduce
No response
Screenshots
No response
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyteGitHub
05/17/2023, 12:02 PM<https://github.com/flyteorg/flyteadmin/tree/master|master>
by hamersaw
<https://github.com/flyteorg/flyteadmin/commit/062641f9772d81d83b4c9ffa766e5a656b5c54bc|062641f9>
- Remove single task execution default timeout (#564)
flyteorg/flyteadminGitHub
05/17/2023, 12:46 PMGitHub
05/17/2023, 3:09 PM<https://github.com/flyteorg/flyteconsole/tree/master|master>
by ursucarina
<https://github.com/flyteorg/flyteconsole/commit/3e8f517bde0dd3a5554d42022167e05febc67f54|3e8f517b>
- fix: task recent runs should filter by version (#759)
flyteorg/flyteconsoleGitHub
05/17/2023, 3:18 PM<https://github.com/flyteorg/flyte/tree/master|master>
by davidmirror-ops
<https://github.com/flyteorg/flyte/commit/7a8f2f5607dbe4707ff2b514b0665bf547dae921|7a8f2f56>
- Add proposal for community groups (#3619)
flyteorg/flyteGitHub
05/17/2023, 3:29 PMGitHub
05/17/2023, 3:57 PM.with_overrides(...)
of course, so this is overriding the platform resources with an empty set.
I think we can fix this by updating the defaultResources here to be defaultContainerSpec.Resources
as is now done in the container helper code. Then we update this line to use ResourceCustomizationModeMergeExistingResources, which will merge the container resources with overrides and then apply the defaults if none are set.
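The intended merge order can be sketched as follows. This is a Python illustration, not the flyteplugins Go code, and it assumes that defaults are applied per resource name when a value is missing; merge_resources is a hypothetical helper.

```python
from typing import Dict

Resources = Dict[str, str]


def merge_resources(container: Resources, overrides: Resources, defaults: Resources) -> Resources:
    """Start from the container's resources, apply overrides, then fill gaps with defaults."""
    merged = dict(container)
    merged.update(overrides)
    for name, value in defaults.items():
        # Assumption: a default is only used when nothing else set this resource.
        merged.setdefault(name, value)
    return merged


platform_defaults = {"cpu": "1", "memory": "1Gi"}

# No with_overrides(...) call: the platform defaults should survive.
print(merge_resources({}, {}, platform_defaults))

# An override for cpu wins; memory still comes from the defaults.
print(merge_resources({}, {"cpu": "4"}, platform_defaults))
```

The point of the ordering is that an empty override set leaves the defaults in place instead of replacing them.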
Expected behavior
Resource requests and limits should be correctly applied.
Additional context to reproduce
The following Dask task:
import dask.array as da
from flytekit import task
from flytekitplugins.dask import Dask, WorkerGroup

@task(
    task_config=Dask(
        workers=WorkerGroup(
            number_of_workers=10,
        ),
    ),
    cache_version="1",
    cache=True,
)
def hello_dask_2(size: int) -> float:
    # Dask will implicitly create a Client in the background by calling Client(). When executing
    # remotely, this Client() will use the deployed ``dask`` cluster.
    array = da.random.random(size)
    return float(array.mean().compute())
results in empty resource requests and limits.
Screenshots
No response
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyte