er Ksy
04/11/2023, 11:49 PM
er Ksy
04/11/2023, 11:55 PM
Serge Rubinstein
04/12/2023, 7:43 AM
Franco Bocci
04/12/2023, 8:18 AM
Eduardo Matus
04/12/2023, 2:09 PM
flyteadmin:
  replicaCount: 3
  # -- IAM role for SA: https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html
  serviceAccount:
    # -- If the service account is created by you, make this false, else a new service account will be created and the iam-role-flyte will be added
    # you can change the name of this role
    create: true
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::{{ .Values.userSettings.accountNumber }}:role/flyte.flyteadmin
  service:
    type: NodePort
  resources:
    limits:
      ephemeral-storage: 200Mi
    requests:
      cpu: 50m
      ephemeral-storage: 200Mi
      memory: 200Mi
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/name: flyteadmin
          topologyKey: kubernetes.io/hostname
  initialProjects:
    - flytesnacks
    - ribs
notifications:
  type: aws
  region: "{{ .Values.userSettings.accountRegion }}"
  publisher:
    topicName: "arn:aws:sns:{{ .Values.userSettings.accountRegion }}:{{ .Values.userSettings.accountNumber }}:flyte_notification"
  processor:
    queueName: "flyte_notification"
    accountId: "{{ .Values.userSettings.accountNumber }}"
  emailer:
    subject: "{{ `{{` }} domain {{ `}}` }}/{{ `{{` }} launch_plan.name {{ `}}` }} has '{{ `{{` }} phase {{ `}}` }}'"
    sender: "flyte-notification@mydomain.com"
    body: >
      Execution {{ `{{` }} workflow.project {{ `}}` }}/{{ `{{` }} workflow.domain {{ `}}` }}/{{ `{{` }} workflow.name {{ `}}` }}/{{ `{{` }} name {{ `}}` }} has {{ `{{` }} phase {{ `}}` }}.
      {{ `{{` }} error {{ `}}` }}
Something is missing?
Akash
04/12/2023, 2:11 PM
Daniel Xenes
04/12/2023, 3:09 PM
Traceback (most recent call last):
  File "/usr/local/bin/pyflyte-fast-execute", line 8, in <module>
    sys.exit(fast_execute_task_cmd())
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/flytekit/bin/entrypoint.py", line 497, in fast_execute_task_cmd
    _download_distribution(additional_distribution, dest_dir)
  File "/usr/local/lib/python3.9/site-packages/flytekit/tools/fast_registration.py", line 111, in download_distribution
    FlyteContextManager.current_context().file_access.get_data(additional_distribution, os.path.join(destination, ""))
  File "/usr/local/lib/python3.9/site-packages/flytekit/core/data_persistence.py", line 303, in get_data
    raise FlyteAssertion(
flytekit.exceptions.user.FlyteAssertion: Failed to get data from s3://my-s3-bucket/flytesnacks/development/TLGPVQIJ7UHT2DCEZO3ZBNK77U======/fast0a28e459bf705dea2398e8d5270d09d1.tar.gz to /root/ (recursive=False).
Has anyone encountered something like this before? It seems like either the container or the workflow can't access the tar file.
Eric Song
04/12/2023, 4:35 PM
Nan Qin
04/12/2023, 8:27 PM
task-module is missing its value in the generated command args (see below), so executing the task fails with ValueError: Empty module name.
args: [
  "pyflyte-fast-execute",
  "--additional-distribution",
  "s3://protopia-stained-glass-shop-stage-01/flytesnacks/development/JAWX3GGV5URP7U...",
  "--dest-dir",
  "{{ .dest_dir }}",
  "--",
  "pyflyte-execute",
  "--inputs",
  "{{.input}}",
  "--output-prefix",
  "{{.outputPrefix}}",
  "--raw-output-data-prefix",
  "{{.rawOutputDataPrefix}}",
  "--checkpoint-path",
  "{{.checkpointOutputPrefix}}",
  "--prev-checkpoint",
  "{{.prevCheckpointPrefix}}",
  "--resolver",
  "flytekit.core.python_auto_container.default_task_resolver",
  "--",
  "task-module",
  "",
  "task-name",
  "my_task"
]
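To catch this before the task runs, you can scan a serialized args list for resolver loader args that came out empty. A minimal sketch, assuming (as in the list above) that everything after the last "--" is a flat sequence of name/value pairs for default_task_resolver; the helper name `empty_loader_args` is hypothetical and not part of flytekit:

```python
from typing import List


def empty_loader_args(args: List[str]) -> List[str]:
    """Return the names of resolver loader args whose value is an empty string.

    Hypothetical helper. Assumes the args after the last "--" alternate
    name, value, name, value, ... as in the args list above.
    """
    if "--" not in args:
        return []
    # Index of the last "--"; the resolver's loader args follow it.
    last_sep = len(args) - 1 - args[::-1].index("--")
    loader_args = args[last_sep + 1:]
    # Pair names with values: ("task-module", ""), ("task-name", "my_task"), ...
    pairs = zip(loader_args[0::2], loader_args[1::2])
    return [name for name, value in pairs if value == ""]


if __name__ == "__main__":
    # Trimmed copy of the args above.
    args = [
        "pyflyte-fast-execute", "--dest-dir", "{{ .dest_dir }}", "--",
        "pyflyte-execute", "--resolver",
        "flytekit.core.python_auto_container.default_task_resolver",
        "--", "task-module", "", "task-name", "my_task",
    ]
    print(empty_loader_args(args))  # ['task-module']
```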
Dominik Fleischmann
04/13/2023, 10:50 AM
Jay Phan
04/13/2023, 2:55 PM
kubectl -n flyte port-forward service/flyteadmin 8080:81
Forwarding from 127.0.0.1:8080 -> 8089
Forwarding from [::1]:8080 -> 8089
Handling connection for 8080
Handling connection for 8080
Handling connection for 8080
Handling connection for 8080
Handling connection for 8080
Handling connection for 8080
Then I tried to create a new project using flytectl create project but got this error:
{"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [storage] updated. No update handler registered.","ts":"2023-04-13T11:06:33-04:00"}
{"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [root] updated. No update handler registered.","ts":"2023-04-13T11:06:33-04:00"}
{"json":{"src":"viper.go:400"},"level":"debug","msg":"Config section [admin] updated. Firing updated event.","ts":"2023-04-13T11:06:33-04:00"}
{"json":{"src":"client.go:63"},"level":"info","msg":"Initialized Admin client","ts":"2023-04-13T11:06:33-04:00"}
{"json":{"src":"auth_interceptor.go:67"},"level":"debug","msg":"Request failed due to [rpc error: code = Unauthenticated desc = token parse error [JWT_VERIFICATION_FAILED] Could not retrieve id token from metadata, caused by: rpc error: code = Unauthenticated desc = Request unauthenticated with IDToken]. If it's an unauthenticated error, we will attempt to establish an authenticated context.","ts":"2023-04-13T11:06:33-04:00"}
{"json":{"src":"auth_interceptor.go:72"},"level":"debug","msg":"Request failed due to [Unauthenticated]. Attempting to establish an authenticated connection and trying again.","ts":"2023-04-13T11:06:33-04:00"}
{"json":{"src":"token_source_provider.go:148"},"level":"warning","msg":"Failed fetching from cache. Will restart the flow. Error: no token found in the cache","ts":"2023-04-13T11:06:33-04:00"}
{"json":{"src":"auth_flow_orchestrator.go:77"},"level":"info","msg":"Opening the browser at https://localhost:30081/oauth2/authorize?client_id=flytectl\u0026redirect_uri=http%3A%2F%2Flocalhost%3A53593%2Fcallback\u0026response_type=code\u0026scope=offline+all\u0026code_challenge=0LjChZP0Ue1pBqIO70DXdLu4yDBLALSifR5TVQhlp3s\u0026code_challenge_method=S256\u0026nonce=OXpndnp0ZDloY3ZneGY3MnB6azVtenpuZmZrbjVtZnY\u0026state=anBydGNiZjV3aHdxeGt2ZmZrbWh0cTU3cmdycHJzZGo","ts":"2023-04-13T11:06:33-04:00"}
Any idea what specific auth error this is and how to fix it?
Eduardo Matus
04/13/2023, 3:15 PM
Laura Lin
04/13/2023, 3:55 PM
secrets need to be specified by naming them in the format <SECRET_GROUP>:<SECRET_KEY>
There's a colon in that naming scheme, but when I try to configure the secret in AWS, I get: "Secret name must contain only alphanumeric characters and the characters /_+=.@-"
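The conflict is easy to see by checking names against the character set quoted in that AWS error. A small sketch, assuming that character set is the whole rule (AWS may impose other limits, such as length, that are not checked here); both helper names are hypothetical:

```python
import re

# Characters allowed in an AWS secret name, as quoted in the error above:
# alphanumerics plus / _ + = . @ -
_ALLOWED_SECRET_NAME = re.compile(r"[A-Za-z0-9/_+=.@-]+")


def is_allowed_aws_secret_name(name: str) -> bool:
    """Return True if name is non-empty and uses only the allowed characters."""
    return _ALLOWED_SECRET_NAME.fullmatch(name) is not None


def group_key_reference(group: str, key: str) -> str:
    """Build the <SECRET_GROUP>:<SECRET_KEY> string described in the message above."""
    return f"{group}:{key}"


if __name__ == "__main__":
    ref = group_key_reference("my-group", "my-key")
    print(ref, is_allowed_aws_secret_name(ref))    # my-group:my-key False
    print(is_allowed_aws_secret_name("my-group"))  # True
```

Under that rule the combined `<SECRET_GROUP>:<SECRET_KEY>` string can never itself be a valid AWS secret name; only the parts on either side of the colon can.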
Radhakrishna Sanka
04/13/2023, 4:40 PM
er Ksy
04/13/2023, 5:24 PM
er Ksy
04/13/2023, 5:27 PM
Greg Gydush
04/13/2023, 9:28 PM
Liam
04/14/2023, 12:57 AM
from flytekit import task

@task
def incremental_update(additional_value: int) -> int:
    previous_total = ???
    if previous_total is None:
        return additional_value
    else:
        return previous_total + additional_value
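One way to avoid the `???` is to make the previous total an explicit, optional input, so whoever launches the next execution passes in the last run's result instead of the task looking it up. A stdlib-only sketch of that design (an assumption, not an established Flyte pattern for running totals); in a Flyte project the same signature should be usable with the `@task` decorator:

```python
from typing import Optional


def incremental_update(additional_value: int, previous_total: Optional[int] = None) -> int:
    """Return the new running total.

    previous_total is supplied by the caller (e.g. the output of the previous
    execution); None means there is no earlier total yet.
    """
    if previous_total is None:
        return additional_value
    return previous_total + additional_value


if __name__ == "__main__":
    total = incremental_update(5)         # first run, nothing to add to
    total = incremental_update(3, total)  # later run, previous total passed in
    print(total)  # 8
```

Keeping the state in the inputs also keeps the task's output a pure function of its inputs, which should sit better with Flyte's input-based caching than hidden lookups would.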
Rahul Mehta
04/14/2023, 1:35 AM
flytectl or flyte remote)? We'd like to inject some metadata from CI into the workflow application code (in a way that labels or annotations don't immediately solve).
We could theoretically hack around this by using the downward API to mount metadata.labels or metadata.annotations as a volume, but being able to set env var(s) directly seems more idiomatic and breaks fewer abstractions. Curious if anyone else has had to deal with this.
SeungTaeKim
04/14/2023, 6:29 AM
Derek Yu
04/14/2023, 8:54 AM
Samuel Bentley
04/14/2023, 12:43 PM
containers with unready status: [f0de017c01c7140d8a4c-n0-0]|Back-off pulling image "cr.flyte.org/flyteorg/flytekit:py3.9-1.5.0"
Any ideas?
Mathias Andersen
04/14/2023, 12:45 PM
pyflyte backfill -p cronjob-example -d development --from-date "2023-04-14 13:20:00" --to-date "now" --do-not-execute example_schedule v10
On this workflow interface:
def cronjob_example_workflow(kickoff_time: datetime) -> str:
and launch plan:
example_launch_plan = LaunchPlan.get_or_create(
    name="example_schedule",
    workflow=example_workflow,
    schedule=CronSchedule(
        schedule="*/5 * * * *",  # Following schedule runs every 5 minutes
        kickoff_time_input_arg="kickoff_time",  # macro magic
    ),
)
And I get this error:
Missing input `kickoff_time` type `simple: DATETIME
I would expect kickoff time to be passed by backfill, given the schedule and time window.
Calling @Ketan (kumare3): I think you are the mastermind 😉
Full trace:
Traceback (most recent call last):
  File "/home/andersen/.local/share/virtualenvs/flyte-cronjob-example-ig5KVMXO/bin/pyflyte", line 8, in <module>
    sys.exit(main())
  File "/home/andersen/.local/share/virtualenvs/flyte-cronjob-example-ig5KVMXO/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/andersen/.local/share/virtualenvs/flyte-cronjob-example-ig5KVMXO/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/andersen/.local/share/virtualenvs/flyte-cronjob-example-ig5KVMXO/lib/python3.8/site-packages/flytekit/clis/sdk_in_container/pyflyte.py", line 82, in invoke
    raise e
  File "/home/andersen/.local/share/virtualenvs/flyte-cronjob-example-ig5KVMXO/lib/python3.8/site-packages/flytekit/clis/sdk_in_container/pyflyte.py", line 78, in invoke
    return super().invoke(ctx)
  File "/home/andersen/.local/share/virtualenvs/flyte-cronjob-example-ig5KVMXO/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/andersen/.local/share/virtualenvs/flyte-cronjob-example-ig5KVMXO/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/andersen/.local/share/virtualenvs/flyte-cronjob-example-ig5KVMXO/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/andersen/.local/share/virtualenvs/flyte-cronjob-example-ig5KVMXO/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/andersen/.local/share/virtualenvs/flyte-cronjob-example-ig5KVMXO/lib/python3.8/site-packages/flytekit/clis/sdk_in_container/backfill.py", line 158, in backfill
    entity = remote.launch_backfill(
  File "/home/andersen/.local/share/virtualenvs/flyte-cronjob-example-ig5KVMXO/lib/python3.8/site-packages/flytekit/remote/remote.py", line 1781, in launch_backfill
    wf, start, end = create_backfill_workflow(start_date=from_date, end_date=to_date, for_lp=lp, parallel=parallel)
  File "/home/andersen/.local/share/virtualenvs/flyte-cronjob-example-ig5KVMXO/lib/python3.8/site-packages/flytekit/remote/backfill.py", line 82, in create_backfill_workflow
    next_node = wf.add_launch_plan(for_lp, t=next_start_date)
  File "/home/andersen/.local/share/virtualenvs/flyte-cronjob-example-ig5KVMXO/lib/python3.8/site-packages/flytekit/core/workflow.py", line 569, in add_launch_plan
    return self.add_entity(launch_plan, **kwargs)
  File "/home/andersen/.local/share/virtualenvs/flyte-cronjob-example-ig5KVMXO/lib/python3.8/site-packages/flytekit/core/workflow.py", line 496, in add_entity
    n = create_node(entity=entity, **kwargs)
  File "/home/andersen/.local/share/virtualenvs/flyte-cronjob-example-ig5KVMXO/lib/python3.8/site-packages/flytekit/core/node_creation.py", line 93, in create_node
    outputs = entity(**kwargs)
  File "/home/andersen/.local/share/virtualenvs/flyte-cronjob-example-ig5KVMXO/lib/python3.8/site-packages/flytekit/remote/remote_callable.py", line 54, in __call__
    return self.compile(ctx, *args, **kwargs)
  File "/home/andersen/.local/share/virtualenvs/flyte-cronjob-example-ig5KVMXO/lib/python3.8/site-packages/flytekit/remote/entities.py", line 806, in compile
    return create_and_link_node_from_remote(
  File "/home/andersen/.local/share/virtualenvs/flyte-cronjob-example-ig5KVMXO/lib/python3.8/site-packages/flytekit/core/promise.py", line 860, in create_and_link_node_from_remote
    raise _user_exceptions.FlyteAssertion("Missing input `{}` type `{}`".format(k, var.type))
flytekit.exceptions.user.FlyteAssertion: Missing input `kickoff_time` type `simple: DATETIME
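Two observations that may help narrow this down. The trace shows `create_backfill_workflow` calling `wf.add_launch_plan(for_lp, t=next_start_date)`, i.e. passing each scheduled time under the input name `t`, while this launch plan's kickoff input is named `kickoff_time`; that mismatch may be why `kickoff_time` is reported as missing. Separately, to see which kickoff times a backfill over this window should produce, here is a stdlib-only sketch that enumerates the firings of a `*/n * * * *` schedule. It is not flytekit's implementation, only understands that one cron form, and treats both window ends as inclusive (an assumption):

```python
from datetime import datetime, timedelta
from typing import List


def every_n_minutes(start: datetime, end: datetime, n: int = 5) -> List[datetime]:
    """Return the times a '*/n * * * *' cron schedule fires between start and end (inclusive).

    Only supports n that divides 60, so the minute pattern repeats every hour.
    """
    if n <= 0 or 60 % n != 0:
        raise ValueError("n must be a positive divisor of 60")
    # First whole minute at or after start.
    t = start.replace(second=0, microsecond=0)
    if t < start:
        t += timedelta(minutes=1)
    # Move forward to the next minute that is a multiple of n (0, 5, 10, ... for n=5).
    t += timedelta(minutes=(-t.minute) % n)
    ticks: List[datetime] = []
    while t <= end:
        ticks.append(t)
        t += timedelta(minutes=n)
    return ticks


if __name__ == "__main__":
    # Prints 13:20, 13:25, 13:30 and 13:35 on 2023-04-14.
    for tick in every_n_minutes(datetime(2023, 4, 14, 13, 20), datetime(2023, 4, 14, 13, 36)):
        print(tick)
```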
Greg Gydush
04/14/2023, 4:19 PM
Rahul Mehta
04/14/2023, 6:22 PM
varsha Parthasarathy
04/14/2023, 11:29 PM
RuntimeExecutionError: max number of system retry attempts [51/50] exhausted. Last known status message: failed at Node[dn0]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [k8s-array]:
Operation cannot be fulfilled on pods "f22e1b0a871194088843-n3-0-dn0-0-57-1": the object has been modified; please apply your changes to the latest version and try again
Blair Anson
04/15/2023, 7:58 AM
pyflyte run --remote command fails with "Handshake failed with fatal error SSL_ERROR_SSL".
$ FLYTE_SDK_LOGGING_LEVEL=20 pyflyte run --remote example.py training_workflow --hyperparameters '{"C": 0.1}'
{"asctime": "2023-04-15 16:54:36,923", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-15 16:54:36,950", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-15 16:54:36,954", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-15 16:54:37,937", "name": "flytekit", "levelname": "INFO", "message": "We won't register PyTorchCheckpointTransformer, PyTorchTensorTransformer, and PyTorchModuleTransformer because torch is not installed."}
{"asctime": "2023-04-15 16:54:38,379", "name": "flytekit", "levelname": "INFO", "message": "We won't register TensorFlowRecordFileTransformer, TensorFlowRecordsDirTransformer and TensorFlowModelTransformerbecause tensorflow is not installed."}
{"asctime": "2023-04-15 16:54:38,408", "name": "flytekit", "levelname": "INFO", "message": "We won't register bigquery handler for structured dataset because we can't find the packages google-cloud-bigquery-storage and google-cloud-bigquery"}
{"asctime": "2023-04-15 16:54:38,696", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-15 16:54:38,697", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
E0415 16:54:39.685038207 177107 ssl_transport_security.cc:1420] Handshake failed with fatal error SSL_ERROR_SSL: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER.
E0415 16:54:40.191239374 177107 ssl_transport_security.cc:1420] Handshake failed with fatal error SSL_ERROR_SSL: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER.
Failed with Exception: Reason: SYSTEM:Unknown
RPC Failed, with Status: StatusCode.UNAVAILABLE
details: failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:8088: Ssl handshake failed: SSL_ERROR_SSL: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
Debug string UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:8088: Ssl handshake failed: SSL_ERROR_SSL: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER {created_time:"2023-04-15T16:54:40.193233866+09:00", grpc_status:14}
I understand this error usually occurs when the .flyte/config.yaml and environment-variable config are not correct. I have checked them, but I must be missing something obvious.
Here is my setup...
Remote cluster is AWS EKS running in a VPC
Flyte was installed following instructions in https://docs.flyte.org/en/latest/deployment/deployment/cloud_simple.html
Local ports are proxied to these flyte services...
kubectl -n flyte port-forward service/flyte-backend-flyte-binary-grpc 8089:8089 &
kubectl -n flyte port-forward service/flyte-backend-flyte-binary-http 8088:8088 &
Env vars...
$ echo $FLYTECTL_CONFIG
/home/blair/.flyte/config.yaml
$ echo $KUBECONFIG
:/home/blair/.kube/config
.flyte/config.yaml
admin:
  # For GRPC endpoints you might want to use dns:///flyte.myexample.com
  endpoint: dns:///localhost:8088
  authType: Pkce
  insecure: false
logger:
  show-source: true
  level: 0
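WRONG_VERSION_NUMBER usually means the client started a TLS handshake and got back something that is not TLS, which would fit a client configured with `insecure: false` talking through a `kubectl port-forward` tunnel (a plain TCP pass-through) to a port that serves plaintext. It may also be worth double-checking that the endpoint's port 8088 is the one forwarded to the `-http` service above rather than the `-grpc` one on 8089. A stdlib-only sketch to check whether a local port completes a TLS handshake at all; `speaks_tls` is a hypothetical helper, and certificate verification is deliberately disabled because only the protocol is being probed:

```python
import socket
import ssl


def speaks_tls(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TLS handshake with host:port succeeds.

    Certificate checks are disabled on purpose: this only asks "is TLS spoken
    on this port?", not "is the certificate trusted?".
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be turned off before CERT_NONE is allowed
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # wrap_socket performs the handshake immediately by default.
            with ctx.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:
        # ssl.SSLError (e.g. WRONG_VERSION_NUMBER), refused connections and
        # timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    for port in (8088, 8089):
        print(port, "TLS" if speaks_tls("127.0.0.1", port) else "no TLS (or not reachable)")
```

If neither forwarded port speaks TLS, the client would need `insecure: true` to talk to it through the tunnel.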
g. coleman
04/15/2023, 9:36 AM
Chirayu Gupta
04/15/2023, 1:29 PM
[1/1] currentAttempt done. Last Error: USER::containers with unready status: [f1292d6636de043288f1-n0-0]|Back-off pulling image "cr.flyte.org/flyteorg/flytekit:py3.9-1.5.0"
On further debugging, I found that this is because the container fails to pull the image
ghcr.io/flyteorg/flytekit:py3.9-latest
Here are the Kubernetes events:
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  54s                default-scheduler  Successfully assigned default/py39-cacher to ac11a6c8fb0e
  Normal   Pulling    15s (x3 over 54s)  kubelet            Pulling image "ghcr.io/flyteorg/flytekit:py3.9-latest"
  Warning  Failed     15s (x3 over 54s)  kubelet            Failed to pull image "ghcr.io/flyteorg/flytekit:py3.9-latest": rpc error: code = Unknown desc = failed to pull and unpack image "ghcr.io/flyteorg/flytekit:py3.9-latest": failed to resolve reference "ghcr.io/flyteorg/flytekit:py3.9-latest": failed to do request: Head "https://ghcr.io/v2/flyteorg/flytekit/manifests/py3.9-latest": x509: certificate signed by unknown authority
  Warning  Failed     15s (x3 over 54s)  kubelet            Error: ErrImagePull
  Normal   BackOff    2s (x3 over 54s)   kubelet            Back-off pulling image "ghcr.io/flyteorg/flytekit:py3.9-latest"
  Warning  Failed     2s (x3 over 54s)   kubelet            Error: ImagePullBackOff
Blair Anson
04/15/2023, 5:44 PM
opta apply -c env.yaml
I receive the following error. Any suggestions on how to get around this? I can't locate where the metrics-server version is set...
│ Error: could not download chart: chart "metrics-server" version "5.11.3" not found in https://charts.bitnami.com/bitnami repository
│
│ with module.k8sbase.helm_release.metrics_server,
│ on ../../../../../../.opta/modules/aws_k8s_base/tf_module/metrics_server.tf line 1, in resource "helm_release" "metrics_server":
│ 1: resource "helm_release" "metrics_server" {