HaoboGu
05/21/2022, 9:47 AM
[1/1] currentAttempt done. Last Error: USER::containers with unready status: [fba452891990446a7a49-n0-0]|Back-off pulling image "cosine:2ce90d3ad559b4e5b1981af8726f4d19eeedc835"
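One thing worth checking with an error like the one above is where the short image name actually resolves to. The sketch below is a minimal illustration, not kubelet's or Docker's real code: it applies the usual Docker-style defaulting rule (no registry host in the name means `docker.io`, and a single-component name gets the `library/` prefix). The function name `normalize_image_ref` is hypothetical, and digest references (`@sha256:...`) are not handled.

```python
def normalize_image_ref(ref: str) -> dict:
    """Expand a short image reference using Docker-style defaults (sketch)."""
    name, tag = ref, "latest"
    last_slash = ref.rfind("/")
    last_colon = ref.rfind(":")
    # A ':' after the last '/' separates the tag; an earlier ':' is a registry port.
    if last_colon > last_slash:
        name, tag = ref[:last_colon], ref[last_colon + 1:]

    first, _, rest = name.partition("/")
    # The first component is treated as a registry only if it looks like a host.
    if rest and ("." in first or ":" in first or first == "localhost"):
        registry, repository = first, rest
    else:
        registry = "docker.io"
        repository = name if "/" in name else f"library/{name}"
    return {"registry": registry, "repository": repository, "tag": tag}


# The image from the error message has no registry prefix.
print(normalize_image_ref("cosine:2ce90d3ad559b4e5b1981af8726f4d19eeedc835"))
```

If the image only exists locally or in a private registry, a reference that expands to `docker.io/library/...` would typically fail to pull from inside the cluster.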
Is there any way I can debug what happened?
Attila Nagy
05/23/2022, 10:00 PM
Stefan Avesand
05/24/2022, 1:17 PM
Jonathan Lamiel
05/24/2022, 9:27 PM
ingress:
  separateGrpcIngress: true
  annotations:
    kubernetes.io/ingress.class: nginx
It creates 2 ingresses in my cluster and I can connect to the console over HTTP. I can also connect to admin with a port-forward, but I can't connect to admin with flytectl using the same URL as for the console.
What I actually run:
flytectl config init --host='xxxxx' --insecure
But I get:
Error: rpc error: code = Unavailable desc = connection closed
I'm sure it's something stupid, but I can't figure out what.
Ruksana Kabealo
05/25/2022, 2:09 AM
Mücahit
05/25/2022, 2:28 PM
1- The cluster_resource_manager.templates helm value contains all the default k8s resources for a namespace (including the namespace itself and the ServiceAccount, which we use to specify an IAM role).
2- We can specify a default IAM role for the ServiceAccounts.
3- A user creates a project using flytectl create project.
4- flyteadmin creates the namespace, the ServiceAccount and all other resources under the cluster_resource_manager.templates helm value.
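The flow described in the steps above can be sketched roughly as follows. This is only an illustration of template substitution with a per-project role lookup, not flyteadmin's implementation: the template text, the `{{ namespace }}` and `{{ defaultIamRole }}` placeholder names, the `project-domain` namespace naming, the `overrides` mapping and the role ARNs are all assumptions made for the example.

```python
# Hypothetical ServiceAccount template, modelled on an entry under
# cluster_resource_manager.templates. Placeholder names are assumptions.
SERVICE_ACCOUNT_TEMPLATE = """\
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: {{ namespace }}
  annotations:
    eks.amazonaws.com/role-arn: {{ defaultIamRole }}
"""


def render(template: str, values: dict) -> str:
    """Replace each '{{ key }}' placeholder with its value."""
    out = template
    for key, value in values.items():
        out = out.replace("{{ " + key + " }}", value)
    return out


def render_service_account(project: str, domain: str,
                           default_role: str, overrides: dict) -> str:
    """Render the template for one namespace, preferring a per-project role."""
    namespace = f"{project}-{domain}"           # assumed naming scheme
    role = overrides.get(project, default_role)  # fall back to the global default
    return render(SERVICE_ACCOUNT_TEMPLATE,
                  {"namespace": namespace, "defaultIamRole": role})


# Made-up role ARNs, for illustration only.
DEFAULT_ROLE = "arn:aws:iam::123456789012:role/flyte-default"
OVERRIDES = {"projecta": "arn:aws:iam::123456789012:role/flyte-projecta"}

print(render_service_account("projecta", "development", DEFAULT_ROLE, OVERRIDES))
print(render_service_account("flytesnacks", "development", DEFAULT_ROLE, OVERRIDES))
```

The second call falls back to the single default role, which is the behaviour the steps describe; the first shows the per-project lookup being asked about.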
Our projects need different default IAM roles, and we don't want our users to have to go to the created namespace, create a ServiceAccount, and then specify it for each execution.
So, is there any way to create projects with different default IAM roles/ServiceAccounts?
Stephen Fromm
05/25/2022, 3:11 PM
Shihgian Lee
05/26/2022, 6:57 PM
Rupsha Chaudhuri
05/27/2022, 8:32 PM
Jake Yoon
05/30/2022, 7:07 AM
_bash.blastx, but it displays the logs below.
Can somebody let me know what's going on here? Its status stays unknown forever.
c":"handlers.go:237"},"level":"debug","msg":"Running authentication gRPC interceptor","ts":"2022-05-30T16:01:59+09:00"}
{"json":{"src":"handlers.go:193"},"level":"debug","msg":"gRPC server info in logging interceptor [1a329842-ff6f-4268-8349-edd25113438c]method [/flyteidl.service.AdminService/CreateExecution]\n","ts":"2022-05-30T16:01:59+09:00"}
{"json":{"src":"execution_manager.go:808"},"level":"debug","msg":"Launching single task execution with [resource_type:TASK project:\"flytesnacks\" domain:\"development\" name:\"_bash.blastx\" version:\"v0.3.82\" ]","ts":"2022-05-30T16:01:59+09:00"}
{"json":{"exec_id":"a4jj4dgl6lk24nbn75vt","src":"execution_manager.go:381"},"level":"warning","msg":"Failed to fetch override values when assigning task resource default values for [resource_type:WORKFLOW project:\"flytesnacks\" domain:\"development\" name:\".flytegen._bash.blastx\" version:\"v0.3.82\" ]: Resource [{Project:flytesnacks Domain:development Workflow:.flytegen._bash.blastx LaunchPlan: ResourceType:TASK_RESOURCE}] not found","ts":"2022-05-30T16:01:59+09:00"}
{"json":{"exec_id":"a4jj4dgl6lk24nbn75vt","src":"execution_manager.go:385"},"level":"debug","msg":"Assigning task requested resources for [resource_type:WORKFLOW project:\"flytesnacks\" domain:\"development\" name:\".flytegen._bash.blastx\" version:\"v0.3.82\" ]","ts":"2022-05-30T16:01:59+09:00"}
{"json":{"src":"queues.go:43"},"level":"debug","msg":"refreshing execution queues","ts":"2022-05-30T16:02:33+09:00"}
{"json":{"exec_id":"a4jj4dgl6lk24nbn75vt","src":"queues.go:73"},"level":"warning","msg":"Failed to fetch override values when assigning execution queue for [{ResourceType:WORKFLOW Project:flytesnacks Domain:development Name:.flytegen._bash.blastx Version:v0.3.82 XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}] with err: Resource [{Project:flytesnacks Domain:development Workflow:.flytegen._bash.blastx LaunchPlan: ResourceType:EXECUTION_QUEUE}] not found","ts":"2022-05-30T16:02:33+09:00"}
{"json":{"exec_id":"a4jj4dgl6lk24nbn75vt","src":"queues.go:109"},"level":"info","msg":"found no matching queue for [{ResourceType:WORKFLOW Project:flytesnacks Domain:development Name:.flytegen._bash.blastx Version:v0.3.82 XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}]","ts":"2022-05-30T16:02:33+09:00"}
{"json":{"exec_id":"a4jj4dgl6lk24nbn75vt","src":"execution_manager.go:529"},"level":"info","msg":"getting the workflow execution config from application configuration","ts":"2022-05-30T16:02:33+09:00"}
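Log lines in the shape shown above can be filtered with a few lines of Python, which makes it easier to pull out the warnings for a single execution. This is a minimal sketch: the helper names are hypothetical, the field layout (`json.src`, `json.exec_id`, `level`, `msg`, `ts`) is taken from the lines above, and the warning message in the sample is shortened for readability.

```python
import json


def parse_log_lines(lines):
    """Parse JSON log lines, skipping blank, truncated or non-JSON ones."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a line whose beginning was cut off
        meta = entry.get("json", {})
        records.append({
            "level": entry.get("level"),
            "src": meta.get("src"),
            "exec_id": meta.get("exec_id"),
            "msg": entry.get("msg", "").strip(),
            "ts": entry.get("ts"),
        })
    return records


def filter_records(records, exec_id=None, levels=("warning", "error")):
    """Keep records at the given levels, optionally for one execution id."""
    return [
        r for r in records
        if r["level"] in levels and (exec_id is None or r["exec_id"] == exec_id)
    ]


SAMPLE = [
    # Truncated line: not valid JSON, so it is skipped.
    r'c":"handlers.go:237"},"level":"debug","msg":"Running authentication gRPC interceptor","ts":"2022-05-30T16:01:59+09:00"}',
    # msg shortened here for readability.
    r'{"json":{"exec_id":"a4jj4dgl6lk24nbn75vt","src":"execution_manager.go:381"},"level":"warning","msg":"Failed to fetch override values when assigning task resource default values","ts":"2022-05-30T16:01:59+09:00"}',
    r'{"json":{"src":"queues.go:43"},"level":"debug","msg":"refreshing execution queues","ts":"2022-05-30T16:02:33+09:00"}',
    r'{"json":{"exec_id":"a4jj4dgl6lk24nbn75vt","src":"execution_manager.go:529"},"level":"info","msg":"getting the workflow execution config from application configuration","ts":"2022-05-30T16:02:33+09:00"}',
]

for rec in filter_records(parse_log_lines(SAMPLE), exec_id="a4jj4dgl6lk24nbn75vt"):
    print(rec["ts"], rec["level"], rec["src"], rec["msg"])
```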
Thanks in advance.
P.S. I launched Flyte on an on-premise k8s cluster using Keycloak as the auth backend, in case that is relevant.
Rupsha Chaudhuri
06/01/2022, 5:01 PM
Roberto Ruiz
06/01/2022, 7:58 PM
Limits:
  cpu: 1
  memory: 200Mi
Requests:
  cpu: 1
  memory: 200Mi
How can I increase this? My helm yaml already has
task_resource_defaults:
  task_resources:
    defaults:
      cpu: 1000m
      memory: 1000Mi
      storage: 1000Mi
    limits:
      storage: 2000Mi
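To compare what the pod reports with what the helm values ask for, it helps to normalize the Kubernetes quantity strings first, since `1` and `1000m` are the same CPU amount. The sketch below is a small stand-alone parser covering only the common suffixes (`m`, `Ki`/`Mi`/`Gi`/`Ti`, `k`/`M`/`G`/`T`); the function names are hypothetical and it is not a full implementation of the Kubernetes quantity format.

```python
from decimal import Decimal

# Binary suffixes are listed first so that "Mi" is matched before "M".
_SUFFIXES = [
    ("Ki", 2**10), ("Mi", 2**20), ("Gi", 2**30), ("Ti", 2**40),
    ("k", 10**3), ("M", 10**6), ("G", 10**9), ("T", 10**12),
]


def parse_quantity(text: str) -> Decimal:
    """Convert a quantity such as '1000m' or '200Mi' to a plain number."""
    text = text.strip()
    if text.endswith("m"):  # milli-units, e.g. 1000m CPU == 1 CPU
        return Decimal(text[:-1]) / 1000
    for suffix, factor in _SUFFIXES:
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * factor
    return Decimal(text)


def diff_resources(observed: dict, configured: dict) -> dict:
    """Return the keys whose observed and configured amounts differ."""
    return {
        key: (observed[key], configured[key])
        for key in observed.keys() & configured.keys()
        if parse_quantity(observed[key]) != parse_quantity(configured[key])
    }


observed = {"cpu": "1", "memory": "200Mi"}          # from the pod description
configured = {"cpu": "1000m", "memory": "1000Mi"}   # from the helm defaults
print(diff_resources(observed, configured))
```

With the values quoted in the message, only `memory` differs: the CPU value already matches the configured default, while the 200Mi memory value does not.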
Rupsha Chaudhuri
06/02/2022, 6:03 PM
Katrina P
06/02/2022, 9:23 PM
Back-off pulling image "ghcr.io/flyteorg/flytekit:py3.9-1.0.3"
as it seems that image tag doesn't exist. Is anyone else having this issue when running the getting started page?
Rupsha Chaudhuri
06/03/2022, 5:01 PM
flytectl config init --host={SOME_HOSTNAME}:30080 --storage
Error: unknown flag: --storage
ERRO[0000] unknown flag: --storage
Aleksei Potov
06/03/2022, 10:35 PM
flytectl sandbox start) on one machine, and I'm trying to run an example wf from a different machine (I've updated the endpoint in ~/.flyte/config.yaml). pyflyte run --remote example.py:wf --n 500 --mean 42 --sigma 21 succeeds and I get a link to the console. However, when inspecting the execution I see this error:
[1/1] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes.
[f31016626d79e40b2a4c-n0-0] terminated with exit code (1). Reason [Error]. Message:
    _execute(cmd=cmd, s3_cfg=self.s3_cfg)
  File "/usr/local/lib/python3.8/site-packages/flytekit/extras/persistence/s3_awscli.py", line 51, in _update_cmd_config_and_execute
    return subprocess.check_call(anonymous_cmd, env=env)
  File "/usr/local/lib/python3.8/site-packages/flytekit/tools/subprocess.py", line 26, in check_call
    raise Exception(
Exception: Called process exited with error code: 1. Stderr dump:
b'fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden\n'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/bin/pyflyte-fast-execute", line 8, in <module>
    sys.exit(fast_execute_task_cmd())
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/flytekit/bin/entrypoint.py", line 495, in fast_execute_task_cmd
    _download_distribution(additional_distribution, dest_dir)
  File "/usr/local/lib/python3.8/site-packages/flytekit/tools/fast_registration.py", line 102, in download_distribution
    file_access.get_data(additional_distribution, destination)
  File "/usr/local/lib/python3.8/site-packages/flytekit/core/data_persistence.py", line 427, in get_data
    raise FlyteAssertion(
flytekit.exceptions.user.FlyteAssertion: Failed to get data from s3://my-s3-bucket/36/flytesnacks/development/VGJSBXLCAE3JIKQYKY2O3N5IMQ======/scriptmode.tar.gz to . (recursive=False).
Original exception: Called process exited with error code: 1. Stderr dump:
b'fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden\n'
.
Jonathan Lamiel
06/05/2022, 2:42 PM
[1/1] currentAttempt done. Last Error: USER::task execution timeout [5m0s] expired
error. The reason is that the pod was trying to mount a secret volume that didn't exist (there was a typo in the secret name).
• The problem is that the K8s deployments for those tasks were still around after those 5 minutes, and stayed for hours, holding on to their 3 GB and making the other tasks wait.
• Ultimately, after those deployments were "removed", the other tasks were picked up.
I would expect Flyte to terminate the deployments right after the first error, no? Freeing the resources for other tasks?
Any clue?
Abhinav Ayalur
06/06/2022, 7:07 PM
Arshak Ulubabyan
06/09/2022, 10:17 AM
pyflyte run --remote example.py wf --n 500 --mean 42 --sigma 2" command, I'm getting an error:
Traceback (most recent call last):
  File "/opt/anaconda3/envs/flytekit_test/bin/pyflyte", line 8, in <module>
    sys.exit(main())
  File "/opt/anaconda3/envs/flytekit_test/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/anaconda3/envs/flytekit_test/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/anaconda3/envs/flytekit_test/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/anaconda3/envs/flytekit_test/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/anaconda3/envs/flytekit_test/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/anaconda3/envs/flytekit_test/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/anaconda3/envs/flytekit_test/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/anaconda3/envs/flytekit_test/lib/python3.8/site-packages/flytekit/clis/sdk_in_container/run.py", line 473, in _run
    wf = remote.register_script(
  File "/opt/anaconda3/envs/flytekit_test/lib/python3.8/site-packages/flytekit/remote/remote.py", line 536, in register_script
    upload_location, md5_bytes = fast_register_single_script(
  File "/opt/anaconda3/envs/flytekit_test/lib/python3.8/site-packages/flytekit/tools/script_mode.py", line 117, in fast_register_single_script
    upload_location = create_upload_location_fn(content_md5=md5)
  File "/opt/anaconda3/envs/flytekit_test/lib/python3.8/site-packages/flytekit/clients/friendly.py", line 998, in get_upload_signed_url
    return super(SynchronousFlyteClient, self).create_upload_location(
  File "/opt/anaconda3/envs/flytekit_test/lib/python3.8/site-packages/flytekit/clients/raw.py", line 40, in handler
    return fn(*args, **kwargs)
  File "/opt/anaconda3/envs/flytekit_test/lib/python3.8/site-packages/flytekit/clients/raw.py", line 834, in create_upload_location
    return self._dataproxy_stub.CreateUploadLocation(create_upload_location_request, metadata=self._metadata)
  File "/opt/anaconda3/envs/flytekit_test/lib/python3.8/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/opt/anaconda3/envs/flytekit_test/lib/python3.8/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "failed to connect to all addresses"
    debug_error_string = "{"created":"@1654765336.255436000","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3218,"referenced_errors":[{"created":"@1654765336.255435000","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":165,"grpc_status":14}]}"
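A "failed to connect to all addresses" status means the client never reached a listening endpoint, so a plain TCP check against the configured admin host and port is a quick first test. The sketch below uses only the standard library; the function name `can_connect` and the host and port in the usage comment are placeholders, not values taken from this thread.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


# Example usage (placeholder endpoint; use the one from ~/.flyte/config.yaml):
#   print(can_connect("localhost", 30081))
```

A successful TCP connection does not prove that the gRPC service behind it is healthy, but a failure here points at the endpoint setting or the network rather than at flytekit.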
Rupsha Chaudhuri
06/09/2022, 5:38 PM
Ruksana Kabealo
06/09/2022, 7:32 PM
Rupsha Chaudhuri
06/09/2022, 8:20 PM
@dynamic task that calls a @task task that's expected to return a dict. Instead I'm getting a flytekit.core.promise.Promise object back. I also don't see any of the logs from that task, nor do I see the called task executing on the console. What's weird is that the same called task works as expected when invoked from other tasks in the same workflow.
Can someone help me understand why a task would not execute and return a Promise instead?
Neal Feierabend
06/09/2022, 10:28 PM
Evan Sadler
06/13/2022, 3:59 PM
Rupsha Chaudhuri
06/13/2022, 6:03 PM
Sérgio de Melo Barreto Junior
06/13/2022, 6:03 PM
Tao He
06/14/2022, 9:29 AM
Tao He
06/14/2022, 9:31 AM
Tao He
06/14/2022, 9:31 AM
Yaroslav
06/14/2022, 10:26 AM
pyflyte run --remote example.py wf --n 500 --mean 42 --sigma 2
but it was failing on my local environment: https://flyte-org.slack.com/files/U037VET9014/F03KJK10Y10/untitled.txt
flytectl itself was up and running. @Yuvraj suggested running flytectl config init, and it helped.
I wonder, is this something that needs to be added to the getting_started page?