Younes El Hjouji
03/24/2023, 11:57 AM
Bosco Raju
03/24/2023, 4:14 PM
FROM python:3.8-buster
WORKDIR /root
ENV VENV /opt/venv
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV PYTHONPATH /root
ARG tag
ARG wandb_api_key
ARG wandb_username
ENV FLYTE_INTERNAL_IMAGE $tag
ENV WANDB_API_KEY $wandb_api_key
ENV WANDB_USERNAME $wandb_username
# Install the AWS cli separately to prevent issues with boto being written over
RUN pip3 install awscli
RUN apt-get update && apt-get install -y curl
ENV VENV /opt/venv
# Virtual environment
RUN python3 -m venv ${VENV}
ENV PATH="${VENV}/bin:$PATH"
# Install Python dependencies
COPY requirements.txt /root/.
RUN pip install -r /root/requirements.txt
# Copy the actual code
COPY src/ /root/src/
[1/1] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes.
[ab8dm4nttv49wgvn59kg-n0-0] terminated with exit code (1). Reason [Error]. Message:
rtlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'site-packages.flytekit'
Traceback (most recent call last):
File "/opt/venv/bin/pyflyte-fast-execute", line 8, in <module>
sys.exit(fast_execute_task_cmd())
File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/opt/venv/lib/python3.8/site-packages/flytekit/bin/entrypoint.py", line 513, in fast_execute_task_cmd
subprocess.run(cmd, check=True)
File "/usr/local/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['pyflyte-execute', '--inputs', 's3://senn-ai-mlops-flyte/metadata/propeller/flytesnacks-development-ab8dm4nttv49wgvn59kg/n0/data/inputs.pb', '--output-prefix', 's3://senn-ai-mlops-flyte/metadata/propeller/flytesnacks-development-ab8dm4nttv49wgvn59kg/n0/data/0', '--raw-output-data-prefix', 's3://senn-ai-mlops-flyte/data/sk/ab8dm4nttv49wgvn59kg-n0-0', '--checkpoint-path', 's3://senn-ai-mlops-flyte/data/sk/ab8dm4nttv49wgvn59kg-n0-0/_flytecheckpoints', '--prev-checkpoint', '""', '--dynamic-addl-distro', 's3://senn-ai-mlops-flyte/mo/flytesnacks/development/NRI2T5OSZMCFXKR4CUNLWO7MSM======/fastab502ef8d4ae75b6b5497a94633e8642.tar.gz', '--dynamic-dest-dir', '/root', '--resolver', 'site-packages.flytekit.core.python_auto_container.default_task_resolver', '--', 'task-module', 'src.workflows.hello_world', 'task-name', 'say_hello']' returned non-zero exit status 1.
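A note on the traceback above: the resolver is passed as site-packages.flytekit.core.python_auto_container.default_task_resolver, so Python is asked to import a module path that starts with site-packages, and the import stops at site-packages.flytekit. Below is a minimal, stdlib-only sketch for checking, from inside the image, which part of a dotted resolver path stops importing. The helper name first_unimportable_prefix is hypothetical and is not part of flytekit.

```python
import importlib


def first_unimportable_prefix(resolver_path):
    """Return the shortest module prefix of `resolver_path` that fails to import.

    The last dotted component is treated as an attribute (the resolver object),
    so only the module part is checked. Returns None if every prefix imports.
    """
    module_path, _, _attribute = resolver_path.rpartition(".")
    if not module_path:
        # No dots at all: there is no module part to check.
        return None
    parts = module_path.split(".")
    for i in range(1, len(parts) + 1):
        prefix = ".".join(parts[:i])
        try:
            importlib.import_module(prefix)
        except ImportError:
            return prefix
    return None


# Example call (run inside the task image):
# first_unimportable_prefix(
#     "site-packages.flytekit.core.python_auto_container.default_task_resolver"
# )
```

For comparison, flytekit's default resolver lives at flytekit.core.python_auto_container.default_task_resolver, with no site-packages prefix, so the same check on that path should return None in an image where flytekit is installed.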
MUhwjR9VF3K2QJhL2VvNGoFY
03/24/2023, 8:58 PM
$ docker pull cr.flyte.org/flyteorg/flytescheduler-release:v1.4.3
5e5f900059c8: Download complete
0f96f76f982d: Download complete
88dae1745799: Downloading [> ] 0B/2.343kB
failed to copy: httpReadSeeker: failed open: failed to do request: Get "https://pkg-containers.githubusercontent.com/ghcr1/blobs/sha256:88dae1745799f4078f859b8ada55bc1f1757d10d9c97cdeb62f88cfc827c3a1f?se=2023-03-24T21%3A05%3A00Z&sig=rMhv3oYchaYycF260vI0pdx6pENdYsKpph9eXak%2BMPw%3D&sp=r&spr=https&sr=b&sv=2019-12-12": x509: certificate is valid for *.githubassets.com, githubassets.com, not pkg-containers.githubusercontent.com
seunggs
03/24/2023, 11:36 PM
Taeef Najib
03/25/2023, 3:37 AM
csv file. But I got this error:
Message:
[Errno 2] No such file or directory: 'data.csv'
User error.
What would be the workaround?
mykyta luzan
03/27/2023, 8:17 AM
raise FlyteAssertion(
flytekit.exceptions.user.FlyteAssertion: Failed to get data from s3://blabs-flyte-mgmt-metadata/dispute-resolution/development/BP7IZUPWYESUYFGRO52LE7HBDU======/scriptmode.tar.gz to /root/ (recursive=False).
Original exception: Access Denied
We don’t have any auth set up for Flyte yet. I assume I just haven’t specified the S3 bucket endpoint, access_key_id, and secret_access_key. Where should I do this? In my .flyte/config.yaml? Or on the k8s/values.yaml side, via DevOps support? If so, what is the template?
storage:
  access-key: <SECRET>
  auth-type: iam
  disable-ssl: false
  endpoint: s3://blabs-flyte-mgmt-metadata
  region: eu-central-1
  secret-key: <SECRET>
I don’t understand how to use this page: https://docs.flyte.org/projects/flytekit/en/latest/generated/flytekit.configuration.S3Config.html
Srinivas Venkattaramanujam
03/27/2023, 4:16 PM
justin hallquist
03/27/2023, 4:19 PM
Error: rpc error: code = InvalidArgument desc = invalid name format: 370eb303160612
David Espejo (he/him)
03/27/2023, 5:35 PM
varsha Parthasarathy
03/27/2023, 8:09 PM
Frank Shen
03/27/2023, 8:32 PM
Ena Škopelja
03/28/2023, 12:05 PM
StructuredDataset that's failing if I turn on cache_serialize with this error:
[3/3] currentAttempt done. Last Error: SYSTEM::Traceback (most recent call last):
File "/opt/venv/lib/python3.9/site-packages/flytekit/exceptions/scopes.py", line 165, in system_entry_point
return wrapped(*args, **kwargs)
File "/opt/venv/lib/python3.9/site-packages/flytekit/core/base_task.py", line 572, in dispatch_execute
raise TypeError(
Failed to convert return value for var o0 for function {my task} with error <class 'pyarrow.lib.ArrowInvalid'>: ("Could not convert ('... ... (580 characters truncated) ... ...',) with type Row: did not recognize Python value type when inferring an Arrow data type", 'Conversion failed for column sequence with type object')
The sequence column of the dataframe I return is a string (no non-standard characters).
Any idea what could be causing this? Regular (no cache_serialize) cache works just fine.
Prashant Jain
03/28/2023, 12:18 PM
Andrew Korzhuev
03/28/2023, 2:16 PM
flyteidl 1.3.12 depends on protobuf<5.0.0 and >=4.21.1
sagemaker 2.133.0 depends on protobuf<4.0 and >=3.1
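Taken together, these two constraints leave no protobuf version that satisfies both packages. Here is a small sketch that checks this using only the version bounds quoted above. The helper names are hypothetical, and the parsing handles plain dotted numeric versions only.

```python
def parse_version(text, width=3):
    """Turn '4.21.1' into (4, 21, 1), padding short versions with zeros."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def ranges_overlap(first, second):
    """Each range is (inclusive_lower, exclusive_upper), e.g. ('3.1', '4.0')."""
    lower = max(parse_version(first[0]), parse_version(second[0]))
    upper = min(parse_version(first[1]), parse_version(second[1]))
    return lower < upper


FLYTEIDL_PROTOBUF = ("4.21.1", "5.0.0")  # flyteidl 1.3.12: protobuf>=4.21.1,<5.0.0
SAGEMAKER_PROTOBUF = ("3.1", "4.0")      # sagemaker 2.133.0: protobuf>=3.1,<4.0

# False: the highest allowed lower bound (4.21.1) is above the lowest upper bound (4.0).
compatible = ranges_overlap(FLYTEIDL_PROTOBUF, SAGEMAKER_PROTOBUF)
```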
Since 1.3, flytekit has had a version conflict with the SageMaker API, and with TensorFlow as well: it dropped protobuf 3, which makes it impossible to use those libraries together.
HIMANSHU JOSHI
03/28/2023, 5:03 PM
Samuel Bentley
03/28/2023, 7:24 PM
E0328 20:15:47.445709000 140704578418304 ssl_transport_security.cc:1495] Handshake failed with fatal error SSL_ERROR_SSL: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER.
I'm using Python 3.9, grpcio & grpc-status 1.48.2, and pyOpenSSL 23.0.0, against a local flyte-sandbox (latest version).
Can anyone help? I'm trying to get this working as a proof of concept for wider use in my company and need to demonstrate that it works.
Craig Amundsen
03/28/2023, 10:38 PM
Greg Gydush
03/29/2023, 12:01 AM
HIMANSHU JOSHI
03/29/2023, 12:00 PM
panic: Failed to sync cluster resources : no matches for kind "Role" in version "rbac.authorization.k8s.io/v1beta1", no matches for kind "RoleBinding" in version "rbac.authorization.k8s.io/v1beta1", no matches for kind "Role" in version "rbac.authorization.k8s.io/v1beta1", no matches for kind "RoleBinding" in version "rbac.authorization.k8s.io/v1beta1", no matches for kind "Role" in version "rbac.authorization.k8s.io/v1beta1", no matches for kind "RoleBinding" in version "rbac.authorization.k8s.io/v1beta1", no matches for kind "Role" in version "rbac.authorization.k8s.io/v1beta1", no matches for kind "RoleBinding" in version "rbac.authorization.k8s.io/v1beta1", no matches for kind "Role" in version "rbac.authorization.k8s.io/v1beta1", no matches for kind "RoleBinding" in version "rbac.authorization.k8s.io/v1beta1", no matches for kind "Role" in version "rbac.authorization.k8s.io/v1beta1", no matches for kind "RoleBinding" in version "rbac.authorization.k8s.io/v1beta1", no matches for kind "Role" in version "rbac.authorization.k8s.io/v1beta1", no matches for kind "RoleBinding" in version "rbac.authorization.k8s.io/v1beta1", no matches for kind "Role" in version "rbac.authorization.k8s.io/v1beta1", no matches for kind "RoleBinding" in version "rbac.authorization.k8s.io/v1beta1", no matches for kind "Role" in version "rbac.authorization.k8s.io/v1beta1", no matches for kind "RoleBinding" in version "rbac.authorization.k8s.io/v1beta1"
goroutine 1 [running]:
main.main()
/go/src/github.com/flyteorg/flyteadmin/cmd/main.go:13 +0x91
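For context on the panic above: the rbac.authorization.k8s.io/v1beta1 API was removed in Kubernetes 1.22, and rbac.authorization.k8s.io/v1 is its replacement, so Role and RoleBinding templates that still declare v1beta1 cannot be matched on newer clusters. Below is a minimal sketch of rewriting the apiVersion in template text. The sample template and the function name are hypothetical, and the sketch does not show where the templates live in a given deployment.

```python
OLD_API = "rbac.authorization.k8s.io/v1beta1"
NEW_API = "rbac.authorization.k8s.io/v1"


def upgrade_rbac_api_version(template_text):
    """Rewrite `apiVersion: .../v1beta1` lines to `.../v1`, leaving other lines alone.

    Only handles the plain, unquoted `apiVersion: <value>` form.
    """
    upgraded = []
    for line in template_text.splitlines():
        key, sep, value = line.partition(":")
        if key.strip() == "apiVersion" and value.strip() == OLD_API:
            line = f"{key}{sep} {NEW_API}"
        upgraded.append(line)
    return "\n".join(upgraded)


# Hypothetical template, for illustration only.
SAMPLE_TEMPLATE = """apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: example-role
"""

upgraded_template = upgrade_rbac_api_version(SAMPLE_TEMPLATE)
```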
Victor Gustavo da Silva Oliveira
03/29/2023, 12:15 PM
'ReferenceTask' object has no attribute '_task_function'
... Can someone help me? I would really appreciate it.
Agath Emmanuel
03/29/2023, 1:18 PM
Srinivas Venkattaramanujam
03/29/2023, 4:28 PM
David Muraco
03/29/2023, 4:55 PM
Srinivas Venkattaramanujam
03/29/2023, 4:56 PM
varsha Parthasarathy
03/29/2023, 5:02 PM
Arthur Lindoulsi
03/30/2023, 8:39 AM
if training_args.force_a100_gpus:
    return train(input_args).with_overrides(
        pod_template=PodTemplate(
            pod_spec=V1PodSpec(affinity=V1Affinity(
                node_affinity=V1NodeAffinity(
                    required_during_scheduling_ignored_during_execution=V1NodeSelector(
                        node_selector_terms=[
                            V1NodeSelectorTerm(match_expressions=[
                                V1NodeSelectorRequirement(
                                    key="cloud.google.com/gke-accelerator", operator="In",
                                    values=["nvidia-tesla-a100"])])]
                    ))),
                restart_policy='never',
                containers=[V1Container(name='primary',
                                        image='{{.image.imagename.fqn}}:{{.image.imagename.version}}',
                                        resources=V1ResourceRequirements(limits={"nvidia.com/gpu": '1'},
                                                                         requests={"memory": "...",
                                                                                   "cpu": "..."})
                                        )])
        ),
        container_image=None,
        requests=None,
    )
Container name for the train task was "<executionID>-<workflowID>-0-dn2-0"
Andrés Gómez Ferrer
03/30/2023, 1:42 PM
ContainerError inside a task using flytekit python?
https://docs.flyte.org/projects/flyteidl/en/latest/protos/docs/core/core.html#ref-flyteidl-core-containererror
For example, if I want to raise/return NON_RECOVERABLE or RECOVERABLE depending on the task logic.
David Espejo (he/him)
03/30/2023, 3:37 PM
Nan Qin
03/30/2023, 4:42 PM
Visak
03/30/2023, 6:31 PM
TooLarge: Event message exceeds maximum gRPC size limit, caused by [rpc error: code = ResourceExhausted desc = grpc: received message larger than max (6020936 vs. 4194304)].
How can I address this? Where can I increase this size limit, and what is the recommended size?
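For scale: 4194304 bytes is exactly 4 MiB, which is gRPC's default maximum receive message size, and the rejected event is about 5.74 MiB. Here is a quick sketch of the arithmetic, using only the two numbers from the error message. It does not show where the limit is configured in a Flyte deployment.

```python
import math

MIB = 1024 * 1024

limit_bytes = 4194304    # "max" value from the error message
message_bytes = 6020936  # size of the rejected event message

limit_mib = limit_bytes / MIB                 # limit expressed in MiB
message_mib = message_bytes / MIB             # message size expressed in MiB
overage_bytes = message_bytes - limit_bytes   # how far over the limit the message is

# Smallest whole-MiB limit that would have accepted this particular message.
smallest_whole_mib = math.ceil(message_bytes / MIB)
```

Any raised limit would need to be at least smallest_whole_mib MiB for this one message, and larger events would need more headroom.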