Marcus Isnard
07/15/2022, 6:17 PM
flytekit.exceptions.user.FlyteAssertion: Failed to put data from /tmp/tmpby0anszu/script_mode.tar.gz to <http://localhost:30084/my-s3-bucket/ff/flytesnacks/development/MOAHDD5B6MWAWQCHZQVZWQX6UU%3D%3D%3D%3D%3D%3D/scriptmode.tar.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=minio%2F20220715%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220715T180620Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=content-md5%3Bhost&X-Amz-Signature=d807a4f73adf126a579aa7dcbe1800e8654559f37c8728500b61d20b067d804e> (recursive=False).
Original exception: HTTPConnectionPool(host='localhost', port=30084): Max retries exceeded with url: /my-s3-bucket/ff/flytesnacks/development/MOAHDD5B6MWAWQCHZQVZWQX6UU%3D%3D%3D%3D%3D%3D/scriptmode.tar.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=minio%2F20220715%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220715T180620Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=content-md5%3Bhost&X-Amz-Signature=d807a4f73adf126a579aa7dcbe1800e8654559f37c8728500b61d20b067d804e (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f440c5cd910>: Failed to establish a new connection: [Errno 111] Connection refused'))
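Errno 111 (ECONNREFUSED) means nothing accepted a TCP connection on localhost:30084, so the upload of the script-mode archive failed before MinIO ever saw the request. A minimal stdlib reachability check, assuming the sandbox is supposed to expose MinIO on that port (the helper name is hypothetical):

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises ConnectionRefusedError (errno 111 on Linux)
        # when nothing is listening, or a timeout error if the host is unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 30084 is the MinIO port taken from the presigned URL in the error above.
    print("localhost:30084 reachable:", port_is_open("localhost", 30084))
```

If this prints False, the sandbox's port mapping (or the sandbox itself) is a more likely culprit than flytekit.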
Taeef Najib
07/17/2022, 10:57 PM
flytectl upgrade or flytectl version, my terminal returns this error:
'flytectl' is not recognized as an internal or external command
That’s where I’m stuck. Do you have any suggestions? 😞
BTW, I used:
curl -sL <https://ctl.flyte.org/install> | bash
and got this:
flyteorg/flytectl info checking GitHub for latest tag
flyteorg/flytectl info found version: 0.6.4 for v0.6.4/Linux/x86_64
flyteorg/flytectl info installed ./bin/flytectl
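The installer reports `installed ./bin/flytectl`, a path relative to the directory the script ran in, which is not necessarily on PATH. It also fetched the Linux/x86_64 build, and a Linux binary will not run from a native Windows cmd prompt. A small sketch for checking whether (and where) a command resolves; the helper names are hypothetical:

```python
import shutil
from pathlib import Path
from typing import Optional


def locate(command: str) -> Optional[str]:
    """Return the full path the shell would run for `command`, or None if not on PATH."""
    # shutil.which searches the PATH directories (and honours PATHEXT on Windows).
    return shutil.which(command)


def installer_copy(cwd: str = ".") -> Path:
    """Where the install script reported putting the binary: ./bin/flytectl."""
    # The path is relative to the directory the script was run from.
    return Path(cwd, "bin", "flytectl").resolve()


if __name__ == "__main__":
    found = locate("flytectl")
    if found:
        print("flytectl resolves to:", found)
    else:
        copy = installer_copy()
        print("flytectl is not on PATH")
        print("installer copy:", copy, "exists:", copy.exists())
        print("consider adding this directory to PATH:", copy.parent)
```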
My concern is whether flytectl can be used on Windows at all, because someone else tried to reproduce this error on a Windows machine and received the same error. Has any Windows user been able to use flytectl?
karthikraj
07/20/2022, 4:02 AM
Taeef Najib
07/23/2022, 4:55 PM
pyflyte run --remote model.py:diabetes_xgboost_model
The first task (split_traintest_dataset) fails and I get this error:
[1/1] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes.
[ffeb1a0e5b4f44d81a8c-n0-0] terminated with exit code (1). Reason [Error]. Message:
cked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/root/model.py", line 14, in <module>
import joblib
ModuleNotFoundError: No module named 'joblib'
Traceback (most recent call last):
File "/usr/local/bin/pyflyte-fast-execute", line 8, in <module>
sys.exit(fast_execute_task_cmd())
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/flytekit/bin/entrypoint.py", line 506, in fast_execute_task_cmd
subprocess.run(cmd, check=True)
File "/usr/local/lib/python3.10/subprocess.py", line 524, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['pyflyte-execute', '--inputs', '<s3://my-s3-bucket/metadata/propeller/flytesnacks-development-ffeb1a0e5b4f44d81a8c/n0/data/inputs.pb>', '--output-prefix', '<s3://my-s3-bucket/metadata/propeller/flytesnacks-development-ffeb1a0e5b4f44d81a8c/n0/data/0>', '--raw-output-data-prefix', '<s3://my-s3-bucket/kh/ffeb1a0e5b4f44d81a8c-n0-0>', '--checkpoint-path', '<s3://my-s3-bucket/kh/ffeb1a0e5b4f44d81a8c-n0-0/_flytecheckpoints>', '--prev-checkpoint', '""', '--dynamic-addl-distro', '<s3://my-s3-bucket/bc/flytesnacks/development/NBHGKLLB7JMWQJQQLZNV66UGGU======/scriptmode.tar.gz>', '--dynamic-dest-dir', '.', '--resolver', 'flytekit.core.python_auto_container.default_task_resolver', '--', 'task-module', 'model', 'task-name', 'split_traintest_dataset']' returned non-zero exit status 1.
.
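The last frames show `import joblib` failing inside the task container, which suggests the image the task runs in lacks joblib even if it is installed locally. A hedged stdlib check that could be run inside that image; the module list is an assumption based on the traceback and the workflow name:

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level modules from `names` that cannot be imported here."""
    # find_spec returns None for a top-level module that is not installed,
    # without executing the module itself.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical dependency list for model.py; adjust to the real imports.
    required = ["joblib", "xgboost", "pandas", "sklearn"]
    missing = missing_modules(required)
    print("missing in this environment:", missing or "none")
```

Run locally it will likely report nothing missing; run with the same image the task uses, it would show the gap that the image needs to close.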
Any idea what's wrong? I faced a similar issue while trying the MNIST digits classification project.
Rupsha Chaudhuri
07/25/2022, 7:21 AM
flytectl update launchplan -p flyteexamples -d development {{ name_of_lp }} --activate
It's erroring out with:
launch plan version wasn't passed
Calvin Leather
07/26/2022, 7:20 PM
[0]: code:"ResourceDeletedExternally" message:"resource not found, name [e2e-workflows-development/fb2xnzxy-n2-0-0]. reason: pods \"fb2xnzxy-n2-0-0\" not found"
We then checked the control plane logs, and they suggested the pod was being evicted due to memory pressure (137 = k8s OOM status code):
"containerStatuses": [
  {
    "name": "fb2xnzxy-n2-0-0",
    "state": {
      "terminated": {
        "exitCode": 137,
        ....
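For context on the 137: exit codes above 128 conventionally mean the process was killed by signal (code - 128), so 137 = 128 + 9 = SIGKILL. The kernel OOM killer sends SIGKILL, but so do other sources, so 137 on its own is consistent with an OOM kill without proving it. A small decoder (the helper name is hypothetical):

```python
import signal
from typing import Optional


def signal_from_exit_code(code: int) -> Optional[str]:
    """Decode a shell/container exit code of the form 128 + N into a signal name."""
    if code <= 128:
        return None  # an ordinary exit status, not a signal
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        return None  # 128 + N where N is not a known signal on this platform


print(137, "->", signal_from_exit_code(137))  # 137 -> SIGKILL
```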
However, when we look at Grafana, we see that memory used is really low, way below requests/limits. We did find that the memory cache was quite high, and then found a k8s issue about memory cache being incorrectly counted as "used" memory by the kubelet when it evaluates memory pressure.
Not quite a Flyte issue, more of a k8s issue, but the log was a bit mysterious and we're still figuring out a resolution.
Taeef Najib
07/27/2022, 3:13 AM
flytekit using:
pip install flytekit
I tried reinstalling it, installed it with conda, removed it, and then installed it again using pip. But when I run flytekit status in my VS Code terminal, I get this error:
'flytekit' is not recognized as an internal or external command,
operable program or batch file.
I've added these PATHS to my environment variables [the image is attached]
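flytekit is a Python library; as far as I know its command-line entry point is `pyflyte`, and there is no `flytekit` executable, so `flytekit status` would fail even on a correct setup. For `pyflyte` itself, pip places console scripts in the interpreter's scripts directory (`Scripts\` on Windows; a per-user directory with `pip install --user`), and that directory must be on PATH. A sketch that reports both, run with the same Python that pip used (helper names are hypothetical):

```python
import importlib.util
import os
import shutil
import sys
import sysconfig


def scripts_dir() -> str:
    """Directory where pip installs console scripts for this interpreter."""
    return sysconfig.get_path("scripts")


def dir_on_path(directory: str) -> bool:
    # Compare normalised paths (normcase also makes this case-insensitive on Windows).
    target = os.path.normcase(os.path.normpath(directory))
    return any(
        os.path.normcase(os.path.normpath(entry)) == target
        for entry in os.environ.get("PATH", "").split(os.pathsep)
        if entry
    )


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("flytekit importable:", importlib.util.find_spec("flytekit") is not None)
    print("scripts dir:", scripts_dir(), "on PATH:", dir_on_path(scripts_dir()))
    print("pyflyte resolves to:", shutil.which("pyflyte"))
```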
Can anyone please give me a clue? Why are flytekit and its tools, like pyflyte, not recognized in my terminal (either VS Code or cmd)? Any help would be really appreciated. Thanks.
Ekku Jokinen
07/27/2022, 2:17 PM
Eduardo Apolinario (eapolinario)
07/27/2022, 6:35 PM
flytectl update task-resource-attribute and the task resource defaults defined in the Helm chart values file.
Just so we can dive deeper on this, can you tell us a bit about what you tried? We're probably missing some documentation around this area.
Ailin Yu
07/27/2022, 11:40 PM
ERROR: Permission to flyteorg/flyteplugins.git denied to niliayu.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
Bruno Freitas
07/28/2022, 12:33 PM
Andrew Achkar
07/28/2022, 5:54 PM
SagemakerBuiltinAlgorithmsTask and I would like to retrieve the TrainingJobName, which the backend plugin has access to.
Nada Saiyed
07/28/2022, 7:09 PM
sagemaker_custom_training plugin, and when the SageMaker TrainingJob is launched, it uses a flytekit_sagemaker_runner.py script as its ENTRYPOINT. Where can I find this script?
Rupsha Chaudhuri
07/28/2022, 7:55 PM
Arnaud Melin
07/29/2022, 8:36 AM
flytectl won't work. Is there any known issue installing Flyte on an M1 Apple chip? Thanks a lot for your help!
Tom Szumowski
07/29/2022, 6:11 PM
basic_workflow.py, but with the t2 task decorator changed to:
@task(container_image="python:3.7")
When I run it, I get this error:
[f902b0296c1a94ed4ade-n1-0] terminated with exit code (128). Reason [StartError]. Message:
failed to create containerd task: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "pyflyte-fast-execute": executable file not found in $PATH: unknown.
Is it possible to use any arbitrary image for a task there? Or does the image need to follow a specific build process that includes pyflyte-fast-execute?
Thank you!
Tom Szumowski
08/01/2022, 5:35 PM
basic_workflow.py to import a module function instead of having it defined directly. Here is the workflow code:
basic_workflow.py (modified):
import typing
from typing import Tuple

from example import example_fn
from flytekit import task, workflow


@task
def t1(a: int) -> typing.NamedTuple("OutputsBC", t1_int_output=int, c=str):
    # return a + 2, "world"
    return example_fn(a)


@task
def t2(a: str, b: str) -> str:
    return b + a


@workflow
def module_wf(a: int, b: str) -> Tuple[int, str]:
    x, y = t1(a=a)
    d = t2(a=y, b=b)
    return x, d
(notice from example import example_fn)
example.py (the imported module):
from typing import Tuple


def example_fn(a: int) -> Tuple[int, str]:
    return a + 2, "world"
When I execute:
pyflyte run --remote basic_workflow.py module_wf --a 5 --b hello
I get the error:
ModuleNotFoundError: No module named 'example'
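The error suggests that when the task runs remotely, `example.py` is not importable next to `basic_workflow.py`. Python resolves `from example import ...` by searching the directories in `sys.path`, so the sibling module has to be shipped with the code and its directory has to be on that path. A self-contained stdlib illustration; the temporary directory, module contents, and helper name are made up for the demo:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from(directory: str, module: str):
    """Import `module`, temporarily adding `directory` to sys.path."""
    sys.path.insert(0, directory)
    importlib.invalidate_caches()  # make sure newly written files are seen
    try:
        return importlib.import_module(module)
    finally:
        sys.path.remove(directory)


with tempfile.TemporaryDirectory() as tmp:
    # Recreate the example.py from the message in a throwaway directory.
    Path(tmp, "example.py").write_text(
        "def example_fn(a):\n    return a + 2, 'world'\n"
    )
    try:
        importlib.import_module("example")  # tmp is not on sys.path yet
    except ModuleNotFoundError as exc:
        print("without the directory on sys.path:", exc)
    mod = import_from(tmp, "example")
    print(mod.example_fn(5))  # (7, 'world')
```

In Flyte terms, this points at making sure the packaged code for the remote run includes `example.py` alongside `basic_workflow.py` (or that the module is baked into the task image); which option applies depends on the flytekit version and registration path in use.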
Mike Zhong
08/02/2022, 12:57 PM
Tom Szumowski
08/02/2022, 7:30 PM
print statements in my tasks, but not logging.info.
I was wondering what's needed to set up logging for GCP (GKE).
I found this document, but wasn't sure what settings would be appropriate for GCP. Figured I'd check here first before diving too deep.
https://docs.flyte.org/projects/cookbook/en/latest/auto/deployment/configure_logging_links.html#sphx-glr-auto-deployment-configure-logging-links-py
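Independently of the GCP log-link settings, one plausible reason `print` output appears while `logging.info` does not: Python's root logger defaults to level WARNING, so INFO records are dropped unless a level and handler are configured. A minimal stdlib sketch, assuming the task code logs through its own named logger (the logger name and helper are hypothetical):

```python
import logging
import sys


def configure_task_logging(level: int = logging.INFO) -> logging.Logger:
    """Send records at `level` and above to stdout, where container log collectors read."""
    logger = logging.getLogger("my_tasks")  # hypothetical logger name
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


log = configure_task_logging()
log.info("this INFO line is emitted")
# With the default root configuration (level WARNING), this INFO record is dropped.
logging.getLogger("unconfigured").info("this one is not shown")
```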
Thank you!
Andrew Achkar
08/03/2022, 2:56 PM
pyflyte package command (basically following this guide), I am running into a strange issue where the serialized task definitions end up with a resolver path like
--resolver site-packages.flytekit.core.python_auto_container.default_task_resolver
Note the leading site-packages. Does anyone know why this might be getting added, or how to resolve it?
Hanno Küpers
08/08/2022, 12:40 PM
AWS_ACCESS_KEY_ID as an environment variable in a task, I get the following error (here, I tried it with the example workflow from the documentation, https://docs.flyte.org/en/latest/getting_started/index.html, just adding os.environ["AWS_ACCESS_KEY_ID"]="secret_value" to a task). It seems that the environment variable is picked up when artifacts are stored on MinIO? Does anybody know how to resolve it? What am I missing here?
[1/1] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes.
[fc0d281b2d8144efcb5d-n0-0] terminated with exit code (1). Reason [Error]. Message:
cess exited with error code: 1. Stderr dump:
b'upload failed: ../tmp/flyte-0ws144nx/sandbox/local_flytekit/engine_dir/error.pb to <s3://my-s3-bucket/metadata/propeller/flytesnacks-development-fc0d281b2d8144efcb5d/n0/data/0/error.pb> An error occurred (AccessDenied) when calling the PutObject operation: Access Denied.\n'
Traceback (most recent call last):
File "/usr/local/bin/pyflyte-fast-execute", line 8, in <module>
sys.exit(fast_execute_task_cmd())
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/flytekit/bin/entrypoint.py", line 507, in fast_execute_task_cmd
subprocess.run(cmd, check=True)
File "/usr/local/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['pyflyte-execute', '--inputs', '<s3://my-s3-bucket/metadata/propeller/flytesnacks-development-fc0d281b2d8144efcb5d/n0/data/inputs.pb>', '--output-prefix', '<s3://my-s3-bucket/metadata/propeller/flytesnacks-development-fc0d281b2d8144efcb5d/n0/data/0>', '--raw-output-data-prefix', '<s3://my-s3-bucket/test/hx/fc0d281b2d8144efcb5d-n0-0>', '--checkpoint-path', '<s3://my-s3-bucket/test/hx/fc0d281b2d8144efcb5d-n0-0/_flytecheckpoints>', '--prev-checkpoint', '""', '--dynamic-addl-distro', '<s3://my-s3-bucket/b6/flytesnacks/development/A7HCCVU2345H3DD7M6S5QIAJ2U======/scriptmode.tar.gz>', '--dynamic-dest-dir', '/root', '--resolver', 'flytekit.core.python_auto_container.default_task_resolver', '--', 'task-module', 'example_workflow', 'task-name', 'generate_normal_df']' returned non-zero exit status 1.
.
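The stderr dump shows the failing step is an `upload failed: ... error.pb` call, which looks like an AWS CLI process spawned while flytekit writes task outputs. Setting `os.environ[...]` inside a task changes the environment of the whole process, and any child process started afterwards inherits it. A stdlib sketch of that inheritance plus a scoped alternative; `EXAMPLE_TOKEN` and the helper names are hypothetical stand-ins (deliberately not a real `AWS_*` variable):

```python
import contextlib
import os
import subprocess
import sys
from typing import Iterator


@contextlib.contextmanager
def temporary_env(name: str, value: str) -> Iterator[None]:
    """Set an environment variable for the duration of a block, then restore it."""
    previous = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous


def child_sees(name: str) -> str:
    # A child process (like an upload subprocess) inherits os.environ at spawn time.
    out = subprocess.run(
        [sys.executable, "-c", f"import os; print(os.environ.get({name!r}, '<unset>'))"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


with temporary_env("EXAMPLE_TOKEN", "secret_value"):
    print("inside block, child sees:", child_sees("EXAMPLE_TOKEN"))
print("after block, child sees:", child_sees("EXAMPLE_TOKEN"))
```

The scoped version only helps if nothing that needs the original credentials runs inside the block; keeping task-level credentials under names that do not collide with the `AWS_*` variables the storage layer reads is arguably the simpler fix.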
Mike Zhong
08/09/2022, 6:16 PM
efs volume in my eks cluster as a shared mount for ReadWriteMany. I have looked at this example and see the V1Volume being passed into the pod spec. I'm looking through these docs and see that you can create a V1Volume from a persistent_volume_claim. So in theory, I can just create a PV and PVC in my cluster, include them in my pod spec, and attach the spec to a Flyte task. But I noticed that PVCs are namespace-specific, and Flyte uses the project-domain namespace for tasks/workflows that are executing. Two questions:
1. Are PVCs the right solution here, and if so, how can I dynamically create PVCs for my project-domains? Is this something Flyte could be configured to do for us, or would we be responsible for ensuring that any referenced PVCs and PVs exist?
2. What other options are available for mounting shared persistent volumes to my Flyte tasks?
Chris Antenesse
08/11/2022, 4:39 PM
flytectl config set and be able to interact with the cluster via the command line with a minimum configuration before doing those two things.
My minimal config looks like this:
admin:
  endpoint: dns:///admin.flyte.us3.predictap.com
  insecure: true
My flytectl version output is this:
{
  "App": "flytectl",
  "Build": "62b86f6",
  "Version": "0.6.7",
  "BuildTime": "2022-08-11 11:33:31.781304 -0500 CDT m=+0.021446651"
}
I think my config is good enough; I ran flytectl config validate:
chrisantenesse@Chriss-MacBook-Pro-2 ~ % flytectl config validate
Couldn't find a config file.
Validated config file successfully.
but when I do something like flytectl get projects, I'm getting this:
{"json":{},"level":"error","msg":"failed to initialize token source provider. Err: failed to fetch auth metadata. Error: rpc error: code = Unavailable desc = connection closed","ts":"2022-08-11T11:35:12-05:00"}
{"json":{},"level":"warning","msg":"Starting an unauthenticated client because: can't create authenticated channel without a TokenSourceProvider","ts":"2022-08-11T11:35:12-05:00"}
{"json":{},"level":"info","msg":"Initialized Admin client","ts":"2022-08-11T11:35:12-05:00"}
Error: rpc error: code = Unavailable desc = connection closed
{"json":{},"level":"error","msg":"rpc error: code = Unavailable desc = connection closed","ts":"2022-08-11T11:35:12-05:00"}
I opened a shell on both flyteadmin pods and was able to confirm that traffic coming through the ingress was actually hitting the pods. Basically, I ran watch -n1 netstat -anp and watched a connection get established. I also ran tcpdump locally and watched my local machine make the outbound request, etc., so I don't think this is due to a misconfiguration on the k8s side of things.
I ran kubectl logs … on both pods, but never saw entries from the flyteadmin server indicating that something happened (good or bad).
I'm new to the Flyte world and appreciate the help in advance!
allen
08/11/2022, 6:29 PM
flytectl register files --project flytesnacks --domain development --archive flyte-package.tgz --version v1
to run the example flyte wf on flyte in my AWS cluster, but
root issue: Flyteadmin is having an issue registering workflows, it’s giving me: 400, request id: 4f2d416c-94bb-40dd-9972-e89e7d9cb0db and base container: s3://<my s3 bucket name>","ts":"2022-08-11T14:20:02-04:00"}
what I think is missing: I think this is an access issue. I have an S3 bucket with an IAM managed policy allowing access to all S3 operations, and I associated that policy with the Flyte system role I created here: https://docs.flyte.org/en/latest/deployment/aws/manual.html#flyte-system-role. I think that's all the setup needed, but I'm unclear how Flyteadmin actually associates the role, since we don't specify that anywhere. Does anyone know how to resolve this, or have any other tips?
Chris Poptic
08/11/2022, 10:36 PM
.py file (e.g. download_data.py, preprocess_data.py, train_model.py, eval_model.py, etc.).
Currently, I have wrangled these scattered .py scripts into somewhat of a workflow using a Makefile, such that each step in the pipeline can be executed through a make command (e.g. make download_data, make preprocess_data, etc.).
The target of each Makefile step calls a .sh shell script that executes the .py file for that step.
The command make run_entire_pipeline calls each of the ~7 steps in sequence, as a rudimentary (linear) DAG.
Obviously this rough pipeline misses a few benefits such as caching earlier steps such that they do not need to be executed if they've already been performed (e.g. no need to re-download data on a subsequent model training pipeline run if the data has already been downloaded on an earlier run of the pipeline and if there have been no changes in that data).
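The caching behaviour described above (skip a step when its inputs have not changed) boils down to keying each step's result on a hash of its inputs plus a version. Flyte exposes this per task (in flytekit 1.x via `@task(cache=True, cache_version=...)`); the stdlib sketch below only illustrates the idea and is not how Flyte implements it. The step name, URL, and helpers are hypothetical stand-ins for the Makefile targets:

```python
import hashlib
import json
from typing import Any, Callable, Dict

CacheStore = Dict[str, Any]


def cache_key(step: str, version: str, inputs: Dict[str, Any]) -> str:
    # Same step + version + (JSON-serialisable) inputs => same key, so the step can be skipped.
    payload = json.dumps({"step": step, "version": version, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def run_cached(store: CacheStore, step: str, version: str,
               fn: Callable[..., Any], **inputs: Any) -> Any:
    key = cache_key(step, version, inputs)
    if key not in store:  # cache miss: actually run the step
        store[key] = fn(**inputs)
    return store[key]


calls = []


def download_data(url: str) -> str:  # hypothetical stand-in for download_data.py
    calls.append(url)
    return f"data from {url}"


store: CacheStore = {}
run_cached(store, "download_data", "1", download_data, url="https://example.org/slides")
run_cached(store, "download_data", "1", download_data, url="https://example.org/slides")
print(len(calls))  # 1: the second call was served from the cache
```

Bumping the version string (as with `cache_version`) is what forces a re-run when the step's code changes but its inputs do not.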
What is the best way to migrate this Make-based workflow into a Flyte-based workflow? Specifically, is there a way to map each .py script to a @task when building a @workflow pipeline in Flyte? I learned about the Flyte "Script mode", and it sounds somewhat akin to what I'm trying to do, but I'm totally new to Flyte. Thanks for any help and direction.
I'm working with very large digital pathology whole slide images (WSIs), BTW. Does Flyte support inputs of the WSI variety, i.e. .mrxs, .tiff, .czi, .jpeg, .png, etc.?
Eric Hsiao
08/12/2022, 9:09 PM
kubectl -n flytesnacks-development create secret generic common-secrets --from-literal=TEST_SECRET=blah
kubectl -n flytesnacks-development get secret/common-secrets -o json | jq '.data | to_entries | map(.value= (.value | @base64d))'
>> [
  {
    "key": "TEST_SECRET",
    "value": "blah"
  }
]
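The `@base64d` step in the jq filter is needed because Kubernetes stores Secret `data` values base64-encoded. The same decoding in Python, applied to the JSON shape returned by `kubectl get secret -o json` (the sample payload is constructed for the demo, not copied from a cluster; the helper name is hypothetical):

```python
import base64
import json
from typing import Dict


def decode_secret_data(secret_json: str) -> Dict[str, str]:
    """Decode the base64-encoded `data` map of a Kubernetes Secret manifest."""
    secret = json.loads(secret_json)
    return {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in secret.get("data", {}).items()
    }


sample = json.dumps({
    "kind": "Secret",
    "metadata": {"name": "common-secrets", "namespace": "flytesnacks-development"},
    "data": {"TEST_SECRET": base64.b64encode(b"blah").decode()},
})
print(decode_secret_data(sample))  # {'TEST_SECRET': 'blah'}
```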
Within the flytesnacks-development namespace, I'm running a task like this:
from flytekit import Secret, current_context, task

@task(secret_requests=[Secret(group="common-secrets", key="TEST_SECRET")])
def print_secret() -> str:
    secrets = current_context().secrets
    return secrets.get("common-secrets", "TEST_SECRET")
However, this fails with the following
Unable to find secret for key TEST_SECRET in group common-secrets in Env Var:_FSEC_COMMON-SECRETS_TEST_SECRET and FilePath: /root/secrets/common-secrets/test_secret
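The error message spells out the two places that were checked: an environment variable `_FSEC_<GROUP>_<KEY>` and a file `<dir>/<group>/<key>`. As far as I understand, in Flyte those are normally populated by the secret-injection pod webhook when it mutates the task pod, so if neither exists, the webhook may not have processed the pod. The sketch below mirrors only the lookup order and naming implied by this one error message; it is not flytekit's code, and the helper names are hypothetical:

```python
import os
from pathlib import Path
from typing import Optional

ENV_PREFIX = "_FSEC_"  # prefix seen in the error message


def env_var_name(group: str, key: str) -> str:
    # Error shows: _FSEC_COMMON-SECRETS_TEST_SECRET
    return f"{ENV_PREFIX}{group.upper()}_{key.upper()}"


def file_path(base_dir: str, group: str, key: str) -> Path:
    # Error shows: /root/secrets/common-secrets/test_secret
    return Path(base_dir, group.lower(), key.lower())


def lookup_secret(group: str, key: str, base_dir: str = "/root/secrets") -> Optional[str]:
    """Check the env var first, then the mounted file; None if neither exists."""
    value = os.environ.get(env_var_name(group, key))
    if value is not None:
        return value
    path = file_path(base_dir, group, key)
    if path.is_file():
        return path.read_text().strip()  # drop a trailing newline, if any
    return None


print(env_var_name("common-secrets", "TEST_SECRET"))  # _FSEC_COMMON-SECRETS_TEST_SECRET
print(file_path("/root/secrets", "common-secrets", "TEST_SECRET"))
```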
From looking at the code in the SecretManager, it looks like it only checks the env variable or a file path (which does not exist because I'm using k8s secrets as an additional provider). Am I missing something? I've checked that the k8s secret is in the same namespace as the task being run.
James Brady
08/17/2022, 1:49 PM
James Brady
08/17/2022, 2:27 PM
allen
08/17/2022, 10:38 PM
sandbox to run some basic workflows, and I'm trying to inspect the contents of a FlyteDirectory output, which is located at <s3://my-s3-bucket/vd/f9697eef1ae7e4e76a7b-n0-0/fc26fb47001a716d23f2da1eb06baed2>. However, when I try <http://localhost:30084/my-s3-bucket/vd/f9697eef1ae7e4e76a7b-n0-0/fc26fb47001a716d23f2da1eb06baed2%22>, I get access denied. How do I see my outputs? Many thanks!
James Brady
08/18/2022, 8:37 AM
serialize make target in these deployment docs. Is there a Makefile we're expected to use?