seunggs
09/12/2022, 3:00 AMRequest failed with status code 500 failed to create workflow in propeller namespaces <project-name> not found
. The project already shows up in the flyte console and I was able to package/register the workflow, but I can’t seem to execute it?seunggs
09/12/2022, 3:02 AMPOST /api/v1/projects
) with the following payload:seunggs
09/12/2022, 3:03 AMproject: {
id,
name,
description
}
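For reference, the request body described above can be assembled as follows. This is a minimal sketch that only builds the JSON; `build_project_payload` is a hypothetical helper, the field values are placeholders, and nothing is sent to FlyteAdmin.

```python
import json


def build_project_payload(project_id: str, name: str, description: str) -> str:
    """Return the JSON body for POST /api/v1/projects as a string."""
    body = {
        "project": {
            "id": project_id,
            "name": name,
            "description": description,
        }
    }
    return json.dumps(body)


# Placeholder values for illustration only.
payload = build_project_payload("myproject", "myproject", "example project")
print(payload)
```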
Andrew Achkar
09/12/2022, 3:54 PMpylint
on a python module contains TypeVar('tar.gz')
which is used in specifying a FlyteFile
. Details in 🧵Ailin Yu
09/14/2022, 6:22 PMprometheus.io/path=/stats/prometheus
prometheus.io/port=15020
prometheus.io/scrape=true
Although I’ve confirmed that Prometheus is scraping the pods with those annotations, I’m not seeing any Flyte-provided metrics, so the dashboards are not populating. Is there some config I need to enable in flytepropeller in order for metrics to be emitted?
James Evers
09/14/2022, 6:38 PM/dev/shm
. what's the best approach to bumping this default in the sandbox deployment?
/usr/local/lib/python3.8/dist-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated
"class": algorithms.Blowfish,
2022-09-14 18:10:43,190 WARNING services.py:1882 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 67108864 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=0.10gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
2022-09-14 18:10:46,611 INFO worker.py:1509 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
[2022-09-14 18:10:57,864 E 1 1] core_worker.cc:149: Failed to register worker 01000000ffffffffffffffffffffffffffffffffffffffffffffffff to Raylet. IOError: [RayletClient] Unable to register worker with raylet. No such file or directory
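The 67108864 bytes in the Ray warning works out to 64 MiB, which matches Docker's default /dev/shm size. Below is a small sketch for checking what a container actually sees. `bytes_to_mib` and `mount_size_bytes` are hypothetical helper names, and the path is an argument so it can be pointed at any mount.

```python
import shutil


def bytes_to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return n_bytes / (1024 * 1024)


def mount_size_bytes(path: str = "/dev/shm") -> int:
    """Return the total size in bytes of the filesystem that holds `path`."""
    return shutil.disk_usage(path).total


# The value quoted in the warning above.
warned_size = 67108864
print(f"{warned_size} bytes = {bytes_to_mib(warned_size):.0f} MiB")
```

Inside the task container, `bytes_to_mib(mount_size_bytes())` would show whether a change to the shared-memory size has actually taken effect.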
Justin Tyberg
09/14/2022, 11:48 PMflyte
namespace
• flytectl running in foo
namespace
If I run flytectl from within the pod in foo
namespace, and hit the external endpoint (through the ingress), it works
bin/flytectl get projects \
--admin.endpoint dns:///EXTERNAL_FQDN:443 \
--admin.authType ClientSecret \
--admin.clientId flytepropeller \
--admin.clientSecretLocation /etc/secrets/client_secret
INFO[0000] [0] Couldn't find a config file []. Relying on env vars and pflags.
----- ------ -----------------
| ID | NAME | DESCRIPTION |
----- ------ -----------------
| dpp | dpp | dpp description |
----- ------ -----------------
1 rows
However, if I try to hit the internal grpc endpoint, I get nada. No output.
bin/flytectl get projects \
> --admin.endpoint dns:///flyteadmin.flyte.svc.cluster.local:81 \
> --admin.authType ClientSecret \
> --admin.clientId flytepropeller \
> --admin.insecure true \
> --admin.clientSecretLocation /etc/secrets/client_secret
INFO[0000] [0] Couldn't find a config file []. Relying on env vars and pflags.
echo $?
0
🤔KS Tarun
09/15/2022, 9:36 AMJames Evers
09/15/2022, 5:53 PMseunggs
09/16/2022, 2:35 AMseunggs
09/16/2022, 2:35 AM[4/4] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes.
[a29fz6979nbn4dxsr7km-n0-3] terminated with exit code (1). Reason [Error]. Message:
Traceback (most recent call last):
File "/opt/venv/bin/pyflyte-execute", line 8, in <module>
sys.exit(execute_task_cmd())
File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/opt/venv/lib/python3.8/site-packages/flytekit/bin/entrypoint.py", line 470, in execute_task_cmd
_execute_task(
File "/opt/venv/lib/python3.8/site-packages/flytekit/exceptions/scopes.py", line 160, in system_entry_point
return wrapped(*args, **kwargs)
File "/opt/venv/lib/python3.8/site-packages/flytekit/bin/entrypoint.py", line 348, in _execute_task
_handle_annotated_task(ctx, _task_def, inputs, output_prefix)
File "/opt/venv/lib/python3.8/site-packages/flytekit/bin/entrypoint.py", line 291, in _handle_annotated_task
_dispatch_execute(ctx, task_def, inputs, output_prefix)
File "/opt/venv/lib/python3.8/site-packages/flytekit/bin/entrypoint.py", line 80, in _dispatch_execute
logger.debug(f"Starting _dispatch_execute for {task_def.name}")
AttributeError: 'function' object has no attribute 'name'
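The last frame shows that the object handed to `_dispatch_execute` was a plain Python function and not a task object with a `name` attribute. One way this can happen, offered as an assumption and not as a diagnosis of this particular run, is a custom decorator that wraps a task and returns an ordinary function. The sketch below uses a stand-in `FakeTask` class instead of flytekit so it runs anywhere; every name in it is hypothetical.

```python
class FakeTask:
    """Stand-in for a task object; real flytekit tasks are more involved."""

    def __init__(self, fn):
        self.fn = fn
        self.name = f"{fn.__module__}.{fn.__name__}"

    def __call__(self, *args, **kwargs):
        return self.fn(*args, **kwargs)


def fake_task(fn):
    """Hypothetical decorator that turns a function into a FakeTask."""
    return FakeTask(fn)


def logged(inner):
    """Decorator that returns a plain function, hiding the task object."""

    def wrapper(*args, **kwargs):
        return inner(*args, **kwargs)

    return wrapper


@fake_task
def visible(x):
    return x + 1


@logged
@fake_task
def hidden(x):
    return x + 1


print(hasattr(visible, "name"))  # the task object is still exposed
print(hasattr(hidden, "name"))   # only the plain wrapper function is exposed
```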
seunggs
09/16/2022, 4:18 AM/workflows
REST API endpoint - I see there’s an archive option in the dashboard but I don’t see a PUT endpoint for /workflows
- is this expected or am I missing something?Leong Shing Chew
09/16/2022, 2:16 PMKatrina P
09/16/2022, 4:32 PMGeorge D. Torres
09/16/2022, 5:10 PMZachary Kimble
09/16/2022, 5:22 PMflytectl sandbox start
I created a secrets file acr-secrets-flytesnacks-development.yaml
apiVersion: v1
data:
.dockerconfigjson: ***Base64 encoded json***
kind: Secret
metadata:
name: acr-pull-credentials
namespace: flyte-development
type: kubernetes.io/dockerconfigjson
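For context, the `.dockerconfigjson` value in a secret of this type is the base64 encoding of a Docker config JSON document. Below is a minimal sketch of producing it, assuming username/password registry credentials. The registry host, username, and password are made-up placeholders, and `encode_dockerconfigjson` is a hypothetical helper.

```python
import base64
import json


def encode_dockerconfigjson(registry: str, username: str, password: str) -> str:
    """Return the base64 string to place under data['.dockerconfigjson']."""
    # The per-registry "auth" field is base64("username:password").
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    config = {
        "auths": {
            registry: {"username": username, "password": password, "auth": auth}
        }
    }
    return base64.b64encode(json.dumps(config).encode()).decode()


# Placeholder credentials, not taken from the thread.
encoded = encode_dockerconfigjson(
    "example.azurecr.io", "placeholder-user", "placeholder-pass"
)

# Decoding the value again is a quick check that the secret holds what you expect.
decoded = json.loads(base64.b64decode(encoded))
print(sorted(decoded["auths"]))
```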
I applied the secret in the project-domain namespace with
kubectl -n flytesnacks-development apply -f secrets/acr-secrets-flytesnacks-development.yaml
I patched the default service account within the project-domain namespace :
kubectl -n flytesnacks-development patch serviceaccount default -p '{"imagePullSecrets": [{"name": "acr-pull-credentials"}]}'
I run my workflow with
pyflyte run --remote --image ***.azurecr.io/databricks_workflow:latest databricks_wf.py databricks_workflow --sql 'select...'
The pods get stuck in pending and describe shows the following issue:
Failed to pull image "***.azurecr.io/databricks_workflow:latest": rpc error: code = Unknown desc = Error response from daemon: Head "https://*****.azurecr.io/v2/databricks_workflow/manifests/latest": unauthorized: authentication required, visit https://aka.ms/acr/authorization for more information.
Placing the decoded secret in .docker/config.json allows docker pull
to work locally. Thanks for any suggestions!
seunggs
09/16/2022, 5:33 PMflyte-executor
with the flyte-user-role
(which has full s3 access) attached as an annotation and running Flyte executions with this service account, but it’s giving me PutObject access denied error. This service account is in the project+domain namespace. What am I doing wrong?seunggs
09/16/2022, 5:37 PMName: flyte-executor
Namespace: myproject-development
Labels: app.kubernetes.io/managed-by=pulumi
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::xxx:role/flyte-user-role
Image pull secrets: gcr-json-key
...
Alex Bain
09/16/2022, 10:00 PMpriorityClassName
(and perhaps preemptionPolicy
) on all the k8s pods for Flyte (and Spark executor) task pods that run in my EKS cluster. Could you give me a suggestion as to how to accomplish this?Sebastian
09/19/2022, 7:10 AMwf
and launch plan like
launch_plan.LaunchPlan.get_or_create(
workflow=wf,
name="my_lp",
schedule=CronSchedule("0 0 ? * * *"),
default_inputs={...},
fixed_inputs={...},
)
how do I schedule this after registering it alongside my workflows with pyflyte serialize
and flytectl register
? I think the relevant docs are supposed to be https://docs.flyte.org/projects/flytectl/en/latest/gen/flytectl_update_launchplan.html but running the specified commands fails with launch plan prod failed to update due to rpc error: code = NotFound desc
. This command requires a version, which I don't have, so that might be the cause. How should I proceed?
Some side note feedback: It feels to me like activating launch plans is more naturally done in the web UI. However, trying to open my launch plan errors with
Error: Minified React error #31; visit <https://reactjs.org/docs/error-decoder.html?invariant=31&args[]=object%20with%20keys%20%7Brole_session_name%2C%20region_name%2C%20role_arn%7D&args[]=> for the full message or use the non-minified dev environment for full errors and additional helpful warnings.
Which probably comes from me using struct arguments (role_session_name, region_name, role_arn are in a dataclass), but it nevertheless seems like there is a bug in the launch plan rendering.
KS Tarun
09/19/2022, 8:57 AM[1/1] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes.
[a5dqkgsjw2gf59jzhpgn-n1-0] terminated with exit code (1). Reason [OOMKilled]. Message:
tar: Removing leading `/' from member names
Traceback (most recent call last):
File "/opt/venv/bin/pyflyte-fast-execute", line 8, in <module>
sys.exit(fast_execute_task_cmd())
File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/opt/venv/lib/python3.8/site-packages/flytekit/bin/entrypoint.py", line 507, in fast_execute_task_cmd
subprocess.run(cmd, check=True)
File "/usr/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['pyflyte-execute', '--inputs', 's3://my-s3-bucket/metadata/propeller/flytesnacks-staging-a5dqkgsjw2gf59jzhpgn/n1/data/inputs.pb', '--output-prefix', 's3://my-s3-bucket/metadata/propeller/flytesnacks-staging-a5dqkgsjw2gf59jzhpgn/n1/data/0', '--raw-output-data-prefix', 's3://my-s3-bucket/test/lh/a5dqkgsjw2gf59jzhpgn-n1-0', '--checkpoint-path', 's3://my-s3-bucket/test/lh/a5dqkgsjw2gf59jzhpgn-n1-0/_flytecheckpoints', '--prev-checkpoint', '""', '--dynamic-addl-distro', 's3://my-s3-bucket/gp/flytesnacks/staging/NTHIUJ632K3UNCS775EH722XVM======/fast31220425e74d24c7973f254bb8ecf02f.tar.gz', '--dynamic-dest-dir', '/root', '--resolver', 'flytekit.core.python_auto_container.default_task_resolver', '--', 'task-module', 'get_io_f7', 'task-name', 'get_idv']' died with <Signals.SIGKILL: 9>.
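The `died with <Signals.SIGKILL: 9>` text is how Python's `subprocess` module reports a child process that was killed by signal 9, which is consistent with the `OOMKilled` reason above. Below is a small sketch that reproduces the same exception shape on a POSIX system by having a child process kill itself. It does not involve Flyte or any memory limit.

```python
import signal
import subprocess
import sys

# The child sends SIGKILL to itself, standing in for the kernel's OOM killer.
child_code = "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"

returncode = 0
message = ""
try:
    subprocess.run([sys.executable, "-c", child_code], check=True)
except subprocess.CalledProcessError as err:
    # A negative return code means "terminated by signal number -returncode".
    returncode = err.returncode
    message = str(err)

killed_by = signal.Signals(-returncode).name if returncode < 0 else None
print(returncode, killed_by)
print(message)
```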
Robin Eklund
09/19/2022, 11:30 AMModuleNotFoundError: No module named 'workflows'
.
These are the commands I am running:
# build docker image
docker build -f path/to/Dockerfile -t ${TAG} .
# deploy to ECR
docker push <docker image>
# package
pyflyte --pkgs workflows package -f --image <docker image>
# register
flytectl register files --project <project> --domain <domain> --archive flyte-package.tgz --version <version>
# generate launchplan
flytectl get launchplan --project <project> --domain <domain> workflows.example.my_wf --latest --execFile execution_spec.yaml
# execute workflow
flytectl create execution --project <project> --domain <domain> --execFile execution_spec.yaml
This is the execution_spec.yaml:
iamRoleARN: ""
inputs: {}
kubeServiceAcct: ""
targetDomain: ""
targetProject: ""
version: v6
workflow: workflows.example.my_wf
These are the files:
.
├── docker_build_and_tag.sh
├── Dockerfile
├── execution_spec.yaml
├── flyte.config
├── flyte-package.tgz
├── Makefile
├── README.md
├── requirements.test.txt
├── requirements.txt
└── workflows
├── example.py
├── __init__.py
All these files and folders are copied into /root in the docker image.
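One way to narrow down a `ModuleNotFoundError` like this is to check whether the package is discoverable from the directory the files were copied to. Below is a minimal sketch using only the standard library. `find_package` is a hypothetical helper, and running it inside the built image is an assumption about how you would use it, not something Flyte requires.

```python
import sys
from importlib.machinery import PathFinder
from typing import Optional


def find_package(name: str, directory: str) -> Optional[str]:
    """Return where `name` would be loaded from if only `directory` were searched."""
    spec = PathFinder.find_spec(name, [directory])
    if spec is None:
        return None
    # Regular packages report their __init__.py; namespace packages have no origin.
    return spec.origin or str(list(spec.submodule_search_locations or []))


# Compare the interpreter's actual search path with the expected location.
print("sys.path:", sys.path)
print("workflows found at:", find_package("workflows", "/root"))
```

If the second line prints a location but the task still fails, the directory is probably missing from `sys.path` (or the working directory differs) when the task starts.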
Does anyone have a good way of debugging this? My initial thought was to start the docker container and then execute whatever Flyte is executing, but I'm not sure what that is. Or does someone already know what I am missing?
Sebastian
09/19/2022, 12:58 PMflyte/workflows/{dev|prod}
but I can't find a way to tell flyte where to look for resources when serializing (!?) apart from the default flyte/workflows
. What is the intended way to do this please?Ketan (kumare3)
Matheus Moreno
09/19/2022, 8:33 PMreturn
statement confirm that), but it hangs at the end and never actually finishes. We were able to reproduce it in our remote server and locally. On the remote server, none of the prints (or logs) are being shown on Stackdriver. What could be happening?Robert Everson
09/19/2022, 9:03 PMMartin Hwasser
09/20/2022, 12:40 PMMike Carley
09/20/2022, 2:19 PMCalvin Leather
09/20/2022, 4:53 PMJames Evers
09/21/2022, 4:54 PMflytectl sandbox start --source .
), but when I try to build another image in the sandbox (flytectl sandbox exec -- <docker build statement>
), the pod fails to pull the image even though I can see that it's been successfully built via flytectl sandbox exec -- docker image ls
. Does anyone have any idea where I might be going wrong?