Hey guys, how do I add new packages to a Flyte project that is running in an AWS deployment? For example, I created a project and ran a workflow using the pyflyte command, and it ran properly. Now I want to add a new library, let's say opencv, to the task and workflow. Where do I install the opencv library? I tried installing it with pip, but when I ran the workflow it said package not found. Then I figured out the project is running inside a Docker image. How does a user add the library to the image, or is there any other way to fix this problem?
Or you can write a Dockerfile, build and push an image, and pass it to the relevant pyflyte command with --image.
When I use image spec and run
    pyflyte run --remote --image imagespec.yaml new.py standard_scale_workflow --values '[1.0, 2.0, 3.0, 4.0, 5.0]'
I get the error below:
    Failed with Unknown Exception <class 'Exception'> Reason: Builder envd is not registered.
Can you please let me know how to resolve it?
You need to pip install flytekitplugins-envd.
Yes, I installed it, but the error still persists.
You shouldn't be seeing that error if you install the plugin. You can also specify imagespec in the Python file itself. Can you try that way?
    cv2_image_spec = ImageSpec(
        base_image="cr.flyte.org/flyteorg/flytekit:py3.10-1.9.0",
        packages=["opencv-python"],
        env={"Debug": "True"},
    )

    if cv2_image_spec.is_container():
        import cv2

    @task
    def mean(values: List[float]) -> float:
        print(cv2.version)
        return sum(values) / len(values)

This is my code, and I am running the command below:
    pyflyte run --remote new.py standard_scale_workflow --values '[1.0, 2.0, 3.0, 4.0, 5.0]'
I still get an error:
    Failed with Unknown Exception <class 'ModuleNotFoundError'> Reason: No module named 'cv2'
Please help me out.
Or can you tell me what exactly needs to be added in imagespec.yaml?
    # imageSpec.yaml
    python_version: 3.10
    registry: pingsutw
    packages:
      - sklearn
      - opencv-python
    env:
      Debug: "True"
What do I put in place of registry? Should I leave it the same?
About the No module named 'cv2' error: can you install opencv in your local environment?
Even if I install opencv in my local env, the module is not found when I do pyflyte run --remote.
I tried creating a new conda env and installed flytekitplugins-envd. Then when I run
    pyflyte run --remote --image imagespec.yaml new.py standard_scale_workflow --values '[1.0, 2.0, 3.0, 4.0, 5.0]'
it builds a Docker image, but I get this error:
    Failed with Unknown Exception <class 'Exception'> Reason: failed to run command envd build --path /tmp/flyte-s5m5ghk_/sandbox/local_flytekit/e3c3143d14bce47e61050188b2791fa7 --platform linux/amd64 --output type=image,name=pingsutw/flytekit:t93nYMZ9tvO68GDt0g1xRg..,push=true with error b'time="2023-09-05T09:25:44Z" level=fatal msg="failed to create the builder: failed to create buildkit client: failed to bootstrap the buildkitd: failed to create container: Error response from daemon: invalid mount config for type \\"bind\\": bind source path does not exist: /home/ngupta/.config/envd"\n'
I guess if you help me out with this, my work will be done.
This looks more like a Docker-related issue to me.
Can you point me to any tutorials where I can run actual ML pipelines, like downloading the data, preprocessing, training, and evaluating? I would also like to know which data types to use for TF models, torch models, etc. as task return values.
Any tutorial link I can get?
I just shared the link above.
    [1/1] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes. [f6d65c22f831647dca1e-n0-0] terminated with exit code (247). Reason [OOMKilled]. Message: tar: Removing leading `/' from member names
Do you have any idea why I get this message while running a task? The task downloads the MNIST data, and it is failing, by the way.
I know it is because of the resource limit, but I have 900GB of memory. How do I set the memory for a project?
    [1/1] currentAttempt done. Last Error: UNKNOWN::Outputs not generated by task execution
I am getting the above error and don't know why. The output is a tuple of numpy arrays.
Oh. Can you share the code?
Never mind, I fixed it, thanks. Is there any way I can mount a PVC to a task? I have a task that creates a model, and I want to save it so that I can access it in another task. I am on AWS and running Flyte inside it.
If you could let me know how a PVC can be attached to Flyte tasks, it would be really helpful.
You can use pod template or the pod plugin.
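To give a rough idea of what the pod template has to carry, here is a sketch in plain Python that builds the relevant pod-spec fragment as a dict. The claim name model-store, the mount path /mnt/models, and the container name primary are all assumptions for illustration, and the PVC itself must already exist in the namespace where the task pods run.

```python
import json


def pvc_pod_spec(claim_name: str, mount_path: str, container_name: str = "primary") -> dict:
    """Build a pod-spec fragment that mounts an existing PVC into one container."""
    volume_name = f"{claim_name}-vol"
    return {
        "containers": [
            {
                "name": container_name,
                # The mount refers to the volume by name.
                "volumeMounts": [{"name": volume_name, "mountPath": mount_path}],
            }
        ],
        "volumes": [
            # The volume refers to the PVC by its claim name.
            {"name": volume_name, "persistentVolumeClaim": {"claimName": claim_name}}
        ],
    }


# Hypothetical values; replace them with your own claim and path.
spec = pvc_pod_spec("model-store", "/mnt/models")
print(json.dumps(spec, indent=2))
```

Both tasks would then read and write under the mount path. In flytekit the same fields can be expressed through a PodTemplate attached to the task, but check the docs for the exact classes in your version.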
Hey Samhita, may I know how to pass a private image when running a pyflyte run command? Below is the command I am running right now, which uses a public image:
    pyflyte run --remote --project flytetester --domain development --image dkubex123/my_flyte_image:latest mnist.py mnist_workflow
If dkubex123/my_flyte_image:latest were private, how would I run the command? How do I pass the registry username and password, or is there any other option?
Once I create a secret, where do I configure it? I mean, on any specific pod?
You will need to configure the secrets in the backend as mentioned in the guide. Aren't you able to?
The guide does not mention where to configure it. Can you guide me, please?
https://docs.flyte.org/projects/cookbook/en/latest/auto_examples/development_lifecycle/private_images.html#configure-imagepullsecrets You will need to add imagePullSecrets to the default or a custom service account and use that same service account when triggering an execution. Or you can add them to the pod template. Here's an example: https://flyte-org.slack.com/archives/CP2HDHKE1/p1687941857393889 (this doesn't have imagePullSecrets though)
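In case it helps to see the shape of the objects involved, here is a sketch in plain Python that builds a registry secret manifest and the service account patch that references it. The secret name regcred, the namespace flytetester-development (assuming the usual project-domain namespace naming), and the credentials are placeholders, and both helper functions are made up for illustration.

```python
import base64
import json


def docker_registry_secret(name, namespace, server, username, password):
    """Build a kubernetes.io/dockerconfigjson Secret manifest as a dict."""
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    docker_config = {
        "auths": {server: {"username": username, "password": password, "auth": auth}}
    }
    encoded = base64.b64encode(json.dumps(docker_config).encode()).decode()
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/dockerconfigjson",
        "data": {".dockerconfigjson": encoded},
    }


def service_account_patch(secret_name):
    """Patch body that makes a service account use the pull secret."""
    return {"imagePullSecrets": [{"name": secret_name}]}


# Placeholder values only; never hard-code real credentials.
secret = docker_registry_secret(
    "regcred", "flytetester-development", "https://index.docker.io/v1/", "user", "pw"
)
print(json.dumps(secret, indent=2))
print(json.dumps(service_account_patch("regcred")))
```

The secret has to live in the namespace where the task pods run, not in the flyte namespace, because the pull happens when the task pod starts.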
    db-pass                       Opaque               1   12d
    flyte-admin-secrets           Opaque               4   12d
    flyte-pod-webhook             Opaque               3   12d
    flyte-secret-auth             Opaque               1   12d
    sh.helm.release.v1.flyte.v1   helm.sh/release.v1   1   12d
So these are the secrets in the flyte namespace. Do you think adding the Docker Hub user and password to any of them would get the work done?
What secrets do you have in the flytesnacks namespace?
I figured it out. I have another question: I enabled CloudWatch logs in AWS. When a task runs successfully and I click the logs option in the Flyte UI, I get redirected to the AWS CloudWatch console, but I get the errors below:
    • There was an error getting log events.
    • The specified log stream does not exist.
I tried multiple setups and the issue still persists. Please help.
Did you double check the template URI?
What do you want me to check exactly? It asks me to provide the log group name:
    userSettings:
      accountNumber:
      accountRegion:
      dbPassword:
      rdsHost:
      bucketName:
      logGroup:
For logGroup I give the AWS CloudWatch log group name.
    task_logs:
      plugins:
        logs:
          kubernetes-enabled: false
          # -- One option is to enable cloudwatch logging for EKS, update the region and log group accordingly
          # You can even disable this
          cloudwatch-enabled: true
          # -- region where logs are hosted
          cloudwatch-region: "{{ .Values.userSettings.accountRegion }}"
          # -- cloudwatch log-group
          cloudwatch-log-group: "{{ .Values.userSettings.logGroup }}"
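What is worth checking is whether the stream name in the generated link matches a stream that actually exists in the log group. The sketch below renders a link from a template and pulls out the stream name so it can be compared with what the log forwarder really writes. The template shape (a stream built from pod name, namespace, container name, and container ID), the region, the log group, and every sample value are assumptions for illustration, so substitute whatever your deployment uses.

```python
def render_log_link(template: str, values: dict) -> str:
    """Fill {{ .name }} placeholders in a log-link template."""
    link = template
    for key, value in values.items():
        link = link.replace("{{ ." + key + " }}", value)
    return link


# Assumption: illustrative template only; use the one from your own config.
TEMPLATE = (
    "https://console.aws.amazon.com/cloudwatch/home?region=us-west-2"
    "#logEventViewer:group=/example/log-group;"
    "stream=var.log.containers.{{ .podName }}_{{ .namespace }}_"
    "{{ .containerName }}-{{ .containerId }}.log"
)

# Hypothetical pod details, as they would appear for one task execution.
link = render_log_link(
    TEMPLATE,
    {
        "podName": "abc-n0-0",
        "namespace": "flytesnacks-development",
        "containerName": "abc-n0-0",
        "containerId": "1234",
    },
)
stream_name = link.split("stream=", 1)[1]
print(stream_name)
```

If the printed stream name does not look like the streams listed in the CloudWatch console for that log group, the console reports that the log stream does not exist even though the logs were shipped.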
Samhita, is there any way I can change the task limits in the configmap dynamically from the task?
    task_resource_defaults.yaml: |
      task_resources:
        defaults:
          cpu: 1000m
          memory: 15Gi
          storage: 15Gi
        limits:
          cpu: 2
          gpu: 1
          memory: 1Gi
          storage: 20Gi
I want to set the memory inside this configmap dynamically.
I don't think that's possible. But you can set limits at the task level.
Hi, how do I specify a package from an image for a workflow? I have created a file and I am using it for tasks, but how do I do it for workflows?
Workflow is a DSL. What do you want to do exactly?
Hey Samhita, @Samhita Alla any idea why I am getting the error below when I run pyflyte inside an AWS cluster?
    RPC Failed, with Status: StatusCode.INTERNAL
    details: failed to create a signed url. Error: WebIdentityErr: failed to retrieve credentials
    caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
    status code: 403, request id: 2c17015d-85be-4570-8e30-a8be2b75be3f
    Debug string UNKNOWN:Error received from peer ipv4:10.100.178.208:81 {created_time:"2023-09-27T07:20:19.126215328+00:00", grpc_status:13, grpc_message:"failed to create a signed url. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: 2c17015d-85be-4570-8e30-a8be2b75be3f"}
Looks like an AWS permissions issue
Do you think it could be an issue with the S3 bucket?
Yeah, I believe so.
It'll have to do with the roles you assigned.
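For reference, AssumeRoleWithWebIdentity is governed by the trust policy on the IAM role rather than by the bucket policy. Here is a sketch in plain Python of what such a trust policy generally looks like for an EKS service account. The account ID, the OIDC provider ID, the namespace flyte, and the service account name flyteadmin are all placeholders and assumptions, so compare them with the role actually annotated on the service account of the failing pod.

```python
import json


def irsa_trust_policy(account_id, oidc_provider, namespace, service_account):
    """Build an IAM trust policy that lets one service account assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Must match the namespace and service account of the pod.
                        f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


# Placeholder identifiers; none of these come from a real cluster.
policy = irsa_trust_policy(
    "123456789012",
    "oidc.eks.us-west-2.amazonaws.com/id/EXAMPLE",
    "flyte",
    "flyteadmin",
)
print(json.dumps(policy, indent=2))
```

A mismatch in the sub condition, for example a different namespace or service account name than the one the pod runs under, is a common way to end up with this AccessDenied error.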
@Samhita Alla, how do we actually view the output of a workflow? Inside the project and inside the workflow I am able to see the tasks and their outputs, but the workflow itself returns some values. How do I see those from the console?
    @workflow
    def optimize_model():
        best_accuracy = optimize_hyp()
        best_params = {"n_estimators": 10, "max_depth": 5, "min_samples_split": 0.2}  # Set to 0 as it's not used in this example
        final_model_accuracy = train_model(
            n_estimators=best_params["n_estimators"],
            max_depth=best_params["max_depth"],
            min_samples_split=best_params["min_samples_split"],
        )
        return best_params, best_accuracy, final_model_accuracy
How do I see the return values in the Flyte console?
You can view the outputs of a workflow in the "View inputs and outputs" link present in the navigation bar.
I get UNKNOWN status for the tasks, and I am unable to debug the issue. Can you please let me know when and why tasks go into UNKNOWN status? @Samhita Alla
Have you checked the propeller and admin logs?
Yes. It happens when I enable Ray in plugins.
Are head and worker nodes spinning up?
No, they are not, but I am using plain Ray instead of the Flyte Ray plugin. Anyhow, how do I make sure a task inside a workflow starts only after another task has executed? I want the output of one task to go into another. Is there any way I can do that?
I fixed it, thanks, but how do I convert Promise(node:n1.o0) to str? A task is returning a string and I can see it in the console, but when I call it in the workflow it returns Promise(node:n1.o0). Below is the code. Can you help me fix it?
    @workflow
    def optimize_model() -> Tuple[float, str, List[int]]:
        best_accuracy, best_params = optimize_hyp()
        # Train the best model with best hyperparameters
        run_id = train_best_model(best_params=best_params)
        print(run_id)
        # Specify the MLflow model URI
        model_uri = f"runs:/a4519e1f7e00433886646a2bfb51600f/best_random_forest_model"
        # Sample data for inference (for the Iris dataset)
        data = [[5.1, 3.5, 1.4, 0.2]]
        # Perform inference using the ray_inference task
        inference_results = ray_inference(model_uri=model_uri, data=data)
        return best_accuracy, run_id, inference_results
This is the workflow, and run_id is what gives me the promise instead of a str. Below is the task code for run_id:
    @task(requests=Resources(cpu="2", mem="1Gi"))
    def train_best_model(best_params: Dict[str, Any]) -> str:
        import mlflow
        import mlflow.sklearn
        from sklearn.datasets import load_iris
        from sklearn.ensemble import RandomForestClassifier
        import numpy as np

        iris = load_iris()
        X, y = iris.data, iris.target
        # Initialize and train a Random Forest Classifier with the best hyperparameters
        clf = RandomForestClassifier(
            n_estimators=best_params["n_estimators"],
            max_depth=best_params["max_depth"],
            min_samples_split=best_params["min_samples_split"],
            random_state=42,
        )
        accuracy = model_accuracy(
            n_estimators=best_params["n_estimators"],
            max_depth=best_params["max_depth"],
            min_samples_split=best_params["min_samples_split"],
        )
        # Fit the model
        clf.fit(X, y)
        # Log the model to MLflow
        with mlflow.start_run(run_name="BestRandomForestModel") as run:
            mlflow.sklearn.log_model(clf, "best_random_forest_model")
            mlflow.log_metric("accuracy", accuracy)
            run_id = mlflow.active_run().info.run_id
        return str(run_id)
Task outputs in a workflow are promises. You need to send them to another task to materialize the promises.
any example on how we can do that, can you refer me any link
I fixed it thanks
Hey @Samhita Alla, is there any way we can add resource values for tasks in flyte-binary? I see that in flyte-core we can configure task resources such as storage, memory, CPU, and GPU.
You can add a task_resources section here:
    task_resources:
      defaults:
        cpu: 1
        memory: 4Gi
        storage: 5Gi
      limits:
        cpu: 16
        memory: 16Gi
        storage: 20Gi
Hey @Samhita Alla, for the code below in Flyte I get an error:
    import flytekit
    from flytekit import task, workflow, Resources
    from typing import List, Tuple

    @task(
        requests=Resources(gpu="1", cpu="2", mem="1Gi"),
        container_image="822795565729.dkr.ecr.us-west-2.amazonaws.com/prime-analysis:prime-analysis-reksi-flyte_pipeline-0.0.10",
    )
    def sleep():
        import time
        time.sleep(3600)

    @workflow
    def toy_workflow():
        sleep()
I run it using
    pyflyte run --remote --project cpa-test toy_pipeline.py toy_workflow
and the error is
    /opt/nvidia/nvidia_entrypoint.sh: line 67: exec: pyflyte-fast-execute: not found
I installed flyte and related packages. pyflyte-fast-execute is in PATH, but it still fails. I am not sure how to debug it; the pod fails to start. Can you please help me?
Is it something to do with the GPUs?
This error usually crops up when the architecture the image is built on isn't the same as the architecture that a container is spun up on. Can you use buildkit to specify the architecture while building your Docker image? You could also use image spec to simplify this process.
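A quick way to see whether this applies is to compare the architecture of the machine that built the image with the one the cluster nodes use. The sketch below is plain Python. linux/amd64 as the target is an assumption (check your node type), and the helper names are made up for illustration.

```python
import platform

# Common machine names mapped to Docker platform architectures.
_ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def docker_arch(machine: str) -> str:
    """Normalize a machine name (as from platform.machine()) to a Docker arch."""
    return _ARCH_ALIASES.get(machine.lower(), machine.lower())


def needs_cross_build(machine: str, target_platform: str = "linux/amd64") -> bool:
    """True when the build host architecture differs from the target platform."""
    return docker_arch(machine) != target_platform.split("/", 1)[1]


# Run this on the machine that builds the image.
print(needs_cross_build(platform.machine()))
```

If this prints True, building with an explicit platform, for example docker buildx build --platform linux/amd64, is the usual way to get an image the nodes can run.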