
Blair Anson

over 2 years ago
Help please. My first `pyflyte run --remote` command fails with `Handshake failed with fatal error SSL_ERROR_SSL`.
$ FLYTE_SDK_LOGGING_LEVEL=20 pyflyte run --remote example.py training_workflow --hyperparameters '{"C": 0.1}'

{"asctime": "2023-04-15 16:54:36,923", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-15 16:54:36,950", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-15 16:54:36,954", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-15 16:54:37,937", "name": "flytekit", "levelname": "INFO", "message": "We won't register PyTorchCheckpointTransformer, PyTorchTensorTransformer, and PyTorchModuleTransformer because torch is not installed."}
{"asctime": "2023-04-15 16:54:38,379", "name": "flytekit", "levelname": "INFO", "message": "We won't register TensorFlowRecordFileTransformer, TensorFlowRecordsDirTransformer and TensorFlowModelTransformerbecause tensorflow is not installed."}
{"asctime": "2023-04-15 16:54:38,408", "name": "flytekit", "levelname": "INFO", "message": "We won't register bigquery handler for structured dataset because we can't find the packages google-cloud-bigquery-storage and google-cloud-bigquery"}
{"asctime": "2023-04-15 16:54:38,696", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-15 16:54:38,697", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
E0415 16:54:39.685038207  177107 ssl_transport_security.cc:1420]       Handshake failed with fatal error SSL_ERROR_SSL: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER.
E0415 16:54:40.191239374  177107 ssl_transport_security.cc:1420]       Handshake failed with fatal error SSL_ERROR_SSL: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER.
Failed with Exception: Reason: SYSTEM:Unknown
RPC Failed, with Status: StatusCode.UNAVAILABLE
	details: failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:8088: Ssl handshake failed: SSL_ERROR_SSL: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
	Debug string UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:8088: Ssl handshake failed: SSL_ERROR_SSL: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER {created_time:"2023-04-15T16:54:40.193233866+09:00", grpc_status:14}
I understand this error usually occurs when `.flyte/config.yaml` and the environment variable configuration are not correct. I have checked both, but I must be missing something obvious. Here is my setup...
The remote cluster is AWS EKS running in a VPC.
Flyte was installed following the instructions in https://docs.flyte.org/en/latest/deployment/deployment/cloud_simple.html
Local ports are proxied to these Flyte services...
kubectl -n flyte port-forward service/flyte-backend-flyte-binary-grpc 8089:8089 &
kubectl -n flyte port-forward service/flyte-backend-flyte-binary-http 8088:8088 &
Env vars...
$ echo $FLYTECTL_CONFIG
/home/blair/.flyte/config.yaml

$ echo $KUBECONFIG
:/home/blair/.kube/config
.flyte/config.yaml
admin:
  # For GRPC endpoints you might want to use dns:///flyte.myexample.com
  endpoint: dns:///localhost:8088
  authType: Pkce
  insecure: false
logger:
  show-source: true
  level: 0
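
A possible reading of the setup above, offered as a sketch rather than a confirmed diagnosis: `WRONG_VERSION_NUMBER` is what a TLS client typically reports when the peer answers in plaintext, and `kubectl port-forward` is a plain TCP tunnel that carries whatever the service itself speaks. The config also points `endpoint` at 8088, the port forwarded to the HTTP service, while the gRPC service is forwarded on 8089. The check below encodes those two observations. The function names, and the assumption that both forwarded ports are plaintext, are mine and not from the thread.

```python
def parse_endpoint(endpoint):
    """Split 'dns:///host:port' or 'host:port' into (host, port)."""
    target = endpoint.split("///")[-1]  # drop an optional 'dns:///' prefix
    host, _, port = target.rpartition(":")
    return host, int(port)


def check_admin_config(endpoint, insecure, grpc_port, plaintext_ports):
    """Return a list of likely problems in the admin section of config.yaml.

    grpc_port and plaintext_ports describe the local port-forwards. Treating
    the forwarded ports as plaintext is an assumption, not a verified fact.
    """
    problems = []
    _, port = parse_endpoint(endpoint)
    if port != grpc_port:
        problems.append(
            f"endpoint uses port {port}, but gRPC is forwarded on {grpc_port}"
        )
    if port in plaintext_ports and not insecure:
        problems.append(
            f"insecure is false, so the client starts TLS against plaintext port {port}"
        )
    return problems


# Values taken from the post: endpoint on 8088, insecure false,
# gRPC forwarded on 8089, HTTP forwarded on 8088.
for problem in check_admin_config("dns:///localhost:8088", False, 8089, {8088, 8089}):
    print(problem)
```

If both checks apply, the candidate change to try is `endpoint: dns:///localhost:8089` together with `insecure: true`.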

illarion Disabled

over 3 years ago
Hi all! Can you please point me to a hello-world workflow example on Flyte where at least two tasks are each executed in their own container? I have a single Python workflow file at flyte/workflow/example.py and a task option like this:
@task(container_image="registry.name.com/project/image:tag")
I am trying to build this example myself on my local Flyte sandbox, but I am getting this error message:
ModuleNotFoundError: No module named 'flyte'
My workflow from example.py
@task(container_image="registry.name.com/project/image:tag")
def some_data_generation() -> PythonPickledFile:
    with open(BASE_FILE_PATH) as file:
        some_descriptors = json.load(file)

    some_set = generate_some_set(some_descriptors[0])
    with open(PICKLE_PATH, 'wb') as handle:
        pickle.dump(some_set, handle)
        return PICKLE_PATH

@task(container_image="registry.name.com/project/image2:tag")
def load_pickle_dump(dump_file_path: PythonPickledFile) -> set:
    with open(dump_file_path, 'rb') as handle:
        return pickle.load(handle)

@workflow
def my_wf() -> set:
    dump_file_path = some_data_generation()
    return load_pickle_dump(dump_file_path=dump_file_path)

if __name__ == "__main__":
    a = my_wf()
    print(f"Running my_wf() {a}")
What I have narrowed down so far:
1. The workflow works fine without containers (without @task(container_image= )).
2. The images work fine if the workflow contains only one task, with a Docker image that includes the Flyte workflow folder with this file (example.py).
3. The problem appears if I build the Docker image for the first task without the Flyte workflow files inside (but with the initial data files included, to skip downloading).
4. I am sure I could drop the second task from this test case entirely and the problem would still appear.
Root cause: I have a Docker image without the Flyte workflow (example.py), but it seems that this code is required inside the container for the task to execute. I do not understand how I can split example.py between two tasks if it is actually supposed to be executed outside the tasks (because it is a workflow, and according to the example it should contain the tasks inside it).
Error:
[1/1] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes.
[j3nl8bafcr-n0-0] terminated with exit code (1). Reason [Error]. Message: 
thon3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/venv/lib/python3.8/site-packages/flytekit/bin/entrypoint.py", line 460, in execute_task_cmd
    _execute_task(
  File "/opt/venv/lib/python3.8/site-packages/flytekit/exceptions/scopes.py", line 160, in system_entry_point
    return wrapped(*args, **kwargs)
  File "/opt/venv/lib/python3.8/site-packages/flytekit/bin/entrypoint.py", line 327, in _execute_task
    _task_def = resolver_obj.load_task(loader_args=resolver_args)
  File "/opt/venv/lib/python3.8/site-packages/flytekit/core/python_auto_container.py", line 189, in load_task
    task_module = importlib.import_module(task_module)
  File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'flyte'
As far as I can see, this is caused by the Docker container not containing the Flyte workflow. But if I place the workflow in the first task's image, how should it be split correctly between the two tasks? Should I avoid including the first task's code in the second task's image/container? Am I on the right track at all? Can you please suggest a simple, short Python workflow example with two tasks in separate containers?
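
The traceback above ends in `importlib.import_module(task_module)` inside `load_task`, which suggests the container locates the task by importing its module by dotted name before running it. The sketch below reproduces only that step with the standard library. It is a simplified stand-in, not flytekit's actual implementation, and the module path used for the failing case is hypothetical.

```python
import importlib


def load_task(task_module, task_name):
    """Import a module by dotted name and return the named attribute.

    Simplified stand-in for the resolver step seen in the traceback.
    """
    module = importlib.import_module(task_module)
    return getattr(module, task_name)


def try_load(task_module, task_name):
    """Return 'ok' or the import error text, mimicking the pod failure."""
    try:
        load_task(task_module, task_name)
        return "ok"
    except ModuleNotFoundError as err:
        return f"ModuleNotFoundError: {err}"


# 'json' stands in for a module that is present in the image.
print(try_load("json", "loads"))
# Hypothetical dotted path whose top-level package is absent from the image;
# the import fails on the first path component, as with 'flyte' above.
print(try_load("missing_pkg_for_demo.workflow.example", "some_data_generation"))
```

Under that reading, every image named in `container_image` needs example.py importable under the same module path, even though each container then runs only the task it was started for. The two images can still differ in their other dependencies and data.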

Abhinay Dronavally

over 2 years ago
Hi All, I am seeing this error when I am trying to run flyte on k8s. What am I missing?
{"json":{},"level":"warning","msg":"Failed to create cluster resources for namespace [flytesnacks-development] with err: Failed to read config template dir [flytesnacks-development] for namespace [] with err: open : no such file or directory","ts":"2023-05-24T09:40:15Z"}
{"json":{},"level":"warning","msg":"Failed to create cluster resources for namespace [flytesnacks-staging] with err: Failed to read config template dir [flytesnacks-staging] for namespace [] with err: open : no such file or directory","ts":"2023-05-24T09:40:15Z"}
{"json":{},"level":"warning","msg":"Failed to create cluster resources for namespace [flytesnacks-production] with err: Failed to read config template dir [flytesnacks-production] for namespace [] with err: open : no such file or directory","ts":"2023-05-24T09:40:15Z"}
{"json":{},"level":"warning","msg":"Failed cluster resource creation loop with: Failed to read config template dir [flytesnacks-development] for namespace [] with err: open : no such file or directory, Failed to read config template dir [flytesnacks-staging] for namespace [] with err: open : no such file or directory, Failed to read config template dir [flytesnacks-production] for namespace [] with err: open : no such file or directory","ts":"2023-05-24T09:40:15Z"}
{"json":{},"level":"error","msg":"Failed to initialize certificates for Secrets Webhook. client rate limiter Wait returned an error: context canceled","ts":"2023-05-24T09:40:20Z"}
{"json":{},"level":"panic","msg":"Failed to start Propeller, err: failed to create FlyteWorkflow CRD: customresourcedefinitions.apiextensions.k8s.io is forbidden: User \"system:serviceaccount:test-apps:test-flyte-role\" cannot create resource \"customresourcedefinitions\" in API group \"apiextensions.k8s.io\" at the cluster scope","ts":"2023-05-24T09:40:20Z"}
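
The last entry is the one that reports Propeller failing to start: the service account `system:serviceaccount:test-apps:test-flyte-role` is forbidden from creating `customresourcedefinitions` at the cluster scope, which reads as a Kubernetes permissions problem. When warnings and the fatal entry are mixed like this, picking out the most severe line can help. A small sketch, assuming logrus-style level names; the helper name and severity table are illustrative, and the sample lines are two entries from the log above.

```python
import json

# Assumed ordering of logrus-style level names, least to most severe.
SEVERITY = {"debug": 0, "info": 1, "warning": 2, "error": 3, "fatal": 4, "panic": 5}


def most_severe(log_lines):
    """Return the parsed JSON entry with the highest level, or None."""
    entries = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entries.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
    return max(
        entries,
        key=lambda entry: SEVERITY.get(entry.get("level", ""), -1),
        default=None,
    )


SAMPLE_LINES = [
    r'{"json":{},"level":"error","msg":"Failed to initialize certificates for Secrets Webhook. client rate limiter Wait returned an error: context canceled","ts":"2023-05-24T09:40:20Z"}',
    r'{"json":{},"level":"panic","msg":"Failed to start Propeller, err: failed to create FlyteWorkflow CRD: customresourcedefinitions.apiextensions.k8s.io is forbidden: User \"system:serviceaccount:test-apps:test-flyte-role\" cannot create resource \"customresourcedefinitions\" in API group \"apiextensions.k8s.io\" at the cluster scope","ts":"2023-05-24T09:40:20Z"}',
]

worst = most_severe(SAMPLE_LINES)
print(worst["level"], "-", worst["msg"])
```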

Jakub Peschel

over 2 years ago
Hi, I am trying out Flyte and I cannot get past one error. Maybe I am doing something wrong, so sorry if that is the case. I am following the Getting started page and wanted to create a local cluster and submit workflows to it. When I run the workflow locally, it passes without any issue:
(venv) jpeschel@kinnan:~/Workplace/flyte-demo$ pyflyte run flytedemo.py training_workflow --hyperparameters '{"C": 0.1}'
LogisticRegression(C=0.1, max_iter=3000)
But if I start a new local cluster:
(venv) jpeschel@kinnan:~/Workplace/flyte-demo$ ./bin/flytectl demo start
INFO[0000] [0] Couldn't find a config file []. Relying on env vars and pflags. 
🧑‍🏭 Bootstrapping a brand new flyte cluster... 🔨 🔧
delete existing sandbox cluster [y/n]: 
y
🐋 Going to use Flyte v1.7.0 release with image cr.flyte.org/flyteorg/flyte-sandbox-bundled:sha-1ae254f8683699b68ecddc89d775fc5d39cc3d84 
🐋 pulling docker image for release cr.flyte.org/flyteorg/flyte-sandbox-bundled:sha-1ae254f8683699b68ecddc89d775fc5d39cc3d84
🧑‍🏭 booting Flyte-sandbox container
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
context flyte-sandbox already exist. Overwriting it
context modified for "flyte-sandbox" and switched over to it.
+-----------------------------------+---------------+-----------+
|              SERVICE              |    STATUS     | NAMESPACE |
+-----------------------------------+---------------+-----------+
| k8s: This might take a little bit | Bootstrapping |           |
+-----------------------------------+---------------+-----------+
+-----------------------------------+---------------+-----------+
|              SERVICE              |    STATUS     | NAMESPACE |
+-----------------------------------+---------------+-----------+
I don't get the expected output from the Getting started page, and when I try to submit the workflow to the cluster I get this error:
(venv) jpeschel@kinnan:~/Workplace/flyte-demo$ pyflyte run --remote flytedemo.py training_workflow --hyperparameters '{"C": 0.1}'
Failed with Exception Code: SYSTEM:Unknown
RPC Failed, with Status: StatusCode.UNAVAILABLE
        details: failed to connect to all addresses; last error: UNAVAILABLE: ipv4:127.0.0.1:30080: Socket closed
        Debug string UNKNOWN:failed to connect to all addresses; last error: UNAVAILABLE: ipv4:127.0.0.1:30080: Socket closed {created_time:"2023-06-19T09:17:08.725217881+02:00", grpc_status:14}
I checked whether the problem is caused by closed ports, but netstat showed that the port is open:
(venv) jpeschel@kinnan:~/Workplace/flyte-demo$ sudo netstat -ntlp
[sudo] password for jpeschel: 
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:6443            0.0.0.0:*               LISTEN      101329/docker-proxy 
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN      3430/systemd-resolv 
tcp        0      0 127.0.0.1:631           0.0.0.0:*               LISTEN      4241/cupsd          
tcp        0      0 0.0.0.0:30080           0.0.0.0:*               LISTEN      101277/docker-proxy 
tcp        0      0 0.0.0.0:30001           0.0.0.0:*               LISTEN      101302/docker-proxy 
tcp        0      0 0.0.0.0:30000           0.0.0.0:*               LISTEN      101316/docker-proxy 
tcp        0      0 0.0.0.0:30002           0.0.0.0:*               LISTEN      101290/docker-proxy 
tcp6       0      0 ::1:631                 :::*                    LISTEN      4241/cupsd          
tcp6       0      0 127.0.0.1:63342         :::*                    LISTEN      15418/java
An nmap scan of the port shows the same:
(venv) jpeschel@kinnan:~/Workplace/flyte-demo$ nmap -p 30080 127.0.0.1
Starting Nmap 7.80 ( https://nmap.org ) at 2023-06-19 10:06 CEST
Nmap scan report for localhost (127.0.0.1)
Host is up (0.000080s latency).

PORT      STATE SERVICE
30080/tcp open  unknown
I am on Ubuntu 22.04.2 LTS with an 11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16 and 32 GiB of memory, which should be more than sufficient for this demo. Is there a required step that I missed?
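
One thing the netstat and nmap output cannot show: both only confirm that docker-proxy is listening on 30080, not that the Flyte service behind it is up. The empty service table and the `Socket closed` error would both be consistent with the proxy accepting a connection and then dropping it because nothing is ready behind it. The probe below is a stdlib-only sketch for telling those cases apart. The function name and result labels are illustrative, and it does not speak gRPC.

```python
import socket


def probe(host, port, timeout=2.0):
    """Connect to host:port and report what the peer does next."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            try:
                data = conn.recv(1)
            except socket.timeout:
                # Connection stayed open, peer is waiting for us to speak.
                return "open-idle"
            except ConnectionResetError:
                return "reset-by-peer"
            # An empty read means the peer closed the connection right away.
            return "closed-by-peer" if data == b"" else "open-data"
    except ConnectionRefusedError:
        return "refused"
    except OSError as err:
        return f"unreachable: {err}"


if __name__ == "__main__":
    # Port taken from the error message in the post.
    print(probe("127.0.0.1", 30080))
```

A `closed-by-peer` or `reset-by-peer` result on a port that nmap reports as open would point at the service behind the proxy rather than at the port mapping.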