Masa Nakamura
01/31/2023, 12:43 AM
pyflyte run --remote command? I was wondering if I could add an --env-file option, like the one for the docker run command.
Fi fi
01/31/2023, 7:58 AM
"name": "flytekit", "levelname": "WARNING", "message": "Unsupported Type <class 'feast.repo_config.RepoConfig'> found, Flyte will default to use PickleFile as the transport. Pickle can only be used to send objects between the exact same version of Python, and we strongly recommend to use python type that flyte support."}
This is the code:
import os
import boto3
import logging
from flytekit import task
from feast.infra.offline_stores.file import FileOfflineStoreConfig
from feast.infra.online_stores.sqlite import SqliteOnlineStoreConfig
from feast.repo_config import RepoConfig

logger = logging.getLogger(__file__)

# execution on demo cluster
os.environ["FEAST_S3_ENDPOINT_URL"] = ENDPOINT = "http://192.168.1.117:30002"
os.environ["AWS_ACCESS_KEY_ID"] = "minio"
os.environ["AWS_SECRET_ACCESS_KEY"] = "miniostorage"

# flytectl config set s3.endpoint_url http://<minio_host>:<minio_port>
# flytectl config set s3.access_key <minio_access_key>
# flytectl config set s3.secret_key <minio_secret_key>
bucket_name = "my-s3-bucket"
registry_path = "data/registry.db"
online_store_path = "data/online_store.db"


@task
def create_bucket(
    bucket_name: str, registry_path: str, online_store_path: str
) -> RepoConfig:
    client = boto3.client(
        "s3",
        aws_access_key_id="minio",
        aws_secret_access_key="miniostorage",
        use_ssl=False,
        endpoint_url="http://192.168.1.117:30002",
    )
    return RepoConfig(
        registry="s3://my-s3-bucket/data/online_store.db",
        project="my_project",
        provider="local",
        offline_store=FileOfflineStoreConfig(),
        online_store=SqliteOnlineStoreConfig(path=online_store_path),
        entity_key_serialization_version=2,
    )
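The warning above says Flyte falls back to pickle because RepoConfig is not a type it knows how to serialize. One possible way around it, sketched here as an assumption rather than a confirmed fix, is to pass only plain string values between tasks and build the RepoConfig inside the task that needs it. RepoSettings, to_task_output and from_task_input are hypothetical names made up for this illustration, and the sketch deliberately uses only the standard library so it runs without flytekit or feast installed.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class RepoSettings:
    """Hypothetical holder for the plain values needed to build a RepoConfig."""

    registry: str
    project: str
    provider: str
    online_store_path: str


def to_task_output(settings: RepoSettings) -> Dict[str, str]:
    # A dict of strings contains only primitives, so nothing needs pickling.
    return asdict(settings)


def from_task_input(payload: Dict[str, str]) -> RepoSettings:
    # The consuming task would rebuild its RepoConfig from these fields.
    return RepoSettings(**payload)


# Values copied from the snippet above.
settings = RepoSettings(
    registry="s3://my-s3-bucket/data/online_store.db",
    project="my_project",
    provider="local",
    online_store_path="data/online_store.db",
)
payload = to_task_output(settings)
restored = from_task_input(payload)
```

In a real workflow the producing task would return something like payload (annotated as Dict[str, str]) instead of RepoConfig, and the downstream task would call RepoConfig(...) itself.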
Christian Stevandy
01/31/2023, 9:31 AM
Sujith Samuel
01/31/2023, 10:49 AM
Sujith Samuel
01/31/2023, 12:11 PM
Sujith Samuel
01/31/2023, 12:11 PM
Seth Baer
01/31/2023, 2:55 PM
Khor Shu Heng
02/01/2023, 5:27 AM
plugin execution, caused by: failed to execute handle for plugin [container]: [InternalError] failed to create resource, caused by: Internal error occurred: failed calling webhook "flyte-pod-webhook.flyte.org": failed to call webhook: Post "https://flyte-pod-webhook.flyte.svc:443/mutate--v1-pod?timeout=10s": x509: certificate has expired or is not yet valid: current time 2023-02-01T05:22:42Z is after 2022-11-24T05:12:16Z
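The last part of that error carries two timestamps: the time of the failed call and the certificate's expiry. As a small sketch for reading it (the helper name expired_for is made up for this illustration, and the regular expression assumes the exact wording shown above), the gap between the two can be computed like this:

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Tail of the error message quoted above.
ERROR = (
    "x509: certificate has expired or is not yet valid: "
    "current time 2023-02-01T05:22:42Z is after 2022-11-24T05:12:16Z"
)


def expired_for(message: str) -> Optional[timedelta]:
    """Return how long the certificate had been expired, or None if no match."""
    match = re.search(r"current time (\S+) is after (\S+)", message)
    if match is None:
        return None
    now, not_after = (
        datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        for ts in match.groups()
    )
    return now - not_after


gap = expired_for(ERROR)
print(gap.days)  # 69
```

So the webhook's certificate had already been expired for a little over two months when the pod creation failed, which points at certificate renewal rather than at the task itself.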
Khor Shu Heng
02/01/2023, 5:28 AM
Khor Shu Heng
02/01/2023, 5:31 AM
Mohd Shahid Khan Afridi
02/01/2023, 5:32 AM
Adedeji Ayinde
02/01/2023, 8:15 AM
flytekit.exceptions.scopes.FlyteScopedUserException: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 3) (localhost executor driver): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
INFO:py4j.clientserver:Closing down clientserver connection
Heidi Hurst
02/01/2023, 1:48 PM
flytekit.exceptions.user.FlyteAssertion: Input was not specified for: optional_int of type simple: INTEGER
Issues #1312 ([BUG][flytekit] Default input argument not being considered in PythonInstanceTask) and #3046 both seem related. Has anyone else seen this before? (flytekit 1.3.0)
Bernhard Stadlbauer
02/01/2023, 2:54 PM
interruptible=True). The task then fails with:
Last Error: USER::Pod was terminated in response to imminent node shutdown.
The last log from the node this task ran on is:
Deleting node <node-id> because it does not exist in the cloud provider
So we assume the node was reclaimed by GCP.
Does anyone happen to know whether there might be a difference in how interruptible task failures are handled in subworkflows?
Frank Shen
02/01/2023, 7:46 PM
Use cases:
ENV = dev, beta, or prod
Custom constant, e.g. PERSIST_BUCKET_PREFIX = 'hbomax-datascience-deployment' -> s3_bucket = f'{PERSIST_BUCKET_PREFIX}-{ENV}'
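A minimal sketch of the second use case, assuming the environment name reaches the task container as an environment variable called ENV (how it gets there, for example via the container image or the task's environment settings, is an assumption and not something stated above). VALID_ENVS and resolve_bucket are hypothetical names for this illustration.

```python
import os
from typing import Optional

PERSIST_BUCKET_PREFIX = "hbomax-datascience-deployment"
VALID_ENVS = ("dev", "beta", "prod")


def resolve_bucket(env: Optional[str] = None) -> str:
    """Build the bucket name from the prefix and the deployment environment."""
    # Fall back to the ENV variable, and to "dev" if it is not set (assumed default).
    if env is None:
        env = os.environ.get("ENV", "dev")
    if env not in VALID_ENVS:
        raise ValueError(f"ENV must be one of {VALID_ENVS}, got {env!r}")
    return f"{PERSIST_BUCKET_PREFIX}-{env}"


s3_bucket = resolve_bucket("prod")
print(s3_bucket)  # hbomax-datascience-deployment-prod
```

Resolving the value inside a function, instead of at import time, means the same code gives the right bucket in whichever environment the task actually runs.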
Frank Shen
02/01/2023, 7:46 PM
Rupsha Chaudhuri
02/02/2023, 6:54 AM
interruptible=False flag, which ends up dragging the task on much longer than it should (since it retries). Is there something different for map tasks?
Frank Shen
02/02/2023, 5:34 PM
Vinícius Sosnowski
02/02/2023, 5:56 PM
Rupsha Chaudhuri
02/02/2023, 10:03 PM
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.20.71.119:81: connect: connection refused"
Not sure what component it's trying to connect to, and not sure what's causing the flakiness. Replicas are starting and stopping somewhat arbitrarily. Any insight would be appreciated.
David Espejo (he/him)
02/02/2023, 10:41 PM
Thomas Blom
02/03/2023, 2:53 AM
--fast in conjunction with --image to do a fast register of local code that will be "overlaid" onto the specified image (as I understand it). But in my trials I find that:
1. --fast is not a recognized option; I eventually realized it is a command (without dashes). Are the docs wrong?
2. Then it complains "No such option --image". Again, are the docs wrong? Or am I misreading something?
I was trying to modify a command that works without "fast" into one that runs with "fast" to see if I can avoid building and pushing a new image (the image I specify in this case already exists). The command looks like this, where I've inserted $FAST_REG into a command that otherwise works to serialize against a given image:
pyflyte --pkgs plaster.genv2 serialize $FAST_REG --image <my-ecr-registry-path>/<my-image-name>:<tag> --local-source-root . workflows -f /tmp/_pb_outputs
...where $FAST_REG is where I'm attempting to insert "fast" and run my local code overlaid on what is in the image I'm specifying. I'm probably misunderstanding something, and the docs aren't helping me.
Thanks for any tips!
Yubo Wang
02/03/2023, 6:45 AM
ewam
02/03/2023, 10:19 AM
ewam
02/03/2023, 10:23 AM
ewam
02/03/2023, 10:37 AM
ewam
02/03/2023, 10:45 AM
@dynamic decorator. Is it true that I can use it to:
• run a thread that e.g. looks for new data
• if new data is found:
  ◦ spawn a new subgraph
    ▪︎ rebuild the dataset
    ▪︎ cancel the current training
    ▪︎ (re)start training task
?
ewam
02/03/2023, 11:51 AM
@dynamic workflow python function?
Seth Baer
02/03/2023, 3:53 PM
context deadline exceeded rpc error. It looks like this:
The error:
------------------------------------------------------------------------ --------- ---------------------------------------------------
| /tmp/register4200568562/16_workflow.cx_topic_model_workflow_2.pb | Failed | Error registering file due to rpc error: code = |
| | | DeadlineExceeded desc = context deadline exceeded |
------------------------------------------------------------------------ --------- ---------------------------------------------------
In the doc linked below, it's mentioned that the flytectl wait deadline is 15 seconds:
https://docs.flyte.org/en/latest/community/troubleshoot.html#troubles-with-flytectl-commands-with-auth-enabled
We looked around but weren't able to find the setting to increase that wait time. Any guidance here would be much appreciated!! Thanks so much
Also, here's the packaging success message (we have a 37-node wf in this specific message, but it's the same registration issue):
Packaging app_module.workflow.cx_topic_model_workflow -> 36_app_module.workflow.cx_topic_model_workflow_3.pb
Successfully packaged 37 flyte objects into /root/flyte-package.tgz
Andrew Achkar
02/03/2023, 7:32 PM