Slackbot
10/03/2023, 9:20 AM
Stefano
10/03/2023, 5:30 PM
Broder Peters
10/04/2023, 8:19 AM
flyteadmin-token secret from this step kubectl get secrets -n flyte | grep flyteadmin-token in https://docs.flyte.org/en/latest/deployment/deployment/multicluster.html#user-and-control-plane-deployment.
I deployed the clusters using helm with v1.9.0
The logs didn't show any errors hinting at why it would be missing:
helm upgrade flyte flyteorg/flyte-core --version $FLYTE_VERSION -f $FLYTE_BASE_CONFIG_FILE -f $FLYTE_EKS_CONFIG_FILE -f $FLYTE_DATA_PLANE_CONFIG_FILE --install --create-namespace -n $FLYTE_NAMESPACE \
--set userSettings.accountId=$ACCOUNT_ID --set userSettings.accountRegion=$REGION --set userSettings.certificateArn=$ACM_CERTIFICATE_ARN --set userSettings.bucketName=$S3_CLUSTER_BUCKET \
--set userSettings.dbPassword=$RDS_PASSWORD --set userSettings.rdsHost=$RDS_HOST --set userSettings.rdsDb=$RDS_DATABASE --set userSettings.rdsUsername=$RDS_USERNAME \
--set userSettings.iamSystemRole=$FLYTE_SYSTEM_ROLE --set configmap.admin.admin.endpoint=$CONTROL_PLANE_ENDPOINT_ADDRESS \
--set configmap.admin.admin.insecure=false --set secrets.adminOauthClientCredentials.clientSecret=$CONTROL_PLANE_OAUTH_CLIENT_SECRET
Release "flyte" does not exist. Installing it now.
coalesce.go:223: warning: destination for flyte-core.flyteadmin.additionalContainers is a table. Ignoring non-table value ([])
NAME: flyte
LAST DEPLOYED: Wed Oct 4 05:05:06 2023
NAMESPACE: flyte
STATUS: deployed
REVISION: 1
TEST SUITE: None
Raimundo Manterola
10/04/2023, 7:15 PM
Joe Hartshorn
10/11/2023, 12:37 PM
Brian O'Donovan
10/11/2023, 6:03 PM
Cody Scandore
10/11/2023, 9:56 PM
dns://my-flyte-url.com just fine, and connect with a FlyteRemote instance as well. I am having an issue when trying to connect from inside the EKS cluster, however, since using that endpoint would try to leave the cluster before returning (and the origin IP address is not on the load balancer allowlist).
Stefano
10/12/2023, 12:36 PM
Laura Lin
10/12/2023, 5:22 PM
Greg Linklater
10/12/2023, 8:43 PM
Rahul Mehta
10/16/2023, 1:32 PM
Laura Lin
10/24/2023, 6:10 PM
flyte-user-role but what if I want project A to use flyte-user-role-A and project B to use flyte-user-role-B?
Guy Arad
10/25/2023, 8:45 AM
eks-starter.yaml
Victor Delépine
10/26/2023, 4:38 PM
Guy Arad
10/30/2023, 6:03 PM
cr.flyte.org/flyteorg/flytekit:py3.10-1.10.1b0 and I don't know where the list of available images is.
2. I wasn't able to configure nodes with a private subnet as described in the walkthrough.
I created the roles described under 01. I then moved to create a cluster. This step assumes "API server endpoint access - public and private". I'm not entirely sure what this is referring to. An EKS cluster can be configured for "public and private". But this can only be set after the cluster is created.
I created a VPC with 2 public and 2 private subnets, and used them when creating the EKS cluster via the given command line.
NODEGROUP - I followed the guide but the node groups were NOT created successfully. When I selected all 4 subnets at creation, the node group picked one at random when creating nodes. The nodes created on the private subnets couldn't join the EKS cluster. After I switched to using only the public subnets, the nodes started OK.
The next steps of creating the bucket and databases went through OK. On to the deployment part -
I downloaded the starter yaml (which was missing the database.username) and deployed Flyte. It was successful, placing 1 pod on the public node.
I noticed that the liveness probe failed and then noticed the configuration was wrong (in the chart itself in your repo): it specified "http" as the port instead of 8088. I added this configuration on my end. Lastly, I ran port forwarding (I needed to run 2 instances because http and grpc are defined as separate services) and successfully accessed the Flyte console and triggered a workflow. However, the pod couldn't be scheduled because it requested 2 cpu, which sounds like way too much! How can this be configured per task, if at all?
I increased the node size and the pod was then scheduled, but now I'm receiving a "containers with unready status: [f830d928e559a4ba7be9-n0-0]|Back-off pulling image "cr.flyte.org/flyteorg/flytekit:py3.10-1.10.1b0" error.
Rahul Mehta
11/02/2023, 3:54 PM
projectQuotaMemory isn't set in the helm chart, is there a default that Flyte falls back to? Trying to debug why we aren't scaling up to accommodate more concurrent executions in our environment, and notably we don't have any CRAs set at the moment. Is there any other default value in the chart which might limit concurrency?
Laura Lin
11/03/2023, 4:00 AM
Ensure that the propeller has the correct service account for Athena.
means
Alain GALDEMAS
11/04/2023, 12:21 PM
Brian Tang
11/08/2023, 8:00 AM
s3 and gcs are the only accepted provider for object storage. Any advice on how we can deploy Flyte in Huawei Cloud? Has anyone done it before?
My understanding is that flyte-binary requires a Postgres DB and 2 object storage buckets. Almost everything else is Kubernetes-native?
Amadeusz Lisiecki
11/08/2023, 1:59 PM
flyte-binary) on EKS.
I want to customise the creation of projects and domains with configuration (like iam_role).
I found projects and domains mentioned in 2 places:
https://github.com/flyteorg/flyte/blob/6b5994ff1d529416113b790897367a7c847c4650/charts/flyte-binary/values.yaml#L21-L22
seedProjects:
  - flytesnacks
and
https://github.com/flyteorg/flyte/blob/6b5994ff1d529416113b790897367a7c847c4650/charts/flyte-binary/eks-production.yaml#L32-L43
inline:
  cluster_resources:
    customData:
      - production:
          - defaultIamRole:
              value: <FLYTE_USER_IAM_ARN>
      - staging:
          - defaultIamRole:
              value: <FLYTE_USER_IAM_ARN>
      - development:
          - defaultIamRole:
              value: <FLYTE_USER_IAM_ARN>
How should I configure my values.yaml if I want to achieve this?
project1:
  sandbox:
    defaultIamRole: role1
  dev:
    defaultIamRole: role2
project2:
  dev:
    defaultIamRole: role3
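One way to picture the mapping asked about above is a minimal Python sketch that flattens the nested project/domain/role layout into one record per project-domain pair. The record layout (project, domain, attributes) is modelled on the attribute files used with flytectl's cluster-resource-attribute commands, but treat that shape and the helper name flatten_roles as assumptions for illustration, not as a confirmed feature of the flyte-binary chart.

```python
from typing import Dict, List

# Desired layout from the question: project -> domain -> settings.
desired: Dict[str, Dict[str, Dict[str, str]]] = {
    "project1": {
        "sandbox": {"defaultIamRole": "role1"},
        "dev": {"defaultIamRole": "role2"},
    },
    "project2": {
        "dev": {"defaultIamRole": "role3"},
    },
}


def flatten_roles(mapping: Dict[str, Dict[str, Dict[str, str]]]) -> List[dict]:
    """Return one record per project-domain pair, sorted for stable output.

    flatten_roles is a hypothetical helper, not part of Flyte or flytectl.
    """
    records = []
    for project in sorted(mapping):
        for domain in sorted(mapping[project]):
            records.append({
                "project": project,
                "domain": domain,
                # Copy the settings so the input mapping is not shared.
                "attributes": dict(mapping[project][domain]),
            })
    return records


records = flatten_roles(desired)
for record in records:
    print(record["project"], record["domain"], record["attributes"]["defaultIamRole"])
```

Each record could then be written out as its own attribute file, one per project-domain pair, if per-pair overrides turn out to be the way to go.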
hfurkanvural
11/09/2023, 5:10 PM
separateGrpcIngress: true, so the nginx controller can handle http and http2. However, using the same FQDN does not make nginx-ingress-controller happy. I'm getting a Warning Rejected 8m28s nginx-ingress-controller All hosts are taken by other resources error on one of the ingress objects (which one depends on which is applied first). Apparently nginx does not support multiple ingress objects for the same host/FQDN. When I try to deploy with only one ingress object, the gRPC protocol is not supported. So, I was wondering if anyone made it work with some kind of magical configuration.
PS: I tried mergeable-ingress-types as well, but no luck :(
Ethan Brown
11/15/2023, 7:05 PM
Anu
11/17/2023, 1:12 AM
flytectl demo start.
What I would like to do now is to have my devcontainer (running in Docker) run a workflow against the remote cluster running on my host machine.
This is the config I have within the devcontainer:
[.flyte/config.yaml]
admin:
  # For GRPC endpoints you might want to use dns:///flyte.myexample.com
  endpoint: dns:///host.docker.internal:30080
  insecure: true
logger:
  show-source: true
  level: 0
When I run the command:
pyflyte -c .flyte/config.yml run --remote example_workflow.py training_workflow --hyperparameters '{"C": 0.1}'
I get the following errors:
Failed with Unknown Exception <class 'requests.exceptions.ConnectionError'> Reason: HTTPConnectionPool(host='localhost', port=30002): Max retries exceeded with url: /my-s3-bucket/flytesnacks/development/Z7RBA57NTP5424RGXLLVDVDS6Y%3D%3D%3D%3D%3D%3D/script_mode.tar.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=minio%2F20231117%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20231117T011145Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=content-md5%3Bhost&X-Amz-Signature=08a3aecceb4590508dc68093b1b1973644bffb92bcaf2b96eb074bc458a5fe14 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fffd85d06a0>: Failed to establish a new connection: [Errno 111] Connection refused'))
HTTPConnectionPool(host='localhost', port=30002): Max retries exceeded with url: /my-s3-bucket/flytesnacks/development/Z7RBA57NTP5424RGXLLVDVDS6Y%3D%3D%3D%3D%3D%3D/script_mode.tar.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=minio%2F20231117%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20231117T011145Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=content-md5%3Bhost&X-Amz-Signature=08a3aecceb4590508dc68093b1b1973644bffb92bcaf2b96eb074bc458a5fe14 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fffd85d06a0>: Failed to establish a new connection: [Errno 111] Connection refused'))
Anyone know what I can do here?
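The error above shows the upload going to localhost:30002 rather than to host.docker.internal, and inside a container localhost refers to the container itself, which would explain the refused connection. A minimal sketch using only the Python standard library makes that visible; the shortened URLs and the helper name is_loopback_target are illustrative assumptions, not part of flytekit.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_target(url: str) -> bool:
    """Return True if the URL's host is localhost or a loopback IP address."""
    host = urlsplit(url).hostname or ""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so a regular DNS name such as host.docker.internal.
        return False


# Shortened form of the signed upload URL from the error message.
upload_url = "http://localhost:30002/my-s3-bucket/flytesnacks/development/script_mode.tar.gz"
# Hypothetical variant of the same URL that names the Docker host instead.
host_url = "http://host.docker.internal:30002/my-s3-bucket/flytesnacks/development/script_mode.tar.gz"

parts = urlsplit(upload_url)
print(parts.hostname, parts.port)
print(is_loopback_target(upload_url), is_loopback_target(host_url))
```

Any fix would need the signed URL to carry a host that the devcontainer can reach; how that is configured for the demo cluster is not covered by this sketch.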
Thank you
Stefano
11/28/2023, 2:30 PM
Faysal Ishtiaq
11/28/2023, 4:06 PM
configuration:
  database:
    username: <DB_USERNAME>
    password: <DB_PASSWORD>
    host: <RDS_HOST_DNS>
    dbname: postgres
  storage:
    metadataContainer: <BUCKET_NAME>
    userDataContainer: <USER_DATA_BUCKET_NAME>
    provider: s3
    providerConfig:
      s3:
        region: "eu-west-1"
        authType: "iam"
  inline:
    plugins:
      k8s:
        inject-finalizer: true
        default-env-vars:
          - AWS_METADATA_SERVICE_TIMEOUT: 5
          - AWS_METADATA_SERVICE_NUM_ATTEMPTS: 20
    storage:
      cache:
        max_size_mbs: 100
        target_gc_percent: 100
serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: "<FLYTE_BACKEND_IAM_ARN>"
configmap:
  inline:
    tasks:
      task-plugins:
        enabled-plugins:
          - container
          - sidecar
          - K8S-ARRAY
          - ray
        default-for-task-types:
          - container: container
          - container_array: K8S-ARRAY
          - ray:ray
This is my terraform
resource "helm_release" "flyte_binary" {
  repository = "https://flyteorg.github.io/flyte"
  chart = "flyte-binary"
  name = "flyte-binary"
  namespace = "flyte"
  create_namespace = true
  recreate_pods = true
  depends_on = [helm_release.karpenter, null_resource.karpenter_provisioner_deployment]
  values = [
    file("${path.module}/../helm-chart-values/flyte-binary.yaml")
  ]
  set {
    name = "configuration.database.username"
    value = module.rds.rds_username
  }
  set {
    name = "configuration.database.password"
    value = module.rds.rds_password
  }
  set {
    name = "configuration.database.host"
    value = trimsuffix(module.rds.rds_endpoint, ":5432")
  }
  set {
    name = "configuration.storage.metadataContainer"
    value = module.s3_flyte_storage_backend.s3_bucket[0]
  }
  set {
    name = "configuration.storage.userDataContainer"
    value = module.s3_flyte_user_data.s3_bucket[0]
  }
  set {
    name = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = aws_iam_role.flyte_service_account_assumed_role.arn
  }
}
And I am getting these errors:
$ kubectl logs -n flyte flyte-binary-fd46cc7f8-wdc6l
Defaulted container "flyte" out of: flyte, wait-for-db (init)
time="2023-11-28T16:25:53Z" level=info msg="Using config file: [/etc/flyte/config.d/000-core.yaml /etc/flyte/config.d/001-plugins.yaml /etc/flyte/config.d/002-database.yaml /etc/flyte/config.d/003-storage.yaml /etc/flyte/config.d/012-database-secrets.yaml /etc/flyte/config.d/100-inline-config.yaml]"
{"json":{"src":"start.go:184"},"level":"panic","msg":"Failed to start Admin, err: database migration failed: ERROR: relation \"description_entities\" does not exist (SQLSTATE 42P01)","ts":"2023-11-28T16:25:54Z"}
panic: (*logrus.Entry) 0xc000f50e70
goroutine 119 [running]:
github.com/sirupsen/logrus.(*Entry).log(0xc000f50d90, 0x0, {0xc000160e80, 0x7d})
/go/pkg/mod/github.com/sirupsen/logrus@v1.8.1/entry.go:259 +0x45b
github.com/sirupsen/logrus.(*Entry).Log(0xc000f50d90, 0x0, {0xc000eb9e68?, 0x1?, 0x1?})
/go/pkg/mod/github.com/sirupsen/logrus@v1.8.1/entry.go:293 +0x4f
github.com/sirupsen/logrus.(*Entry).Logf(0xc000f50d90, 0x0, {0x317b93d?, 0x0?}, {0xc0018aa520?, 0x0?, 0x0?})
/go/pkg/mod/github.com/sirupsen/logrus@v1.8.1/entry.go:338 +0x85
github.com/sirupsen/logrus.(*Entry).Panicf(0x42bab98?, {0x317b93d?, 0x416667?}, {0xc0018aa520?, 0x2a95c00?, 0x42bab01?})
/go/pkg/mod/github.com/sirupsen/logrus@v1.8.1/entry.go:376 +0x34
github.com/flyteorg/flyte/flytestdlib/logger.Panicf({0x42bab98?, 0xc000d1c400?}, {0x317b93d, 0x1e}, {0xc0018aa520, 0x1, 0x1})
/flyteorg/build/flytestdlib/logger/logger.go:188 +0x64
github.com/flyteorg/flyte/cmd/single.glob..func4.1()
/flyteorg/build/cmd/single/start.go:184 +0xcc
golang.org/x/sync/errgroup.(*Group).Go.func1()
/go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
/go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:72 +0xa5
Could you please help me solve this issue?
Ethan Brown
11/29/2023, 12:18 AM
Dhruv Malik
11/29/2023, 2:16 PM
configuration:
  storage:
    metadataContainer: <BUCKET_NAME>
    userDataContainer: <USER_DATA_BUCKET_NAME>
    provider: s3
    providerConfig:
      s3:
        region: "<AWS_REGION>"
        authType: "iam"
Jan Fiedler
11/30/2023, 2:16 PM
pyflyte register. My understanding was:
hitting pyflyte register --> submitting the workflow artifacts over gRPC to flyteadmin --> flyteadmin stores the artifacts in the object storage (MinIO in my case) --> done. (Is this correct?)
David Espejo (he/him)
11/30/2023, 6:23 PM
Flyte the Hard Way to make it more reliable in terms of setting the required permissions, especially for the workers:
https://github.com/davidmirror-ops/flyte-the-hard-way/blob/main/docs/03-roles-service-accounts.md
Thanks to feedback from @Alexandra D and @Guy Arad
If you have any questions or issues, please let us know.
Ethan Brown
11/30/2023, 10:47 PM
/flyte instead -- will I run into any particular issues with being able to properly configure all the various external clients (like propeller in a remote data plane)? Will the console still run and generate URLs correctly?