freezing-shampoo-67249
03/15/2024, 8:44 AM
from dataclasses import dataclass
from typing import List

from dataclasses_json import dataclass_json
from flytekit import task
from flytekit.types.file import FlyteFile


@dataclass_json
@dataclass
class SomeMapTaskInput:
    a: FlyteFile
    b: FlyteFile
    c: FlyteFile


@task(
    cache=True,
    cache_version="0.0.1",
    cache_serialize=False,
    requests=...,
    limits=...,
)
def map_task_input_preparation(
    cfg: shared.OHLIConfigRightOfWay,
    a_files: List[FlyteFile],  # renamed from `as`, which is a reserved keyword in Python
    b_files: List[FlyteFile],
    c: FlyteFile,
) -> List[SomeMapTaskInput]:
    ...
    return [SomeMapTaskInput(a, b, c) for a, b in zip(a_files, b_files)]
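The names suggest the returned list feeds a map task downstream, which would explain bundling all three files into one dataclass element: a map task fans out over a single list input. A minimal sketch of that usage, with the mapped task and workflow names being hypothetical:

from typing import List

from flytekit import map_task, task, workflow
from flytekit.types.file import FlyteFile


@task
def process_one(item: SomeMapTaskInput) -> FlyteFile:
    # Hypothetical mapped task: receives one (a, b, c) bundle per list element.
    return item.a


@workflow
def right_of_way(
    cfg: shared.OHLIConfigRightOfWay,  # same config type as the task above
    a_files: List[FlyteFile],
    b_files: List[FlyteFile],
    c: FlyteFile,
) -> List[FlyteFile]:
    items = map_task_input_preparation(cfg=cfg, a_files=a_files, b_files=b_files, c=c)
    return map_task(process_one)(item=items)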
flat-area-42876
03/15/2024, 8:58 AM
The problem is deterministic and always occurs in the same tasks across different executions. This makes me think this could be related to "already exists" errors.
freezing-shampoo-67249
03/15/2024, 9:17 AM
2024/03/14 23:40:03 /flyteorg/build/datacatalog/pkg/repositories/gormimpl/dataset.go:36 ERROR: duplicate key value violates unique constraint "datasets_pkey" (SQLSTATE 23505)
[2.037ms] [rows:0] INSERT INTO "datasets" ("created_at","updated_at","deleted_at","project","name","domain","version","uuid","serialized_metadata") VALUES ('2024-03-14 23:40:03.37','2024-03-14 23:40:03.37',NULL,'ohli-core','flyte_task-OHLI.workflows.right_of_way.right_of_way_input_preparation','development','0.0.5_0.0.2-fJQjzhDJ-G6gfv-8i','5b192671-b4b5-42fe-83eb-b868fd7232e9','<binary>')
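For what it's worth, SQLSTATE 23505 is Postgres's unique_violation: a second insert of the same dataset key trips the datasets_pkey primary key, which is what an "already exists" condition looks like at the database layer. A minimal sqlite3 sketch of the same mechanics (the key shape is assumed for illustration, not the real datacatalog schema):

import sqlite3

# A second insert of the same key violates the primary key, just as the
# duplicate INSERT INTO "datasets" does above.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE datasets (project TEXT, name TEXT, domain TEXT, version TEXT, "
    "PRIMARY KEY (project, name, domain, version))"
)
key = ("ohli-core", "flyte_task-...", "development", "0.0.5_...")
db.execute("INSERT INTO datasets VALUES (?, ?, ?, ?)", key)
try:
    db.execute("INSERT INTO datasets VALUES (?, ?, ?, ?)", key)  # same key again
except sqlite3.IntegrityError as exc:
    print(exc)  # UNIQUE constraint failed; Postgres raises SQLSTATE 23505 here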
What's a bit odd, though, is that I am seeing a similar log for every task that finished. So it's not unique to the task at which the cache put failure occurs.
freezing-shampoo-67249
03/15/2024, 9:32 AM
freezing-shampoo-67249
03/15/2024, 12:47 PM
Failed to create artifact id: ...
(see screenshot below) followed by
Failed to write results to catalog for Task [{{{} [] [] 0xc000720000} 157 [] TASK ohli-core development OHLI.workflows.orthophoto.orthophoto_phase4_input_preparation local_a75f358f194a41cbcf09d8dff19c13801ed08b1f_2024-03-14-10-55-45 }]. Error: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (26606474 vs. 4194304)
So we are obviously running into a size limit: the failing message is 26,606,474 bytes against the default gRPC cap of 4,194,304 bytes (4 MiB). The data model that is logged in "Failed to create artifact id" is also massive.
Is there a limit we can adjust in the platform configuration?
alert-oil-1341
03/15/2024, 1:10 PM
You can set the server.grpc.maxMessageSizeBytes value higher.
alert-oil-1341
03/15/2024, 1:14 PM
freezing-shampoo-67249
03/15/2024, 1:54 PM
configuration:
  inline:
    server:
      grpc:
        maxMessageSizeBytes: 33554432
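For reference, 33554432 bytes is 32 MiB, comfortably above the 26,606,474-byte message from the error log; the default that was hit is 4 MiB. A quick sanity check of the arithmetic:

# Check the configured limit against the sizes from the ResourceExhausted error.
failed_message = 26_606_474      # bytes, from the error log
default_limit = 4 * 1024 ** 2    # 4194304 bytes, the gRPC default that was hit
new_limit = 32 * 1024 ** 2       # 33554432 bytes, the value configured above
assert default_limit < failed_message < new_limit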
freezing-shampoo-67249
03/15/2024, 2:35 PM
alert-oil-1341
03/15/2024, 2:51 PM
alert-oil-1341
03/15/2024, 2:51 PM
average-finland-92144
03/15/2024, 3:58 PM
kubectl describe cm flyte-binary -n flyte ?
freezing-shampoo-67249
03/18/2024, 8:37 AM
apiVersion: v1
data:
  000-core.yaml: |
    admin:
      clientId: flytepropeller
      endpoint: localhost:8089
      insecure: true
    catalog-cache:
      endpoint: localhost:8081
      insecure: true
      type: datacatalog
    cluster_resources:
      standaloneDeployment: false
      templatePath: /etc/flyte/cluster-resource-templates
    logger:
      show-source: true
      level: 3
    propeller:
      create-flyteworkflow-crd: true
    webhook:
      certDir: /var/run/flyte/certs
      localCert: true
      secretName: flyte-backend-flyte-binary-webhook-secret
      serviceName: flyte-backend-flyte-binary-webhook
      servicePort: 443
    flyte:
      admin:
        disableClusterResourceManager: false
        disableScheduler: false
        disabled: false
        seedProjects:
        - flytesnacks
      dataCatalog:
        disabled: false
      propeller:
        disableWebhook: false
        disabled: false
  001-plugins.yaml: |
    tasks:
      task-plugins:
        default-for-task-types:
          container: container
          container_array: k8s-array
          sidecar: sidecar
        enabled-plugins:
        - container
        - sidecar
        - k8s-array
        - agent-service
    plugins:
      logs:
        kubernetes-enabled: false
        cloudwatch-enabled: false
        stackdriver-enabled: false
        templates:
        - displayName: Logs
          messageFormat: 0
          templateUris:
          - http://<host>:<port>/d/df43f8e0-6db3-4f36-92bf-7083547f9b18/logs?orgId=1&var-podName={{ .podName }}&var-containerName={{ .containerName }}&var-namespace={{ .namespace }}&from={{ .podUnixStartTime }}000&to=now
        - displayName: Resource Usage
          messageFormat: 0
          templateUris:
          - http://<host>:<port>/d/6581e46e4e5c7ba40a07646395ef7b23/kubernetes-compute-resources-pod?orgId=1&refresh=10s&var-datasource=default&var-cluster=&var-namespace={{ .namespace }}&var-pod={{ .podName }}&from={{ .podUnixStartTime }}000&to={{ .podUnixFinishTime }}000
      k8s:
        co-pilot:
          image: cr.flyte.org/flyteorg/flytecopilot-release:v1.11.0
      k8s-array:
        logs:
          config:
            kubernetes-enabled: false
            cloudwatch-enabled: false
            stackdriver-enabled: false
            templates:
            - displayName: Logs
              messageFormat: 0
              templateUris:
              - http://<host>:<port>/d/df43f8e0-6db3-4f36-92bf-7083547f9b18/logs?orgId=1&var-podName={{ .podName }}&var-containerName={{ .containerName }}&var-namespace={{ .namespace }}&from={{ .podUnixStartTime }}000&to=now
            - displayName: Resource Usage
              messageFormat: 0
              templateUris:
              - http://<host>:<port>/d/6581e46e4e5c7ba40a07646395ef7b23/kubernetes-compute-resources-pod?orgId=1&refresh=10s&var-datasource=default&var-cluster=&var-namespace={{ .namespace }}&var-pod={{ .podName }}&from={{ .podUnixStartTime }}000&to={{ .podUnixFinishTime }}000
  002-database.yaml: |
    database:
      postgres:
        username: flyte
        host: flyte-postgres-postgresql-hl.flyte
        port: 5432
        dbname: flyte
        options: "sslmode=disable"
  003-storage.yaml: |
    propeller:
      rawoutput-prefix: s3://flyteuserdata-985a62c2-9998-4558-b0d9-4a3bc1b8464e/data
    storage:
      type: stow
      stow:
        kind: s3
        config:
          region: us-east-1
          disable_ssl: true
          v2_signing: true
          endpoint: http://<host>
          auth_type: accesskey
      container: flytemeta-985a62c2-9998-4558-b0d9-4a3bc1b8464e
  004-auth.yaml: <placeholder>
  100-inline-config.yaml: |
    catalog-cache:
      max-cache-age: 1416h
    flyteadmin:
      useOffloadedWorkflowClosure: true
    plugins:
      k8s:
        default-pod-template-name: ohli-template
    propeller:
      max-output-size-bytes: 52428800
    server:
      grpc:
        maxMessageSizeBytes: 33554432
    storage:
      limits:
        maxDownloadMBs: 50
    task_resources:
      defaults:
        cpu: 500m
        gpu: 0
        memory: 500Mi
      limits:
        cpu: 24
        gpu: 1
        memory: 48Gi
    tasks:
      task-plugins:
        default-for-task-types:
        - container: container
        - container_array: k8s-array
        - dask: dask
        enabled-plugins:
        - container
        - sidecar
        - k8s-array
        - dask
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: flyte-backend
    meta.helm.sh/release-namespace: flyte
  creationTimestamp: "2024-03-13T15:57:07Z"
  labels:
    app.kubernetes.io/instance: flyte-backend
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: flyte-binary
    app.kubernetes.io/version: 1.16.0
    helm.sh/chart: flyte-binary-v1.11.0
    helm.toolkit.fluxcd.io/name: flyte-backend
    helm.toolkit.fluxcd.io/namespace: flyte
  name: flyte-backend-flyte-binary-config
  namespace: flyte
resourceVersion: "90523012"
uid: 4ad5249d-3154-4cfc-b7d0-469546977fa2
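One observation about 100-inline-config.yaml above (an assumption on my part, not something confirmed in the thread): propeller's max-output-size-bytes permits outputs up to 50 MiB, while the new gRPC cap is 32 MiB, so an output that falls between the two could presumably still fail the catalog write:

# Comparing the two limits from 100-inline-config.yaml (values copied from above).
max_output_size = 52_428_800  # propeller max-output-size-bytes -> 50 MiB
grpc_cap = 33_554_432         # server.grpc.maxMessageSizeBytes -> 32 MiB
# Outputs in (32 MiB, 50 MiB] pass propeller's output-size check but could
# still exceed the gRPC message cap on the datacatalog write path (assumption).
assert max_output_size > grpc_cap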
freezing-shampoo-67249
03/22/2024, 10:20 AM