SeungTaeKim
Nischel Kandru (Woven Planet)
L godlike
/etc/slurm/slurm.conf
NodeName=localhost Gres=gpu:1 CPUs=4 RealMemory=15006 Sockets=1 CoresPerSocket=2 ThreadsPerCore=2 State=UNKNOWN
PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
/etc/slurm/gres.conf
AutoDetect=nvml
NodeName=localhost Name=gpu Type=tesla File=/dev/nvidia0 COREs=0
slurmd -C
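For reference, AutoDetect=nvml in gres.conf relies on Slurm's NVML integration, so it can be worth confirming that NVML itself sees the device before restarting slurmd. A minimal sanity-check sketch, assuming the nvidia-ml-py (pynvml) package is installed (an assumption, not something from this thread):

import pynvml

# Enumerate the GPUs that NVML reports; AutoDetect=nvml in gres.conf
# depends on this same library finding the device.
pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    name = pynvml.nvmlDeviceGetName(handle)
    minor = pynvml.nvmlDeviceGetMinorNumber(handle)  # maps to /dev/nvidia<minor>
    print(f"GPU {i}: {name} -> /dev/nvidia{minor}")
pynvml.nvmlShutdown()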
Cody Scandore
pyflyte
Failed with Exception Code: SYSTEM:Unknown
RPC Failed, with Status: StatusCode.INTERNAL
	details: failed to create a signed url. Error: WebIdentityErr: failed to retrieve credentials
	caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
		status code: 403, request id: 5efc9c88-fdcb-42ab-bea8-8de7a79101e9
Debug string UNKNOWN:Error received from peer {grpc_message:"failed to create a signed url. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: 5efc9c88-fdcb-42ab-bea8-8de7a79101e9", grpc_status:13, created_time:"2023-06-14T11:16:05.730571-07:00"}
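A 403 on sts:AssumeRoleWithWebIdentity generally means the IAM role's trust policy does not trust the cluster's OIDC provider or the pod's service account. One way to narrow it down is to replay the same call from inside the affected pod; a diagnostic sketch, assuming the standard IRSA token path and the AWS_ROLE_ARN variable injected by the EKS webhook (both assumptions, not taken from this thread):

import os
import boto3

# The projected service-account token that IRSA-based credentials use.
TOKEN_PATH = "/var/run/secrets/eks.amazonaws.com/serviceaccount/token"

with open(TOKEN_PATH) as f:
    web_identity_token = f.read()

sts = boto3.client("sts")
# If this raises AccessDenied, the role's trust policy is rejecting the
# token, matching the 403 in the error above.
resp = sts.assume_role_with_web_identity(
    RoleArn=os.environ["AWS_ROLE_ARN"],  # role annotated on the service account
    RoleSessionName="irsa-debug",
    WebIdentityToken=web_identity_token,
)
print(resp["AssumedRoleUser"]["Arn"])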
Rezwan Abir
Nan Qin
Aleksander Lempinen
An error occurred while calling o125.parquet. : java.nio.file.AccessDeniedException: s3://<bucket>/<path>: getFileStatus on s3://<bucket>/<path>: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden;
spark-config-default:
  # We override the default credentials chain provider for Hadoop so that
  # it can use the serviceAccount based IAM role or ec2 metadata based.
  # This is more in line with how AWS works
  - spark.hadoop.fs.s3a.aws.credentials.provider: "com.amazonaws.auth.DefaultAWSCredentialsProviderChain"
  - spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version: "2"
  - spark.kubernetes.allocation.batch.size: "50"
  - spark.hadoop.fs.s3a.acl.default: "BucketOwnerFullControl"
  - spark.hadoop.fs.s3n.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
  - spark.hadoop.fs.AbstractFileSystem.s3n.impl: "org.apache.hadoop.fs.s3a.S3A"
  - spark.hadoop.fs.s3.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
  - spark.hadoop.fs.AbstractFileSystem.s3.impl: "org.apache.hadoop.fs.s3a.S3A"
  - spark.hadoop.fs.s3a.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
  - spark.hadoop.fs.AbstractFileSystem.s3a.impl: "org.apache.hadoop.fs.s3a.S3A"
  - spark.hadoop.fs.s3a.multipart.threshold: "536870912"
  - spark.blacklist.enabled: "true"
  - spark.blacklist.timeout: "5m"
  - spark.task.maxfailures: "8"
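For context, a Spark task submitted through Flyte inherits these platform defaults and can override entries per task. A minimal sketch, assuming the flytekitplugins-spark plugin is installed and using a placeholder S3 path (neither comes from this thread):

import flytekit
from flytekit import task
from flytekitplugins.spark import Spark

@task(
    task_config=Spark(
        # Per-task entries merge with spark-config-default; this provider
        # lets the driver and executors use the service-account IAM role.
        spark_conf={
            "spark.hadoop.fs.s3a.aws.credentials.provider": "com.amazonaws.auth.DefaultAWSCredentialsProviderChain",
        }
    )
)
def count_rows(path: str) -> int:
    sess = flytekit.current_context().spark_session
    # The 403 above surfaces here when the pod's role lacks s3:GetObject
    # on the bucket.
    return sess.read.parquet(path).count()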
Anna Cunningham
FlyteRemote.sync
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.RESOURCE_EXHAUSTED
	details = "Received message larger than max (4762259 vs. 4194304)"
	debug_error_string = "UNKNOWN:Error received from peer ipv4:192.168.3.75:81 {grpc_message:"Received message larger than max (4762259 vs. 4194304)", grpc_status:8, created_time:"2022-08-23T23:23:18.247266112+00:00"}"
>
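The RESOURCE_EXHAUSTED status is the client hitting gRPC's default 4 MiB receive cap (4194304 bytes) while FlyteRemote.sync pulls back a large execution closure. In plain grpc-python that cap is raised through channel options; a generic sketch of the option (the endpoint is a placeholder, and this is not a flytekit-specific knob):

import grpc

MAX_MSG_BYTES = 64 * 1024 * 1024  # 64 MiB, comfortably above the 4 MiB default

# Raising both directions on the channel lifts the "larger than max" error
# for oversized responses.
channel = grpc.insecure_channel(
    "flyteadmin.example.com:81",
    options=[
        ("grpc.max_receive_message_length", MAX_MSG_BYTES),
        ("grpc.max_send_message_length", MAX_MSG_BYTES),
    ],
)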
Katrina P
Nicholas Roberson
flytekit.remote
Flyte enables production-grade orchestration for machine learning workflows and data processing pipelines, and was created to accelerate taking workflows from local development to production.
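A minimal sketch of the flytekit.remote flow referenced above; the endpoint, project, domain, and execution name are all placeholders:

from flytekit.configuration import Config
from flytekit.remote import FlyteRemote

remote = FlyteRemote(
    config=Config.for_endpoint(endpoint="flyte.example.com"),
    default_project="flytesnacks",
    default_domain="development",
)

# Fetch a past execution by name, then sync to pull its latest state;
# sync_nodes=True also hydrates node-level inputs and outputs.
execution = remote.fetch_execution(name="f8c1a2b3d4e5f6071")
execution = remote.sync(execution, sync_nodes=True)
print(execution.outputs)  # available once the execution has completed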