Gaurav Kumar

about 2 years ago
Hi, I have a ContainerTask as shown below:
my-task = ContainerTask(
    metadata=TaskMetadata(cache=True, cache_version="1.0"),
    name="my-task",
    image="my-image",
    input_data_dir="/var/inputs",
    output_data_dir="/var/outputs",
    inputs=kwtypes(inDir=str),
    outputs=kwtypes(out=str),
    requests=Resources(gpu="1"),
    limits=Resources(gpu="1"),
    command=[
        "/bin/bash",
    ],
    arguments=[
        "-c",
        "echo \"out\" > /var/outputs/out; ... other commands"
        ],
   ....
)
I wanted to cache the task, and found that I had to declare inputs/outputs even though I don’t need them. So I just write the string “out” to
/var/outputs/out
as shown in the
arguments
and pass a string for
inDir
as below while calling the task.
@workflow
def aeb_sanity_workflow(data: Dict):
    ## -----------------------------------------------------------------------------
    .......
    my_task_promise = my-task(inDir="some string")
    ........
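For reference, the caching workaround above relies on the container leaving one file per declared output in output_data_dir, named after the output (here a file called out). A minimal sketch of that convention, assuming it applies to the Flyte version in use; write_outputs and missing_outputs are hypothetical helper names, not flytekit APIs, and they only emulate on a local filesystem what the shell command in the task does:

```python
from pathlib import Path
from typing import Dict, Iterable, List


def write_outputs(output_dir: str, values: Dict[str, str]) -> None:
    """Write one file per output, named after the output key."""
    root = Path(output_dir)
    root.mkdir(parents=True, exist_ok=True)
    for name, value in values.items():
        (root / name).write_text(value)


def missing_outputs(output_dir: str, expected: Iterable[str]) -> List[str]:
    """Return the declared outputs that have no file in output_dir."""
    root = Path(output_dir)
    return [name for name in expected if not (root / name).is_file()]
```

Running missing_outputs("/var/outputs", ["out"]) inside the container just before it exits could help confirm whether the file is really there when the task finishes.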
This was working for me with the earlier version of Flyte mentioned below:
cr.flyte.org/flyteorg/flyte-sandbox-bundled:sha-1a8d37570cda76cc01bf8c26354f4aad4debcd0a
However, I now use the master version of Flyte patched with https://github.com/flyteorg/flyte/pull/3256 and manually built in
docker/sandbox-bundled
using
make build-gpu
because I needed GPU support in the sandbox. With this latest version, I see two issues which were not there with
cr.flyte.org/flyteorg/flyte-sandbox-bundled:sha-1a8d37570cda76cc01bf8c26354f4aad4debcd0a
tag: v1.8.1
1. For the above-mentioned ContainerTask, it’s throwing errors saying the output doesn’t exist after workflow execution. I haven’t changed a single line of code in my-task; the only change is the latest Flyte image.
2. Also, for the task that needs a GPU, since the image is huge (~24 GB), the k8s node came under disk pressure and several pods were evicted.
> kubectl describe pod <gpu-pod>
  Warning  Evicted              8m10s (x3 over 9m30s)  kubelet            The node was low on resource: ephemeral-storage.
  Warning  ExceededGracePeriod  8m (x3 over 9m20s)     kubelet            Container runtime did not kill the pod within specified grace period.
  Normal   Pulled               7m59s                  kubelet            Successfully pulled image "my-gpu-image" in 8m40.26240502s
  Normal   Created              7m59s                  kubelet            Created container primary
  Normal   Started              7m58s                  kubelet            Started container primary
  Normal   Killing              7m58s                  kubelet            Stopping container primary
  Warning  Evicted              7m30s                  kubelet            The node was low on resource: ephemeral-storage. Container primary was using 13516Ki, which exceeds its request of 0.
> kubectl describe nodes <>
Warning  FreeDiskSpaceFailed      52m                    kubelet                failed to garbage collect required amount of images. Wanted to free 110758122291 bytes, but freed 155692522 bytes
  Warning  ImageGCFailed            52m                    kubelet                failed to garbage collect required amount of images. Wanted to free 110758122291 bytes, but freed 155692522 bytes
  Warning  FreeDiskSpaceFailed      47m                    kubelet                failed to garbage collect required amount of images. Wanted to free 111138763571 bytes, but freed 0 bytes
  Warning  ImageGCFailed            47m                    kubelet                failed to garbage collect required amount of images. Wanted to free 111138763571 bytes, but freed 0 bytes
  Warning  EvictionThresholdMet     7m56s (x3 over 11m)    kubelet                Attempting to reclaim ephemeral-storage
  Normal   NodeNotReady             7m49s                  node-controller        Node 1fefe346c083 status is now: NodeNotReady
  Normal   NodeHasSufficientMemory  7m47s (x3 over 57m)    kubelet                Node 1fefe346c083 status is now: NodeHasSufficientMemory
  Normal   NodeHasDiskPressure      7m47s (x2 over 11m)    kubelet                Node 1fefe346c083 status is now: NodeHasDiskPressure
I didn’t observe these issues in this image
cr.flyte.org/flyteorg/flyte-sandbox-bundled:sha-1a8d37570cda76cc01bf8c26354f4aad4debcd0a
tag: v1.8.1
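To put the kubelet numbers above in perspective, here is a small sketch that pulls the byte counts out of an ImageGCFailed message and converts them to GiB. The function names are hypothetical and the regular expression only assumes the message wording shown in the events above. For the first event it shows roughly 103 GiB wanted against well under 1 GiB actually freed:

```python
import re
from typing import Optional, Tuple

GC_PATTERN = re.compile(r"Wanted to free (\d+) bytes, but freed (\d+) bytes")


def parse_image_gc(message: str) -> Optional[Tuple[int, int]]:
    """Extract (wanted_bytes, freed_bytes) from a kubelet image GC message."""
    match = GC_PATTERN.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


message = (
    "failed to garbage collect required amount of images. "
    "Wanted to free 110758122291 bytes, but freed 155692522 bytes"
)
wanted, freed = parse_image_gc(message)
print(f"wanted {to_gib(wanted):.1f} GiB, freed {to_gib(freed):.2f} GiB")
```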
Pradithya Aria Pura

almost 4 years ago
Is there a way to invalidate the auth token? We have 2 separate environments and use Google auth. Currently, if users switch environments, they run into an authentication issue:
{"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [storage] updated. No update handler registered.","ts":"2022-01-05T11:44:53+08:00"}
{"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [root] updated. No update handler registered.","ts":"2022-01-05T11:44:53+08:00"}
{"json":{"src":"viper.go:400"},"level":"debug","msg":"Config section [admin] updated. Firing updated event.","ts":"2022-01-05T11:44:53+08:00"}
{"json":{"src":"auth_flow_orchestrator.go:37"},"level":"debug","msg":"got a response from the refresh grant for old expiry 2022-01-05 11:54:53.262787 +0800 +08 with new expiry 2022-01-05 11:54:53.262787 +0800 +08","ts":"2022-01-05T11:44:54+08:00"}
{"json":{"src":"client.go:54"},"level":"info","msg":"Initialized Admin client","ts":"2022-01-05T11:44:54+08:00"}
Launch plan plan_scorer_data_pipeline.workflows.launchplan.plan_scorer_pipeline_workflow_schedule failed to get updated due to rpc error: code = Unauthenticated desc = token parse error [JWT_VERIFICATION_FAILED] Could not retrieve id token from metadata, caused by: rpc error: code = Unauthenticated desc = Request unauthenticated with IDToken
Error: rpc error: code = Unauthenticated desc = token parse error [JWT_VERIFICATION_FAILED] Could not retrieve id token from metadata, caused by: rpc error: code = Unauthenticated desc = Request unauthenticated with IDToken
{"json":{"src":"main.go:13"},"level":"error","msg":"rpc error: code = Unauthenticated desc = token parse error [JWT_VERIFICATION_FAILED] Could not retrieve id token from metadata, caused by: rpc error: code = Unauthenticated desc = Request unauthenticated with IDToken","ts":"2022-01-05T11:44:54+08:00"}
Any workaround for this?
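One way to check whether a cached token belongs to the other environment is to look at its claims. The sketch below decodes a JWT payload without verifying the signature, so it is only a diagnostic aid and does not invalidate anything. decode_jwt_claims is a hypothetical helper, and where the token is stored on disk or in a keyring is not covered here:

```python
import base64
import json
from typing import Any, Dict


def decode_jwt_claims(token: str) -> Dict[str, Any]:
    """Return the unverified claims of a JWT. Does NOT check the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated parts")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))
```

Comparing the aud and iss claims of the token against the client ID configured for each environment may show whether the token being sent was issued for the other one.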
Chandramoulee K V

almost 3 years ago
Hello, I was exploring Kubernetes Spark jobs and tried to implement one by following this documentation. This is done in an EKS setup. I created a custom Docker image for Spark as specified in the documentation (the only thing I changed was commenting out the following in the Dockerfile:
# Copy the makefile targets to expose on the container. This makes it easier to register.
# Delete this after we update CI to not serialize inside the container
# COPY k8s_spark/sandbox.config /root
# Copy the actual code
# COPY k8s_spark/ /root/k8s_spark
# This tag is supplied by the build script and will be used to determine the version
# when registering tasks, workflows, and launch plans
# ARG tag
# ENV FLYTE_INTERNAL_IMAGE $tag
# Copy over the helper script that the SDK relies on
# RUN cp ${VENV}/bin/flytekit_venv /usr/local/bin/
# RUN chmod a+x /usr/local/bin/flytekit_venv
). I registered the sample PySpark workflow with the image and I am facing this issue:
failed
SYSTEM ERROR! Contact platform administrators.
When looking at the logs in AWS, I found an "unable to load native-hadoop library" warning. Could this be the cause of the issue? Any ideas?
{"log":"22/11/24 07:03:54 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
","stream":"stderr","docker":{"container_id":"XXX"},"kubernetes":{"container_name":"YYY","namespace_name":"flytesnacks-development","pod_name":"ZZZ","pod_id":"AAA","namespace_id":"BBB","namespace_labels":{"kubernetes_io/metadata_name":"flytesnacks-development"}}}
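Since the line above is logged at WARN level, it may be worth filtering the exported logs for ERROR or FATAL lines before concluding that it is the cause. A minimal sketch, assuming each record is a JSON object with a log field as in the excerpt above and that Spark lines start with a yy/MM/dd HH:mm:ss timestamp followed by the level; lines_at_level is a hypothetical helper name:

```python
import json
import re
from typing import Iterable, List, Sequence

LEVEL_PATTERN = re.compile(
    r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} (?P<level>[A-Z]+) "
)


def lines_at_level(
    records: Iterable[str], levels: Sequence[str] = ("ERROR", "FATAL")
) -> List[str]:
    """Return the Spark log lines whose level is one of `levels`."""
    selected = []
    for raw in records:
        entry = json.loads(raw)
        line = entry.get("log", "")
        match = LEVEL_PATTERN.match(line)
        if match and match.group("level") in levels:
            selected.append(line.rstrip("\n"))
    return selected
```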
Xuan Hu

almost 3 years ago
Hi Community, is there any simple approach to verify that the gRPC service of Flyte admin works as expected? I tried to deploy the
flyte-core
Helm chart on a self-hosted Kubernetes cluster but encountered a certificate problem when trying to register a workflow remotely. The service is deployed with the “Kubernetes Ingress Controller Fake Certificate”, and all the SSL/TLS-related settings should be at the template’s default values. I roughly looked through them, but did not find any obvious problem. BTW, the Flyte console seems to work fine. When I try to
flytectl register
with client config
admin.insecure: false
(the default value by
flytectl config init
), it complains about
$ flytectl register files --project flytesnacks --domain development --archive flyte-package.tgz --version latest
 ------------------------------------------------------------------ -------- ----------------------------------------------------
| NAME                                                             | STATUS | ADDITIONAL INFO                                    |
 ------------------------------------------------------------------ -------- ----------------------------------------------------
| /tmp/register2617257857/0_flyte.workflows.example.say_hello_1.pb | Failed | Error registering file due to rpc error: code =    |
|                                                                  |        | Unavailable desc = connection error: desc =        |
|                                                                  |        | "transport: authentication handshake failed: x509: |
|                                                                  |        | "Kubernetes Ingress Controller Fake Certificate"   |
|                                                                  |        | certificate is not trusted"                        |
 ------------------------------------------------------------------ -------- ----------------------------------------------------
1 rows
Error: Connection Info: [Endpoint: dns:///flyte.XXX.com, InsecureConnection?: false, AuthMode: Pkce]: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: x509: "Kubernetes Ingress Controller Fake Certificate" certificate is not trusted"
After changing the
insecure
config to
true
, the error message becomes
$ flytectl register files --project flytesnacks --domain development --archive flyte-package.tgz --version latest
 ------------------------------------------------------------------ -------- ----------------------------------------------------
| NAME                                                             | STATUS | ADDITIONAL INFO                                    |
 ------------------------------------------------------------------ -------- ----------------------------------------------------
| /tmp/register3222452968/0_flyte.workflows.example.say_hello_1.pb | Failed | Error registering file due to rpc error: code =    |
|                                                                  |        | Unavailable desc = connection closed before server |
|                                                                  |        | preface received                                   |
 ------------------------------------------------------------------ -------- ----------------------------------------------------
1 rows
Error: Connection Info: [Endpoint: dns:///flyte.XXX.com, InsecureConnection?: true, AuthMode: Pkce]: rpc error: code = Unavailable desc = connection closed before server preface received
Actually, I am not sure whether the problem is caused by an inappropriate client config or by the server settings. So I suppose the first step is to check the gRPC service of Flyte admin. Just let me know if you have any comments. Thanks in advance.
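As a first check that is independent of flytectl, the endpoint can be probed with a plain TLS handshake that offers HTTP/2 (h2) via ALPN, which is what a gRPC client over TLS needs. This is only a sketch using the Python standard library: build_context and probe_tls are hypothetical helper names, it does not speak gRPC itself, and verify=False skips certificate checks so that a self-signed or fake certificate does not abort the probe:

```python
import socket
import ssl
from typing import Dict, Optional


def build_context(verify: bool = True) -> ssl.SSLContext:
    """Create a client TLS context that offers HTTP/2 via ALPN."""
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(["h2"])
    if not verify:
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def probe_tls(
    host: str, port: int = 443, verify: bool = True, timeout: float = 5.0
) -> Dict[str, Optional[str]]:
    """Perform a TLS handshake and report the TLS version and negotiated ALPN."""
    ctx = build_context(verify)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return {
                "tls_version": tls.version(),
                "alpn": tls.selected_alpn_protocol(),
            }
```

For example, probe_tls("flyte.XXX.com", verify=False) (with the real hostname substituted) would show whether port 443 terminates TLS and whether h2 is negotiated. If it is, the "connection closed before server preface received" error with insecure set to true could simply mean that a plaintext client is talking to a TLS port.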
Zhiyi Li

almost 3 years ago
Hey, I am trying to configure Kafka to consume cloudevents. It seems I am getting errors at Kafka initialization:
{"json":{"src":"base.go:73"},"level":"fatal","msg":"caught panic: kafka: client has run out of available brokers to talk to (Is your cluster reachable?) [goroutine 1 [running]:\nruntime/debug.Stack()\n\t/usr/local/go/src/runtime/debug/stack.go:24 +0x65\ngithub.com/flyteorg/flyteadmin/pkg/rpc/adminservice.NewAdminServer.func1()\n\t/go/src/github.com/flyteorg/flyteadmin/pkg/rpc/adminservice/base.go:73 +0x88\npanic({0x224d580, 0xc0009c8300})\n\t/usr/local/go/src/runtime/panic.go:838 +0x207\ngithub.com/flyteorg/flyteadmin/pkg/async/cloudevent.NewCloudEventsPublisher({0x2bd10a0, 0xc00005a018}, {0x1, {0xc000f2e460, 0x5}, {{0x0, 0x0}},
The weird thing is that we used the same version of the sarama client to connect to the Kafka instance in the same cluster, and it works. (Also, this step succeeded for me earlier, but after I redeployed with eventTypes set to a single type it failed and never recovered.) The cloudevent config I used is the following:
cloudevents:
    enable: true
    kafka:
      brokers: kafkaip
      version:
        version:
          - 2
          - 2
          - 0
          - 0
    eventsPublisher:
      eventTypes:
      - all
      topicName: workflow-engine-test
    type: Kafka
Any clue what could be the problem?
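To rule out plain network reachability from the pod where flyteadmin runs, a quick TCP check against each configured broker address can help. A minimal sketch using only the standard library; parse_broker and is_reachable are hypothetical helper names, 9092 is assumed as the port when the address has none, and IPv6 literals are not handled:

```python
import socket
from typing import Tuple


def parse_broker(address: str, default_port: int = 9092) -> Tuple[str, int]:
    """Split a 'host:port' broker address, falling back to default_port."""
    host, sep, port = address.rpartition(":")
    if not sep:
        return address, default_port
    return host, int(port)


def is_reachable(address: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the broker can be opened."""
    host, port = parse_broker(address)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A successful TCP connection does not prove that the Kafka protocol handshake works, but a failure here would point at networking or the broker address rather than at the cloudevents settings.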