Flyte #flytekit

echoing-translator-95395

06/27/2022, 4:49 PM

Specifically in the context of ….

run_tests_output=$(/home/runner/work/flyte/flyte/boilerplate/flyte/end2end/end2end.sh /home/runner/work/flyte/flyte/.github/ci_config/config.yaml )
Traceback (most recent call last):
  File "./boilerplate/flyte/end2end/run-tests.py", line 11, in <module>
    from flytekit.remote import FlyteRemote
  File "/home/runner/.local/lib/python3.8/site-packages/flytekit/__init__.py", line 253, in <module>
    load_implicit_plugins()
  File "/home/runner/.local/lib/python3.8/site-packages/flytekit/__init__.py", line 247, in load_implicit_plugins
    discovered_plugins = entry_points(group="flytekit.plugins")
TypeError: entry_points() got an unexpected keyword argument 'group'

thankful-minister-83577

06/27/2022, 5:18 PM

so this end2end thing is kinda dead - you know that right? you’re reviving it in a better form right?

thankful-minister-83577

06/27/2022, 5:20 PM

just saying, we don’t run that script anymore

thankful-minister-83577

06/27/2022, 5:20 PM

irrelevant to your question i know

echoing-translator-95395

06/27/2022, 5:20 PM

what’s in its place?

echoing-translator-95395

06/27/2022, 5:21 PM

what is doing nightly tests?

thankful-minister-83577

06/27/2022, 5:22 PM

in public… nothing unf, afaik

thankful-minister-83577

06/27/2022, 5:22 PM

which is why we’re hoping to revive this

thankful-minister-83577

06/27/2022, 5:22 PM

on our internal clusters, we’re still testing

echoing-translator-95395

06/27/2022, 5:22 PM

ex: https://github.com/flyteorg/flyte/issues/2421

thankful-minister-83577

06/27/2022, 5:22 PM

but we’d like to get the public ones going and in better shape again

thankful-minister-83577

06/27/2022, 5:23 PM

wrt your question though, this is what we’re doing https://importlib-metadata.readthedocs.io/en/latest/using.html#entry-points

thankful-minister-83577

06/27/2022, 5:23 PM

the `group` argument should be defined.

echoing-translator-95395

06/27/2022, 5:23 PM

I had worked to create/destroy clusters nightly [ or on demand for releases/other ]

👍 1

thankful-minister-83577

06/27/2022, 5:24 PM

if it’s not, is the `entry_points` identifier getting clobbered somehow?

thankful-minister-83577

06/27/2022, 5:24 PM

yeah that would be ideal, thank you!

echoing-translator-95395

06/27/2022, 5:24 PM

and … was to use the existing stuff from genesis_device … to just get something running OSS, and then optimize from there.

echoing-translator-95395

06/27/2022, 5:24 PM

the create/destroy works fine

thankful-minister-83577

06/27/2022, 5:24 PM

perfect thank you.

echoing-translator-95395

06/27/2022, 5:25 PM

but, I don’t intend [ right now ] to recreate the testing infra … imagine that whatever you’re currently doing should work

thankful-minister-83577

06/27/2022, 5:25 PM

i believe the issue is (and it’s been a while since I touched this so I might be behind the curve) that if you follow the end2end script, it’ll eventually lead you here: https://github.com/flyteorg/flytetools/tree/master/flytetester/app/workflows

thankful-minister-83577

06/27/2022, 5:26 PM

all that code was written in a legacy API and is no longer operable

thankful-minister-83577

06/27/2022, 5:26 PM

honestly it probably should’ve already been deleted.

echoing-translator-95395

06/27/2022, 5:26 PM

that’s what I was wondering

thankful-minister-83577

06/27/2022, 5:26 PM

so the end2end script if it doesn’t already will need to be updated to basically do what we’re doing internally every night

thankful-minister-83577

06/27/2022, 5:26 PM

which is to run a collection of the flytesnacks cookbook examples

echoing-translator-95395

06/27/2022, 5:27 PM

exactly

thankful-minister-83577

06/27/2022, 5:27 PM

perfect

echoing-translator-95395

06/27/2022, 5:27 PM

haytham had shared genesis-device repo some months back

echoing-translator-95395

06/27/2022, 5:27 PM

[ the actual static code ]

echoing-translator-95395

06/27/2022, 5:27 PM

Sounds like that’s been updated since then?

thankful-minister-83577

06/27/2022, 5:28 PM

not really no…

thankful-minister-83577

06/27/2022, 5:28 PM

we update the flyte release versions but that’s about it

thankful-minister-83577

06/27/2022, 5:28 PM

the internal nightly testing stuff i think is in another repo

thankful-minister-83577

06/27/2022, 7:50 PM

@echoing-translator-95395 - the failure was this one right? https://github.com/flyteorg/flyte/runs/7077307934?check_suite_focus=true this is the only one i saw. the other failures seem to be different.

thankful-minister-83577

06/27/2022, 7:50 PM

the most recent error is

Error: Command failed: /opt/hostedtoolcache/flytectl/latest/x64/flytectl register examples -p flytesnacks -d development Error: example 0xc0009db310 failed to register rpc error: code = Unavailable desc = no healthy upstream

echoing-translator-95395

06/27/2022, 7:51 PM

I hadn’t seen that most recent one …

Error: Command failed: /opt/hostedtoolcache/flytectl/latest/x64/flytectl register examples -p flytesnacks -d development Error: example 0xc0009db310 failed to register rpc error: code = Unavailable desc = no healthy upstream

echoing-translator-95395

06/27/2022, 7:51 PM

Ah … https://github.com/flyteorg/flyte/runs/7078368046?check_suite_focus=true

echoing-translator-95395

06/27/2022, 7:51 PM

that’s coming from a different workflow

echoing-translator-95395

06/27/2022, 7:52 PM

“Functional test for sandbox image”

echoing-translator-95395

06/27/2022, 7:53 PM

FROM: https://github.com/flyteorg/flyte/blob/master/.github/workflows/functional_test.yaml

echoing-translator-95395

06/27/2022, 7:53 PM

( that’s from master branch … but looks like runs on PRs on any branch )

echoing-translator-95395

06/27/2022, 7:54 PM

That looks like it tests the sandbox image and snacks…. Not testing the cloud deployment, and snacks.

thankful-minister-83577

06/27/2022, 7:54 PM

wrt the entrypoints error though, can you help me try something? i’d like to add to the command

echoing-translator-95395

06/27/2022, 7:55 PM

i’m able to hop on a call — if helpful

thankful-minister-83577

06/27/2022, 7:55 PM

no it should be quick…

thankful-minister-83577

06/27/2022, 7:55 PM

pip show importlib-metadata

thankful-minister-83577

06/27/2022, 7:55 PM

i just want to see the output of that.

thankful-minister-83577

06/27/2022, 7:56 PM

what’s weird is that if you look at the “Setup Flytekit” section of the log, it’s not there

thankful-minister-83577

06/27/2022, 7:56 PM

that library i mean

echoing-translator-95395

06/27/2022, 7:56 PM

so basically just add that to the workflow so we can see what’s installed/running in the github action

thankful-minister-83577

06/27/2022, 7:57 PM

yeah

thankful-minister-83577

06/27/2022, 7:58 PM

the “importlib-metadata” library is not directly required by flytekit (even though it should be) but it’s already in multiple dependencies that are in setup.py. click and keyring both use it
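Since the library arrives only through other dependencies, it can help to check from inside Python which version is installed and which file an import would actually load. A minimal sketch using only the standard library (Python 3.8+); `describe_distribution` is a hypothetical helper name, not part of flytekit or the CI scripts.

```python
import importlib.util
from importlib import metadata  # standard library on Python 3.8+


def describe_distribution(dist_name, module_name):
    """Return (installed version, file an import would load), or None for each if absent."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # no installed distribution with that name

    # find_spec locates a top-level module without importing it.
    spec = importlib.util.find_spec(module_name)
    origin = spec.origin if spec is not None else None
    return version, origin


if __name__ == "__main__":
    # The distribution is named with a hyphen, the importable module with an underscore.
    print(describe_distribution("importlib-metadata", "importlib_metadata"))
```

Running this in the same step that invokes the tests shows whether the interpreter sees a system-wide copy or a user-level one.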

echoing-translator-95395

06/27/2022, 7:59 PM

is running

echoing-translator-95395

06/27/2022, 7:59 PM

https://github.com/flyteorg/flyte/actions/runs/2571702888

thankful-minister-83577

06/27/2022, 7:59 PM

thanks

echoing-translator-95395

06/27/2022, 7:59 PM

https://github.com/flyteorg/flyte/commit/703c363341cba4702c7bd8573b102b8365cb5389

echoing-translator-95395

06/27/2022, 8:01 PM

https://github.com/flyteorg/flyte/runs/7080381782?check_suite_focus=true

echoing-translator-95395

06/27/2022, 8:02 PM

Name: importlib-metadata
Version: 1.5.0
Summary: Read metadata from Python packages
Home-page: http://importlib-metadata.readthedocs.io/
Author: Barry Warsaw
Author-email: barry@python.org
License: Apache Software License
Location: /usr/lib/python3/dist-packages
Requires: 
Required-by:

echoing-translator-95395

06/27/2022, 8:02 PM

at bottom of “Setup Flytekit”

thankful-minister-83577

06/27/2022, 9:02 PM

hmm

thankful-minister-83577

06/27/2022, 9:02 PM

sorry, was afk

thankful-minister-83577

06/27/2022, 9:02 PM

$ pip show importlib-metadata
Name: importlib-metadata
Version: 4.11.3

is what i have locally

thankful-minister-83577

06/27/2022, 9:03 PM

that’s what it’s supposed to be… i have no idea why it’s so far back.
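The version gap is consistent with the `TypeError` in the traceback: keyword selection such as `entry_points(group=...)` was added in importlib-metadata 3.6, so release 1.5.0 rejects the `group` argument. A minimal sketch, not flytekit's actual code, of a lookup that tolerates both APIs; `select_entry_points` is a hypothetical helper name.

```python
def select_entry_points(group):
    """Return the entry points registered under `group` as a list."""
    try:
        # Prefer the backport when it is installed.
        from importlib_metadata import entry_points
    except ImportError:
        # Fall back to the standard library (Python 3.8+).
        from importlib.metadata import entry_points

    try:
        # Newer API: select by keyword.
        return list(entry_points(group=group))
    except TypeError:
        # Older API: entry_points() takes no arguments and returns
        # a mapping from group name to entry points.
        return list(entry_points().get(group, []))


if __name__ == "__main__":
    for entry_point in select_entry_points("flytekit.plugins"):
        print(entry_point.name)
```

A fallback like this only hides the mismatch; pinning a minimum version of the dependency addresses the cause.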

thankful-minister-83577

06/27/2022, 9:03 PM

let me make a PR to add it to setup.py in flytekit

echoing-translator-95395

06/27/2022, 9:04 PM

pip install --upgrade importlib-metadata

echoing-translator-95395

06/27/2022, 9:05 PM

just tried adding that to the workflow … to see if that addresses the failure [ is not a good solution for production, though ] … setup.py much better.

thankful-minister-83577

06/27/2022, 9:05 PM

thanks.

echoing-translator-95395

06/27/2022, 9:18 PM

nice … we got a new error!

echoing-translator-95395

06/27/2022, 9:28 PM

I’m gonna head out to kiteboard pretty soon, so afk till late tonight. Don’t hesitate to re-trigger actions, and/or push things to that `opta-aws` branch, if it makes sense or you’re trying to explore things.

thankful-minister-83577

06/27/2022, 9:31 PM

all good go have fun!

great-school-54368

06/28/2022, 1:48 AM

@echoing-translator-95395 in genesis_device we don’t have anything that is not public. Everything is available here https://github.com/flyteorg/boilerplate/tree/master/boilerplate/flyte/end2end In genesis_device we use the same boilerplate for the functional test. The only difference is our AWS setup for upgrading flyte nightly.

👍 1

echoing-translator-95395

06/28/2022, 2:56 AM

@great-school-54368 FYI --> https://github.com/flyteorg/flyte/blob/opta-aws/.github/workflows/workflow.yml and things around that [ you’ll notice lots should look familiar ]. That can take care of nightly testing, and testing to ensure that the getting started/deployment [ on aws ] guide is consistently working. Not understanding why there is an issue with running that on an AWS EKS cluster with self-signed certs. @thankful-minister-83577 found one version incompatibility, and there easily might be more, which could be the culprit.

👍 1

echoing-translator-95395

07/15/2022, 8:32 PM

what’s the size and number of the machines in genesis-device cluster? From the opta stuff from the code I saw, it looks like the min/default [ 3 medium … ], trying to confirm if that’s the case?

thankful-minister-83577

07/15/2022, 9:01 PM

the internal cluster we use for testing?

thankful-minister-83577

07/15/2022, 9:02 PM

3 yeah

thankful-minister-83577

07/15/2022, 9:02 PM

feel free to make the functional test one bigger

thankful-minister-83577

07/15/2022, 9:02 PM

use 6

echoing-translator-95395

07/15/2022, 9:03 PM

Ya, 6 it is then. I want to rule out resource issues somehow blocking network connectivity. Ran with 15 nodes and still got issues.

echoing-translator-95395

07/15/2022, 9:03 PM

what instance_type?

thankful-minister-83577

07/15/2022, 9:03 PM

hmm

thankful-minister-83577

07/15/2022, 9:04 PM

t3.medium

echoing-translator-95395

07/15/2022, 9:04 PM

I think opta’s default was t3.medium

echoing-translator-95395

07/15/2022, 9:04 PM

👍

thankful-minister-83577

07/15/2022, 9:04 PM

let me sign into kubectl again

echoing-translator-95395

07/15/2022, 9:05 PM

on debugging … i think issue might have to do with a specific test. Currently is OK, and shutting down [ with a more limited set of tests ]: https://github.com/flyteorg/flyte/actions/runs/2678835918

echoing-translator-95395

07/15/2022, 9:05 PM

but had issue https://github.com/flyteorg/flyte/actions/runs/2678784708

thankful-minister-83577

07/15/2022, 9:06 PM

okay

thankful-minister-83577

07/15/2022, 9:06 PM

let me know when you want me to take a look

thankful-minister-83577

07/15/2022, 9:06 PM

have a meeting 3-4 but can hop on this otherwise

echoing-translator-95395

07/15/2022, 9:07 PM

my debug option seems to be process of elimination: keep adding workflows to the run until I determine which one is messing things up, ex: https://github.com/flyteorg/flyte/blob/opta-aws/boilerplate/flyte/end2end/run-tests.py#L41-L55

echoing-translator-95395

07/15/2022, 9:09 PM

not sure where else to debug. At the moment there is a happy path, which breaks with a rather common error when LOTS/ALL the workflows are run. So, seems like need to keep adding/removing until figure out the culprits.
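If the failure turns out to be tied to one workflow and reproduces whenever that workflow is in the run, the add/remove search described here can be done by halving the list instead of changing one workflow at a time. A minimal sketch under those assumptions; `find_culprit`, `passes`, and the workflow names are hypothetical and not part of run-tests.py. If the failure comes from overall load or from an interaction between workflows, this search will not isolate it.

```python
from typing import Callable, List, Optional, Sequence


def find_culprit(
    workflows: Sequence[str],
    passes: Callable[[List[str]], bool],
) -> Optional[str]:
    """Bisect `workflows` to find the one whose presence makes a run fail.

    `passes(subset)` must run the given subset and return True on success.
    Assumes a single culprit that fails regardless of what runs alongside it.
    """
    candidates = list(workflows)
    if not candidates or passes(candidates):
        return None  # the full list succeeds, nothing to isolate

    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        # If the first half is clean, the culprit must be in the second half.
        candidates = candidates[mid:] if passes(first_half) else first_half
    return candidates[0]


if __name__ == "__main__":
    # Stand-in for a real end-to-end run: fails whenever "workflow_c" is included.
    def fake_run(subset: List[str]) -> bool:
        return "workflow_c" not in subset

    names = ["workflow_a", "workflow_b", "workflow_c", "workflow_d"]
    print(find_culprit(names, fake_run))  # prints "workflow_c"
```

Each call to `passes` corresponds to one cluster run, so halving keeps the number of runs roughly logarithmic in the number of workflows.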

thankful-minister-83577

07/15/2022, 9:09 PM

sorry that’s slow.

thankful-minister-83577

07/15/2022, 9:09 PM

sorry you have to deal with that

thankful-minister-83577

07/15/2022, 9:09 PM

let me know next time it’s happening and i can hop onto kubectl and poke around

thankful-minister-83577

07/15/2022, 9:10 PM

maybe there’ll be something obvious

echoing-translator-95395

07/15/2022, 9:10 PM

I can rerun with the ‘whole’ list … so, could kick that off in a minute, which means cluster would be up in ~30.

echoing-translator-95395

07/15/2022, 9:13 PM

ps. seems to report the same error as: https://flyte-org.slack.com/archives/CP2HDHKE1/p1657905793892719 ( which is why I say is a ‘common’ error ).

echoing-translator-95395

07/15/2022, 9:19 PM

that’s mostly a way of saying - once determining the cause - adding some better error messaging might be helpful.
