a

austin

06/27/2022, 4:49 PM
Specifically in the context of ….
run_tests_output=$(/home/runner/work/flyte/flyte/boilerplate/flyte/end2end/end2end.sh /home/runner/work/flyte/flyte/.github/ci_config/config.yaml )
Traceback (most recent call last):
  File "./boilerplate/flyte/end2end/run-tests.py", line 11, in <module>
    from flytekit.remote import FlyteRemote
  File "/home/runner/.local/lib/python3.8/site-packages/flytekit/__init__.py", line 253, in <module>
    load_implicit_plugins()
  File "/home/runner/.local/lib/python3.8/site-packages/flytekit/__init__.py", line 247, in load_implicit_plugins
    discovered_plugins = entry_points(group="flytekit.plugins")
TypeError: entry_points() got an unexpected keyword argument 'group'
y

Yee

06/27/2022, 5:18 PM
hi
so this end2end thing is kinda dead - you know that right? you’re reviving it in a better form right?
a

austin

06/27/2022, 5:19 PM
?
y

Yee

06/27/2022, 5:20 PM
just saying, we don’t run that script anymore
irrelevant to your question i know
a

austin

06/27/2022, 5:20 PM
ah
what’s in its place?
what is doing nightly tests?
y

Yee

06/27/2022, 5:22 PM
in public… nothing unf, afaik
which is why we’re hoping to revive this
on our internal clusters, we’re still testing
y

Yee

06/27/2022, 5:22 PM
but we’d like to get the public ones going and in better shape again
wrt your question though, this is what we’re doing https://importlib-metadata.readthedocs.io/en/latest/using.html#entry-points
the group argument should be defined.
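For reference, the TypeError in the traceback is what an older entry_points() API produces: it accepts no keyword arguments and returns a dict keyed by group name. Below is a minimal sketch of a lookup that tolerates both API shapes. It uses the standard-library importlib.metadata to stay self-contained, which may not match flytekit's actual import. The names plugins_in_group and ep_func are hypothetical, introduced only for illustration.

```python
from importlib.metadata import entry_points
from typing import Any, Callable, List


def plugins_in_group(group: str, ep_func: Callable[..., Any] = entry_points) -> List[Any]:
    """Return the entry points registered under `group`.

    Hypothetical helper, not flytekit code. `ep_func` is injectable so the
    fallback path can be exercised without installing an old library.
    """
    try:
        # Newer API: entry_points(group=...) selects a single group.
        return list(ep_func(group=group))
    except TypeError:
        # Older API: no keyword arguments, returns {group_name: [entry points]}.
        return list(ep_func().get(group, []))
```

Calling plugins_in_group("flytekit.plugins") returns an empty list when nothing is registered under that group. Catching TypeError is a blunt test for the API version; pinning a minimum library version in the package requirements is the cleaner fix.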
a

austin

06/27/2022, 5:23 PM
I had worked to create/destroy clusters nightly [ or on demand for releases/other ]
👍 1
y

Yee

06/27/2022, 5:24 PM
if it’s not, is the entry_points identifier getting clobbered somehow?
yeah that would be ideal, thank you!
a

austin

06/27/2022, 5:24 PM
and … was to use the existing stuff from genesis_device … to just get something running OSS, and then optimize from there.
the create/destroy works fine
y

Yee

06/27/2022, 5:24 PM
perfect thank you.
a

austin

06/27/2022, 5:25 PM
but, I don’t intend [ right now ] to recreate the testing infra … imagine that whatever you’re currently doing should work
y

Yee

06/27/2022, 5:25 PM
i believe the issue is (and it’s been a while since I touched this so I might be behind the curve) that if you follow the end2end script, it’ll eventually lead you here: https://github.com/flyteorg/flytetools/tree/master/flytetester/app/workflows
all that code was written in a legacy API and is no longer operable
honestly it probably should’ve already been deleted.
a

austin

06/27/2022, 5:26 PM
that’s what I was wondering
y

Yee

06/27/2022, 5:26 PM
so the end2end script if it doesn’t already will need to be updated to basically do what we’re doing internally every night
which is to run a collection of the flytesnacks cookbook examples
a

austin

06/27/2022, 5:27 PM
exactly
y

Yee

06/27/2022, 5:27 PM
perfect
a

austin

06/27/2022, 5:27 PM
haytham had shared genesis-device repo some months back
[ the actual static code ]
Sounds like that’s been updated since then?
y

Yee

06/27/2022, 5:28 PM
not really no…
we update the flyte release versions but that’s about it
the internal nightly testing stuff i think is in another repo
@austin - the failure was this one right? https://github.com/flyteorg/flyte/runs/7077307934?check_suite_focus=true this is the only one i saw. the other failures seem to be different.
the most recent error is
Error: Command failed: /opt/hostedtoolcache/flytectl/latest/x64/flytectl register examples -p flytesnacks -d development
Error: example 0xc0009db310 failed to register rpc error: code = Unavailable desc = no healthy upstream
a

austin

06/27/2022, 7:51 PM
I hadn’t seen that most recent one …
Error: Command failed: /opt/hostedtoolcache/flytectl/latest/x64/flytectl register examples -p flytesnacks -d development
Error: example 0xc0009db310 failed to register rpc error: code = Unavailable desc = no healthy upstream
that’s coming from a different workflow
“Functional test for sandbox image”
( that’s from master branch … but looks like runs on PRs on any branch )
That looks like it tests the sandbox image and snacks…. Not testing the cloud deployment, and snacks.
y

Yee

06/27/2022, 7:54 PM
wrt the entrypoints error though, can you help me try something? i’d like to add to the command
a

austin

06/27/2022, 7:55 PM
i’m able to hop on a call — if helpful
y

Yee

06/27/2022, 7:55 PM
no it should be quick…
pip show importlib-metadata
i just want to see the output of that.
what’s weird is that if you look at the “Setup Flytekit” section of the log, it’s not there
that library i mean
a

austin

06/27/2022, 7:56 PM
so basically just add that to the workflow so we can see what’s installed/running in the github action
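The same check can be done from inside Python, which reports what the interpreter running the tests actually sees. That can differ from what pip reports if pip belongs to another interpreter. This is a minimal sketch under that assumption; installed_version is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None
```

In a CI step, printing installed_version("importlib-metadata") and installed_version("flytekit") just before the test script runs would show both versions in the job log.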
y

Yee

06/27/2022, 7:57 PM
yeah
the “importlib-metadata” library is not directly required by flytekit (even though it should be) but it’s already in multiple dependencies that are in setup.py. click and keyring both use it
a

austin

06/27/2022, 7:59 PM
is running
y

Yee

06/27/2022, 7:59 PM
thanks
Name: importlib-metadata
Version: 1.5.0
Summary: Read metadata from Python packages
Home-page: http://importlib-metadata.readthedocs.io/
Author: Barry Warsaw
Author-email: barry@python.org
License: Apache Software License
Location: /usr/lib/python3/dist-packages
Requires: 
Required-by:
at bottom of “Setup Flytekit”
y

Yee

06/27/2022, 9:02 PM
hmm
sorry, was afk
$ pip show importlib-metadata
Name: importlib-metadata
Version: 4.11.3
is what i have locally
that’s what it’s supposed to be… i have no idea why it’s so far back.
let me make a PR to add to setup.py in flytekit
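The two versions reported in this thread (1.5.0 on the runner, 4.11.3 locally) sit on opposite sides of the release that added keyword selection to entry_points(). Here is a small sketch of that check, assuming the threshold is importlib-metadata 3.6. That number is recalled from the project's changelog and is worth verifying. supports_group_kwarg and major_minor are hypothetical helpers, not part of flytekit.

```python
import re
from typing import Tuple

# Assumption: entry_points(group=...) first appeared in importlib-metadata 3.6.
MIN_FOR_GROUP_KWARG = (3, 6)


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '4.11.3'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_group_kwarg(version: str) -> bool:
    """True if this importlib-metadata version should accept group=..."""
    return major_minor(version) >= MIN_FOR_GROUP_KWARG
```

With these inputs, supports_group_kwarg("1.5.0") is False and supports_group_kwarg("4.11.3") is True. That is consistent with the failure appearing only on the runner.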
a

austin

06/27/2022, 9:04 PM
pip install --upgrade importlib-metadata
just tried adding that to the workflow … to see if that addresses the failure [ is not a good solution for production, though ] … setup.py much better.
y

Yee

06/27/2022, 9:05 PM
thanks.
a

austin

06/27/2022, 9:18 PM
nice … we got a new error!
I’m gonna head out to kiteboard pretty soon, so afk till late tonight. Don’t hesitate to re-trigger actions, and/or push things to that opta-aws branch, if makes sense or u r trying to explore things.
y

Yee

06/27/2022, 9:31 PM
all good go have fun!
y

Yuvraj

06/28/2022, 1:48 AM
@austin in genesis_device we don’t have anything that is not public. Everything is available here https://github.com/flyteorg/boilerplate/tree/master/boilerplate/flyte/end2end In genesis_device we use the same boilerplate for functional tests. The only difference is our AWS setup for upgrading flyte nightly.
👍 1
a

austin

06/28/2022, 2:56 AM
@Yuvraj FYI --> https://github.com/flyteorg/flyte/blob/opta-aws/.github/workflows/workflow.yml and things around that [ you’ll notice lots should look familiar ]. That can take care of nightly testing, and testing to ensure that the getting started/deployment [ on aws ] guide is consistently working. Not understanding why there is an issue with running that on an AWS EKS cluster with self-signed certs. @Yee found one version incompatibility, and there easily might be more, which could be the culprit.
👍 1
what’s the size and number of the machines in genesis-device cluster? From the opta stuff from the code I saw, it looks like the min/default [ 3 medium … ], trying to confirm if that’s the case?
y

Yee

07/15/2022, 9:01 PM
the internal cluster we use for testing?
3 yeah
feel free to make the functional test one bigger
use 6
a

austin

07/15/2022, 9:03 PM
Ya, 6 it is then. I want to rule out resource issues somehow blocking network connectivity. Ran with 15 nodes and still got issues.
what instance_type?
y

Yee

07/15/2022, 9:03 PM
hmm
t3.medium
a

austin

07/15/2022, 9:04 PM
I think opta’s default was t3.medium
👍
y

Yee

07/15/2022, 9:04 PM
let me sign into kubectl again
a

austin

07/15/2022, 9:05 PM
on debugging … i think issue might have to do with a specific test. Currently is OK, and shutting down [ with a more limited set of tests ]: https://github.com/flyteorg/flyte/actions/runs/2678835918
y

Yee

07/15/2022, 9:06 PM
okay
let me know when you want me to take a look
have a meeting 3-4 but can hop on this otherwise
a

austin

07/15/2022, 9:07 PM
my debug option seems to be process of elimination, keep adding workflows to be run, until determine which is messing things up, ex: https://github.com/flyteorg/flyte/blob/opta-aws/boilerplate/flyte/end2end/run-tests.py#L41-L55
not sure where else to debug. At the moment there is a happy path, which breaks with a rather common error when LOTS/ALL the workflows are run. So, seems like need to keep adding/removing until figure out the culprits.
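If adding and removing workflows by hand gets slow, the elimination can be done as a bisection. The sketch below assumes there is exactly one culprit, that it fails deterministically, and that it fails in any subset that contains it. None of this is confirmed for this cluster. find_culprit, run_fails, and the workflow names in the usage note are hypothetical placeholders.

```python
from typing import Callable, List, Sequence


def find_culprit(workflows: Sequence[str], run_fails: Callable[[List[str]], bool]) -> str:
    """Bisect a list of workflows down to the single one that triggers the failure.

    `run_fails(subset)` must run the given subset and return True if the run fails.
    """
    candidates = list(workflows)
    if not run_fails(candidates):
        raise ValueError("the full list passes; nothing to bisect")
    while len(candidates) > 1:
        first_half = candidates[: len(candidates) // 2]
        if run_fails(first_half):
            candidates = first_half
        else:
            # The first half passed, so the culprit must be in the remainder.
            candidates = candidates[len(first_half):]
    return candidates[0]
```

For a list like ["wf_a", "wf_b", "wf_c", "wf_d", "wf_e"] this needs roughly log2(n) runs after the initial full run, rather than one run per workflow. If the failure only appears under combined load, with no single culprit, the assumptions above do not hold and the result would be misleading.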
y

Yee

07/15/2022, 9:09 PM
sorry that’s slow.
sorry you have to deal with that
let me know next time it’s happening and i can hop onto kubectl and poke around
maybe there’ll be something obvious
a

austin

07/15/2022, 9:10 PM
I can rerun with the ‘whole’ list … so, could kick that off in a minute, which means cluster would be up in ~30.
y

Yee

07/15/2022, 9:11 PM
k
a

austin

07/15/2022, 9:13 PM
ps. seems to report the same error as: https://flyte-org.slack.com/archives/CP2HDHKE1/p1657905793892719 ( which is why I say is a ‘common’ error ).
that’s mostly a way of saying - once determining the cause - adding some better error messaging might be helpful.