# flytekit
e
Specifically in the context of ….
run_tests_output=$(/home/runner/work/flyte/flyte/boilerplate/flyte/end2end/end2end.sh /home/runner/work/flyte/flyte/.github/ci_config/config.yaml )
Traceback (most recent call last):
  File "./boilerplate/flyte/end2end/run-tests.py", line 11, in <module>
    from flytekit.remote import FlyteRemote
  File "/home/runner/.local/lib/python3.8/site-packages/flytekit/__init__.py", line 253, in <module>
    load_implicit_plugins()
  File "/home/runner/.local/lib/python3.8/site-packages/flytekit/__init__.py", line 247, in load_implicit_plugins
    discovered_plugins = entry_points(group="flytekit.plugins")
TypeError: entry_points() got an unexpected keyword argument 'group'
t
hi
so this end2end thing is kinda dead - you know that right? you’re reviving it in a better form right?
e
?
t
just saying, we don’t run that script anymore
irrelevant to your question i know
e
ah
what’s in its place?
what is doing nightly tests?
t
in public… nothing unf, afaik
which is why we’re hoping to revive this
on our internal clusters, we’re still testing
e
t
but we’d like to get the public ones going and in better shape again
wrt your question though, this is what we’re doing https://importlib-metadata.readthedocs.io/en/latest/using.html#entry-points
the `group` argument should be defined.
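For reference, a minimal sketch of a lookup that tolerates both behaviours of `entry_points`: newer releases accept the `group` keyword, while older ones (including the standard-library `importlib.metadata` on Python 3.8 and 3.9) take no arguments and return a dict keyed by group name. The helper name `find_plugins` is hypothetical and this is not flytekit's actual code; it uses the standard-library module only to stay self-contained.

```python
from importlib.metadata import entry_points  # standard library, Python 3.8+


def find_plugins(group):
    """Return the entry points registered under `group` as a list.

    Hypothetical helper: tries the keyword form first and falls back to the
    older dict-style API when the keyword is rejected.
    """
    try:
        # Newer API: select by group directly.
        return list(entry_points(group=group))
    except TypeError:
        # Older API: entry_points() takes no arguments and returns a
        # dict mapping group name -> entry points.
        return list(entry_points().get(group, []))


plugins = find_plugins("flytekit.plugins")
print(f"found {len(plugins)} plugin entry point(s)")
```

A fallback like this only hides the mismatch; pinning a recent enough importlib-metadata is still the cleaner fix.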
e
I had worked to create/destroy clusters nightly [ or on demand for releases/other ]
👍 1
t
if it’s not, is the `entry_points` identifier getting clobbered somehow?
yeah that would be ideal, thank you!
e
and … was to use the existing stuff from genesis_device … to just get something running OSS, and then optimize from there.
the create/destroy works fine
t
perfect thank you.
e
but, I don’t intend [ right now ] to recreate the testing infra … imagine that whatever you’re currently doing should work
t
i believe the issue is (and it’s been a while since I touched this so I might be behind the curve) that if you follow the end2end script, it’ll eventually lead you here: https://github.com/flyteorg/flytetools/tree/master/flytetester/app/workflows
all that code was written in a legacy API and is no longer operable
honestly it probably should’ve already been deleted.
e
that’s what I was wondering
t
so the end2end script if it doesn’t already will need to be updated to basically do what we’re doing internally every night
which is to run a collection of the flytesnacks cookbook examples
e
exactly
t
perfect
e
haytham had shared genesis-device repo some months back
[ the actual static code ]
Sounds like that’s been updated since then?
t
not really no…
we update the flyte release versions but that’s about it
the internal nightly testing stuff i think is in another repo
@echoing-translator-95395 - the failure was this one right? https://github.com/flyteorg/flyte/runs/7077307934?check_suite_focus=true this is the only one i saw. the other failures seem to be different.
the most recent error is
Error: Command failed: /opt/hostedtoolcache/flytectl/latest/x64/flytectl register examples -p flytesnacks -d development Error: example 0xc0009db310 failed to register rpc error: code = Unavailable desc = no healthy upstream
e
I hadn’t seen that most recent one …
Error: Command failed: /opt/hostedtoolcache/flytectl/latest/x64/flytectl register examples -p flytesnacks -d development Error: example 0xc0009db310 failed to register rpc error: code = Unavailable desc = no healthy upstream
that’s coming from a different workflow
“Functional test for sandbox image”
( that’s from master branch … but looks like runs on PRs on any branch )
That looks like it tests the sandbox image and snacks…. Not testing the cloud deployment, and snacks.
t
wrt the entrypoints error though, can you help me try something? i’d like to add to the command
e
i’m able to hop on a call — if helpful
t
no it should be quick…
pip show importlib-metadata
i just want to see the output of that.
what’s weird is that if you look at the “Setup Flytekit” section of the log, it’s not there
that library i mean
e
so basically just add that to the workflow so we can see what’s installed/running in the github action
t
yeah
the “importlib-metadata” library is not directly required by flytekit (even though it should be) but it’s already in multiple dependencies that are in setup.py. click and keyring both use it
e
is running
t
thanks
Name: importlib-metadata
Version: 1.5.0
Summary: Read metadata from Python packages
Home-page: http://importlib-metadata.readthedocs.io/
Author: Barry Warsaw
Author-email: barry@python.org
License: Apache Software License
Location: /usr/lib/python3/dist-packages
Requires: 
Required-by:
at bottom of “Setup Flytekit”
t
hmm
sorry, was afk
$ pip show importlib-metadata
Name: importlib-metadata
Version: 4.11.3
is what i have locally
that’s what it’s supposed to be… i have no idea why it’s so far back.
let me make a PR to add it to setup.py in flytekit
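A small sketch of the version comparison behind this, using the two versions reported above. The 3.6 threshold is an assumption (my understanding is that the `group` keyword arrived with importlib-metadata 3.6, but check the changelog before relying on it), and the helper names are hypothetical.

```python
import re
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '4.11.3' into (4, 11, 3), keeping at most three numeric parts."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


# Assumption: entry_points(group=...) is available from importlib-metadata 3.6.
ASSUMED_MINIMUM = (3, 6)


def supports_group_keyword(version: str) -> bool:
    """True if the given importlib-metadata version is at or above the assumed minimum."""
    return version_tuple(version) >= ASSUMED_MINIMUM


print(supports_group_keyword("1.5.0"))   # False (the version on the CI runner)
print(supports_group_keyword("4.11.3"))  # True (the version installed locally)
```

In a live environment the installed version string can be read with `importlib.metadata.version("importlib-metadata")`, which raises `PackageNotFoundError` if the distribution is absent.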
e
pip install --upgrade importlib-metadata
just tried adding that to the workflow … to see if that addresses the failure [ is not a good solution for production, though ] … setup.py much better.
t
thanks.
e
nice … we got a new error!
I’m gonna head out to kiteboard pretty soon — so afk till late tonight. Don’t hesitate to re-trigger actions, and/or push things to that `opta-aws` branch, if makes sense or u r trying to explore things.
t
all good go have fun!
g
@echoing-translator-95395 in genesis_device we don’t have anything that is not public. Everything is available here https://github.com/flyteorg/boilerplate/tree/master/boilerplate/flyte/end2end In genesis_device we use the same boilerplate for functional tests. The only difference is our AWS setup for upgrading flyte nightly.
👍 1
e
@great-school-54368 FYI --> https://github.com/flyteorg/flyte/blob/opta-aws/.github/workflows/workflow.yml and things around that [ you’ll notice lots should look familiar ]. That can take care of nightly testing, and testing to ensure that the getting started/deployment [ on aws ] guide is consistently working. Not understanding why there is an issue with running that on an AWS EKS cluster with self-signed certs. @thankful-minister-83577 found one version incompatibility, and there easily might be more, which could be the culprit.
👍 1
what’s the size and number of the machines in genesis-device cluster? From the opta stuff from the code I saw, it looks like the min/default [ 3 medium … ], trying to confirm if that’s the case?
t
the internal cluster we use for testing?
3 yeah
feel free to make the functional test one bigger
use 6
e
Ya, 6 it is then. I want to rule out resource issues somehow blocking network connectivity. Ran with 15 nodes and still got issues.
what instance_type?
t
hmm
t3.medium
e
I think opta’s default was t3.medium
👍
t
let me sign into kubectl again
e
on debugging … i think issue might have to do with a specific test. Currently is OK, and shutting down [ with a more limited set of tests ]: https://github.com/flyteorg/flyte/actions/runs/2678835918
t
okay
let me know when you want me to take a look
have a meeting 3-4 but can hop on this otherwise
e
my debug option seems to be process of elimination, keep adding workflows to be run, until determine which is messing things up, ex: https://github.com/flyteorg/flyte/blob/opta-aws/boilerplate/flyte/end2end/run-tests.py#L41-L55
not sure where else to debug. At the moment there is a happy path, which breaks with a rather common error when LOTS/ALL the workflows are run. So, seems like need to keep adding/removing until figure out the culprits.
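The add/remove loop described above can be shortened by halving the list each round instead of adding one workflow at a time. A rough sketch, assuming exactly one workflow causes the failure and that a run is deterministic; `find_culprit` and the `passes` callback are hypothetical stand-ins for "run this subset on the cluster and report whether it succeeded", not anything in run-tests.py.

```python
from typing import Callable, List, Sequence


def find_culprit(workflows: Sequence[str],
                 passes: Callable[[List[str]], bool]) -> str:
    """Bisect `workflows` to find the single one that makes a run fail.

    Assumes exactly one culprit and that it fails regardless of which
    other workflows run alongside it.
    """
    candidates = list(workflows)
    if passes(candidates):
        raise ValueError("the full set passes; nothing to bisect")
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first, second = candidates[:mid], candidates[mid:]
        # If the first half is clean, the culprit must be in the second half.
        candidates = second if passes(first) else first
    return candidates[0]


# Toy usage with a fake runner in which workflow "d" is the bad one.
names = ["a", "b", "c", "d", "e"]
print(find_culprit(names, lambda subset: "d" not in subset))  # d
```

With N workflows this needs roughly log2(N) cluster runs rather than N. If the failure only shows up when several workflows run together (for example from resource pressure), the single-culprit assumption breaks and the result would be misleading.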
t
sorry that’s slow.
sorry you have to deal with that
let me know next time it’s happening and i can hop onto kubectl and poke around
maybe there’ll be something obvious
e
I can rerun with the ‘whole’ list … so, could kick that off in a minute, which means cluster would be up in ~30.
t
k
e
ps. seems to report the same error as: https://flyte-org.slack.com/archives/CP2HDHKE1/p1657905793892719 ( which is why I say is a ‘common’ error ).
that’s mostly a way of saying - once determining the cause - adding some better error messaging might be helpful.