# flytekit
a
Specifically in the context of ….
```
run_tests_output=$(/home/runner/work/flyte/flyte/boilerplate/flyte/end2end/end2end.sh /home/runner/work/flyte/flyte/.github/ci_config/config.yaml )
Traceback (most recent call last):
  File "./boilerplate/flyte/end2end/run-tests.py", line 11, in <module>
    from flytekit.remote import FlyteRemote
  File "/home/runner/.local/lib/python3.8/site-packages/flytekit/__init__.py", line 253, in <module>
    load_implicit_plugins()
  File "/home/runner/.local/lib/python3.8/site-packages/flytekit/__init__.py", line 247, in load_implicit_plugins
    discovered_plugins = entry_points(group="flytekit.plugins")
TypeError: entry_points() got an unexpected keyword argument 'group'
```
y
hi
so this end2end thing is kinda dead - you know that right? you’re reviving it in a better form right?
a
?
y
just saying, we don’t run that script anymore
irrelevant to your question i know
a
ah
what’s in its place?
what is doing nightly tests?
y
in public… nothing unf, afaik
which is why we’re hoping to revive this
on our internal clusters, we’re still testing
a
y
but we’d like to get the public ones going and in better shape again
wrt your question though, this is what we’re doing https://importlib-metadata.readthedocs.io/en/latest/using.html#entry-points
the `group` argument should be defined.
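For context on the `group` argument: as far as I know, keyword selection such as `entry_points(group=...)` only exists in newer importlib-metadata releases (3.6 and later) and in the standard library from Python 3.10, while older versions take no arguments and return a dict keyed by group name. A minimal sketch of a lookup that tolerates both, assuming the standard-library `importlib.metadata` module (Python 3.8+); `plugins_for_group` is a hypothetical helper, not flytekit's code:

```python
from importlib import metadata


def plugins_for_group(group):
    """Return the entry points registered under `group` as a list.

    Hypothetical helper for illustration only.
    """
    try:
        # Newer API: entry_points() accepts selection keywords.
        return list(metadata.entry_points(group=group))
    except TypeError:
        # Older API: no keyword arguments, result is a dict keyed by group.
        return list(metadata.entry_points().get(group, []))


if __name__ == "__main__":
    for entry_point in plugins_for_group("flytekit.plugins"):
        print(entry_point.name, "->", entry_point.value)
```

The `TypeError` branch is the same error as in the traceback at the top of the thread, so hitting it is a sign that an old version is being picked up.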
a
I had worked to create/destroy clusters nightly [ or on demand for releases/other ]
👍 1
y
if it’s not, is the `entry_points` identifier somehow getting clobbered?
yeah that would be ideal, thank you!
a
and … was to use the existing stuff from genesis_device … to just get something running OSS, and then optimize from there.
the create/destroy works fine
y
perfect thank you.
a
but, I don’t intend [ right now ] to recreate the testing infra … imagine that whatever you’re currently doing should work
y
i believe the issue is (and it’s been a while since I touched this so I might be behind the curve) that if you follow the end2end script, it’ll eventually lead you here: https://github.com/flyteorg/flytetools/tree/master/flytetester/app/workflows
all that code was written in a legacy API and is no longer operable
honestly it probably should’ve already been deleted.
a
that’s what I was wondering
y
so the end2end script if it doesn’t already will need to be updated to basically do what we’re doing internally every night
which is to run a collection of the flytesnacks cookbook examples
a
exactly
y
perfect
a
haytham had shared genesis-device repo some months back
[ the actual static code ]
Sounds like that’s been updated since then?
y
not really no…
we update the flyte release versions but that’s about it
the internal nightly testing stuff i think is in another repo
@austin - the failure was this one right? https://github.com/flyteorg/flyte/runs/7077307934?check_suite_focus=true this is the only one i saw. the other failures seem to be different.
the most recent error is
```
Error: Command failed: /opt/hostedtoolcache/flytectl/latest/x64/flytectl register examples -p flytesnacks -d development Error: example 0xc0009db310 failed to register rpc error: code = Unavailable desc = no healthy upstream
```
a
I hadn’t seen that most recent one …
```
Error: Command failed: /opt/hostedtoolcache/flytectl/latest/x64/flytectl register examples -p flytesnacks -d development Error: example 0xc0009db310 failed to register rpc error: code = Unavailable desc = no healthy upstream
```
that’s coming from a different workflow
“Functional test for sandbox image”
( that’s from master branch … but looks like runs on PRs on any branch )
That looks like it tests the sandbox image and snacks…. Not testing the cloud deployment, and snacks.
y
wrt the entrypoints error though, can you help me try something? i’d like to add to the command
a
i’m able to hop on a call — if helpful
y
no it should be quick…
pip show importlib-metadata
i just want to see the output of that.
what’s weird is that if you look at the “Setup Flytekit” section of the log, it’s not there
that library i mean
a
so basically just add that to the workflow so we can see what’s installed/running in the github action
y
yeah
the “importlib-metadata” library is not directly required by flytekit (even though it should be) but it’s already in multiple dependencies that are in setup.py. click and keyring both use it
a
is running
y
thanks
```
Name: importlib-metadata
Version: 1.5.0
Summary: Read metadata from Python packages
Home-page: http://importlib-metadata.readthedocs.io/
Author: Barry Warsaw
Author-email: barry@python.org
License: Apache Software License
Location: /usr/lib/python3/dist-packages
Requires: 
Required-by:
```
at bottom of “Setup Flytekit”
y
hmm
sorry, was afk
```
$ pip show importlib-metadata
Name: importlib-metadata
Version: 4.11.3
```
is what i have locally
that’s what it’s supposed to be… i have no idea why it’s so far back.
let me make a PR to add it to setup.py in flytekit
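The mismatch above (1.5.0 in the CI runner versus 4.11.3 locally) could also be caught by a check at the start of the test script. A rough sketch, assuming 3.6 is the first importlib-metadata release that accepts `group=` (my understanding of the changelog, worth verifying); `parse_version`, `supports_group_kwarg`, and `installed_backport_version` are hypothetical helpers:

```python
import re
from importlib import metadata

# Assumption: first release whose entry_points() accepts group=.
MINIMUM = "3.6"


def parse_version(text):
    """Turn '4.11.3' into (4, 11, 3); keeps at most the first three numbers."""
    return tuple(int(n) for n in re.findall(r"\d+", text)[:3])


def supports_group_kwarg(installed, minimum=MINIMUM):
    """Compare two dotted version strings numerically."""
    return parse_version(installed) >= parse_version(minimum)


def installed_backport_version():
    """Return the installed importlib-metadata version, or None if absent."""
    try:
        return metadata.version("importlib-metadata")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_backport_version()
    if found is None:
        print("importlib-metadata backport is not installed")
    elif not supports_group_kwarg(found):
        print(f"importlib-metadata {found} is older than {MINIMUM}")
    else:
        print(f"importlib-metadata {found} looks new enough")
```

This only reports the problem; pinning a minimum version in setup.py, as proposed above, is what would actually prevent it.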
a
```
pip install --upgrade importlib-metadata
```
just tried adding that to the workflow … to see if that addresses the failure [ is not a good solution for production, though ] … setup.py much better.
y
thanks.
a
nice … we got a new error!
I’m gonna head out to kiteboard pretty soon, so afk till late tonight. Don’t hesitate to re-trigger actions, and/or push things to that `opta-aws` branch, if makes sense or u r trying to explore things.
y
all good go have fun!
y
@austin in genesis_device we don’t have anything that is not public. Everything is available here https://github.com/flyteorg/boilerplate/tree/master/boilerplate/flyte/end2end In genesis_device we use the same boilerplate for functional tests. The only difference is our AWS setup for upgrading flyte nightly.
👍 1
a
@Yuvraj FYI --> https://github.com/flyteorg/flyte/blob/opta-aws/.github/workflows/workflow.yml and things around that [ you’ll notice lots should look familiar ]. That can take care of nightly testing, and testing to ensure that the getting started/deployment [ on aws ] guide is consistently working. Not understanding why there is an issue with running that on an AWS EKS cluster with self-signed certs. @Yee found one version incompatibility, and there easily might be more, which could be the culprit.
👍 1
what’s the size and number of the machines in genesis-device cluster? From the opta stuff from the code I saw, it looks like the min/default [ 3 medium … ], trying to confirm if that’s the case?
y
the internal cluster we use for testing?
3 yeah
feel free to make the functional test one bigger
use 6
a
Ya, 6 it is then. I want to rule out resource issues somehow blocking network connectivity. Ran with 15 nodes and still got issues.
what instance_type?
y
hmm
t3.medium
a
I think opta’s default was t3.medium
👍
y
let me sign into kubectl again
a
on debugging … i think issue might have to do with a specific test. Currently is OK, and shutting down [ with a more limited set of tests ]: https://github.com/flyteorg/flyte/actions/runs/2678835918
y
okay
let me know when you want me to take a look
have a meeting 3-4 but can hop on this otherwise
a
my debug option seems to be process of elimination, keep adding workflows to be run, until determine which is messing things up, ex: https://github.com/flyteorg/flyte/blob/opta-aws/boilerplate/flyte/end2end/run-tests.py#L41-L55
not sure where else to debug. At the moment there is a happy path, which breaks with a rather common error when LOTS/ALL the workflows are run. So, seems like need to keep adding/removing until figure out the culprits.
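The process of elimination described above can be shortened by halving the list on each run rather than adding workflows one at a time. A minimal sketch, assuming exactly one workflow is the culprit and that a run fails whenever that workflow is included; `find_culprit`, `passes`, and the workflow names are hypothetical and not taken from run-tests.py:

```python
from typing import Callable, Optional, Sequence


def find_culprit(
    workflows: Sequence[str],
    passes: Callable[[Sequence[str]], bool],
) -> Optional[str]:
    """Bisect `workflows` to find the single entry that makes a run fail.

    `passes(subset)` is assumed to run the given subset end to end and
    return True when the whole run succeeds.
    """
    candidates = list(workflows)
    if passes(candidates):
        return None  # the full list is fine, nothing to find

    while len(candidates) > 1:
        half = len(candidates) // 2
        first = candidates[:half]
        # With a single culprit, it sits in whichever half fails.
        candidates = first if not passes(first) else candidates[half:]
    return candidates[0]


if __name__ == "__main__":
    # Stand-in for a real test run: fails only if "wf_c" is included.
    def fake_run(subset):
        return "wf_c" not in subset

    names = ["wf_a", "wf_b", "wf_c", "wf_d", "wf_e"]
    print(find_culprit(names, fake_run))
```

Since each run needs a cluster to come up first, halving keeps the number of runs to roughly log2(n) for n workflows. If several workflows interact to cause the failure, the single-culprit assumption breaks and this sketch would point at only one of them.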
y
sorry that’s slow.
sorry you have to deal with that
let me know next time it’s happening and i can hop onto kubectl and poke around
maybe there’ll be something obvious
a
I can rerun with the ‘whole’ list … so, could kick that off in a minute, which means cluster would be up in ~30.
y
k
a
ps. seems to report the same error as: https://flyte-org.slack.com/archives/CP2HDHKE1/p1657905793892719 ( which is why I say is a ‘common’ error ).
that’s mostly a way of saying - once determining the cause - adding some better error messaging might be helpful.