# flytekit-java
b
I'm having trouble registering a workflow. I created a separate Scala sbt project and copied the simple Fibonacci workflow. It compiles just fine. Then I try to use jflyte to register.
scripts/jflyte register workflows \
             -d=development \
             -p=citrine \
             -v=$(git describe --always) \
             -cp=/home/aczerwon/citrine/job-api-sandbox/jvm/scala-sandbox/target/scala-2.13
and that results in an error
java.lang.RuntimeException: Directory doesn't exist [/home/aczerwon/citrine/job-api-sandbox/jvm/scala-sandbox/target/scala-2.13]
        at org.flyte.jflyte.utils.ClassLoaders.listDirectory(ClassLoaders.java:79)
        at org.flyte.jflyte.utils.ClassLoaders.getClassLoaderUrls(ClassLoaders.java:61)
        at org.flyte.jflyte.utils.ClassLoaders.lambda$forDirectory$1(ClassLoaders.java:55)
        at java.base/java.security.AccessController.doPrivileged(Native Method)
        at org.flyte.jflyte.utils.ClassLoaders.forDirectory(ClassLoaders.java:54)
        at org.flyte.jflyte.utils.ProjectClosure.loadAndStage(ProjectClosure.java:170)
        at org.flyte.jflyte.RegisterWorkflows.call(RegisterWorkflows.java:98)
        at org.flyte.jflyte.RegisterWorkflows.call(RegisterWorkflows.java:38)
        at picocli.CommandLine.executeUserObject(CommandLine.java:2045)
        at picocli.CommandLine.access$1500(CommandLine.java:148)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2465)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2457)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2419)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277)
        at picocli.CommandLine$RunLast.execute(CommandLine.java:2421)
        at picocli.CommandLine.execute(CommandLine.java:2174)
        at org.flyte.jflyte.Main.main(Main.java:86)
java.lang.RuntimeException: Directory doesn't exist [/home/aczerwon/citrine/job-api-sandbox/jvm/scala-sandbox/target/scala-2.13]
It definitely exists
b
@bumpy-match-83743 can you try target/lib?
b
that's not where the jar lives
b
Where are your jars right now?
b
in the path I specified
b
You should have one jar with the name of the project
b
correct
I can create a video if that works?
b
If you unzip, it should have a META-INF file with all the staged classes.
Sure
I guess it also depends on how you compiled it. Are you using sbt?
b
yes
ah.. should I make an uber jar with everything?
This is what the jar looks like
b
Seems fine. Let me test this on my end
b
I'll send you the project... stand-by
it's tiny
jump into jvm/scala-sandbox, that's the sbt project
b
Ok I made it work. I've added the script folder and ran it with:
scripts/jflyte register workflows \
-d=development \
-p=flyte-canary \
-v=$(git describe --always) \
-cp=target/scala-2.12
The problem is the path: target/scala-2.12 is relative to the project root, and that is what -cp needs here. Here's the zip with the tweaked script file
job-api-sandbox-main.zip
Hopefully that works ™️ .
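A minimal sketch of the path fix, not taken from the zip above. It assumes the scripts/jflyte wrapper runs jflyte in a container with the project directory mounted (the /workdir/target/scala-2.12 path in a later error points that way), so a host-absolute path would not be visible inside it. relative_cp is a hypothetical helper name.

```shell
# Hypothetical helper: print cp_dir as a path relative to project_root.
# Fails if cp_dir is not inside project_root, since such a directory
# would presumably not be visible to jflyte at all.
relative_cp() (
  project_root="${1%/}"   # drop one trailing slash, if any
  cp_dir="${2%/}"
  case "$cp_dir" in
    "$project_root"/*)
      printf '%s\n' "${cp_dir#"$project_root"/}"
      ;;
    *)
      echo "error: $cp_dir is not inside $project_root" >&2
      exit 1
      ;;
  esac
)

# Usage, from the project root (flags as used in the thread):
#   scripts/jflyte register workflows -d=development -p=citrine \
#     -v="$(git describe --always)" \
#     -cp="$(relative_cp "$PWD" "$PWD/target/scala-2.13")"
```

Stripping the prefix by hand keeps the helper portable; realpath --relative-to would also work where GNU coreutils is available.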
b
so you just copied the scripts folder over as well?
b
yeah it's all there
b
I'll give it a go
stand-by
b
But that's only for discovery. I think I deleted too much, for example the env part that tells it where to push and so on.
I don't have much time now but can give it a go later.
b
yes please, because at the moment it fails with
jq: error: Could not open file jflyte/target/jib-image.json: No such file or directory
so you changed the script as well
if you can share that script when you have a moment, that'd be great, thanks for your help
b
Yes, just so it doesn't read the image from jib and instead pulls what is available in ghcr.
Go inside your project, I've added it there.
b
ah.ok
java.lang.RuntimeException: Directory doesn't exist [/workdir/target/scala-2.12]
b
But yeah taking a look now, I've deleted too much and it won't pick up the necessary envs to upload it
b
so just some path issues
once I get this working I'll write up some docs in the repo for others
b
@bumpy-match-83743 bear with me almost done here.
b
please take your time, I really appreciate your help
I'll pay it forward
I'm guessing it's more work than you thought?
b
Okay, I'm really struggling to get this working with my limited sbt knowledge. The problem right now is the layout in which jflyte expects the files to be there. With sbt I don't know how to set up the output the way it needs. Right now jflyte will read the dir but then output:
Skipping artifact staging because there are no runnable tasks or dynamic workflow tasks
Meaning that it's not picking anything up. Here's an example of how jflyte expects the files to be packed. I've got all of them under /lib, and then there's the META-INF, which lists all the classes that are launch plans, tasks, and workflows. That's how jflyte discovers them and pushes them to flyteadmin/storage.
Screenshot 2024-07-11 at 21.11.06.png
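To check what a build actually produced, something like the sketch below lists the service entries inside each jar. It assumes the entries live under META-INF/services/ (the directory named later in the thread) and that unzip is installed; list_service_entries is a hypothetical name.

```shell
# Hypothetical helper: for every jar in lib_dir, print the entries found
# under META-INF/services/ as "<jar name>: <entry>".
list_service_entries() (
  lib_dir="$1"
  for jar in "$lib_dir"/*.jar; do
    # Skip the unexpanded glob when the directory holds no jars
    [ -e "$jar" ] || continue
    name=$(basename "$jar")
    # unzip -Z1 prints one archive member name per line; the grep pattern
    # requires a character after the slash, which skips the bare directory entry
    unzip -Z1 "$jar" 2>/dev/null |
      grep '^META-INF/services/.' |
      while IFS= read -r entry; do
        printf '%s: %s\n' "$name" "$entry"
      done
  done
)

# Usage: list_service_entries target/lib
```

If this prints nothing for the project jar, that would be consistent with the "Skipping artifact staging" message above.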
b
Can we point to un-jar’d class files?
And we also need all the code from the raw flytekit-java library? And the Scala library as well?
b
But I've got a working example for you to at least take a look and try it out.
Using maven
b
I’ll go look at the mvn examples
b
scala-sandbox.zip
Check the README. In Java the META-INF is auto-generated for you by the @Auto... annotations and some black magic. In Scala you will need sbt to pick things up if you want to auto-generate them. For now it's manual, so that you at least have a working example to compare against.
And we also need all the code from the raw flytekit-java library? And the Scala library as well?
Not flytekit-java, but the libraries used by Scala, yes. Everything is uploaded to a bucket (S3, GCS). Once a workflow is triggered, the image will be just a normal flytekit-java image, which will download all the jars from the bucket.
b
Will flyte reuse downloaded images for subsequent workflow runs?
b
So we have a different concept compared with flytekit (Python).
There's only the flytekit-java image; your code does not live there. That image will for sure be cached in the k8s clusters. Each task schedules a pod that uses said image, and when the container starts it downloads the jars from your bucket and executes.
b
Yeah, I meant the jars… in case they’re large, and won’t want to pay the I/O on every invocation.
b
Not much you can do about that at runtime. The upside is that while you're developing and running on Flyte, jars are only staged once, versus building and pushing docker images.
Jflyte only uploads jars that have changed, skipping the rest.
b
But the runtime downloads them from the S3 bucket on every workflow invocation?
b
Yep
On every task.
b
yikes
that'll be extremely slow... is that specific to the flytekit-java or is that how the whole platform works regardless of language?
b
Just flytekit java. We do some parallelism and so on so it's not slow slow. Maybe 5-10 seconds overhead depending on the k8s setup
b
Depends on how many jars and how big they are
So the python side does not do that? how does it manage code?
b
The usual: a docker image with the code loaded into it, and so on.
b
right.. so it has a docker image with the code, so k8s caches that and it's good to go
b
Yep
b
and why the change of design here? do you know?
b
For the developer experience.
b
really? I would not make that trade-off 🤷‍♂️ we could always bind directly to the grpc services and choose an alternative strategy
b
There's also a couple more that came to mind. The ability to switch the underlying docker image if necessary (imagine you want to push a change to all the workflows like a bug fix) and to "hide" the dockerfile.
But yeah it's before my time and we should probably revisit if the rationale still makes sense. It can be very IO heavy for sure.
b
I almost got it working...
[main] INFO org.flyte.jflyte.utils.Registrars - Discovering RunnableTaskRegistrar
[main] INFO org.flyte.jflyte.utils.Registrars - Discovered [org.flyte.flytekit.SdkRunnableTaskRegistrar]
[main] DEBUG org.flyte.flytekit.SdkRunnableTaskRegistrar - Discovering SdkRunnableTask
[main] INFO org.flyte.jflyte.utils.Registrars - Discovering DynamicWorkflowTaskRegistrar
[main] INFO org.flyte.jflyte.utils.Registrars - Discovered [org.flyte.flytekit.SdkDynamicWorkflowTaskRegistrar]
[main] DEBUG org.flyte.flytekit.SdkDynamicWorkflowTaskRegistrar - Discovering SdkDynamicWorkflowTask
[main] INFO org.flyte.jflyte.utils.Registrars - Discovering ContainerTaskRegistrar
[main] INFO org.flyte.jflyte.utils.Registrars - Discovered [org.flyte.flytekit.SdkContainerTaskRegistrar]
[main] DEBUG org.flyte.flytekit.SdkContainerTaskRegistrar - Discovering SdkContainerTask
[main] INFO org.flyte.jflyte.utils.Registrars - Discovering PluginTaskRegistrar
[main] INFO org.flyte.jflyte.utils.Registrars - Discovered [org.flyte.flytekit.SdkPluginTaskRegistrar]
[main] DEBUG org.flyte.flytekit.SdkPluginTaskRegistrar - Discovering SdkPluginTask
[main] INFO org.flyte.jflyte.utils.Registrars - Discovering WorkflowTemplateRegistrar
[main] INFO org.flyte.jflyte.utils.Registrars - Discovered [org.flyte.flytekit.SdkWorkflowTemplateRegistrar]
[main] DEBUG org.flyte.flytekit.SdkWorkflowTemplateRegistrar - Discovering SdkWorkflow
[main] INFO org.flyte.jflyte.utils.Registrars - Discovering LaunchPlanRegistrar
[main] INFO org.flyte.jflyte.utils.Registrars - Discovered [org.flyte.flytekit.SdkLaunchPlanRegistrar]
[main] DEBUG org.flyte.flytekit.SdkLaunchPlanRegistrar - Discovering SdkLaunchPlans
[main] INFO org.flyte.jflyte.utils.ProjectClosure - Skipping artifact staging because there are no runnable tasks or dynamic workflow tasks
and all the jars are there
~/c/job-api-sandbox  jvm/scala-sandbox/target/lib  ll
total 23M
-rw-rw-r-- 1 aczerwon aczerwon 223K Jul  4 02:08 flytekit-api-0.4.59.jar
-rw-rw-r-- 1 aczerwon aczerwon 179K Jul  4 02:07 flytekit-java-0.4.59.jar
-rw-rw-r-- 1 aczerwon aczerwon 158K Jul  4 02:08 flytekit-scala_2.13-0.4.59.jar
-rw-rw-r-- 1 aczerwon aczerwon  44K Jul  9  2012 hamcrest-core-1.3.jar
-rw-rw-r-- 1 aczerwon aczerwon 263K Mar 26  2018 jline-2.14.6.jar
-rw-rw-r-- 1 aczerwon aczerwon 1.5M Jul 14  2020 jna-5.6.0.jar
-rw-rw-r-- 1 aczerwon aczerwon 376K Feb 13  2021 junit-4.13.2.jar
-rw-rw-r-- 1 aczerwon aczerwon  48K May 22 02:01 junit-interface-1.0.0.jar
-rw-rw-r-- 1 aczerwon aczerwon 178K Jun 27  2021 magnolia-core_2.13-1.0.0-M4.jar
-rw-rw-r-- 1 aczerwon aczerwon  24K Jul  4  2019 mercator_2.13-0.2.1.jar
-rw-rw-r-- 1 aczerwon aczerwon 169K May 22 02:01 munit_2.13-1.0.0.jar
-rw-rw-r-- 1 aczerwon aczerwon  51K May 22 02:01 munit-diff_2.13-1.0.0.jar
-rw-rw-r-- 1 aczerwon aczerwon  11M Jun  7  2019 scala-compiler-2.13.0.jar
-rw-rw-r-- 1 aczerwon aczerwon 5.7M Apr 29 14:41 scala-library-2.13.14.jar
-rw-rw-r-- 1 aczerwon aczerwon 3.7M Apr 29 14:41 scala-reflect-2.13.14.jar
-rw-rw-r-- 1 aczerwon aczerwon  27K Jul 11 15:08 scala-sandbox-0.1.1-SNAPSHOT.jar
-rw-rw-r-- 1 aczerwon aczerwon  15K Jun 28  2013 test-interface-1.0.jar
ok, to get it working, I had to manually create the resources/META-INF/services directory. Then when I do mvn clean package, I get them registered.
I assume those are generated somehow?
So it uses the ServiceLoader SPI pattern for discovery
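For reference, the manual step can be sketched like this. It assumes a Maven-style src/main/resources layout and that the service interfaces are org.flyte.flytekit.SdkRunnableTask and org.flyte.flytekit.SdkWorkflow (inferred from the "Discovering ..." log lines above, so verify against your flytekit-java version). add_service and the com.example class names are hypothetical.

```shell
# Hypothetical helper: register impl_class as a provider of service_iface
# by appending it to META-INF/services/<service_iface>, the file that
# java.util.ServiceLoader reads to find implementations.
add_service() (
  resources_dir="$1"; service_iface="$2"; impl_class="$3"
  services_dir="$resources_dir/META-INF/services"
  mkdir -p "$services_dir" || exit 1
  file="$services_dir/$service_iface"
  # Append only if the class is not already listed (one class name per line)
  if ! grep -qxF "$impl_class" "$file" 2>/dev/null; then
    printf '%s\n' "$impl_class" >> "$file"
  fi
)

# Usage (class names are placeholders for your own task and workflow):
#   add_service src/main/resources org.flyte.flytekit.SdkRunnableTask com.example.FibonacciTask
#   add_service src/main/resources org.flyte.flytekit.SdkWorkflow com.example.FibonacciWorkflow
#   mvn clean package
```

ServiceLoader instantiates each listed class through a public no-argument constructor, so the names in these files should point at concrete classes.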
f
@bumpy-match-83743 this decision was made by folks at Spotify, as Rafael is saying, to help them avoid building docker images completely. Not sure if this is still the right decision.