hi folks, do you run Flyte local mode on a notebook in a transient env? (Flyte #flyte-support)


agreeable-flower-8989

12/11/2023, 11:31 PM

hi folks, do you run Flyte (local mode) in a notebook on a transient notebook env (e.g. Databricks, Colab)? In that situation, how do you enable the local cache? The default behavior writes output to

`~/.flyte/local-cache/`

which strongly assumes a durable, persistent local env

freezing-airport-6809

12/12/2023, 3:07 AM

Wdym

freezing-airport-6809

12/12/2023, 3:07 AM

Local cache is local

agreeable-flower-8989

12/12/2023, 3:08 AM

right.. but if I run Flyte code in a Databricks environment, for example, the cache just doesn't work across runs

agreeable-flower-8989

12/12/2023, 3:09 AM

so I'm asking if there are good solutions for such ephemeral environments?

freezing-airport-6809

12/12/2023, 3:09 AM

Wdym, is there no disk?

freezing-airport-6809

12/12/2023, 3:09 AM

There should be

freezing-airport-6809

12/12/2023, 3:10 AM

Ya, we can make it store in S3

freezing-airport-6809

12/12/2023, 3:10 AM

I love the idea

agreeable-flower-8989

12/12/2023, 3:10 AM

yah, I see that `~/.flyte/local-cache/` now has a SQLite artifact

freezing-airport-6809

12/12/2023, 3:10 AM

Shall we collaborate on this?

agreeable-flower-8989

12/12/2023, 3:10 AM

and flytekit likely queries this local db
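To make the "SQLite artifact" point concrete, here is a stdlib-only sketch of a SQLite-backed task cache. It is not flytekit's implementation (flytekit builds on the `diskcache` library); the class name, table schema and key scheme are hypothetical. What it illustrates is that every cache check is a query against a database file, so results survive across runs only if the directory holding that file survives.

```python
import hashlib
import json
import os
import sqlite3
import tempfile


class SqliteTaskCache:
    """Hypothetical SQLite-backed task cache; a stand-in, not flytekit's implementation."""

    def __init__(self, directory: str) -> None:
        os.makedirs(directory, exist_ok=True)
        self._conn = sqlite3.connect(os.path.join(directory, "cache.db"))
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS entries (key TEXT PRIMARY KEY, value TEXT)"
        )

    @staticmethod
    def make_key(task_name: str, cache_version: str, inputs: dict) -> str:
        # Hash the task identity, cache version and canonicalised inputs into one key.
        payload = json.dumps(
            {"task": task_name, "version": cache_version, "inputs": inputs},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode()).hexdigest()

    def get(self, key: str):
        # Every cache check is a query against the local database file.
        row = self._conn.execute(
            "SELECT value FROM entries WHERE key = ?", (key,)
        ).fetchone()
        return None if row is None else json.loads(row[0])

    def set(self, key: str, value) -> None:
        with self._conn:  # commits the transaction on success
            self._conn.execute(
                "INSERT OR REPLACE INTO entries (key, value) VALUES (?, ?)",
                (key, json.dumps(value)),
            )

    def close(self) -> None:
        self._conn.close()


# Usage: the cache survives a "second run" only because the directory survives.
cache_dir = tempfile.mkdtemp()
cache = SqliteTaskCache(cache_dir)
key = SqliteTaskCache.make_key("my_wf.t1", "1", {"x": 3})
first = cache.get(key)
cache.set(key, {"o0": 9})
cache.close()

reopened = SqliteTaskCache(cache_dir)
second = reopened.get(key)
reopened.close()
```

On an ephemeral notebook VM, `~/.flyte/local-cache/` behaves like `cache_dir` above being deleted between sessions: the next run starts with an empty database.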

freezing-airport-6809

12/12/2023, 3:10 AM

Ya it does

agreeable-flower-8989

12/12/2023, 3:11 AM

so if we implement an S3 path, it's gonna look more like Flyte running on a K8s cluster

freezing-airport-6809

12/12/2023, 3:11 AM

No it won’t

freezing-airport-6809

12/12/2023, 3:11 AM

As that needs a db

freezing-airport-6809

12/12/2023, 3:11 AM

Here we will have to use a lookup

agreeable-flower-8989

12/12/2023, 3:12 AM

just an S3 lookup by URI path?

freezing-airport-6809

12/12/2023, 3:12 AM

But if this is a custom cache, then we could simply upload the cache db

freezing-airport-6809

12/12/2023, 3:12 AM

That’s the other option
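The "simply upload the cache db" option can be sketched as a pull at session start and a push at session end. This is an illustration under assumptions: a local directory stands in for the S3 prefix (a real version would use an object-store client), and the function names are hypothetical. Whole directories are copied because a disk cache may consist of more than one file.

```python
import os
import shutil
import tempfile


def pull_cache(remote_dir: str, local_dir: str) -> bool:
    """Copy every cache file from the 'remote' copy into the local cache dir.

    Returns False when there is no remote copy yet (e.g. the very first session).
    """
    if not os.path.isdir(remote_dir):
        return False
    shutil.copytree(remote_dir, local_dir, dirs_exist_ok=True)
    return True


def push_cache(local_dir: str, remote_dir: str) -> None:
    """Upload every cache file back; call only after the cache DB has been closed."""
    shutil.copytree(local_dir, remote_dir, dirs_exist_ok=True)


# Usage: two "sessions" on fresh machines share one remote copy.
remote = os.path.join(tempfile.mkdtemp(), "local-cache")  # stands in for an S3 prefix

session1 = os.path.join(tempfile.mkdtemp(), "local-cache")
first_pull = pull_cache(remote, session1)
os.makedirs(session1, exist_ok=True)
with open(os.path.join(session1, "cache.db"), "wb") as f:
    f.write(b"entries-v1")  # placeholder bytes instead of a real SQLite file
push_cache(session1, remote)

session2 = os.path.join(tempfile.mkdtemp(), "local-cache")
second_pull = pull_cache(remote, session2)
with open(os.path.join(session2, "cache.db"), "rb") as f:
    restored = f.read()
```

The approach is simple, but every pull and push moves the entire cache, so its cost grows with the total cache size rather than with the entries a run actually uses.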

agreeable-flower-8989

12/12/2023, 3:12 AM

ok i'd love to collaborate

agreeable-flower-8989

12/12/2023, 3:13 AM

Databricks is the standard notebook env we are going with, so we'd like to have caching functionality here

freezing-airport-6809

12/12/2023, 3:13 AM

Ok I have never used it so would love to understand

freezing-airport-6809

12/12/2023, 3:13 AM

Why Databricks?

freezing-airport-6809

12/12/2023, 3:14 AM

Let’s have a chat sometime

agreeable-flower-8989

12/12/2023, 3:14 AM

Databricks notebooks have great UX, and they've been worth the $$ for the productivity gain for our engs

agreeable-flower-8989

12/12/2023, 3:15 AM

let me write up something and will share with you to get first round of feedback

agreeable-flower-8989

12/12/2023, 2:35 PM

https://github.com/flyteorg/flyte/issues/4580

agreeable-flower-8989

12/12/2023, 2:52 PM

we actually want to make local/remote execution more seamless.. often when folks do a remote execution, they hope to reuse its results to iterate locally as well

freezing-airport-6809

12/12/2023, 3:16 PM

@agreeable-flower-8989 using remote cache locally is dangerous

freezing-airport-6809

12/12/2023, 3:16 PM

But you can fetch all the data

freezing-airport-6809

12/12/2023, 3:17 PM

Check out the new Flyte data URI

agreeable-flower-8989

12/12/2023, 3:17 PM

it is dangerous in the sense that you're concerned about data corruption, right?

agreeable-flower-8989

12/12/2023, 3:18 PM

I think a read-only secondary cache is sufficient for us
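A read-only secondary cache can be sketched as a two-tier lookup: check the local cache first, fall back to the remote one, and never write to the remote tier from a local run. This is a hypothetical illustration of the idea raised here, not an existing flytekit feature; `DictStore` is an in-memory stand-in for both a local disk cache and a remote blob store.

```python
from typing import Any, Dict, Optional, Protocol


class CacheStore(Protocol):
    def get(self, key: str) -> Optional[Any]: ...
    def put(self, key: str, value: Any) -> None: ...


class DictStore:
    """In-memory stand-in for a local disk cache or a remote blob store."""

    def __init__(self, data: Optional[Dict[str, Any]] = None) -> None:
        self.data: Dict[str, Any] = dict(data or {})

    def get(self, key: str) -> Optional[Any]:
        return self.data.get(key)

    def put(self, key: str, value: Any) -> None:
        self.data[key] = value


class TieredCache:
    """Local read/write cache backed by a read-only secondary (remote) cache."""

    def __init__(self, local: CacheStore, remote_readonly: CacheStore) -> None:
        self.local = local
        self.remote = remote_readonly

    def get(self, key: str) -> Optional[Any]:
        value = self.local.get(key)
        if value is not None:
            return value
        value = self.remote.get(key)  # read-only: the remote tier is never written
        if value is not None:
            self.local.put(key, value)  # copy the remote hit locally for later runs
        return value

    def put(self, key: str, value: Any) -> None:
        # Local executions only populate the local tier.
        self.local.put(key, value)


# Usage: reuse a result produced remotely, while local results stay local.
remote = DictStore({"k1": {"o0": 1}})
local = DictStore()
cache = TieredCache(local, remote)
hit = cache.get("k1")
cache.put("k2", {"o0": 2})
miss = cache.get("k3")
```

Keeping writes out of the remote tier means a local run can never overwrite a cluster-produced result, which is one way to limit the corruption risk raised above. The sketch uses `None` as the "miss" marker, so a cached `None` output would not be distinguishable from a miss.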

agreeable-flower-8989

12/12/2023, 3:18 PM

anyways that's a secondary ask.. I think the first ask is just to be able to have external durable storage for local execution, as described in the issue above

agreeable-flower-8989

12/12/2023, 3:19 PM

pls let me know further thoughts; happy to contribute

👍 1

freezing-airport-6809

12/12/2023, 3:21 PM

Let me discuss today

🙏 1

agreeable-flower-8989

12/13/2023, 3:03 AM

hi ketan! any further thoughts on this?

freezing-airport-6809

12/13/2023, 6:20 AM

I read it briefly and have some comments. I think if we set an S3 path, you don't even need a prefix / context

freezing-airport-6809

12/13/2023, 6:20 AM

but also we won't have time to work on this at the moment

agreeable-flower-8989

12/13/2023, 2:11 PM

if the S3 path can be an env var as well, that would work.. I'm happy to implement the work here, but want to make sure that directionally it's something OSS will accept
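Configuring the cache location through an env var could look like the sketch below. The variable name `FLYTE_LOCAL_CACHE_URI` and the set of remote schemes are assumptions for illustration only, not existing flytekit settings. When the variable is unset the sketch keeps the current `~/.flyte/local-cache` default; an object-store URI switches to a remote cache whose path already namespaces the entries, so no separate prefix is needed.

```python
import os
from typing import Mapping, Optional, Tuple
from urllib.parse import urlparse

DEFAULT_LOCAL_CACHE = "~/.flyte/local-cache"
CACHE_URI_ENV_VAR = "FLYTE_LOCAL_CACHE_URI"  # hypothetical name, not a real flytekit setting
REMOTE_SCHEMES = {"s3", "gs", "abfs"}  # assumed set of object-store schemes


def resolve_cache_location(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return ("local", path) or ("remote", uri) for the local-execution cache."""
    env = os.environ if env is None else env
    raw = env.get(CACHE_URI_ENV_VAR, "").strip()
    if not raw:
        # Unset: keep the default, which lives on the (possibly ephemeral) local disk.
        return "local", os.path.expanduser(DEFAULT_LOCAL_CACHE)
    parsed = urlparse(raw)
    if parsed.scheme in REMOTE_SCHEMES:
        # The URI itself namespaces the cache, so no extra prefix/context is needed.
        return "remote", raw.rstrip("/")
    if parsed.scheme == "file":
        return "local", parsed.path
    return "local", os.path.expanduser(raw)


# Usage: what a notebook session would see for a few settings.
default_location = resolve_cache_location({})
s3_location = resolve_cache_location({CACHE_URI_ENV_VAR: "s3://my-bucket/flyte-cache/"})
```

In a notebook, the variable would be set once (for example in the cluster or session environment) before any task runs.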

freezing-airport-6809

12/14/2023, 5:25 AM

yes I think we should, @thankful-minister-83577 is out but he will be back the week after (he got married)

agreeable-flower-8989

12/14/2023, 1:15 PM

sounds good. will work with Yee on this then

agreeable-flower-8989

12/21/2023, 4:17 PM

hi @high-accountant-32689! thanks for the input here https://github.com/flyteorg/flyte/issues/4580#issuecomment-1864541011 also happy to chat here if it's more helpful.

high-accountant-32689

12/22/2023, 3:05 PM

awesome, let's keep chatting here. It wouldn't be too hard to lean on flytekit's existing infra to support loading/writing to a blob store. I just wanted to separate the two ideas: (1) the local scope, and (2) a remote cache. If you want to put up a PR, I'd be more than happy to review.

agreeable-flower-8989

12/22/2023, 8:20 PM

gotcha.. 1/ the local scope here simply means making the local cache disk path configurable, right? 2/ will the remote cache also reuse that cache path? 2.1/ do you have a preference between simply syncing the whole set of DB files that Python diskcache writes, or encoding the cache key in individual remote blob store paths (closer to how on-cluster execution works)?

high-accountant-32689

12/26/2023, 1:54 PM

1/ correct. 2/ Yeah, the local scope is optional; its purpose is just to help you segregate local caches. 2.1/ That's a good question. It'd be simpler to sync all DB files, but I fear that this might make the local cache very slow after multiple runs (imagine the case of a few thousand objects of different sizes being stored there). I also dislike that, if we go that route, the local cache becomes slower and slower... so my vote goes to storing each cache entry as its own separate object in the blob store. wdyt?
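Storing each cache entry as its own object means a lookup becomes an existence check on a path derived from the cache key, with no database to sync. A minimal sketch, assuming a hypothetical `<prefix>/<task_name>/<cache_version>/<input hash>.json` layout and a local directory standing in for the bucket (a real version would call an object-store client instead):

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Any, Optional


def entry_path(prefix: str, task_name: str, cache_version: str, inputs: dict) -> str:
    """Hypothetical layout: <prefix>/<task_name>/<cache_version>/<sha256 of inputs>.json"""
    digest = hashlib.sha256(json.dumps(inputs, sort_keys=True).encode()).hexdigest()
    return f"{prefix.rstrip('/')}/{task_name}/{cache_version}/{digest}.json"


class DirectoryBlobStore:
    """Local directory standing in for an S3 bucket; keys are relative object paths."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def exists(self, key: str) -> bool:
        return (self.root / key).is_file()

    def read(self, key: str) -> bytes:
        return (self.root / key).read_bytes()

    def write(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)


def cache_get(store: DirectoryBlobStore, prefix: str, task_name: str,
              cache_version: str, inputs: dict) -> Optional[Any]:
    key = entry_path(prefix, task_name, cache_version, inputs)
    if not store.exists(key):  # a lookup touches exactly one object path
        return None
    return json.loads(store.read(key))


def cache_put(store: DirectoryBlobStore, prefix: str, task_name: str,
              cache_version: str, inputs: dict, outputs: Any) -> None:
    key = entry_path(prefix, task_name, cache_version, inputs)
    store.write(key, json.dumps(outputs).encode())


# Usage: miss, write, hit; bumping the cache version yields a different path.
store = DirectoryBlobStore(tempfile.mkdtemp())
miss = cache_get(store, "flyte-cache", "my_wf.t1", "1", {"x": 3})
cache_put(store, "flyte-cache", "my_wf.t1", "1", {"x": 3}, {"o0": 9})
hit = cache_get(store, "flyte-cache", "my_wf.t1", "1", {"x": 3})
other_version = cache_get(store, "flyte-cache", "my_wf.t1", "2", {"x": 3})
```

Because each run reads and writes only the entries it needs, lookup cost stays roughly constant as the cache grows, which addresses the slowdown concern with syncing all DB files.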

agreeable-flower-8989

01/05/2024, 10:15 PM

Sorry, Eduardo, for the late response. And happy new year! 2.1/ Yup, each cache entry can have its own entry in the blob store. That makes sense to me

👍 1
