Flyte #flyte-support

/opt/venv/lib/python3.10/site-packages/flytekit...

sticky-angle-28419

12/27/2022, 5:54 PM

/opt/venv/lib/python3.10/site-packages/flytekit/types/schema/types.py:323: FutureWarning: In the future `np.bool` will be defined as the corresponding NumPy scalar.  (This may have returned Python scalars in past versions.
  _np.bool: SchemaType.SchemaColumn.SchemaColumnType.BOOLEAN,  # type: ignore
Traceback (most recent call last):
  File "/opt/venv/bin/pyflyte", line 5, in <module>
    from flytekit.clis.sdk_in_container.pyflyte import main
  File "/opt/venv/lib/python3.10/site-packages/flytekit/__init__.py", line 195, in <module>
    from flytekit.types import directory, file, numpy, schema
  File "/opt/venv/lib/python3.10/site-packages/flytekit/types/schema/__init__.py", line 1, in <module>
    from .types import (
  File "/opt/venv/lib/python3.10/site-packages/flytekit/types/schema/types.py", line 313, in <module>
    class FlyteSchemaTransformer(TypeTransformer[FlyteSchema]):
  File "/opt/venv/lib/python3.10/site-packages/flytekit/types/schema/types.py", line 323, in FlyteSchemaTransformer
    _np.bool: SchemaType.SchemaColumn.SchemaColumnType.BOOLEAN,  # type: ignore
  File "/opt/venv/lib/python3.10/site-packages/numpy/__init__.py", line 284, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'bool'. Did you mean: 'bool_'?

glamorous-carpet-83516

12/27/2022, 6:45 PM

Which version of flytekit are you using?

sticky-angle-28419

12/27/2022, 7:51 PM

Using `flytekit==1.2.3`

sticky-angle-28419

12/27/2022, 7:53 PM

@glamorous-carpet-83516

thankful-minister-83577

12/27/2022, 8:01 PM

this is a known issue sorry - we were a bit late in keeping up with the numpy deprecation notice.

thankful-minister-83577

12/27/2022, 8:01 PM

https://github.com/flyteorg/flytekit/pull/1377

thankful-minister-83577

12/27/2022, 8:01 PM

can you bump to 1.2.7?
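
For context on the traceback above: the `np.bool` alias was deprecated in NumPy 1.20 and removed in NumPy 1.24 (NumPy 2.0 later brought the name back), so a library that still references it fails at import time, as flytekit 1.2.3 does here. Below is a minimal sketch for checking whether an installed NumPy falls in that range. The helper names are made up for illustration and the version parsing is deliberately naive.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Naively parse the leading numeric parts of a version ('1.24.0rc1' -> (1, 24, 0))."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def numpy_lacks_bool_alias(numpy_version: str) -> bool:
    """True for NumPy releases where `np.bool` raises AttributeError (1.24 up to, not including, 2.0)."""
    return (1, 24) <= parse_version(numpy_version)[:2] < (2, 0)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    numpy_version = installed_version("numpy")
    if numpy_version and numpy_lacks_bool_alias(numpy_version):
        # The 1.2.7 suggestion comes from this thread, not from a compatibility matrix.
        print(f"numpy {numpy_version} has no np.bool; try upgrading flytekit (1.2.7 worked in this thread)")
```

Code that needs a boolean dtype can use `np.bool_` or the builtin `bool`, both of which exist across these NumPy versions.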

sticky-angle-28419

12/27/2022, 8:02 PM

I remember having issues with 1.2.4 so I had to do:

sticky-angle-28419

12/27/2022, 8:02 PM

grpcio-status<1.49.0
flytekit==1.2.3

sticky-angle-28419

12/27/2022, 8:02 PM

Can I just do `flytekit==1.2.7` now?

sticky-angle-28419

12/27/2022, 8:02 PM

@thankful-minister-83577

thankful-minister-83577

12/27/2022, 8:03 PM

yes

thankful-minister-83577

12/27/2022, 8:03 PM

we’ve added that to the setup.py https://github.com/flyteorg/flytekit/blob/release-v1.2/setup.py#L53

sticky-angle-28419

12/27/2022, 8:03 PM

Great I’ll try that

sticky-angle-28419

12/27/2022, 8:04 PM

Thanks!

thankful-minister-83577

12/27/2022, 8:04 PM

see eduardo’s post for more info: https://flyte-org.slack.com/archives/CNMKCU6FR/p1669927238849689

sticky-angle-28419

12/28/2022, 5:16 PM

@thankful-minister-83577 using 1.2.7 works - thanks!

👍 1

sticky-angle-28419

12/30/2022, 12:28 AM

@thankful-minister-83577 A quick question - is Flyte used for training only? Should I be using serving tools like BentoML for inference? What if a large amount of data needs to be pre-processed (say via Spark) prior to inference? Where does Flyte fit in (or is it not meant to be used for inference at all, even for data pre-processing)?

sticky-angle-28419

12/30/2022, 12:31 AM

Can Flyte replace other data workflow tools like Airflow, Prefect, Dagster, etc?

thankful-minister-83577

12/30/2022, 6:26 PM

will let @broad-monitor-993 answer this one.

👍 2

broad-monitor-993

12/30/2022, 7:05 PM

The short answer is yes, they all pretty much have the same feature set at a high level for many of the core use cases (scheduled batch processing). Benefits of Flyte over other orchestrators would include:
• Node-level containerization
• Node-level resource allocation
• Strongly typed interfaces
• Native data passing
• Multitenancy
• Declarative infrastructure (e.g. ephemeral Spark, Ray clusters)
• Reference tasks/workflows for reusable components

👍 2

broad-monitor-993

12/30/2022, 7:06 PM

Flyte is good for any workflows involving data, not just ML training.

broad-monitor-993

12/30/2022, 7:07 PM

Re: inference we’d recommend other tools like bentoml or kserve (a bentoml integration is in the issues https://github.com/flyteorg/flyte/issues/3107)

broad-monitor-993

12/30/2022, 7:10 PM

Flyte works well for batch inference (where latency requirements are 10s of minutes or more), and it will do well with large data pre-processing workloads. For anything faster, inference tools like bentoml work well. You could also use Flyte in event-driven architectures https://blog.flyte.org/build-an-event-driven-neural-style-transfer-application-using-aws-lambda if you have ~1-10 minute latency requirements for inference

broad-monitor-993

12/30/2022, 7:23 PM

For online inference use cases (sub-minute latency) with large data preprocessing requirements, it would make sense to use a feature store (Flyte has a Feast integration: https://docs.flyte.org/projects/cookbook/en/stable/auto/case_studies/feature_engineering/feast_integration/index.html), where Flyte can orchestrate the generation of features to be read into e.g. a bentoml service
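
The latency tiers described in the last few messages can be summarized as a small lookup. This is only a sketch of the rule of thumb given in this thread: the function name is hypothetical, and the exact cutoffs (60 seconds and 10 minutes) are an assumption, since the thread only gives approximate ranges.

```python
def recommend_inference_setup(latency_budget_seconds: float) -> str:
    """Map a latency budget to the option suggested in this thread (cutoffs are approximate)."""
    if latency_budget_seconds < 60:
        # Sub-minute: serve online; Flyte can still produce the features offline.
        return "online serving tool (e.g. BentoML or KServe), with Flyte generating features into a feature store"
    if latency_budget_seconds < 600:
        # Roughly 1-10 minutes: Flyte triggered by events.
        return "Flyte in an event-driven architecture"
    # Tens of minutes or more: plain batch inference on Flyte.
    return "Flyte batch inference"


if __name__ == "__main__":
    for budget in (5, 300, 3600):
        print(f"{budget:>5}s -> {recommend_inference_setup(budget)}")
```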

sticky-angle-28419

12/30/2022, 7:29 PM

This is great - thank you very much @broad-monitor-993. I’ll also check out the articles!

sticky-angle-28419

12/30/2022, 7:30 PM

One quick question though - the issue there mentions `signal` node. Is that just a custom node to use as a flag or is there something built in in Flyte?

broad-monitor-993

12/30/2022, 7:39 PM

We’re still working on the signal node I believe @thankful-minister-83577 , but it’ll be a first-class node in Flyte for human-in-the-loop use cases (eg the requirement for a human to approve a model based on some metrics before deploying)

👍 1

sticky-angle-28419

12/30/2022, 7:41 PM

Oh I see - ok cool thanks!

victorious-kilobyte-69570

01/19/2023, 10:41 AM

@sticky-angle-28419 Just to add to this conversation based on my readings over the past couple of days. These are my findings and some understanding of Prefect vs Flyte caching:
• Prefect docs mention that their caching as of today is at "task" level, and not workflow level.
  ◦ Also they mention the cache can contain a maximum of 2000 characters, and you have to enable a parameter to persist the cache in your prefect_storage after a workflow is run.
• Whereas Flyte on the other hand has caching at both workflow and task levels. You can also version the caches if needed, and retrieve those versions when needed.
  ◦ Flyte cache can be based on the hash of the input and output of tasks as well.

I may have got this wrong. @broad-monitor-993 @freezing-airport-6809 @tall-lock-23197 @thankful-minister-83577 and anyone else from Flyte Org, please do clarify if I have made mistakes in my research regarding "Flyte"; obviously you don't have to speak for "Prefect". 🙂

Caching is something that could be a pivotal feature for people looking to choose between workflow management tools. And threads like these are really cool to read, where comparisons are done in an open manner.

broad-monitor-993

01/19/2023, 2:51 PM

Whereas Flyte on the other hand has caching at both workflow and task levels

Caching works with `@task`s and `@dynamic` workflows. Currently, caching is not supported for static workflows.

You can also version the caches if needed.

Yes, Flyte’s opinion is it’s too complicated trying to figure out if a task’s upstream dependencies have changed (which could potentially live in other modules, etc), so you can use any version string to version the cache.

Flyte cache can be based on the hash of the input and output of tasks as well.

The main use case for a user-defined hash method for inputs is for blob-store-serialized objects like files, directories, dataframes, pickle files, etc. In this case, you need to define a `HashMethod`, which will incur some runtime cost as Flyte computes the hash of, e.g., a dataframe.
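
A user-defined hash function of this kind takes the object and returns a string. Here is a minimal sketch of one, using only the standard library and a list of row dictionaries as a stand-in for a dataframe. `hash_rows` is a hypothetical name. In flytekit the function would be attached to the task's output type through `HashMethod`, which is not shown here.

```python
import hashlib
import json
from typing import Any, Dict, List


def hash_rows(rows: List[Dict[str, Any]]) -> str:
    """Return a deterministic content hash for tabular data."""
    # sort_keys makes the serialization independent of key (column) order;
    # default=str covers values such as dates that JSON cannot encode natively.
    payload = json.dumps(rows, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


rows = [{"id": 1, "score": 0.5}, {"id": 2, "score": 0.9}]
same_rows_reordered_keys = [{"score": 0.5, "id": 1}, {"score": 0.9, "id": 2}]
changed_rows = [{"id": 1, "score": 0.5}, {"id": 2, "score": 0.1}]

if __name__ == "__main__":
    print(hash_rows(rows) == hash_rows(same_rows_reordered_keys))  # same content, same key
    print(hash_rows(rows) == hash_rows(changed_rows))              # different content, different key
```

The function has to read the whole object to hash it, which is the runtime cost mentioned above.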

victorious-kilobyte-69570

01/19/2023, 3:04 PM

Great! Just want to clarify using a scenario; please bear with my long question below, I'm new to Flyte. So the use of `HashMethod` means "both the input parameters to the function, and the output", i.e. DF/files/filepaths/pickles etc. will be hashed and stored in Flyte storage (which in AWS is an S3 bucket). Am I understanding this correctly? This means if I run a large spark based "task1", and then the next "task2" requires "task1"s output for some operation, using HashMethod and potentially "cache_version", I can run a workflow multiple times for evaluating "task2" which takes "task1"s cached output right? Basically I'm trying to say that "Hashed and versioned tasks" could potentially avoid multiple writes to disk (output_1.csv, output2.csv etc) while a data science/data engineering "task2" is being ideated/refined? I have one more question, if my above understanding is correct. So please do clarify.

broad-monitor-993

01/19/2023, 4:56 PM

`HashMethod` annotated outputs (e.g. for files, dataframes) will calculate a hash key based on the user-defined hash function, and this key will be used as the cache key. Assume `task1` produces this output, when the output is passed into downstream task `task2`, the hash key will be used to determine whether or not to re-run `task2` or just hit the cache to return the pre-computed value.

This means if I run a large spark based "task1", and then the next "task2" requires "task1"s output for some operation, using HashMethod and potentially "cache_version", I can run a workflow multiple times for evaluating "task2" which takes "task1"s cached output right?

correct

Basically I'm trying to say that "Hashed and versioned tasks" could potentially avoid multiple writes to disk (output_1.csv, output2.csv etc) while a data science/data engineering "task2" is being ideated/refined?

correct

broad-monitor-993

01/19/2023, 5:06 PM

so to re-state what you’re saying to make sure I understand:
• `task1` is a relatively cheap spark job that produces a parquet file. The output of this task has a `HashMethod` so has a cache key associated with the output.
• `task2` is an expensive data processing spark job that depends on `task1`, and is set to `cache=True` with a `cache_version="1"`
• assuming that `cache_version` stays the same and the output of `task1` produces the same cache key, the first invocation of `task2` will run it, but subsequent invocations will hit the cache. However, since `task1` doesn’t have `cache=True`, `task1` will always run.

Now if `task1` is also cached based on some primitive datatype inputs (like `datetime`, `int`, `str`, etc), then `task1` will not be run (avoiding multiple writes to disk) if a cache key for the output already exists.
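
The scenario above, with both tasks cached, can be mimicked with a toy in-memory cache. This is a sketch of the idea only, not how Flyte implements caching: `cached_call`, `run_counts` and the dict-based cache are all made up for illustration, and the key is assumed to combine the task name, the `cache_version` and a hash of the inputs.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Tuple

# Toy in-memory cache; Flyte's real cache is a backend service, not a dict.
_cache: Dict[Tuple[str, str, str], Any] = {}
run_counts: Dict[str, int] = {}


def cached_call(name: str, cache_version: str, fn: Callable[..., Any], **inputs: Any) -> Any:
    """Run fn(**inputs) unless (name, cache_version, hashed inputs) was seen before."""
    serialized = json.dumps(inputs, sort_keys=True, default=str).encode("utf-8")
    key = (name, cache_version, hashlib.sha256(serialized).hexdigest())
    if key not in _cache:
        run_counts[name] = run_counts.get(name, 0) + 1
        _cache[key] = fn(**inputs)
    return _cache[key]


def task1(day: str) -> list:
    # Stand-in for the Spark job that writes a parquet file.
    return [{"day": day, "value": 42}]


def task2(data: list) -> int:
    # Stand-in for the downstream job that consumes task1's output.
    return sum(row["value"] for row in data)


def workflow(day: str, task2_version: str = "1") -> int:
    data = cached_call("task1", "1", task1, day=day)
    return cached_call("task2", task2_version, task2, data=data)


workflow("2023-01-19")                     # first run: both tasks execute
workflow("2023-01-19")                     # same inputs and versions: both hit the cache
workflow("2023-01-19", task2_version="2")  # bumped version: only task2 re-runs
```

After these three calls `task1` has executed once and `task2` twice, which is the behaviour a bumped `cache_version` is meant to give while `task2` is being refined.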

victorious-kilobyte-69570

01/19/2023, 5:29 PM

Great, I understand now. In my case your description of task1 and task2 is reversed: I meant task1 to be the very expensive task, so avoiding re-running it seemed the better outcome of using the cache. But the analogy, and in turn Flyte, works both ways anyway. Thanks again for the detailed answer. I credit my initial understanding to the 'Caching' section of the Flyte docs. But I think you guys can really sell/market this a lot more! Just putting more emphasis on this caching method and on Flyte's thought process is a huge factor for anyone choosing a workflow tool. Prefect, in my research, had very primitive caching capability compared to Flyte. Good and careful caching in Flyte can potentially alleviate some of the need for feature store usage, I think. I will try to complete my installation of Flyte in a dev env in a cloud, and try out such a scenario. Thanks for clarifying the thought process @broad-monitor-993. Would be glad to hear more from anyone reading this thread.

👍 1

freezing-airport-6809

01/19/2023, 5:52 PM

@colossal-musician-95989 we can sell a lot of things a lot more. We are terrible at that - help us spread the word. The caching in Flyte has evolved through many design discussions, user sessions and lot of careful planning. Thank you for sharing

broad-monitor-993

01/19/2023, 6:44 PM

@colossal-musician-95989 yes our flyte.org website revamp (coming soon!) should feature caching a lot more prominently. Out of curiosity, do you find the current docs on caching clear and understandable?

victorious-kilobyte-69570

01/19/2023, 7:16 PM

I feel it has everything a person needs to understand caching, but it needs a thorough read, maybe even a few times for a complete newcomer to workflow tools. Also, terms related to caching such as "task Signature" could be simpler in my opinion, for example "Any changes to task Function". Prefect V1 docs for caching - https://docs-v1.prefect.io/core/concepts/persistence.html#output-caching-based-on-a-file-target It's pretty old, but has similar concepts explained in simple sentences. I'm just sharing it to show you what kind of explanation fits a crowd who are new to these tools. But I do wish the above Prefect page were more technical, closer to Flyte's way of explaining. Basically a middle ground would be nice. Also, in the current stable Flyte docs, I would wish for more details on "local cache storage" and "remote cache storage". The above Prefect v1 doc gives some insight into it.

❤️ 5

thankful-minister-83577

01/19/2023, 7:25 PM

thank you for the feedback @victorious-kilobyte-69570

broad-monitor-993

01/19/2023, 7:32 PM

[flyte-docs]

user

01/19/2023, 7:32 PM

📘 Create a new Flyte Docs issue: https://github.com/flyteorg/flyte/issues/new?assignees=&labels=documentation%2Cuntriaged&template=docs_issue.yaml&title=%5BDocs%5D+

broad-monitor-993

01/19/2023, 7:34 PM

@colossal-musician-95989 if you don’t mind, would you fill in a new issue ^^ for improving the caching docs? It’d be super helpful if you can link this slack thread and summarize the main suggestions you have for improving the readability and content.

freezing-airport-6809

01/19/2023, 11:33 PM

thank you @victorious-kilobyte-69570 this is fantastic feedback

victorious-kilobyte-69570

01/20/2023, 8:54 AM

@broad-monitor-993 sure I will create an issue.

victorious-kilobyte-69570

01/20/2023, 8:56 AM

@freezing-airport-6809 feels good to talk to the flyte team directly, hope that we all together push the general direction of Flyte to greater heights.

❤️ 3

powerful-gold-59386

01/20/2023, 1:37 PM

@victorious-kilobyte-69570 I have already created an issue for this (sorry didn't see that Niels asked you to open one). Feel free to comment on it with anything else you want to say: https://github.com/flyteorg/flyte/issues/3249

victorious-kilobyte-69570

01/20/2023, 2:16 PM

Oh nice, great, will do. Sorry, I'm still working in the office and couldn't find time; I was planning to write it up at home after office, with a beer in hand 😄. Will do tonight (India time).

freezing-airport-6809

01/20/2023, 2:52 PM

Issues with beer in hand 😂

victorious-kilobyte-69570

01/21/2023, 7:13 AM

hehe

🍺 2

victorious-kilobyte-69570

01/21/2023, 2:03 PM

https://github.com/flyteorg/flyte/issues/3249#issuecomment-1399256660 @freezing-airport-6809 @powerful-gold-59386 @broad-monitor-993 Done.
