#ask-the-community

Mike Carley

09/20/2022, 2:19 PM
How do you guys typically get your data sets? Is it mostly just queries from existing RDBMS?

Ketan (kumare3)

09/20/2022, 2:25 PM
Hi Mike, thank you for joining. Folks get datasets from data warehouses, blob stores, and RDBMSs

Mike Carley

09/20/2022, 2:26 PM
That is what I figured. So do you think most folks here typically make it themselves or does like a DBA/Data Engineer typically set them up?
I would like to see if the schema or general data drift tends to be a problem. I could see a single column name change throwing off the whole task

Ketan (kumare3)

09/20/2022, 2:52 PM
That is true, but it will fail in Flyte's case
If you model it well

Mike Carley

09/20/2022, 2:54 PM
Is there a way to model it in such a way that a change to a column name or data type won’t cause a breakage?
k

Ketan (kumare3)

09/20/2022, 3:25 PM
It will cause a break if you use typed columns, but that is optional
All types in structured datasets and data frames are optional
Cc @Yee

Mike Carley

09/20/2022, 4:35 PM
Right, but I might be wrong so please correct me. If you don’t type it, how do you perform things like math operations? It would just default to a string, right?

Yee

09/20/2022, 5:06 PM
hop on a call? @Mike Carley

Mike Carley

09/20/2022, 5:09 PM
Maybe. I don’t think it’s really worth your time. I am just trying to figure out how MLOps platforms like this one cope with data supplies that change their stuff up all the time, and without telling anyone

Yee

09/20/2022, 6:09 PM
i don’t mind if you don’t mind @Mike Carley
sorry got pulled away to investigate another bug
would you have some time later this week for 15-20 mins?

Katrina P

09/20/2022, 8:01 PM
FWIW there's an existing Flyte integration with Great Expectations: https://docs.flyte.org/projects/cookbook/en/stable/auto/integrations/flytekit_plugins/greatexpectations/index.html. Having been in both the data engineering and data science realms, it's been hard to set up a working contract between source and sink. Great Expectations is a great project working on getting tests for that. I think the power of Flyte is that tasks are statically typed, so you won't accidentally allow the workflow to go through. I think the proper behavior for the workflow is to fail when data changes
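
A minimal sketch of the fail-fast idea discussed in this thread, written in plain Python rather than with Flyte's or Great Expectations' own APIs. The names `SchemaDriftError`, `check_columns`, and `mean_age`, plus the sample rows, are hypothetical and exist only for illustration. It assumes rows arrive as dictionaries, checks only the columns that were explicitly declared (so typing stays optional), and raises as soon as a declared column is renamed or changes type:

```python
from typing import Any, Dict, List, Type


class SchemaDriftError(TypeError):
    """Raised when incoming rows no longer match the declared columns."""


def check_columns(rows: List[Dict[str, Any]], expected: Dict[str, Type]) -> None:
    """Fail fast if a declared column is missing or holds the wrong type.

    Columns that are not listed in `expected` are ignored, so declaring
    column types is optional (an assumption of this sketch).
    """
    for index, row in enumerate(rows):
        for name, expected_type in expected.items():
            if name not in row:
                raise SchemaDriftError(f"row {index}: missing column {name!r}")
            if not isinstance(row[name], expected_type):
                raise SchemaDriftError(
                    f"row {index}: column {name!r} is "
                    f"{type(row[name]).__name__}, expected {expected_type.__name__}"
                )


def mean_age(rows: List[Dict[str, Any]]) -> float:
    """Hypothetical downstream step that needs a numeric `age` column."""
    check_columns(rows, {"age": int})  # the contract this step relies on
    return sum(row["age"] for row in rows) / len(rows)


if __name__ == "__main__":
    good = [{"name": "a", "age": 30}, {"name": "b", "age": 40}]
    print(mean_age(good))

    renamed = [{"name": "a", "years": 30}]  # upstream renamed the column
    try:
        mean_age(renamed)
    except SchemaDriftError as err:
        print(f"stopped early: {err}")
```

The design choice here is that the check sits at the boundary of the step that consumes the data, so a silent upstream change turns into an explicit error instead of a wrong result further down the pipeline.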

Mike Carley

09/20/2022, 8:29 PM
It's all good @Yee. I have some time tomorrow. Does that work for you?

Yee

09/20/2022, 8:38 PM
yeah sure, what tz are you in?
meetings here and there but pretty open for most of the day

Mike Carley

09/23/2022, 5:08 PM
@Niels Bantilan