# ask-the-community
How do you guys typically get your data sets? Is it mostly just queries from existing RDBMS?
Hi Mike, thank you for joining. Folks get datasets from data warehouses, blob stores, and RDBMSs
That is what I figured. So do you think most folks here typically make them themselves, or does a DBA/Data Engineer typically set them up?
I would like to see if the schema or general data drift tends to be a problem. I could see a single column name change throwing off the whole task
That is true, but it will fail in Flyte's case
If you model it well
Is there a way to model it in such a way that a change to a column name or data type won’t cause a breakage?
It will cause a break if you use typed columns, but that is optional
All column types in structured datasets and data frames are optional
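To illustrate the point about optional column types, here is a minimal plain-Python sketch. It is not flytekit's API: `check_columns` and the sample rows are hypothetical names made up for the example. The idea it shows is that only declared columns get checked, so a renamed column breaks a typed declaration but passes when nothing is declared.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def check_columns(rows: List[Row], declared: Optional[Dict[str, type]] = None) -> List[Row]:
    """Validate only the declared columns; with no declaration, accept anything.

    Hypothetical helper for illustration, not part of flytekit.
    """
    if not declared:
        return rows  # untyped: there is no contract, so nothing can break here
    for i, row in enumerate(rows):
        for name, expected in declared.items():
            if name not in row:
                raise TypeError(f"row {i}: missing declared column {name!r}")
            if not isinstance(row[name], expected):
                raise TypeError(
                    f"row {i}: column {name!r} is {type(row[name]).__name__}, "
                    f"expected {expected.__name__}"
                )
    return rows


# Assumed scenario: upstream renamed "age" to "age_years" without notice.
renamed = [{"name": "a", "age_years": 31}]

untyped_result = check_columns(renamed)  # no columns declared, so this passes

try:
    check_columns(renamed, {"age": int})
    typed_result = "passed"
except TypeError as exc:
    typed_result = str(exc)  # the typed declaration surfaces the rename
```

The trade-off is the one described in the thread: declaring types means a rename fails loudly at the boundary, while leaving them off lets the change slip through to whatever code touches the column next.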
Cc @Yee
Right, but I might be wrong so please correct me. If you don’t type it, how do you perform things like math operations? It would just default to a string, right?
hop on a call? @Mike Carley
Maybe. I don’t think it’s really worth your time. I am just trying to figure out how MLOps platforms like this one cope with data suppliers that change their stuff up all the time, and without telling anyone
i don’t mind if you don’t mind @Mike Carley
sorry got pulled away to investigate another bug
would you have some time later this week for 15-20 mins?
FWIW there's an existing Flyte integration with Great Expectations. Having been in both the data engineering and data science realms, I've found it hard to set up a working contract between source and sink. Great Expectations is a great project working on getting tests for that. I think the power of Flyte is that tasks are statically typed, so you won't accidentally allow the workflow to go through. I think the proper behavior is for the workflow to fail when the data changes
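The "fail when data changes" behavior can be sketched as a schema diff that runs before any downstream work. This is a plain-Python illustration only, not the Great Expectations or Flyte API; `diff_schema`, `require_schema`, `EXPECTED`, and the column names are all assumptions made up for the example.

```python
from typing import Dict, List

# Hypothetical contract agreed between source and sink: column name -> type name.
EXPECTED: Dict[str, str] = {"user_id": "int", "amount": "float", "country": "str"}


def diff_schema(expected: Dict[str, str], observed: Dict[str, str]) -> List[str]:
    """Return human-readable differences between two column -> type mappings."""
    problems: List[str] = []
    for col in sorted(expected.keys() - observed.keys()):
        problems.append(f"missing column: {col}")
    for col in sorted(observed.keys() - expected.keys()):
        problems.append(f"unexpected column: {col}")
    for col in sorted(expected.keys() & observed.keys()):
        if expected[col] != observed[col]:
            problems.append(f"type changed: {col} {expected[col]} -> {observed[col]}")
    return problems


def require_schema(expected: Dict[str, str], observed: Dict[str, str]) -> None:
    """Raise instead of letting a drifted dataset flow downstream."""
    problems = diff_schema(expected, observed)
    if problems:
        raise ValueError("schema drift detected: " + "; ".join(problems))


# Assumed drift: "country" was renamed to "region" and "amount" became a string.
observed = {"user_id": "int", "amount": "str", "region": "str"}
drift = diff_schema(EXPECTED, observed)
```

Calling `require_schema(EXPECTED, observed)` at the top of a task would stop the run with a message naming each difference, which is easier to act on than a failure deep inside the transformation code.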
It's all good @Yee. I have some time tomorrow. Does that work for you?
yeah sure, what tz are you in?
meetings here and there but pretty open for most of the day
@Niels Bantilan