GitHub
03/09/2023, 11:54 PM
GitHub
03/10/2023, 12:46 AM
`s3fs`/`gcsfs` to provide the fsspec implementations.
Important
This PR removes the `DataPersistence` and `DataPersistencePlugins` constructs completely. If you were importing these, they are now going away. However, most people should not have been using them.
The `flytekitplugins-data-fsspec` plugin has also been emptied out, though it will continue to be published for the time being. The logic that was in there originally has been moved into flytekit code in this PR, including the Structured Dataset handlers.
Type
☐ Bug Fix
☑︎ Feature
☐ Plugin
Are all requirements met?
☑︎ Code completed
☑︎ Smoke tested
☐ Unit tests added
☐ Code documentation added
☐ Any pending items have an associated Issue
Complete description
Notes for users and contributors
If you're importing `LOCAL` or `S3` from `flytekit/types/structured/structured_dataset.py`, these have been removed.
If you're registering with `None` as the protocol for a structured dataset handler, the handler is now registered with the `fsspec` protocol. For users who are not familiar with `StructuredDataset` encoders and decoders, which one gets used is based on three things: the Python type of the dataframe, the storage protocol (local vs. s3 vs. gcs, etc.), and the file format (parquet, csv, etc.). This was done as a separate commit, #1543. The waterfall is now:
1. Exact match on all three attributes.
2. Protocol match but a generic format (`""`) handler, if registered.
3. Protocol match and a match on the default format for that dataframe type, if a default format is set.
4. An `fsspec` handler with a format match (will match the generic `""` format too).
5. An `fsspec` handler with a generic format, if registered.
6. An `fsspec` handler with the default format for that dataframe type, if a default format is set.
7. An `fsspec` handler if that protocol only has one entry and the requested format was generic.
8. Protocol-specific handler if that protocol only has one entry.
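The waterfall above can be sketched as an ordered list of lookups. This is only an illustration of the ordering, not flytekit's actual implementation: the registry shape (a dict keyed by dataframe type, protocol, and format), the `find_handler` function, and the `Frame` class are all hypothetical names made up for this sketch.

```python
from typing import Callable, Dict, List, Optional, Tuple

GENERIC_FORMAT = ""  # the generic format named in the waterfall
FSSPEC = "fsspec"    # the fallback protocol named in the waterfall

# Hypothetical registry shape: (dataframe type, protocol, format) -> handler name.
Registry = Dict[Tuple[type, str, str], str]


def find_handler(
    registry: Registry,
    default_formats: Dict[type, str],
    df_type: type,
    protocol: str,
    fmt: str,
) -> Tuple[int, str]:
    """Return (waterfall step number, handler) for the first step that matches."""

    def lookup(proto: str, f: Optional[str]) -> Optional[str]:
        if f is None:  # no default format is set for this dataframe type
            return None
        return registry.get((df_type, proto, f))

    def only_entry(proto: str) -> Optional[str]:
        entries = [h for (t, p, _), h in registry.items() if t is df_type and p == proto]
        return entries[0] if len(entries) == 1 else None

    default_fmt = default_formats.get(df_type)
    steps: List[Callable[[], Optional[str]]] = [
        lambda: lookup(protocol, fmt),                               # 1
        lambda: lookup(protocol, GENERIC_FORMAT),                    # 2
        lambda: lookup(protocol, default_fmt),                       # 3
        lambda: lookup(FSSPEC, fmt),                                 # 4
        lambda: lookup(FSSPEC, GENERIC_FORMAT),                      # 5
        lambda: lookup(FSSPEC, default_fmt),                         # 6
        lambda: only_entry(FSSPEC) if fmt == GENERIC_FORMAT else None,  # 7
        lambda: only_entry(protocol),                                # 8
    ]
    for number, step in enumerate(steps, start=1):
        handler = step()
        if handler is not None:
            return number, handler
    raise ValueError(f"no handler for {df_type.__name__}, {protocol!r}, {fmt!r}")


class Frame:
    """Hypothetical dataframe type used only for this example."""


registry: Registry = {
    (Frame, "s3", "parquet"): "s3-parquet-handler",
    (Frame, FSSPEC, "parquet"): "fsspec-parquet-handler",
}
print(find_handler(registry, {}, Frame, "s3", "parquet"))
print(find_handler(registry, {}, Frame, "gs", "parquet"))
```

Returning the step number alongside the handler makes it easy to see which rule fired for a given request; a real implementation would return only the handler.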
Others
• Added a `Dockerfile.dev` for slightly easier debugging. (Longer term we hope to come up with a better dev experience.)
• Some attempt was made to add the ability to include a custom header containing a hash to the fsspec HTTP filesystem. This is needed for `pyflyte run`, `register`, and anything else that uses the `DataProxy` service to post upload to a signed URL. Unfortunately no solution was found, so the upload path for these two commands just uses the old way of calling the requests library.
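For readers unfamiliar with why a hash header matters for signed-URL uploads, here is a rough sketch of what such a request might look like. The PR text does not say which hash or header name is used, so `Content-MD5` with a base64-encoded MD5 digest is an assumption, as are the function names and the example URL. The sketch also uses the standard library's `urllib.request` instead of the requests library mentioned above, purely to stay dependency-free, and it only builds the request without sending it.

```python
import base64
import hashlib
import urllib.request
from typing import Dict


def hash_headers(payload: bytes) -> Dict[str, str]:
    """Build headers carrying the payload's length and an MD5 digest (assumed header name)."""
    digest = hashlib.md5(payload).digest()
    return {
        "Content-Length": str(len(payload)),
        "Content-MD5": base64.b64encode(digest).decode("ascii"),
    }


def build_signed_upload(signed_url: str, payload: bytes) -> urllib.request.Request:
    """Prepare (but do not send) a PUT request to a signed URL."""
    return urllib.request.Request(
        signed_url,
        data=payload,
        headers=hash_headers(payload),
        method="PUT",
    )


# Hypothetical URL; a real one would come from the service that signs uploads.
request = build_signed_upload("https://example.com/signed-upload", b"hello")
print(request.get_method(), hash_headers(b"hello"))
```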
Tracking Issue
flyteorg/flyte#3197
flyteorg/flytekit
✅ All checks have passed
30/30 successful checks
GitHub
03/10/2023, 1:39 AM
<https://github.com/flyteorg/flyteidl/tree/master|master>
by eapolinario
<https://github.com/flyteorg/flyteidl/commit/3dfcaf6671d85ee72c1ce00961c17421c5c91111|3dfcaf66>
- Init customTokenSource.refreshTime (#381)
flyteorg/flyteidl
GitHub
03/10/2023, 1:49 AM
GitHub
03/10/2023, 2:15 AM
GitHub
03/10/2023, 4:55 AM
<https://github.com/flyteorg/flytekit/tree/master|master>
by kumare3
<https://github.com/flyteorg/flytekit/commit/e90ee2576c87298826b1ba702cf7569f43f737ba|e90ee257>
- Pyflyte docs formatting (#1538)
flyteorg/flytekit
GitHub
03/10/2023, 6:19 AM
<https://github.com/flyteorg/flytekit/tree/master|master>
by wild-endeavor
<https://github.com/flyteorg/flytekit/commit/28da983bba36e243bc671f9ba1aa53a0791efd62|28da983b>
- Data subsystem (#1526)
flyteorg/flytekit
GitHub
03/10/2023, 8:06 AM
GitHub
03/10/2023, 8:17 AM
GitHub
03/10/2023, 8:28 AM
GitHub
03/10/2023, 8:45 AM
GitHub
03/10/2023, 9:25 AM
<https://github.com/flyteorg/flytekit-java/tree/master|master>
by sonjaer
<https://github.com/flyteorg/flytekit-java/commit/2aaab0b28221fc43b87bbf96593af90008e3c1ce|2aaab0b2>
- add scala subworkflow ex (#208)
flyteorg/flytekit-java
GitHub
03/10/2023, 9:51 AM
<https://github.com/flyteorg/flytekit-java/tree/master|master>
by sonjaer
<https://github.com/flyteorg/flytekit-java/commit/116814abe545532633dda63117c00d0426f422f0|116814ab>
- Add ability to fetch lp with version (#209)
flyteorg/flytekit-java
GitHub
03/10/2023, 11:47 AM
<https://github.com/flyteorg/flytekit-java/tree/master|master>
by github-actions[bot]
<https://github.com/flyteorg/flytekit-java/commit/6ae23c45743a60012868438a77ebef4f622a573e|6ae23c45>
- [maven-release-plugin] prepare release 0.4.6
flyteorg/flytekit-java
GitHub
03/10/2023, 11:47 AM
<https://github.com/flyteorg/flytekit-java/tree/master|master>
by github-actions[bot]
<https://github.com/flyteorg/flytekit-java/commit/aa72cfb18cf9f78d5f4eb093da146b8651b92578|aa72cfb1>
- [maven-release-plugin] prepare for next development iteration
flyteorg/flytekit-java
GitHub
03/10/2023, 12:34 PM
GitHub
03/10/2023, 1:17 PM
`FlyteFile`. Everything went well until I tried to define a custom `TypeTransformer` for it, which I wanted to behave exactly like `FlyteFile`'s type transformer with a few additions.
Pseudocode:
from typing import Generic, TypeVar

from flytekit.core.type_engine import TypeEngine, TypeTransformer
from flytekit.types.file import FlyteFile

T = TypeVar("T")

class Resource(Generic[T], FlyteFile):
    ...

class ResourceTransformer(TypeTransformer[Resource]):
    def to_literal(self, ctx, python_val, python_type, expected):
        # Construct a literal representation of the FlyteFile.
        lit = TypeEngine().to_literal(ctx, python_val, FlyteFile, None)
        ...  # Now do something with the literal, like embedding a hash etc.
        return lit

TypeEngine.register(ResourceTransformer(...))
The problem is that the current type resolution will now resolve `FlyteFile`'s transformer as the appropriate transformer for a `Resource` object, as per this code:
https://github.com/flyteorg/flytekit/blob/28da983bba36e243bc671f9ba1aa53a0791efd62/flytekit/core/type_engine.py#L756-L760
There is a TODO comment a little above that sort of acknowledges this problem and proposes to use an ordered dict, presumably to walk it in reverse and resolve the `Resource` transformer before the `FlyteFile` transformer. This might be brittle, though, for multiple-inheritance use cases like the one here, maybe even with a third resource type sprinkled in that gets registered way after the other two.
What I propose instead is trying to walk down the `python_val`'s class hierarchy and greedily match the first type transformer available, roughly like so:
# flytekit/core/type_engine.py
# begins at the code block linked above

# Step
# To facilitate cases where users may specify one transformer for multiple types that all inherit from
# parent.
if hasattr(python_type, "__mro__"):
    for base_type in inspect.getmro(python_type):
        try:
            return cls._REGISTRY[base_type]
        except KeyError:
            continue
For an inheritance chain like mine with MRO `Resource -> FlyteFile -> ... -> object`, this would lead to a match directly at the `Resource` level. What do you think?
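To make the proposal easier to try outside flytekit, here is a small standalone sketch of the same MRO walk against a plain dict. Everything in it is a stand-in: `FlyteFileLike`, `Tagged`, `TaggedResource`, the `REGISTRY` dict, and the `resolve` function are hypothetical names for illustration, and the transformers are represented by strings rather than real `TypeTransformer` objects.

```python
import inspect
from typing import Dict


class FlyteFileLike:
    """Hypothetical stand-in for FlyteFile."""


class Resource(FlyteFileLike):
    """Hypothetical stand-in for the Resource subclass."""


class Tagged:
    """Unregistered mixin, used to exercise multiple inheritance."""


class TaggedResource(Tagged, Resource):
    """A third type that has no transformer of its own."""


# Registration order is deliberately "child first, parent second" to show
# that the lookup does not depend on it.
REGISTRY: Dict[type, str] = {}
REGISTRY[Resource] = "ResourceTransformer"
REGISTRY[FlyteFileLike] = "FlyteFileTransformer"


def resolve(python_type: type, registry: Dict[type, str]) -> str:
    """Walk python_type's MRO and return the first registered transformer."""
    if hasattr(python_type, "__mro__"):
        for base_type in inspect.getmro(python_type):
            try:
                return registry[base_type]
            except KeyError:
                continue
    raise ValueError(f"no transformer registered for {python_type!r}")


print(resolve(Resource, REGISTRY))
print(resolve(TaggedResource, REGISTRY))
print(resolve(FlyteFileLike, REGISTRY))
```

Because `inspect.getmro` lists the most derived class first, the most specific registered transformer wins regardless of the order in which transformers were registered.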
Expected behavior
`TypeEngine().to_literal(ctx, python_val, python_type, expected)` should call the `ResourceTransformer` class to construct the literal, but instead it calls `FlyteFile`'s associated transformer.
Additional context to reproduce
No response
Screenshots
No response
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyte
GitHub
03/10/2023, 3:41 PM
Yee
GitHub
03/10/2023, 6:01 PM
GitHub
03/10/2023, 6:21 PM
GitHub
03/10/2023, 6:24 PM
<https://github.com/flyteorg/flyteadmin/tree/master|master>
by eapolinario
<https://github.com/flyteorg/flyteadmin/commit/2d3942298bbbdcd9841be002612ba0c4da77e5eb|2d394229>
- Add Dan Rammer and Eduardo to codeowners (#539)
flyteorg/flyteadmin
GitHub
03/10/2023, 6:24 PM
GitHub
03/10/2023, 6:31 PM
GitHub
03/10/2023, 6:50 PM
<https://github.com/flyteorg/flyte/tree/master|master>
by jeevb
<https://github.com/flyteorg/flyte/commit/401973c72dd59497f03ca6efd2267fe850ccb5c3|401973c7>
- Fix local compile (#3444)
flyteorg/flyte
GitHub
03/10/2023, 6:57 PM
GitHub
03/10/2023, 7:25 PM
GitHub
03/10/2023, 7:29 PM
<https://github.com/flyteorg/flyte/tree/master|master>
by pingsutw
<https://github.com/flyteorg/flyte/commit/05771f335998ec19a4cf032b4419fd894bd1f4e0|05771f33>
- Update swagger.rst (#3398)
flyteorg/flyte
GitHub
03/10/2023, 7:37 PM
GitHub
03/10/2023, 8:02 PM
`DataLoadingConfig` information to the `K8sPod` proto message so that Flyte can inject copilot as a sidecar in Pod tasks. This allows the use of `PodTemplate` configuration with `ContainerTasks`.
Type
☐ Bug Fix
☑︎ Feature
☐ Plugin
Are all requirements met?
☑︎ Code completed
☐ Smoke tested
☐ Unit tests added
☐ Code documentation added
☐ Any pending items have an associated Issue
Complete description
^^^
Tracking Issue
flyteorg/flyte#3123
Follow-up issue
NA
flyteorg/flyteidl
✅ All checks have passed
13/13 successful checks