GitHub
02/27/2023, 3:38 PM
duckdb_task = DuckDBQuery(name="duckdb_task", query="SELECT SUM(a) FROM mydf", inputs=kwtypes(mydf=pd.DataFrame))
Describe alternatives you've considered
An alternative is to handle DuckDB code from within a Flyte task: https://gist.github.com/samhita-alla/003c3f409e8caa88470f6f7206b54ae3.
Propose: Link/Inline OR Additional context
A task plugin that accepts a query, a dataframe/pyarrow table/parquet file/csv file and parameters.
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyte
GitHub
02/27/2023, 3:38 PM
DuckDBQuery task plugin that runs queries using DuckDB as the DBMS.
Type
☐ Bug Fix
☐ Feature
☑︎ Plugin
Are all requirements met?
☐ Code completed
☐ Smoke tested
☑︎ Unit tests added
☑︎ Code documentation added
☐ Any pending items have an associated Issue
Complete description
Capturing the crucial assumptions I made while building the task plugin:
• The DuckDBQuery task requires a query argument and can optionally take inputs.
• query can include a set of queries that'll be run sequentially. The last query needs to be a SELECT query.
• inputs can include a structured dataset or a list of parameters to be sent to the queries.
• The output is a pyarrow table, which can be converted to any structured-dataset-compatible type.
• The connection mode is set to :memory:, i.e., the data is always stored in an in-memory, non-persistent database. It could be set to a file, but it's difficult to make that file accessible to different DuckDBQuery pods, and a persistent file only makes sense if later tasks can reuse it.
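The sequential-query contract above can be sketched with Python's standard library. This is a minimal sketch, not the plugin's implementation: it uses sqlite3's in-memory database as a stand-in for DuckDB's :memory: connection (so it runs without DuckDB installed), `run_queries` is a hypothetical helper, it returns plain rows instead of a pyarrow table, and binding parameters only to the final query is an assumption.

```python
import sqlite3
from typing import Any, Optional, Sequence


def run_queries(queries: Sequence[str], params: Optional[Sequence[Any]] = None) -> list:
    """Hypothetical helper: run queries in order and return the rows of the final SELECT.

    sqlite3 stands in for DuckDB here. Assumption: params are bound to the last query only.
    """
    if not queries:
        raise ValueError("at least one query is required")
    # Mirror the plugin's rule that the last query must be a SELECT.
    if not queries[-1].lstrip().upper().startswith("SELECT"):
        raise ValueError("the last query must be a SELECT")
    # ":memory:" gives a fresh, non-persistent database for every call.
    conn = sqlite3.connect(":memory:")
    try:
        for query in queries[:-1]:
            conn.execute(query)
        return conn.execute(queries[-1], tuple(params or ())).fetchall()
    finally:
        conn.close()
```

For example, `run_queries(["CREATE TABLE t (a INTEGER)", "INSERT INTO t VALUES (1), (2), (3)", "SELECT SUM(a) FROM t WHERE a > ?"], params=[1])` returns `[(5,)]`. Because each call opens its own in-memory database, nothing survives between calls, which is the same trade-off the last bullet describes for separate pods.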
Example:
import pandas as pd
from flytekit import kwtypes, workflow
from flytekitplugins.duckdb import DuckDBQuery

duckdb_query = DuckDBQuery(
    name="read_parquet",
    query=[
        "INSTALL httpfs",
        "LOAD httpfs",
        """SELECT hour(lpep_pickup_datetime) AS hour, count(*) AS count FROM READ_PARQUET(?) GROUP BY hour""",
    ],
    inputs=kwtypes(params=list[str]),
)

@workflow
def parquet_wf(parquet_file: str) -> pd.DataFrame:
    return duckdb_query(params=[parquet_file])

assert isinstance(
    parquet_wf(parquet_file="https://d37ci6vzurychx.cloudfront.net/trip-data/green_tripdata_2022-02.parquet"),
    pd.DataFrame,
)
Tracking Issue
Fixes flyteorg/flyte#3246
Follow-up issue
NA
OR
https://github.com/flyteorg/flyte/issues/
flyteorg/flytekit
✅ All checks have passed
30/30 successful checks
GitHub
02/27/2023, 5:58 PM
master (https://github.com/flyteorg/flyte/tree/master) by cosmicBboy
7f66e475 (https://github.com/flyteorg/flyte/commit/7f66e475b522745f3721170906db32f61d9af357) - update swagger link in flyte demo (#3376)
flyteorg/flyte
GitHub
02/27/2023, 8:25 PM
SdkType<T> in Java is using JacksonSdkType. For example, if a developer wants to define a task equivalent to (String str) -> str.toUpperCase(), they currently need to write it as:
@AutoService(SdkRunnableTask.class)
public class ToUpperCaseTask extends SdkRunnableTask<ToUpperCaseTask.Input, ToUpperCaseTask.Output> {
  public ToUpperCaseTask() {
    super(JacksonSdkType.of(Input.class), JacksonSdkType.of(Output.class));
  }

  @AutoValue
  public abstract static class Input {
    public abstract SdkBindingData<String> in();
  }

  @AutoValue
  public abstract static class Output {
    public abstract SdkBindingData<String> out();

    public static Output create(SdkBindingData<String> out) {
      return new AutoValue_ToUpperCaseTask_Output(out);
    }
  }

  @Override
  public Output run(Input input) {
    // Unwrap the input binding, upper-case the value, and wrap the result again.
    return Output.create(SdkBindingDataFactory.of(input.in().get().toUpperCase()));
  }
}
For such a trivial task, the AutoValue classes that declare the types take up most of the lines.
With this PR, the developer could write the task this way instead:
@AutoService(SdkRunnableTask.class)
public class ToUpperCaseTask extends SdkRunnableTask<SdkBindingData<String>, SdkBindingData<String>> {
  public ToUpperCaseTask() {
    super(SdkTypes.of(SdkLiteralTypes.strings(), "str"), SdkTypes.of(SdkLiteralTypes.strings(), "upr"));
  }

  @Override
  public SdkBindingData<String> run(SdkBindingData<String> input) {
    // run returns SdkBindingData<String>, so the upper-cased value is wrapped again.
    return SdkBindingDataFactory.of(input.get().toUpperCase());
  }
}
Declaring the task's types now takes a single line, and run is simpler too, since there is no need to create or access AutoValues.
Tracking Issue
_Remove the '_fixes_' keyword if there will be multiple PRs to fix the linked issue_
fixes https://github.com/flyteorg/flyte/issues/
Follow-up issue
NA
OR
https://github.com/flyteorg/flyte/issues/
flyteorg/flytekit-java
DCO: DCO
✅ 1 other check has passed
1/2 successful checks
GitHub
02/27/2023, 9:57 PM
limits:
  cpu: "2"
  memory: 200Mi
requests:
  cpu: "2"
  memory: 200Mi
Removing this will make it:
resources:
  limits:
    cpu: "0"
    memory: "0"
  requests:
    cpu: "0"
    memory: "0"
which means that there is no limit.
Type
☑︎ Bug Fix
☐ Feature
☐ Plugin
Are all requirements met?
☑︎ Code completed
☑︎ Smoke tested
☐ Unit tests added
☐ Code documentation added
☐ Any pending items have an associated Issue
Complete description
How did you fix the bug, make the feature etc. Link to any design docs etc
Tracking Issue
_Remove the '_fixes_' keyword if there will be multiple PRs to fix the linked issue_
fixes https://github.com/flyteorg/flyte/issues/
Follow-up issue
NA
OR
https://github.com/flyteorg/flyte/issues/
flyteorg/flyteadmin
GitHub Actions: Build & Push Flyteadmin Image
GitHub Actions: Goreleaser
GitHub Actions: Build & Push Flytescheduler Image
GitHub Actions: Bump Version
✅ 8 other checks have passed
8/12 successful checks
GitHub
02/27/2023, 10:09 PM
from typing import Optional
from pydantic import BaseModel

class Foo(BaseModel):
    count: int
    size: Optional[float] = None

Support such models as a valid transform.
Describe alternatives you've considered
Dataclasses are already supported, but they offer limited extensibility for schema extraction because they use marshmallow under the hood.
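To make "schema extraction" concrete, here is a minimal stdlib sketch that derives a JSON-Schema-like dict from a dataclass mirror of Foo. It is illustrative only: `naive_schema` is a hypothetical helper and is not how flytekit or marshmallow extract schemas. Pydantic models expose this kind of schema directly (`Foo.schema()` in pydantic v1, `Foo.model_json_schema()` in v2), which is part of the appeal described in this issue.

```python
import dataclasses
import typing
from typing import Optional


@dataclasses.dataclass
class Foo:
    # Dataclass mirror of the pydantic Foo model above.
    count: int
    size: Optional[float] = None


# Assumed mapping from Python scalar types to JSON Schema type names.
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def naive_schema(cls) -> dict:
    """Hypothetical helper: build a JSON-Schema-like dict from a dataclass."""
    hints = typing.get_type_hints(cls)
    properties, required = {}, []
    for field in dataclasses.fields(cls):
        tp = hints[field.name]
        args = typing.get_args(tp)
        # Optional[X] is Union[X, None]; unwrap it to X.
        if typing.get_origin(tp) is typing.Union and type(None) in args:
            tp = next(a for a in args if a is not type(None))
        properties[field.name] = {"type": _JSON_TYPES[tp]}
        # Fields without any default are required.
        if field.default is dataclasses.MISSING and field.default_factory is dataclasses.MISSING:
            required.append(field.name)
    return {"type": "object", "properties": properties, "required": required}
```

Every new type or constraint has to be handled by hand in a helper like this. That is the extensibility gap the issue points at.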
Propose: Link/Inline OR Additional context
No response
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyte
GitHub
02/27/2023, 10:11 PM
SdkType<T> with low overhead
Type
☐ Bug Fix
☑︎ Feature
☐ Plugin
Are all requirements met?
☑︎ Code completed
☐ Smoke tested
☑︎ Unit tests added
☐ Code documentation added
☐ Any pending items have an associated Issue
Complete description
Currently, the way to create a custom SdkType<T> in Java is to use JacksonSdkType. For example, a developer who wants to define a task equivalent to (String str) -> str.toUpperCase() has to write it as:
@AutoService(SdkRunnableTask.class)
public class ToUpperCaseTask extends SdkRunnableTask<ToUpperCaseTask.Input, ToUpperCaseTask.Output> {
  public ToUpperCaseTask() {
    super(JacksonSdkType.of(Input.class), JacksonSdkType.of(Output.class));
  }

  @AutoValue
  public abstract static class Input {
    public abstract String in();
  }

  @AutoValue
  public abstract static class Output {
    public abstract String out();

    public static Output create(String out) {
      return new AutoValue_ToUpperCaseTask_Output(out);
    }
  }

  @Override
  public Output run(Input input) {
    // Read the "in" field, upper-case it, and wrap it in a new Output.
    return Output.create(input.in().toUpperCase());
  }
}
For such a trivial task, the AutoValue classes that declare the types take up most of the lines.
With this PR, the developer could write the task this way instead:
@AutoService(SdkRunnableTask.class)
public class ToUpperCaseTask extends SdkRunnableTask<String, String> {
public ToUpperCaseTask() {
super(SdkTypes.ofPrimitive("in", String.class), SdkTypes.ofPrimitive("out", String.class));
}
@Override
public String run(String input) {
return input.toUpperCase();
}
}
Declaring the task's types now takes a single line, and run is simpler too, since there is no need to create or access AutoValues.
This PR supports creating simplified `SdkType`s for the following types:
• SdkTypes.ofPrimitive("in", Long.class) for SdkType<Long>
• SdkTypes.ofPrimitive("in", Double.class) for SdkType<Double>
• SdkTypes.ofPrimitive("in", String.class) for SdkType<String>
• SdkTypes.ofPrimitive("in", Boolean.class) for SdkType<Boolean>
• SdkTypes.ofPrimitive("in", Instant.class) for SdkType<Instant>
• SdkTypes.ofPrimitive("in", Duration.class) for SdkType<Duration>
It also supports collections and maps of the same primitive types:
• SdkTypes.ofCollection("in", Long.class) for SdkType<List<Long>>
• SdkTypes.ofMap("in", Long.class) for SdkType<Map<String, Long>>
And finally, it supports structs, collections of structs, and maps of structs:
• SdkTypes.ofStruct("in", JacksonSdkType.of(Input.class)) for SdkType<Input>
• SdkTypes.ofCollection("in", JacksonSdkType.of(Input.class)) for SdkType<List<Input>>
• SdkTypes.ofMap("in", JacksonSdkType.of(Input.class)) for SdkType<Map<String, Input>>
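The factory list above boils down to a small mapping from Java classes to Flyte literal types, plus collection and map wrappers. Below is a minimal Python sketch of that idea. It assumes the Flyte IDL simple-type names INTEGER, FLOAT, STRING, BOOLEAN, DATETIME and DURATION. The helpers `of_primitive`, `of_collection` and `of_map`, and the `LiteralType`/`Variable` classes, are hypothetical and only mirror the Java factory names; they are not flytekit-java APIs. Struct support is left out.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed correspondence between the Java classes listed above and Flyte simple types.
_SIMPLE_TYPES = {
    "Long": "INTEGER",
    "Double": "FLOAT",
    "String": "STRING",
    "Boolean": "BOOLEAN",
    "Instant": "DATETIME",
    "Duration": "DURATION",
}


@dataclass(frozen=True)
class LiteralType:
    # Exactly one of these is set in the sketch.
    simple: Optional[str] = None
    collection_of: Optional["LiteralType"] = None
    map_value_of: Optional["LiteralType"] = None


@dataclass(frozen=True)
class Variable:
    name: str
    type: LiteralType


def of_primitive(name: str, java_class: str) -> Variable:
    return Variable(name, LiteralType(simple=_SIMPLE_TYPES[java_class]))


def of_collection(name: str, java_class: str) -> Variable:
    inner = LiteralType(simple=_SIMPLE_TYPES[java_class])
    return Variable(name, LiteralType(collection_of=inner))


def of_map(name: str, java_class: str) -> Variable:
    # Map keys are always strings (Map<String, T>), so only the value type is recorded.
    inner = LiteralType(simple=_SIMPLE_TYPES[java_class])
    return Variable(name, LiteralType(map_value_of=inner))
```

Each declaration stays a one-liner, such as `of_collection("in", "Long")`, because the nesting is built by the helper rather than by a hand-written value class.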
flyteorg/flytekit-java
✅ All checks have passed
2/2 successful checks
GitHub
02/27/2023, 11:03 PM
master (https://github.com/flyteorg/flyteconsole/tree/master) by jsonporter
5e2baee6 (https://github.com/flyteorg/flyteconsole/commit/5e2baee6f2ba0c12ddc123c7848438fa59f7a5a3) - chore(deps): bump ua-parser-js from 0.7.32 to 0.7.33 (#675)
flyteorg/flyteconsole
GitHub
02/27/2023, 11:06 PM
master (https://github.com/flyteorg/flyteconsole/tree/master) by jsonporter
e843866d (https://github.com/flyteorg/flyteconsole/commit/e843866da37e911753e24514c45abe18b4e79346) - chore(deps): bump http-cache-semantics from 4.1.0 to 4.1.1 (#678)
flyteorg/flyteconsole
GitHub
02/27/2023, 11:08 PM
master (https://github.com/flyteorg/flyteconsole/tree/master) by jsonporter
92c8d918 (https://github.com/flyteorg/flyteconsole/commit/92c8d9183e75963ba35ecac43633d8a20c40e3bf) - [Snyk] Upgrade @typescript-eslint/eslint-plugin from 5.47.0 to 5.48.2 (#683)
flyteorg/flyteconsole
GitHub
02/27/2023, 11:24 PM
master (https://github.com/flyteorg/flyteconsole/tree/master) by jsonporter
fcb41b52 (https://github.com/flyteorg/flyteconsole/commit/fcb41b524138b7c0b34f931f2e9827420f842667) - [Snyk] Upgrade @typescript-eslint/parser from 5.47.0 to 5.48.2 (#684)
flyteorg/flyteconsole
GitHub
02/27/2023, 11:26 PM
master (https://github.com/flyteorg/flyteconsole/tree/master) by jsonporter
8a938662 (https://github.com/flyteorg/flyteconsole/commit/8a9386629fbcbb7292b54f53b90fa2c19c4d6f6d) - [Snyk] Upgrade prettier from 2.8.1 to 2.8.3 (#685)
flyteorg/flyteconsole
GitHub
02/27/2023, 11:28 PM
master (https://github.com/flyteorg/flyteconsole/tree/master) by jsonporter
9325daa9 (https://github.com/flyteorg/flyteconsole/commit/9325daa90f3ab1a19ab2c369c38b6f3f01643828) - [Snyk] Upgrade @types/morgan from 1.9.3 to 1.9.4 (#686)
flyteorg/flyteconsole
GitHub
02/27/2023, 11:33 PM
master (https://github.com/flyteorg/flyteconsole/tree/master) by jsonporter
13199170 (https://github.com/flyteorg/flyteconsole/commit/131991703956894274ccf61272ffa4057358c430) - [Snyk] Upgrade @types/react from 16.14.34 to 16.14.35 (#687)
flyteorg/flyteconsole
GitHub
02/28/2023, 1:53 PM
@AutoService(SdkRunnableTask.class)
public class ToUpperCaseTask extends SdkRunnableTask<ToUpperCaseTask.Input, ToUpperCaseTask.Output> {
  public ToUpperCaseTask() {
    super(JacksonSdkType.of(Input.class), JacksonSdkType.of(Output.class));
  }

  @AutoValue
  public abstract static class Input {
    public abstract SdkBindingData<String> in();
  }

  @AutoValue
  public abstract static class Output {
    public abstract SdkBindingData<String> out();

    public static Output create(SdkBindingData<String> out) {
      return new AutoValue_ToUpperCaseTask_Output(out);
    }
  }

  @Override
  public Output run(Input input) {
    // Unwrap the input binding, upper-case the value, and wrap the result again.
    return Output.create(SdkBindingDataFactory.of(input.in().get().toUpperCase()));
  }
}
For such a trivial task, the AutoValue classes that declare the types take up most of the lines, i.e., it is too verbose.
It would be nice to have a simpler way to express the same thing.
Goal: What should the final outcome look like, ideally?
I would like a way to define these simple types as a one-liner: SdkTypes.of(SdkLiteralTypes.strings(), "str")
Describe alternatives you've considered
I have considered code generation, but it has a cost, and it is better to first explore alternatives that avoid it.
Propose: Link/Inline OR Additional context
No response
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyte
GitHub
02/28/2023, 1:58 PM
master (https://github.com/flyteorg/flytekit-java/tree/master) by narape
09dce864 (https://github.com/flyteorg/flytekit-java/commit/09dce86469eedd8da49da94feeb0297ab4196e20) - Literaltype as sdktype (#198)
flyteorg/flytekit-java
GitHub
02/28/2023, 2:38 PM
master (https://github.com/flyteorg/flytekit-java/tree/master) by narape
746c4a32 (https://github.com/flyteorg/flytekit-java/commit/746c4a32f10ce339c7ec36709c403a732ccca155) - fix: upgrade info.picocli:picocli from 4.7.0 to 4.7.1 (#197)
flyteorg/flytekit-java
GitHub
02/28/2023, 3:12 PM
reported_at field on workflow, node, and task events to more accurately track executions.
Type
☐ Bug Fix
☑︎ Feature
☐ Plugin
Are all requirements met?
☑︎ Code completed
☑︎ Smoke tested
☐ Unit tests added
☑︎ Code documentation added
☐ Any pending items have an associated Issue
Complete description
^^^
Tracking Issue
flyteorg/flyte#3272
Follow-up issue
NA
flyteorg/flytepropeller
GitHub Actions: Build & Push Flytepropeller Image
GitHub Actions: Goreleaser
GitHub Actions: Bump Version
✅ 11 other checks have passed
11/14 successful checks