GitHub
02/06/2024, 11:21 PM
GitHub
02/08/2024, 5:56 PM
• node_dependency_hints for dynamic tasks by @Tom-Newton in #2015
• Warn user when overriding requests but not limits by @fg91 in #2151
• Improve error message for pyflyte run by @pingsutw in #2142
• Remove upper version bound from protobuf by @pingsutw in #2144
• Agent Metadata Servicer by @Future-Outlier in #2012
• Fix: Improve error handling in workflow compilation when output binding fails by @fg91 in #2047
• Add metadata to literal by @pingsutw in #2147
• Add _literal_map_to_python_input to base task by @pingsutw in #2150
• Fix: Allow both '*_test.py' and 'test_*.py' test module naming convention for nested tasks by @fg91 in #2155
• Fix: Limit grpcio version in flytekit-identity-aware-proxy due to regression by @fg91 in #2156
• [Refactor] Rename flyin to flyteinteractive by @MortalHappiness in #2157
• install latest flyteidl with monodocs build by @cosmicBboy in #2162
• Bump fastapi from 0.108.0 to 0.109.1 by @dependabot in #2161
• Replace Agent State with Agent Phase by @Future-Outlier in #2123
• install latest flyteidl with monodocs build by @cosmicBboy in #2163
• Bump pillow from 10.0.1 to 10.2.0 in /tests/flytekit/integration/remote/mock_flyte_repo/workflows by @dependabot in #2125
• Remove docs gh workflows by @eapolinario in #2164
• Remove kubernetes from dependencies by @pingsutw in #2148
• Fix lint error caused by #2164 by @eapolinario in #2166
• Add support Union[FlyteDirectory, FlyteFile] by @pingsutw in #2149
• Bump cryptography from 41.0.6 to 42.0.0 in /tests/flytekit/integration/remote/mock_flyte_repo/workflows by @dependabot in #2165
• Remove upper version bound from fsspec by @pingsutw in #2143
• Use python3.8 to generate dev-requirements.txt for Great Expectations by @pingsutw in #2168
• Modify recursive paths by @wild-endeavor in #2121
• Force tests in a module to be run by the same worker by @eapolinario in #2177
• art id proto by @wild-endeavor in #1928
• add env vars option in pyflyte package by @fiedlerNr9 in #2171
New Contributors
• @neilisaur made their first contribution in #2131
• @MortalHappiness made their first contribution in #2157
Full Changelog: v1.10.3...v1.10.3b7
flyteorg/flytekit
GitHub
02/08/2024, 6:03 PM
1.10.7
number so as to match the coming main flyte release.
Below are the original release notes:
What's Changed
• Extends ImageSpec to accept image names from plugin and have priority for plugins by @thomasjpfan in #2119
• Use logger in data_persistence.py by @eapolinario in #2129
• Adjust tar method to iterate over files/dirs in dir rather than strip… by @neilisaur in #2131
• Include exception type in error messages by @Tom-Newton in #2130
• Adds get_default_image into configuration plugin by @thomasjpfan in #2133
• Add 3.12 as classifier by @honnix in #2135
• Fixing copy-all version of tar file creation as well by @neilisaur in #2134
• Bump pillow from 10.1.0 to 10.2.0 in /plugins/flytekit-onnx-pytorch by @dependabot in #2127
• Bump aiohttp from 3.8.6 to 3.9.2 by @dependabot in #2137
• Bump aiohttp from 3.9.1 to 3.9.2 in /plugins/flytekit-spark by @dependabot in #2140
• Bump aiohttp from 3.9.0 to 3.9.2 in /plugins/flytekit-airflow by @dependabot in #2139
• Bump aiohttp from 3.9.1 to 3.9.2 in /tests/flytekit/integration/remote/mock_flyte_repo/workflows by @dependabot in #2138
• Bump pillow from 10.1.0 to 10.2.0 in /plugins/flytekit-onnx-tensorflow by @dependabot in #2126
• Envvars local execution by @eapolinario in #2132
• node_dependency_hints for dynamic tasks by @Tom-Newton in #2015
• Warn user when overriding requests but not limits by @fg91 in #2151
• Improve error message for pyflyte run by @pingsutw in #2142
• Remove upper version bound from protobuf by @pingsutw in #2144
• Agent Metadata Servicer by @Future-Outlier in #2012
• Fix: Improve error handling in workflow compilation when output binding fails by @fg91 in #2047
• Add metadata to literal by @pingsutw in #2147
• Add _literal_map_to_python_input to base task by @pingsutw in #2150
• Fix: Allow both '*_test.py' and 'test_*.py' test module naming convention for nested tasks by @fg91 in #2155
• Fix: Limit grpcio version in flytekit-identity-aware-proxy due to regression by @fg91 in #2156
• [Refactor] Rename flyin to flyteinteractive by @MortalHappiness in #2157
• install latest flyteidl with monodocs build by @cosmicBboy in #2162
• Bump fastapi from 0.108.0 to 0.109.1 by @dependabot in #2161
• Replace Agent State with Agent Phase by @Future-Outlier in #2123
• install latest flyteidl with monodocs build by @cosmicBboy in #2163
• Bump pillow from 10.0.1 to 10.2.0 in /tests/flytekit/integration/remote/mock_flyte_repo/workflows by @dependabot in #2125
• Remove docs gh workflows by @eapolinario in #2164
• Remove kubernetes from dependencies by @pingsutw in #2148
• Fix lint error caused by #2164 by @eapolinario in #2166
• Add support Union[FlyteDirectory, FlyteFile] by @pingsutw in #2149
• Bump cryptography from 41.0.6 to 42.0.0 in /tests/flytekit/integration/remote/mock_flyte_repo/workflows by @dependabot in #2165
• Remove upper version bound from fsspec by @pingsutw in #2143
• Use python3.8 to generate dev-requirements.txt for Great Expectations by @pingsutw in #2168
• Modify recursive paths by @wild-endeavor in #2121
• Force tests in a module to be run by the same worker by @eapolinario in #2177
• art id proto by @wild-endeavor in #1928
• add env vars option in pyflyte package by @fiedlerNr9 in #2171
New Contributors
• @neilisaur made their first contribution in #2131
• @MortalHappiness made their first contribution in #2157
Full Changelog: v1.10.3...v1.10.3b7
flyteorg/flytekit
GitHub
02/14/2024, 4:01 AM
GitHub
02/14/2024, 7:04 PM
• node_dependency_hints for dynamic tasks by @Tom-Newton in #2015
• Warn user when overriding requests but not limits by @fg91 in #2151
• Improve error message for pyflyte run by @pingsutw in #2142
• Remove upper version bound from protobuf by @pingsutw in #2144
• Agent Metadata Servicer by @Future-Outlier in #2012
• Fix: Improve error handling in workflow compilation when output binding fails by @fg91 in #2047
• Add metadata to literal by @pingsutw in #2147
• Add _literal_map_to_python_input to base task by @pingsutw in #2150
• Fix: Allow both '*_test.py' and 'test_*.py' test module naming convention for nested tasks by @fg91 in #2155
• Fix: Limit grpcio version in flytekit-identity-aware-proxy due to regression by @fg91 in #2156
• [Refactor] Rename flyin to flyteinteractive by @MortalHappiness in #2157
• install latest flyteidl with monodocs build by @cosmicBboy in #2162
• Bump fastapi from 0.108.0 to 0.109.1 by @dependabot in #2161
• Replace Agent State with Agent Phase by @Future-Outlier in #2123
• install latest flyteidl with monodocs build by @cosmicBboy in #2163
• Bump pillow from 10.0.1 to 10.2.0 in /tests/flytekit/integration/remote/mock_flyte_repo/workflows by @dependabot in #2125
• Remove docs gh workflows by @eapolinario in #2164
• Remove kubernetes from dependencies by @pingsutw in #2148
• Fix lint error caused by #2164 by @eapolinario in #2166
• Add support Union[FlyteDirectory, FlyteFile] by @pingsutw in #2149
• Bump cryptography from 41.0.6 to 42.0.0 in /tests/flytekit/integration/remote/mock_flyte_repo/workflows by @dependabot in #2165
• Remove upper version bound from fsspec by @pingsutw in #2143
• Use python3.8 to generate dev-requirements.txt for Great Expectations by @pingsutw in #2168
• Modify recursive paths by @wild-endeavor in #2121
• Force tests in a module to be run by the same worker by @eapolinario in #2177
• art id proto by @wild-endeavor in #1928
• add env vars option in pyflyte package by @fiedlerNr9 in #2171
• Build python 3.12 default flytekit image by @eapolinario in #2181
• Fix airflow sensor by @pingsutw in #2169
• Remove asynchronous flag from base agent by @pingsutw in #2141
• Update Base Agent Logs by Rich Progress Feature by @Future-Outlier in #2159
• Fix Base Agent Input Bug by @Future-Outlier in #2186
• Bump grpcio from 1.53.0 to 1.53.2 in /tests/flytekit/integration/remote/mock_flyte_repo/workflows by @dependabot in #2187
• Disable rich logger in the default image by @pingsutw in #2185
• Enhance error logging in pyflyte by @ddl-rliu in #2190
• Do not lazy-load pyspark.ml by @eapolinario in #2184
• Bump flyteidl to 1.10.7 by @eapolinario in #2192
New Contributors
• <https:/…
flyteorg/flytekit
GitHub
02/14/2024, 9:00 PM
GitHub
02/16/2024, 12:04 PM
GitHub
02/17/2024, 7:38 AM
pyflyte run --remote on this file:
from flytekit import task, workflow
from flytekit.image_spec import ImageSpec
from flytekitplugins.flyin import vscode

image_spec2 = ImageSpec(
    requirements="requirements.txt",  # python packages to install
    registry="ghcr.io/enghabu/test-flytekit",
    builder="envd",
)

@task(container_image=image_spec2)
@vscode(port=8080)
def say_hello(name: str) -> str:
    return f"hello {name}!"

@workflow
def wf2(name: str = "union"):
    say_hello(name=name)
2. Update the requirements.txt and run again
Error:
RPC Failed, with Status: StatusCode.INVALID_ARGUMENT
details: task with different structure already exists with id resource_type:TASK project:"flytesnacks" domain:"development" name:"simple_wf.say_hello" version:"a-nzT1l6aEAKm2MYRWfEzg"
Debug string UNKNOWN:Error received from peer ipv4:3.130.217.183:443 {grpc_message:"task with different structure already exists with id resource_type:TASK project:\"flytesnacks\" domain:\"development\" name:\"simple_wf.say_hello\" version:\"a-nzT1l6aEAKm2MYRWfEzg\" ", grpc_status:3, created_time:"2024-02-14T20:11:29.658756-08:00"}
Expected behavior
Task version should have changed because the image has changed...
It seems that we compute the task digest before building and uploading the ImageSpec image, which means we only discover the image ID too late in the registration process? (Speculating.)
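A minimal sketch of the expected behavior, assuming the task version is meant to depend on everything that ends up in the image. compute_task_version is a hypothetical helper, not flytekit's actual hashing code: it folds the bytes of the requirements file (not just its path) into the digest, so editing requirements.txt yields a new version even when the task source is unchanged.

```python
import hashlib
from pathlib import Path


def compute_task_version(task_source: str, requirements_path: Path) -> str:
    """Hypothetical version digest: task source plus requirements file contents.

    Hashing the file's bytes rather than its name means that editing
    requirements.txt produces a different version, which is what the
    reporter expected.
    """
    digest = hashlib.sha256()
    digest.update(task_source.encode("utf-8"))
    digest.update(b"\0")  # separator between the two inputs
    digest.update(requirements_path.read_bytes())
    return digest.hexdigest()


if __name__ == "__main__":
    import tempfile

    source = 'def say_hello(name: str) -> str:\n    return f"hello {name}!"\n'
    with tempfile.TemporaryDirectory() as tmp:
        reqs = Path(tmp) / "requirements.txt"
        reqs.write_text("pandas\n")
        before = compute_task_version(source, reqs)
        reqs.write_text("pandas\nnumpy\n")
        after = compute_task_version(source, reqs)
        print(before != after)  # True: the new requirements change the version
```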
Additional context to reproduce
cc @jpvotta
Screenshots
No response
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyte
GitHub
02/20/2024, 7:34 PM
docker build command does not have the --push flag. It should be either two separate commands, docker build and docker push, or a single docker buildx build command.
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyte
Dylan Wilder
02/22/2024, 1:50 PM
Nikki Everett
02/22/2024, 9:41 PM
GitHub
02/29/2024, 7:00 AM
GitHub
02/29/2024, 10:03 PM
GitHub
03/04/2024, 9:53 AM
container_image as an example for a task node override:
from flytekit import task, workflow

@task
def foo() -> str:
    return "foo"

@workflow
def wf():
    foo().with_overrides(
        container_image="{{.image.default.fqn}}:" + "9410d0f0ed4a25577ab35a79bd3eb1119d8627d59c7a8fb947d42ca9fb46a61c"
    )
If one runs this workflow with pyflyte run, everything works as expected. If, however, one uses pyflyte register and runs it manually from the UI, the default image is used.
* * *
@workflow
def wf():
    foo().with_overrides(container_image="ubuntu:foo")
    foo().with_overrides(container_image="ubuntu:bar")
Another problem is that, even when using pyflyte run, both task nodes in this example will use the image "ubuntu:bar".
Expected behavior
The reason for this behaviour is that only resource/extended resource overrides are treated as workflow-level task-node overrides, which are registered separately from the tasks themselves and then overridden in the backend.
Other overrides, such as container_image, override the task-level metadata before the registration process. These can only be used once, since "the last override wins", and they appear not to work with pyflyte register.
The proper fix is to treat all overrides as task node overrides in flyteidl instead of overriding task-level metadata before registration.
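The "last override wins" behavior described above can be reproduced with a small stdlib model. TaskTemplate, Node and the two override helpers are hypothetical stand-ins, not flytekit classes: mutating the task object that both nodes share makes every node see the last image, while storing the override on each node keeps them independent, which is the direction the proposed fix takes.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TaskTemplate:
    """Hypothetical registered task; one instance is shared by all nodes calling it."""

    name: str
    container_image: str = "default-image"


@dataclass
class Node:
    """Hypothetical workflow node pointing at a (shared) task."""

    task: TaskTemplate
    overrides: Dict[str, str] = field(default_factory=dict)

    def effective_image(self) -> str:
        # A node-level override wins; otherwise use the task's own image.
        return self.overrides.get("container_image", self.task.container_image)


def override_task_level(node: Node, image: str) -> Node:
    # Mutates the shared task, so the change leaks into every node using it.
    node.task.container_image = image
    return node


def override_node_level(node: Node, image: str) -> Node:
    # Records the override on this node only.
    node.overrides["container_image"] = image
    return node


if __name__ == "__main__":
    foo = TaskTemplate("foo")
    n1 = override_task_level(Node(foo), "ubuntu:foo")
    n2 = override_task_level(Node(foo), "ubuntu:bar")
    print(n1.effective_image(), n2.effective_image())  # ubuntu:bar ubuntu:bar

    bar = TaskTemplate("bar")
    n3 = override_node_level(Node(bar), "ubuntu:foo")
    n4 = override_node_level(Node(bar), "ubuntu:bar")
    print(n3.effective_image(), n4.effective_image())  # ubuntu:foo ubuntu:bar
```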
* * *
☑︎ Container image
• #4858
• flyteorg/flytekit#2176
☐ ...
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyte
GitHub
03/06/2024, 12:57 AM
GitHub
03/06/2024, 8:00 AM
--overwrite-cache in pyflyte backfill cli by @ChungYujoyce in #2214
• Add default_project in register_launch_plan by @ChungYujoyce in #2215
• Fix Monodocs build by @pingsutw in #2235
• Enhance default image to check FLYTE_INTERNAL_IMAGE by @ddl-rliu in #2223
• Define a str method for FlyteException by @noahjax in #2203
New Contributors
• @noahjax made their first contribution in #2203
Full Changelog: v1.10.8b1...v1.10.8b2
flyteorg/flytekit
GitHub
03/06/2024, 6:23 PM
GitHub
03/07/2024, 12:04 AM
Yee
GitHub
03/10/2024, 11:31 AM
pyflyte backfill command.
Ideally, when running a backfill, it would be possible to ignore the cache for all executions in that backfill. Backfills are sometimes necessary when upstream data had a data quality issue that was later fixed, requiring downstream jobs to re-run on the restated data.
Goal: What should the final outcome look like, ideally?
The pyflyte backfill command should have a flag named something like --ignore-cache or --overwrite-cached-outputs that behaves like the "Overwrite cached outputs" option in the web UI.
Describe alternatives you've considered
I don't know of simple alternatives to running a backfill from the command line that has an option to overwrite the cache.
Propose: Link/Inline OR Additional context
No response
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyte
GitHub
03/10/2024, 11:32 AM
Traceback (most recent call last):
File "/opt/venv/bin/entrypoint.py", line 16, in <module>
from flytekit.configuration import (
File "/opt/venv/lib/python3.9/site-packages/flytekit/__init__.py", line 305, in <module>
load_implicit_plugins()
File "/opt/venv/lib/python3.9/site-packages/flytekit/__init__.py", line 301, in load_implicit_plugins
p.load()
File "/opt/venv/lib/python3.9/site-packages/importlib_metadata/__init__.py", line 184, in load
module = import_module(match.group('module'))
File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "/opt/venv/lib/python3.9/site-packages/flytekitplugins/spark/__init__.py", line 21, in <module>
from .pyspark_transformers import PySparkPipelineModelTransformer
File "/opt/venv/lib/python3.9/site-packages/flytekitplugins/spark/pyspark_transformers.py", line 7, in <module>
pyspark_ml = lazy_module("pyspark.ml")
File "/opt/venv/lib/python3.9/site-packages/flytekit/lazy_import/lazy_module.py", line 41, in lazy_module
loader = importlib.util.LazyLoader(spec.loader)
File "/usr/lib/python3.9/importlib/util.py", line 282, in __init__
self.__check_eager_loader(loader)
File "/usr/lib/python3.9/importlib/util.py", line 273, in __check_eager_loader
raise TypeError('loader must define exec_module()')
This doesn't repro locally (i.e. when only installing flytekitplugins-spark==1.10.3 and interacting with a Python interpreter or running local tasks).
Expected behavior
Lazy-loading modules should work in all cases, including when running Spark tasks.
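The TypeError is raised by importlib.util.LazyLoader, which refuses any loader that does not define exec_module(). A minimal sketch of a more defensive helper, assuming that falling back to a normal eager import is acceptable when lazy loading is impossible: it follows the lazy-import recipe from the Python importlib documentation and catches that TypeError. The name lazy_module mirrors the function in the traceback, but this is not flytekit's implementation; the release notes list "Do not lazy-load pyspark.ml" (#2184) as the change actually made.

```python
import importlib
import importlib.util
import sys
from types import ModuleType


def lazy_module(fullname: str) -> ModuleType:
    """Import fullname lazily if its loader allows it, otherwise eagerly."""
    if fullname in sys.modules:
        return sys.modules[fullname]
    # For a dotted name such as "pyspark.ml", find_spec imports the parent package.
    spec = importlib.util.find_spec(fullname)
    if spec is None or spec.loader is None:
        raise ModuleNotFoundError(f"No module named {fullname!r}", name=fullname)
    try:
        # LazyLoader raises TypeError("loader must define exec_module()")
        # for loaders it cannot wrap, which is the failure in the traceback.
        loader = importlib.util.LazyLoader(spec.loader)
    except TypeError:
        # Assumed fallback (not flytekit behavior): import the module eagerly.
        return importlib.import_module(fullname)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[fullname] = module
    # Defers running the module's code until its first attribute access.
    loader.exec_module(module)
    return module


if __name__ == "__main__":
    colorsys = lazy_module("colorsys")
    print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))
```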
Additional context to reproduce
No response
Screenshots
No response
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyte
GitHub
03/10/2024, 11:35 AM
# main.py
import subprocess
import typing

# copy-paste from https://github.com/flyteorg/flytekit/blob/f16ac4910043a56de235d8dc1383996b6ddd13ef/flytekit/extras/tasks/shell.py#L102-L123
def _run_script(script) -> typing.Tuple[int, str, str]:
    process = subprocess.Popen(script, stdout=subprocess.PIPE, stderr=subprocess.PIPE, bufsize=0, shell=True, text=True)
    out = ""
    for line in process.stdout:
        print(line)
        out += line
    code = process.wait()
    return code, out, process.stderr.read()

print(_run_script("python error_creator.py"))

# error_creator.py
import sys

for i in range(200000):
    sys.stderr.write("This is an error message\n")
print("This is the output of the program")
Notice that running python error_creator.py on its own finishes instantly, but running python main.py hangs.
If you reduce the number of iterations in error_creator.py to 2000, you'll see that python main.py finishes instantly too.
I originally found this issue in a production Flyte deployment, and used strace and lldb to verify that the issue is caused by the pipe filling up. Manually reading from the pipe got rid of the deadlock in my case. I ran a command like: cat /proc/12484/fd/2
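The hang happens because _run_script reads stdout to EOF before touching stderr: once the child fills the stderr pipe buffer (commonly 64 KiB on Linux), it blocks on write and never closes stdout. A minimal sketch of a deadlock-free variant, assuming the whole output fits in memory and a POSIX shell: subprocess.run with capture_output=True drains both pipes concurrently through communicate(). run_script and reproduce are hypothetical names, and this is not necessarily how flytekit resolved the issue.

```python
import os
import shlex
import subprocess
import sys
import tempfile
from typing import Tuple

# Same behavior as error_creator.py in the report: flood stderr, then print once.
ERROR_CREATOR = """\
import sys
for _ in range(200000):
    sys.stderr.write("This is an error message\\n")
print("This is the output of the program")
"""


def run_script(script: str) -> Tuple[int, str, str]:
    """Run a shell command and return (returncode, stdout, stderr) without deadlocking."""
    # capture_output=True uses communicate(), which reads stdout and stderr
    # concurrently, so neither pipe can fill up and stall the child.
    result = subprocess.run(script, shell=True, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


def reproduce() -> Tuple[int, str, str]:
    # Write the noisy child script to a temp file and run it through the shell
    # (shlex.quote assumes a POSIX shell).
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "error_creator.py")
        with open(path, "w") as f:
            f.write(ERROR_CREATOR)
        return run_script(f"{shlex.quote(sys.executable)} {shlex.quote(path)}")


if __name__ == "__main__":
    code, out, err = reproduce()
    print(code, out.strip(), err.count("\n"))
```

The trade-off is that output is no longer printed line by line as it arrives; streaming both pipes live would need a reader thread (or selectors) per stream.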
flyteorg/flyte
GitHub
03/10/2024, 11:37 AM
GitHub
03/10/2024, 12:01 PM
ShellTask where users are unable to retrieve standard output (stdout) and standard error (stderr) from executed ShellTasks directly. Users have to write both streams to disk, read them back to retrieve the command result, and finally delete them. All of these are unnecessary steps that increase disk access and make writing tasks heavier.
Goal: What should the final outcome look like, ideally?
Expected Behavior:
Users should have the capability to access the stdout and stderr outputs generated during the execution of a ShellTask.
Describe alternatives you've considered
I've considered implementing a custom ShellTask class that would be 99% the same code as flytekit.extras.tasks.shell.ShellTask... Not very satisfying...
Propose: Link/Inline OR Additional context
No response
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyte
GitHub
03/12/2024, 4:23 AM
GitHub
03/12/2024, 7:28 AM
GitHub
03/12/2024, 7:44 AM
--overwrite-cache in pyflyte backfill cli by @ChungYujoyce in #2214
• Add default_project in register_launch_plan by @ChungYujoyce in #2215
• Fix Monodocs build by @pingsutw in #2235
• Enhance default image to check FLYTE_INTERNAL_IMAGE by @ddl-rliu in #2223
• Define a str method for FlyteException by @noahjax in #2203
• Remove trailing '=' character from generated image tag hash value by @jasonlai1218 in #2240
• Simpler dev images by @eapolinario in #2243
• Only have one flyteidl entry in pyproject toml by @thomasjpfan in #2245
• DOCS-301 update flytekit links to monodocs by @ppiegaze in #2210
• Output metadata tracking by @wild-endeavor in #2221
• Define python versions a priori and schedule CI runs once a day by @eapolinario in #2237
• Simplify version calculation by @eapolinario in #2247
• Adds a FlyteDeck markdown renderer to core by @thomasjpfan in #2246
• DeleteTask should return DeleteTaskResponse by @pingsutw in #2251
• Promote source code renderer and use it by default by @thomasjpfan in #2248
• Refactor Databricks Agent Phase by @Future-Outlier in #2244
• feat(resources): support serialization of Resources dataclasses by @cameronraysmith in #2250
• Fix sql alchemy plugin error module 'pandas.io' has no attribute 'common' by @Future-Outlier in #2249
• ShellTask: stores process return code, stdout and stderr for later use. by @benoistlaurent in #2229
• Bump flyteidl for spark plugin by @pingsutw in #2253
• Rewrite test to work better on all platforms by @eapolinario in #2255
• Support pydantic plugin in 2.xx version by @Future-Outlier in #2217
• Fix FlyteFS by @pingsutw in #2208
New Contributors
• @noahjax made their first contribution in #2203
• @cameronraysmith made their first contribution in <https://github.com/flyteorg/flyt…
flyteorg/flytekit
GitHub
03/14/2024, 3:04 AM
GitHub
03/15/2024, 9:34 PM
import pickle
from flytekit.remote import FlyteRemote
from flytekit.configuration import Config
remote = FlyteRemote(
config=Config.auto(),
default_project="flytesnacks",
default_domain="development"
)
pickle.dumps(remote)
AttributeError: Can't pickle local object 'get_flyte_fs.<locals>._FlyteFS'
The issue stems from the nested class here:
https://github.com/flyteorg/flytekit/blob/d61e79e722875348b1ccd354e1076fcf12600053/flytekit/remote/remote_fs.py#L91
I am using Python 3.12.1, flytekit==1.11.0.
Expected behavior
flytekit remote objects are serializable with pickle
Additional context to reproduce
No response
Screenshots
No response
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyte
GitHub
03/18/2024, 10:44 PM
@task(cache=True, cache_version="v1")
def t(log_level: int, a: str) -> str:
    ...
According to the docs, one of the inputs to the cache key calculation is the task signature, but in this example it would be great if we could ignore log_level in the cache key calculation.
Goal: What should the final outcome look like, ideally?
We should be able to do something along the lines of:
@task(cache=True, cache_version="v1", ignore_input_vars=["log_level"])
def t(log_level: int, a: str) -> str:
    ...
This would essentially skip some of the parameters for cache key calculation purposes.
Describe alternatives you've considered
We have the ability to override the hashing mechanism used to translate python types into Flyte types, as described in https://docs.flyte.org/projects/cookbook/en/latest/auto/core/flyte_basics/task_cache.html#caching-of-non-flyte-offloaded-objects.
One could use this idea and provide constant hashes for the arguments they want to ignore, for example:
from typing_extensions import Annotated

from flytekit import task, workflow
from flytekit.core.hash import HashMethod

def constant_function(x: int) -> str:
    return "const"

@task
def t_produce_annotated_literals() -> Annotated[int, HashMethod(constant_function)]:
    log_level = ...
    return log_level

@task(cache=True, cache_version="v1")
def t(log_level: int, a: str) -> str:
    ...

@workflow
def wf() -> str:
    log_level = t_produce_annotated_literals()
    return t(log_level=log_level, a="some string")
Propose: Link/Inline OR Additional context
Expose ignore_input_vars in the @task decorator and ensure the new interface is used during cache key calculation in both local and remote executions.
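A minimal sketch of the proposed semantics, assuming the cache key is a hash over the task name, cache_version, the input names (standing in for the signature) and the JSON-serializable input values. compute_cache_key and its ignore_input_vars parameter are hypothetical and only model the proposal, not Flyte's actual cache key algorithm. Ignored inputs still appear in the signature part, so changing the interface still changes the key, but their values no longer do.

```python
import hashlib
import json
from typing import Any, Iterable, Mapping


def compute_cache_key(
    task_name: str,
    cache_version: str,
    inputs: Mapping[str, Any],
    ignore_input_vars: Iterable[str] = (),
) -> str:
    """Hypothetical cache key that skips the values of ignored inputs.

    Assumes every input value is JSON-serializable.
    """
    ignored = set(ignore_input_vars)
    payload = {
        "task": task_name,
        "cache_version": cache_version,
        # Input names stand in for the task signature and are always included.
        "signature": sorted(inputs),
        # Only non-ignored input values contribute to the key.
        "values": {k: v for k, v in inputs.items() if k not in ignored},
    }
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


if __name__ == "__main__":
    k1 = compute_cache_key("t", "v1", {"log_level": 10, "a": "x"}, ["log_level"])
    k2 = compute_cache_key("t", "v1", {"log_level": 20, "a": "x"}, ["log_level"])
    print(k1 == k2)  # True: only log_level changed, and it is ignored
```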
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyte