GitHub
05/06/2023, 2:00 AM
GitHub
05/06/2023, 5:40 AM
master (https://github.com/flyteorg/flytekit/tree/master)
by pingsutw
35e52ef6 (https://github.com/flyteorg/flytekit/commit/35e52ef669dcfd682f53c4b2bdbed628838945e1)
- External Plugin Service (grpc) (#1524)
flyteorg/flytekit
GitHub
05/06/2023, 5:40 AM
GitHub
05/06/2023, 5:59 AM
GitHub
05/06/2023, 8:43 AM
pyflyte register workflows --image ghcr.io/flyteorg/flytekit:py3.11-sqlalchemy-1.6.0b0
This image exists: https://github.com/flyteorg/flytekit/pkgs/container/flytekit/86821037?tag=py3.11-sqlalchemy-1.6.0b0
I can see that the workflow is registered. However, when I look at the image, it has defaulted to
cr.flyte.org/flyteorg/flytekit:py3.11-sqlalchemy-1.5.0
There might have been a parsing error here, dropping the "gh" in cr.flyte.org/flyteorg/flytekit:py3.11-sqlalchemy-1.5.0
Please have a look at the screenshots below:
Expected behavior
The workflow should use the image given in the CLI.
Additional context to reproduce
1. Use the sample code given here: https://raw.githubusercontent.com/flyteorg/flytesnacks/master/cookbook/integrations/flytekit_plugins/sql/sql_alchemy.py
2. Try to register the workflow using "pyflyte register workflows --image ghcr.io/flyteorg/flytekit:py3.11-sqlalchemy-1.6.0b"
3. Go to the web UI, look at the task details, expand the tree, and go to Container -> Image Name
Screenshots
flyte-image-web
register-flyte-workflow
GitHub
05/06/2023, 4:13 PM
from flytekit import workflow, task
import logging

logging.basicConfig(level="DEBUG")
logger = logging.getLogger("wf")
logger.setLevel("DEBUG")

@task
def task_1():
    logger.info("task_1")

@task
def task_2():
    logger.info("task_2")

@workflow
def sub_workflow_1():
    task_1()

@workflow
def sub_workflow_2():
    task_2()

@workflow
def parent_workflow():
    s1 = sub_workflow_1()
    s2 = sub_workflow_2()
    s2 >> s1

if __name__ == "__main__":
    parent_workflow()
Output is:
INFO:wf:task_1
INFO:wf:task_2
Expected output is:
INFO:wf:task_2
INFO:wf:task_1
Screenshots
When registered with a Flyte cluster, the DAG is correct.
image
GitHub
05/07/2023, 10:43 AM
❯ flytectl demo start
INFO[0000] [0] Couldn't find a config file []. Relying on env vars and pflags.
W0507 10:37:51.431316 66838 loader.go:221] Config not found: /home/compute/.flyte/k3s/k3s.yaml
🧑🏭 Bootstrapping a brand new flyte cluster... 🔨 🔧
🐋 Going to use Flyte v1.5.0 release with image cr.flyte.org/flyteorg/flyte-sandbox-bundled:sha-bc521dac6f0dc6e1942efa6447611cd3354540c4
🐋 pulling docker image for release cr.flyte.org/flyteorg/flyte-sandbox-bundled:sha-bc521dac6f0dc6e1942efa6447611cd3354540c4
🧑🏭 booting Flyte-sandbox container
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
W0507 10:38:10.078019 66838 loader.go:221] Config not found: /home/compute/.flyte/k3s/k3s.yaml
W0507 10:38:10.078495 66838 loader.go:221] Config not found: /home/compute/.flyte/k3s/k3s.yaml
Error: open /home/compute/.flyte/k3s/k3s.yaml.lock: permission denied
{"json":{},"level":"error","msg":"open /home/compute/.flyte/k3s/k3s.yaml.lock: permission denied","ts":"2023-05-07T10:38:10Z"}
❯ cat ~/.flyte/k3s/k3s.yaml
cat: /home/compute/.flyte/k3s/k3s.yaml: No such file or directory
❯ ll ~/.flyte/k3s/
total 0
lrwxrwxrwx 1 root root 32 May 7 10:38 k3s.yaml -> /var/lib/flyte/config/kubeconfig
Expected behavior
~/.flyte/k3s/k3s.yaml is created by flytectl demo start
Additional context to reproduce
The host is Ubuntu Linux.
❯ flytectl version
{
"App": "flytectl",
"Build": "bd6b856",
"Version": "0.6.36",
"BuildTime": "2023-05-07 10:43:00.839982826 +0000 UTC m=+0.029162480"
}
Screenshots
No response
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyte
GitHub
05/08/2023, 1:29 PM
• An in-memory map tracks userID and rate per user ID. rate is a native Golang package to track the total count of requests.
• The in-memory storage is cleaned periodically to reduce the memory footprint.
• The component, which is considered part of the security scope, relies on the identity of the user provided by the auth package. In fact, the best way to uniquely identify requests for rate limiting is to track usage per authenticated user.
Tradeoff
• To keep it simple, an in-memory map is used to track the rate per user. This becomes inaccurate if multiple instances of flyteadmin are deployed. If that is the case and we are serious about rate limiting, a further improvement is needed, such as introducing Redis, which is out of scope for this PR.
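The approach described above (an in-memory map keyed by user, with a per-user limiter) can be sketched as follows. This is an illustrative Python sketch, not the Go implementation in the PR; the names PerUserLimiter and TokenBucket are hypothetical, and a token bucket stands in for the policy Go's rate package implements.

```python
# Hedged sketch of per-user, in-memory rate limiting (illustrative only;
# the actual flyteadmin change is written in Go using the rate package).
import time
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    rate: float                                  # tokens refilled per second
    burst: float                                 # maximum bucket capacity
    tokens: float = 0.0                          # current token balance
    last: float = field(default_factory=time.monotonic)

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at burst.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerUserLimiter:
    """In-memory map of user ID -> bucket; first request creates the bucket full."""

    def __init__(self, rate: float, burst: float):
        self.rate, self.burst = rate, burst
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, user_id: str) -> bool:
        bucket = self.buckets.setdefault(
            user_id, TokenBucket(self.rate, self.burst, tokens=self.burst)
        )
        return bucket.allow()


limiter = PerUserLimiter(rate=1.0, burst=2.0)
print(limiter.allow("alice"))  # True: alice's bucket starts full
```

As the tradeoff note says, this state lives in one process; with multiple flyteadmin replicas each instance would enforce its own independent limit.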
Tracking Issue
fixes flyteorg/flyte#327
Follow-up issue
NA
flyteorg/flyteadmin
✅ All checks have passed
2/2 successful checks
GitHub
05/08/2023, 4:00 PM
GitHub
05/08/2023, 4:00 PM
GitHub
05/08/2023, 4:06 PM
master (https://github.com/flyteorg/flyteconsole/tree/master)
by jsonporter
e28a142e (https://github.com/flyteorg/flyteconsole/commit/e28a142e966cf689fd8b51ba9049763626dc2eba)
- Add support fetching description entity (#735)
flyteorg/flyteconsole
GitHub
05/08/2023, 4:06 PM
image
image
image
GitHub
05/08/2023, 4:24 PM
The PipelineModel transformer works when running Spark locally, but it fails to load inside the k8s Spark plugin. This is likely because the transformer downloads the pipeline to the driver, so it is NOT available on the workers.
We likely need to save and load directly to the remote path.
Expected behavior
The type transformer shouldn't fail.
Additional context to reproduce
from flytekit import workflow, task, StructuredDataset
from flytekitplugins.spark import Spark
import flytekit
import pandas as pd
from typing import Tuple
from pyspark.ml.feature import StringIndexer
from pyspark.ml.pipeline import Pipeline, PipelineModel

@task(cache=True, cache_version="1.0")
def create_df() -> pd.DataFrame:
    return pd.DataFrame(
        data={
            "id": [1, 2, 3, 4, 5, 6, 7, 8, 9],
            "cat": ["a", "b", "c", "a", "b", "c", "a", "b", "c"],
            "num": [1, 2, 3, 4, 5, 6, 7, 8, 9],
        })

@task(task_config=Spark(
    spark_conf={
        "spark.driver.memory": "4g",
        "spark.executor.memory": "2g",
        "spark.executor.instances": "1",
        "spark.driver.cores": "2",
        "spark.executor.cores": "1",
    }
), cache=True, cache_version="1.0")
def save_pipeline(df: pd.DataFrame) -> Tuple[pd.DataFrame, PipelineModel]:
    spark = flytekit.current_context().spark_session
    spark_df = spark.createDataFrame(df)
    cat_indexer = StringIndexer(inputCol="cat", outputCol="cat_index")
    pipeline = Pipeline(stages=[cat_indexer])
    fitted_pipeline = pipeline.fit(spark_df)
    spark_df_transformed = fitted_pipeline.transform(spark_df)
    return (
        spark_df_transformed.toPandas(),
        fitted_pipeline
    )

@task(task_config=Spark(
    spark_conf={
        "spark.driver.memory": "4g",
        "spark.executor.memory": "2g",
        "spark.executor.instances": "1",
        "spark.driver.cores": "2",
        "spark.executor.cores": "1",
    }
))
def compare_dfs(pipeline: PipelineModel, df: pd.DataFrame, expected: pd.DataFrame):
    spark = flytekit.current_context().spark_session
    spark_df = spark.createDataFrame(df)
    df_transformed = pipeline.transform(spark_df).toPandas()
    dataframes_equal = expected.sort_values(by=["id"]).equals(
        df_transformed.sort_values(by=["id"]).reset_index(drop=True)
    )
    assert dataframes_equal, "Dataframes are not equal"

@workflow
def test_pipeline_transformer():
    df = create_df()
    df_transformed, pipeline = save_pipeline(df=df)
    compare_dfs(pipeline=pipeline, df=df, expected=df_transformed)
[3/3] currentAttempt done. Last Error: SYSTEM::Traceback (most recent call last):
  File "/opt/venv/lib/python3.9/site-packages/flytekit/exceptions/scopes.py", line 165, in system_entry_point
    return wrapped(*args, **kwargs)
  File "/opt/venv/lib/python3.9/site-packages/flytekit/core/base_task.py", line 518, in dispatch_execute
    native_inputs = TypeEngine.literal_map_to_kwargs(exec_ctx, input_literal_map, self.python_interface.inputs)
  File "/opt/venv/lib/python3.9/site-packages/flytekit/core/type_engine.py", line 867, in literal_map_to_kwargs
    return {k: TypeEngine.to_python_value(ctx, lm.literals[k], python_types[k]) for k, v in lm.literals.items()}
  File "/opt/venv/lib/python3.9/site-packages/flytekit/core/type_engine.py", line 867, in <dictcomp>
    return {k: TypeEngine.to_python_value(ctx, lm.literals[k], python_types[k]) for k, v in lm.literals.items()}
  File "/opt/venv/lib/python3.9/site-packages/flytekit/core/type_engine.py", line 831, in to_python_value
    return transformer.to_python_value(ctx, lv, expected_python_type)
  File "/opt/venv/lib/python3.9/site-packages/flytekitplugins/spark/pyspark_transformers.py", line 42, in to_python_value
    return PipelineModel.load(local_dir)
  File "/opt/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 353, in load
    return cls.read().load(path)
  File "/opt/spark/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 282, in load
    metadata = DefaultParamsReader.loadMetadata(path, self.sc)
  File "/opt/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 565, in loadMetadata
    metadataStr = sc.textFile(metadataPath, 1).first()
  File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1906, in first
    raise ValueError("RDD is empty")
Screenshots
No response
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyte
GitHub
05/08/2023, 4:26 PM
GitHub
05/08/2023, 4:28 PM
master (https://github.com/flyteorg/flytekit/tree/master)
by wild-endeavor
7efde40e (https://github.com/flyteorg/flytekit/commit/7efde40e1c396ed461ca16958462056f6af09938)
- Include traceback in errors from admin (#1621)
flyteorg/flytekit
GitHub
05/08/2023, 4:46 PM
GitHub
05/08/2023, 5:33 PM
master (https://github.com/flyteorg/flyteadmin/tree/master)
by eapolinario
00915ec5 (https://github.com/flyteorg/flyteadmin/commit/00915ec5bc25b2dbe739767d5e060e988f6402e2)
- Add migration to turn parent_id column into bigint only if necessary (#554)
flyteorg/flyteadmin
GitHub
05/08/2023, 5:57 PM
GitHub
05/08/2023, 6:16 PM
Add migration to turn parent_id column into bigint only if necessary (#554)
flyteorg/flyteadmin
GitHub
05/08/2023, 6:36 PM
GitHub
05/08/2023, 7:13 PM
master (https://github.com/flyteorg/flyteadmin/tree/master)
by eapolinario
507ad449 (https://github.com/flyteorg/flyteadmin/commit/507ad4497beefe0e157515617027e5a8a2bd8483)
- Added start time for supporting restarts for fixed rate schedules (#476)
flyteorg/flyteadmin
GitHub
05/08/2023, 7:13 PM
GitHub
05/08/2023, 7:13 PM
GitHub
05/08/2023, 7:48 PM
master (https://github.com/flyteorg/flytekit/tree/master)
by pingsutw
ca467610 (https://github.com/flyteorg/flytekit/commit/ca4676103d8e5b41e32d507b88d4faaa9d276a6d)
- Add support for copying all the files in source root (#1622)
flyteorg/flytekit
GitHub
05/08/2023, 7:49 PM
GitHub
05/08/2023, 8:03 PM
master (https://github.com/flyteorg/flytekit/tree/master)
by peridotml
35bb5561 (https://github.com/flyteorg/flytekit/commit/35bb5561537e45a32c6ae328684fe09cea318786)
- fix PipelineModel transformer issue 3648 (#1623)
flyteorg/flytekit
GitHub
05/08/2023, 8:28 PM
flytectl get execution will output:
failed to convert to a known Literal. Input Type [union_type:..] not supported
Type
☐ Bug Fix
☑︎ Feature
☐ Plugin
Are all requirements met?
☑︎ Code completed
☑︎ Smoke tested
☑︎ Unit tests added
☐ Code documentation added
☐ Any pending items have an associated Issue
Complete description
When creating a literal for a union type, the code will find the first variant that matches the input value. If the input is 1.0 and the union type is Union(Integer, Float), Integer will be used, since 1.0 matches Integer.
For default generation of a union type, I just use the first possible variant. For example, if the union contains Union(String, Integer), then the default value generated when creating an execFile will be "", not 0.
Tested with flytectl commands; get task and create execution are working fine.
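The first-match conversion described above can be sketched as follows. This is an illustrative Python sketch, not the actual flyteidl Go code; to_union_literal and the (name, converter) variant list are hypothetical stand-ins for the real variant-matching logic.

```python
# Hedged sketch: convert a value to the first matching variant of a union,
# trying variants in declaration order (illustrative only; the real logic
# is in flyteidl's Go implementation).
def to_union_literal(value, variants):
    """variants: ordered list of (name, converter) pairs; first match wins."""
    for name, convert in variants:
        try:
            return name, convert(value)
        except (TypeError, ValueError):
            continue  # this variant doesn't accept the value; try the next one
    raise ValueError(f"failed to convert to a known Literal: {value!r}")


# Union(Integer, Float): 1.0 converts cleanly to int, so the Integer variant
# wins even though Float would also match -- the ambiguity the PR describes.
variants = [("Integer", int), ("Float", float)]
print(to_union_literal(1.0, variants))  # ('Integer', 1)
```

The same first-variant rule explains the default-generation behavior: for Union(String, Integer) the generated default comes from the String variant ("") rather than Integer (0).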
Tracking Issue
fixes flyteorg/flyte#3094
flyteorg/flyteidl
✅ All checks have passed
13/13 successful checks
GitHub
05/08/2023, 8:33 PM
master (https://github.com/flyteorg/flyteplugins/tree/master)
by pingsutw
76a80ec5 (https://github.com/flyteorg/flyteplugins/commit/76a80ec5b7240d16fe5a1a190d51d8b784b88115)
- Add nil check in databricks plugin (#332)
flyteorg/flyteplugins
GitHub
05/08/2023, 8:34 PM
GitHub
05/08/2023, 8:39 PM