brash-london-45337
03/02/2022, 7:51 PM
@task(cache=True, cache_version="1.0")
def submit_adhoc_spark_job(user: str) -> str:
    ...
But running the workflow several times still seems to re-execute the submit_adhoc_spark_job node. Following the docs, keeping the “Project, Domain, Cache Version, Task Signature, and Inputs” the same should result in the output being cached, but this doesn’t seem to be the case. Is there some other configuration I’m missing?
hallowed-mouse-14616
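For context, a hedged sketch of how those five components conceptually combine into a cache key. The real key derivation lives in Flyte's datacatalog service; the function, field names, and signature string below are illustrative only:

```python
import hashlib
import json

def cache_key(project: str, domain: str, cache_version: str,
              signature: str, inputs: dict) -> str:
    # Illustrative only: Flyte's datacatalog computes the real key,
    # but conceptually a cache hit requires all five components to match.
    payload = json.dumps(
        {
            "project": project,
            "domain": domain,
            "cache_version": cache_version,
            "signature": signature,  # task name plus typed interface
            "inputs": inputs,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

sig = "submit_adhoc_spark_job(user: str) -> str"
same_a = cache_key("ml-serving", "development", "1.0", sig, {"user": "alice"})
same_b = cache_key("ml-serving", "development", "1.0", sig, {"user": "alice"})
bumped = cache_key("ml-serving", "development", "2.0", sig, {"user": "alice"})
assert same_a == same_b   # identical components: cache hit
assert same_a != bumped   # bumping cache_version invalidates the cache
```

Anything that changes one of the five components (including re-registering with a different interface) produces a new key, so the task re-executes.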
03/02/2022, 8:06 PM
brash-london-45337
03/02/2022, 8:31 PM
hallowed-mouse-14616
03/02/2022, 8:42 PM
flytectl get execution ...?
hallowed-mouse-14616
03/02/2022, 8:43 PM
brash-london-45337
03/02/2022, 8:44 PM
brash-london-45337
03/02/2022, 8:44 PM
brash-london-45337
03/02/2022, 8:45 PM
The n0 node is the one that I set to cache, and I confirmed from the logs it is indeed resubmitting the spark job.
hallowed-mouse-14616
03/02/2022, 8:47 PM
brash-london-45337
03/02/2022, 8:48 PM
lz4o8w2cwy came first. They are the same as far as I can tell:
hallowed-mouse-14616
03/02/2022, 8:50 PM
brash-london-45337
03/02/2022, 8:51 PM
{
  "config": {},
  "id": {
    "resourceType": 1,
    "project": "ml-serving",
    "domain": "development",
    "name": "src.python.flyte.ml_serving.example.main.submit_adhoc_spark_job",
    "version": "d31c33c58f2956fac010277b98aa24552cfc5b6f"
  },
  "type": "python-task",
  "metadata": {
    "discoverable": true,
    "runtime": {
      "type": 1,
      "version": "0.26.0",
      "flavor": "python"
    },
    "retries": {},
    "discoveryVersion": "1.0"
  },
  "interface": {
    "inputs": {
      "variables": {
        "user": {
          "type": {
            "simple": 3
          },
          "description": "user"
        }
      }
    },
    "outputs": {
      "variables": {
        "o0": {
          "type": {
            "simple": 3
          },
          "description": "o0"
        }
      }
    }
  },
  "container": {
    "command": [],
    "args": [
      "pyflyte-execute",
      "--inputs",
      "{{.input}}",
      "--output-prefix",
      "{{.outputPrefix}}",
      "--raw-output-data-prefix",
      "{{.rawOutputDataPrefix}}",
      "--resolver",
      "flytekit.core.python_auto_container.default_task_resolver",
      "--",
      "task-module",
      "src.python.flyte.ml_serving.example.main",
      "task-name",
      "submit_adhoc_spark_job"
    ],
    "env": [
      {
        "key": "FLYTE_INTERNAL_CONFIGURATION_PATH",
        "value": "/app/flytekit.config"
      },
      {
        "key": "FLYTE_INTERNAL_IMAGE",
        "value": "030465607062.dkr.ecr.us-west-2.amazonaws.com/stripe-flyte/ml-serving/example:d31c33c58f2956fac010277b98aa24552cfc5b6f"
      }
    ],
    "config": [],
    "ports": [],
    "image": "030465607062.dkr.ecr.us-west-2.amazonaws.com/stripe-flyte/ml-serving/example:d31c33c58f2956fac010277b98aa24552cfc5b6f",
    "resources": {
      "requests": [],
      "limits": []
    }
  }
}
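In that spec, "discoverable" and "discoveryVersion" appear to be the wire-level names for cache=True and cache_version. A hedged sketch for diffing the cache-relevant parts of two task specs (the field paths are assumed from the JSON above; the helper names are my own):

```python
import json

# Field paths assumed from the task spec above; "discoverable" and
# "discoveryVersion" are taken to be the wire names for cache / cache_version.
CACHE_RELEVANT = [
    ("id", "project"),
    ("id", "domain"),
    ("id", "name"),
    ("metadata", "discoverable"),
    ("metadata", "discoveryVersion"),
    ("interface",),  # the typed interface is part of the task signature
]

def get_path(spec: dict, path: tuple):
    for key in path:
        spec = spec.get(key, {})
    return spec

def cache_diff(spec_a: dict, spec_b: dict) -> list:
    """Return the cache-relevant paths on which two task specs differ."""
    return [".".join(p) for p in CACHE_RELEVANT
            if get_path(spec_a, p) != get_path(spec_b, p)]

a = {"id": {"project": "ml-serving", "domain": "development", "name": "t"},
     "metadata": {"discoverable": True, "discoveryVersion": "1.0"},
     "interface": {"inputs": {}}}
b = json.loads(json.dumps(a))                # cheap deep copy
b["metadata"]["discoveryVersion"] = "2.0"
assert cache_diff(a, a) == []
assert cache_diff(a, b) == ["metadata.discoveryVersion"]
```

If two versions of the task agree on every one of those paths, executions of either version should hit the same cache entry.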
hallowed-mouse-14616
03/02/2022, 8:54 PM
hallowed-mouse-14616
03/02/2022, 8:58 PM
brash-london-45337
03/02/2022, 9:07 PM
hallowed-mouse-14616
03/02/2022, 9:12 PM
hallowed-mouse-14616
03/02/2022, 9:14 PM
kubectl -n flyte get deployments datacatalog
and then check the FlytePropeller configmap for the catalog-cache configuration.
hallowed-mouse-14616
03/02/2022, 9:16 PM
catalog-cache:
  endpoint: datacatalog:89
  insecure: true
  type: datacatalog
If the type is set to noop, then datacatalog is entirely disabled.
freezing-airport-6809
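A minimal sketch of that check, assuming you have dumped the catalog-cache block from the FlytePropeller configmap to plain text (the parsing below is deliberately naive, just enough for this small fragment):

```python
def catalog_cache_enabled(cfg_text: str) -> bool:
    # A "type: noop" entry disables datacatalog, and hence caching, entirely.
    for line in cfg_text.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "type":
            return value.strip() != "noop"
    return False  # no type key found; assume caching is not configured

enabled_cfg = """\
catalog-cache:
  endpoint: datacatalog:89
  insecure: true
  type: datacatalog
"""
disabled_cfg = "catalog-cache:\n  type: noop\n"
assert catalog_cache_enabled(enabled_cfg)
assert not catalog_cache_enabled(disabled_cfg)
```

If the type comes back as noop, flipping it to datacatalog (and making sure the datacatalog deployment is actually running) is the prerequisite for any cache hit.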
brash-london-45337
03/02/2022, 9:30 PM
brash-london-45337
03/02/2022, 10:11 PM
Using the execution_id for output directories, but it seems to come back as a multi-line str, e.g. print(flytekit.current_context().execution_id) gives:
project: "ml-serving"
domain: "development"
name: "cnqve8n83g"
Is there a more standard way of getting maybe just the “name” part of the execution_id?
hallowed-mouse-14616
03/02/2022, 10:13 PM
freezing-airport-6809
Does .name work?
high-accountant-32689
03/02/2022, 11:04 PM
.name should work.
brash-london-45337
03/02/2022, 11:08 PM
I was calling .name() as I was looking at the identifier.
brash-london-45337
03/02/2022, 11:08 PM
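To illustrate the .name answer: current_context().execution_id is a protobuf-style identifier with project, domain, and name fields, which is why printing the whole object yields the multi-line output above. The class below only mimics that shape for illustration; it is not flytekit's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class ExecutionIdentifier:
    # Mimics the shape of flytekit's execution_id for illustration;
    # the real object is a protobuf-backed identifier.
    project: str
    domain: str
    name: str

    def __str__(self) -> str:
        # Protobuf text format prints one field per line, hence the
        # multi-line string seen when printing the whole identifier.
        return (f'project: "{self.project}"\n'
                f'domain: "{self.domain}"\n'
                f'name: "{self.name}"')

execution_id = ExecutionIdentifier("ml-serving", "development", "cnqve8n83g")
assert "\n" in str(execution_id)           # the multi-line repr from the chat
assert execution_id.name == "cnqve8n83g"   # .name gives just the short name
```

Inside a task the equivalent would be flytekit.current_context().execution_id.name; note that .name is an attribute, not a method, so no parentheses.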