Brian Tang
03/02/2022, 7:51 PM
@task(cache=True, cache_version="1.0")
def submit_adhoc_spark_job(user: str) -> str:
    ...
But running the workflow several times still seems to re-execute the submit_adhoc_spark_job node. Following the docs, keeping the “Project, Domain, Cache Version, Task Signature, and Inputs” unchanged should cache the output, but this doesn’t seem to be the case. Are there other configurations I’m missing?
Dan Rammer (hamersaw)
03/02/2022, 8:06 PM
Brian Tang
03/02/2022, 8:31 PM
Dan Rammer (hamersaw)
03/02/2022, 8:42 PM
flytectl get execution ...?
Brian Tang
03/02/2022, 8:44 PM
n0 node is the one that I set to cache, and I confirmed from the logs that it is indeed resubmitting the Spark job.
Dan Rammer (hamersaw)
03/02/2022, 8:47 PM
Brian Tang
03/02/2022, 8:48 PM
lz4o8w2cwy came first. They are the same as far as I can tell:
Dan Rammer (hamersaw)
03/02/2022, 8:50 PM
Brian Tang
03/02/2022, 8:51 PM
{
  "config": {},
  "id": {
    "resourceType": 1,
    "project": "ml-serving",
    "domain": "development",
    "name": "src.python.flyte.ml_serving.example.main.submit_adhoc_spark_job",
    "version": "d31c33c58f2956fac010277b98aa24552cfc5b6f"
  },
  "type": "python-task",
  "metadata": {
    "discoverable": true,
    "runtime": {
      "type": 1,
      "version": "0.26.0",
      "flavor": "python"
    },
    "retries": {},
    "discoveryVersion": "1.0"
  },
  "interface": {
    "inputs": {
      "variables": {
        "user": {
          "type": {
            "simple": 3
          },
          "description": "user"
        }
      }
    },
    "outputs": {
      "variables": {
        "o0": {
          "type": {
            "simple": 3
          },
          "description": "o0"
        }
      }
    }
  },
  "container": {
    "command": [],
    "args": [
      "pyflyte-execute",
      "--inputs",
      "{{.input}}",
      "--output-prefix",
      "{{.outputPrefix}}",
      "--raw-output-data-prefix",
      "{{.rawOutputDataPrefix}}",
      "--resolver",
      "flytekit.core.python_auto_container.default_task_resolver",
      "--",
      "task-module",
      "src.python.flyte.ml_serving.example.main",
      "task-name",
      "submit_adhoc_spark_job"
    ],
    "env": [
      {
        "key": "FLYTE_INTERNAL_CONFIGURATION_PATH",
        "value": "/app/flytekit.config"
      },
      {
        "key": "FLYTE_INTERNAL_IMAGE",
        "value": "030465607062.dkr.ecr.us-west-2.amazonaws.com/stripe-flyte/ml-serving/example:d31c33c58f2956fac010277b98aa24552cfc5b6f"
      }
    ],
    "config": [],
    "ports": [],
    "image": "030465607062.dkr.ecr.us-west-2.amazonaws.com/stripe-flyte/ml-serving/example:d31c33c58f2956fac010277b98aa24552cfc5b6f",
    "resources": {
      "requests": [],
      "limits": []
    }
  }
}
Dan Rammer (hamersaw)
03/02/2022, 8:54 PM
Brian Tang
03/02/2022, 9:07 PM
Dan Rammer (hamersaw)
03/02/2022, 9:12 PM
kubectl -n flyte get deployments datacatalog
and then check the FlytePropeller configmap for the catalog-cache configuration.
catalog-cache:
  endpoint: datacatalog:89
  insecure: true
  type: datacatalog
If the type is set to noop, then datacatalog is entirely disabled.
Ketan (kumare3)
Brian Tang
03/02/2022, 9:30 PM
execution_id for output directories, but it seems to come back as a multi-line str, e.g., print(flytekit.current_context().execution_id)
project: "ml-serving"
domain: "development"
name: "cnqve8n83g"
Is there a more standard way of getting just the “name” part of the execution_id?
Dan Rammer (hamersaw)
03/02/2022, 10:13 PM
Ketan (kumare3)
.name work?
Eduardo Apolinario (eapolinario)
03/02/2022, 11:04 PM
.name should work.
Brian Tang
03/02/2022, 11:08 PM
.name() as I was looking at the identifier
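
For reference, a minimal sketch of the .name suggestion above, assuming the execution identifier exposes project, domain, and name as plain attributes (as the printed output in the thread suggests). The ExecutionId class, the output_dir helper, and the bucket path are hypothetical stand-ins so the example runs without a Flyte deployment; inside a real task the identifier would come from flytekit.current_context().execution_id.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionId:
    """Hypothetical stand-in for the identifier returned by
    flytekit.current_context().execution_id (not the real flytekit class)."""
    project: str
    domain: str
    name: str


def output_dir(base: str, execution_id) -> str:
    """Build an output directory from only the `name` part of the identifier.

    `name` is read as an attribute (execution_id.name), not called as a
    method, and the identifier is never passed through str(), which is what
    produced the multi-line text in the thread.
    """
    return f"{base.rstrip('/')}/{execution_id.name}"


# Values copied from the printed identifier in the thread; the base path is
# an assumption made up for this illustration.
example_id = ExecutionId(project="ml-serving", domain="development", name="cnqve8n83g")
print(output_dir("s3://example-bucket/outputs/", example_id))
```

In a task body, the equivalent call would be output_dir(base, flytekit.current_context().execution_id), under the assumption stated above that name is available as an attribute on that object.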