# flytekit
g
Seeing a somewhat weird UI/caching bug. I have a workflow that launches a dynamic with cache enabled on the tasks that the dynamic runs. Cache works properly when launching fresh from console, but when I click the "Relaunch" button, cache is hit for the first few tasks (outside of the dynamic), but not for any of the tasks in the dynamic 🤔 On closer inspection, the inputs to the dynamic are identical in both executions (the one where cache worked and the one where it did not). I then looked at the inputs to the workflow in both executions, and it looks like when I click "Relaunch" and copy the inputs, the order of the keys is different, but their contents are identical to the original run. Would that somehow mess up the cache?
y
what type is the input
g
These are the inputs
a: str,
b: str,
c: list[str] = [],
d: list[str] = [],
e: Optional[str] = None,
f: Optional[str] = None,
g: bool = False,
y
not sure if it matters but this is one dynamic task with seven different inputs?
can you copy paste the task signature?
could you also go to the inputs tab of both runs and copy the literal here (redact as needed of course)
g
Sorry this was the input to the workflow. The inputs to the dynamic are:
datasets_with_paths: DatasetsWithPaths,
reference_file: ReferenceFile,
region_file: FlyteFile,
gcs_output_dir: str,
keep_consensus_bams: bool,
Inputs to dynamic that successfully used cache:
{
  "keep_consensus_bams": true,
  "reference_file": {
    "type": "single blob",
    "uri": "REDACTED"
  },
  "region_file": {
    "type": "single blob",
    "uri": "REDACTED"
  },
  "datasets_with_paths": {
    "type": "single (yaml) blob",
    "uri": "REDACTED"
  },
  "gcs_output_dir": "REDACTED"
}
Inputs to dynamic that did not use cache (when using Relaunch):
{
  "gcs_output_dir": "REDACTED",
  "keep_consensus_bams": true,
  "reference_file": {
    "type": "single blob",
    "uri": "REDACTED"
  },
  "region_file": {
    "type": "single blob",
    "uri": "REDACTED"
  },
  "datasets_with_paths": {
    "type": "single (yaml) blob",
    "uri": "REDACTED"
  }
}
All values were the same, just the order of the keys is different
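One quick way to double-check the "same values, different key order" observation is to load both literals and compare them as mappings. This is only a sketch: the two JSON strings are the redacted inputs pasted above, and the variable names are made up for illustration. Python dict equality ignores key order, so it tells you whether anything other than ordering differs.

```python
import json

# Inputs copied from the run that hit cache (values redacted as in the thread).
cached_run = json.loads("""
{
  "keep_consensus_bams": true,
  "reference_file": {"type": "single blob", "uri": "REDACTED"},
  "region_file": {"type": "single blob", "uri": "REDACTED"},
  "datasets_with_paths": {"type": "single (yaml) blob", "uri": "REDACTED"},
  "gcs_output_dir": "REDACTED"
}
""")

# Inputs copied from the relaunched run that missed cache.
relaunched_run = json.loads("""
{
  "gcs_output_dir": "REDACTED",
  "keep_consensus_bams": true,
  "reference_file": {"type": "single blob", "uri": "REDACTED"},
  "region_file": {"type": "single blob", "uri": "REDACTED"},
  "datasets_with_paths": {"type": "single (yaml) blob", "uri": "REDACTED"}
}
""")

# Dict equality compares keys and values, not insertion order.
same_content = cached_run == relaunched_run
# Comparing the key lists does take order into account.
same_key_order = list(cached_run) == list(relaunched_run)

print("same content:", same_content)
print("same key order:", same_key_order)
```

With the real (unredacted) URIs substituted in, a `False` for `same content` would point at an actual input difference rather than ordering.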
I think I got lucky when I hit cache..
if I try again, I don't hit it - seems like I have to get lucky and have the keys match 😭
or it's a UI bug
but given there are logs, it looks like it's running everything
@Yee any other ideas here?
y
and all the paths are exactly the same?
what types do DatasetsWithPaths and ReferenceFile resolve to on the flyte side?
g
all paths are identical
they resolve to blob
y
also maybe worth it to check data catalog logs (if you’re running them separate)
this has been mostly resolved, or at least understood. the confusion stemmed from the fact that console does not show the version of tasks that is actually run, instead pulling them from admin.
debugged by downloading the futures file and inspecting the tasks included in the dynamic job spec.
the version shown in the console for child tasks had one discovery version but they actually ran with another.
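To make the finding concrete, here is a toy sketch of why a different discovery version causes a cache miss while a different key order does not. It is not Flyte's or Data Catalog's actual key computation; `cache_key`, the task name, and the version strings are all hypothetical, and the only assumption is that the key depends on the task, its discovery (cache) version, and an order-insensitive view of the inputs.

```python
import hashlib
import json


def cache_key(task_name: str, discovery_version: str, inputs: dict) -> str:
    """Toy cache key: NOT Flyte's real algorithm, just an illustration."""
    # sort_keys=True makes the serialized inputs independent of key order.
    canonical_inputs = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    payload = "\n".join([task_name, discovery_version, canonical_inputs])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


original = {"keep_consensus_bams": True, "gcs_output_dir": "REDACTED"}
reordered = {"gcs_output_dir": "REDACTED", "keep_consensus_bams": True}

# Same task, same version, reordered inputs: the key is unchanged.
key_a = cache_key("child_task", "v1", original)
key_b = cache_key("child_task", "v1", reordered)

# Same inputs but a different discovery version: the key changes,
# which would show up as a cache miss.
key_c = cache_key("child_task", "v2", original)

print(key_a == key_b)
print(key_a == key_c)
```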
command used to inspect the futures.pb file was
flyte-cli parse-proto -f futures.pb -p flyteidl.core.dynamic_job_pb2.DynamicJobSpec
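The same inspection can be sketched in Python if you want just the versions rather than the full dump. This assumes the `flyteidl` package is installed and that the downloaded file is a serialized `DynamicJobSpec`, as in the command above; the helper names and the stand-in object at the bottom are hypothetical and only there so the listing logic can be tried without a real file.

```python
from types import SimpleNamespace


def load_dynamic_job_spec(path):
    """Parse a futures.pb file into a DynamicJobSpec message."""
    # Imported lazily so the rest of the sketch runs without flyteidl installed.
    from flyteidl.core import dynamic_job_pb2

    spec = dynamic_job_pb2.DynamicJobSpec()
    with open(path, "rb") as f:
        spec.ParseFromString(f.read())
    return spec


def task_versions(spec):
    """Return (name, registered version, discovery version) for each task."""
    return [
        (t.id.name, t.id.version, t.metadata.discovery_version)
        for t in spec.tasks
    ]


# Hypothetical stand-in with the same attribute shape as DynamicJobSpec,
# used here only to exercise task_versions without a real futures.pb.
fake_spec = SimpleNamespace(
    tasks=[
        SimpleNamespace(
            id=SimpleNamespace(name="child_task", version="abc123"),
            metadata=SimpleNamespace(discovery_version="v1"),
        )
    ]
)
rows = task_versions(fake_spec)
print(rows)

# With a real file:
#   for row in task_versions(load_dynamic_job_spec("futures.pb")):
#       print(row)
```

Comparing the discovery version printed here against what the console shows for the child task is the check that surfaced the mismatch in this thread.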
@Jason Porter can you confirm the source for the Task tab for children of dynamic tasks?