a

Alex Pozimenko

06/16/2022, 9:26 PM
hi flyte team, we recently upgraded the console to the latest and now it doesn't show dynamic subtasks. Is there a switch/config setting for that?
h

Haytham Abuelfutuh

06/16/2022, 9:28 PM
Cc @Jason Porter @Nastya Rusina
a

Alex Pozimenko

06/16/2022, 9:30 PM
flyteadmin_version     = "v1.1.21"
  flyteconsole_version   = "v1.1.0"
  flytecopilot_version   = "v0.0.26"
  flytepropeller_version = "v1.1.12"
n

Nastya Rusina

06/16/2022, 9:34 PM
Does it happen for all three views: Node executions/Graph/Timeline? Can you please provide a sample/screenshot of what you see and where the info is missing.
a

Alex Pozimenko

06/16/2022, 9:39 PM
yes, all views
checkerboard_dynamic_tasks is the one that's supposed to have subtasks
j

Jason Porter

06/16/2022, 9:40 PM
Okay thanks @Alex Pryiomka - we'll take a look
a

Alex Pozimenko

06/16/2022, 9:40 PM
the version info i provided is wrong
is there a way to see versions in the console?
j

Jason Porter

06/16/2022, 9:41 PM
Yes. There is a little "i" in the top right corner; clicking that will open version information 👍
a

Alex Pozimenko

06/16/2022, 9:41 PM
UI Version 1.1.0 Admin Version 1.1.21
so it was correct 🙂
j

Jason Porter

06/16/2022, 9:44 PM
Okay great - we're going to track this bug fix here 👍 https://github.com/flyteorg/flyteconsole/issues/512
thx 1
a

Alex Pozimenko

06/16/2022, 10:41 PM
@Jason Porter - in case this is helpful - the console shows subtasks while the workflow is running, but they disappear for completed workflows. It looks particularly odd when a subtask fails: Flyte marks the workflow as failed, yet the attempts are marked as successful and there are no errors in the top-level logs
n

Nastya Rusina

06/17/2022, 6:33 PM
@Alex Pozimenko Can you please check if old executions (prior to the update) for the same workflow show sub-workflow items?
a

Alex Pozimenko

06/17/2022, 6:39 PM
@Nastya Rusina - yes, old executions look fine.
🙇‍♀️ 1
n

Nastya Rusina

06/17/2022, 6:46 PM
Thanks for confirming. We will dig deeper, but I suspect that something changed in the structure of the returned DAG. cc: @Haytham Abuelfutuh
e

eugene jahn

06/21/2022, 4:43 PM
Hi @Alex Pozimenko, I tried to reproduce the issue on my side but couldn't. Are you free sometime for a short call? I just want to make sure the API payload is correct.
a

Alex Pozimenko

06/21/2022, 6:33 PM
hey @eugene jahn, how about 1pm today?
e

Eugene Jahn

06/21/2022, 6:42 PM
how about 2pm PST?
e

eugene jahn

06/21/2022, 6:43 PM
Can we do 2pm PST?
j

Jason Porter

06/21/2022, 6:43 PM
2pm works for me 👍
a

Alex Pozimenko

06/21/2022, 6:59 PM
sgtm. alex.pozimenko@woven-planet.global
e

eugene jahn

06/21/2022, 7:04 PM
sent invitation! see you later
a

Alex Pozimenko

06/21/2022, 8:56 PM
sorry, in a mtg which will likely run a little over, so will join 5 min later
👍 1
{
  "node_executions": [
    {
      "id": {
        "node_id": "start-node",
        "execution_id": {
          "project": "avfleetscenes",
          "domain": "dev",
          "name": "f67f46d3206de43699b7"
        }
      },
      "closure": {
        "output_uri": "<s3://lyft-av-prod-pdx-flyte/metadata/propeller/production/avfleetscenes-dev-f67f46d3206de43699b7/start-node/data/0/outputs.pb>",
        "phase": "SUCCEEDED",
        "created_at": "2022-06-21T20:16:34.290172734Z",
        "updated_at": "2022-06-21T20:16:34.290172734Z"
      },
      "metadata": {
        "spec_node_id": "start-node"
      }
    },
    {
      "id": {
        "node_id": "n0",
        "execution_id": {
          "project": "avfleetscenes",
          "domain": "dev",
          "name": "f67f46d3206de43699b7"
        }
      },
      "input_uri": "<s3://lyft-av-prod-pdx-flyte/metadata/propeller/production/avfleetscenes-dev-f67f46d3206de43699b7/n0/data/inputs.pb>",
      "closure": {
        "error": {
          "code": "RetriesExhausted|USER:Unknown",
          "message": "[2/2] currentAttempt done. Last Error: USER::Traceback (most recent call last):\n\n      File \"/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.8/site-packages/flytekit/exceptions/scopes.py\", line 203, in user_entry_point\n        return wrapped(*args, **kwargs)\n      File \"/root/flyte/flyte/tasks/fs1_training_data.py\", line 88, in get_annotation_info_task\n        call_scene_reconstruction_binary(\n      File \"/root/flyte/flyte/commands/scene_command.py\", line 51, in call_scene_reconstruction_binary\n        subprocess_handler.run(binary=command, args=params, log_stdout=True)\n      File \"/root/cli/cli/subprocess_handler.py\", line 53, in run\n        raise subprocess.CalledProcessError(returncode=exit_code, cmd=command, output=stdout, stderr=stderr)\n\nMessage:\n\n    Command '['annotation', 'query', '--verbose', '--data-source', 'FS1', '--label-source', 'SCALE', '--output-file', '/tmp/flyte/20220621_202627/sandbox/local_flytekit/39e65c520948189a3f2663c3f94842e3/annotations.csv', '--scale-project-name', 'panda_lfd_lidar', '--task-id', '60c72d8333efc20018b6ea0d']' returned non-zero exit status 1.\n\nUser error.",
          "kind": "USER"
        },
        "phase": "FAILED",
        "started_at": "2022-06-21T20:16:34.460611038Z",
        "duration": "804.056366105s",
        "created_at": "2022-06-21T20:16:34.354000459Z",
        "updated_at": "2022-06-21T20:29:58.516977105Z"
      },
      "metadata": {
        "spec_node_id": "n0",
        "is_dynamic": true
      }
    }
  ]
}
{
  "node_executions": [
    {
      "id": {
        "node_id": "start-node",
        "execution_id": {
          "project": "avfleetscenes",
          "domain": "dev",
          "name": "xcipb7ytwp"
        }
      },
      "input_uri": "<s3://lyft-av-prod-pdx-flyte/metadata/propeller/production/avfleetscenes-dev-xcipb7ytwp/start-node/data/inputs.pb>",
      "closure": {
        "phase": "SUCCEEDED",
        "created_at": "2021-09-23T13:27:27.874059940Z",
        "updated_at": "2021-09-23T13:27:27.874059940Z"
      },
      "metadata": {
        "spec_node_id": "start-node"
      }
    },
    {
      "id": {
        "node_id": "n0",
        "execution_id": {
          "project": "avfleetscenes",
          "domain": "dev",
          "name": "xcipb7ytwp"
        }
      },
      "input_uri": "<s3://lyft-av-prod-pdx-flyte/metadata/propeller/production/avfleetscenes-dev-xcipb7ytwp/n0/data/inputs.pb>",
      "closure": {
        "output_uri": "<s3://lyft-av-prod-pdx-flyte/metadata/propeller/production/avfleetscenes-dev-xcipb7ytwp/n0/data/0/outputs.pb>",
        "phase": "SUCCEEDED",
        "started_at": "2021-09-23T13:27:28.357826621Z",
        "duration": "11798.029477360s",
        "created_at": "2021-09-23T13:27:28.089924503Z",
        "updated_at": "2021-09-23T16:44:06.387304360Z"
      },
      "metadata": {
        "is_parent_node": true,
        "spec_node_id": "n0"
      }
    },
    {
      "id": {
        "node_id": "n1",
        "execution_id": {
          "project": "avfleetscenes",
          "domain": "dev",
          "name": "xcipb7ytwp"
        }
      },
      "input_uri": "<s3://lyft-av-prod-pdx-flyte/metadata/propeller/production/avfleetscenes-dev-xcipb7ytwp/n1/data/inputs.pb>",
      "closure": {
        "output_uri": "<s3://lyft-av-prod-pdx-flyte/metadata/propeller/production/avfleetscenes-dev-xcipb7ytwp/n1/data/0/outputs.pb>",
        "phase": "SUCCEEDED",
        "started_at": "2021-09-23T16:44:52.527074281Z",
        "duration": "548.360438964s",
        "created_at": "2021-09-23T16:44:52.317653506Z",
        "updated_at": "2021-09-23T16:54:00.887512964Z"
      },
      "metadata": {
        "spec_node_id": "n1"
      }
    },
    {
      "id": {
        "node_id": "end-node",
        "execution_id": {
          "project": "avfleetscenes",
          "domain": "dev",
          "name": "xcipb7ytwp"
        }
      },
      "input_uri": "<s3://lyft-av-prod-pdx-flyte/metadata/propeller/production/avfleetscenes-dev-xcipb7ytwp/end-node/data/inputs.pb>",
      "closure": {
        "phase": "SUCCEEDED",
        "created_at": "2021-09-23T16:54:01.094210244Z",
        "updated_at": "2021-09-23T16:54:01.446059558Z"
      },
      "metadata": {
        "spec_node_id": "end-node"
      }
    }
  ]
}
example of failed old execution that shows expand option:
{
  "node_executions": [
    {
      "id": {
        "node_id": "start-node",
        "execution_id": {
          "project": "avfleetscenes",
          "domain": "dev",
          "name": "f860a446514bb4e07be0"
        }
      },
      "input_uri": "<s3://lyft-av-prod-pdx-flyte/metadata/propeller/production/avfleetscenes-dev-f860a446514bb4e07be0/start-node/data/inputs.pb>",
      "closure": {
        "phase": "SUCCEEDED",
        "created_at": "2021-09-22T15:58:52.269739635Z",
        "updated_at": "2021-09-22T15:58:52.269739635Z"
      },
      "metadata": {
        "spec_node_id": "start-node"
      }
    },
    {
      "id": {
        "node_id": "n0",
        "execution_id": {
          "project": "avfleetscenes",
          "domain": "dev",
          "name": "f860a446514bb4e07be0"
        }
      },
      "input_uri": "<s3://lyft-av-prod-pdx-flyte/metadata/propeller/production/avfleetscenes-dev-f860a446514bb4e07be0/n0/data/inputs.pb>",
      "closure": {
        "error": {
          "code": "RetriesExhausted|USER:Unknown",
          "message": "[2/2] currentAttempt done. Last Error: USER::Traceback (most recent call last):\n\n      File \"/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.8/site-packages/flytekit/common/exceptions/scopes.py\", line 203, in user_entry_point\n        return wrapped(*args, **kwargs)\n      File \"/root/flyte/flyte/tasks/fs1_training_data.py\", line 324, in generate_l5mldatastore_dataset_task\n        call_scene_reconstruction_binary(\n      File \"/root/flyte/flyte/commands/scene_command.py\", line 51, in call_scene_reconstruction_binary\n        subprocess_handler.run(binary=command, args=params, log_stdout=True)\n      File \"/root/cli/cli/subprocess_handler.py\", line 53, in run\n        raise subprocess.CalledProcessError(returncode=exit_code, cmd=command, output=stdout, stderr=stderr)\n\nMessage:\n\n    Command '['annotation', 'write-training-data-chunk-l5mldatastore', '/tmp/flytezs70k80t/local_flytekit/92f659b1ba7cba47388b85d2c3ca177f/', '--verbose', '--chunk-id', '217', '--dataset-name', 'fs1_stereo_dataset', '--dataset-version', '0.0.3071-main.32268f0_0.0.1', '--end-ts', '1621979365', '--filtered-tracks-pb', '/tmp/flytezs70k80t/local_flytekit/8f281133547f3cb71452f2ef735477d1/filtered_tracks.pb', '--mission-id', '1863992517161235894_9314680418666361737', '--obstacles-frame', 'camera', '--output-file', '/tmp/flytezs70k80t/20210922_171104/local_flytekit/740367bf4e8c77e6ae90c2d4b3f0aa40/l5mldatastore_chunk_metadata_217.json', '--partition-name', 'train', '--start-ts', '1621979337']' returned non-zero exit status 1.\n\nUser error.",
          "kind": "USER"
        },
        "phase": "FAILED",
        "started_at": "2021-09-22T15:58:52.479254644Z",
        "duration": "4790.533657594s",
        "created_at": "2021-09-22T15:58:52.358488635Z",
        "updated_at": "2021-09-22T17:18:43.012912594Z"
      },
      "metadata": {
        "is_parent_node": true,
        "spec_node_id": "n0"
      }
    }
  ]
}
(^^^these are private links only available to wp employees)
h

Haytham Abuelfutuh

06/21/2022, 9:56 PM
Hey @Alex Pozimenko, sorry you are facing problems with the upgrade. If you don’t mind double confirming this, both the older and more recent executions ran the same version of the workflow?
Mind also checking what version of FlytePropeller you are running?
@katrina Do you think this is related to this change? https://github.com/flyteorg/flyteadmin/pull/382 I see is_parent_node is no longer set to true… but it should be, right?
k

katrina

06/21/2022, 10:39 PM
we should only ever set is_parent to true but never go from true to false (with the case you were seeing that running workflows showing the subtasks but completed ones not)
hey @Alex Pozimenko if you don't mind, would it be possible to get the same node execution json you shared above when it is running and the console is showing subtasks?
e

eugene jahn

06/21/2022, 10:51 PM
@katrina this is the example that executed success before the update https://jsonblob.com/988938897279172608
🙏 1
k

katrina

06/21/2022, 11:02 PM
huh so is_parent_node is indeed set
how long do the subtasks run for? are they particularly short-lived?
a

Alex Pozimenko

06/21/2022, 11:22 PM
if you don’t mind double confirming this, both the older and more recent executions ran the same version of the workflow?
@Haytham Abuelfutuh the versions are different (they're 2 months apart)
@Haytham Abuelfutuh flytepropeller-v1.1.12
h

Haytham Abuelfutuh

06/21/2022, 11:42 PM
thank you
a

Alex Pozimenko

06/21/2022, 11:42 PM
i have the workflow running now, is this the json response you need?
{
  "node_executions": [
    {
      "id": {
        "node_id": "start-node",
        "execution_id": {
          "project": "avfleetscenes",
          "domain": "dev",
          "name": "a8djqk9pzmdgfdjf75lx"
        }
      },
      "closure": {
        "output_uri": "<s3://lyft-av-prod-pdx-flyte/metadata/propeller/production/avfleetscenes-dev-a8djqk9pzmdgfdjf75lx/start-node/data/0/outputs.pb>",
        "phase": "SUCCEEDED",
        "created_at": "2022-06-21T23:38:52.525455365Z",
        "updated_at": "2022-06-21T23:38:52.525455365Z"
      },
      "metadata": {
        "spec_node_id": "start-node"
      }
    },
    {
      "id": {
        "node_id": "n0",
        "execution_id": {
          "project": "avfleetscenes",
          "domain": "dev",
          "name": "a8djqk9pzmdgfdjf75lx"
        }
      },
      "input_uri": "<s3://lyft-av-prod-pdx-flyte/metadata/propeller/production/avfleetscenes-dev-a8djqk9pzmdgfdjf75lx/n0/data/inputs.pb>",
      "closure": {
        "phase": "RUNNING",
        "started_at": "2022-06-21T23:38:52.669382566Z",
        "created_at": "2022-06-21T23:38:52.616230119Z",
        "updated_at": "2022-06-21T23:38:52.669382566Z"
      },
      "metadata": {
        "is_parent_node": true,
        "spec_node_id": "n0"
      }
    }
  ]
}
or this (same execution as above):
{
  "node_executions": [
    {
      "id": {
        "node_id": "n0-0-start-node",
        "execution_id": {
          "project": "avfleetscenes",
          "domain": "dev",
          "name": "a8djqk9pzmdgfdjf75lx"
        }
      },
      "closure": {
        "output_uri": "<s3://lyft-av-prod-pdx-flyte/metadata/propeller/production/avfleetscenes-dev-a8djqk9pzmdgfdjf75lx/n0/data/0/start-node/0/outputs.pb>",
        "phase": "SUCCEEDED",
        "created_at": "2022-06-21T23:41:52.895604301Z",
        "updated_at": "2022-06-21T23:41:52.895604301Z"
      },
      "metadata": {
        "retry_group": "0",
        "spec_node_id": "start-node"
      }
    },
    {
      "id": {
        "node_id": "n0-0-dn0",
        "execution_id": {
          "project": "avfleetscenes",
          "domain": "dev",
          "name": "a8djqk9pzmdgfdjf75lx"
        }
      },
      "input_uri": "<s3://lyft-av-prod-pdx-flyte/metadata/propeller/production/avfleetscenes-dev-a8djqk9pzmdgfdjf75lx/n0/data/0/dn0/inputs.pb>",
      "closure": {
        "phase": "RUNNING",
        "started_at": "2022-06-21T23:41:55.385688619Z",
        "created_at": "2022-06-21T23:41:53.138938047Z",
        "updated_at": "2022-06-21T23:41:55.385688619Z",
        "workflow_node_metadata": {
          "executionId": {
            "project": "avfleetscenes",
            "domain": "dev",
            "name": "foyc3j4i"
          }
        }
      },
      "metadata": {
        "retry_group": "0",
        "spec_node_id": "dn0"
      }
    },
    {
      "id": {
        "node_id": "n0-0-dn1",
        "execution_id": {
          "project": "avfleetscenes",
          "domain": "dev",
          "name": "a8djqk9pzmdgfdjf75lx"
        }
      },
      "input_uri": "<s3://lyft-av-prod-pdx-flyte/metadata/propeller/production/avfleetscenes-dev-a8djqk9pzmdgfdjf75lx/n0/data/0/dn1/inputs.pb>",
      "closure": {
        "phase": "RUNNING",
        "started_at": "2022-06-21T23:41:55.677555860Z",
        "created_at": "2022-06-21T23:41:53.193659416Z",
        "updated_at": "2022-06-21T23:41:55.677555860Z",
        "workflow_node_metadata": {
          "executionId": {
            "project": "avfleetscenes",
            "domain": "dev",
            "name": "f4gdb2lq"
          }
        }
      },
      "metadata": {
        "retry_group": "0",
        "spec_node_id": "dn1"
      }
    }
  ]
}
in case this helps, the console continues to show the expand option after completion if the execution page was open while it was still running. But if I refresh the page, the expand option disappears
k

katrina

06/22/2022, 12:30 AM
this definitely sounds like a back-end issue overwriting the is parent node bit. thanks @Alex Pozimenko for all the helpful reporting, i will take a look at fixing this
👍 1
hey @Alex Pozimenko just to double check, what flyteadmin version were you on before you upgraded?
a

Alex Pozimenko

06/22/2022, 4:55 PM
v0.6.112
1
k

katrina

06/22/2022, 6:15 PM
hey @Jason Porter for my understanding, what indicates to the UI that it should show subtasks?
@Alex Pozimenko could you share the workflow definition (with anything sensitive scrubbed out?)
j

Jason Porter

06/22/2022, 6:22 PM
That's kinda a complex question 😅 but yes, generally speaking (in terms of checking existence and getting phase) we key off of is_parent_node
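As a rough illustration of the rule of thumb Jason describes, here is a minimal Python sketch of an "is this node expandable?" check. It is not the console's actual code; the helper name `is_expandable` is hypothetical, and only the `metadata.is_parent_node` field name is taken from the payloads shared in this thread.

```python
from typing import Any, Dict


def is_expandable(node_execution: Dict[str, Any]) -> bool:
    """Return True when the node execution is flagged as a parent node.

    Hypothetical helper illustrating the 'key off is_parent_node' rule;
    it is not taken from flyteconsole.
    """
    metadata = node_execution.get("metadata") or {}
    # The flag is simply absent from the payloads in this thread when it is not set.
    return bool(metadata.get("is_parent_node", False))


# Trimmed-down versions of the n0 node from the running and completed payloads.
running_n0 = {"metadata": {"is_parent_node": True, "spec_node_id": "n0"}}
completed_n0 = {"metadata": {"spec_node_id": "n0", "is_dynamic": True}}

print(is_expandable(running_n0))    # True
print(is_expandable(completed_n0))  # False
```

Under this assumption, a node whose metadata carries only `is_dynamic` would render without an expand option, which matches what Alex reports for completed executions.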
k

katrina

06/22/2022, 6:25 PM
thanks Jason, @Eugene Jahn confirmed for me in DM 😄
also @Alex Pozimenko sorry one more q, just to double check the flytepropeller deployment is still using v1.1.12?
✔️ 1
a

Alex Pozimenko

06/22/2022, 7:38 PM
@katrina - sanitized wf definition. Hopefully i didn't remove anything material:
from flytekit import task, dynamic, workflow, Resources
from flytekit.core.node_creation import create_node
import pandas as pd  # needed for the pd.DataFrame return annotation below

from typing import List, Tuple, NamedTuple

SceneLevelProcessingResults = NamedTuple("OP2",
                                         num_scenes_published=int)


@task(requests=Resources(mem='4G'), retries=6)
def simulation_metadata_collect(run_id: str) -> pd.DataFrame:
    """Collect all task metadata for downstream workers"""
    # experiment_task_metadata_df = ...
    # return experiment_task_metadata_df


@dynamic
def checkerboard_dynamic_tasks(run_id: str,
                               number_shards: int) -> Tuple[List[int], List[int]]:
    num_issues_from_shards = []
    num_scenes_from_shards = []

    for i in range(number_shards):
        scene_level_processing_results = checkerboard_scene_level_processing(
            run_id=run_id,
            shard=i,
            number_shards=number_shards)
        num_issues_from_shards.append(scene_level_processing_results.num_scenes_published)
        num_scenes_from_shards.append(scene_level_processing_results.num_scenes_published)

    return num_issues_from_shards, num_scenes_from_shards


@task(requests=Resources(mem='10G'), retries=5)
def checkerboard_scene_level_processing(run_id: str,
                                        shard: int = 0,
                                        number_shards: int = 1,
                                        ) -> SceneLevelProcessingResults:

    scene_pipeline = InitPipeline(run_id=run_id,
                                    number_shards=number_shards,
                                    shard=shard)
    scenes: List[CheckerboardScene] = scene_pipeline.initialize_scenes()
    # ... 

    return SceneLevelProcessingResults(num_scenes_published=len(scenes))


@workflow
def CheckerboardParallelBackendLaunch(run_id: str, number_shards: int = 1):
    metadata_collect_task = create_node(simulation_metadata_collect, run_id=run_id)

    # WF to calculate issues, dynamically scaled by amount of metrics to process
    dynamic_tasks = create_node(checkerboard_dynamic_tasks,
                                run_id=run_id,
                                number_shards=number_shards)

    metadata_collect_task >> dynamic_tasks
k

katrina

06/22/2022, 8:09 PM
thanks @Alex Pozimenko and just to double check what does InitPipeline do?
a

Alex Pozimenko

06/22/2022, 8:11 PM
it simply initializes scene_pipeline object that is used to get a list of scenes
k

katrina

06/22/2022, 8:25 PM
sigh still can't repro, even after the dynamic task succeeds i see
  "id": {
    "node_id": "n1",
    "execution_id": {
      "project": "flytesnacks",
      "domain": "development",
      "name": "f629e8fae1aa143e1a14"
    }
  },
  "input_uri": "<s3://my-s3-bucket/metadata/propeller/flytesnacks-development-f629e8fae1aa143e1a14/n1/data/inputs.pb>",
  "closure": {
    "output_uri": "<s3://my-s3-bucket/metadata/propeller/flytesnacks-development-f629e8fae1aa143e1a14/n1/data/0/outputs.pb>",
    "phase": "SUCCEEDED",
    "started_at": "2022-06-22T20:21:44.605794900Z",
    "duration": "120.055171600s",
    "created_at": "2022-06-22T20:21:44.517225200Z",
    "updated_at": "2022-06-22T20:23:44.660965600Z"
  },
  "metadata": {
    "is_parent_node": true,
    "spec_node_id": "n1",
    "is_dynamic": true
  }
},
as expected
a

Alex Pozimenko

06/22/2022, 8:26 PM
did you refresh the console after workflow completed?
k

katrina

06/22/2022, 8:27 PM
yup and i can expand the dynamic task
a

Alex Pozimenko

06/22/2022, 8:28 PM
lmk if you want to debug on our end
also, the original workflow had more tasks, (2 before and 2 after). I removed them as I didn't think they matter as we have another workflow with a single dynamic task that has the same problem, but it's possible the other one is constructed differently
other tasks are plain @task's
k

katrina

06/22/2022, 8:44 PM
yeah that should be fine, the reporting should be on a per node basis so i don't think that materially changes things
@Alex Pozimenko if you run the modified workflow you shared with me, does that also fail to expand subtasks for you?
a

Alex Pozimenko

06/22/2022, 9:21 PM
i haven't tried it
@katrina, here's a bare-bones workflow that can be used to repro the issue. I ran it and confirmed that subtasks don't show after completion
import flytekit

@flytekit.dynamic(
    requests=flytekit.Resources(mem='256Mi', cpu='1'),
)
def simple_batch_task(iterations: int, input_string: str) -> None:
    for i in range(iterations):
        identity_sub_task(input_string=input_string)


@flytekit.task(
    requests=flytekit.Resources(mem='256Mi', cpu='1')
)
def identity_sub_task(input_string: str) -> str:
    return input_string


@flytekit.workflow
def HelloWorldDynamicTaskWorkflow(input_string: str = 'Hello World',
                                    iterations: int = 5):
    simple_batch_task(iterations=iterations, input_string=input_string)
k

katrina

06/23/2022, 10:32 PM
this is really weird, using flytesandbox I can run this locally (refreshed after success) and the sub tasks drop downs still appear for me
it looks like you're on all the latest components so i wonder if this is a regression but nothing seems suspicious in recent changes
@Jason Porter would console v1.1.0 vs v1.1.1 make any difference here?
a

Alex Pozimenko

06/23/2022, 10:45 PM
maybe some client side issue? I'm using Chrome Version 96.0.4664.55 (Official Build) (x86_64)
j

Jason Porter

06/23/2022, 11:10 PM
Hmm nothing obvious, however technically all of the changes between those two versions could potentially affect that view 😅. Let me have FE team take another look into this
k

katrina

06/23/2022, 11:37 PM
hey @Alex Pozimenko argh so I upgraded my sandbox components to match the exact versions you're using and i still can't repro 🤯
a

Alex Pozimenko

06/23/2022, 11:43 PM
odd... is there a flag in API response that enables the expand?
k

katrina

06/23/2022, 11:50 PM
no it should be coming from that is_parent_node bit in the backend
i believe 😅
a

Alex Pozimenko

06/23/2022, 11:52 PM
this is what I get:
{
  "node_executions": [
    {
      "id": {
        "node_id": "start-node",
        "execution_id": {
          "project": "avexampleworkflows",
          "domain": "dev",
          "name": "alkhkhqw6qlxgrqq2lv9"
        }
      },
      "closure": {
        "output_uri": "<s3://lyft-av-prod-pdx-flyte/metadata/propeller/production/avexampleworkflows-dev-alkhkhqw6qlxgrqq2lv9/start-node/data/0/outputs.pb>",
        "phase": "SUCCEEDED",
        "created_at": "2022-06-23T20:37:45.792945109Z",
        "updated_at": "2022-06-23T20:37:45.792945109Z"
      },
      "metadata": {
        "spec_node_id": "start-node"
      }
    },
    {
      "id": {
        "node_id": "n0",
        "execution_id": {
          "project": "avexampleworkflows",
          "domain": "dev",
          "name": "alkhkhqw6qlxgrqq2lv9"
        }
      },
      "input_uri": "<s3://lyft-av-prod-pdx-flyte/metadata/propeller/production/avexampleworkflows-dev-alkhkhqw6qlxgrqq2lv9/n0/data/inputs.pb>",
      "closure": {
        "phase": "SUCCEEDED",
        "started_at": "2022-06-23T20:37:45.909076702Z",
        "duration": "310.470520303s",
        "created_at": "2022-06-23T20:37:45.849547550Z",
        "updated_at": "2022-06-23T20:42:56.379596303Z"
      },
      "metadata": {
        "spec_node_id": "n0",
        "is_dynamic": true
      }
    },
    {
      "id": {
        "node_id": "end-node",
        "execution_id": {
          "project": "avexampleworkflows",
          "domain": "dev",
          "name": "alkhkhqw6qlxgrqq2lv9"
        }
      },
      "input_uri": "<s3://lyft-av-prod-pdx-flyte/metadata/propeller/production/avexampleworkflows-dev-alkhkhqw6qlxgrqq2lv9/end-node/data/inputs.pb>",
      "closure": {
        "phase": "SUCCEEDED",
        "created_at": "2022-06-23T20:42:56.461800234Z",
        "updated_at": "2022-06-23T20:42:56.522386713Z"
      },
      "metadata": {
        "spec_node_id": "end-node"
      }
    }
  ]
}
k

katrina

06/23/2022, 11:53 PM
interesting, is_dynamic is true, but not is_parent
this isn't cached right?
a

Alex Pozimenko

06/23/2022, 11:54 PM
i don't think so
it has execution id in the url
so sounds like a backend issue (the response doesn't have is_parent)
and this is what i get while it's running:
{
  "node_executions": [
    {
      "id": {
        "node_id": "start-node",
        "execution_id": {
          "project": "avexampleworkflows",
          "domain": "dev",
          "name": "a2n7dnlw76stjlpzqhq6"
        }
      },
      "closure": {
        "output_uri": "<s3://lyft-av-prod-pdx-flyte/metadata/propeller/production/avexampleworkflows-dev-a2n7dnlw76stjlpzqhq6/start-node/data/0/outputs.pb>",
        "phase": "SUCCEEDED",
        "created_at": "2022-06-23T23:56:16.891813217Z",
        "updated_at": "2022-06-23T23:56:16.891813217Z"
      },
      "metadata": {
        "spec_node_id": "start-node"
      }
    },
    {
      "id": {
        "node_id": "n0",
        "execution_id": {
          "project": "avexampleworkflows",
          "domain": "dev",
          "name": "a2n7dnlw76stjlpzqhq6"
        }
      },
      "input_uri": "<s3://lyft-av-prod-pdx-flyte/metadata/propeller/production/avexampleworkflows-dev-a2n7dnlw76stjlpzqhq6/n0/data/inputs.pb>",
      "closure": {
        "phase": "RUNNING",
        "started_at": "2022-06-23T23:56:17.014704942Z",
        "created_at": "2022-06-23T23:56:16.951710847Z",
        "updated_at": "2022-06-23T23:56:17.014704942Z"
      },
      "metadata": {
        "is_parent_node": true,
        "spec_node_id": "n0"
      }
    }
  ]
}
👀 1
on completion is_parent_node is replaced with is_dynamic
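To make that comparison repeatable across executions, a small Python sketch like the one below could scan a node-executions response for nodes that are marked `is_dynamic` but lack `is_parent_node`. The function name `nodes_missing_parent_flag` is hypothetical, and the sample dictionaries are trimmed-down versions of the payloads pasted in this thread, not full API responses.

```python
from typing import Any, Dict, List


def nodes_missing_parent_flag(response: Dict[str, Any]) -> List[str]:
    """List node_ids that are marked is_dynamic but not is_parent_node."""
    suspicious = []
    for node in response.get("node_executions", []):
        metadata = node.get("metadata") or {}
        if metadata.get("is_dynamic") and not metadata.get("is_parent_node"):
            suspicious.append(node["id"]["node_id"])
    return suspicious


# Shape of the completed execution reported above: is_dynamic only.
affected = {
    "node_executions": [
        {"id": {"node_id": "start-node"}, "metadata": {"spec_node_id": "start-node"}},
        {"id": {"node_id": "n0"}, "metadata": {"spec_node_id": "n0", "is_dynamic": True}},
        {"id": {"node_id": "end-node"}, "metadata": {"spec_node_id": "end-node"}},
    ]
}

# Shape seen on the sandbox where the issue did not reproduce: both flags set.
healthy = {
    "node_executions": [
        {
            "id": {"node_id": "n1"},
            "metadata": {"is_parent_node": True, "spec_node_id": "n1", "is_dynamic": True},
        }
    ]
}

print(nodes_missing_parent_flag(affected))  # ['n0']
print(nodes_missing_parent_flag(healthy))   # []
```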
k

katrina

06/24/2022, 5:08 PM
https://flyte-org.slack.com/archives/CNMKCU6FR/p1656028456973969?thread_ts=1655414807.481039&cid=CNMKCU6FR this isn't an indication of being cached, are there any icons like ( ) that appear in the console ?
a

Alex Pozimenko

06/24/2022, 5:55 PM
no icons on the console. I also ran new versions of the wf several times, with same consistent results
hi @katrina and @Haytham Abuelfutuh, happy Monday. Any thoughts on how we proceed from here?
h

Haytham Abuelfutuh

06/27/2022, 5:47 PM
Hey Alex, let me sync up with Katrina and follow up
👍 1
Hey @Alex Pozimenko Do you mind if we give you docker images for propeller and admin with additional logging to look into what’s going on?
k

katrina

06/27/2022, 6:31 PM
for flyteadmin, can you update your deployment to use this image: ghcr.io/flyteorg/flyteadmin:v1.1.26-node-exec-logging
h

Haytham Abuelfutuh

06/27/2022, 6:42 PM
And this ghcr.io/flyteorg/flytepropeller:v1.1.15-patch1 for flytepropeller
a

Alex Pozimenko

06/27/2022, 6:49 PM
thanks, will look into this. I actually haven't tried to repro this issue in our dev/scratch environment, so perhaps that's what I should try next 🙂
j

Jason Porter

06/28/2022, 10:36 PM
@eugene jahn
a

Alex Pozimenko

06/29/2022, 5:52 PM
hey folks, sorry about the delay. I was able to repro the same issue in our dev environment. Next will switch the images as requested above. Is there anything specific I should be looking for?
h

Haytham Abuelfutuh

06/29/2022, 6:55 PM
If you can capture the logs from propeller and admin, that would be great.. happy to jump on a call to observe if you want
a

Alex Pozimenko

06/29/2022, 8:32 PM
noticed this err in propeller log:
{"json":{"exec_id":"a2xflpcg2x5zfpkfnlrk","ns":"dev","routine":"worker-8"},"level":"error","msg":"Failed to update workflow. Error [Operation cannot be fulfilled on flyteworkflows.flyte.lyft.com \"a2xflpcg2x5zfpkfnlrk\": the object has been modified; please apply your changes to the latest version and try again]","ts":"2022-06-29T20:29:00Z"}
E0629 20:29:00.729217       1 workers.go:102] error syncing 'dev/a2xflpcg2x5zfpkfnlrk': Operation cannot be fulfilled on flyteworkflows.flyte.lyft.com "a2xflpcg2x5zfpkfnlrk": the object has been modified; please apply your changes to the latest version and try again
i see the same err in other environments too
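When capturing propeller logs for a single execution, a filter along the lines of this Python sketch may help separate the structured error entries from the rest. It assumes the JSON line layout seen in the error above (`json.exec_id`, `level`, `msg`); the function name `errors_for_execution` is hypothetical and the sample message is abbreviated.

```python
import json
from typing import Iterable, List


def errors_for_execution(lines: Iterable[str], exec_id: str) -> List[str]:
    """Collect 'msg' values of error-level JSON log lines for one execution."""
    messages = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as the "E0629 ..." entries
        if not isinstance(record, dict):
            continue
        context = record.get("json") or {}
        if record.get("level") == "error" and context.get("exec_id") == exec_id:
            messages.append(record.get("msg", ""))
    return messages


sample = [
    '{"json":{"exec_id":"a2xflpcg2x5zfpkfnlrk","ns":"dev","routine":"worker-8"},'
    '"level":"error","msg":"Failed to update workflow.","ts":"2022-06-29T20:29:00Z"}',
    "E0629 20:29:00.729217       1 workers.go:102] error syncing 'dev/a2xflpcg2x5zfpkfnlrk'",
]

print(errors_for_execution(sample, "a2xflpcg2x5zfpkfnlrk"))  # ['Failed to update workflow.']
```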
h

Haytham Abuelfutuh

06/29/2022, 8:44 PM
cc @Yee
@Alex Pozimenko that isn’t a problem per se. We’re fixing it though but I think that’s independent… digging into the logs
a

Alex Pozimenko

06/29/2022, 8:45 PM
sg
h

Haytham Abuelfutuh

06/29/2022, 8:46 PM
I think propeller’s logs are either cut off or log level is set to warnings only
👀 1
a

Alex Pozimenko

06/29/2022, 8:48 PM
i don't see explicit log level in the pod spec.
so it's whatever the default for the container
h

Haytham Abuelfutuh

06/29/2022, 8:49 PM
if you add, to the config map, this:
logger:
  level: 6
  show-source: true
You might see you already have “logger” there…
a

Alex Pozimenko

06/29/2022, 8:50 PM
got it, just need to find where it is defined
there's no env var override?
h

Haytham Abuelfutuh

06/29/2022, 8:51 PM
You can also set $LOGGER_LEVEL=6 I believe (haven’t played with that in a while though)
a

Alex Pozimenko

06/29/2022, 8:52 PM
no logger in the configmap
ok, so the updated config map should look like this?
apiVersion: v1
kind: ConfigMap
metadata:
  name: flyte-propeller-config
  namespace: "prod"
data:
  propeller: |-
    propeller:
      logger:
        level: 6
        show-source: true
      kube-client-config:
        qps: 100
        burst: 50
        timeout: 30s
      rawoutput-prefix: "s3://${flyte_bucket_name}"
....
@Haytham Abuelfutuh
❤️ 1
h

Haytham Abuelfutuh

06/29/2022, 9:48 PM
oh
that looks the same 😞
a

Alex Pozimenko

06/29/2022, 9:49 PM
yeah, i was going to say that too... does the configmap look right?
let me try the env var
h

Haytham Abuelfutuh

06/29/2022, 9:55 PM
oh no
it shouldn’t be under propeller:
apiVersion: v1
kind: ConfigMap
metadata:
  name: flyte-propeller-config
  namespace: "prod"
data:
  propeller: |-
    logger:
      level: 6
      show-source: true
    propeller:
      kube-client-config:
        qps: 100
        burst: 50
        timeout: 30s
      rawoutput-prefix: "s3://${flyte_bucket_name}"
....
👍 1
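The difference between the two config maps comes down to where `logger` sits relative to `propeller`. Assuming the embedded config has already been parsed into a dictionary (for example with a YAML parser, not shown here), a quick structural check could look like this Python sketch. The helper name `logger_is_top_level` is hypothetical, and it only checks placement, not whether propeller actually picks the setting up.

```python
from typing import Any, Dict


def logger_is_top_level(config: Dict[str, Any]) -> bool:
    """True when 'logger' is a top-level section and not nested under 'propeller'."""
    propeller_section = config.get("propeller") or {}
    return "logger" in config and "logger" not in propeller_section


# First attempt from the thread: logger nested under propeller.
nested = {
    "propeller": {
        "logger": {"level": 6, "show-source": True},
        "kube-client-config": {"qps": 100, "burst": 50, "timeout": "30s"},
    }
}

# Corrected layout: logger as a sibling of propeller.
top_level = {
    "logger": {"level": 6, "show-source": True},
    "propeller": {
        "kube-client-config": {"qps": 100, "burst": 50, "timeout": "30s"},
    },
}

print(logger_is_top_level(nested))     # False
print(logger_is_top_level(top_level))  # True
```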
a

Alex Pozimenko

06/29/2022, 10:01 PM
this looks right
h

Haytham Abuelfutuh

06/30/2022, 5:38 PM
Thanks @Alex Pozimenko I think that clarified things quite a bit. I believe I know what’s causing what you are seeing, I’m tracking down why it lands in that state though
🙏 1
Hey @Alex Pozimenko, We think we have a fix. Can you try this workaround for now? We want to set this config change. Can you modify flyte admin’s config to match this line? https://github.com/flyteorg/flyte/blob/d60da1662f3dfc616b1fdb72a323887399da1cb0/charts/flyte-core/values.yaml#L489
flyteadmin:
  eventVersion: 2
k

katrina

07/01/2022, 7:11 PM
alternatively you can use ghcr.io/flyteorg/flyteadmin:v1.1.26-node-exec-event-version which hard codes the event version
a

Alex Pozimenko

07/01/2022, 8:06 PM
on it
trying the config change now
the workaround worked
shall i apply the same in prod?
k

katrina

07/01/2022, 8:25 PM
awesome, yes please do!
a

Alex Pozimenko

07/01/2022, 8:26 PM
are there any side effects or anything we should keep an eye on after making the change?
k

katrina

07/01/2022, 8:32 PM
not really. the change only affects caching the dynamic workflow closure which is produced dynamically at run time and is solely a performance optimization. the version bump has been out for our and other users' deployments for a long time and won't affect any in-progress workflows
a

Alex Pozimenko

07/01/2022, 8:32 PM
ok, thanks