    Alex Bain

    4 months ago
    @katrina @Prafulla Mahindrakar @Haytham Abuelfutuh I updated to Flyte 1.0.1, and now workflows executed with flyte-cli execute-launch-plan (with flytekit 0.26.0) lose the k8s service account that was declared with flyte-cli register-files --kubernetes-service-account. When I execute the same workflow from Flyte Console, the k8s service account is fine.
    The bigger problem is that subworkflows also lose the k8s service account: the parent workflow uses the declared account, but the subworkflow execution falls back to "default". This one is the actual blocker for us.
    Now I need to revert to the previous version of Flyte, so I'll lose the fix for https://github.com/flyteorg/flyte/issues/2424
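A minimal, hypothetical model of the behavior expected here (a sketch, not flyteadmin's actual code): a child execution launched for a subworkflow should inherit the parent's k8s service account unless it sets its own, and should only fall back to "default" when neither is set. `ExecSpec`, `resolve_child_account`, `resolve_child_account_buggy`, and the account name are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_ACCOUNT = "default"  # namespace fallback account


@dataclass
class ExecSpec:
    # Hypothetical, simplified stand-in for an execution spec's security context.
    k8s_service_account: Optional[str] = None


def resolve_child_account(parent: ExecSpec, child: ExecSpec) -> str:
    """Expected: explicit child value, then inherited parent value, then "default"."""
    if child.k8s_service_account:
        return child.k8s_service_account
    if parent.k8s_service_account:
        return parent.k8s_service_account
    return DEFAULT_ACCOUNT


def resolve_child_account_buggy(parent: ExecSpec, child: ExecSpec) -> str:
    """Models the reported symptom: the parent's account is never consulted."""
    return child.k8s_service_account or DEFAULT_ACCOUNT


parent = ExecSpec(k8s_service_account="my-declared-sa")  # hypothetical account name
child = ExecSpec()  # subworkflow launched without its own override

print(resolve_child_account(parent, child))        # my-declared-sa
print(resolve_child_account_buggy(parent, child))  # default
```

The second function shows why the subworkflow lands on "default" even though the parent runs with the declared account.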

    katrina

    4 months ago
    hey @Alex Bain, the latest Flyte release should include a fix specifically for execution spec attributes that were previously being dropped, so this is really unexpected! Can you share the full command you use to execute workflows?

    Alex Bain

    4 months ago
    Let me double-check the commands.

    Haytham Abuelfutuh

    4 months ago
    Would you be OK with updating your flytekit version?

    Alex Bain

    4 months ago
    flyte-cli -p avexampleworkflows -d dev -h avflyteadmin.pdx.l5.woven-planet.tech execute-launch-plan -u lp:avexampleworkflows:dev:app.workflows.fabrik.spark_test_workflow.spark_workflow:542c917bfabfd61be18d9c4c71a51d552d4f8428 -r abain
    loses the declared k8s service account for me (running it from the UI works fine). This is on flytekit 0.26.0.
    ^^^^ I could switch to flytectl, but that won't fix the subworkflow executions, which lose the k8s service account even when the parent workflow is executed from the UI.

    Haytham Abuelfutuh

    4 months ago
    Oh, that's a much bigger issue... So you are on Flyte 1.0.1 for all server-side components (propeller, admin, etc.)?
    Really sorry you are having these issues with the 1.0.0 upgrade 😞

    Alex Bain

    4 months ago
    Yes, the backend is generally on 1.0.1, and I'm on flytekit 0.26.0 (since we have some old-style SDK tasks). Would updating to the latest flytekit solve the subworkflow issue?

    katrina

    4 months ago
    hey @Alex Bain, I suspect the flyte-cli issue might be due to a bug in older flytekit versions that has since been fixed, but to verify, could you use flytectl to fetch the execution spec?

    Haytham Abuelfutuh

    4 months ago
    "Would updating to the latest flytekit solve the subworkflow issue?"
    I highly doubt it... looking into this.

    katrina

    4 months ago
    something like
    flytectl get execution -p flytesnacks -d development <name>

    Alex Bain

    4 months ago
    flytectl get execution --admin.endpoint avflyteadmin.pdx.l5.woven-planet.tech:443 -p avexampleworkflows -d dev f08d22ec484784cf4a06 -o json
    ^^^^^ That snippet is from when I started the execution with flyte-cli execute-launch-plan.
    I can get users to replace flyte-cli execute-launch-plan with flytectl create execution, but my real blocker is subworkflows losing the k8s service account.
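When comparing executions launched from the console, from flyte-cli, and as subworkflows, it can help to pull the service account out of the JSON that flytectl get execution ... -o json prints. This sketch assumes the output mirrors the flyteidl Execution message (spec.security_context.run_as.k8s_service_account, with the older spec.auth_role.kubernetes_service_account as a fallback). Because the exact key casing of the JSON is an assumption, it accepts both snake_case and camelCase keys. The helper names and the sample document are made up for illustration.

```python
import json
from typing import Any, Optional


def _get(d: Any, *names: str) -> Any:
    """Return the value of the first key in `names` present in dict `d`, else None."""
    if not isinstance(d, dict):
        return None
    for name in names:
        if name in d:
            return d[name]
    return None


def service_account(execution: dict) -> Optional[str]:
    """Extract the k8s service account from an execution JSON document (assumed layout)."""
    spec = _get(execution, "spec") or {}
    security_context = _get(spec, "security_context", "securityContext")
    run_as = _get(security_context, "run_as", "runAs")
    account = _get(run_as, "k8s_service_account", "k8sServiceAccount")
    if account:
        return account
    # Fall back to the older, deprecated auth_role field (assumed key names).
    auth_role = _get(spec, "auth_role", "authRole")
    return _get(auth_role, "kubernetes_service_account", "kubernetesServiceAccount") or None


# Made-up sample standing in for real flytectl output.
sample = json.loads('{"spec": {"securityContext": {"runAs": {"k8sServiceAccount": "my-declared-sa"}}}}')
print(service_account(sample))  # my-declared-sa
```

To check a real execution, save the flytectl JSON output to a file and pass the result of json.load on that file to service_account; a None result means neither field was found under the assumed key names.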

    katrina

    4 months ago
    Gotcha. The flyte-cli service account issue should be fixed in newer versions of flytekit: the version you're on inserts "default" as an explicit k8s service account value instead of leaving it unset, so the account declared at registration gets replaced. This has since been fixed.
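A minimal sketch of why an explicit "default" from an older client is harmful, assuming (as a simplification, not Flyte's documented resolution order) that a non-empty service account in the execution request overrides the one stored on the launch plan at registration, and that an empty value means "not set". The function name and account name are hypothetical.

```python
from typing import Optional


def effective_account(request_value: Optional[str], launch_plan_value: Optional[str]) -> str:
    """Hypothetical precedence: explicit request value, then launch plan value, then "default"."""
    if request_value:  # None or "" is treated as "not set"
        return request_value
    if launch_plan_value:
        return launch_plan_value
    return "default"


# Hypothetical account declared at registration time.
registered = "my-declared-sa"

# Newer client: leaves the field unset, so the registered account is used.
print(effective_account(None, registered))       # my-declared-sa
# Older client: sends the literal string "default", which wins over the registered value.
print(effective_account("default", registered))  # default
```

Under this assumed precedence, the fix belongs on the client side: stop sending a value the user never chose.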

    Alex Bain

    4 months ago
    Would upgrading flytekit also fix the issue with subworkflows? Even when I launch the parent workflow from the UI (which shows the correct k8s service account), the subworkflow execution falls back to "default".

    katrina

    4 months ago
    yeah, this seems like a backend issue that isn't limited to flytekit 😞 I think @Haytham Abuelfutuh was looking into this.

    Alex Bain

    4 months ago
    OK, thanks for letting me know. For now I need to revert the backend components (admin, datacatalog, propeller) from Flyte 1.0.1 to 1.0.0, since we have a few big subworkflow users.

    katrina

    4 months ago
    hey @Alex Bain, to debug the broader subworkflow issue (so sorry about this!), could you run flytectl get execution on an execution you launched from the UI?

    Alex Bain

    4 months ago
    @katrina here is an execution (with a subworkflow) that I launched from the console:
    flytectl get execution --admin.endpoint avflyteadmin.pdx.l5.woven-planet.tech:443 -p avfleetscenes -d dev adb2nvlmk42w44g2f98g -o json

    katrina

    4 months ago
    Thank you! Also, if you do end up reverting your Flyte deployment, could you flytectl get a working parent workflow execution? I just want to verify whether the annotations are being updated in the older deployment.

    Alex Bain

    4 months ago
    Note: I corrected the execution ID ^^^^; it is now the flyte.workflows.fs1_scene_generation.FS1SceneReconstructionNoIndexSparkWorkflow parent workflow (whose subworkflow fails).
    Yes, once we finish reverting, I'll re-register the workflow and describe it again for you.
    FYI, here is the failing subworkflow:
    flytectl get execution --admin.endpoint avflyteadmin.pdx.l5.woven-planet.tech:443 -p avfleetscenes -d dev fsar4pvq -o json
    @katrina after reverting to Flyte 1.0.0 my parent workflow is now

    katrina

    4 months ago
    hey @Alex Bain, thanks so much! I think I found the issue; see https://github.com/flyteorg/flyteadmin/pull/422

    Alex Bain

    4 months ago
    Here is its child workflow, which is using the declared k8s service account again.

    katrina

    4 months ago
    hey @Alex Bain, thanks again for your patience! The latest flyteadmin release has the fix for subworkflow service accounts (I confirmed this on my end with our Flyte deployment). Out of curiosity, which Flyte version did you revert to? We're trying to understand just how broken this has been 😞

    Alex Bain

    4 months ago
    @katrina I reverted admin, propeller, and datacatalog to 1.0.0, and it is working fine for me on those versions.