# ask-the-community
j
Hey all, I'm creating a workflow using `flytekit.approve`. The workflow runs fine on my local machine, but when I try to register the workflow I get the following error:
raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.INTERNAL
        details = "failed to compile workflow for [resource_type:WORKFLOW project:"myproject" domain:"workflows" name:"myproject.workflows.pytorch_training.model_training_and_approval_workflow" version:"DwDJxBGH4EDglOXa-ZZwOA==" ] with err failed to compile workflow with err Collected Errors: 1
        Error 0: Code: VariableNameNotFound, Node Id: n1, Description: Variable [o0] not found on node [n1].
"
        debug_error_string = "UNKNOWN:Error received from peer  {created_time:"2023-05-09T17:24:07.713904-07:00", grpc_status:13, grpc_message:"failed to compile workflow for [resource_type:WORKFLOW project:\"myproject\" domain:\"workflows\" name:\"myproject.workflows.pytorch_training.model_training_and_approval_workflow\" version:\"DwDJxBGH4EDglOXa-ZZwOA==\" ] with err failed to compile workflow with err Collected Errors: 1\n\tError 0: Code: VariableNameNotFound, Node Id: n1, Description: Variable [o0] not found on node [n1].\n"}"
>
Any idea what's causing this?
k
seems like the n1 task doesn't have an output. could you share a code snippet?
j
yeah, the workflow looks roughly like:
@workflow
def model_training_and_approval_workflow(my_args: int):
    model_scores, mlflow_run = model_training_workflow(my_args=my_args)
    flytekit.approve(
        upstream_item=model_scores,
        name="model_score_review",
        timeout=timedelta(seconds=300),
    )
    model_staging(mlflow_run=mlflow_run)
k
could you share the @task code also
j
I was able to reproduce it with a pared-down workflow:
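from datetime import timedelta

import flytekit
from flytekit import task, workflow


@task
def mytask() -> int:
    return 1


@workflow
def simple_workflow():
    model_scores = mytask()
    # Pause the workflow until model_scores is approved or the timeout expires.
    flytekit.approve(
        upstream_item=model_scores,
        name="model_score_review",
        timeout=timedelta(seconds=300),
    )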
y
this works for me
$ cat core/approve_sample.py
from datetime import timedelta
from flytekit import task, workflow, approve


@task
def mytask() -> int:
    return 1

@workflow
def simple_workflow(
):
    model_scores = mytask()
    approve(
        upstream_item=model_scores,
        name="model_score_review",
        timeout=timedelta(seconds=300),
    )
what version is your backend on?
j
See image below from the flyte console, and we're using version 1.3.0 for the flyte helm release
I bumped the helm release up to 1.5.0 and I'm getting the same error
j
@Yee for clarity, it worked against a remote cluster when you ran it? i'm seeing the same issue running the above simple example against my eks cluster deployment
works locally fine tho
y
yeah no i was testing on our reference implementation
it’s on EKS
can you look at the logs on the backend maybe?
admin
j
ya one sec
seems to be similar output, and the rest of the logs look standard:
{
  "json": {
    "src": "workflow_manager.go:107",
    "wf": "taunty.myworkflow"
  },
  "level": "debug",
  "msg": "Failed to compile workflow with id [resource_type:WORKFLOW project:\"taunty\" domain:\"workflows\" name:\"taunty.myworkflow\" version:\"dnoVpjSXY87EJpY-fIcyzg==\" ] with err failed to compile workflow with err Collected Errors: 1\n\tError 0: Code: VariableNameNotFound, Node Id: n1, Description: Variable [o0] not found on node [n1].\n",
  "ts": "2023-05-10T21:29:28Z"
}
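(Editor's sketch, not from the thread.) The compiler error sits inside the `msg` string of the log entry, which makes it easy to miss among other debug lines. A minimal Python sketch for pulling the error code, node id, and description out of such an entry follows. The `compile_errors` helper and the regular expression are illustrative assumptions based only on the message format shown above, not part of any Flyte tooling, and the sample entry is abbreviated.

```python
import json
import re

# Matches one "Error N: ..." line in the format seen in the thread above.
# Assumption: the format is inferred from this single log entry.
ERROR_RE = re.compile(
    r"Error (?P<index>\d+): Code: (?P<code>\w+), "
    r"Node Id: (?P<node>[\w-]+), Description: (?P<description>.+)"
)


def compile_errors(log_line: str) -> list[dict]:
    """Return the compiler errors embedded in one JSON log entry's "msg" field."""
    msg = json.loads(log_line).get("msg", "")
    return [m.groupdict() for m in ERROR_RE.finditer(msg)]


# Abbreviated copy of the log entry from the thread (resource id left out).
sample = json.dumps({
    "level": "debug",
    "msg": "Failed to compile workflow with err Collected Errors: 1\n"
           "\tError 0: Code: VariableNameNotFound, Node Id: n1, "
           "Description: Variable [o0] not found on node [n1].\n",
})

errors = compile_errors(sample)
for err in errors:
    print(err["code"], err["node"], err["description"])
```

Feeding it each line of the admin pod's log output would surface only the entries that carry a compile error.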
y
can you kubectl get the flyteadmin or flyte-binary pod -o yaml | grep image
j
cr.flyte.org/flyteorg/flyteadmin:v1.1.72
j
bumping the images to 1.6.0 fixed the issue. Thanks all!
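(Editor's sketch, not from the thread.) In this thread the Helm release had been bumped to 1.5.0 while the running flyteadmin image was still tagged v1.1.72, so the image tag turned out to be the thing to check. A small Python sketch for comparing an image tag against a known-good version follows. The helper names are hypothetical, and the 1.6.0 floor is an assumption taken only from the version that worked here, not a documented minimum for `approve`.

```python
import re

# Assumption: 1.6.0 is used as the floor only because that is the version that
# worked in this thread; it is not a documented minimum for approve().
KNOWN_GOOD = (1, 6, 0)


def image_version(image: str) -> tuple[int, ...]:
    """Extract (major, minor, patch) from an image reference such as repo/name:v1.1.72."""
    match = re.search(r":v?(\d+)\.(\d+)\.(\d+)", image)
    if match is None:
        raise ValueError(f"no semantic version tag in {image!r}")
    return tuple(int(part) for part in match.groups())


def is_older_than_known_good(image: str) -> bool:
    """True when the image tag is below the version that worked in the thread."""
    return image_version(image) < KNOWN_GOOD


# The image reported by kubectl earlier in the thread.
print(is_older_than_known_good("cr.flyte.org/flyteorg/flyteadmin:v1.1.72"))
```

Running the same check over every image line from the `kubectl get ... -o yaml | grep image` output would flag any backend component left behind by an upgrade.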