Hi -- I tried my first backfill using `pyflyte bac...
# ask-the-community
d
Hi -- I tried my first backfill using `pyflyte backfill` and I messed something up with the parallelism. Now the workflow is in the state `FAILING` and I don't see a way to cancel this or clean it up. If I try to run another backfill with (hopefully) fixed parallelism settings, I get an error:
RPC Failed, with Status: StatusCode.INVALID_ARGUMENT
	details: workflow with different structure already exists
	Debug string UNKNOWN:Error received from peer  {grpc_message:"workflow with different structure already exists", grpc_status:3, created_time:"2023-09-21T16:42:09.869845-07:00"}
Is there a way to clear/cancel the backfill via command line or the Flyte UI?
For a little more detail about what went wrong, I first ran a test backfill for a period of 3 days. It ran serially and was successful. Then I changed my launch plan to include `max_parallelism=2` and passed `--parallel` to `pyflyte backfill`. Instead of launching two executions at a time, it started 25. Things started failing at the stage of writing to the database (something I need to fix) and the backfill workflow has been in the state `FAILING` for the last hour. The command I am trying to run now is:
pyflyte --config path/to/config.yml backfill \
    --project my-project-name \
    --domain prod \
    --from-date 2023-01-25 \
    --to-date 2023-03-25 \
    --serial \
    launch_plan_name
s
k
Hmm do you need a max parallel control
d
Hi -- Sorry for the long delay in getting back to you on this.
> You should be able to cancel the workflow, right?
I don't see a way to cancel the previous backfill workflow. When I browse to this workflow in the Flyte UI, the only action I can take is "Launch Workflow". I tried using `flytectl get execution` to list the existing execution so I could then try `flytectl delete execution`, but I'm not familiar enough with `flytectl` to make any progress. I'm not even sure it's a viable route.
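For reference, a minimal sketch of the `flytectl` route mentioned above, assuming `flytectl` is installed and configured for the cluster. The project and domain are copied from the command earlier in the thread, `EXECUTION_NAME` is a placeholder to be filled in from the listing, and the script only prints the two commands so they can be reviewed before being run by hand. `flytectl delete execution` terminates a running execution; whether that also clears the "different structure" error is not confirmed in this thread.

```shell
# Sketch only: prints the commands instead of running them.
PROJECT="my-project-name"
DOMAIN="prod"
# Placeholder (assumption): replace with a name taken from the listing output.
EXECUTION_NAME="EXECUTION_NAME_FROM_LISTING"

# List executions in the project/domain to find the stuck backfill.
LIST_CMD="flytectl get execution -p $PROJECT -d $DOMAIN"
# Terminate that execution by name.
TERMINATE_CMD="flytectl delete execution $EXECUTION_NAME -p $PROJECT -d $DOMAIN"

echo "$LIST_CMD"
echo "$TERMINATE_CMD"
```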
> Hmm do you need a max parallel control
I assumed that using `--parallel` would use the `max_parallelism` from the launch plan being referenced. But it doesn't seem to do that. So I think either 1) use the existing `max_parallelism` from the launch plan, or 2) support something like `--parallelism=2`.
I added a comment to this RFC about backfills in the web ui: https://github.com/flyteorg/flyte/discussions/3333
k
Please suggest how you would like it in the UI.
d
@Ketan (kumare3) I added a few ideas for the UI in that Github discussion. But I still don't see a way to cancel a workflow. When a backfill is run, it creates a workflow named "backfill-{launch_plan_name}". When that workflow fails, I only see the options "Recover" and "Relaunch". But I don't see anything to clear the state so I can run another backfill using the same launch plan.
k
Hmm let me try this. Cc @Kevin Su can you try this
d
Any luck on this? I tried passing `--execution-name` thinking that might allow a new backfill with a different name, but I get the same error.
pyflyte --config marketplace/pa/dags/bpo_ltv_model/flyte/config.yml backfill \
    --project marketplace--pa--dags--bpo-ltv-model \
    --domain dprod \
    --from-date 2022-10-01 \
    --to-date 2023-01-01 \
    --serial \
    --execution-name backfill-bpo_ltv_180d_horizon_1d_observation_20221001_20230101 \
    bpo_ltv_180d_horizon_1d_observation
...
RPC Failed, with Status: StatusCode.INVALID_ARGUMENT
	details: workflow with different structure already exists
	Debug string UNKNOWN:Error received from peer  {grpc_message:"workflow with different structure already exists", grpc_status:3, created_time:"2023-11-16T10:33:05.190869-08:00"}
Let me know if this is better handled as a github issue. I'm happy to create one if that is best. I'm just assuming I'm going about something wrong.
s
have you tried specifying the version?
d
I have not. I'm not sure what value I might use here. From the docs:
> Version for the registered workflow. If not specified it is auto-derived using the start and end date
If I look at the versions available for one of my workflows, they are the available tags for the docker image with the default being the latest version. Is this the same meaning for "version"?
s
i believe it's the version of the backfill execution. could you try changing the version and check if a new backfill gets triggered?
d
Will do.
Sorry again for the delay. Including a version value in the command worked. We just used the latest version string from the UI.
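A sketch of the command shape that resolved this, based on the first command in the thread with `--version` added. The version value below is a hypothetical placeholder (the thread used the latest version string shown in the UI), and the script only prints the command for review rather than running it.

```shell
# Sketch only: prints the backfill command instead of running it.
# Placeholder (assumption): substitute the version string you want to use.
BACKFILL_VERSION="my-new-version"

# Same flags as the original command, plus an explicit --version.
BACKFILL_CMD="pyflyte --config path/to/config.yml backfill \
--project my-project-name \
--domain prod \
--from-date 2023-01-25 \
--to-date 2023-03-25 \
--serial \
--version $BACKFILL_VERSION \
launch_plan_name"

echo "$BACKFILL_CMD"
```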