Hello, still trying to deploy workflows with CI/CD...
# ask-the-community
Hello, still trying to deploy workflows with CI/CD. I am surprised by how many rough edges there seem to be. Workflow:
    name: Register Flyte workflows
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v2

        # flytekit needs newer version than 3.8 which ships with ubuntu-latest
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - run: pip install flytekit==1.1.*

      - name: Setup flytectl
        uses: unionai-oss/flytectl-setup-action@v0.0.1

      - name: Package workflows
        shell: bash
        run: |
          pyflyte \
          --pkgs flyte.workflows package \
          --image ${{ env.DOCKER_IMAGE }} \
          --output ${{ env.FLYTE_PACKAGE }}

      - name: Register workflows
        uses: unionai-oss/flyte-register-action@v0.0.2
        with:
          project: ${{ env.FLYTE_PROJECT }}
          version: ${{ env.VERSION }}
          proto: ${{ env.FLYTE_PACKAGE }}
          domain: ${{ env.FLYTE_DOMAIN }}
          config: ${{ env.FLYTE_CONFIG }}
      # OR
      # - name: Register workflows 
      #   shell: bash
      #   run: |
      #     flytectl register files \
      #     --archive ${{ env.FLYTE_ARCHIVE }} \
      #     --project ${{ env.FLYTE_PROJECT }} \
      #     --domain ${{ env.FLYTE_DOMAIN }} \
      #     --config ${{ env.FLYTE_CONFIG }} \
      #     --version ${{ env.VERSION }}
`Package workflows` reports success, but `Register workflows` using the action fails with
Error: input package have some invalid files. try to run pyflyte package again [flyte-package.tgz]
The alternative `Register workflows` step (calling flytectl directly) is even worse. It fails with a bunch of errors like
Failed to unmarshal file /tmp/register789499772/00_flyte.workflows.workflow_name.pb
but it fails silently and still reports that resources were registered successfully. A workflow IS indeed registered on the Flyte server, but it is broken and cannot be run. Packaging and registering both work if I run them locally. Please advise on how to proceed with debugging this.
Hi @Sebastian, we use a fallback mechanism in flytectl when unmarshalling workflows, tasks, and launch plans, as shown here: https://github.com/flyteorg/flytectl/blob/master/cmd/register/register_util.go#L95-L124 If the file is none of those, it is a hard failure. If you want to continue past an error in one of the files in the package, you can use the --continueOnError flag to skip over that file and separately investigate what is wrong with how it was packaged. Regarding this
input package have some invalid files. try to run pyflyte package again
Can you check the contents of the package and see whether it contains any files other than these:
• files with the prefix fast and the suffix .tar.gz (tarred source for fast registration)
• .pb files (serialized protos of Flyte tasks, workflows, and launch plans)
If there are other files in the package, flytectl registration treats the archive as invalid.
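The package check suggested above can be done by hand with `tar -tzf`, or with a small script. The following is a minimal sketch, not flytectl's own validation logic: the helper names `is_expected_member` and `unexpected_members` are hypothetical, and treating hidden dotfiles as unexpected is an assumption made here because hidden files are a plausible culprit.

```python
import tarfile
from pathlib import PurePosixPath
from typing import List


def is_expected_member(name: str) -> bool:
    """Return True for the two file kinds described above.

    Assumption: hidden files (names starting with ".") are treated as
    unexpected even when they end in .pb, e.g. macOS "._foo.pb" files.
    """
    base = PurePosixPath(name).name
    if base.startswith("."):
        return False
    if base.endswith(".pb"):
        return True
    return base.startswith("fast") and base.endswith(".tar.gz")


def unexpected_members(archive_path: str) -> List[str]:
    """List regular files in the archive that match neither pattern."""
    with tarfile.open(archive_path, "r:*") as tar:
        return [
            member.name
            for member in tar.getmembers()
            if member.isfile() and not is_expected_member(member.name)
        ]
```

Calling `unexpected_members("flyte-package.tgz")` in the CI job right after the packaging step would show whether the archive built in CI differs from the one built locally.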
Hi @Prafulla Mahindrakar, thanks for responding! How come not everything is a hard failure? It would be nice to at least have the option to fail fast and not propagate erroneous packages. I do not want to enable continueOnError (though that seems to be the default in this case anyway). The package contains only .pb files, so I do not see why it fails.
continueOnError is not the default. It is a hard failure if flytectl is unable to unmarshal a file into any of the known Flyte types (task, workflow, launch plan, etc.). The log line you see could be due to an attempt to unmarshal a launch plan into, e.g., a task, which would fail. Check the code where it defines the order of unmarshalling and then, at the end, fails the registration if none of the types match. continueOnError only applies to a package or folder that contains multiple files, when you want to skip over failures in some of them. The default behavior is to fail fast. It is also strange: if the package contains only .pb files, nothing should be added to the invalid list here: https://github.com/flyteorg/flytectl/blob/6a3744c37460850d5bae4a753c13b4ffc5f4f428/cmd/register/register_util.go#L858 Are there any hidden files?
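To make the behavior described above concrete, here is a rough Python sketch of the pattern: try each known type in turn, treat a miss as a log-worthy event only, and fail hard only when nothing matches. It is not flytectl's code (which is written in Go). The parsers, their order, and the names `unmarshal_with_fallback`, `register_all`, and `PARSERS` are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Parser = Callable[[bytes], object]


def unmarshal_with_fallback(
    data: bytes, parsers: Sequence[Tuple[str, Parser]]
) -> Tuple[str, object]:
    """Try each parser in order; raise only if none of them accepts the data."""
    misses = []
    for kind, parse in parsers:
        try:
            return kind, parse(data)
        except ValueError as err:
            # A miss is expected during fallback: worth a log line, not a failure.
            misses.append(f"{kind}: {err}")
    raise ValueError("no known type matched (" + "; ".join(misses) + ")")


def register_all(
    files: Dict[str, bytes],
    parsers: Sequence[Tuple[str, Parser]],
    continue_on_error: bool = False,
) -> List[str]:
    """Return the names that failed; stop at the first failure by default."""
    failed = []
    for name, data in files.items():
        try:
            unmarshal_with_fallback(data, parsers)
        except ValueError:
            if not continue_on_error:
                raise  # fail fast
            failed.append(name)
    return failed


def _expect_prefix(prefix: bytes) -> Parser:
    """Toy parser standing in for a real protobuf unmarshal call."""
    def parse(data: bytes) -> object:
        if not data.startswith(prefix):
            raise ValueError(f"does not start with {prefix!r}")
        return data[len(prefix):]
    return parse


# Illustrative order only; the real order is defined in register_util.go.
PARSERS = [
    ("task", _expect_prefix(b"T:")),
    ("workflow", _expect_prefix(b"W:")),
    ("launch_plan", _expect_prefix(b"L:")),
]
```

In this sketch a launch plan file produces two "miss" messages before it parses successfully, which would be consistent with seeing "Failed to unmarshal" log lines for files that nevertheless register fine.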
I am reading through the 'failed to unmarshal' error logs a bit more carefully, and I see some odd-looking entries with
\"$ref\": \"#/definitions/StructSchema\"
This is a JSON-serialized struct I pass as an argument to tasks, so it could be related. I will remove the JSON-serialized args and see what happens.
I think there are two errors mentioned in this thread:
• > Error: input package have some invalid files. try to run pyflyte package again [flyte-package.tgz]
  For this one, I was suggesting looking more closely at the list of invalid files.
• > Failed to unmarshal file /tmp/register789499772/00_flyte.workflows.workflow_name.pb
  > but it fails silently and still reports succeeding to register resources. A workflows IS indeed registered on the Flyte server, but it is broken and cannot be run.
  For this one, is the final output a success, as you mentioned? And what do you mean by "it is broken and cannot be run"?
If registration accepts a task or workflow and succeeds, but the workflow then fails at launch time, it could be that the compiler didn't catch the issue, which we can dig into with the SDK team's help.
Removing structs did not make any difference. By "broken" I mean the workflow errors with
SYSTEM ERROR! Contact platform administrators.
Flytekit raises this one as FlyteScopedSystemException while executing the Python task. I guess this requires a deeper look into your task definition using the SDK. Can you share your workflow code? Adding @Samhita Alla @Kevin Su for help with this one.
No, I cannot share it due to company policy. I might be able to put together an MRE though.
Yes, just a general outline of your task and workflow definitions should do.
I get the same errors just with a toy example like
import logging
import pandas as pd
from flytekit import workflow, task

log = logging.getLogger(__name__)

# Task that produces a small DataFrame.
@task
def query_data() -> pd.DataFrame:
    log.info("querying data")
    return pd.DataFrame({"a": [1, 3, 4], "b": [4, 5, 6]})

# Task that adds a scalar to every cell of the DataFrame.
@task
def do_a_thing(df: pd.DataFrame, x: int) -> pd.DataFrame:
    log.info("doing the thing")
    return df + x

# Workflow that chains the two tasks.
@workflow
def wf(x: int):

    df = query_data()
    do_a_thing(df=df, x=x)

if __name__ == "__main__":
    # Assumed local invocation; the body of this block is missing from the message.
    wf(x=1)
And like I mentioned, everything works locally
@Sebastian Thanks, let me test it
Thanks, appreciate it
Cc @Eduardo Apolinario (eapolinario) / @Yee, let's capture this as an area of improvement. cc @Samhita Alla
@Sebastian, are you still seeing the error?
@Samhita Alla no, it was solved. I don't even know what solved it, but we moved some things around in our Docker container and now it seems to be working. However, the "failed to unmarshal" messages are still there, even though the workflow can be run successfully.
For the unmarshal error, we’d need to dig deeper into your workflow code. I’m assuming not all workflows show the unmarshal error. If so, we could see what’s wrong with one of your workflows. If you could share the details, that’d be helpful (a redacted version should suffice)!