# flyte-support
s
Hi all, has anyone run into problems using Azure Blob Storage as Flyte storage? It works great for "normal" tasks, but when I define a dynamic workflow, some additional python tasks are started before the actual workflow tasks execute, and those containers do not get Azure credentials and fail.

Some more details: for "normal" raw-container tasks there is a downloader (init container) and a co-pilot (sidecar container), both using the same `flyte-copilot` image. The downloader and the co-pilot automatically receive the command-line arguments `--storage.stow.config account=<AzureStorageAccount>` and `--storage.stow.config key=<AzureAccessKey>`, which allow the containers to download and upload data to Azure. However, marking a workflow as `@dynamic` leads to additional python tasks (i.e. tasks I do not define explicitly in my workflow); those use the `flytekit` image, their executable is of course very different, and they do not get any credentials but still try to download data from Azure. As a result the task fails with something like:
```
[1/1] currentAttempt done. Last Error: USER::
[almhfbtcdk4tzqfwxpqt-n0-0] terminated with exit code (1). Reason [Error]. Message: 
ct to account for Must provide either a connection_string or account_name with credentials!!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/pyflyte-fast-execute", line 8, in <module>
    sys.exit(fast_execute_task_cmd())
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
```
which is of course to be expected if no credentials are passed to the container. Has anybody seen this problem? Is it possible to provide the necessary credentials to the flytekit container somehow? Alternatively, is it possible to avoid starting these extra python tasks at all (while retaining the dynamic nature of the workflow)? Thanks a lot for any advice!
f
So a dynamic workflow needs a python task to run first; that task generates the workflow that then runs
So it is using fsspec
You need to use a service account with creds. Azure folks please help
👍 1
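To make that concrete, here is a minimal sketch (task and function names are made up, not taken from the thread) of the kind of `@dynamic` workflow being discussed. The body of the `@dynamic` function is executed at run time as its own python task in the `flytekit` image, and that container reads inputs and writes the generated sub-workflow through fsspec, which is why it needs Azure credentials like any user-defined task.

```python
# Minimal sketch (assumed names) of a dynamic workflow. The body of the
# @dynamic function runs at execution time as an extra python task in the
# flytekit image; it reads its inputs from and writes the generated
# sub-workflow to blob storage via fsspec, so that container needs Azure
# credentials just like a user-defined task.
from flytekit import dynamic, task

@task
def train(model: str) -> str:
    return f"trained {model}"

@dynamic
def train_each(models: list[str]):
    # This loop is evaluated at run time, inside the extra python task
    # that appears before the actual workflow tasks.
    for m in models:
        train(model=m)
```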
s
Thanks for your quick response! So flytekit is using fsspec, correct? And fsspec should be able to get the credentials, as long as they are provided by the service account, correct? Do you mean the k8s ServiceAccount here?
f
yes ServiceAccount
s
Hi @freezing-airport-6809, I added the environment variables AZURE_STORAGE_ACCOUNT_NAME and AZURE_STORAGE_ACCOUNT_KEY to the pod and access seems to work now! However, there is a new error:
```
{"asctime": "2024-07-08 12:00:44,343", "name": "flytekit", "levelname": "ERROR", "message": "Exception when executing task workflows.test-training.test_training_wf, reason 'PythonFunctionWorkflow' object has no attribute 'dispatch_execute'"}
{"asctime": "2024-07-08 12:00:44,343", "name": "flytekit", "levelname": "ERROR", "message": "!! Begin Unknown System Error Captured by Flyte !!"}
{"asctime": "2024-07-08 12:00:44,343", "name": "flytekit", "levelname": "ERROR", "message": "Traceback (most recent call last):\n  File \"/usr/local/lib/python3.11/site-packages/flytekit/bin/entrypoint.py\", line 99, in _dispatch_execute\n    outputs = _scoped_exceptions.system_entry_point(task_def.dispatch_execute)(ctx, idl_input_literals)\n                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^\nAttributeError: 'PythonFunctionWorkflow' object has no attribute 'dispatch_execute'\n"}
{"asctime": "2024-07-08 12:00:44,343", "name": "flytekit", "levelname": "ERROR", "message": "!! End Error Captured by Flyte !!"}
```
This seems more like an internal error... Perhaps it is not related to Azure at all (`dispatch_execute` seems to be very basic flytekit functionality: https://docs.flyte.org/en/latest/api/flytekit/generated/flytekit.PythonInstanceTask.html). Any ideas? Thanks!
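On the credentials part that now works: Flyte's Azure access goes through fsspec (the `abfs://` protocol is handled by the adlfs backend), so one way to double-check from inside such a pod that the two environment variables are sufficient is a small sketch like the one below. The container name `"test"` is only an example, and flytekit's own credential lookup may differ; this only verifies that the values themselves are valid.

```python
# Small sanity check (an illustration, not part of the thread's setup):
# build the abfs filesystem explicitly from the two environment variables
# that were added to the pod and list an example container.
import os
import fsspec

fs = fsspec.filesystem(
    "abfs",  # served by adlfs, the fsspec backend for Azure Blob Storage
    account_name=os.environ["AZURE_STORAGE_ACCOUNT_NAME"],
    account_key=os.environ["AZURE_STORAGE_ACCOUNT_KEY"],
)
print(fs.ls("test"))  # "test" is an example container name; adjust as needed
```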
f
Woah, what? That is indeed weird. Can you help with a code snippet so we can reproduce it?
s
@freezing-airport-6809 This might be tricky... I could try to build a minimal example (we have a pretty complex setup with custom images in a private container registry, etc.), but what about testing this with Azure? I guess I'd have to set up some dummy (public) storage account or something like that. Or am I making things too complicated here? :-D
At the end of the day it's a simple definition of a dynamic workflow:
```python
@workflow()
@dynamic
def test_training_wf(
    list_of_models_to_train: list[str] = MODELS_TO_TRAIN, run_etl: bool = False):
  ...
  <some stuff here>
```
The `@dynamic` decorator is needed to e.g. loop over the input parameter `list_of_models_to_train` (not known at compile time). The Flyte backend was configured via Helm chart to use Azure as storage. That's about it, I think. What is in the workflow itself should be irrelevant; the workflow never reaches the downstream tasks...
Would be cool if you could reproduce it!
Not sure this helps, but here is a snippet from the pod definition showing the commands executed in this python task (one of which seems to fail with the above error):
```yaml
spec:
  affinity: {}
  containers:
  - args:
    - pyflyte-fast-execute
    - --additional-distribution
    - abfs://test/test/development/MEXZ5PXKZIVE6SCWXNKNWSR7HI======/fastaf9a6c8504c87b1b6e0accc207d57fe0.tar.gz
    - --dest-dir
    - /root
    - --
    - pyflyte-execute
    - --inputs
    - abfs://test/metadata/propeller/test-development-akn7q7bbpxrst2s6bskn/n0/data/inputs.pb
    - --output-prefix
    - abfs://test/metadata/propeller/test-development-akn7q7bbpxrst2s6bskn/n0/data/3
    - --raw-output-data-prefix
    - abfs://test/data/dj/akn7q7bbpxrst2s6bskn-n0-3
    - --checkpoint-path
    - abfs://test/data/dj/akn7q7bbpxrst2s6bskn-n0-3/_flytecheckpoints
    - --prev-checkpoint
    - abfs://test/data/ok/akn7q7bbpxrst2s6bskn-n0-2/_flytecheckpoints
    - --resolver
    - flytekit.core.python_auto_container.default_task_resolver
    - --
    - task-module
    - workflows.test-training
    - task-name
    - test_training_wf
    image: cr.flyte.org/flyteorg/flytekit:py3.11-1.12.1b4
```
Update: it seems the problem only appears when both `@dynamic` and `@workflow` are used. Previously I was using `@dynamic` only for subworkflows and there is no problem for those, i.e. it also works with Azure! Thanks a lot! (I haven't compared in detail whether the flytekit tasks get different arguments... I can have a look!)
f
You cannot use both `@workflow` and `@dynamic` on the same function
Just choose one
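A sketch of the suggested fix, with illustrative names based on the snippet above: keep a single decorator per function, and let a plain `@workflow` entrypoint call the `@dynamic` part.

```python
# Sketch of the suggested pattern (names are illustrative): the entrypoint
# is a plain @workflow, and the runtime loop lives in a separate @dynamic
# workflow that it calls.
from flytekit import dynamic, task, workflow

MODELS_TO_TRAIN = ["model_a", "model_b"]  # placeholder default, assumed here

@task
def train_model(model: str) -> str:
    return f"trained {model}"

@dynamic
def train_models(list_of_models_to_train: list[str]):
    # The list is only known at run time, hence @dynamic.
    for m in list_of_models_to_train:
        train_model(model=m)

@workflow
def test_training_wf(list_of_models_to_train: list[str] = MODELS_TO_TRAIN):
    # No @dynamic here; the dynamic behaviour is delegated to train_models.
    train_models(list_of_models_to_train=list_of_models_to_train)
```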