Hi folks I am attempting to set up azure blob thr...
# ask-the-community
m
Hi folks, I am attempting to set up azure blob through stow. My values for storage settings look like this:
storage:
  type: custom
  enableMultiContainer: true
  limits:
    maxDownloadMBs: 500000
    type: custom
  bucketName: "{{ .Values.userSettings.azure.containerName }}"
  custom:
    container: "{{ .Values.userSettings.azure.containerName }}"
    enable-multicontainer: true
    connection: {}
    type: stow
    stow:
      kind: azure
      config:
        account: "{{ .Values.userSettings.azure.storageAccountName }}"
        key: "{{ .Values.userSettings.azure.storageAccountKey }}"
Trying to run the init workflow examples throws:
Traceback (most recent call last):
  File "/opt/venv/bin/pyflyte-execute", line 8, in <module>
    sys.exit(execute_task_cmd())
  File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/venv/lib/python3.8/site-packages/flytekit/bin/entrypoint.py", line 470, in execute_task_cmd
    _execute_task(
  File "/opt/venv/lib/python3.8/site-packages/flytekit/exceptions/scopes.py", line 160, in system_entry_point
    return wrapped(*args, **kwargs)
  File "/opt/venv/lib/python3.8/site-packages/flytekit/bin/entrypoint.py", line 348, in _execute_task
    _handle_annotated_task(ctx, _task_def, inputs, output_prefix)
  File "/opt/venv/lib/python3.8/site-packages/flytekit/bin/entrypoint.py", line 291, in _handle_annotated_task
    _dispatch_execute(ctx, task_def, inputs, output_prefix)
  File "/opt/venv/lib/python3.8/site-packages/flytekit/bin/entrypoint.py", line 160, in _dispatch_execute
    ctx.file_access.put_data(ctx.execution_state.engine_dir, output_prefix, is_multipart=True)
  File "/opt/venv/lib/python3.8/site-packages/flytekit/core/data_persistence.py", line 476, in put_data
    raise FlyteAssertion(
flytekit.exceptions.user.FlyteAssertion: Failed to put data from /tmp/flyte-z58pqpy5/sandbox/local_flytekit/engine_dir to <abfs://flyte-workflows/metadata/propeller/>...

Original exception: No plugin found for matching protocol of path <abfs://flyte-workflows/metadata/propeller/>...
Are there specific steps to enable/support the abfs protocol, beyond configuring stow as above?
Calling @Nick Müller (MorpheusXAUT). It seems you have the closest related issue on github. Maybe you know what's up? 🙂
n
Hi @Mathias Andersen! your configuration looks fine to me, we're using basically the same (except you have an extra `type: custom` in the `limits` block that should not be required, I believe). We're using a fork of adlfs in our flyte deployments since some `FlyteDirectory` handling was broken with the upstream version (our PR has finally been merged, but no new release was created yet, so we still couldn't contribute it back to upstream flytekit), however that should not be the issue you're seeing right here.
I'm not a python/flytekit expert unfortunately, but it looks to me that abfs protocol support was potentially broken over the last few months (we're internally still based on flytekit 1.1.0). These two commits come to mind: https://github.com/flyteorg/flytekit/commit/6f19d7435b11df6384cf9ed099ad74ce9fbe651f https://github.com/flyteorg/flytekit/commit/28da983bba36e243bc671f9ba1aa53a0791efd62 Especially the second one (related to https://github.com/flyteorg/flytekit/pull/1526) looks like it might be the culprit since handling was changed quite a bit, but that's still a beta release if I saw that correctly
Azure support is still somewhat experimental I believe, we're still working out the fsspec/adlfs issues internally before contributing our changes back upstream, plus documentation on a "proper" Azure setup is still outstanding as well 😅
which version of flytekit are you running?
m
Good observations. Thanks! - I'm running 1.4 and will give 1.1 a go. Should I consider which chart version I use for flyte-core deployment as well?
n
I'm honestly not sure what version of the rest of the flyte components you'll have to pick in order to be compatible with flytekit 1.1, we're running an internal fork of most components to allow for in-house flyte development before we push changes back upstream, so we're a bit out of sync with the upstream. we're also quite a few months back on updates due to lack of time on my side 😅
I think we're roughly based on the 1.1.0 release of flyte (https://github.com/flyteorg/flyte/tree/v1.1.0/charts/flyte-core), although we're using slightly newer versions of flyteadmin/flytepropeller, I would estimate around Q3/Q4 of last year, but cannot put it to an exact version as we're working based on the `master` branch state at the time of forking
we have internal tasks to update to the latest versions of flyte using the upstream published releases soon, so we'll probably run into a similar issue like you've seen. I assume we'll need to re-add abfs support like I did a while ago (https://github.com/flyteorg/flytekit/pull/1109)
no ETA on that from my side though unfortunately 😕 it's on my plate, but so is a lot else unfortunately 😅
m
Roger that. I'll give it a jab and report back. Maybe we should add some docs for experimental AKS/azure blob setup? - Just to indicate the state of maturity and (hopefully) a known path to something that isn't smoking 😄
Thank you for the good feedback!
n
that's been on my todo list for a while now too (adding some basic docs), but I haven't gotten around to doing so unfortunately 🙈
m
Changing to flytekit 1.1 swapped the protocol to HTTPS, but results in response 400.
pyflyte run --remote workflows/example.py wf --name 'hello'

File "/home/user/.local/share/virtualenvs/first-attempt-to-flyte-Ae80d762/lib/python3.8/site-packages/flytekit/core/data_persistence.py", line 447, in put_data
    raise FlyteAssertion(
flytekit.exceptions.user.FlyteAssertion: Failed to put data from /tmp/tmprmqf8xzd/script_mode.tar.gz to <https://my.blob.core.windows.net/flyte-workflows/flytesnacks/development/FOVDCPZCJ447LTKJTX5HCLBKPA======/scriptmode.tar.gz?se=2023-04-04T13%3A50%3A53Z&sig=V7%2B9c7Y5e4Oc0sf2XbxVXOtWssJIdCmxst4wtWLzzyI%3D&sp=aw&spr=https&sr=b&sv=2019-12-12> (recursive=False).

Original exception: Value error!  Received: 400. Request to send data failed.
k
So @Nick Müller (MorpheusXAUT) @Mathias Andersen Flyte backend and flytekit are not tightly dependent on a version
But we have migrated all flytekit storage subsystem to fsspec now
And worked with fsspec to fix a lot of bugs
But. Sadly my team has no access to azure and would love help
The fsspec team is very receptive to us
n
the only bug we encountered with fsspec or adlfs more precisely was fixed by this: https://github.com/fsspec/adlfs/pull/398 unfortunately there's no new tag yet, so it's not fully available. but that should not cause the issue Mathias is seeing
k
@Nick Müller (MorpheusXAUT) has been promising me to get azure support upstreamed, but I have not seen anything and this has been months
Hmm that seems like an issue with signed urls
n
we've been waiting for the adlfs fix to finally be merged, had another PR open for a few months before that - without that, `FlyteDirectory` didn't work properly, so contributing all changes upstream would've not been very useful
k
Also @Nick Müller (MorpheusXAUT) you should get onto flytekit 1.5 beta. Can you help test it? It supports lots of cool things: streaming files and directories, multi list maps, better caching on map tasks, pyflyte run multi file, etc.
n
I'd love to, there's been people requesting an update to a newer version anyways, unfortunately my time is quite limited atm. but that's a different topic/not related to Mathias' issue
to rule out an invalid azure blob storage configuration: did you test the account/key in e.g. azure storage explorer to make sure the credentials work @Mathias Andersen? are they IP restricted by any chance?
m
I tried a bad key, which causes pods to fail at startup. I just tried downgrading to flyte-core 1.1.0 as well, same issue, but I also note that when running `pyflyte run --remote`, I get the response 400 on HTTPS error, whereas when I package, register and execute the workflow on flyte, it throws the plugin (for abfs) error.
I'll check out storage explorer
I see inputs and user_inputs files. So something is in fact working! (still on 1.1.0 for both core and kit)
Flyte reports (same as in OP):
Original exception: No plugin found for matching protocol of path abfs://...awqq2pc7mkjlnmql7vcf/n0/data/0
Which I would interpret as a failure to write data, but this path contains an inputs.pb file with expected content!
k
Ohh ya you have to install 2 additional dependencies
Flytekitplugins-data-fsspec
And abfs
m
I see. In the package image?
n
adlfs, not abfs, right? 🤔 at least that's what we've been using so far to handle the azure blob storage protocol
we're installing these two in the projects that use flytekit itself, yeah
so in the same image your code is running in
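The naming above is easy to mix up: `abfs` is the protocol that appears in paths, while `adlfs` is the Python package that provides the filesystem for it. A minimal sketch of a check you could run inside the task image follows. The helper names and the protocol-to-package table are illustrative assumptions, not part of flytekit, and the check only tells you whether the package can be imported, not whether flytekit will pick it up.

```python
import importlib.util
from urllib.parse import urlparse

# Assumed mapping from path protocol to the pip package providing the
# fsspec filesystem for it. "abfs" is the protocol, "adlfs" the package.
PROTOCOL_TO_PACKAGE = {
    "abfs": "adlfs",
    "s3": "s3fs",
    "gs": "gcsfs",
}


def protocol_of(path: str) -> str:
    """Return the protocol part of a storage path, e.g. 'abfs'."""
    return urlparse(path).scheme or "file"


def missing_package(path: str, is_installed=None):
    """Return the package that appears to be missing for this path.

    Returns None when the protocol is not in the table above or when the
    mapped package is importable in the current environment.
    """
    if is_installed is None:
        # Default check: can the module be located in this environment?
        def is_installed(name):
            return importlib.util.find_spec(name) is not None

    package = PROTOCOL_TO_PACKAGE.get(protocol_of(path))
    if package is None or is_installed(package):
        return None
    return package


if __name__ == "__main__":
    path = "abfs://flyte-workflows/metadata/propeller/"
    missing = missing_package(path)
    if missing:
        print(f"{protocol_of(path)}:// paths need `pip install {missing}` in the task image")
    else:
        print(f"a filesystem package for {protocol_of(path)}:// appears to be installed")
```

Running this in the same image as the workflow code is the point: a package installed only on the machine that registers the workflow does not help the task pod.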
m
Roger
Original exception: unable to connect to account for Must provide either a connection_string or account_name with credentials!!
I guess I need to review the fsspec config. Files were still produced though.
No luck so far. I'm shutting down for the day. It feels like we are getting close though! 🙂
Ok. - So the backend uses stow as configured through chart values and flytekit uses fsspec. This explains why I see files correctly written as well as an error about credentials. fsspec/adlfs wants an account name and key, which it will attempt to collect in various ways. - The credentials, as far as I can gather, are also unrelated to the stow config. Do you recommend/prefer a way to offer the azure name and key to fsspec?
k
Ohh we recommend using serviceaccount to get the role. On AWS we use eks serviceaccount for IAM role, on GCP workload identity. On Azure: @Nick Müller (MorpheusXAUT)
n
Azure has pod identity (still in preview, as usual on Azure, but already being superseded 😄) and workload identity (also preview, but the new way to do things), however we have not used them in combination with Flyte yet, so I don't have any experience in that regard unfortunately
mounting service principal credentials might work as well if fsspec has an Azure implementation using MSAL (which most libraries built for Azure do) since that can read credential files mounted in a well-known location as well, however you'd probably have to mount that to every pod executed by flyte, which is a bit cumbersome, so pod/workload identities are probably the preferred way to do that
k
It should not be cumbersome - you can use default pod templates for this right
n
oh good point, fair enough, completely forgot about that. so yeah, if fsspec supports that (I'd have to check, don't know off the top of my head), you could also try mounting a secret with the service principal credentials to your pod and it might be able to pick it up and authenticate using that
workload identity would be the "cleaner" way probably, but that for sure is a bit annoying to configure in Azure, not as straight-forward as in AWS unfortunately
m
fsspec+adlfs does indeed implement MSAL through the azure-identity package. I am now able to run the init example workflow on flyte without errors. - I see no "output" files on blob. I expected the output to be stored like the input, but can't spot it in azure storage explorer. Thank you for your help this far. - I think the remaining work on my end will be to implement workload identities in a nice manner and document the setup. Hopefully we will be able to share a getting started preview on AKS 🙂
Oh, I need to mention that `pyflyte register workflows` and `pyflyte run --remote workflows/example.py wf --name myname` throw my original error: "Value error! Received: 400. Request to send data https://... Failed to put data from /tmp/tmp52zl9lno/script_mode.tar.gz to https://..." I guess the pyflyte cli does not use the data persistence layer from flytekitplugins-data-fsspec (or 1.5, I just updated)? @Ketan (kumare3)
Update: `pyflyte register workflows --non-fast` works. No idea why fast-register breaks, but it explains why `run --remote` fails as well.
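One thing that may be worth ruling out for the 400 on the signed URL (this is a guess, the thread does not confirm the cause): the Azure Put Blob REST operation requires an `x-ms-blob-type` header, and a plain HTTP PUT without it is rejected. The sketch below only builds such a request with the standard library so the headers can be inspected. The function name and the placeholder URL are made up for illustration, and nothing here reflects what pyflyte actually sends.

```python
import urllib.request


def build_blob_put(signed_url: str, payload: bytes) -> urllib.request.Request:
    """Build (but do not send) a PUT request for an Azure Blob SAS URL."""
    return urllib.request.Request(
        signed_url,
        data=payload,
        method="PUT",
        headers={
            # Required by the Azure Put Blob REST operation.
            "x-ms-blob-type": "BlockBlob",
            "Content-Length": str(len(payload)),
        },
    )


if __name__ == "__main__":
    # Placeholder URL: substitute the signed URL from the error message.
    req = build_blob_put(
        "https://example.blob.core.windows.net/container/blob?sig=PLACEHOLDER",
        b"hello",
    )
    print(req.get_method(), req.header_items())
    # To actually upload, pass the request to urllib.request.urlopen(req).
```

If a manual upload with this header succeeds against the same signed URL while fast registration fails, that would point at the upload request rather than at the storage account configuration.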
Some AKS/azure blob observations regarding 1.5: The functioning setup with flytekit 1.4.2, flytekitplugins-data-fsspec 1.4.2 and adlfs is not mirrored in the current fsspec implementation in flytekit 1.5. It seems the packages and plugins are there, but not used the same way. I get the error:
No plugin found for matching protocol of path abfs://...
j
Hey, yep I am getting a similar error, "Received 400: Request to send data ... failed", when running `pyflyte run --remote`. I am on flyte 1.4.3 and flytekit 1.4.2. I am not sure if this is because the signed url collides with blob url dns.
@Mathias Andersen I am pretty new to flyte... were you able to run a workflow otherwise? I did try minio instead of azure blob storage, with which I can start up a pod, but I'm getting "Flytekit.exceptions.user.FlyteAssertion: Failed to get data from s3://my-s3-bucket/Original exception: Unable to locate credentials"
m
I investigated whether flytekit just needs some configuration locally for targeting azure blob storage via fsspec and abfs://, but I have not found the "missing link"
k
Thank you for raising this, we should not need data plugin anymore
Cc @Yee - but we do not have any azure experience
If you folks can help fix 🙏
m
@Jeongwon Song You can see a working storage setup towards azure blob in OP. It is fairly basic on the azure side of things. You may also need to configure propeller to point raw data (different from metadata) to this blob storage: https://github.com/flyteorg/flyte/blob/master/charts/flyte-core/values-eks.yaml#L228 should be abfs://{{ .Values.userSettings.bucketName }}/ in the AKS/Azure blob scenario. So, the abfs:// protocol name points flytekit towards the fsspec/adlfs data layer plugin, which attempts to auth via azure identity/defaultcredentials. As a test I just added the storage account name and key to a custom container's env vars, but there is a nicer solution.
At this point I only experience problems with fast registration via pyflyte. I have not spotted the issue yet.
y
hey @Mathias Andersen do you have a sec to help me understand this?
in the old code we just iterated through `known_implementations`, which is this map here
the correct string to use is `abfs`, not `adlfs`, and that should work
(flytekit) ytong@Yees-MBP:~/go/src/github.com/flyteorg/flytekit [flyte-sandbox] (master) $ ipython
Python 3.10.8 (main, Oct 13 2022, 09:48:40) [Clang 14.0.0 (clang-1400.0.29.102)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.11.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from flytekit.core.context_manager import FlyteContextManager

In [2]: ctx = FlyteContextManager.current_context()

In [3]: ctx.file_access.get_filesystem("abfs")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/envs/flytekit/lib/python3.10/site-packages/adlfs/spec.py:447, in AzureBlobFileSystem.do_connect(self)
    446     else:
--> 447         raise ValueError(
    448             "Must provide either a connection_string or account_name with credentials!!"
    449         )
    451 except RuntimeError:
m
It looks to me like you set it up correctly, but fsspec/adlfs unsuccessfully looks for account_name and creds in the container. How you choose to pass these credentials is up to you: https://github.com/fsspec/adlfs#setting-credentials
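As a rough illustration of the environment-variable route described in the adlfs README linked above, the sketch below reports which explicit credential combination is present in a container's environment. The function name is hypothetical, the variable names are the ones listed in that README as far as I know, and the real lookup inside adlfs is more involved (it may also try other mechanisms such as azure-identity defaults), so treat a `None` result as "nothing explicit found", not as proof that authentication will fail.

```python
import os

# Variables needed together for service principal authentication.
SERVICE_PRINCIPAL_VARS = (
    "AZURE_STORAGE_TENANT_ID",
    "AZURE_STORAGE_CLIENT_ID",
    "AZURE_STORAGE_CLIENT_SECRET",
)


def explicit_credential_mode(env=None):
    """Describe which explicit adlfs credentials the environment provides.

    Returns a short label, or None when no complete combination is set.
    """
    env = os.environ if env is None else env
    if env.get("AZURE_STORAGE_CONNECTION_STRING"):
        return "connection string"
    if not env.get("AZURE_STORAGE_ACCOUNT_NAME"):
        # Every remaining option needs the account name.
        return None
    if env.get("AZURE_STORAGE_ACCOUNT_KEY"):
        return "account name + account key"
    if env.get("AZURE_STORAGE_SAS_TOKEN"):
        return "account name + SAS token"
    if all(env.get(name) for name in SERVICE_PRINCIPAL_VARS):
        return "account name + service principal"
    return None


if __name__ == "__main__":
    mode = explicit_credential_mode()
    print(mode or "no explicit storage credentials found in the environment")
```

Running it inside the task container (for example via a default pod template that injects the variables from a secret) shows what the task process can actually see, which is separate from the stow settings used by the backend.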