# ask-the-community
v
Hello team, I am trying to pass s3_client object to a function. But it is giving the following error:
Failed with Unknown Exception <class 'AssertionError'> Reason: Error encountered while executing 'read_buckets':
  Failed to Bind variable s3_client for function files.list_buckets.
I suppose it is because it is not one of the natively supported types? Will I have to write a custom FlyteType class as explained here? Any alternatives?
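For context, my code is shaped roughly like this (a simplified sketch, not the full file):

import boto3
from flytekit import task, workflow

@task
def list_buckets(s3_client) -> list:
    return [b["Name"] for b in s3_client.list_buckets()["Buckets"]]

@workflow
def read_buckets() -> list:
    s3_client = boto3.client("s3")
    return list_buckets(s3_client=s3_client)  # this is where the bind fails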
k
This will use Python pickle, and pickling HTTP clients is dangerous
v
can you please elaborate?
Also, do we have any tutorial examples for interacting with AWS in tasks and workflows? I am unable to find them 😶 @Ketan (kumare3)
k
You are always interacting
All data is stored in s3
Just create a boto client and go for it
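e.g. a minimal sketch, assuming boto3 is in your image and the task pod has AWS credentials:

from typing import List

import boto3
from flytekit import task, workflow

@task
def list_buckets() -> List[str]:
    # create the client inside the task, at runtime, so nothing
    # un-serializable ever crosses the task boundary
    s3 = boto3.client("s3")
    return [b["Name"] for b in s3.list_buckets()["Buckets"]]

@workflow
def read_buckets() -> List[str]:
    return list_buckets()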
v
I am sorry @Ketan (kumare3), but this is not very clear to me. Could you please provide additional context, perhaps with some code snippets?
r
@Varshit Dusad I think what @Ketan (kumare3) is saying is that all artifacts are stored in object storage. In your case, are you still running the single-pod deployment of Flyte? If so, it's likely that your artifact store is in fact the minio instance inside your deployment, which your workflows are always interacting with. Beyond that, if you are trying to interact with an external S3 bucket outside of your deployment, then yes, I think the approach you would have to take is what you are trying to do there.
v
Thanks @Ryan Russon, that's what I meant. Essentially I am testing a simple procedure such as reading in a dataframe and then uploading it as a CSV to my own private bucket, but I keep getting a permission error. I have added the entire code and error log in this channel below, and I will link it here as well. I am aware I can solve the above problem with a native boto implementation, but then I fail to see how to leverage FlyteFile for the same (if that's its purpose).
k
Aah you want the remote path to be your s3 bucket?
v
Oh @Ketan (kumare3), I am a little new to Flyte, so I get tripped up when you mention terms such as "remote path". But yes, I am talking about my data that I will process, not the metadata itself.
Also, following on from that: what exactly is the purpose of FlyteFile? I thought it was Flyte's recommended way to process our target data from S3 and databases.
k
All your data will be stored in your desired bucket automatically
So when you use pyflyte run, set --raw-output-prefix to something like s3://…
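e.g. (the exact flag spelling can vary by flytekit version; recent releases call it --raw-output-data-prefix):

pyflyte run --remote --raw-output-data-prefix s3://my-bucket/raw-data my_file.py my_workflow

Here my-bucket, my_file.py, and my_workflow are placeholders for your own bucket, file, and workflow name.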
v
Well unfortunately, that doesn't seem to be the case. I keep running into authentication issues.
Wait, can't I just add output_path?
k
Or if you want to do it per task, then use FlyteFile(remote_path="s3://my-path")
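e.g. a sketch (the bucket path and dataframe are placeholders):

import pandas as pd
from flytekit import task
from flytekit.types.file import FlyteFile

@task
def save_report() -> FlyteFile:
    df = pd.DataFrame({"a": [1, 2, 3]})
    df.to_csv("/tmp/report.csv", index=False)
    # remote_path tells Flyte exactly where to upload this file
    return FlyteFile(path="/tmp/report.csv", remote_path="s3://my-path/report.csv")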
k
Authentication has to be set using a service account
You will have to create an IAM role
v
So I have my IAM credentials
k
And bind it to a service account
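On EKS that usually means IRSA, i.e. a Kubernetes service account annotated with the role ARN. Roughly (all names below are placeholders):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-flyte-sa
  namespace: flytesnacks-development  # the project-domain namespace your tasks run in
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/my-flyte-task-role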
v
I can run them in native Python
what I am struggling with is how to move them to Flyte
k
Then in pyflyte run, set --service-account to the bound account
And this is a fair question
Just not near a computer
@Samhita Alla can you please help
s
here's what Ketan's referring to:
you should be able to run this command:
pyflyte run --remote --service-account <your-sa> <python-file.py> <task-or-workflow>
v
Thanks @Samhita Alla. Does <your-sa> refer to the location of the JSON/CSV key we get from AWS?
Can you please give a sample file (or input to the command line)? Because how will Flyte identify the relevant bucket, cloud provider, etc. from just the access and secret key?
s
yes, good point. if you've spun up a demo cluster, you can run
kubectl edit configmap flyte-sandbox-config -n flyte
to include the s3 config:
003-storage.yaml: |
    propeller:
      rawoutput-prefix: s3://my-s3-bucket/data
    storage:
      type: stow
      stow:
        kind: s3
        config:
          region: us-east-1
          disable_ssl: true
          v2_signing: true
          endpoint: http://flyte-sandbox-minio.flyte:9000
          auth_type: accesskey
      container: my-s3-bucket
  100-inline-config.yaml: |
    plugins:
      k8s:
        default-env-vars:
        - FLYTE_AWS_ENDPOINT: http://flyte-sandbox-minio.flyte:9000
        - FLYTE_AWS_ACCESS_KEY_ID: minio
        - FLYTE_AWS_SECRET_ACCESS_KEY: miniostorage
    storage:
      signedURL:
        stowConfigOverride:
          endpoint: http://localhost:30002
this configuration is for minio. if you want to update it to an s3 bucket, you'll need to change this. if you want to use both minio and s3, i don't think it's possible today: https://discuss.flyte.org/t/16068582/hi-all-i-just-wanted-to-expand-my-last-question-about-access#9b87c938-6010-48b0-8686-d3519c894c45.
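for reference, pointed at a real s3 bucket the storage section would look roughly like this (bucket, region, and auth are placeholders; with an IAM role bound to your service account you can use auth_type: iam instead of access keys):

propeller:
  rawoutput-prefix: s3://my-s3-bucket/data
storage:
  type: stow
  stow:
    kind: s3
    config:
      region: us-east-1
      auth_type: iam  # or accesskey, with the FLYTE_AWS_* env vars set
  container: my-s3-bucket  # your bucket name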
v
Thanks for the update. Personally, no, I don't want both minio and S3. Just S3 works perfectly for me.
Just to confirm once again: this config file will go in the ~/.flyte folder, right?
s
nope, you need to edit the configmap:
kubectl edit configmap flyte-sandbox-config -n flyte
and then save it.
v
hmm, thanks again. But it seems like learning kubernetes is a necessary precursor to getting started with flyte 😕 I hope I was not pestering you too much with my comments. You and @Ketan (kumare3) have been quite patient and helpful.
s
nope, you are not! we definitely need to work on our docs as well. 🙂 and yes, we need to get acquainted with kubernetes a little. i'm not a kubernetes expert myself, but i've found it quite manageable to tinker with the config aspects. we'll definitely work towards abstracting away kubernetes a little more.
v
Thanks, looking forward to it 🙂