# ask-the-community
g
Hi all. I just wanted to expand my last question about accessing files on S3 using FlyteFile class. Here's the code of the Flyte task that I'm trying to run:
import subprocess

from flytekit import task
from flytekit.types.file import FlyteFile


@task(
    container_image="ghcr.io/flyteorg/flytekit:py3.10-1.10.1b0",
    environment={
        "AWS_ACCESS_KEY_ID": "some-value",
        "AWS_SECRET_ACCESS_KEY": "some-value",
        "AWS_DEFAULT_REGION": "some-value",
    },
)
def consume() -> None:
    f = FlyteFile("s3://some-s3/file")
    f.download()  # fails here

    # in the `user_script.sh` I want to do some manipulations with
    # the file downloaded to the local filesystem
    subprocess.run(["./user_script.sh"])
I'm getting the following error when calling `f.download()`:
packages/flytekit/types/file/file.py", line 209, in __fspath__
        self._downloader()
      File "/usr/local/lib/python3.10/site-packages/flytekit/types/file/file.py", line 434, in _downloader
        return ctx.file_access.get_data(uri, local_path, is_multipart=False)
      File "/usr/local/lib/python3.10/site-packages/flytekit/core/data_persistence.py", line 467, in get_data
        raise FlyteAssertion(

Message:

    Failed to get data from s3://some-s3/file to /tmp/flyteslc48j5z/local_flytekit/0d47388e2e9630d969bb337a5dfb83ac/file (recursive=False).

Original exception: Access Denied.

User error.
{"asctime": "2023-11-16 05:33:31,327", "name": "flytekit.entrypoint", "levelname": "ERROR", "message": "!! End Error Captured by Flyte !!"}
I'm able to download the same file with the same credentials using the AWS CLI. My workflow also works fine when I run it with pyflyte on my local machine rather than on the demo cluster. What might be the problem here?
k
Could you describe your pod to check that those env vars are in the container? `kubectl describe pods <name> -n flytesnacks-development`
g
Yeah, looks like the environment variables are in place:
Containers:
  f194f319452404a1b9ac-n1-0-0:
    Container ID:  containerd://7b1698bd4e79be0727fc4bb71fa2c1353b7b53b3e278f1f639d99243f37053ab
    Image:         ghcr.io/flyteorg/flytekit:py3.10-1.10.1b0
    Image ID:      docker.io/ghcr.io/flyteorg/flytekit@sha256:a8006be3e778208d964fefbd0a880909aca1295881938401e62798cb770c5fb5
    Port:          <none>
    Host Port:     <none>
    Args:
      pyflyte-fast-execute
      --additional-distribution
      s3://my-s3-bucket/flytesnacks/development/ESI3LPXXDLUQA66NHZECMRPDBM======/script_mode.tar.gz
      --dest-dir
      /root
      --
      pyflyte-map-execute
      --inputs
      s3://my-s3-bucket/metadata/propeller/flytesnacks-development-f194f319452404a1b9ac/n1/data/inputs.pb
      --output-prefix
      s3://my-s3-bucket/metadata/propeller/flytesnacks-development-f194f319452404a1b9ac/n1/data/0
      --raw-output-data-prefix
      s3://my-s3-bucket/data/4b/f194f319452404a1b9ac-n1-0/0/0
      --checkpoint-path
      s3://my-s3-bucket/data/4b/f194f319452404a1b9ac-n1-0/0/0/_flytecheckpoints
      --prev-checkpoint
      ""
      --resolver
      MapTaskResolver
      --
      vars
      
      resolver
      flytekit.core.python_auto_container.default_task_resolver
      task-module
      fastqc2
      task-name
      consume4
    Environment:
      AWS_ACCESS_KEY_ID:                  redacted
      AWS_SECRET_ACCESS_KEY:              redacted
      AWS_DEFAULT_REGION:                 us-east-1
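
The same check can also be made from inside the task, which confirms what the Python process actually sees rather than what the pod spec declares. The following is a minimal sketch using only the standard library; the helper names `mask` and `aws_env_report` are hypothetical and not part of flytekit.

```python
import os
from typing import Dict, Mapping

# Variable prefixes of interest. The FLYTE_AWS_* names are the ones that
# show up in the sandbox ConfigMap later in this thread.
PREFIXES = ("AWS_", "FLYTE_AWS_")
SENSITIVE_MARKERS = ("KEY", "SECRET", "TOKEN")


def mask(value: str) -> str:
    """Keep only the last four characters so secrets never reach the logs."""
    if len(value) <= 4:
        return "****"
    return "****" + value[-4:]


def aws_env_report(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the AWS-related variables in `env`, masking credential values."""
    report = {}
    for name in sorted(env):
        if name.startswith(PREFIXES):
            sensitive = any(marker in name for marker in SENSITIVE_MARKERS)
            report[name] = mask(env[name]) if sensitive else env[name]
    return report


if __name__ == "__main__":
    for name, value in aws_env_report().items():
        print(f"{name}={value}")
```

Calling `aws_env_report()` at the top of the task body and printing the result would show whether any `FLYTE_AWS_*` variables are present alongside the `AWS_*` ones.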
k
Could you update these envs in the sandbox config map instead? I guess those envs override your secret and key id.
g
I don't see those variables being set in the sandbox config at all. Here is the content of my `~/.flyte/config-sandbox.yaml`:
admin:
  # For GRPC endpoints you might want to use dns:///flyte.myexample.com
  endpoint: localhost:30080
  authType: Pkce
  insecure: true
console:
  endpoint: http://localhost:30080
logger:
  show-source: true
  level: 0
Is this the wrong config file I'm looking at?
k
Not this one. `kubectl get cm flyte-sandbox -n flyte`
g
Thanks. I tried editing the ConfigMap. So, if I replace these values with my AWS access keys, then my other tasks fail because Flyte is not able to access MinIO.
      - FLYTE_AWS_ACCESS_KEY_ID: minio
      - FLYTE_AWS_SECRET_ACCESS_KEY: miniostorage
If I just append my AWS credentials like this:
default-env-vars:
        - FLYTE_AWS_ENDPOINT: http://flyte-sandbox-minio.flyte:9000
        - FLYTE_AWS_ACCESS_KEY_ID: minio
        - FLYTE_AWS_SECRET_ACCESS_KEY: miniostorage
        - AWS_ACCESS_KEY_ID: redacted
        - AWS_SECRET_ACCESS_KEY: redacted
then I still get Access Denied when trying to download the file from AWS.
Interestingly, if I just directly use boto3 inside the task to download the file, it works just fine. So something strange is going on within the FlyteFile abstraction
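
For reference, the boto3 workaround could look roughly like the sketch below. It assumes boto3 is installed in the task image and that the `AWS_*` variables from the task's `environment` are visible to the process; the helper names `split_s3_uri` and `download_with_boto3` are hypothetical. boto3 reads the standard `AWS_*` variables and talks to the regular AWS endpoint unless an `endpoint_url` is passed explicitly, which is presumably why it is not affected by the sandbox's `FLYTE_AWS_ENDPOINT` setting.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/path/to/key' into ('bucket', 'path/to/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def download_with_boto3(uri: str, local_path: str) -> str:
    """Download one object with boto3, bypassing flytekit's file access layer."""
    import boto3  # imported lazily so split_s3_uri works without boto3 installed

    bucket, key = split_s3_uri(uri)
    # No endpoint_url is given, so the client targets AWS S3 and picks up
    # AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_DEFAULT_REGION itself.
    boto3.client("s3").download_file(bucket, key, local_path)
    return local_path
```

Inside the task, `download_with_boto3("s3://some-s3/file", "/tmp/file")` would then replace the `FlyteFile(...).download()` call, at the cost of losing Flyte's own handling of the file.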
k
Hmm, which version of flytekit are you using?
g
The container used by the task is flytekit:py3.10-1.10.1b0, so 1.10.1b0
locally 1.10.1 is installed as well
y
are you using minio or real s3? i can’t tell
nix the aws version of the key/secret if you’re using minio
and vice versa
g
I'm using S3 in my workflows. But my understanding is that the Flyte sandbox cluster uses MinIO for its internal stuff by default.
y
Mmm. I don’t think that’s possible. At least not without a fair bit of hacking and code change.
You can use different buckets, but you can't use different S3 backends (for example MinIO and AWS S3) side by side.
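
To make the limitation concrete, here is a rough sketch of how a single set of `FLYTE_AWS_*` settings ends up applying to every `s3://` URI. It assumes that flytekit configures its S3 access from `FLYTE_AWS_ENDPOINT`, `FLYTE_AWS_ACCESS_KEY_ID` and `FLYTE_AWS_SECRET_ACCESS_KEY` and uses that one configuration for all S3 paths. The function `effective_s3_target` is hypothetical and only illustrates the precedence; it is not flytekit's actual code.

```python
from typing import Mapping, Optional


def effective_s3_target(env: Mapping[str, str]) -> dict:
    """Illustrative only: FLYTE_AWS_* settings win over plain AWS_* ones."""
    endpoint: Optional[str] = env.get("FLYTE_AWS_ENDPOINT")
    key_id = env.get("FLYTE_AWS_ACCESS_KEY_ID") or env.get("AWS_ACCESS_KEY_ID")
    return {
        # With a custom endpoint set, every s3:// request goes there,
        # whatever bucket name appears in the URI.
        "endpoint": endpoint or "AWS default endpoint",
        "access_key_id": key_id,
        "is_custom_endpoint": endpoint is not None,
    }


# Values taken from the sandbox ConfigMap quoted earlier in the thread.
sandbox_env = {
    "FLYTE_AWS_ENDPOINT": "http://flyte-sandbox-minio.flyte:9000",
    "FLYTE_AWS_ACCESS_KEY_ID": "minio",
    "FLYTE_AWS_SECRET_ACCESS_KEY": "miniostorage",
    "AWS_ACCESS_KEY_ID": "redacted",
    "AWS_SECRET_ACCESS_KEY": "redacted",
}

print(effective_s3_target(sandbox_env))
```

Under that assumption, a request for `s3://some-s3/file` is sent to MinIO with the MinIO credentials, and MinIO answers Access Denied because it has no such bucket. Swapping in the AWS keys while leaving the endpoint pointed at MinIO fails the other way round, which matches what was observed above.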
g
I see. Thank you