# ask-the-community
j
Hi Team, we are getting the error below when we use PandasCursor to fetch Athena query results in a Flyte workflow. It only occurs when we use PyAthena >3.0.10. The Python code below works fine in other applications but fails inside Flyte. ERROR in organization cluster:
__init__() got an unexpected keyword argument 'connection'
We are using the 'production flow' registration pattern.
1. We have a Poetry project with the workflows and a Dockerfile.
2. The Poetry project uses the dependencies below.
[tool.poetry.dependencies]
python = "~3.9"
botocore = "1.31.17"
flytekit = "1.10.2"
flytekitplugins-deck-standard = "1.10.2"
flytekitplugins-papermill = "1.10.2"
flytekitplugins-pod = "1.10.2"
kubernetes = "^28.1.0"
pyathena = "3.1.0"
3. When we build the Docker image and run the Python code inside the container, it works fine.
from flytekit import task, workflow
from pyathena import connect
from pyathena.pandas.cursor import PandasCursor

@task
def test1():
    access_key = ""
    secret_key = ""
    # Note: despite its name, `conn` holds the PandasCursor returned by .cursor().
    conn = connect(
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        s3_staging_dir="s3://drd-029652062076-28e0/",
        region_name="us-east-1",
        cursor_class=PandasCursor,
    ).cursor()
    query = "SELECT * FROM default.flytetesting limit 10;"
    df = conn.execute(query).as_pandas()
    print(df.head(5))

@workflow
def hello_world_wf():
    test1()
    print("success")


if __name__ == "__main__":
    print(f"Running wf() {hello_world_wf()}")
4. We register the workflows using the commands below.
- pyflyte --pkgs workflows package --image public.ecr.aws/<application-name>/flyte:latest -f

- git rev-parse HEAD 

- flytectl register files --project my-project --domain development --archive flyte-package.tgz --version "$(git rev-parse HEAD)"
Is some internal circular dependency in Flyte causing this issue? I also tried the above project in my local setup with a basic installation, and I get the error below there. ERROR in local Flyte cluster:
[1/1] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes.
[f24327f2f9d41438aaef-n0-0] terminated with exit code (128). Reason [StartError]. Message: 
failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "pyflyte-fast-execute": executable file not found in $PATH: unknown.
Has anybody faced this issue before and knows what the reason could be?
j
Hi @Jayanth S N, I’ve gotten similar errors before when trying to use the pyflyte run command with poetry. Normally running poetry update resolves the issue.
j
Hi @Jake Dodd, yes, we tried poetry update; Poetry handles the basic dependency conflicts. The image works fine on its own: the screenshot shows the Python result when the code runs inside the Docker image. However, in the Flyte cluster I get this error:
__init__() got an unexpected keyword argument 'connection'
We've been stuck here; any assistance is greatly appreciated. Thank you.
s
@Jayanth S N let's first get your workflow running in the local flyte cluster. the "executable file not found in $PATH" error you're seeing has to do with your dockerfile. would you mind sharing it? pyflyte-fast-execute should get automatically installed with flytekit. it seems like the path isn't being set correctly. here's an example dockerfile for your reference: https://github.com/flyteorg/flytesnacks/blob/master/examples/basics/Dockerfile
regarding "__init__() got an unexpected keyword argument 'connection'", i'm not sure what connection is here. could you share the full log?
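One way to check the PATH theory from inside the image is a small script like the sketch below. It assumes only that flytekit installs `pyflyte-execute` and `pyflyte-fast-execute` as console scripts; the helper name `find_entrypoints` is made up for illustration.

```python
import os
import shutil


def find_entrypoints(names=("pyflyte-execute", "pyflyte-fast-execute")):
    """Map each entrypoint name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Print PATH as the container sees it, then report each entrypoint.
    print("PATH =", os.environ.get("PATH", ""))
    for name, path in find_entrypoints().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

Running this with the same user and entrypoint the cluster uses (for example via `docker run <image> python check_path.py`, where `check_path.py` is a hypothetical file name) should show whether the virtualenv's `bin` directory is missing from `PATH`.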
j
Hi @Samhita Alla, thanks for helping out. With those Dockerfile changes I am able to run the workflow in the local cluster. Please find below the error logs I am getting. This is exactly the error I am facing in our organization cluster as well.
Traceback (most recent call last):

      File "/opt/venv/lib/python3.9/site-packages/flytekit/exceptions/scopes.py", line 219, in user_entry_point
        return wrapped(*args, **kwargs)
      File "/root/workflows/flytetest.py", line 41, in test1
        df = conn.execute(query).as_pandas()
      File "/opt/venv/lib/python3.9/site-packages/pyathena/pandas/cursor.py", line 138, in execute
        self.result_set = AthenaPandasResultSet(
      File "/opt/venv/lib/python3.9/site-packages/pyathena/pandas/result_set.py", line 143, in __init__
        df = self._as_pandas()
      File "/opt/venv/lib/python3.9/site-packages/pyathena/pandas/result_set.py", line 389, in _as_pandas
        df = self._read_csv()
      File "/opt/venv/lib/python3.9/site-packages/pyathena/pandas/result_set.py", line 305, in _read_csv
        raise OperationalError(*e.args) from e

Message:

    __init__() got an unexpected keyword argument 'connection'

User error.
I am also attaching the error logs from the pod for the reference.
s
looks like the error has to do with an incompatibility between the aiobotocore and s3fs versions. could you please check that?
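A quick way to compare what is actually installed in the two environments is to print the resolved versions from inside the task and from a plain `docker run`. The sketch below uses only the standard library; the package list is an assumption about which distributions are relevant here, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Assumed list of distributions worth comparing; adjust as needed.
PACKAGES = (
    "pyathena", "s3fs", "fsspec", "aiobotocore",
    "botocore", "boto3", "pandas", "flytekit",
)


def installed_versions(packages=PACKAGES):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}=={version}")
```

Calling `installed_versions()` at the top of the failing task and diffing its output against the same call in the standalone container would show whether the two runs really see identical dependencies.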
j
Hi @Samhita Alla, thanks for looking into the issue. I agree that some dependency issue with s3fs is causing this; however, it works fine outside Flyte with the same dependencies. Please find attached the dependency list of our project. It looks like we only hit this circular dependency issue in Flyte. Is there a way to find out exactly which libraries conflict with Flyte 1.10.2? Screenshot: working in the Docker image without any dependency issue.
s
this is weird. if it's working in the docker image, it has to work on the flyte cluster too. you're using the same docker image to register your flyte workflows, right?
j
Hi @Samhita Alla, yes, I am using the same Docker image to register. Please find the attached screenshots: 1. Docker build 2. Flyte registration 3. Docker image 4. Flyte error. Since pyflyte wraps the image and registers it, we wanted to understand whether it does anything under the hood to the versions that could cause this issue.
s
have you also double-checked the image that's included in the task details on the UI?
Since pyflyte wraps the image and registers it, we wanted to understand whether it does anything under the hood to the versions that could cause this issue.
that shouldn't happen.
j
Hi @Samhita Alla, yes, I double-checked. Here I am reproducing the issue from our organization cluster; it occurs in all environments and is exactly the same error. Please find the attached Docker image.
s
not sure what's happening. if you could debug and check what that additional connection param is, it'd make things easier. maybe log the kwargs? meanwhile, you can also try using the athena integration (backend setup).
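A generic way to "log kwargs" is to wrap a class's `__init__` so every call records the keyword names it receives. The sketch below is plain Python, not a PyAthena or Flyte API: `log_init_kwargs` and `Demo` are hypothetical names, and which real class receives the extra `connection` keyword is not established in this thread.

```python
import functools
import logging

logger = logging.getLogger("kwargs_debug")


def log_init_kwargs(cls):
    """Patch cls.__init__ in place so each call logs the keyword names it received."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def wrapper(self, *args, **kwargs):
        logger.warning("%s.__init__ kwargs: %s", cls.__qualname__, sorted(kwargs))
        return original_init(self, *args, **kwargs)

    cls.__init__ = wrapper
    return cls


class Demo:
    """Hypothetical stand-in for the class that rejects `connection`."""

    def __init__(self, bucket):
        self.bucket = bucket


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    log_init_kwargs(Demo)
    try:
        Demo(bucket="b", connection=object())  # the log line appears before the failure
    except TypeError as exc:
        print("TypeError:", exc)
```

Applied to a suspect class before the query runs, the log line would show who is passing `connection` and to what, which is the missing piece of information in the traceback.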
j
Hi @Samhita Alla, I have tested this: it is not an issue with aiobotocore, it is an issue with the PyAthena version. Anything above PyAthena 2.5.2 breaks in Flyte. I have attached the new log; at the end of it, the error points directly at PyAthena functions.
Context: We run an Athena query by connecting to AWS and want the result as a pandas DataFrame. We can see the query executing in Athena; the only problem is reading the result from S3. I separately tried accessing the S3 bucket and running AWS commands via Python subprocess, and I do have access and can fetch the S3 objects. My assumption is that the connection is lost in Flyte when PyAthena tries to fetch the result.
Flow:
1. It calls the execute function in PyAthena: https://github.com/laughingman7743/PyAthena/blob/2d88c3e6172295880d011a5f151bc273b9d577ad/pyathena/pandas/cursor.py#L121C9-L121C16
2. It tries to get the Athena pandas result set: https://github.com/laughingman7743/PyAthena/blob/2d88c3e6172295880d011a5f151bc273b9d577ad/pyathena/pandas/cursor.py#L160
3. AthenaPandasResultSet has an as_pandas function which internally calls the read_csv function, and the error is raised there: https://github.com/laughingman7743/PyAthena/blob/2d88c3e6172295880d011a5f151bc273b9d577ad/pyathena/pandas/result_set.py#L295
I don't know why Flyte is unable to establish a connection to S3 when the call comes from PyAthena.
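The traceback earlier in the thread ends at `raise OperationalError(*e.args) from e`, so the exception that actually carries the `connection` message is the wrapped one, reachable through `__cause__`. Below is a minimal sketch for printing that chain; `exception_chain` and `describe` are hypothetical helper names, and the demo uses built-in exceptions as stand-ins rather than the real PyAthena ones.

```python
def exception_chain(exc):
    """Return the exception followed by its causes/contexts, outermost first."""
    chain, seen = [], set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))  # guard against cyclic chains
        chain.append(exc)
        exc = exc.__cause__ or exc.__context__
    return chain


def describe(exc):
    """Format each link in the chain as 'module.Type: message'."""
    return [
        f"{type(e).__module__}.{type(e).__qualname__}: {e}"
        for e in exception_chain(exc)
    ]


if __name__ == "__main__":
    try:
        try:
            raise TypeError("__init__() got an unexpected keyword argument 'connection'")
        except TypeError as e:
            # Mirrors the re-raise pattern seen in the traceback.
            raise RuntimeError(*e.args) from e
    except RuntimeError as outer:
        for line in describe(outer):
            print(line)
```

Wrapping the `conn.execute(query)` call in a `try`/`except` and printing `describe(exc)` (or the traceback of `exc.__cause__`) should reveal which library raised the original error and from which call site.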
s
good to know that you're at least unblocked for now. not sure why flyte would throw an error for specific pyathena versions when you're just encapsulating the code in a flyte task. this is weird.
i think it still has to do with aiobotocore 'cause an exception is being raised by pyathena since that particular call's failing.
j
Hi @Samhita Alla, can I raise a GitHub issue for this? PyAthena 2.5.1 and PyAthena 3.1.0 pull in the same aiobotocore version according to the dependency list, so it's weird that it works in the Docker image but not inside Flyte. We have an application that only works with the new version of PyAthena, and this blocks us from using it, so this is very important for us.
s
cc @Eduardo Apolinario (eapolinario) @Kevin Su i'm not sure what the root cause of this issue is. any idea?
@Jayanth S N pyathena 2.5.1 works with flyte, whereas pyathena 3.1.0 doesn't, correct? if that's the case, please file an issue.
j
Hi @Samhita Alla thank you, please find the issue link https://github.com/flyteorg/flyte/issues/4855 let me know if need any further info for the same.
Hi Team, did you guys get any update on this issue https://github.com/flyteorg/flyte/issues/4855?
Hi @Samhita Alla , I understand that you may be busy with other tasks, but I wanted to inquire about the progress or any updates on the above opened issue. I greatly appreciate the work you and the team put into maintaining flyte. I understand that addressing bugs and issues takes time, and I'm thankful for your efforts in improving the software's reliability and functionality. If there's anything further I can provide to assist with resolving this issue or if you need any additional information, please don't hesitate to let me know. I'm more than willing to help in any way I can.
s
hi Zlak! thanks for creating an issue. the resolution might take some time as the team's working on other issues, but we'll look into it. if you're interested in contributing a fix, that'd be amazing!