Hello, I am running bigquery task locally. I am us...
# flyte-support
s
Hello, I am running a BigQuery task locally. I am using the latest flytekit 1.7.0, and I have flytekitplugins-bigquery 1.7.0 and its dependencies (google-cloud-bigquery==3.11.3, fsspec==2023.3.0, etc.) installed locally. The bigquery_task succeeded in retrieving the dataset from BigQuery via a SELECT statement. However, converting the result to a pd.DataFrame in a subsequent task failed with the error "Protocol not known: bq". Stack trace:
/venv/lib/python3.8/site-packages/fsspec/registry.py:209 in get_filesystem_class
   if protocol not in registry:
        if protocol not in known_implementations:
           raise ValueError("Protocol not known: %s" % protocol)
My code:
# from typing import Tuple
try:
    from typing import Annotated
except ImportError:
    from typing_extensions import Annotated


import pandas as pd
from flytekit import task, workflow, StructuredDataset, kwtypes
from flytekitplugins.bigquery import BigQueryConfig, BigQueryTask
import google.cloud.bigquery


bigquery_task = BigQueryTask(
    name="sql.bigquery.test",
    inputs=kwtypes(version=int),
    query_template="SELECT * FROM `bigquery-public-data.crypto_dogecoin.transactions` WHERE version = @version LIMIT 2;",
    task_config=BigQueryConfig(ProjectID=""),
    output_structured_dataset_type=pd.DataFrame,
)

@task
def preproc(df: pd.DataFrame) -> None:
    print(df.head())


@workflow
def wf(version: int = 1) -> None:
    preproc(df = bigquery_task(version=version))
Do you have any idea?
I also tried this and got the same error:
bigquery_task = BigQueryTask(
...    output_structured_dataset_type=StructuredDataset

)

@task
def preproc(sd: StructuredDataset) -> None:
    df = sd.open(pd.DataFrame).all()
    print(df.head())


@workflow
def wf(version: int = 1) -> None:
    sd = bigquery_task(version=version)
    preproc(sd = sd)
t
@glamorous-carpet-83516, shouldn't the conversion to pandas dataframe be handled by StructuredDataset? @salmon-refrigerator-32115, is it possible for you to share the full stack trace?
s
@tall-lock-23197, @glamorous-carpet-83516, Sure. Here is the full stack trace for the last piece of code.
stacktrace.txt
g
did you install these packages?
s
Hi @glamorous-carpet-83516, I did.
(env_flyte_1_7) ➜  hbo-code pip list | grep google
google-api-core             2.11.1
google-auth                 2.21.0
google-auth-oauthlib        1.0.0
google-cloud-bigquery       3.11.3
google-cloud-core           2.3.2
google-cloud-storage        2.10.0
google-crc32c               1.5.0
google-resumable-media      2.5.0
googleapis-common-protos    1.59.1
g
google-cloud-bigquery-storage and google-cloud-storage are different
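A minimal sketch for checking this locally, assuming the reply above is pointing at a missing google-cloud-bigquery-storage distribution (the pip list output shows google-cloud-storage but not google-cloud-bigquery-storage). The helper name `missing_distributions` and the list of required names are hypothetical, chosen for illustration. Whether installing the missing package resolves the "Protocol not known: bq" error is not confirmed in this thread.

```python
from importlib import metadata


def missing_distributions(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            # Raises PackageNotFoundError when the distribution is absent.
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed list for illustration; adjust to the packages you were asked to install.
    required = ["google-cloud-bigquery", "google-cloud-bigquery-storage"]
    print("Missing:", missing_distributions(required))
```

This checks installed distribution names exactly, so it avoids the near-miss that `pip list | grep google` makes easy to overlook.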