#ask-the-community

Frank Shen

06/29/2023, 10:13 PM
Hello, I am running a BigQuery task locally. I am using the latest flytekit (1.7.0), and I have flytekitplugins-bigquery 1.7.0 and its dependencies (google-cloud-bigquery==3.11.3, fsspec==2023.3.0, etc.) installed locally. The bigquery_task succeeded in retrieving the dataset from BigQuery via a SELECT statement. However, converting the result to a pd.DataFrame in a subsequent task failed with the error "Protocol not known: bq". Stack trace:
/venv/lib/python3.8/site-packages/fsspec/registry.py:209 in get_filesystem_class
   if protocol not in registry:
        if protocol not in known_implementations:
           raise ValueError("Protocol not known: %s" % protocol)
My code:
# from typing import Tuple
try:
    from typing import Annotated
except ImportError:
    from typing_extensions import Annotated


import pandas as pd
from flytekit import task, workflow, StructuredDataset, kwtypes
from flytekitplugins.bigquery import BigQueryConfig, BigQueryTask
import google.cloud.bigquery


bigquery_task = BigQueryTask(
    name="sql.bigquery.test",
    inputs=kwtypes(version=int),
    query_template="SELECT * FROM `bigquery-public-data.crypto_dogecoin.transactions` WHERE version = @version LIMIT 2;",
    task_config=BigQueryConfig(ProjectID=""),
    output_structured_dataset_type=pd.DataFrame,
)

@task
def preproc(df: pd.DataFrame) -> None:
    print(df.head())


@workflow
def wf(version: int = 1) -> None:
    preproc(df=bigquery_task(version=version))
Do you have any idea?
I also tried this and got the same error:
bigquery_task = BigQueryTask(
...    output_structured_dataset_type=StructuredDataset

)

@task
def preproc(sd: StructuredDataset) -> None:
    df = sd.open(pd.DataFrame).all()
    print(df.head())


@workflow
def wf(version: int = 1) -> None:
    sd = bigquery_task(version=version)
    preproc(sd=sd)

Samhita Alla

06/30/2023, 5:05 AM
@Kevin Su, shouldn't the conversion to pandas dataframe be handled by StructuredDataset? @Frank Shen, is it possible for you to share the full stack trace?

Frank Shen

06/30/2023, 4:26 PM
@Samhita Alla, @Kevin Su, Sure. Here is the full stack trace for the last piece of code.
stacktrace.txt

Kevin Su

07/01/2023, 8:01 PM
did you install these packages?

Frank Shen

07/06/2023, 6:36 PM
Hi @Kevin Su, I did.
(env_flyte_1_7) ➜  hbo-code pip list | grep google
google-api-core             2.11.1
google-auth                 2.21.0
google-auth-oauthlib        1.0.0
google-cloud-bigquery       3.11.3
google-cloud-core           2.3.2
google-cloud-storage        2.10.0
google-crc32c               1.5.0
google-resumable-media      2.5.0
googleapis-common-protos    1.59.1

Kevin Su

07/06/2023, 10:47 PM
google-cloud-bigquery-storage and google-cloud-storage are different packages. Your pip list shows only google-cloud-storage.
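A quick way to confirm which of the packages mentioned in this thread are actually present in the active environment is to query the installed distribution metadata. This is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+). The helper name `missing_distributions` and the `REQUIRED` list are illustrative, and treating both packages as prerequisites for reading `bq` results into pandas is an assumption drawn from the reply above, not a statement of flytekit's documented requirements.

```python
from importlib import metadata


def missing_distributions(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            # Raises PackageNotFoundError when no such distribution is installed.
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Packages discussed in this thread; that both are needed is an assumption.
REQUIRED = ["google-cloud-bigquery", "google-cloud-bigquery-storage"]

if __name__ == "__main__":
    missing = missing_distributions(REQUIRED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All listed packages are installed.")
```

If google-cloud-bigquery-storage is reported as not installed, installing it with pip and rerunning the workflow would be the next thing to try.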