Hello,
I am running a BigQuery task locally.
I am using flytekit 1.7.0 (the latest), with flytekitplugins-bigquery 1.7.0 and its dependencies (google-cloud-bigquery==3.11.3, fsspec==2023.3.0, etc.) installed locally.
The bigquery_task successfully retrieved the dataset from BigQuery via the SELECT statement.
However, converting the result to a pd.DataFrame failed in the subsequent task.
Error: Protocol not known: bq
Stack trace:
/venv/lib/python3.8/site-packages/fsspec/registry.py:209 in get_filesystem_class
    if protocol not in registry:
        if protocol not in known_implementations:
            raise ValueError("Protocol not known: %s" % protocol)
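For context, here is a simplified model of where I think the error comes from. It is only a sketch, not fsspec's real code: the two dictionaries are stand-ins with made-up contents, `protocol_of` is a hypothetical helper, and the `bq://` URI is a placeholder. It assumes the BigQuery task hands the next task a URI with a `bq` scheme, which is then looked up the way the trace shows.

```python
from urllib.parse import urlsplit

# Stand-ins for the two lookup tables named in the trace (contents are assumed).
registry = {"file": "LocalFileSystem"}
known_implementations = {"gs": "gcsfs.GCSFileSystem", "s3": "s3fs.S3FileSystem"}


def get_filesystem_class(protocol):
    """Simplified two-level lookup mirroring the lines in the stack trace."""
    if protocol not in registry:
        if protocol not in known_implementations:
            raise ValueError("Protocol not known: %s" % protocol)
        # Cache the known implementation so later lookups hit the registry.
        registry[protocol] = known_implementations[protocol]
    return registry[protocol]


def protocol_of(uri):
    """Hypothetical helper: take the URI scheme, defaulting to local files."""
    return urlsplit(uri).scheme or "file"


# Placeholder URI; the real project, dataset and table names are not known here.
uri = "bq://my-project:my_dataset.my_table"

try:
    get_filesystem_class(protocol_of(uri))
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

If this model is right, the lookup fails because nothing is registered for `bq` in either table, so the question becomes which component is supposed to handle `bq://` URIs before the generic file system lookup is reached.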
My code:
# from typing import Tuple
try:
    from typing import Annotated
except ImportError:
    from typing_extensions import Annotated

import pandas as pd
from flytekit import task, workflow, StructuredDataset, kwtypes
from flytekitplugins.bigquery import BigQueryConfig, BigQueryTask
import google.cloud.bigquery

bigquery_task = BigQueryTask(
    name="sql.bigquery.test",
    inputs=kwtypes(version=int),
    query_template="SELECT * FROM `bigquery-public-data.crypto_dogecoin.transactions` WHERE version = @version LIMIT 2;",
    task_config=BigQueryConfig(ProjectID=""),
    output_structured_dataset_type=pd.DataFrame
)

@task
def preproc(df: pd.DataFrame) -> None:
    print(df.head())

@workflow
def wf(version: int = 1) -> None:
    preproc(df=bigquery_task(version=version))
Do you have any idea what could be causing this?