#ask-the-community

Stefan Avesand

05/04/2022, 12:32 PM
Good morning team! Trying to use the new StructuredDataset type to read data from BQ but can’t figure it out.
sd = StructuredDataset(uri='bq://sp-one-model.quarterly_forecast_2022F1.premium_revenue_tab_input_vat')
sd.open(pd.DataFrame).all()

AttributeError: 'NoneType' object has no attribute 'uri'

Samhita Alla

05/04/2022, 1:23 PM
Hi @Stefan Avesand! What’s the flytekit version you’re using and can you share the import line of yours?

Stefan Avesand

05/04/2022, 1:29 PM
Hi Samhita! I’m using 1.0.0.
from flytekit.types.structured import StructuredDataset

Samhita Alla

05/04/2022, 1:39 PM
Can you send StructuredDataset to the task as an argument?
cc: @Kevin Su
👀 1

Stefan Avesand

05/04/2022, 1:53 PM
Like this?

Kevin Su

05/04/2022, 1:59 PM
something like:
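import pandas as pd
from flytekit import task
from flytekit.types.structured import StructuredDataset

@task
def my_task(sd: StructuredDataset) -> StructuredDataset:
    # Pass the dataset through a task so flytekit handles the type conversion.
    return sd


res = my_task(sd=StructuredDataset(uri='bq://sp-one-model.quarterly_forecast_2022F1.premium_revenue_tab_input_vat'))
print(res.open(pd.DataFrame).all())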
👍 1
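Editor's sketch: the `bq://` URI in this thread and the backticked table name in a BigQuery SQL query refer to the same table. The snippet below assumes the `bq://project.dataset.table` form used here and a project ID without dots. `parse_bq_uri` and `BigQueryTable` are hypothetical helpers written for illustration, not part of flytekit.

```python
from typing import NamedTuple


class BigQueryTable(NamedTuple):
    project: str
    dataset: str
    table: str


def parse_bq_uri(uri: str) -> BigQueryTable:
    """Split a bq://project.dataset.table URI into its three parts (hypothetical helper)."""
    prefix = "bq://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a bq:// URI: {uri!r}")
    parts = uri[len(prefix):].split(".")
    # Assumption: exactly three dot-separated, non-empty components.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected bq://project.dataset.table, got {uri!r}")
    return BigQueryTable(*parts)


ref = parse_bq_uri("bq://sp-one-model.quarterly_forecast_2022F1.premium_revenue_tab_input_vat")
# Print the same table as a backticked SQL reference.
print(f"`{ref.project}.{ref.dataset}.{ref.table}`")
```

A malformed URI (for example one with stray angle brackets from a chat client's auto-linking) raises `ValueError` here instead of failing later with a less obvious error.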

Stefan Avesand

05/04/2022, 2:05 PM
That seems to work! I get a permission error due to the wrong service account being used. The GOOGLE_APPLICATION_CREDENTIALS environment variable does not seem to get picked up.

Samhita Alla

05/04/2022, 2:05 PM
Stefan, you can also open and read the dataframe within the task, right @Kevin Su?

Kevin Su

05/04/2022, 2:12 PM
@Stefan Avesand nice! Could you run a basic example to make sure your Google credentials have enough permissions to read and write BQ tables? Something like this one: https://cloud.google.com/bigquery/docs/reference/libraries#client-libraries-install-python We use the Google Cloud Python SDK, which should automatically pick up GOOGLE_APPLICATION_CREDENTIALS.
@Samhita Alla Yeah, we are able to do that.
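Editor's sketch: one way to see which service account a process would use is to read the key file that GOOGLE_APPLICATION_CREDENTIALS points to. This assumes the variable references a service account JSON key, which carries the account address in its `client_email` field. `active_service_account` is a hypothetical helper, and it only inspects the file; it does not contact Google Cloud.

```python
import json
import os
from typing import Optional


def active_service_account(env_var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> Optional[str]:
    """Return the client_email from the key file the env var points to, or None."""
    key_path = os.environ.get(env_var)
    # The variable may be unset, or may point to a file that does not exist.
    if not key_path or not os.path.isfile(key_path):
        return None
    with open(key_path, encoding="utf-8") as fh:
        key = json.load(fh)
    # Service account key files store the account address under "client_email".
    return key.get("client_email")
```

Calling `active_service_account()` inside the task and in a plain script, then comparing the two results, can show whether both run under the same identity.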

Stefan Avesand

05/04/2022, 2:27 PM
That example works fine:
from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

query = """
    SELECT *
    FROM `sp-one-model.quarterly_forecast_2022F1.premium_revenue_tab_input_vat`
    LIMIT 20
"""
query_job = client.query(query)  # Make an API request.

print("The query data:")
for row in query_job:
    # Row values can be accessed by field name or index.
    print(row)
While the StructuredDataset request fails with: request failed: the user does not have ‘bigquery.readsessions.create’ permission for ‘projects/sp-one-model’
👀 1

Kevin Su

05/04/2022, 2:56 PM
sorry, let me debug it.

Stefan Avesand

05/04/2022, 3:02 PM
Ah, it’s a different service altogether, called the BigQuery Storage Read API

Kevin Su

05/04/2022, 3:14 PM
Seems like your IAM role doesn’t have the bigquery.readsessions.create permission.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_gbq.html?highlight=read_#pandas-read-gbq
Use the BigQuery Storage API to download query results quickly, but at an increased cost. To use this API, first enable it in the Cloud Console. You must also have the bigquery.readsessions.create permission on the project you are billing queries to.

Stefan Avesand

05/04/2022, 3:34 PM
Thanks, it’s enabled already though
Added the BigQuery Read Session User role and now it works
👍 1
Might be good to add to the docs?
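Editor's sketch: the BigQuery Read Session User role mentioned above has the role ID `roles/bigquery.readSessionUser`. The snippet below only assembles a `gcloud projects add-iam-policy-binding` command for review; it does not run it. The service account address is a made-up placeholder, and `grant_read_session_role_cmd` is a hypothetical helper.

```python
import shlex
from typing import List


def grant_read_session_role_cmd(project_id: str, service_account_email: str) -> List[str]:
    """Build (but do not run) a gcloud command that grants the Read Session User role."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member=serviceAccount:{service_account_email}",
        "--role=roles/bigquery.readSessionUser",
    ]


# Placeholder account address; substitute the account your tasks actually run as.
cmd = grant_read_session_role_cmd("sp-one-model", "my-sa@sp-one-model.iam.gserviceaccount.com")
print(shlex.join(cmd))
```

Building the argument list separately keeps the command easy to inspect before someone with the right IAM privileges runs it.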

Samhita Alla

05/04/2022, 4:50 PM
@Smriti Satyan/@Alekhya, can you please add the above to the StructuredDataset API doc?

Stefan Avesand

05/04/2022, 6:06 PM
I think a code example would be really helpful as well 🙂 On that notion, is casting between StructuredDataset and dataframes supported, e.g. like this?

Alekhya

05/04/2022, 6:58 PM
Sure @Samhita Alla. Will add it to the StructuredDataset API doc.

Samhita Alla

05/05/2022, 5:48 AM
I think a code example would be really helpful as well
We have a code example already: https://docs.flyte.org/projects/cookbook/en/latest/auto/core/type_system/structured_dataset.html. Will this suffice or are you talking about having a BigQuery-related example?
On that notion, is casting between StructuredDataset and dataframes supported, e.g. like this?
Yes!

Stefan Avesand

05/05/2022, 12:04 PM
A code example showing how to create and use a StructuredDataset from a bq, gs or s3 url was my thought.
👍 2

Stefan Avesand

05/05/2022, 2:58 PM
Yes, I looked at that example, but it only shows how to ingest data from BQ using BigQueryTask, not StructuredDataset