Good morning team! Trying to use the new Structure...
# ask-the-community
s
Good morning team! Trying to use the new StructuredDataset type to read data from BQ but can’t figure it out.
sd = StructuredDataset(uri='bq://sp-one-model.quarterly_forecast_2022F1.premium_revenue_tab_input_vat')
sd.open(pd.DataFrame).all()

AttributeError: 'NoneType' object has no attribute 'uri'
s
Hi @Stefan Avesand! What’s the flytekit version you’re using and can you share the import line of yours?
s
Hi Samhita! I’m using 1.0.0.
from flytekit.types.structured import StructuredDataset
s
Can you send StructuredDataset to the task as an argument?
cc: @Kevin Su
👀 1
s
Like this?
k
something like:
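import pandas as pd
from flytekit import task
from flytekit.types.structured import StructuredDataset


@task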
def my_task(sd: StructuredDataset) -> StructuredDataset:
    return sd


res = my_task(sd=StructuredDataset(uri='bq://sp-one-model.quarterly_forecast_2022F1.premium_revenue_tab_input_vat'))
print(res.open(pd.DataFrame).all())
👍 1
s
That seems to work! I get a permission error due to the wrong service account being used. The GOOGLE_APPLICATION_CREDENTIALS environment variable does not seem to get picked up.
s
Stefan, you can also open and read the dataframe within the task, right @Kevin Su?
k
@Stefan Avesand nice! Could you run a basic example to make sure your Google credentials have enough permission to read/write the BQ table? Like this one: https://cloud.google.com/bigquery/docs/reference/libraries#client-libraries-install-python
We use the Google Cloud Python SDK, and it should automatically pick up GOOGLE_APPLICATION_CREDENTIALS.
@Samhita Alla Yeah, we are able to do that.
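Since the permission error above pointed at the wrong service account, one quick check is to look at which account the key file behind GOOGLE_APPLICATION_CREDENTIALS belongs to. The sketch below is a minimal, stdlib-only illustration. `service_account_email` is a hypothetical helper name (it is not part of flytekit or the Google SDK), and it assumes the variable points at a service-account JSON key, which carries a `client_email` field; other credential types may not have that field.

```python
import json
import os


def service_account_email(env=os.environ):
    """Hypothetical helper: report which service account a key file belongs to.

    Returns None if GOOGLE_APPLICATION_CREDENTIALS is unset or the file has
    no client_email field (for example, a non-service-account credential).
    """
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return None
    # A service-account key is a JSON document; client_email names the account.
    with open(path, encoding="utf-8") as key_file:
        key = json.load(key_file)
    return key.get("client_email")


if __name__ == "__main__":
    print("Service account in use:", service_account_email())
```

Comparing the printed account with the one that was granted BigQuery access can help confirm whether the expected key is actually being picked up.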
s
That example works fine:
from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

query = """
    SELECT *
    FROM `sp-one-model.quarterly_forecast_2022F1.premium_revenue_tab_input_vat`
    LIMIT 20
"""
query_job = client.query(query)  # Make an API request.

print("The query data:")
for row in query_job:
    # Row values can be accessed by field name or index.
    print(row)
While the StructuredDataset request fails with: request failed: the user does not have ‘bigquery.readsessions.create’ permission for ‘projects/sp-one-model’
👀 1
k
sorry, let me debug it.
s
Ah, it’s a different service altogether, called the BigQuery Storage Read API
k
Seems like your IAM role doesn’t have the bigquery.readsessions.create permission.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_gbq.html?highlight=read_#pandas-read-gbq
Use the BigQuery Storage API to download query results quickly, but at an increased cost. To use this API, first enable it in the Cloud Console. You must also have the bigquery.readsessions.create permission on the project you are billing queries to.
s
Thanks, it’s enabled already though
Added the BigQuery Read Session User role and now it works
👍 1
Might be good to add to the docs?
s
@Smriti Satyan/@Alekhya, can you please add the above to the StructuredDataset API doc?
s
I think a code example would be really helpful as well 🙂 On that note, is casting between StructuredDataset and dataframes supported, e.g. like this?
a
Sure @Samhita Alla. Will add it to the StructuredDataset API doc.
s
I think a code example would be really helpful as well
We have a code example already: https://docs.flyte.org/projects/cookbook/en/latest/auto/core/type_system/structured_dataset.html. Will this suffice or are you talking about having a BigQuery-related example?
On that note, is casting between StructuredDataset and dataframes supported, e.g. like this?
Yes!
s
A code example showing how to create and use a StructuredDataset from a bq, gs or s3 url was my thought.
👍 2
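As a small step toward such an example, the sketch below illustrates the URI shapes involved. It is a stdlib-only illustration, not flytekit code: `describe_uri` is a hypothetical helper, the `bq://<project>.<dataset>.<table>` shape is taken from the URI used earlier in this thread, and the `gs://` and `s3://` handling assumes the usual `scheme://bucket/key` layout. The bucket and key names in the usage lines are made up.

```python
import re
from urllib.parse import urlparse

# Shape used in this thread: bq://<project>.<dataset>.<table>
_BQ_PATTERN = re.compile(r"^bq://([^.:/]+)\.([^.:/]+)\.([^.:/]+)$")


def describe_uri(uri: str) -> dict:
    """Hypothetical helper: classify a dataset URI and split it into parts."""
    match = _BQ_PATTERN.match(uri)
    if match:
        project, dataset, table = match.groups()
        return {"backend": "bigquery", "project": project,
                "dataset": dataset, "table": table}
    parsed = urlparse(uri)
    if parsed.scheme in ("gs", "s3") and parsed.netloc:
        # Object stores: the host part is the bucket, the path is the key.
        return {"backend": parsed.scheme, "bucket": parsed.netloc,
                "key": parsed.path.lstrip("/")}
    raise ValueError(f"unrecognized dataset URI: {uri!r}")


if __name__ == "__main__":
    print(describe_uri(
        "bq://sp-one-model.quarterly_forecast_2022F1.premium_revenue_tab_input_vat"))
    print(describe_uri("gs://example-bucket/data/input.parquet"))  # made-up bucket
```

A check like this can catch a malformed URI before it is handed to StructuredDataset(uri=...), where the failure would otherwise surface later at read time.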
s
Yes, I looked at that example, but it only shows how to ingest data from BQ using BigQueryTask, not StructuredDataset