# flytekit
#3339 [BUG] Structured Dataset BQ decoder only reads partial data for big datasets (>100MB)

Issue created by gitgraghu

Describe the bug

The Structured Dataset BigQuery decoder reads only partial data for big datasets (>100MB). Currently the read session is not configured with a maximum stream count, so for larger datasets multiple streams are used to read the data. However, the BQ decoder only reads the first stream, as shown in the code below:
    read_session = client.create_read_session(parent=parent, read_session=requested_session)
    stream = read_session.streams[0]
    reader = client.read_rows(stream.name)
    frames = []
    for message in reader.rows().pages:
        frames.append(message.to_dataframe())
    return pd.concat(frames)

Expected behavior

The BQ decoder should read all data for larger datasets as well. The decoder code needs to be changed to either limit the read session to a single stream (max_stream_count=1) or read from all streams, as in the snippet below:
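For the first option, a minimal sketch of what limiting the session to one stream might look like. The helper name `create_single_stream_session` is hypothetical and not part of flytekit; `max_stream_count` is the keyword accepted by `BigQueryReadClient.create_read_session`. A `MagicMock` stands in for the real client so the sketch runs without credentials.

```python
from unittest.mock import MagicMock


def create_single_stream_session(client, parent, requested_session):
    """Request at most one stream so that streams[0] carries every row.

    Hypothetical helper for illustration; not the flytekit implementation.
    """
    return client.create_read_session(
        parent=parent,
        read_session=requested_session,
        max_stream_count=1,
    )


# Stand-in for BigQueryReadClient (assumption: the real client is passed in production).
fake_client = MagicMock()
create_single_stream_session(fake_client, "projects/my-project", requested_session=None)

# Inspect the keyword arguments the helper forwarded to the client.
call_kwargs = fake_client.create_read_session.call_args.kwargs
```

The trade-off of this option is that a single stream gives up the parallelism multiple streams allow, so reading every stream may be preferable for very large tables.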
    frames = []
    for stream in read_session.streams:
        reader = client.read_rows(stream.name)
        for message in reader.rows().pages:
            frames.append(message.to_dataframe())
    if len(frames) > 0:
        df = pd.concat(frames)
    else:
        schema = pyarrow.ipc.read_schema(
            pyarrow.py_buffer(read_session.arrow_schema.serialized_schema)
        )
        df = schema.empty_table().to_pandas()
    return df
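
One way to check the all-streams approach without a BigQuery project is to run the same loop against stand-in objects. The sketch below is an assumption-laden illustration: `collect_pages` and `make_fake_client` are hypothetical names, the strings `"p0"`, `"p1"`, `"p2"` stand in for real page objects (which the decoder would convert with `to_dataframe()`), and only the `streams`, `read_rows(...)` and `rows().pages` shapes used in the snippets above are mimicked.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def collect_pages(client, read_session):
    """Return every page from every stream of the session, in stream order."""
    pages = []
    for stream in read_session.streams:
        reader = client.read_rows(stream.name)
        pages.extend(reader.rows().pages)
    return pages


def make_fake_client(pages_by_stream):
    """Build a stand-in client whose read_rows() serves canned pages per stream name."""

    def read_rows(name):
        reader = MagicMock()
        reader.rows.return_value.pages = pages_by_stream[name]
        return reader

    client = MagicMock()
    client.read_rows.side_effect = read_rows
    return client


# Three streams, one of them empty, as a large table might produce.
pages_by_stream = {"s0": ["p0", "p1"], "s1": ["p2"], "s2": []}
session = SimpleNamespace(
    streams=[SimpleNamespace(name=name) for name in pages_by_stream]
)
client = make_fake_client(pages_by_stream)

# Reading all streams returns every page.
all_pages = collect_pages(client, session)

# Reading only streams[0], as the current decoder does, silently drops the rest.
first_only = list(client.read_rows(session.streams[0].name).rows().pages)
```

Comparing `all_pages` with `first_only` makes the partial read visible: the first-stream-only path misses every page served by the other streams.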