# flyte-support
c
does anyone know which config to change to increase the max remote output size?
[0]: Remote output size exceeds max, err: [output file @[s3://TK/metadata/propeller/eval-development-an8t5mxsghcbl5xl7vt4/n4/data/0/0/1/outputs.pb] is too large [227056479] bytes, max allowed [2097152] bytes: remote file exceeds max size]
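The numbers in the error can be compared directly. The limit of 2097152 bytes is exactly 2 × 1024 × 1024, which matches a `maxDownloadMBs` of 2 if the setting is measured in MiB. That unit is an assumption inferred from the numbers, not from the Flyte source. The helper names below are hypothetical.

```python
import math

# Values copied from the error message above.
OUTPUT_SIZE_BYTES = 227_056_479
MAX_ALLOWED_BYTES = 2_097_152

BYTES_PER_MIB = 1024 * 1024


def limit_in_bytes(max_download_mbs: int) -> int:
    """Assumes the setting is measured in MiB (2 -> 2097152 bytes)."""
    return max_download_mbs * BYTES_PER_MIB


def min_setting_for(size_bytes: int) -> int:
    """Smallest whole-MiB setting that would admit an output of this size."""
    return math.ceil(size_bytes / BYTES_PER_MIB)


size_mib = OUTPUT_SIZE_BYTES / BYTES_PER_MIB
print(f"output is {size_mib:.2f} MiB, limit is {MAX_ALLOWED_BYTES / BYTES_PER_MIB:.0f} MiB")
print("minimum setting:", min_setting_for(OUTPUT_SIZE_BYTES))
```

Under that assumption, this single `outputs.pb` is about 216.5 MiB, more than 100 times the default limit.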
why is the default `maxDownloadMBs` set so low (2 MB)? what are the implications of increasing it to several hundred gigabytes?
the only recommendation I could find online is:
> Yes, check out the propeller config. But preferably offload large lists, etc.

(from https://discuss.flyte.org/t/9888328/hi-there-do-we-have-any-options-to-increase-limits-failed-at?t=v2)
not sure I understand what "offload" means in this context. is it suggesting we avoid returning large amounts of data from tasks, since those outputs get stored in the intermediate storage?
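One reading of "offload" is this: the task writes the bulky data to a file (or a directory or dataframe) and returns that instead, so only a small reference lands in the metadata store. The sketch below mimics the idea in plain Python. `inline_record` and `offloaded_record` are hypothetical stand-ins for what ends up in `outputs.pb`, not flytekit APIs. JSON length is only a rough proxy for the real protobuf size.

```python
import json
import os
import tempfile
from typing import Any, List


def inline_record(values: List[Any]) -> bytes:
    """Hypothetical: the whole list is embedded in the metadata record."""
    return json.dumps({"value": values}).encode()


def offloaded_record(values: List[Any], raw_dir: str) -> bytes:
    """Hypothetical: the list is written to a file in a 'raw data store'
    directory, and the metadata record only holds that file's path."""
    path = os.path.join(raw_dir, "results.json")
    with open(path, "w") as f:
        json.dump(values, f)
    return json.dumps({"uri": path}).encode()


values = [{"id": i, "payload": "x" * 50} for i in range(20_000)]
with tempfile.TemporaryDirectory() as raw_dir:
    inline = inline_record(values)
    offloaded = offloaded_record(values, raw_dir)
    print(len(inline), "bytes inline vs", len(offloaded), "bytes offloaded")
```

The reference record stays the same size however long the list gets. The inline record grows with every element, which is what eventually trips the metadata size limit.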
the user who ran into this error is using `map_task` like so:
```python
from flytekit import map_task

# 3. For each day's DailyDataPayload, process them concurrently.
list_of_list_of_results_promise = map_task(process_day_level_files_wf, concurrency=10)(
    daily_payload=daily_payloads_promise, # Pass the list of DailyDataPayload objects
    action_to_find=repeated_actions_promise
)
```
and the type definition of the task being mapped (`process_day_level_files_wf`) is:
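```python
from typing import List
from flytekit import task

# custom_image, DailyDataPayload and SingleFileResult are defined elsewhere.
@task(container_image=custom_image)
def process_day_level_files_wf(
    daily_payload: DailyDataPayload,
    action_to_find: str,
) -> List[SingleFileResult]:
    ...  # body not shown in the thread
```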
where `SingleFileResult` is a dataclass defined as:
```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class SingleFileResult:
    daily_prefix_for_grouping: str
    matched_json_objects: List[Any]
```
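One way to apply the offloading advice here is to keep `matched_json_objects` out of the returned value. Each task would write the matches to a JSON Lines file and return only its location. In actual flytekit code the path field would be a `FlyteFile`, which flytekit uploads to the raw data store. The sketch below uses a plain `str` path so it runs without flytekit. `OffloadedFileResult`, `write_matches` and `read_matches` are hypothetical names.

```python
import json
import os
import tempfile
from dataclasses import dataclass
from typing import Any, List


@dataclass
class OffloadedFileResult:
    daily_prefix_for_grouping: str
    # In a Flyte task this would be a FlyteFile; a plain path stands in here.
    matches_path: str
    match_count: int


def write_matches(prefix: str, matches: List[Any], out_dir: str) -> OffloadedFileResult:
    """Write each matched JSON object on its own line; return a small record."""
    safe_name = prefix.replace("/", "_") + ".jsonl"
    path = os.path.join(out_dir, safe_name)
    with open(path, "w") as f:
        for obj in matches:
            f.write(json.dumps(obj) + "\n")
    return OffloadedFileResult(prefix, path, len(matches))


def read_matches(result: OffloadedFileResult) -> List[Any]:
    """Downstream consumer: load the matches back from the file."""
    with open(result.matches_path) as f:
        return [json.loads(line) for line in f]


with tempfile.TemporaryDirectory() as out_dir:
    res = write_matches("2024/01/01", [{"action": "click"}, {"action": "view"}], out_dir)
    loaded = read_matches(res)
    print(res.match_count, loaded)
```

Only the prefix, the file reference and a count would be stored inline, so the per-task output stays small no matter how many objects match.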
I'm trying to read through this article but I still don't quite understand what needs to be tweaked here: https://www.union.ai/docs/flyte/user-guide/data-input-output/task-input-and-output/#metadata-and-raw-data
I see the workflow is using lots of `List[str]` and `List[int]`; perhaps those are getting saved as metadata instead of as a pointer to raw data?
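You can check that hunch locally by serializing each output of a sample run and ranking the outputs by size. The helper below is a hypothetical debugging aid, and the sample data is made up. JSON length is only a proxy for the protobuf literal Flyte actually writes. Even so, lists of primitives stored inline should scale with element count under either encoding.

```python
import json
from typing import Any, Dict, List, Tuple


def rank_outputs_by_size(outputs: Dict[str, Any]) -> List[Tuple[str, int]]:
    """Return (output_name, approx_bytes) pairs, largest first.

    approx_bytes is the length of the JSON encoding, a rough stand-in
    for the size of an inline literal in outputs.pb.
    """
    sizes = [(name, len(json.dumps(value, default=str).encode()))
             for name, value in outputs.items()]
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample outputs.
sample = {
    "file_keys": [f"daily/{i:06d}/events.json" for i in range(100_000)],
    "counts": list(range(1_000)),
    "status": "ok",
}
LIMIT_BYTES = 2 * 1024 * 1024  # the 2 MiB default from the error above

ranked = rank_outputs_by_size(sample)
for name, size in ranked:
    flag = "over limit" if size > LIMIT_BYTES else ""
    print(f"{name:>10} {size:>10,} {flag}")
```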
Can someone point me to the code where we determine whether a type is considered primitive or complex? The article says:
> Primitive values (`int`, `str`, etc.) are stored directly in the metadata store, while complex data objects (`pandas.DataFrame`, `FlyteFile`, etc.) are stored by reference, with the reference pointer in the metadata store and the actual data in the raw data store.
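As far as I understand flytekit's design, there is no single central "is primitive" check. Each Python type is handled by a registered type transformer, and that transformer decides whether its literal embeds the value or points at an uploaded blob. This is worth verifying against flytekit's `type_engine.py`. The toy below only mimics that dispatch pattern. `HANDLERS`, `LocalFile`, `to_literal_sketch`, the `raw-store://` scheme and the returned tuples are all hypothetical and much simpler than flytekit's real `TypeEngine`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple, Type


@dataclass
class LocalFile:
    """Hypothetical stand-in for FlyteFile: just wraps a local path."""
    path: str


LiteralSketch = Tuple[str, Any]  # ("inline", value) or ("reference", uri)


def _inline(value: Any) -> LiteralSketch:
    return ("inline", value)


def _upload(value: LocalFile) -> LiteralSketch:
    # A real transformer would upload the file; here we only fake a URI.
    return ("reference", "raw-store://" + value.path.lstrip("/"))


HANDLERS: Dict[Type, Callable[[Any], LiteralSketch]] = {
    int: _inline,
    float: _inline,
    str: _inline,
    bool: _inline,
    LocalFile: _upload,
}


def to_literal_sketch(value: Any) -> LiteralSketch:
    """Pick a handler by exact type; lists recurse element by element."""
    if isinstance(value, list):
        # Each element becomes its own literal, so the list itself is stored
        # inline and its size grows with the number of elements.
        return ("inline", [to_literal_sketch(v) for v in value])
    handler = HANDLERS.get(type(value))
    if handler is None:
        raise TypeError(f"no handler registered for {type(value).__name__}")
    return handler(value)


print(to_literal_sketch(42))
print(to_literal_sketch(LocalFile("/tmp/results.jsonl")))
print(to_literal_sketch([1, 2, 3]))
```

In this model, a list of files still produces an inline list, but each element is only a short reference. A list of ints or strings carries every value inline.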
f
This can be increased, but this limit only applies to metadata. Files, directories, and dataframes get offloaded to the raw data store and do not count against it; primitives and inline lists do.
c
thanks @freezing-airport-6809! can you share a code pointer to where the decision to store something in metadata vs. offload it to the blob store is made? like, where is the if statement that checks for "is primitive"?
f
@curved-whale-1505 good question, there is a doc for this: https://docs-legacy.flyte.org/en/latest/user_guide/concepts/main_concepts/data_management.html#divedeep-data-management
cc @powerful-gold-59386 I can't seem to find this concept in the new docs structure; can you help?