OK, I managed to move forward a bit. I had to execute
remote.sync_execution(execution, sync_nodes=True)
in order to get the task inputs and outputs. Now I've run into another problem. I get the node execution with
node_execution = execution.node_executions['n3']
and then, if I get the output using
node_execution.outputs['o0']
the result is
StructuredDataset(uri=None, file_format='parquet')
with no URI set. To get the actual URI, I have to use
node_execution.outputs.data['o0'].value.structured_dataset.uri
which returns
s3://my-s3-bucket/data/ow/f2d443d3adb604c94869-n3-0/26f65254ebd42579e5f67433d38efc01
This doesn't look right; it seems I'm not using the intended API. At least I can then load the data from this URI using boto.
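For the boto route, the s3:// URI first has to be split into a bucket and a key prefix before the objects under it can be listed. Below is a minimal stdlib sketch, assuming (not confirmed in this thread) that the structured-dataset URI is a "directory" prefix holding one or more parquet files rather than a single object. The helper names split_s3_uri and list_dataset_keys are hypothetical, and the S3 client is passed in as a parameter so boto3 doesn't need to be imported at module level:

```python
from typing import List, Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/prefix' into ('bucket', 'some/prefix/')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    prefix = parsed.path.lstrip("/")
    # Assumption: the URI names a "directory", so force a trailing slash to
    # avoid also matching sibling keys that merely share the same stem.
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return parsed.netloc, prefix


def list_dataset_keys(uri: str, s3_client) -> List[str]:
    """List every object key under the dataset prefix.

    s3_client is expected to behave like boto3.client("s3").
    """
    bucket, prefix = split_s3_uri(uri)
    keys: List[str] = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        resp = s3_client.list_objects_v2(**kwargs)
        # "Contents" is absent when nothing matches the prefix.
        keys.extend(obj["Key"] for obj in resp.get("Contents", []))
        # list_objects_v2 returns at most 1000 keys per call; follow the token.
        if not resp.get("IsTruncated"):
            return keys
        kwargs["ContinuationToken"] = resp["NextContinuationToken"]
```

With boto3 installed, this would be called as list_dataset_keys(uri, boto3.client("s3")), and each returned key could then be fetched with the client's get_object or download_file. boto3's own get_paginator("list_objects_v2") is an equivalent alternative to the manual continuation-token loop.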