# ask-the-community
n
Hi, I am running a workflow locally on my machine using pyflyte run or python -m, but I am getting this error -
2023-06-20 17:04:51,354876 ERROR {"asctime": "2023-06-20 17:04:51,354", "name": "flytekit", "levelname": "ERROR", "message": "Failed to convert outputs of task 'read_dataset' at position 0:\n  [Errno 28] Error writing bytes to file. Detail: [errno 28] No space left on device"}    base_task.py:587
Is there a virtual directory or disk flyte creates when running on a local machine which I can purge? The data for training is being pulled from s3.
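For reference, flytekit stages intermediate data for local runs in a local scratch directory rather than a named "virtual disk". Below is a minimal sketch for printing where that is; the local_sandbox_dir and raw_output_prefix attribute names are assumptions based on flytekit's FileAccessProvider and may differ between versions.

```python
# Sketch: locate the scratch locations flytekit uses for a purely local run.
# local_sandbox_dir / raw_output_prefix are assumed attribute names and may
# differ between flytekit versions.
from flytekit import FlyteContextManager

ctx = FlyteContextManager.current_context()
print("local sandbox dir:", ctx.file_access.local_sandbox_dir)  # local scratch space
print("raw output prefix:", ctx.file_access.raw_output_prefix)  # where offloaded outputs land
```

If that location sits on a small or nearly full volume, clearing it out between runs is one place to reclaim space.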
c
I've seen this error when I tried the flytectl demo start command. I purged all the unused docker images, containers, volumes, etc., and then the error disappeared.
n
I am not using any docker image at the moment, I mean I am not passing in an image with --image. I am just running with something like pyflyte run sample.py sample_wf.
k
hmm, are you using an image spec in the workflow?
n
No, no image spec in my workflow.
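For readers unfamiliar with the term: "image spec" refers to flytekit's ImageSpec, which describes a container image to build for a task. A minimal sketch of what that would look like, purely for reference since the thread confirms it is not used here; the package list and registry are illustrative placeholders.

```python
# Reference-only sketch of an ImageSpec-backed task (not used in this thread).
# Packages and registry are illustrative placeholders.
from flytekit import task, ImageSpec

custom_image = ImageSpec(
    packages=["pandas", "pyarrow"],  # illustrative dependencies
    registry="localhost:30000",      # e.g. a local demo-cluster registry
)

@task(container_image=custom_image)
def sample_task() -> int:
    return 1
```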
Looks like this is happening when creating a StructuredDataset -
❱  865 │   lv = transformer.to_literal(ctx, python_val, python_type, expected)
          sd = StructuredDataset(dataframe=python_val, metadata=meta)
   600 │   return self.encode(ctx, sd, python_type, protocol, fmt, sdt)
   627 │   sd_model = handler.encode(ctx, sd, structured_literal_type)
    53 │   df.to_parquet(
    54 │       path,
    55 │       coerce_timestamps="us",
    56 │       allow_truncated_timestamps=False,
  2976 │   return to_parquet(
  2977 │       self,
  2978 │       path,
  2979 │       engine,
   430 │   impl.write(
   431 │       df,
   432 │       path_or_buf,
   433 │       compression=compression,
   204 │   self.api.parquet.write_table(
   205 │       table, path_or_handle, compression=compression, **kwargs
   206 │   )
  2985 │   writer.write_table(table, row_group_size=row_group_size)
  1054 │   self.writer.write_table(table, row_group_size=row_group_size)
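That call chain is flytekit's implicit dataframe handling: when a task returns a dataframe, flytekit wraps it in a StructuredDataset and encodes it to a parquet file on local disk, which is where the [Errno 28] surfaces. A minimal sketch of the kind of task that exercises this path; the task name matches the one in the error log above, but the column is illustrative.

```python
# Sketch: a task whose pandas return value is encoded to parquet via
# StructuredDataset during local execution (the call path in the traceback above).
import pandas as pd
from flytekit import task

@task
def read_dataset() -> pd.DataFrame:
    # On every (uncached) local run, this frame is written out as a parquet file.
    return pd.DataFrame({"feature": range(1_000)})
```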
I think the StructuredDataset was being created multiple times, i.e. the writing to parquet files. So I moved that to a separate task and added cache=True. But now I am getting this error -
*FlyteScopedUserException:* database or disk is full
What local database does flyte use, and is there a way to flush it? I am running on my Mac and I don't think my disk is full 🙂
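A minimal sketch of the refactor described above, assuming the dataframe creation lives in its own cached task; note that cache=True requires a cache_version, and all names here are illustrative.

```python
# Sketch of the refactor: isolate the dataframe creation in a dedicated task
# and cache it, so the parquet encoding runs only once per cache_version.
import pandas as pd
from flytekit import task, workflow

@task(cache=True, cache_version="1.0")
def read_dataset() -> pd.DataFrame:
    return pd.DataFrame({"feature": range(1_000)})

@workflow
def sample_wf() -> pd.DataFrame:
    return read_dataset()
```

Keep in mind that, as far as I know, flytekit persists locally cached results on disk as well (under ~/.flyte by default), so cached outputs also take up space.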
k
are you able to share your workflow code?
we don't run any database when you run a workflow locally