# ask-the-community
g
Hi all, new here. I didn't see a beginner's channel, so I'll ask what seems like a silly question here. I followed the instructions described here. I'm using Python 3.10 on a MacBook Pro with an M2 Pro chip, and I'm getting the following error:
Failed with Unknown Exception <class 'TypeError'> Reason: Encountered error while executing workflow 'example.training_workflow':
Error encountered while executing 'training_workflow':
Failed to convert outputs of task 'example.get_data' at position 0:
[Errno 2] Failed to open local file '/var/folders/vs/v9fd0vyx6b735n5_l607sq440000gn/T/flyte-nl8jmq53/raw/2bb31bdcb6e0b0733d1bad9b87a3e886/00000'. Detail: [errno 2] No such file or directory
So I did a little digging. From the exception stack trace, it seems the error happens when attempting to write the output. While debugging I found that this folder indeed isn't created. The temp folder
/var/folders/vs/v9fd0vyx6b735n5_l607sq440000gn/T/flyte-nl8jmq53/
does exist, but the `raw` subfolder doesn't. I kept debugging until I noticed several things (in reverse order 🙂). In the method `flytekit.core.data_persistence.FileAccessProvider.get_random_remote_path`, the default remote is, as expected, `LocalFileSystem`. `self._default_protocol` is `"file"`, but in that method the following line returns `('file', 'local')`:
default_protocol = self._default_remote.protocol
This causes `get_random_remote_path` to eventually return a path that starts with `file:///...`. Right after getting the random path, in `flytekit.types.structured.basic_dfs.PandasToParquetEncodingHandler.encode`, these lines run:
if not ctx.file_access.is_remote(uri):
    Path(uri).mkdir(parents=True, exist_ok=True)
`is_remote` returns `False`, so `mkdir` is called, but apparently `Path.mkdir`, when operating on a `file://...` path, does NOTHING. So the necessary folder hierarchy does not exist and the code fails. Potential root causes:
1. `fsspec`, for some reason, returns `('file', 'local')` instead of `file`.
2. Alternatively, this `Flyte` code in `flytekit.core.data_persistence.FileAccessProvider.get_random_remote_path`
if type(default_protocol) == list:
    default_protocol = default_protocol[0]
was supposed to test the `tuple` condition as well; this would probably have solved the issue.
3. Another possibility is my Mac... maybe `Path` behaves weirdly on macOS.
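A minimal sketch of the idea behind option 2, for illustration only. The helper name `first_protocol` is hypothetical and is not part of flytekit; it simply accepts a string, a list, or a tuple and returns the first protocol name.

```python
from typing import Sequence, Union


def first_protocol(protocol: Union[str, Sequence[str]]) -> str:
    """Return a single protocol name from a str, list, or tuple.

    Hypothetical helper for illustration; not flytekit's actual code.
    """
    # A plain string is already a single protocol name.
    if isinstance(protocol, str):
        return protocol
    # Lists and tuples are treated alike: take the first entry.
    if isinstance(protocol, (list, tuple)):
        return protocol[0]
    raise TypeError(f"Unsupported protocol type: {type(protocol).__name__}")


print(first_protocol("file"))             # file
print(first_protocol(["s3", "s3a"]))      # s3
print(first_protocol(("file", "local")))  # file
```

Using `isinstance` with a tuple of types covers both container shapes in one check, whereas `type(x) == list` silently skips tuples.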
HUH! It seems that `Path` interprets the `file://..` URI as a relative path, where `file:` is a folder! So it essentially created the whole hierarchy under the current working directory!!!
~/src/ext/flyte/getting-started
├── __pycache__
│   └── example.cpython-310.pyc
├── example.py
└── file:
    └── var
        └── folders
            └── vs
                └── v9fd0vyx6b735n5_l607sq440000gn
                    └── T
                        └── flyte-acgpkxgo
                            └── raw
                                └── e6a9aed58dcb402830573514c3b21d6a
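The directory tree above is consistent with how `pathlib` parses strings: it does not understand URI schemes. A small sketch that reproduces the effect without touching the disk (the path used here is a made-up example, and stripping the scheme with `urlparse` is just one possible approach, not necessarily what flytekit does):

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

uri = "file:///var/folders/example/raw"  # hypothetical path for illustration

# pathlib does not parse URI schemes: "file:" becomes an ordinary first
# path component and the path is relative, so mkdir() on it would create
# the hierarchy under the current working directory.
as_path = PurePosixPath(uri)
print(as_path.is_absolute())  # False
print(as_path.parts[0])       # file:

# Stripping the scheme first yields the intended absolute path.
local_path = PurePosixPath(urlparse(uri).path)
print(local_path.is_absolute())  # True
print(local_path)                # /var/folders/example/raw
```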
HUH #2! Seems like `fsspec` changed the `LocalFileSystem` implementation just recently and added `"local"`, essentially converting the `protocol` to a tuple. See here. Interesting... I can create a PR against `Flyte` that checks whether it's a tuple and takes the first element. WDYT?
y
thank you for the investigation!
yes please, could you please put in a pr?
g
@Yee turns out the problematic `fsspec` version was published just 2 days ago
I wonder if it broke other things in Flyte as well
y
yeah it has
at least unit tests
if not actual logic
which it looks like it has
g
that's my concern
the fix can be locking down fsspec version, until someone more versed in Flyte than me can take a look
I only today started playing with it
I just have good debugging skills 🙂
y
yup, trying the same in one of our PRs
g
from what I examined in `flytekit`, it seems like it mostly impacts that specific point, so testing whether the protocol is either a list or a tuple and then taking the first component should work.
From briefly looking at fsspec code, it seems like that was their intention - the protocol is a list of protocols
y
there's some edge-case handling code for windows
yeah but not always.
i know s3 is like that though. we'll be more careful with logic around that
g
should I go ahead and try fixing it, or you want to take it?
TensorFlow tests are run by `make test`, but nothing installs TensorFlow. Am I missing something? I ran `make setup` and then `make test`.
ah! it doesn't install TensorFlow when running on ARM (M2). Got it.
y
if you want to by all means!
but if not we'll take it
g
I got it. It seems that in most cases (cloud storage classes) you already treat the protocol as a list; just the local case seems to be neglected. I'll try to do something
k
I missed it. cc @Yee: does the Flyte:// PR affect this?
g
@Ketan (kumare3) the new version is failing many unit tests. If your PR passes with the latest `fsspec`, I believe you are good to go.
When are you planning to merge and release the next version? (I think you implicitly fixed it)
k
ohh nice
i merged it, there is a new beta already released if you want to try it
1.10.0.b1
a
hi @Ketan (kumare3) where should i download the beta version? i have similar problem
g
@Adhi Setiawan
pip install flytekit==1.10.1b0
a
ohh b0, okay thank you
g
Since it's a beta version, you need to specify the version explicitly. Once a non-beta version is released you can simply run `pip install --upgrade flytekit` to get the latest version
a
thank you @Guy Arad
k
You can also use `--pre`
g
@Ketan (kumare3) apparently someone (maybe @Yee) already pinned the `fsspec` version, so you might not have fixed the issue. I will rebase on top of your changes and double check
"fsspec>=2023.3.0,<=2023.9.2"
y
yeah i didn't fix it, i just pinned it.
it was pinned in the large PR that went in recently
g
seems like it's working with recent changes (they don't rely directly on the file system protocol)
tests pass locally with `fsspec==2023.10.0`
y
oh that's good to know
but still want to be a bit careful when unpinning
g
although that's weird...
y
what's weird
g
let me gather some information and I'll add more details
g
Just had this today (first time trying flyte) and pinning `fsspec>=2023.3.0,<=2023.9.2` fixes it for now without having to get a beta version of flytekit
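For anyone wanting to check a version string against that pinned range, here is a rough sketch. The helper names `parse_version` and `in_pinned_range` are made up for illustration, and the parsing assumes purely numeric versions such as `2023.9.2` (no pre-release or post-release suffixes).

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a purely numeric version such as '2023.9.2' into (2023, 9, 2)."""
    return tuple(int(part) for part in version.split("."))


def in_pinned_range(version: str, low: str = "2023.3.0", high: str = "2023.9.2") -> bool:
    """Check low <= version <= high using integer tuple comparison."""
    return parse_version(low) <= parse_version(version) <= parse_version(high)


print(in_pinned_range("2023.9.2"))   # True
print(in_pinned_range("2023.10.0"))  # False
```

Comparing integer tuples matters here: as plain strings, `"2023.10.0"` sorts before `"2023.9.2"`. The installed version string can be obtained with `importlib.metadata.version("fsspec")`.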
g
@Gauthier Castro yeah, we are aware (the beta version contains this fix as well)
@Yee seems like there's a commented-out test -
# def test_transformer_to_literal_localss():
#     random_dir = context_manager.FlyteContext.current_context().file_access.get_random_local_directory()
#     fs = FileAccessProvider(local_sandbox_dir=random_dir, raw_output_prefix=os.path.join(random_dir, "raw"))
#     ctx = context_manager.FlyteContext.current_context()
#     with context_manager.FlyteContextManager.with_context(ctx.with_file_access(fs)) as ctx:
#
#         tf = FlyteDirToMultipartBlobTransformer()
#         lt = tf.get_literal_type(FlyteDirectory)
#         # Can't use if it's not a directory
#         with pytest.raises(FlyteAssertion):
#             p = "/tmp/flyte/xyz"
#             path = pathlib.Path(p)
#             try:
#                 path.unlink()
#             except OSError:
#                 ...
#             with open(p, "w") as fh:
#                 fh.write("hello world\n")
#             tf.to_literal(ctx, FlyteDirectory(p), FlyteDirectory, lt)
This test was failing before. There's another test with a similar name (without the typo)
@Ketan (kumare3) ☝️ why did you comment it out?
k
uhho - i took over the PR from @Yee, let me take a look