One of the tests on GA is failing due to pytest no...
# hacktoberfest-2022
c
One of the tests on GA is failing due to pytest not being recognized as a command on Windows build (Python 3.7 -spark2) (https://github.com/flyteorg/flytekit/actions/runs/3326738618/jobs/5511139310#step:6:12)
e
If you scroll up, the previous action (install dependencies) is failing to install
ipython==8.5.0
it looks like ipython 8+ dropped support for python 3.7
c
@Eduardo Apolinario (eapolinario) Okay, but the rest of my tests are failing because of the issue we discussed here (https://flyte-org.slack.com/archives/CREL4QVAQ/p1666541751922589). Why exactly is it happening for me?
Is it because my serialized file is not present in the local path?
e
The comment I left still applies. The call to
put_data
should take a path to the file you wrote, but right now it's pointing to the newly created directory.
c
Okay, I've made the changes from your comment in my local repo, but I still get the same error in my tests
e
what error are you seeing after that change? Also, can you push the change to the PR?
c
Sure, I'll do it now, in just a few minutes
I don't think I have to re-run the setup, since we installed with pip in "-e" (editable) mode
Is the error code 1 at line 43 of Makefile related to file permissions?
e
you can repro this locally by running
make lint
(or by running
pre-commit run --all-files
)
c
tf.io.write_file
doesn't support paths, so I did a
os.chdir(str(local_path))
, but that didn't work out @Eduardo Apolinario (eapolinario)
e
how are you invoking
tf.io.write_file
with a path? What error do you see if you pass a path?
c
I mean it only accepts a filename, so my idea is to change to the directory (local_path) and then do tf.io.write_file to save the file
e
oh, I see. What I was suggesting was to change that code to:
-        filename = "tensor_data"
+        filename = os.path.join(local_path, "tensor_data")
         tf.io.write_file(filename, tf.io.serialize_tensor(python_val))

         remote_path = ctx.file_access.get_random_remote_path(local_path)
-        ctx.file_access.put_data(local_path, remote_path, is_multipart=False)
+        ctx.file_access.put_data(filename, remote_path, is_multipart=False)
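The diff above boils down to one rule: write the serialized bytes to a file inside the local directory, then upload that file's path, not the directory's. A minimal stdlib-only sketch of that rule follows. `put_data` and `save_and_upload` here are hypothetical stand-ins (not the flytekit API), and plain `open` / `shutil.copyfile` stand in for `tf.io.write_file` and the real single-file upload.

```python
import os
import shutil
import tempfile


def put_data(local_path: str, remote_path: str) -> None:
    """Hypothetical stand-in for a single-file upload: copies exactly one file."""
    os.makedirs(os.path.dirname(remote_path), exist_ok=True)
    # Raises FileNotFoundError if local_path does not exist.
    shutil.copyfile(local_path, remote_path)


def save_and_upload(payload: bytes, local_dir: str, remote_path: str) -> str:
    """Write payload to <local_dir>/tensor_data and upload that file."""
    os.makedirs(local_dir, exist_ok=True)
    # Build the full file path instead of a bare file name, so the file
    # lands inside local_dir rather than in the current working directory.
    filename = os.path.join(local_dir, "tensor_data")
    with open(filename, "wb") as f:
        f.write(payload)
    # Upload the file we just wrote, not the directory that contains it.
    put_data(filename, remote_path)
    return filename


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        local_dir = os.path.join(root, "sandbox", "local")
        remote_path = os.path.join(root, "raw", "remote_blob")
        written = save_and_upload(b"serialized-bytes", local_dir, remote_path)
        print("wrote", written, "and uploaded to", remote_path)
```

Writing a bare `"tensor_data"` and then uploading a local path that was never created would fail with the same kind of `FileNotFoundError` shown later in the thread.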
c
Let me quickly test that
It's throwing a file not found error
e
which file?
c
the line in tests
lv = tf.to_literal(ctx, python_val, type(python_val), lt)
raises the error
tests/flytekit/unit/extras/tensorflow/test_transformations.py:56:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
flytekit/extras/tensorflow/tensor/tensor_transformer.py:58: in to_literal
    ctx.file_access.put_data(local_path, remote_path, is_multipart=False)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <flytekit.core.data_persistence.FileAccessProvider object at 0x7fd612f9a520>
local_path = '/tmp/flyte-4tpz88yv/sandbox/local_flytekit/b7b927f373aed9b6b10c2e32f318c016'
remote_path = '/tmp/flyte-4tpz88yv/raw/ac09632c1597fcbf7684bad8698fc46f/b7b927f373aed9b6b10c2e32f318c016'
is_multipart = False

    def put_data(self, local_path: Union[str, os.PathLike], remote_path: str, is_multipart=False):
        """
        The implication here is that we're always going to put data to the remote location, so we .remote to ensure
        we don't use the true local proxy if the remote path is a file://

        :param Text local_path:
        :param Text remote_path:
        :param bool is_multipart:
        """
        try:
            with PerformanceTimer(f"Writing ({local_path} -> {remote_path})"):
                DataPersistencePlugins.find_plugin(remote_path)(data_config=self.data_config).put(
                    local_path, remote_path, recursive=is_multipart
                )
        except Exception as ex:
>           raise FlyteAssertion(
                f"Failed to put data from {local_path} to {remote_path} (recursive={is_multipart}).\n\n"
                f"Original exception: {str(ex)}"
            ) from ex
E           flytekit.exceptions.user.FlyteAssertion: Failed to put data from /tmp/flyte-4tpz88yv/sandbox/local_flytekit/b7b927f373aed9b6b10c2e32f318c016 to /tmp/flyte-4tpz88yv/raw/ac09632c1597fcbf7684bad8698fc46f/b7b927f373aed9b6b10c2e32f318c016 (recursive=False).
E
E           Original exception: [Errno 2] No such file or directory: '/tmp/flyte-4tpz88yv/sandbox/local_flytekit/b7b927f373aed9b6b10c2e32f318c016'

flytekit/core/data_persistence.py:455: FlyteAssertion
=================================== short test summary info ====================================
FAILED tests/flytekit/unit/extras/tensorflow/test_transformations.py::test_to_python_value_and_literal[transformer0-Tensor-TensorflowTensor-python_val0]
E           FileNotFoundError: [Errno 2] No such file or directory: '/tmp/flyte-4tpz88yv/sandbox/local_flytekit/b7b927f373aed9b6b10c2e32f318c016'

/usr/lib/python3.8/shutil.py:261: FileNotFoundError
I looked in the above-mentioned directory and couldn't find the file
e
the
- / +
at the beginning of each row are from git diff
c
yes I did make the changes
I don't understand why the file is not being created
As for the make lint failure, these are the errors in my local repo:
# Exclude setup.py to fix error: Duplicate module named "setup"
mypy plugins --exclude setup.py || true
flytekit-kf-mpi is not a valid Python package name
pre-commit run --all-files
flake8...................................................................Failed
- hook id: flake8
- exit code: 1

tests/flytekit/unit/extras/tensorflow/test_transformations.py:1:1: F401 'numpy as np' imported but unused
flytekit/extras/tensorflow/tensor/tensor_transformer.py:52:9: E265 block comment should start with '# '

black....................................................................Failed
- hook id: black
- files were modified by this hook

reformatted flytekit/extras/tensorflow/tensor/tensor_transformer.py

All done! ✨ 🍰 ✨
1 file reformatted, 525 files left unchanged.

isort....................................................................Passed
check yaml...............................................................Passed
fix end of files.........................................................Passed
trim trailing whitespace.................................................Passed
shellcheck...............................................................Failed
- hook id: shellcheck
- exit code: 1
@Eduardo Apolinario (eapolinario) Thank you, I made a silly mistake while implementing your suggested changes. I still have one more error to fix in my tests.
@Eduardo Apolinario (eapolinario) I don't want to use get_random_local_path. Is there a way to load from an existing path?
e
Can you say more? Why would you want to load from an existing path? What if your task returns multiple objects of the tf type?
c
@Eduardo Apolinario (eapolinario) I did something like this in
to_literal
of my tensor.py
        global local_file_path
        local_file_path = os.path.join(local_path, "tensor_data")
        tf.io.write_file(local_file_path, tf.io.serialize_tensor(python_val))

        remote_path = ctx.file_access.get_random_remote_path(local_file_path)
        ctx.file_access.put_data(local_file_path, remote_path, is_multipart=False)
        return Literal(scalar=Scalar(blob=Blob(metadata=meta, uri=remote_path)))
and I did
    def to_python_value(self, ctx: FlyteContext, lv: Literal, python_val: T, expected_python_type: Type[T]) -> T:
        try:
            uri = lv.scalar.blob.uri
        except AttributeError:
            raise TypeTransformerFailedError(f"Cannot convert from {lv} to {expected_python_type}")

        local_path = ctx.file_access.get_random_local_path()
        ctx.file_access.get_data(uri, local_path, is_multipart=False)

        #local_file_path = os.path.join(local_path, "tensor_data")
        read_serial = tf.io.read_file(local_file_path, name=None)

        return tf.io.parse_tensor(read_serial, out_type=python_val.dtype, name=None)
e
I see, but why do you need this?
c
I was getting an error because local_file_path was being set to a new random path in to_python_value, so I made it global in to_literal to keep the same value @Eduardo Apolinario (eapolinario)
e
@cryptic, you have to keep in mind that usually those pairs of functions
to_literal / to_python_value
are run in separate containers, so it doesn't make sense to use global variables to store state. What error are you seeing in your test? Also, keep in mind that we have several examples of type transformers that do something along the lines of what you're doing, so you could mimic exactly what they are doing.
c
@Eduardo Apolinario (eapolinario) Wouldn't a call to random_local_path again in
to_python_value
pick a new random path? The error that I see is it is trying to load from a new random path in
to_python_value
function and not from the one that was created by
to_literal
As an example, suppose I created my serialized object at
/tmp/sandbox/flytekit/xhjdhiihghj/tensor_data
and in the next function it is trying to load from a different path eg.
/tmp/sandbox/flytekit/cxzzdffghjjk/tensor_data
but
cxzzdffghjjk
happens to be a file and not a folder, and it throws the error "Not a file"
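One way to look at the round trip, sketched under assumptions: the only state the two functions share is the URI stored in the literal, so no global is needed, and a fresh random local path on the reading side is harmless. The sketch assumes that a single-file (`is_multipart=False`) download writes the file at the random local path itself, which would match the observation that the random path is a file and not a folder. `LocalFileAccess` is a hypothetical stdlib stand-in for `ctx.file_access`, the two functions are simplified versions of `to_literal` / `to_python_value`, and raw bytes stand in for the serialized tensor.

```python
import os
import shutil
import tempfile
import uuid


class LocalFileAccess:
    """Hypothetical stand-in for ctx.file_access, backed by local folders."""

    def __init__(self, root: str) -> None:
        self.sandbox = os.path.join(root, "sandbox")
        self.raw = os.path.join(root, "raw")

    def get_random_local_path(self) -> str:
        return os.path.join(self.sandbox, uuid.uuid4().hex)

    def get_random_remote_path(self) -> str:
        return os.path.join(self.raw, uuid.uuid4().hex)

    def put_data(self, local_path: str, remote_path: str) -> None:
        os.makedirs(os.path.dirname(remote_path), exist_ok=True)
        shutil.copyfile(local_path, remote_path)

    def get_data(self, remote_path: str, local_path: str) -> None:
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        # Single-file download: local_path becomes the file itself.
        shutil.copyfile(remote_path, local_path)


def to_literal(fa: LocalFileAccess, payload: bytes) -> str:
    """Write the payload locally, upload it, and return only the remote URI."""
    local_dir = fa.get_random_local_path()
    os.makedirs(local_dir)
    local_file = os.path.join(local_dir, "tensor_data")
    with open(local_file, "wb") as f:
        f.write(payload)
    remote_path = fa.get_random_remote_path()
    fa.put_data(local_file, remote_path)
    return remote_path  # this URI is the only thing carried to the reader


def to_python_value(fa: LocalFileAccess, uri: str) -> bytes:
    """Download the URI to a fresh random path and read that path directly."""
    local_path = fa.get_random_local_path()
    fa.get_data(uri, local_path)
    # Do not join "tensor_data" here: local_path is already the file.
    with open(local_path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        fa = LocalFileAccess(root)
        uri = to_literal(fa, b"serialized-tensor")
        print(to_python_value(fa, uri) == b"serialized-tensor")
```

Because the reader depends only on `uri`, the same logic keeps working when the two functions run in separate containers, as long as the remote location is reachable from both.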