# flyte-support
a
Hi #CP2HDHKE1, we are testing the new flytekit (1.14.6) with the Databricks plugin, and entrypoint.py is failing with this error on the Databricks side:
{"asctime": "2025-02-14 17:05:58,888", "name": "flytekit", "levelname": "ERROR", "message": "Trace:\n\n    Traceback (most recent call last):\n      File \"/databricks/python/lib/python3.10/site-packages/flytekit/bin/entrypoint.py\", line 179, in _dispatch_execute\n        outputs = task_def.dispatch_execute(ctx, idl_input_literals)\n      File \"/databricks/python/lib/python3.10/site-packages/flytekit/core/base_task.py\", line 728, in dispatch_execute\n        new_user_params = self.pre_execute(ctx.user_space_params)\n      File \"/databricks/python/lib/python3.10/site-packages/flytekitplugins/spark/task.py\", line 209, in pre_execute\n        shutil.make_archive(file_name, file_format, os.getcwd())\n      File \"/usr/lib/python3.10/shutil.py\", line 1124, in make_archive\n        filename = func(base_name, base_dir, **kwargs)\n      File \"/usr/lib/python3.10/shutil.py\", line 1009, in _make_zipfile\n        zf.write(path, arcname)\n      File \"/usr/lib/python3.10/zipfile.py\", line 1754, in write\n        zinfo = ZipInfo.from_file(filename, arcname,\n      File \"/usr/lib/python3.10/zipfile.py\", line 523, in from_file\n        zinfo = cls(arcname, date_time)\n      File \"/usr/lib/python3.10/zipfile.py\", line 366, in __init__\n        raise ValueError('ZIP does not support timestamps before 1980')\n    ValueError: ZIP does not support timestamps before 1980\n\nMessage:\n\n    ValueError: ZIP does not support timestamps before 1980"}
{"asctime": "2025-02-14 17:05:58,891", "name": "flytekit", "levelname": "ERROR", "message": "!! End Error Captured by Flyte !!"}
Obviously, passing `strict_timestamps=False` to the `zipfile.ZipFile` call would do the trick, but as I understand it, flytekitplugins/spark relies on `shutil.make_archive`, which still does not support the `strict_timestamps` param (see this open PR). I have also seen this open Flyte issue: https://github.com/flyteorg/flyte/issues/4711 (it's about removing datetime metadata from files); resolving it would probably solve the problem too. Anyway, all these issues have been open for a while. Do you have any recommendations on how we can use fast registration with flytekit 1.14.6 and Spark?
f
We have been using fast registration with Spark
I would love to understand what problem you are seeing
a
When we run a Spark task on Databricks (using flytekit 1.14.6), entrypoint.py fails with this error:
{"asctime": "2025-02-14 17:05:58,888", "name": "flytekit", "levelname": "ERROR", "message": "Trace:\n\n    Traceback (most recent call last):\n      File \"/databricks/python/lib/python3.10/site-packages/flytekit/bin/entrypoint.py\", line 179, in _dispatch_execute\n        outputs = task_def.dispatch_execute(ctx, idl_input_literals)\n      File \"/databricks/python/lib/python3.10/site-packages/flytekit/core/base_task.py\", line 728, in dispatch_execute\n        new_user_params = self.pre_execute(ctx.user_space_params)\n      File \"/databricks/python/lib/python3.10/site-packages/flytekitplugins/spark/task.py\", line 209, in pre_execute\n        shutil.make_archive(file_name, file_format, os.getcwd())\n      File \"/usr/lib/python3.10/shutil.py\", line 1124, in make_archive\n        filename = func(base_name, base_dir, **kwargs)\n      File \"/usr/lib/python3.10/shutil.py\", line 1009, in _make_zipfile\n        zf.write(path, arcname)\n      File \"/usr/lib/python3.10/zipfile.py\", line 1754, in write\n        zinfo = ZipInfo.from_file(filename, arcname,\n      File \"/usr/lib/python3.10/zipfile.py\", line 523, in from_file\n        zinfo = cls(arcname, date_time)\n      File \"/usr/lib/python3.10/zipfile.py\", line 366, in __init__\n        raise ValueError('ZIP does not support timestamps before 1980')\n    ValueError: ZIP does not support timestamps before 1980\n\nMessage:\n\n    ValueError: ZIP does not support timestamps before 1980"}
{"asctime": "2025-02-14 17:05:58,891", "name": "flytekit", "levelname": "ERROR", "message": "!! End Error Captured by Flyte !!"}
If I'm right, that's because:
• Using `pyflyte register` to fast register a workflow strips the datetime metadata from all the files that are packaged into the `.tar.gz` file (see here), AND
• flytekitplugins-spark relies on `shutil.make_archive`, which does not support the `strict_timestamps` param for `zipfile.ZipFile` (PR opened, but still not merged)
cc @full-toddler-5766
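To make the failure mode above concrete, here is a minimal standalone sketch in plain Python (no Flyte or Spark involved). It assumes the packaged files end up with a modification time at or near the Unix epoch, and simulates that with `os.utime`. The helper `zip_one_file` and the file names are hypothetical, for illustration only.

```python
import os
import tempfile
import zipfile


def zip_one_file(src_path: str, zip_path: str, strict_timestamps: bool) -> str:
    """Write src_path into a new zip archive; return "ok" or the error message."""
    try:
        with zipfile.ZipFile(zip_path, "w", strict_timestamps=strict_timestamps) as zf:
            zf.write(src_path, arcname=os.path.basename(src_path))
        return "ok"
    except ValueError as exc:
        return str(exc)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "task.py")
    with open(src, "w") as f:
        f.write("print('hello')\n")

    # Assumption: simulate stripped datetime metadata by resetting the
    # access and modification times to the Unix epoch (before 1980).
    os.utime(src, (0, 0))

    # Default behaviour (strict): the pre-1980 timestamp is rejected.
    strict_result = zip_one_file(src, os.path.join(tmp, "strict.zip"), True)
    # Lenient behaviour: the timestamp is clamped to 1980-01-01 instead.
    lenient_result = zip_one_file(src, os.path.join(tmp, "lenient.zip"), False)

print(strict_result)
print(lenient_result)
```

With the strict default the write raises the same `ValueError` as in the traceback, while `strict_timestamps=False` (available since Python 3.8) lets the write succeed.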
a
Not sure if it matches what you're looking for, but you could try `pyflyte register --copy none` to skip zipping and uploading the serialized code. It effectively disables fast registration but would let you then run the workflow from the UI. Also, there's this workaround in ZipFile that you could try and let us know the result. In any case, agree with ketan that fast registration for Spark has been out for a while (added here), so curious to learn if this is an enhancement you'd be open to contribute.
a
@average-finland-92144 thanks for the hints. Yeah, I think that workaround could work: replacing `shutil.make_archive` with `zipfile.ZipFile(..., strict_timestamps=False)` here would probably do the job. We're open to contributing; let us check this and open a PR.
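A rough sketch of what such a replacement could look like, using only the standard library. This is not the flytekitplugins-spark code: the function name `make_zip_lenient` and the sample file are hypothetical, and unlike `shutil.make_archive` this version does not add directory entries, so empty directories are skipped.

```python
import os
import tempfile
import zipfile


def make_zip_lenient(base_name: str, root_dir: str) -> str:
    """Zip the files under root_dir into base_name + '.zip'.

    Timestamps before 1980 are clamped to 1980-01-01 instead of raising.
    """
    zip_path = os.path.abspath(base_name + ".zip")
    root_dir = os.path.abspath(root_dir)
    with zipfile.ZipFile(
        zip_path, "w", zipfile.ZIP_DEFLATED, strict_timestamps=False
    ) as zf:
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                if os.path.abspath(full) == zip_path:
                    continue  # never add the archive to itself
                # Store paths relative to root_dir, as make_archive does.
                zf.write(full, os.path.relpath(full, root_dir))
    return zip_path


# Usage: archive a directory containing a file with an epoch mtime.
with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as out_dir:
    sample = os.path.join(src_dir, "task.py")
    with open(sample, "w") as f:
        f.write("print('hello')\n")
    os.utime(sample, (0, 0))  # assumption: simulates stripped datetime metadata

    archive = make_zip_lenient(os.path.join(out_dir, "code"), src_dir)
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
        stamp = zf.getinfo("task.py").date_time

print(names, stamp)
```

Writing the archive outside the directory being zipped, as in the usage above, avoids the archive picking itself up; the explicit self-check covers the case where both are the same directory.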