aloof-painting-18735
02/19/2025, 8:11 AM
{"asctime": "2025-02-14 17:05:58,888", "name": "flytekit", "levelname": "ERROR", "message": "Trace:\n\n Traceback (most recent call last):\n File \"/databricks/python/lib/python3.10/site-packages/flytekit/bin/entrypoint.py\", line 179, in _dispatch_execute\n outputs = task_def.dispatch_execute(ctx, idl_input_literals)\n File \"/databricks/python/lib/python3.10/site-packages/flytekit/core/base_task.py\", line 728, in dispatch_execute\n new_user_params = self.pre_execute(ctx.user_space_params)\n File \"/databricks/python/lib/python3.10/site-packages/flytekitplugins/spark/task.py\", line 209, in pre_execute\n shutil.make_archive(file_name, file_format, os.getcwd())\n File \"/usr/lib/python3.10/shutil.py\", line 1124, in make_archive\n filename = func(base_name, base_dir, **kwargs)\n File \"/usr/lib/python3.10/shutil.py\", line 1009, in _make_zipfile\n zf.write(path, arcname)\n File \"/usr/lib/python3.10/zipfile.py\", line 1754, in write\n zinfo = ZipInfo.from_file(filename, arcname,\n File \"/usr/lib/python3.10/zipfile.py\", line 523, in from_file\n zinfo = cls(arcname, date_time)\n File \"/usr/lib/python3.10/zipfile.py\", line 366, in __init__\n raise ValueError('ZIP does not support timestamps before 1980')\n ValueError: ZIP does not support timestamps before 1980\n\nMessage:\n\n ValueError: ZIP does not support timestamps before 1980"}
{"asctime": "2025-02-14 17:05:58,891", "name": "flytekit", "levelname": "ERROR", "message": "!! End Error Captured by Flyte !!"}
Obviously, passing strict_timestamps=False to the zipfile.ZipFile call would do the trick, but as I understand it, flytekitplugins-spark relies on shutil.make_archive, which still does not support the strict_timestamps param (see this open PR).
I have also seen this open Flyte issue: https://github.com/flyteorg/flyte/issues/4711 (that's about removing datetime metadata from files) - that would probably solve the problem too.
Anyway, all these issues have been open for a while. Do you have any recommendations on how we can use fast registration with flytekit 1.14.6 and Spark?
freezing-airport-6809
aloof-painting-18735
02/19/2025, 3:31 PM
Spark task on Databricks (using flytekit 1.14.6) - the entrypoint.py fails with this error:
{"asctime": "2025-02-14 17:05:58,888", "name": "flytekit", "levelname": "ERROR", "message": "Trace:\n\n Traceback (most recent call last):\n File \"/databricks/python/lib/python3.10/site-packages/flytekit/bin/entrypoint.py\", line 179, in _dispatch_execute\n outputs = task_def.dispatch_execute(ctx, idl_input_literals)\n File \"/databricks/python/lib/python3.10/site-packages/flytekit/core/base_task.py\", line 728, in dispatch_execute\n new_user_params = self.pre_execute(ctx.user_space_params)\n File \"/databricks/python/lib/python3.10/site-packages/flytekitplugins/spark/task.py\", line 209, in pre_execute\n shutil.make_archive(file_name, file_format, os.getcwd())\n File \"/usr/lib/python3.10/shutil.py\", line 1124, in make_archive\n filename = func(base_name, base_dir, **kwargs)\n File \"/usr/lib/python3.10/shutil.py\", line 1009, in _make_zipfile\n zf.write(path, arcname)\n File \"/usr/lib/python3.10/zipfile.py\", line 1754, in write\n zinfo = ZipInfo.from_file(filename, arcname,\n File \"/usr/lib/python3.10/zipfile.py\", line 523, in from_file\n zinfo = cls(arcname, date_time)\n File \"/usr/lib/python3.10/zipfile.py\", line 366, in __init__\n raise ValueError('ZIP does not support timestamps before 1980')\n ValueError: ZIP does not support timestamps before 1980\n\nMessage:\n\n ValueError: ZIP does not support timestamps before 1980"}
{"asctime": "2025-02-14 17:05:58,891", "name": "flytekit", "levelname": "ERROR", "message": "!! End Error Captured by Flyte !!"}
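For anyone who wants to reproduce the error outside Flyte and Databricks, here is a minimal sketch using only the standard library. It assumes the trigger is a file whose modification time predates 1980, which is what the traceback suggests. The helper name zip_old_file and the file names are made up for illustration.

```python
import os
import tempfile
import zipfile


def zip_old_file(strict: bool) -> str:
    """Zip one file whose mtime is the Unix epoch (1970).

    Returns "ok" on success, or the ValueError message on failure.
    Hypothetical helper for illustration only; not part of flytekit.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "task.py")
        with open(src, "w") as fh:
            fh.write("print('hello')\n")

        # Simulate stripped datetime metadata: set atime and mtime to epoch 0.
        os.utime(src, (0, 0))

        archive = os.path.join(tmp, "code.zip")
        try:
            with zipfile.ZipFile(archive, "w", strict_timestamps=strict) as zf:
                zf.write(src, "task.py")
        except ValueError as exc:
            return str(exc)
        return "ok"


if __name__ == "__main__":
    print("strict_timestamps=True :", zip_old_file(True))
    print("strict_timestamps=False:", zip_old_file(False))
```

With strict_timestamps=True (the default) the write fails with the same ValueError as in the log; with strict_timestamps=False zipfile clamps the timestamp to 1980-01-01 instead of raising.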
aloof-painting-18735
02/19/2025, 3:38 PM
• pyflyte register to fast register a workflow strips the datetime metadata from all the files that are packaged into the .tar.gz file (see here)
AND
• flytekitplugins-spark relies on shutil.make_archive, which does not support the strict_timestamps param for zipfile.ZipFile (PR opened, but still not merged)
aloof-painting-18735
02/19/2025, 3:39 PM
average-finland-92144
02/19/2025, 4:27 PM
pyflyte register --copy none to skip zipping and uploading the serialized code. It effectively disables fast registration but would then let you run the workflow from the UI.
Also, there's this workaround in ZipFile that you could try; let us know the result.
In any case, agree with ketan that fast registration for Spark has been out for a while (added here), so curious to learn if this is an enhancement you'd be open to contributing.
aloof-painting-18735
02/19/2025, 4:51 PM
shutil.make_archive with zipfile.ZipFile(..., strict_timestamps=False) here would probably do the job.
We're open to contributing; let us check this and open a PR.
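A rough sketch of what such a replacement could look like, using only the standard library. This is not the actual flytekitplugins-spark code or the eventual PR. The function name make_zip_lenient is hypothetical, and unlike shutil.make_archive it does not write entries for directories, so empty directories are dropped.

```python
import os
import zipfile


def make_zip_lenient(base_name: str, root_dir: str) -> str:
    """Zip the contents of root_dir into '<base_name>.zip'.

    Hypothetical stand-in for shutil.make_archive(base_name, "zip", root_dir)
    that tolerates files with modification times before 1980.
    Returns the absolute path of the archive.
    """
    archive_path = os.path.abspath(base_name + ".zip")
    with zipfile.ZipFile(
        archive_path,
        "w",
        compression=zipfile.ZIP_DEFLATED,
        strict_timestamps=False,  # clamp pre-1980 mtimes instead of raising
    ) as zf:
        for dirpath, dirnames, filenames in os.walk(root_dir):
            dirnames.sort()  # walk subdirectories in a stable order
            for name in sorted(filenames):
                full_path = os.path.join(dirpath, name)
                # Skip the archive itself if it is being written inside root_dir.
                if os.path.abspath(full_path) == archive_path:
                    continue
                # Store paths relative to root_dir, as make_archive does.
                zf.write(full_path, os.path.relpath(full_path, root_dir))
    return archive_path
```

Called as make_zip_lenient(file_name, os.getcwd()), this would mirror the shutil.make_archive(file_name, file_format, os.getcwd()) call from the traceback, assuming file_format is "zip" there.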