• Jake Neyer

    7 months ago
    I have a workflow that successfully writes its outputs, but the CRD remains in "Workflow Started", the UI shows "Running", and the pod goes into a "NotReady" state. What triggers the CRD to move into an error or completed status?
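Some background on what "completed" means at the pod level: Kubernetes keeps a pod in the Running phase while at least one of its containers is still running, and only reports Succeeded or Failed once every container has terminated. The snippet below is a simplified model of that rule, assuming restartPolicy: Never and a pod whose containers have all started at least once. The derive_pod_phase function and the container dicts are hypothetical illustrations, not a Kubernetes or Flyte API. It is also an assumption here that Flyte's propeller waits for that terminal pod phase before it marks the node done.

```python
from typing import Dict, List


def derive_pod_phase(containers: List[Dict]) -> str:
    """Simplified Kubernetes pod-phase rule (assumes restartPolicy: Never).

    Each container is a hypothetical dict such as
    {"name": "main", "state": "terminated", "exit_code": 0} or
    {"name": "sidecar", "state": "running"}.
    """
    # Any container that has not terminated keeps the whole pod in Running.
    if any(c["state"] != "terminated" for c in containers):
        return "Running"
    # All containers terminated: success only if every one exited with 0.
    if all(c["exit_code"] == 0 for c in containers):
        return "Succeeded"
    return "Failed"


# The main container finished, but a sidecar is still up: the pod never completes.
print(derive_pod_phase([
    {"name": "main", "state": "terminated", "exit_code": 0},
    {"name": "sidecar", "state": "running"},
]))
```

The example prints Running, which matches the symptom in the question: the task's outputs are written, yet nothing downstream sees a finished pod.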
  • Pod logs:
    Welcome to Flyte! Version: 0.25.0
    Attempting to run with flytekit.core.python_auto_container.default_task_resolver...
    WARNING:root:No config file provided or invalid flyte config_file_path flytekit.config specified.
    Using user directory /tmp/flyte/20211221_024309/sandbox/local_flytekit/2d933b88aec39e10291d8e3cfc947f63
    No images specified, will use the default image
    Running native-typed task
    /usr/local/lib/python3.8/site-packages/papermill/iorw.py:50: FutureWarning: pyarrow.HadoopFileSystem is deprecated as of 2.0.0, please use pyarrow.fs.HadoopFileSystem instead.
      from pyarrow import HadoopFileSystem
    INFO:root:Entering timed context: Copying (<s3://my-s3-bucket/metadata/propeller/chariot-sdk-test-development-ywo2ujf75w/n0/data/inputs.pb> -> /tmp/flyteocmljc6v/local_flytekit/inputs.pb)
    INFO:root:Output of command '['aws', '--endpoint-url', '<http://minio.flyte:9000>', 's3', 'cp', '<s3://my-s3-bucket/metadata/propeller/chariot-sdk-test-development-ywo2ujf75w/n0/data/inputs.pb>', '/tmp/flyteocmljc6v/local_flytekit/inputs.pb']':
    b''
    
    ERROR:root:Error from command '['aws', '--endpoint-url', '<http://minio.flyte:9000>', 's3', 'cp', '<s3://my-s3-bucket/metadata/propeller/chariot-sdk-test-development-ywo2ujf75w/n0/data/inputs.pb>', '/tmp/flyteocmljc6v/local_flytekit/inputs.pb']':
    b'fatal error: Could not connect to the endpoint URL: "<http://minio.flyte:9000/my-s3-bucket/metadata/propeller/chariot-sdk-test-development-ywo2ujf75w/n0/data/inputs.pb>"\n'
    
    ERROR:root:Exception when trying to execute ['aws', '--endpoint-url', '<http://minio.flyte:9000>', 's3', 'cp', '<s3://my-s3-bucket/metadata/propeller/chariot-sdk-test-development-ywo2ujf75w/n0/data/inputs.pb>', '/tmp/flyteocmljc6v/local_flytekit/inputs.pb'], reason: Called process exited with error code: 1.  Stderr dump:
    
    b'fatal error: Could not connect to the endpoint URL: "<http://minio.flyte:9000/my-s3-bucket/metadata/propeller/chariot-sdk-test-development-ywo2ujf75w/n0/data/inputs.pb>"\n'
    INFO:root:Sleeping before retrying again, after 5 seconds
    INFO:root:Retrying again
    INFO:root:Output of command '['aws', '--endpoint-url', '<http://minio.flyte:9000>', 's3', 'cp', '<s3://my-s3-bucket/metadata/propeller/chariot-sdk-test-development-ywo2ujf75w/n0/data/inputs.pb>', '/tmp/flyteocmljc6v/local_flytekit/inputs.pb']':
    b'Completed 38 Bytes/38 Bytes (418 Bytes/s) with 1 file(s) remaining\rdownload: <s3://my-s3-bucket/metadata/propeller/chariot-sdk-test-development-ywo2ujf75w/n0/data/inputs.pb> to ../tmp/flyteocmljc6v/local_flytekit/inputs.pb\n'
    
    INFO:root:Exiting timed context: Copying (<s3://my-s3-bucket/metadata/propeller/chariot-sdk-test-development-ywo2ujf75w/n0/data/inputs.pb> -> /tmp/flyteocmljc6v/local_flytekit/inputs.pb) [Wall Time: 14.268664408940822s, Process Time: 0.01002349399999991s]
    INFO:root:Hijacking the call for task-type nb-python-task, to call notebook.
    INFO:papermill:Input Notebook:  /root/new.ipynb
    INFO:papermill:Output Notebook: /root/new-out.ipynb
    INFO:blib2to3.pgen2.driver:Generating grammar tables from /usr/local/lib/python3.8/site-packages/blib2to3/Grammar.txt
    INFO:blib2to3.pgen2.driver:Writing grammar tables to /root/.cache/black/21.12b0/Grammar3.8.12.final.0.pickle
    INFO:blib2to3.pgen2.driver:Writing failed: [Errno 2] No such file or directory: '/root/.cache/black/21.12b0/tmpfzjdnj3l'
    INFO:blib2to3.pgen2.driver:Generating grammar tables from /usr/local/lib/python3.8/site-packages/blib2to3/PatternGrammar.txt
    INFO:blib2to3.pgen2.driver:Writing grammar tables to /root/.cache/black/21.12b0/PatternGrammar3.8.12.final.0.pickle
    INFO:blib2to3.pgen2.driver:Writing failed: [Errno 2] No such file or directory: '/root/.cache/black/21.12b0/tmp_utliw7_'
    Executing:   0%|          | 0/9 [00:00<?, ?cell/s]INFO:papermill:Executing notebook with kernel: python3
    Executing: 100%|██████████| 9/9 [00:04<00:00,  2.20cell/s]
    INFO:root:Entering timed context: Writing (/root/new-out.ipynb -> <s3://my-s3-bucket/bu/ywo2ujf75w-n0-0/c419877debe7762daf9b0f3eb8aa1048/new-out.ipynb>)
    INFO:root:Output of command '['aws', '--endpoint-url', '<http://minio.flyte:9000>', 's3', 'cp', '--acl', 'bucket-owner-full-control', '/root/new-out.ipynb', '<s3://my-s3-bucket/bu/ywo2ujf75w-n0-0/c419877debe7762daf9b0f3eb8aa1048/new-out.ipynb']'>:
    b'Completed 7.5 KiB/7.5 KiB (27.8 KiB/s) with 1 file(s) remaining\rupload: ./new-out.ipynb to <s3://my-s3-bucket/bu/ywo2ujf75w-n0-0/c419877debe7762daf9b0f3eb8aa1048/new-out.ipynb>\n'
    
    INFO:root:Exiting timed context: Writing (/root/new-out.ipynb -> <s3://my-s3-bucket/bu/ywo2ujf75w-n0-0/c419877debe7762daf9b0f3eb8aa1048/new-out.ipynb>) [Wall Time: 0.7355221509933472s, Process Time: 0.005417939999999621s]
    INFO:root:Entering timed context: Writing (/root/new-out.html -> <s3://my-s3-bucket/bu/ywo2ujf75w-n0-0/eec2a9a683603685b7f81fd785bf349c/new-out.html>)
    INFO:root:Output of command '['aws', '--endpoint-url', '<http://minio.flyte:9000>', 's3', 'cp', '--acl', 'bucket-owner-full-control', '/root/new-out.html', '<s3://my-s3-bucket/bu/ywo2ujf75w-n0-0/eec2a9a683603685b7f81fd785bf349c/new-out.html']'>:
    b'Completed 256.0 KiB/567.4 KiB (7.8 MiB/s) with 1 file(s) remaining\rCompleted 512.0 KiB/567.4 KiB (15.4 MiB/s) with 1 file(s) remaining\rCompleted 567.4 KiB/567.4 KiB (6.4 MiB/s) with 1 file(s) remaining \rupload: ./new-out.html to <s3://my-s3-bucket/bu/ywo2ujf75w-n0-0/eec2a9a683603685b7f81fd785bf349c/new-out.html>\n'
    
    INFO:root:Exiting timed context: Writing (/root/new-out.html -> <s3://my-s3-bucket/bu/ywo2ujf75w-n0-0/eec2a9a683603685b7f81fd785bf349c/new-out.html>) [Wall Time: 0.5461678958963603s, Process Time: 0.006106751999999993s]
    INFO:root:Entering timed context: Writing (/tmp/flyteocmljc6v/local_flytekit/engine_dir -> <s3://my-s3-bucket/metadata/propeller/chariot-sdk-test-development-ywo2ujf75w/n0/data/0>)
    INFO:root:Output of command '['aws', '--endpoint-url', '<http://minio.flyte:9000>', 's3', 'cp', '--recursive', '--acl', 'bucket-owner-full-control', '/tmp/flyteocmljc6v/local_flytekit/engine_dir', '<s3://my-s3-bucket/metadata/propeller/chariot-sdk-test-development-ywo2ujf75w/n0/data/0']'>:
    b'Completed 304 Bytes/304 Bytes (5.0 KiB/s) with 1 file(s) remaining\rupload: ../tmp/flyteocmljc6v/local_flytekit/engine_dir/outputs.pb to <s3://my-s3-bucket/metadata/propeller/chariot-sdk-test-development-ywo2ujf75w/n0/data/0/outputs.pb>\n'
    
    INFO:root:Exiting timed context: Writing (/tmp/flyteocmljc6v/local_flytekit/engine_dir -> <s3://my-s3-bucket/metadata/propeller/chariot-sdk-test-development-ywo2ujf75w/n0/data/0>) [Wall Time: 0.5930803131777793s, Process Time: 0.005052518999999922s]
    INFO:root:Engine folder written successfully to the output prefix <s3://my-s3-bucket/metadata/propeller/chariot-sdk-test-development-ywo2ujf75w/n0/data/0>
    WARNING:root:No config file provided or invalid flyte config_file_path flytekit.config specified.
  • Haytham Abuelfutuh

    7 months ago
    Is this running in the sandbox? It looks like it's trying to read from and write to MinIO...
  • Jake Neyer

    7 months ago
    Not running in the sandbox here; I confirmed that outputs.pb is written correctly to the MinIO instance.
  • Haytham Abuelfutuh

    7 months ago
    Mind running a pod describe (kubectl describe pod <pod-name>)?
  • Jake Neyer

    7 months ago
    Hey, thanks for the help! Reporting back: the root cause was a sidecar container that stayed up after the main container had finished, which kept the CRD in "Workflow Started" instead of letting it complete and clean up. Mea culpa.
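A lingering sidecar like this can be spotted from the pod's JSON status (for example, the output of kubectl get pod <pod-name> -o json), where each entry in status.containerStatuses has a state object keyed by running, waiting, or terminated. The sketch below flags containers that are still running after the primary container has terminated. The lingering_sidecars function is hypothetical, and the primary container's name is an assumption you have to supply for your own task pods.

```python
import json
from typing import List


def lingering_sidecars(pod_json: str, primary: str) -> List[str]:
    """Return names of containers still running after `primary` terminated.

    `pod_json` is assumed to be the text of `kubectl get pod <name> -o json`.
    """
    pod = json.loads(pod_json)
    statuses = pod.get("status", {}).get("containerStatuses", [])
    # Map container name -> its state object ({"running": {...}}, etc.).
    states = {s["name"]: s.get("state", {}) for s in statuses}
    # If the primary container has not terminated, nothing is "lingering" yet.
    if "terminated" not in states.get(primary, {}):
        return []
    return sorted(
        name for name, state in states.items()
        if name != primary and "running" in state
    )


example_pod = {
    "status": {
        "containerStatuses": [
            {"name": "main", "state": {"terminated": {"exitCode": 0}}},
            {"name": "sidecar", "state": {"running": {}}},
        ]
    }
}
print(lingering_sidecars(json.dumps(example_pod), primary="main"))
```

Here the container names main and sidecar are placeholders; the call reports ["sidecar"] as the container keeping the pod, and therefore the workflow, from finishing.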