# slurm-flyte-wg
Thread for usability fixes @creamy-shampoo-53278 @damp-lion-88352 🧵
Hey gents, getting back into this in earnest. I'm still trying to get the basic case of script tasks running in a wf. Latency issues were causing the wf to fail, so I added some retries and raised this draft PR to capture more. It's a naive implementation, but it unblocked me. The log is below: the failures are caused by the output file being specified in the job but not existing yet when the agent first tries to read it.
```
17:53:00.447844 INFO     utils.py:341 - AsyncTranslate literal to python value. [Time: 0.000002s]
17:53:00.448361 INFO     utils.py:341 - Translate literal to python value. [Time: 0.000537s]
17:53:00.448723 INFO     base_task.py:752 - Invoking slurm-task with inputs: {}
17:53:00.449243 INFO     ssh_utils.py:164 - SSH connection key not found, creating new connection
17:53:04.028756 INFO     Re-using new connection    ssh_utils.py:171
17:53:04.712393 INFO     Successfully read stdout file: /home/ubuntu/slurm-31.out    agent.py:122

17:53:04.715987 INFO     Execute user level code. [Time: 4.266911s]    utils.py:341
17:53:04.717645 INFO     Translate the output to literals. [Time: 0.000016s]    utils.py:341
17:53:04.718477 INFO     dispatch execute. [Time: 0.000988s]    utils.py:341
17:53:04.720366 INFO     AsyncTranslate literal to python value. [Time: 0.000003s]    utils.py:341
17:53:04.721160 INFO     Translate literal to python value. [Time: 0.000805s]    utils.py:341
17:53:04.721763 INFO     Invoking slurm-task with inputs: {}    base_task.py:752
17:53:05.060839 INFO     Re-using new connection    ssh_utils.py:171
17:53:06.753449 INFO     Re-using new connection    ssh_utils.py:171
17:53:07.438310 INFO     Failed to read stdout file: /home/ubuntu/slurm-32.out. Will retry 2 more times.    agent.py:119
17:53:12.790252 INFO     Failed to read stdout file: /home/ubuntu/slurm-32.out. Will retry 1 more times.    agent.py:119
17:53:18.144389 INFO     Failed to read stdout file: /home/ubuntu/slurm-32.out. Will retry 0 more times.    agent.py:119
17:53:24.496725 INFO     Re-using new connection    ssh_utils.py:171
17:53:25.182557 INFO     Failed to read stdout file: /home/ubuntu/slurm-32.out. Will retry 2 more times.    agent.py:119
17:53:30.539936 INFO     Failed to read stdout file: /home/ubuntu/slurm-32.out. Will retry 1 more times.    agent.py:119
17:53:35.888570 INFO     Successfully read stdout file: /home/ubuntu/slurm-32.out    agent.py:122
17:53:37.238473 INFO     Re-using new connection    ssh_utils.py:171
17:53:37.922402 INFO     Successfully read stdout file: /home/ubuntu/slurm-32.out    agent.py:122

17:53:37.925383 INFO     Execute user level code. [Time: 33.203176s]    utils.py:341
17:53:37.926798 INFO     Translate the output to literals. [Time: 0.000020s]    utils.py:341
17:53:37.928263 INFO     dispatch execute. [Time: 0.001667s]
```
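For reference, the retry pattern visible in the log (a few read attempts roughly 5 s apart, logging how many remain, then giving up so the next poll can try again) can be sketched as below. This is a hypothetical helper, not the code in the draft PR: `read_with_retries` and its parameters are illustrative names, and it assumes the reader coroutine raises `FileNotFoundError` when the Slurm stdout file does not exist yet. The agent's real SSH client may raise a different exception type, in which case the `except` clause would need adjusting.

```python
import asyncio
import logging
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


async def read_with_retries(
    read: Callable[[], Awaitable[T]],
    path: str,
    attempts: int = 3,
    delay: float = 5.0,
) -> T:
    """Await `read()` up to `attempts` times, sleeping `delay` seconds between tries.

    Hypothetical helper: only FileNotFoundError is retried, on the assumption that
    it is what the reader raises while Slurm has not yet created the stdout file.
    Any other exception propagates immediately.
    """
    for attempt in range(attempts):
        try:
            return await read()
        except FileNotFoundError:
            remaining = attempts - attempt - 1
            logger.info(
                "Failed to read stdout file: %s. Will retry %d more times.", path, remaining
            )
            if remaining == 0:
                raise  # give up; the caller's next poll can try again
            await asyncio.sleep(delay)
    # Only reachable when attempts < 1, because the loop body never runs.
    raise ValueError("attempts must be >= 1")


async def demo() -> str:
    calls = {"n": 0}

    async def fake_read() -> str:
        # Hypothetical stand-in for the SSH read: the file "appears" on the third call.
        calls["n"] += 1
        if calls["n"] < 3:
            raise FileNotFoundError("/home/ubuntu/slurm-32.out")
        return "job output"

    return await read_with_retries(fake_read, "/home/ubuntu/slurm-32.out", delay=0.01)


print(asyncio.run(demo()))  # prints "job output"
```

Retrying only on the "file missing" case keeps genuine failures (permissions, a dropped connection) from being masked by the retry loop, although a fixed retry count is still a latency guess rather than a check that the job has actually finished.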
Also, should we add @creamy-shampoo-53278’s Flyte-Demos as examples or functional tests alongside the plugin itself?