Hi everyone, I’m running a Flyte task that calls `scipy.optimize.minimize`, but it behaves unexpectedly:
• When I run the script locally on my computer (`python script.py`), it completes in minutes.
• When I run the Flyte workflow in the cloud (deployed with Flyte-binary on EKS), it gets stuck for hours between Step 1 and Step 2, even before reaching the `minimize` call.
• If I comment out `minimize`, the workflow runs fast.

Why would commenting out `minimize` make the code between Step 1 and Step 2 run so much faster?
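Two things can make a step look stuck inside a pod even when it isn't. First, Python block-buffers stdout when it is not attached to a terminal, so a plain `print("Step 2")` may only reach the pod logs long after it ran. Second, NumPy/SciPy's BLAS and OpenMP backends may size their thread pools from the node's core count rather than the pod's 32-CPU limit. The sketch below is a hypothetical diagnostic, not part of Flyte: `log_step`, `read_cgroup_cpu_limit` and `report_cpu_visibility` are illustrative names, and the cgroup path assumes cgroup v2.

```python
import os
import time


def log_step(label: str, t0: float) -> float:
    """Print seconds elapsed since t0, flushed immediately so it shows up in pod logs."""
    elapsed = time.perf_counter() - t0
    print(f"[{elapsed:8.2f}s] {label}", flush=True)
    return elapsed


def read_cgroup_cpu_limit(path: str = "/sys/fs/cgroup/cpu.max"):
    """Return the cgroup v2 CPU limit (quota / period), or None if unlimited or unavailable."""
    try:
        with open(path) as f:
            quota, period = f.read().split()
    except (OSError, ValueError):
        return None  # not cgroup v2, not Linux, or unexpected file format
    if quota == "max":
        return None  # no CPU limit set
    return int(quota) / int(period)


def report_cpu_visibility() -> dict:
    """Compare the CPUs Python sees with the container's CPU quota and thread-related env vars."""
    info = {
        "os.cpu_count": os.cpu_count(),
        "cgroup_cpu_limit": read_cgroup_cpu_limit(),
    }
    if hasattr(os, "sched_getaffinity"):  # only available on Linux
        info["sched_getaffinity"] = len(os.sched_getaffinity(0))
    for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
        info[var] = os.environ.get(var)  # None means the library picks its own default
    print(info, flush=True)
    return info
```

Calling `t0 = time.perf_counter()` at the top of the task, then `log_step("Step 1", t0)`, `report_cpu_visibility()` and `log_step("Step 2", t0)` around the preprocessing, shows whether the delay really happens before Step 2. It also shows whether `os.cpu_count()` reports far more cores than the pod's CPU limit.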
```python
import numpy as np
from flytekit import Resources, task, workflow
from scipy.optimize import minimize


# Cost function (simplified); takes the extra argument passed via `args`
def cost_function(thetas, arguments):
    # Some tensor contractions and numpy ops here
    return np.linalg.norm(thetas)  # Simplified


@task(
    requests=Resources(cpu="32", mem="64Gi"),
    limits=Resources(cpu="32", mem="64Gi"),
)
def my_flyte_task():
    print("Step 1")
    # Some preprocessing here (defines x0, arguments and options)
    print("Step 2")  # The workflow gets stuck **before this line**
    result = minimize(cost_function, x0, args=(arguments,), tol=1e-6, options=options)


@workflow
def my_flyte_wf():
    my_flyte_task()


if __name__ == "__main__":
    my_flyte_wf()
```
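If thread oversubscription turns out to be the cause (an assumption, not something established above), a common mitigation is to cap the BLAS/OpenMP thread pools at the pod's CPU limit. This is a minimal sketch. The helper names `thread_limit_env` and `apply_thread_limits` are hypothetical. The variables only take effect if they are set before NumPy/SciPy are first imported in the process, because the libraries read them when their thread pools are created.

```python
import os

# Environment variables commonly read by OpenMP, OpenBLAS, MKL and numexpr
# when they initialise their thread pools (i.e. at library load time).
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "NUMEXPR_NUM_THREADS",
)


def thread_limit_env(n_threads: int) -> dict:
    """Build an env-var mapping that caps each numeric library at n_threads."""
    if n_threads < 1:
        raise ValueError("n_threads must be >= 1")
    return {var: str(n_threads) for var in THREAD_ENV_VARS}


def apply_thread_limits(n_threads: int) -> None:
    """Set the caps in the current process; only effective before numpy/scipy are imported."""
    os.environ.update(thread_limit_env(n_threads))
```

In Flyte, the same mapping could presumably be passed through the task decorator's `environment` argument, e.g. `@task(..., environment=thread_limit_env(32))`, so the variables are already set when the container's Python process starts. Check that your flytekit version supports this before relying on it.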
Any help is appreciated! 🙏