# ask-the-community
r
My process freezes when I try to register my workflow package to Flyte binary. It runs for about 30 minutes until it is killed. Here is the environment I use in my project: I pull the image ghcr.io/flyteorg/flytekit:py3.9-latest and then install these external requirements:
```text
flytekit>=1.5.0
pandas~=1.5.3
scikit-learn
requests~=2.31.0
numpy~=1.24.4
torch~=2.1.1
albumentations~=1.3.1
torchvision~=0.16.1
boto3~=1.28.64
dataclasses~=0.6
pillow~=10.1.0
deeplake~=3.8.9
matplotlib~=3.7.3
tensorboardx~=2.6.2.2
tqdm~=4.66.1
```
I have run my project locally successfully. The registration process doesn't log anything I could use to trace the problem.
l
Can you share the code with us? I can take a look at it.
Can you use `pyflyte run --remote`?
r
@L godlike Sure, here is my repo: https://github.com/gone2808/flyte-train
@L godlike I have to register the package from the root because the workflow file imports code from outside that file.
`pyflyte run --remote` only registers the single target file that contains the workflow.
@L godlike I have successfully registered and run the workflow on the sandbox; the problem only happens on Flyte binary.
l
OK, I will take a look today. Thank you!
y
What did you mean by this, @Ryuu?
I have test successfuly register and run workflow with sandbox, this problem is on Flyte binary
flyte sandbox == flyte binary
r
@Yee The Flyte sandbox is started with `flytectl demo start`, while Flyte binary I deployed with the Helm chart (flyte/charts/flyte-binary).
y
Oh, got it… you have a separate deployment of Flyte.
Is it on EKS?
r
@Yee Yes, I have already run tasks remotely on it.
y
And simpler things like hello world work?
r
@Yee Yes, for sure. I can run remotely, and I can re-trigger from the UI any task that was registered with `pyflyte run --remote ...`.
y
So this code registers against the sandbox, but it hangs when registering against your EKS deployment.
Weird.
Can you `export FLYTE_SDK_LOGGING_LEVEL=10` and run the register command again?
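In Python's `logging` module, level 10 is `logging.DEBUG`, so this setting asks flytekit for its most verbose output. The same run can also be scripted from Python; the sketch below is a minimal illustration, and the helper names `debug_env`, `register_command`, and `run_register_with_debug` are hypothetical. Only the environment variable and the `pyflyte register` command come from this thread.

```python
import logging
import os
import subprocess
from typing import Dict, List, Mapping, Optional


def debug_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy an environment (default: the current one) with flytekit debug logging enabled."""
    env = dict(os.environ if base is None else base)
    # logging.DEBUG == 10, the same value as `export FLYTE_SDK_LOGGING_LEVEL=10`.
    env["FLYTE_SDK_LOGGING_LEVEL"] = str(logging.DEBUG)
    return env


def register_command(package: str) -> List[str]:
    """Build the `pyflyte register <package>` argument list used in this thread."""
    return ["pyflyte", "register", package]


def run_register_with_debug(package: str) -> int:
    """Run the registration with debug logging and return the process exit code."""
    completed = subprocess.run(register_command(package), env=debug_env())
    return completed.returncode
```

Calling `run_register_with_debug("./")` from the project directory is equivalent to the two shell commands above; the copied environment leaves the parent shell untouched.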
r
@Yee
admin@admin:/mnt/data1/hainq/flyte-train$ export FLYTE_SDK_LOGGING_LEVEL=10
admin@admin:/mnt/data1/hainq/flyte-train$ pyflyte register ./
2023-12-29 09:08:55,633908 INFO     {"asctime": "2023-12-29 09:08:55,633", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML  file.py:272
                                    config /home/aiteam/.flyte/config.yaml"}                                                                                    
2023-12-29 09:08:55,639510 DEBUG    {"asctime": "2023-12-29 09:08:55,639", "name": "flytekit", "levelname": "DEBUG", "message": "Switch              file.py:222
                                    storage.connection.endpoint could not be found in yaml config"}                                                             
2023-12-29 09:08:55,641170 DEBUG    {"asctime": "2023-12-29 09:08:55,641", "name": "flytekit", "levelname": "DEBUG", "message": "Switch              file.py:222
                                    storage.connection.access-key could not be found in yaml config"}                                                           
2023-12-29 09:08:55,642768 DEBUG    {"asctime": "2023-12-29 09:08:55,642", "name": "flytekit", "levelname": "DEBUG", "message": "Switch              file.py:222
                                    storage.connection.secret-key could not be found in yaml config"}                                                           
2023-12-29 09:08:55,668514 INFO     {"asctime": "2023-12-29 09:08:55,668", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML  file.py:272
                                    config /home/aiteam/.flyte/config.yaml"}                                                                                    
2023-12-29 09:08:55,671290 DEBUG    {"asctime": "2023-12-29 09:08:55,671", "name": "flytekit", "levelname": "DEBUG", "message": "Switch              file.py:222
                                    admin.insecureSkipVerify could not be found in yaml config"}                                                                
2023-12-29 09:08:55,674539 DEBUG    {"asctime": "2023-12-29 09:08:55,674", "name": "flytekit", "levelname": "DEBUG", "message": "Switch              file.py:222
                                    admin.caCertFilePath could not be found in yaml config"}                                                                    
2023-12-29 09:08:55,676770 DEBUG    {"asctime": "2023-12-29 09:08:55,676", "name": "flytekit", "levelname": "DEBUG", "message": "Switch              file.py:222
                                    admin.command could not be found in yaml config"}                                                                           
2023-12-29 09:08:55,678432 DEBUG    {"asctime": "2023-12-29 09:08:55,678", "name": "flytekit", "levelname": "DEBUG", "message": "Switch              file.py:222
                                    admin.clientId could not be found in yaml config"}                                                                          
2023-12-29 09:08:55,680174 DEBUG    {"asctime": "2023-12-29 09:08:55,680", "name": "flytekit", "levelname": "DEBUG", "message": "Switch              file.py:222
                                    admin.clientSecretLocation could not be found in yaml config"}                                                              
2023-12-29 09:08:55,681888 DEBUG    {"asctime": "2023-12-29 09:08:55,681", "name": "flytekit", "levelname": "DEBUG", "message": "Switch admin.scopes file.py:222
                                    could not be found in yaml config"}                                                                                         
2023-12-29 09:08:55,683648 DEBUG    {"asctime": "2023-12-29 09:08:55,683", "name": "flytekit", "levelname": "DEBUG", "message": "Switch              file.py:222
                                    console.endpoint could not be found in yaml config"}                                                                        
2023-12-29 09:08:55,685305 DEBUG    {"asctime": "2023-12-29 09:08:55,685", "name": "flytekit", "levelname": "DEBUG", "message": "Switch              file.py:222
                                    admin.httpProxyURL could not be found in yaml config"}                                                                      
2023-12-29 09:08:55,687107 DEBUG    {"asctime": "2023-12-29 09:08:55,687", "name": "flytekit", "levelname": "DEBUG", "message": "Switch              file.py:222
                                    storage.connection.endpoint could not be found in yaml config"}                                                             
2023-12-29 09:08:55,688733 DEBUG    {"asctime": "2023-12-29 09:08:55,688", "name": "flytekit", "levelname": "DEBUG", "message": "Switch              file.py:222
                                    storage.connection.access-key could not be found in yaml config"}                                                           
2023-12-29 09:08:55,690359 DEBUG    {"asctime": "2023-12-29 09:08:55,690", "name": "flytekit", "levelname": "DEBUG", "message": "Switch              file.py:222
                                    storage.connection.secret-key could not be found in yaml config"}                                                           
2023-12-29 09:08:55,692297 INFO     {"asctime": "2023-12-29 09:08:55,692", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML  file.py:272
                                    config /home/aiteam/.flyte/config.yaml"}                                                                                    
2023-12-29 09:08:55,774191 INFO     {"asctime": "2023-12-29 09:08:55,774", "name": "flytekit", "levelname": "INFO", "message": "Registering an base_agent.py:122
                                    agent for task type sensor"}                                                                                                
Running pyflyte register from /mnt/data1/hainq/flyte-train with images ImageConfig(default_image=Image(name='default', fqn='cr.flyte.org/flyteorg/flytekit', tag='py3.8-1.9.1'), images=[Image(name='default', fqn='cr.flyte.org/flyteorg/flytekit', tag='py3.8-1.9.1')]) and image destination folder /root on 1 package(s) ('/mnt/data1/hainq/flyte-train',)
2023-12-29 09:08:56,061509 INFO     {"asctime": "2023-12-29 09:08:56,061", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML  file.py:272
                                    config /home/aiteam/.flyte/config.yaml"}                                                                                    
2023-12-29 09:08:56,064550 INFO     {"asctime": "2023-12-29 09:08:56,064", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML  file.py:272
                                    config /home/aiteam/.flyte/config.yaml"}                                                                                    
2023-12-29 09:08:56,066876 DEBUG    {"asctime": "2023-12-29 09:08:56,066", "name": "flytekit", "levelname": "DEBUG", "message": "Switch              file.py:222
                                    admin.insecureSkipVerify could not be found in yaml config"}                                                                
2023-12-29 09:08:56,068563 DEBUG    {"asctime": "2023-12-29 09:08:56,068", "name": "flytekit", "levelname": "DEBUG", "message": "Switch              file.py:222
                                    admin.caCertFilePath could not be found in yaml config"}                                                                    
2023-12-29 09:08:56,070225 DEBUG    {"asctime": "2023-12-29 09:08:56,070", "name": "flytekit", "levelname": "DEBUG", "message": "Switch              file.py:222
                                    admin.command could not be found in yaml config"}                                                                           
2023-12-29 09:08:56,071906 DEBUG    {"asctime": "2023-12-29 09:08:56,071", "name": "flytekit", "levelname": "DEBUG", "message": "Switch              file.py:222
                                    admin.clientId could not be found in yaml config"}                                                                          
2023-12-29 09:08:56,073529 DEBUG    {"asctime": "2023-12-29 09:08:56,073", "name": "flytekit", "levelname": "DEBUG", "message": "Switch              file.py:222
                                    admin.clientSecretLocation could not be found in yaml config"}                                                              
2023-12-29 09:08:56,075053 DEBUG    {"asctime": "2023-12-29 09:08:56,075", "name": "flytekit", "levelname": "DEBUG", "message": "Switch admin.scopes file.py:222
                                    could not be found in yaml config"}                                                                                         
2023-12-29 09:08:56,076688 DEBUG    {"asctime": "2023-12-29 09:08:56,076", "name": "flytekit", "levelname": "DEBUG", "message": "Switch              file.py:222
                                    console.endpoint could not be found in yaml config"}                                                                        
2023-12-29 09:08:56,078326 DEBUG    {"asctime": "2023-12-29 09:08:56,078", "name": "flytekit", "levelname": "DEBUG", "message": "Switch              file.py:222
                                    admin.httpProxyURL could not be found in yaml config"}                                                                      
2023-12-29 09:08:56,080050 DEBUG    {"asctime": "2023-12-29 09:08:56,080", "name": "flytekit", "levelname": "DEBUG", "message": "Switch              file.py:222
                                    storage.connection.endpoint could not be found in yaml config"}                                                             
2023-12-29 09:08:56,081634 DEBUG    {"asctime": "2023-12-29 09:08:56,081", "name": "flytekit", "levelname": "DEBUG", "message": "Switch              file.py:222
                                    storage.connection.access-key could not be found in yaml config"}                                                           
2023-12-29 09:08:56,083435 DEBUG    {"asctime": "2023-12-29 09:08:56,083", "name": "flytekit", "levelname": "DEBUG", "message": "Switch              file.py:222
                                    storage.connection.secret-key could not be found in yaml config"}                                                           
Registering against 192.168.1.205:30081
2023-12-29 09:08:56,235803 DEBUG    {"asctime": "2023-12-29 09:08:56,235", "name": "flytekit", "levelname": "DEBUG", "message": "Common root folder  repo.py:141
                                    detected as /mnt/data1/hainq"}                                                                                              
Detected Root /mnt/data1/hainq, using this to create deployable package...
2023-12-29 09:08:56,245038 WARNING  {"asctime": "2023-12-29 09:08:56,245", "name": "flytekit.cli", "levelname": "WARNING", "message": "Could not    ignore.py:51
                                    determine ignored files due to:\nb'fatal: not a git repository (or any parent up to mount point                             
                                    /mnt)\\nStopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).\\n'\nNot applying any                    
                                    filters"}
y
Is this in a git repo?
Do you have a .gitignore file?
(I think the git thing is not relevant, asking just in case.)
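The warning at the end of the log ("Could not determine ignored files ... Not applying any filters") says no ignore rules were applied, so presumably every file under the detected root was a candidate for hashing. A rough way to answer the git question from Python is to walk up from a directory looking for a `.git` entry. `find_git_root` is a hypothetical helper, and it only approximates git's own discovery (it ignores `GIT_DIR`, worktree details, and the `GIT_DISCOVERY_ACROSS_FILESYSTEM` boundary mentioned in the log).

```python
from pathlib import Path
from typing import Optional, Union


def find_git_root(start: Union[str, Path]) -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains a `.git` entry, else None.

    Approximation only: git's real discovery also honours GIT_DIR and stops at
    filesystem boundaries unless GIT_DISCOVERY_ACROSS_FILESYSTEM is set.
    """
    current = Path(start).resolve()
    for candidate in (current, *current.parents):
        # `.git` is a directory in a normal clone and a file in worktrees/submodules.
        if (candidate / ".git").exists():
            return candidate
    return None
```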
r
@Yee I forgot to push it :v.
It's just been killed.
:v
y
What's in `/home/aiteam/.flyte/config.yaml`?
Yeah, not sure, unfortunately. There's definitely something wrong with the deployment (it looks like Helm is in the middle of a failed upgrade).
And Python is definitely hanging somewhere, though looking at the code I'm not sure where that could be.
Can you try adding debug statements to the flytekit code to see where it's getting stuck? We're definitely seeing this "Detected Root" message.
Can you try to find which line it's getting hung up on?
You can edit the Python library files directly.
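Instead of sprinkling print statements through library files, Python's standard-library `faulthandler` module can dump the stack of every thread while the process is stuck. A minimal sketch, assuming a POSIX system for the signal part; `install_hang_diagnostics` and `remove_hang_diagnostics` are hypothetical helper names, not part of flytekit:

```python
import faulthandler
import signal
import sys
from typing import Optional, TextIO


def install_hang_diagnostics(timeout_s: float = 120.0, out: Optional[TextIO] = None) -> None:
    """Periodically dump all thread stacks, and also on SIGUSR1 where available."""
    stream = out if out is not None else sys.stderr
    # Every `timeout_s` seconds, write the traceback of every thread to `stream`.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=stream)
    # On POSIX, `kill -USR1 <pid>` from another shell dumps the stacks on demand.
    if hasattr(signal, "SIGUSR1"):
        faulthandler.register(signal.SIGUSR1, file=stream, all_threads=True)


def remove_hang_diagnostics() -> bool:
    """Cancel the periodic dump and the signal handler; True if the handler was registered."""
    faulthandler.cancel_dump_traceback_later()
    if hasattr(signal, "SIGUSR1"):
        return faulthandler.unregister(signal.SIGUSR1)
    return True
```

Call `install_hang_diagnostics()` at the top of a small wrapper script that starts the registration in the same process; a frame that keeps reappearing in consecutive dumps points at the stuck line. Doing that requires pyflyte's Python entry point, which is presumably `flytekit.clis.sdk_in_container.pyflyte:main`; verify this against your installed flytekit.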
r
I can't trace where it hangs.
Can you give me the directory path I should edit in flytekit?
I will try to understand and trace it myself.
In addition, I have reconfigured the Helm chart a lot, so maybe something in the Helm config is broken. I will try to rebuild with Flyte's default config again.
y
What does `which python` print?
r
@Yee
Copy code
$ python --version
Python 3.8.17
$ which python
/home/aiteam/anaconda3/envs/hainq/bin/python
y
The files should be under `/home/aiteam/anaconda3/envs/hainq/lib/site-packages/…`
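The exact location depends on the environment (for a conda env on Linux, packages usually live under a `lib/python3.X/site-packages` directory). One way to ask the interpreter itself is `importlib.util.find_spec`; `module_location` is a hypothetical helper name. For a dotted name, `find_spec` imports the parent packages first.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def module_location(name: str) -> Optional[Path]:
    """Return the file that `import name` would load, or None for built-in/missing modules."""
    spec = importlib.util.find_spec(name)
    # Built-in modules report origin "built-in" and namespace packages have no file.
    if spec is None or spec.origin is None or not spec.has_location:
        return None
    return Path(spec.origin)
```

Running `module_location("flytekit.tools.fast_registration")` with the env's `python` prints the file that contains `compute_digest`.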
r
It's hung up here.
There is something very weird in the `compute_digest` function in flytekit/tools/fast_registration.py. I made some changes to test it with the same registered package:
```python
# The function I changed to debug. With the same package code, the counter reaches
# 300000 on my Flyte binary setup but only 9000 on the sandbox => very weird.
import hashlib
import os
from typing import Callable, Optional

# _filehash_update and _pathhash_update are the existing helpers defined in
# flytekit/tools/fast_registration.py, next to compute_digest.


def compute_digest(source: os.PathLike, filter: Optional[Callable] = None) -> str:
    """
    Walks the entirety of the source dir to compute a deterministic md5 hex digest of the dir contents.
    :param os.PathLike source:
    :param Optional[Callable] filter: returns True for relative paths that should be skipped
    :return Text:
    """
    hasher = hashlib.md5()
    i = 1  # debug: running count of files visited
    for root, _, files in os.walk(source, topdown=True):
        files.sort()

        for fname in files:
            print(i)  # debug
            i += 1
            abspath = os.path.join(root, fname)
            relpath = os.path.relpath(abspath, source)
            if filter:
                if filter(relpath):
                    continue

            _filehash_update(abspath, hasher)
            _pathhash_update(relpath, hasher)
    print('1')  # debug: marks the end of the walk
    print(hasher.hexdigest())  # debug
    return hasher.hexdigest()
```
@Yee
y
Do you have some very large file that it's trying to hash?
Can you print the filename in addition to `i`?
It feels like it's walking symlinks or something.
I think this has to do with a difference in your environment.
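To test the "very large file" and symlink hypotheses without touching flytekit, a small script can walk a directory the same way the `compute_digest` loop above does (`os.walk`, which does not follow directory symlinks by default) and report the file count, total size, and largest files. `summarize_tree` is a hypothetical helper, not part of flytekit:

```python
import os
from typing import List, NamedTuple, Tuple


class TreeSummary(NamedTuple):
    file_count: int
    total_bytes: int
    largest: List[Tuple[int, str]]  # (size in bytes, path relative to the root)


def summarize_tree(source: str, top: int = 5) -> TreeSummary:
    """Count and size every file os.walk finds under `source` (directory symlinks not followed)."""
    count = 0
    total = 0
    sizes: List[Tuple[int, str]] = []
    for root, _dirs, files in os.walk(source, topdown=True):
        for fname in files:
            abspath = os.path.join(root, fname)
            try:
                # getsize follows file symlinks, so a link to a large file counts in full.
                size = os.path.getsize(abspath)
            except OSError:
                # Broken symlinks or unreadable entries are skipped.
                continue
            count += 1
            total += size
            sizes.append((size, os.path.relpath(abspath, source)))
    sizes.sort(reverse=True)
    return TreeSummary(count, total, sizes[:top])
```

Comparing the result for the root detected in the log (`/mnt/data1/hainq`) with the project directory (`/mnt/data1/hainq/flyte-train`) shows whether the extra files come from outside the project.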
r
Looks like the problem is here: the fnames printed during the Flyte binary registration include many files outside the package I want to register (the picture shows some PyTorch weight files from another workspace of mine, about 4 GB per .pth file).
Why did this happen? I thought the hashing only covered the target package I chose.
I am running this command from /mnt/data1/hainq/flyte-train:
```bash
pyflyte register ./
# or
pyflyte register /mnt/data1/hainq/flyte-train
# Both commands above register from the root, which contains a lot of files outside the project
# => broken because the total size to hash is too large (all files from the root /mnt/root/ ....)
```
It registers from the root, so I changed to:
```bash
pyflyte register train_workflows  # train_workflows is a sub-folder of the flyte-train folder
# this registers the train_workflows folder only
```
@Yee
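Why `./` and `train_workflows` behave differently can be illustrated with a simplified model of root detection. This sketch assumes (not confirmed in this thread) that flytekit starts at the parent directory of each resolved package path and keeps climbing while that directory contains an `__init__.py`; `find_project_root` is a hypothetical name, so check `find_common_root` in your installed flytekit before relying on it. Under that assumption, registering `/mnt/data1/hainq/flyte-train` starts at `/mnt/data1/hainq`, matching the "Common root folder detected as /mnt/data1/hainq" log line, while registering `train_workflows` starts at `flyte-train` itself.

```python
from pathlib import Path
from typing import Union


def find_project_root(package: Union[str, Path]) -> Path:
    """Simplified model (an assumption): start at the parent of the resolved package
    path and climb while the current directory is itself a package (has __init__.py)."""
    path = Path(package).resolve().parent
    # Stop at the filesystem root, where path.parent == path.
    while (path / "__init__.py").exists() and path != path.parent:
        path = path.parent
    return path
```

The log shows the package passed as the absolute path `/mnt/data1/hainq/flyte-train`, so under this model `pyflyte register ./` and `pyflyte register /mnt/data1/hainq/flyte-train` land on the same root, which is what Ryuu observed.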
y
Can you show your directory structure?
And what's working and what's not?
Agree that this is bad behavior.
Just not sure how to prevent it.
r
@Yee
Here:
It's just a simple project.
y
But what is `pwd`?
r
It's /mnt/data1/hainq/flyte-train.
So the problem is: why does registration process the entire root?
```bash
pyflyte register ./
pyflyte register /mnt/data1/hainq/flyte-train
```
Both of these are bad; both register from /mnt/root.
y
Does it work if you register `./train_workflows`?
r
@Yee Yes, this registers the target package.
y
Can you do that for now?
This is a bug we'll have to fix.
r
@Yee OK, my problem is solved 🤣. I have successfully registered my package.
y
No worries. Registration should be better behaved.
r
Thanks a lot, I couldn't have debugged this without your help 🤣
y
No worries, I filed https://github.com/flyteorg/flyte/issues/4652 to keep track.