When we tested our Docker container submission locally, the container ran the Python script as its entrypoint just fine, but when we submitted it via TIRA, it ran into the following issue while running the script:
What could be causing this?
Our Dockerfile pulls from the following image:
Ah, yes, indeed, now that I see your entrypoint, it is clear to me that the entrypoint works but the software does not.
In your entrypoint, you execute the program python with /run_task_1.py as a parameter (so python runs your script).
But the command that you specified was just /run_task_1.py, so the system treats it as an executable program on its own. You can resolve this either by changing the command that is executed, e.g., to python /run_task_1.py, or (and I think this is the better approach) by adding a shebang to /run_task_1.py and making it executable.
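The shebang approach can be sketched as follows; the file name and contents here are stand-ins for the actual task script:

```shell
# Stand-in for the task script: the first line is the shebang, which
# tells the kernel which interpreter to run the file with.
cat > /tmp/demo_task.py <<'EOF'
#!/usr/bin/env python3
print("hello from the script")
EOF

# Mark the file executable; without this bit the kernel refuses to run it.
chmod +x /tmp/demo_task.py

# Now the script can be invoked directly, with no explicit "python" prefix.
/tmp/demo_task.py
```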
Ah, sorry, concerning the /run_task_1.py command: that's a typo, it is meant to be /run_task_2.py.
Concerning your email, we can confirm that we did not initially add a shebang, so we will try it out. Any specific tips on finding the python executable within the Docker image?
On the second point, I assume we should try:
ENTRYPOINT [ "python", " python /run_task_1.py" ]
in that format?
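(For reference: in Docker's exec-form ENTRYPOINT, each array element is one argument, so the interpreter and the script path go into separate elements. A sketch of the two usual forms, using the script path from above rather than an actual Dockerfile:)

```dockerfile
# Exec form: interpreter and script path as separate array elements
ENTRYPOINT ["python", "/run_task_1.py"]

# Or, once the script has a shebang and the executable bit set:
ENTRYPOINT ["/run_task_1.py"]
```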
On a third note, in the baselines examples, the code tests out the docker containers locally via the tira.run command from the tira library. Are there any instructions on where to get and install this tira python library? A quick search yields no useful results.
I think the best way to figure out the python executable in your Docker image would be to run which python.
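A minimal sketch of that check; the image tag in the comment is a placeholder, not a real name:

```shell
# Locate the interpreter on PATH (command -v and which behave the same here).
command -v python3

# The interpreter can also report its own absolute location.
python3 -c 'import sys; print(sys.executable)'

# Inside the image, the same check would be run as, e.g.:
#   docker run --rm <your-image-tag> which python
```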
There is not yet a library for tira, the baseline used this script that you can use to test your submission locally.
I also saw that you are now one step further, and your software now failed because it could not parse the shebang.
So we are almost there!
I think this is now an encoding problem: "/usr/local/lib/python3.7^M" is not a valid python executable because of the "^M". The "^M" is the Windows line break; since the Docker image runs Linux, the "^M" is treated as part of the interpreter path instead of being ignored. Can you maybe convert the line breaks to Linux ones? I think this can be configured in any editor.
I also know that some tools (Git, for instance) can automatically convert line breaks to Linux line breaks.
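A sketch of the conversion with standard tools; the file path is a stand-in, and dos2unix, where installed, does the same in one step:

```shell
# Create a stand-in script with Windows (CRLF) line endings.
printf '#!/usr/bin/env python3\r\nprint("ok")\r\n' > /tmp/crlf_demo.py
chmod +x /tmp/crlf_demo.py

# Strip the trailing carriage return (the "^M") from every line.
sed -i 's/\r$//' /tmp/crlf_demo.py

# With the shebang clean, the script runs directly.
/tmp/crlf_demo.py

# Git can also normalize line endings on commit:
#   git config core.autocrlf input
```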
Thank you, we managed to change the line breaks of the run_task_2.py file from the CRLF to the LF format with our editor configuration.
The current error we are sitting on involves a permission issue, which we do not yet know how to solve:
/scripts-2761-18357/step_script: /run_task_2.py: /usr/local/lib/python3.7: bad interpreter: Permission denied
I just tested this inside your container and it worked, so can you please make an additional submission with the shebang #!/usr/bin/env python3? Then it should work (I have not tested your script, only that this shebang works in your pod).
Thanks, the provided shebang #!/usr/bin/env python3 worked just fine.
The problem now lies with our script in the uploaded image: it executes, but does not find the local model:
01/24/2023 12:38:38 - INFO - farm.modeling.language_model - Could not find saved_models/qa-model-task2 locally.
01/24/2023 12:38:38 - INFO - farm.modeling.language_model - Looking on Transformers Model Hub (in local cache and online)…
But when we run a container locally from this image, the script runs just fine:
2023-01-24 13:44:32 01/24/2023 12:44:32 - INFO - farm.modeling.language_model - Model found locally at saved_models/qa-model-task2
2023-01-24 13:44:35 01/24/2023 12:44:35 - INFO - farm.modeling.language_model - Loaded saved_models/qa-model-task2
What could cause the local container to find the model within the image, while the uploaded image fails to do so?
Suggestions would be appreciated! Our current Dockerfile just uses the following COPY command to copy the repository files into the image
I think the problem that you now face can be resolved by specifying the absolute path where the models are stored. E.g., saved_models/qa-model-task2 is a relative path, so it can become invalid when the environment changes (which is what happened here).
I looked into your image, and it seems that the saved_models directory sits directly in the root / of the system. I.e., it should work if you switch the path from saved_models/qa-model-task2 to /saved_models/qa-model-task2.
Can you please try it again with this adjusted path?
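The difference can be sketched as follows; the directory names are stand-ins for the model path:

```shell
# Stand-in for the model directory baked into the image.
mkdir -p /tmp/pathdemo/saved_models/qa-model

# From the directory that contains saved_models, the relative path resolves.
cd /tmp/pathdemo
ls -d saved_models/qa-model

# From anywhere else, the same relative path breaks...
cd /
ls -d saved_models/qa-model 2>/dev/null || echo "relative path not found"

# ...while the absolute path works regardless of the working directory.
ls -d /tmp/pathdemo/saved_models/qa-model
```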