Best practice for integrating transformers

Hello,
while submitting to the Multi-Author Writing Style Analysis task, I ran into a problem. I am following the provided baseline: https://github.com/pan-webis-de/pan-code/tree/master/clef25/multi-author-analysis/naive-baseline
Everything works locally; however, when I execute a dry-run, I receive an error that the model is not found:

tira-cli code-submission --dry-run --path . --task multi-author-writing-style-analysis-2025 --dataset multi-author-writing-spot-check-20250503-training --command '/predict.py --dataset $inputDataset --output $outputDir --predict 0'

OSError: Incorrect path_or_model_id: 'path/to/model/'. Please provide either the path to a local folder or the repo_id of a model on the Hub.

When using the provided baseline, what is the best way to integrate Hugging Face/PyTorch models? I looked around in the forum, but honestly, I am a bit confused.

I would really appreciate some help on this matter.
Best,
Philipp

Dear Philipp,

Thanks for reaching out!

There are two ways. The first is to add the model into the Docker image. For smaller transformer models, adding the weights to the image does not add too much overhead, so this is fine. For instance, this is how a MiniLM transformer is added to the Touché ads detection baseline:

# Add a line like this to your Dockerfile to ensure the tokenizer is in the docker image:
RUN python3 -c 'from transformers import AutoTokenizer, AutoModel; AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2"); AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2");'
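At runtime, your script can then load the weights from the cache baked into the image. A minimal sketch, assuming the MiniLM model from above (the local_files_only flag is optional, but it makes a missing cache fail loudly instead of silently attempting a download):

from transformers import AutoModel, AutoTokenizer

MODEL = "sentence-transformers/all-MiniLM-L6-v2"

# Both calls resolve against the cache created by the RUN line above.
tokenizer = AutoTokenizer.from_pretrained(MODEL, local_files_only=True)
model = AutoModel.from_pretrained(MODEL, local_files_only=True)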

For larger models, this is not possible. For those we have a --mount-hf-model flag so that you can mount a model from Hugging Face. For instance, here is an example of how meta-llama/Llama-3.1-8B-Instruct and meta-llama/Llama-3.1-8B are mounted in the Gen AI Detection baseline: https://github.com/pan-webis-de/pan-code/tree/master/clef25/generative-authorship-verification/pan25_genai_baselines#submit-llm-approaches-to-tira
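Applied to your command, that would look roughly like this (a sketch, assuming the flag takes the Hub repo id as in the linked README):

tira-cli code-submission --dry-run --path . \
    --task multi-author-writing-style-analysis-2025 \
    --dataset multi-author-writing-spot-check-20250503-training \
    --mount-hf-model meta-llama/Llama-3.1-8B-Instruct \
    --command '/predict.py --dataset $inputDataset --output $outputDir --predict 0'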

Does this help you?

Best regards,

Maik

Dear Maik,
thank you very much for your fast help, this solved the issue! :)
However, I ran into another problem during the dry-run:

I was using this command:

tira-cli code-submission --dry-run --path . --task multi-author-writing-style-analysis-2025 --dataset multi-author-writing-spot-check-20250503-training --command '/predict.py --dataset $inputDataset --output $outputDir --predict 0'

The error appears without any log message before it.

Best,
Philipp

Hi, this error message (which can be improved :)) indicates that no predictions were written to the expected directory.

Could you share the repository with me in which you run the example?

Thanks in advance!

Best regards,

Maik

Hi Maik,
thank you, that's very kind! I have sent an invitation to "mam10eks".
Best,
Philipp


Awesome, thanks!

I now have it, there was an error message above the traceback:

The message sh: 1: /predict.py: Permission denied indicates that the file is not executable.

You can resolve this via:

chmod +x predict.py

After this change, everything seems to work, i.e., the software makes predictions.

(Update: I just committed the modified permissions to the git repo, so it should be resolved.)
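One related note: for /predict.py to be executed directly like this, its first line also needs a shebang pointing at Python, e.g. (the env-based form here is an assumption; any line resolving to python3 works):

#!/usr/bin/env python3

Without such a line, the shell would not know which interpreter to run the file with, even when it is executable.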

Does this solve your problem?

Best regards,

Maik

Hi Maik,
thanks for your help again!
I pulled, and the file permissions for predict.py are now:

-rwxrwxrwx 1 root root 12408 May 22 10:37 predict.py

Unfortunately, I am still encountering an exception when doing a dry-run:
Traceback (most recent call last):
  File "/predict.py", line 279, in <module>
    main()
    ~~~~^^
  File "/usr/local/lib/python3.13/site-packages/click/core.py", line 1442, in __call__
    return self.main(*args, **kwargs)
           ~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/click/core.py", line 1363, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.13/site-packages/click/core.py", line 1226, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/click/core.py", line 794, in invoke
    return __callback(*args, **kwargs)
  File "/predict.py", line 273, in main
    input_df = tira.pd.inputs(dataset, formats=["multi-author-writing-style-analysis-problems"])
  File "/usr/local/lib/python3.13/site-packages/tira/pandas_integration.py", line 149, in inputs
    return pd.DataFrame(dataset_items)
           ~~~~~~~~~~~~^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/pandas/core/frame.py", line 843, in __init__
    data = list(data)
  File "/usr/local/lib/python3.13/site-packages/tira/tira_client.py", line 250, in iter_dataset
    for i in lines_if_valid(Path(dataset_dir), format):
             ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/tira/check_format.py", line 1125, in lines_if_valid
    raise ValueError(msg)
ValueError: There are no files matching the multi-author-style file pattern of 'problem-*.txt' in the directory /tira-data/input.

There are no files matching the multi-author-style file pattern of 'solution-problem-*.json' in the directory /tmp/tira-qji9ae0b.
Traceback (most recent call last):
  File "/usr/local/bin/tira-cli", line 8, in <module>
    sys.exit(main())
             ~~~~^^
  File "/usr/local/lib/python3.13/site-packages/tira/tira_cli.py", line 360, in main
    return args.executable(**vars(args))
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/tira/tira_cli.py", line 235, in code_submission_command
    client.submit_code(
    ~~~~~~~~~~~~~~~~~~^
        Path(path),
        ^^^^^^^^^^^
    ...<5 lines>...
        mount_hf_model=mount_hf_model,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/local/lib/python3.13/site-packages/tira/tira_client.py", line 404, in submit_code
    raise ValueError(msg)
ValueError: There are no files matching the multi-author-style file pattern of 'solution-problem-*.json' in the directory /tmp/tira-qji9ae0b.

Best,
Philipp

Hi Philipp,

I tried it from a fresh clone of your repository, and there it worked. It looks to me like predict.py does not receive the correct path to the data; the --dataset $inputDataset flag should pass that path.
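For reference, this is roughly the relevant part of the baseline's predict.py, reconstructed from your traceback (the Client import and the option declarations are my assumptions, not verbatim baseline code):

import click
from tira.rest_api_client import Client

tira = Client()

@click.command()
@click.option("--dataset", required=True, help="Filled in by TIRA via $inputDataset.")
@click.option("--output", required=True, help="Filled in by TIRA via $outputDir.")
@click.option("--predict", default=0, type=int)
def main(dataset, output, predict):
    # Raises the ValueError above when the directory behind --dataset
    # contains no problem-*.txt files.
    input_df = tira.pd.inputs(dataset, formats=["multi-author-writing-style-analysis-problems"])
    # ... produce solution-problem-*.json files in the --output directory ...

if __name__ == "__main__":
    main()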

I.e., when I run this command, it works (I also uploaded the software to TIRA with it, and it produced valid outputs there as well):

tira-cli code-submission \
    --dry-run \
    --path . \
    --task multi-author-writing-style-analysis-2025 \
    --dataset multi-author-writing-spot-check-20250503-training \
    --command '/predict.py --dataset $inputDataset --output $outputDir --predict 0'

Does this resolve the issue for you?

(Otherwise we could continue in a private chat, but with the command above I was able to upload the submission to TIRA.)

Best regards,

Maik

Hi Maik,
glad to hear it works on your side! I also did a fresh clone, but the issue still persists.
We can continue in a private chat.
Best,
Philipp

I think we resolved the issue. In case others stumble upon this: the example above executed /predict.py directly, and this predict.py had a shebang declaring that the script should be executed with python3. However, the file had non-Linux line breaks, so the system tried to execute python3\r instead of python3. The solution (we are still testing, so I am not yet sure) should be to use python3 /predict.py instead of /predict.py in the --command.
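An alternative fix is to convert the file to Unix line endings; a sketch, assuming GNU tools are available:

file predict.py              # reports "... with CRLF line terminators" if affected
sed -i 's/\r$//' predict.py  # strip the carriage returns in place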

Best regards,

Maik