Troubles when running a local TIRA dry run

Hello,

I have created a repository for my code and a Dockerfile, and I have installed TIRA via pip. The authentication and verification were successful.

I wanted to test the submission with a local dry run. This is the command that I executed:

tira-cli code-submission --dry-run --path . --task generative-ai-authorship-verification-panclef-2025 --dataset pan25-generative-ai-detection-smoke-test-20250428-training

I receive three confirmations:

1) The dataset … is available locally
2) The code is in a git repository
3) The code is embedded into the docker image …

However, after that I receive the following error (see attachment):

ValueError: No unique *.jsonl file was found, only the files were available.

I would be more than thankful if anyone could provide a hint on how to resolve this issue.

Kind regards.

Dear Felix,

Thank you for reaching out!

This error message could indeed be improved a bit. It was intended to say that the software was expected to produce a *.jsonl file in the output directory, but there were no files in the output directory. I.e., there is very likely an error earlier in the log that explains what failed.
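
For background, the software is expected to write exactly one *.jsonl file into the output directory. A minimal sketch of a script that does this could look as follows (the field names "id" and "label" are just placeholders, please use the actual task format):

import json
import sys
from pathlib import Path

# The input file and the output directory are passed as command-line arguments.
input_file, output_dir = sys.argv[1], Path(sys.argv[2])
output_dir.mkdir(parents=True, exist_ok=True)

# Read the input *.jsonl and write exactly one *.jsonl file to the output directory.
with open(input_file) as src, open(output_dir / "predictions.jsonl", "w") as dst:
    for line in src:
        entry = json.loads(line)
        # Placeholder prediction; a real system would compute this from the entry.
        dst.write(json.dumps({"id": entry["id"], "label": 0.5}) + "\n")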

Does this help you? You can also invite me to your git repository (my account is mam10eks), then I can try to help solve this.

Best regards,

Maik

Dear Maik,

thank you for your swift reply! Unfortunately, I have not gotten much further with this issue.

Thank you for offering to help me out with my repo. I did invite you to the repo.

Kind regards,
Felix.

Dear Maik,

I am enclosing a second screenshot. It shows the command that I executed at the bottom and the received error at the top. I tried a slightly changed command but ended up with the same error.

Maybe this sheds some more light on it.

Perfect, this helps indeed.

It tries to execute /tira-data/input/dataset.jsonl as a binary, but this file is not executable.

The part

--command '$inputDataset/dataset.jsonl $outputDir'

describes which command is executed within the software (where $inputDataset points to the directory where the input data is mounted). Hence, the command above fails, because it tries to execute the input data.

This parameter should instead look something like this:

--command '/path-to-my-script.py $inputDataset/dataset.jsonl $outputDir'

E.g., for the Generative Authorship Verification baseline, the flag looks like:

--command '/usr/local/bin/pan25-baseline tfidf $inputDataset/dataset.jsonl $outputDir'
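
At runtime, the variables are substituted, so inside the container this roughly expands to the following (the exact paths are illustrative):

/usr/local/bin/pan25-baseline tfidf /tira-data/input/dataset.jsonl /tira-data/output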

Does this help to resolve the problem?

Best regards,

Maik

Dear Maik,

thank you very much! That was indeed the issue and I was able to run the dry run successfully.

I have one more question though. I would like to execute the final code submission to see if it works. Can I do this multiple times, or is it allowed only once, so that the submission would count as the final one?

Thank you very much, kind regards.
Felix.

Dear Felix,

awesome, nice that it worked!

You can do multiple submissions; we usually do not have a strict maximum number of submissions. (If you have many submissions, I would say more than 10, then the organizers may ask you to prioritize them, but this happens very rarely.)

Best regards,

Maik

Dear Maik,

thank you very much for your help!

I have tried the code-submission. However, depending on the tira command I execute, I run into one of the following errors.

First error: “git repo is not clean”.

However, I can assure you that it is clean: my git status shows nothing to commit and a clean working tree.

Do you have any idea how to fix this?

Kind regards,
Felix.

If I exclude --dataset from the command, however, I receive an MD5 error like this:

Hi Felix,

Sorry for the MD5 error. The problem here is that the dataset is not available on Zenodo anymore, which causes the MD5 error. We updated our database, but you likely still have the old URL in some local cache. Could you please run rm -Rf ~/.tira/.archived/? This should resolve the error.

(and use --dataset pan25-generative-ai-detection-smoke-test-20250428-training)

Best regards,

Maik

Thank you very much! That helped indeed and the process progressed much further.

Unfortunately, it seems that shortly before the final submission I run into another error. I am posting a screenshot here.

Do you have any idea?

Hi, this is a gateway timeout of the server (calling the Kubernetes pods); because of the deadline, there are some spikes in the load.

I think it is fine if you just re-run the command; the layers that are already pushed are not pushed again, so it should get better :slight_smile:

I also experienced this a few times yesterday, but when I tried it a second time it usually worked.

If this persists for you, we can have a more detailed look, but for the moment I think just trying this a second time should do it.

Best regards,

Maik


Hi, thank you very much! Could you tell me what happened? Do you have any idea?

Hi @fosu_bruan,

Sorry for the inconvenience!

The tira client currently does not automatically detect expired keys (the authentication key for the docker registry connected to your team was from last year and had expired).

I now refreshed the authentication to the docker registry, so it should work now.

Could you please test again?

Thanks in advance!

Best regards,

Maik


Thank you very much! I tested again just now, and it worked!
But now I run into another error. Here is my screenshot.
Could you tell me what happened? Thank you!!! :face_with_peeking_eye:

Hi,

this might be a short hiccup, as the servers are currently under load.

Could you please try again?

Best regards,

Maik

How nice of you.
Thank you for your help!

Hello, I have a few different questions. I want to run a programme that uses two LLMs (like Binoculars). Unfortunately, I can't run my programme on my local PC. I developed the code on a cluster; however, I can't use Docker there. I used the following command to submit my approach:

tira-cli code-submission --path . --mount-hf-model tiiuae/falcon-7b tiiuae/falcon-7b-instruct --task generative-ai-authorship-verification-panclef-2025 --command "python3 evaluate_tira.py $inputDataset/dataset.jsonl $outputDir"

Although I have run the command rm -Rf ~/.tira/.archived/, I am still getting an MD5 error. I can't use --dataset. With a smaller LLM that was included in the Dockerfile, I could successfully perform a dry run:

tira-cli code-submission --dry-run --path . --task generative-ai-authorship-verification-panclef-2025 --dataset pan25-generative-ai-detection-smoke-test-20250428-training --command 'python3 evaluate_tira.py $inputDataset/dataset.jsonl $outputDir'

My new approach is to mount the larger models and try not to run them locally. Is the command correct? I would also like to know whether

config_a = PretrainedConfig.from_pretrained("/tiiuae/falcon-7b", local_files_only=True)

in my evaluate_tira script is correct, or if the mounted models in the container are in a subfolder like /models.
Thank you for your efforts!
Kind regards,
Sophie Titze

Hi Sophie,

To prevent the MD5 error, please pass --dataset pan25-generative-ai-detection-smoke-test-20250428-training, as otherwise it tries to run on the validation dataset, which is not public on Zenodo.

When the code is ready, you could give me access to the repository (my account on GitHub is mam10eks), and then I could finalize the submission.

The command looks correct.

You usually do not need to modify the from_pretrained calls, as we mount the models to the location where Hugging Face searches for them. (There are still some special cases where one has to modify the code slightly, but usually it works out of the box, and making the models configurable via parameters helps so that the code does not have to be modified.)

I.e., I think the correct way to load this model would be:

config_a = PretrainedConfig.from_pretrained("tiiuae/falcon-7b", local_files_only=True)

(i.e., I removed the starting /)
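
And, e.g., a small sketch of making the model configurable via a parameter (the --model flag and its default are just illustrations, name them as you like):

import argparse
from transformers import PretrainedConfig

parser = argparse.ArgumentParser()
parser.add_argument("input_file")
parser.add_argument("output_dir")
# Hypothetical flag so that the model can be swapped without changing the code.
parser.add_argument("--model", default="tiiuae/falcon-7b")
args = parser.parse_args()

# With --mount-hf-model, the model files are available in the default
# Hugging Face cache inside the container, so offline loading works:
config_a = PretrainedConfig.from_pretrained(args.model, local_files_only=True)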

Does this help you?
(We also can continue in a private chat to finalize everything)

Best regards,

Maik

Dear Maik,

thank you very much for all your great help. That is very much appreciated!

I was able to make a submission. However, when I try to run it on the server on the smoke-test dataset, the process is scheduled, but the execution is trapped in a loop. It does not seem to finish.

Is this because of the current load or are there any issues with my submission?

Here is a screenshot.