Troubles when running a local TIRA dry run

Hello,

I have created a repository for my code and a Dockerfile, and I have installed TIRA via pip. The authentication and verification were successful.

I wanted to test the submission with a local dry run. This is the command that I executed:

tira-cli code-submission --dry-run --path . --task generative-ai-authorship-verification-panclef-2025 --dataset pan25-generative-ai-detection-smoke-test-20250428-training

I receive three confirmations:

1) The dataset … is available locally
2) The code is in a git repository
3) The code is embedded into the docker image …

However, after that I receive the following error (see attachment):

ValueError: No unique *.jsonl file was found, only the files were available.

I would be more than thankful if anyone could provide a hint on how to resolve this issue.

Kind regards.

Dear Felix,

Thank you for reaching out!

This error message could indeed be improved a bit. It was intended to say that the software was expected to produce a *.jsonl file in the output directory, but there were no files in the output directory. I.e., there is very likely an error earlier in the log that explains what failed.
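
For background, the software is expected to write exactly one *.jsonl file into the output directory. A minimal sketch of a script that does this could look as follows (the field names "id" and "label" are just placeholders, please use the actual task format):

import json
import sys
from pathlib import Path

# The input file and the output directory are passed as command-line arguments.
input_file, output_dir = sys.argv[1], Path(sys.argv[2])
output_dir.mkdir(parents=True, exist_ok=True)

# Read the input *.jsonl and write exactly one *.jsonl file to the output directory.
with open(input_file) as src, open(output_dir / "predictions.jsonl", "w") as dst:
    for line in src:
        entry = json.loads(line)
        # Placeholder prediction; a real system would compute this from the entry.
        dst.write(json.dumps({"id": entry["id"], "label": 0.5}) + "\n")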

Does this help you? You can also invite me to your git repository (my account is mam10eks), then I can try to help solve this.

Best regards,

Maik

Dear Maik,

thank you for your swift reply! Unfortunately, I have not gotten much further with this issue.

Thank you for offering to help me out with my repo. I did invite you to the repo.

Kind regards,
Felix.

Dear Maik,

I am enclosing a second screenshot. It shows the command that I executed at the bottom and the received error at the top. I tried a slightly changed command but ended up with the same error.

Maybe this sheds some more light on it.

Perfect, this helps indeed.

It tries to execute /tira-data/input/dataset.jsonl as a binary, but this file is not executable.

The part

--command '$inputDataset/dataset.jsonl $outputDir'

describes which command is executed within the software (where $inputDataset points to the directory where the input data is mounted). Hence, the command above fails, because it tries to execute the input data.

This parameter should instead look something like this:

--command '/path-to-my-script.py $inputDataset/dataset.jsonl $outputDir'

E.g., for the Generative Authorship Verification baseline, the flag looks like:

--command '/usr/local/bin/pan25-baseline tfidf $inputDataset/dataset.jsonl $outputDir'
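
At runtime, the variables are substituted, so inside the container this roughly expands to the following (the exact paths are illustrative):

/usr/local/bin/pan25-baseline tfidf /tira-data/input/dataset.jsonl /tira-data/output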

Does this help to resolve the problem?

Best regards,

Maik

Dear Maik,

thank you very much! That was indeed the issue and I was able to run the dry run successfully.

I have one more question though. I would like to execute the final code submission to see if it works. Can I do this multiple times, or is it allowed only once, so that the submission would count as the final one?

Thank you very much, kind regards.
Felix.

Dear Felix,

awesome, nice that it worked!

You can do multiple submissions; we usually do not have a strict maximum number of submissions. (If you have many submissions, I would say more than 10, then the organizers may ask you to prioritize them, but this happens very rarely.)

Best regards,

Maik

Dear Maik,

thank you very much for your help!

I have tried the code-submission. However, depending on the tira command I execute, I run into one of the following errors.

First error: “git repo is not clean”.

However, I can assure you that it is clean: my git status shows nothing to commit and a clean working tree.

Do you have any idea how to fix this?

Kind regards,
Felix.

If I exclude --dataset from the command, however, I receive an MD5 error like this:

Hi Felix,

Sorry for the MD5 error. The problem here is that the dataset is not available on Zenodo anymore, which causes the MD5 error. We updated our database, but you likely still have the old URL in some local cache. Could you please run rm -Rf ~/.tira/.archived/? This should resolve the error.

(and use --dataset pan25-generative-ai-detection-smoke-test-20250428-training)

Best regards,

Maik

Thank you very much! That helped indeed and the process progressed much further.

Unfortunately, it seems that shortly before the final submission I run into another error. I am posting a screenshot here.

Do you have any idea?

Hi, this is a gateway timeout of the server (calling the Kubernetes pods); because of the deadline, there are some spikes in the load.

I think it is fine if you just re-run the command; the layers that are already pushed are not pushed again, so it should get better :slight_smile:

I also experienced this a few times yesterday, but when I tried it a second time it usually worked.

If this persists for you, we can have a more detailed look, but for the moment I think just trying this a second time should do it.

Best regards,

Maik


Hi, thank you very much! Could you tell me what happened? Do you have any idea?

Hi @fosu_bruan,

Sorry for the inconvenience!

The tira client currently does not automatically detect expired keys (the authentication key for the docker registry connected to your team was from last year and had expired).

I now refreshed the authentication to the docker registry, so it should work now.

Could you please test again?

Thanks in advance!

Best regards,

Maik


Thank you very much! I tested again just now, and it worked!
But now I run into another error. Here is my screenshot.
Could you tell me what happened? Thank you!!! :face_with_peeking_eye:

Hi,

this might be a short hiccup, as the servers are currently under load.

Could you please try again?

Best regards,

Maik

How nice of you.
Thank you for your help!

Hello, I have a few different questions. I want to run a programme that uses two LLMs (like Binoculars). Unfortunately, I can't run my programme on my local PC. I developed the code on a cluster; however, I can't use Docker there. I used the following command to submit my approach:

tira-cli code-submission --path . --mount-hf-model tiiuae/falcon-7b tiiuae/falcon-7b-instruct --task generative-ai-authorship-verification-panclef-2025 --command "python3 evaluate_tira.py $inputDataset/dataset.jsonl $outputDir"

Although I have run the command rm -Rf ~/.tira/.archived/, I am still getting an MD5 error. I can't use --dataset. With a smaller LLM that was included in the Dockerfile, I could successfully perform a dry run:

tira-cli code-submission --dry-run --path . --task generative-ai-authorship-verification-panclef-2025 --dataset pan25-generative-ai-detection-smoke-test-20250428-training --command 'python3 evaluate_tira.py $inputDataset/dataset.jsonl $outputDir'

My new approach is to mount the larger models and try not to run them locally. Is the command correct? I would also like to know whether

config_a = PretrainedConfig.from_pretrained("/tiiuae/falcon-7b", local_files_only=True)

in my evaluate_tira script is correct, or if the mounted models in the container are in a subfolder like /models.
Thank you for your efforts!
Kind regards,
Sophie Titze

Hi Sophie,

To prevent the MD5 error, please pass --dataset pan25-generative-ai-detection-smoke-test-20250428-training, as otherwise it tries to run on the validation dataset, which is not public on Zenodo.

When the code is ready, you could give me access to the repository (my account on GitHub is mam10eks), and then I could finalize the submission.

The command looks correct.

You usually do not need to modify the from_pretrained calls, as we mount the models to the location where Hugging Face searches for them. (There are still some special cases where one has to modify the code slightly, but usually it works out of the box, and making the models configurable via parameters helps so that the code does not have to be modified.)

I.e., I think the correct way to load this model would be:

config_a = PretrainedConfig.from_pretrained("tiiuae/falcon-7b", local_files_only=True)

(i.e., I removed the starting /)
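
And, e.g., a small sketch of making the model configurable via a parameter (the --model flag and its default are just illustrations, name them as you like):

import argparse
from transformers import PretrainedConfig

parser = argparse.ArgumentParser()
parser.add_argument("input_file")
parser.add_argument("output_dir")
# Hypothetical flag so that the model can be swapped without changing the code.
parser.add_argument("--model", default="tiiuae/falcon-7b")
args = parser.parse_args()

# With --mount-hf-model, the model files are available in the default
# Hugging Face cache inside the container, so offline loading works:
config_a = PretrainedConfig.from_pretrained(args.model, local_files_only=True)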

Does this help you?
(We also can continue in a private chat to finalize everything)

Best regards,

Maik

Dear Maik,

thank you very much for all your great help. That is very much appreciated!

I was able to make a submission. However, when I try to run it on the server on the smoke-test dataset, the process is scheduled, but the execution is trapped in a loop. It does not seem to finish.

Is this because of the current load or are there any issues with my submission?

Here is a screenshot.