MD5 Mismatch for PAN 2025 Dataset on Zenodo

Hi TIRA team,

I’m trying to submit to the PAN 2025 Task 1 (generative-ai-authorship-verification-panclef-2025), but the CLI is failing with:

MD5 is unexpected: I expected "fd12cbb06a882276278655acc949b91d" but got "237de7f2289e34646935c31788d450ad"

This happens on both regular and --dry-run submissions. Could you please update the expected hash or advise?
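
For anyone debugging a similar mismatch, a minimal sketch for hashing a downloaded file locally and comparing it with the hash the CLI expects. The function names and the file path in the usage note are hypothetical; where the CLI stores its download is not stated in this thread.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to keep memory flat."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_md5):
    """Compare a file's MD5 with an expected hex string (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

Usage would look like `matches_expected("downloaded-dataset.zip", "fd12cbb06a882276278655acc949b91d")`, where `downloaded-dataset.zip` is a placeholder name. A mismatch only shows that the downloaded bytes differ from what was expected; it does not say why.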

Dear @OjaswaVarshney,

thanks for reaching out!

I think what you describe happens when the --dataset pan25-generative-ai-detection-smoke-test-20250428-training argument is missing when submitting to --task generative-ai-authorship-verification-panclef-2025.

The command shown here is correct: pan-code/clef25/generative-authorship-verification/pan25_genai_baselines (master branch of pan-webis-de/pan-code on GitHub)

(If the --dataset pan25-generative-ai-detection-smoke-test-20250428-training argument is missing, the CLI tries to run the evaluation on the validation data from Zenodo, but access to that data is restricted.)

Does this resolve your problem?

Best regards,

Maik

For everyone who encounters the same problem, here is a Colab notebook that shows how to load the data via the dataset ID above.

https://colab.research.google.com/drive/12_pGh02ToXvLaFPIuO8np7UfnyJNu0Be?usp=sharing

Best regards,

Maik

Hi,
I’m still getting errors.
I’ve attached the error for your reference.

Hi,

the error message "Resource stopwords not found" indicates that the stopwords are likely not installed in the Docker image.
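
A minimal sketch of one way to guard against this, assuming the message comes from NLTK (the thread does not confirm which library raised it). The helper name `ensure_stopwords` is hypothetical; `nltk.data.find` and `nltk.download` are the library's own calls.

```python
import importlib


def ensure_stopwords(resource="corpora/stopwords", package="stopwords"):
    """Return True if the NLTK stopwords corpus is available after this call."""
    try:
        # Imported lazily so the helper can report a missing nltk install
        # instead of crashing at import time.
        nltk = importlib.import_module("nltk")
    except ImportError:
        return False
    try:
        nltk.data.find(resource)  # raises LookupError if the corpus is absent
        return True
    except LookupError:
        # Needs network access; returns True on success, False otherwise.
        return bool(nltk.download(package, quiet=True))
```

Downloading at run time only works if the container has network access. If it does not, the corpus would need to be fetched while the image is built, for example with `python -m nltk.downloader stopwords` in the Dockerfile.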

Could you please invite me to your GitHub repository (my account name is mam10eks)? Then I can help finalize the submission.

Thanks in advance!

Best regards,

Maik

Hi,
I’ve sent you an invitation to my repo.

Awesome, thanks, I will look into this and will report back soon :slight_smile: