Results for PAN software

Hi team,

When can we expect the results for the software submitted to PAN 2025?

Dear @OjaswaVarshney,

This depends on the task; the final results will be in the overview papers published at CLEF.
For some tasks, some final evaluations are still running.

For the GenAI Authorship Verification task, the results on the individual datasets are available for the test dataset, the eloquent-01 dataset, and the eloquent-02 dataset. Janek is working on a unified evaluation that combines all three datasets, but it will take a while until this is finished. (The individual results might already be interesting, but the complete interpretation with explanations will be in the overview paper.)

Best regards,

Maik

Hi @maik_froebe,
I was wondering why the pan25-generative-ai-detection-eloquent-02 results have inconsistent means (see, e.g., baseline binoculars: scores of 0.962, 1, 1, 1, but mean: 0.792; no ROC result given).

It is also not clear to me how the mean in the final results was computed (it is not an arithmetic mean of the other scores given in the final results; maybe it is an arithmetic mean of the means across the test, eloquent-01, and eloquent-02 datasets?).

It would be great to understand these individual results before the camera-ready submission.

Thank you for clarifying,
Best regards
Jeremi

Hi Jeremi,

Thanks for reaching out. I do not know the details; @jbevendorff can clarify this.

Best regards,
Maik

Please use the values from the website: PAN at CLEF 2025 - Voight-Kampff Generative AI Detection

The eloquent-02 dataset contains only one late ELOQUENT submission, so its values are very different from those of eloquent-01, and the PAN ranking there is inaccurate. The score on the website is calculated over all individual datasets contained in the test set and the two ELOQUENT collections.

The Mean score is a macro-average of the Mean scores over all individual data sources, so averaging the other columns does not necessarily yield that value.
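
To illustrate, here is a minimal sketch of how such a macro-average differs from naively averaging the displayed columns. All numbers and source names below are hypothetical, not the actual leaderboard data:

```python
# Hypothetical per-source Mean scores for one system across the individual
# data sources contained in test, eloquent-01, and eloquent-02.
per_source_means = {
    "source-a": 0.95,
    "source-b": 0.80,
    "source-c": 0.63,
}

# Macro-average: every data source contributes equally, regardless of size.
leaderboard_mean = sum(per_source_means.values()) / len(per_source_means)
print(f"Mean (macro-average over sources): {leaderboard_mean:.3f}")

# The other displayed columns are aggregates over a different grouping,
# so their arithmetic mean does not in general reproduce the Mean column.
displayed_columns = [0.962, 1.0, 1.0, 1.0]  # hypothetical column values
naive_mean = sum(displayed_columns) / len(displayed_columns)
print(f"Arithmetic mean of displayed columns: {naive_mean:.3f}")
```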
