Is the pan25 test dataset byte-for-byte the same as the one used in 2025?

Dear Maik,

I’m doing some post-analysis and have a question about one of the datasets on TIRA.

I have a run result here:

  • Dataset: pan25-generative-ai-detection-20260508-test
  • Run: 2026-05-22-19-42-33 (submission alternating-contingency)

As far as I recall, the PAN’25 test set was never released publicly (it stays blinded on TIRA), so I want to confirm what this dataset actually is.

Is pan25-generative-ai-detection-20260508-test exactly the same data that was used as the official PAN’25 test set, i.e., the identical documents and ground-truth labels, with no resampling, additions, re-obfuscation, or relabeling?

The reason I ask: I’d like to know whether my score on this dataset is directly comparable to the published PAN’25 leaderboard results. If the composition differs in any way from the 2025 official test set, the comparison wouldn’t be apples-to-apples.

Thanks very much,
– Yurii

Hi, yes, the pan25-generative-ai-detection-20260508-test dataset is exactly the 2025 test dataset.

(Others have asked as well, so I checked this to make sure we made no mistakes: the md5sum is identical, so it is the same dataset.)
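For anyone who wants to run the same kind of check on two local copies of an archive, here is a minimal sketch. It assumes both copies are available as ordinary files; the function names and the paths in the usage note are hypothetical, and this is not necessarily how the check was done on TIRA.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """Return True if both files have the same MD5 digest."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

Calling `same_content("pan25-test-2025.zip", "pan25-test-20260508.zip")` (placeholder paths) returns `True` when the digests match. A matching MD5 is a strong indicator against accidental changes such as resampling or relabeling, although MD5 is not collision-resistant against deliberate tampering.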

The results should be comparable. I am not sure whether the dataset was released, but I think it is still private (which is good, because it means the evaluation is still meaningful).

@jbevendorff I am not sure whether we released the 2025 dataset. I think we kept it private, didn't we?

Best regards,

Maik


I recall that we also re-executed some of the 2025 submissions, and they received the same evaluation scores. So, in addition to the identical md5sum, this is a good second indicator that the evaluation is still the same :)

Fast and cool, as usual!
Thx
