Can we submit new scripts to TIRA, or maybe get access to the test data? We found a bug in our implementation and would like to retest and report our fixed results as well. Without being able to submit a new container to TIRA, or without access to the test data, we won’t be able to include this result in the paper.
Yes, you can still submit to TIRA to get the evaluations, and we are, of course, very happy to see more submissions.
To quote @martin-potthast: “Science does not end with Deadlines”
Please submit to both the validation and the test data, so you can verify on the validation data that everything worked, as the outputs are unblinded there by default.
As soon as you have all your new submissions ready, please drop me a message, then I can also re-create the detailed report.
We will keep submissions open indefinitely, so that we can keep the test data private and still allow meaningful evaluations on it.
For access to the test data: We had extensive discussions, internally and with participants, on whether we should release it. The advantage of releasing the test data would be that one could do extensive error analysis on it, or run additional evaluations. The problem is that as soon as the test data is released and someone does an error analysis on it, it effectively becomes a new validation dataset and can no longer be used for testing. Since over 400 hours of manual annotation effort went into the dataset, we think it is important to keep the test data reusable. An additional reason to keep it private is that LLMs are nowadays trained on more and more tasks (some are already trained on roughly 2,000 tasks), so only allowing evaluation through TIRA is the only way to really ensure that LLMs, or models distilled from them, have not already seen the test data.
Can you explain what the main limitation for you is?
Is it that you can’t access the internet because we remove the internet connection for reproducibility?
We are currently working on a solution for this, so that one can access arbitrary external data. If you want, you can become an early adopter of this.
I of course also see that software submissions come with additional effort, as one has to “deploy” the approach in a Docker image, but the benefits in terms of reproducibility and re-usability are huge for a comparably small effort.
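For a sense of scale, the “deployment” usually boils down to a short Dockerfile that wraps the existing approach. A minimal sketch (the script name `predict.py`, the requirements file, and the paths are illustrative, not the actual conventions of any particular task):

```dockerfile
# Illustrative image for a software submission; names and paths are examples.
FROM python:3.10-slim

WORKDIR /app

# Install the approach's dependencies.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the approach itself.
COPY predict.py .

# The entrypoint reads the input dataset and writes predictions;
# the concrete arguments depend on how the shared task invokes the image.
ENTRYPOINT ["python", "predict.py"]
```

After that, it is essentially a `docker build` followed by a `docker push` to whatever registry your TIRA account is configured for.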
Yes, the limitation I mentioned is basically the lack of internet access. For example, there was one approach we wanted to try using GPT-3, which required access to their API. However, for our current approach it’s okay; it’s sufficient. And your statement regarding reproducibility and re-usability is true.
BTW, on the Docker submission page, it’s not possible to select an image. All options in the selection list show as loading, and although I’ve been waiting for quite a while, nothing changes. Is there a problem with the server?