Can we submit new scripts to TIRA, or maybe get access to the test data? We found a bug in our implementation and would like to retest and report our fixed results as well. Without being able to submit a new container to TIRA, or without access to the test data, we won’t be able to include this result in the paper.
Yes, you can still submit to TIRA to get the evaluations, and we are, of course, very happy to see more submissions.
To quote @martin-potthast: “Science does not end with Deadlines”
Please submit to both the validation and the test data, so you can verify on the validation data that everything worked, as the outputs are unblinded there by default.
As soon as you have all your new submissions ready, please drop me a message, then I can also re-create the detailed report.
We will keep submissions open indefinitely, so that we can keep the test data private and still allow meaningful evaluations on it.
For access to the test data: We had extensive discussions, internally and with participants, on whether we should release it. The advantage of releasing the test data would be that one could do extensive error analysis on it, or run additional evaluations. The problem is that as soon as the test data is released and someone does an error analysis on it, it effectively becomes a new validation dataset and can no longer be used for testing. Since over 400 hours of manual annotation effort went into the dataset, we think it is important to keep the test data reusable. An additional reason to keep it private is that LLMs are nowadays trained on more and more tasks (some are already trained on roughly 2,000 tasks), so only allowing evaluation through TIRA is the only way to really ensure that LLMs, or models distilled from them, have not already seen the test data.
Can you explain what the main limitation for you is?
Is it that you can’t access the internet because we remove the internet connection for reproducibility?
We are currently working on a solution for this, so that one can access arbitrary external data. If you want, you can become an early adopter of this.
I of course also see that software submissions come with additional effort, as one has to “deploy” the approach in a Docker image, but the benefits in terms of reproducibility and re-usability are huge for a comparably small effort.
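For a sense of scale, the “deployment” usually boils down to a short Dockerfile that wraps the existing approach. A minimal sketch (the script name `predict.py`, the requirements file, and the paths are illustrative, not the actual conventions of any particular task):

```dockerfile
# Illustrative image for a software submission; names and paths are examples.
FROM python:3.10-slim

WORKDIR /app

# Install the approach's dependencies.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the approach itself.
COPY predict.py .

# The entrypoint reads the input dataset and writes predictions;
# the concrete arguments depend on how the shared task invokes the image.
ENTRYPOINT ["python", "predict.py"]
```

After that, it is essentially a `docker build` followed by a `docker push` to whatever registry your TIRA account is configured for.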
Yes, the limitation I mentioned is basically the lack of internet access. For example, there was one approach we wanted to try using GPT-3, which required access to their API. However, for our current approach it’s okay; it’s sufficient. And your statement regarding reproducibility and re-usability is true.
BTW, on the Docker submission page, it’s not possible to select an image. All options in the selection list show as loading, and although I’ve been waiting for quite a while, nothing changes. Is there a problem with the server?