Announcements

johanneskiesel · July 21, 2021, 10:05am

Dataset update version: 2021-07-20

We published a new version of the dataset for the shared task:

We fixed a bug where the turn numbers were not sequential in a few cases.
You do not need to change your approaches.
Please use the 2021-07-20 version from now on (from Zenodo and in TIRA). We will remove the old version from TIRA in a few days.
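
If you want to confirm locally that a questions file has sequential turn numbers, a minimal sketch could look like the following. It assumes each record is a JSON object carrying a conversation identifier and a turn number under the keys `Conversation_no` and `Turn_no`, that turns start at 1, and that they appear in file order; these names and conventions are assumptions here, so adjust them to the actual files. The sample records are made up for illustration.

```python
from collections import defaultdict


def find_non_sequential_turns(records, conv_key="Conversation_no", turn_key="Turn_no"):
    """Return {conversation id: [turn numbers]} for every conversation whose
    turn numbers are not exactly 1, 2, ..., n in the order they appear."""
    turns_by_conv = defaultdict(list)
    for record in records:
        turns_by_conv[record[conv_key]].append(record[turn_key])
    return {
        conv: turns
        for conv, turns in turns_by_conv.items()
        if turns != list(range(1, len(turns) + 1))
    }


# Made-up sample: conversation 2 skips turn 2.
sample = [
    {"Conversation_no": 1, "Turn_no": 1},
    {"Conversation_no": 1, "Turn_no": 2},
    {"Conversation_no": 2, "Turn_no": 1},
    {"Conversation_no": 2, "Turn_no": 3},
]
print(find_non_sequential_turns(sample))  # {2: [1, 3]}
```

An empty result means every conversation in the file is numbered consecutively.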

The new version contains the data in the appropriate locations:

We added the QReCC training dataset in the SCAI QReCC format to Zenodo. Please use this to train your approaches. To make clear that you should not train your approach in TIRA, this dataset is not available in TIRA.
We removed the test dataset ground truth from Zenodo and set it as private in TIRA. You therefore have to send us a short mail at scai-qrecc@googlegroups.com to unblind the results for you; otherwise you can’t see the numbers or console output of your run. This is to avoid confusion on whether to use the test dataset for training (answer: no).
We again added a toy dataset to both Zenodo and TIRA that contains just the first conversation of the test set. Please use this to ensure your approach works in TIRA. Results are automatically unblinded for this dataset.

Furthermore, the questions file now comes in two variants: the normal one and “rewritten”. The latter has the ground-truth rewritten questions of the participants instead of the original questions. We added these as some of you do not want to tackle question rewriting. As this is a separate dataset in TIRA, it corresponds to a separate leaderboard (see this answer).

To summarize (this should be mostly familiar to you):

Train your approach using the scai-qrecc21-training-*.json files
Deploy to TIRA and give it a quick test with the scai-qrecc21-toy-dataset-2021-07-20 and/or scai-qrecc21-toy-dataset-rewritten-2021-07-20 datasets (see Submission and Evaluation).
Run your approach on scai-qrecc21-test-dataset-2021-07-20 and/or scai-qrecc21-test-dataset-rewritten-2021-07-20 the same way. Then mail us at scai-qrecc@googlegroups.com with your team name to ask us to unblind the results for you. The leaderboard for the new “normal” dataset is here. We will set up the one for the “rewritten” variant in the next few days.
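
For the first step, a minimal sketch of collecting the training data from the scai-qrecc21-training-*.json files might look like this. It assumes the files were downloaded into one local directory and that each file holds a JSON list of records; the function name `load_training_records` is hypothetical and not part of any shared task tooling.

```python
import glob
import json
import os


def load_training_records(directory, pattern="scai-qrecc21-training-*.json"):
    """Read every file in `directory` matching `pattern` and return all
    records as one list (assumes each file contains a JSON list)."""
    records = []
    # Sort the paths so the order of records is reproducible across runs.
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        with open(path, encoding="utf-8") as handle:
            records.extend(json.load(handle))
    return records
```

Calling `load_training_records("path/to/download")` returns an empty list if no matching file is found, so it is worth checking the length of the result before training.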

The new main leaderboard is here. We will add the leaderboard for the “rewritten” variant shortly.

All registered participants will get a pointer to this announcement by mail. Please register if you have not already done so.

Happy research!
Johannes