Text on images in the '23 dataset for Image Retrieval

sarah · January 26, 2023, 12:14pm

Dear all,
Is there a date when the '23 dataset will be augmented with the text contained in the images? As far as we know, we will receive the image texts, so we won't need to run OCR ourselves as most teams did last year.
Thank you so much!
Sarah Bachinger

johanneskiesel · January 26, 2023, 4:05pm

Hi Sarah,

I hope to get this done next week. If not, it should be ready the week after.

sarah · January 26, 2023, 6:11pm

Hi Johannes,

Thank you so much; that is probably a lot of work! It would be great if you could reply here when the data is complete.

johanneskiesel · February 10, 2023, 5:22pm

Hi Sarah,

Sorry it took so long! Quality assurance took longer than I expected. However, I think the new dataset is now in good shape, and from what I saw, the OCR worked really well.

I still need to update some links, but you can already find the data here: Touché23-Image-Retrieval-for-Arguments | Zenodo

Thanks to all the submitted queries, the dataset now also has a good size for a retrieval task (a bit over 1,000 images per topic on average)!

Thanks!

sarah · February 15, 2023, 5:57pm

Great, thank you so much!

johanneskiesel · February 16, 2023, 8:05am

By the way, if you have feedback on the dataset, I am still open to changing things if necessary. In particular: do you think there is still too much text on the images?