Is there a date when the '23 dataset will be augmented with the text that appears in the images? As far as we know, we would receive the image texts, so we would not need to run OCR ourselves like most of the teams did last year.
Thank you so much!
I hope to get this done next week. If not, it should be ready the week after.
Thank you so much, that is probably a lot of work! It would be great if you could reply here when the data is complete.
Sorry that it took so long! Quality assurance took me longer than I expected. However, I think the new dataset is now in good shape. And the OCR worked (from what I saw) really well.
I still need to update some links, but you can already find the data here: Touché23-Image-Retrieval-for-Arguments | Zenodo
Thanks to all the queries that were submitted, the dataset now also has a nice size for a retrieval task (a bit over 1000 images per topic on average)!
Great, thank you so much!
By the way, if you have feedback on the dataset: I am still open to changing things if necessary. In particular, do you think there is still too much text on the images?