The original training/validation data and trained baselines are available online (details on the dataset's construction can be found in the paper that introduced it).
This topic collects additional resources that might be helpful to participants (during the shared task, but also afterwards), be they complementary training/evaluation data, code, or trained models.
Please feel free to share your resources here if you think they might be helpful to others.
There is a very cool project by @janetzhong82, Beicheng, and Ollie on clickbait spoiling that used abstractive question answering (here is the thread on Twitter). Their repository, including code and a supplementary dataset that might be useful for training/validation, is available on GitHub.
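If you want to get a feeling for how abstractive question answering can be applied to spoiling, here is a minimal sketch using a generic seq2seq model from the transformers library (the model ID and example texts are placeholders, not the checkpoint or data from their repository):

```python
from transformers import pipeline

# Minimal abstractive QA sketch: treat the clickbait post as the question and
# the linked article as the context, then generate the spoiler. The model is
# a generic placeholder, not the project's actual checkpoint.
generator = pipeline("text2text-generation", model="google/flan-t5-base")

clickbait = "You won't believe what this simple trick does to your sleep"
article = (
    "Researchers found that dimming the lights an hour before bedtime "
    "measurably improved how quickly participants fell asleep."
)

prompt = f"question: {clickbait} context: {article}"
spoiler = generator(prompt, max_new_tokens=32)[0]["generated_text"]
print(spoiler)
```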
Markus Sverdvik Heiervang wrote a master's thesis on clickbait spoiling that is available online.
The thesis looks very cool, and he also went the extra mile and published trained models on Hugging Face that might be very helpful in this shared task.
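For anyone who has not worked with Hugging Face checkpoints before, here is a minimal sketch of how such a published model could be plugged in, cast as extractive QA (the model ID below is a generic stand-in; substitute the actual model from the thesis):

```python
from transformers import pipeline

# Extractive QA sketch: the model returns a span from the article as the
# spoiler. The model ID is a placeholder, not one of the thesis checkpoints.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

result = qa(
    question="You won't believe what this simple trick does to your sleep",
    context=(
        "Researchers found that dimming the lights an hour before bedtime "
        "measurably improved how quickly participants fell asleep."
    ),
)
print(result["answer"], result["score"])
```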
Very interesting: I just stumbled upon the Question Answering tracks that ran at TREC (I think from 2003 to 2007), which had list questions in addition to factoid and definition questions.
Those list questions are not exactly the same as the multipart spoilers in clickbait spoiling, but they are quite similar. For instance [1]:
> A list question asks for different answer instances that satisfy the information need, such as “List the names of chewing gums.” Answering such questions requires a system to assemble a response from information located in multiple documents.
Here are some examples of those list questions: “Name titles of movies, other than Superman movies, that Christopher Reeve acted in.” or “What are the titles of songs written by John Prine?”
Hence, I think it would be very interesting to check if some of those list question-answering approaches (e.g., [2]) would work for multipart spoilers.
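To illustrate the parallel, here is a rough sketch of casting a multipart spoiler as a list question with a generic instruction-tuned seq2seq model (this is only an illustration of the idea, not the approach from [2]; the model ID, prompt, and post-processing are all placeholders):

```python
from transformers import pipeline

# List-QA sketch for multipart spoilers: ask the model for all answers at
# once, then split the generation into parts. Model and prompt are generic
# placeholders, not the method from [2].
generator = pipeline("text2text-generation", model="google/flan-t5-base")

clickbait = "5 foods you should never keep in the fridge"
article = (
    "Bread goes stale faster when chilled. Tomatoes lose their flavor in "
    "the cold. Honey crystallizes. Onions turn soft, and potatoes get sweet."
)

prompt = f"List all answers. question: {clickbait} context: {article}"
output = generator(prompt, max_new_tokens=64)[0]["generated_text"]

# Naive post-processing: treat comma-separated pieces as the spoiler parts.
parts = [part.strip() for part in output.split(",") if part.strip()]
print(parts)
```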