Clickbait Spoiling: Additional Resources

The original training/validation data and the trained baselines are available online (details on how the dataset was constructed are given in the paper that introduced it).

This topic collects additional resources that might be helpful for participants (both during the shared task and afterwards), be it complementary training/evaluation data, code, or trained models.

Please feel free to share your resources here if you think they might be helpful to others.

Best regards,

Maik

There is a very cool project by @janetzhong82, Beicheng, and Ollie on clickbait spoiling that uses abstractive question answering (here is the thread on Twitter). Their repository, which includes code and a supplementary dataset that might be useful for training/validation, is available on GitHub.

Best Regards,

Maik

Markus Sverdvik Heiervang wrote a master's thesis on clickbait spoiling that is available online.
The thesis looks very cool, and he also went the extra mile and published trained models on Hugging Face that might be very helpful for this shared task.

Best Regards,

Maik

This paper (accepted at EMNLP 2022) describes pre-training approaches that might be helpful for spoiling as well: [2205.10455] Pre-training Transformer Models with Sentence-Level Objectives for Answer Sentence Selection

Very interesting: I just stumbled upon the Question Answering tracks that ran at TREC (I think from 2003 to 2007), which had list questions in addition to factoid and definition questions.

Those list questions are not exactly the same as the multipart spoilers in clickbait spoiling, but they are quite similar. For instance [1]:

"A list question asks for different answer instances that satisfy the information need, such as 'List the names of chewing gums.' Answering such questions requires a system to assemble a response from information located in multiple documents."

Here are some examples of those list questions: “Name titles of movies, other than Superman movies, that Christopher Reeve acted in.”, or “What are the titles of songs written by John Prine?”.
Hence, I think it would be very interesting to check whether some of those list question answering approaches (e.g., [2]) work for multipart spoilers.

Best regards,

Maik

[1] Overview of the TREC 2007 question answering track
[2] Automatic Set Expansion for List Question Answering

Dear all,

Matthias just pointed out that this paper might be very interesting for multipart spoilers: [2302.01691] LIQUID: A Framework for List Question Answering Dataset Generation

Best Regards,

Maik