Expanded Passages for the Touché-22 Task 2: Argument Retrieval for Comparative Questions

annqt · January 17, 2022, 12:49pm

To implement the expando-mono-duo design pattern with sequence-to-sequence models* for Task 2, we predicted a query for each document in the Touché-22 Task 2 corpus. This query can be interpreted as the most likely query that the passage could answer.

The dataset is available here: https://files.webis.de/corpora/corpora-webis/corpus-touche-task2-22/touche-task2-passages-version-002-expanded-with-doc-t5-query.jsonl.gz

The queries were predicted with the doc2queryT5 model**. In the passages.jsonl file, each predicted query is appended to the end of the contents field, enclosed in <query> </query> tags.
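If you want to separate the original passage text from the predicted query, something like the following could work. It is a minimal sketch that assumes each line of the file is a JSON object with a contents field (as stated above) and an id field (the id field name is an assumption), and that the predicted query is the last <query> ... </query> span in contents. The helper names are hypothetical.

```python
import gzip
import json
from typing import Iterable, Iterator, Tuple


def split_contents(contents: str) -> Tuple[str, str]:
    """Split an expanded contents string into (passage, predicted_query).

    Assumes the predicted query is the LAST <query> ... </query> span.
    If no such span is found, the whole string is returned as the passage.
    """
    start = contents.rfind("<query>")
    end = contents.rfind("</query>")
    if start == -1 or end < start:
        return contents.strip(), ""
    passage = contents[:start].strip()
    query = contents[start + len("<query>"):end].strip()
    return passage, query


def parse_records(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (id, passage, query) for each non-empty JSON line.

    The "id" key is an assumption; "contents" is the field named in the post.
    """
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        passage, query = split_contents(record["contents"])
        yield record.get("id", ""), passage, query


def read_expanded_corpus(path: str) -> Iterator[Tuple[str, str, str]]:
    """Stream records straight from the gzipped JSONL file without unpacking it."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        yield from parse_records(f)
```

For example, iterating over read_expanded_corpus("touche-task2-passages-version-002-expanded-with-doc-t5-query.jsonl.gz") would yield one (id, passage, query) tuple per passage. Using rfind rather than a regular expression keeps the split correct even if the passage text itself happens to contain a literal "<query>" string.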

The doc2queryT5 model was trained on the MS MARCO dataset, and we did not fine-tune it further. We used the castorini/doc2query-t5-base-msmarco model from the Hugging Face Hub.
If you would like to see the prediction script, please feel free to message me :).
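For readers who want to try this themselves, the sketch below shows one way to predict a query per passage with the castorini/doc2query-t5-base-msmarco model via the Hugging Face transformers library and append it in <query> </query> tags. It is not the author's prediction script: greedy decoding, the 512-token input truncation, the 64-token output limit, and the exact spacing around the tags are all assumptions, and the function names are hypothetical.

```python
from typing import List

MODEL_NAME = "castorini/doc2query-t5-base-msmarco"


def format_expanded(passage: str, query: str) -> str:
    """Append a predicted query to a passage inside <query> </query> tags.

    Assumption: the spacing used in the released corpus may differ.
    """
    return f"{passage} <query> {query} </query>"


def predict_queries(passages: List[str], max_length: int = 64) -> List[str]:
    """Predict one query per passage with doc2query-T5 (greedy decoding, an assumption)."""
    # Imported lazily so format_expanded works without transformers/torch installed.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(MODEL_NAME)
    model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)
    model.eval()

    # Pad the batch to a common length and truncate long passages (512 is an assumption).
    inputs = tokenizer(
        passages, return_tensors="pt", padding=True, truncation=True, max_length=512
    )
    # num_beams=1 with no sampling is greedy decoding: an approximation of the
    # single most likely query for each passage.
    outputs = model.generate(**inputs, max_length=max_length, num_beams=1, do_sample=False)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

Usage would look like expanded = [format_expanded(p, q) for p, q in zip(passages, predict_queries(passages))]. For the full corpus you would process passages in smaller batches, ideally on a GPU.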

*Ronak Pradeep, Rodrigo Nogueira, and Jimmy Lin. 2021. The Expando-Mono-Duo Design Pattern for Text Ranking with Pretrained Sequence-to-Sequence Models.

**Rodrigo Nogueira and Jimmy Lin. 2019. From doc2query to docTTTTTquery.