To implement the expando-mono-duo design pattern with sequence-to-sequence models* for Task 2, we predicted a query for each document in the Touché-22 Task 2 corpus. Each predicted query can be interpreted as the query that the passage is most likely to answer.
The queries were predicted with the doc2queryT5 model**. In the passages.jsonl file, each query is appended to the end of the contents field, enclosed in <query> </query> tags.
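For anyone who wants to separate the predicted query from the passage text again, here is a minimal sketch. It assumes that every line of passages.jsonl is a JSON object with a contents field and that the query sits between the last <query> and </query> tags, as described above. The function names split_contents and read_passages are made up for illustration, and the exact whitespace around the tags is a guess, so the code simply strips it.

```python
import json


def split_contents(contents):
    """Split a contents string into (passage_text, predicted_query).

    Returns (contents, None) if no trailing <query> ... </query> pair is found.
    """
    start = contents.rfind("<query>")
    end = contents.rfind("</query>")
    if start == -1 or end < start:
        return contents.strip(), None
    query = contents[start + len("<query>"):end].strip()
    return contents[:start].strip(), query


def read_passages(path):
    """Yield (record, passage_text, predicted_query) for each JSON line."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            passage, query = split_contents(record["contents"])
            yield record, passage, query
```

Usage would look like `for record, passage, query in read_passages("passages.jsonl"): ...`. Using rfind rather than a pattern match keeps the split stable even if a passage happens to contain the tag text itself.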
The doc2queryT5 model was pretrained on the MS MARCO dataset, and we did not fine-tune it further. We used the following checkpoint from the Hugging Face Hub: castorini/doc2query-t5-base-msmarco
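To give a rough idea of what predicting a query with this checkpoint can look like, here is a hedged sketch using the transformers library. This is not our actual prediction script: the decoding settings (beam search with 4 beams, at most 64 output tokens, input truncated to 512 tokens), the helper names, and the exact spacing around the tags in append_query are all assumptions made for illustration.

```python
MODEL_NAME = "castorini/doc2query-t5-base-msmarco"


def load_doc2query(model_name=MODEL_NAME):
    """Load tokenizer and model from the Hugging Face Hub (needs network access)."""
    # Imported here so the small helpers below work without transformers installed.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)
    model.eval()
    return tokenizer, model


def predict_query(passage, tokenizer, model, max_length=64, num_beams=4):
    """Generate one query for a passage (decoding settings are assumed, not the originals)."""
    input_ids = tokenizer.encode(
        passage, return_tensors="pt", truncation=True, max_length=512
    )
    output_ids = model.generate(
        input_ids=input_ids,
        max_length=max_length,
        num_beams=num_beams,
        do_sample=False,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def append_query(passage, query):
    """Attach a query to a passage in <query> tags (exact spacing is an assumption)."""
    return f"{passage} <query> {query} </query>"


if __name__ == "__main__":
    tokenizer, model = load_doc2query()
    text = "Tea usually contains less caffeine per cup than coffee."
    print(append_query(text, predict_query(text, tokenizer, model)))
```

The original docTTTTTquery setup samples several queries per passage. Beam search is used in this sketch only because a single, most likely query per passage is wanted here.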
If you want to see the prediction script, then please feel free to message me :).
*Ronak Pradeep, Rodrigo Nogueira, and Jimmy Lin. 2021. The Expando-Mono-Duo Design Pattern for Text Ranking with Pretrained Sequence-to-Sequence Models.
**Rodrigo Nogueira and Jimmy Lin. 2019. From doc2query to docTTTTTquery.