Question about query rewriting model input

Thanks a lot for your great work! I have a question regarding the query rewriting input. As I understand it, Section 7 of the QReCC paper says the QR model is trained with “a follow-up question and the preceding question-answer pairs” as input. However, the SCAI-QReCC GitHub page seems to imply that the input should be the preceding questions only.

I looked at a few examples and noticed that many human-rewritten queries actually need information from the answers or the passages. Then I looked at the baseline QR model results and noticed that quite a lot of examples also contain information that is unlikely to be inferable from the questions alone. For example, in conversation 81, turn 3, the model predicts “Why did Hattie McDaniel’s Oscar disappear” when the previous and current questions are “where is she?, Are there any other interesting aspects about this article?, Why did it disappear?”. In my opinion, it is extremely hard for a model to predict “Hattie McDaniel’s Oscar” correctly if it only sees the questions.

Therefore, it would be helpful if you could clarify what input format is fed into the baseline during both training and inference. In addition, I noticed that in some files, the “Context” field contains all preceding gold rewritten questions instead of the original questions. I wonder whether and how this “Context” information is used in the baseline.

For this task, the input should indeed be the preceding questions only (as stated on the SCAI-QReCC GitHub page). However, approaches can also use other information they gathered (in passage retrieval/question answering). To clarify this, we removed the “Context” field from the newer versions of the dataset (since turns are in order, your approach can keep previous questions in memory until the conversation number changes).
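In case it helps, here is a minimal sketch of that bookkeeping. It assumes each turn is a dict with a conversation number and a question; the default key names `Conversation_no` and `Question` and the helper name `iter_with_history` are assumptions for illustration, so please check them against the file you are using. It is not the baseline's code.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def iter_with_history(
    turns: Iterable[Dict[str, Any]],
    conv_key: str = "Conversation_no",  # assumed field name
    question_key: str = "Question",     # assumed field name
) -> Iterator[Tuple[List[str], str]]:
    """Yield (previous_questions, current_question) for each turn.

    Relies on the turns being stored in order, so the history is
    simply reset whenever the conversation number changes.
    """
    history: List[str] = []
    current_conv = None
    for turn in turns:
        if turn[conv_key] != current_conv:
            # A new conversation starts: forget the old questions.
            current_conv = turn[conv_key]
            history = []
        # Hand out a copy so later appends do not alter yielded values.
        yield list(history), turn[question_key]
        history.append(turn[question_key])


if __name__ == "__main__":
    # Made-up turns, only to show the shape of the output.
    example_turns = [
        {"Conversation_no": 1, "Question": "first question"},
        {"Conversation_no": 1, "Question": "follow-up question"},
        {"Conversation_no": 2, "Question": "question in a new conversation"},
    ]
    for previous, current in iter_with_history(example_turns):
        print(previous, "->", current)
```

With this, the rewriting model only ever sees the original preceding questions plus the current one, which matches the input described above; anything gathered during retrieval or question answering would have to be added separately.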

I can’t comment on the baseline. I guess @svakulenk0 can when she is back from vacation?

I hope I was able to help you. Thank you so much for being interested in our task!

I took a look at the example you referred to and also thought that it could not have been generated by the QR model. Not sure what happened there. Unfortunately, our team member responsible for running the QR experiments is on leave until early September. We will remove these results from the leaderboard for now and will send you an update once the bug is found. Thank you for reporting this issue!

Thank you very much for your quick response! I’ll just keep an eye on the updates then :)


I’m just checking in to see whether there is any news on the input for your baseline rewrite model. I’d like to know the exact inputs to your query rewriting model so that I can compare my own rewriting model to yours. Thanks a lot!