Hi SCAI-QReCC organizer,
Thanks a lot for your great work! I have a question regarding the query rewriting input. In Section 7 of the QReCC paper, it says the QR model is trained with input consisting of "a follow-up question and the preceding question-answer pairs". However, the SCAI-QReCC GitHub page seems to imply that the input should be the preceding questions only.
I looked at a few examples and noticed that many human-rewritten queries actually require information from the answers or the passages. I then looked at the baseline QR model results and noticed that quite a few examples also contain information that is unlikely to be inferable from the questions alone. For example, in conversation 81, turn 3, the model predicts "Why did Hattie McDaniel's Oscar disappear" given the previous and current questions "where is she?", "Are there any other interesting aspects about this article?", and "Why did it disappear?". In my opinion, it is extremely hard for a model to predict "Hattie McDaniel's Oscar" correctly if it only sees the questions.
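To make the question concrete, here is a small sketch of the two input formats I am contrasting. The separator token and the answer strings are my own placeholders, not the baseline's actual ones:

```python
# Two candidate ways to build the query-rewriting input for conversation 81, turn 3.
# The " ||| " separator and the answer texts below are hypothetical placeholders.
questions = [
    "where is she?",
    "Are there any other interesting aspects about this article?",
]
answers = [
    "Hattie McDaniel's Oscar was displayed at Howard University.",  # hypothetical answer
    "Yes, the Oscar later went missing.",                           # hypothetical answer
]
current_question = "Why did it disappear?"

# Format A: preceding question-answer pairs + follow-up question
# (my reading of Section 7 of the QReCC paper)
input_a = " ||| ".join(q + " " + a for q, a in zip(questions, answers))
input_a += " ||| " + current_question

# Format B: preceding questions only
# (my reading of the SCAI-QReCC GitHub page)
input_b = " ||| ".join(questions + [current_question])
```

Under Format B, the string "Hattie McDaniel's Oscar" never appears in the model's input, which is why the baseline's correct rewrite surprised me.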
Therefore, it would be helpful if you could clarify exactly what input format is fed into the baseline during both training and inference. In addition, I noticed that in some files, the "Context" field contains all preceding gold rewritten questions instead of the original questions. I wonder whether and how this "Context" information is used in the baseline.
Thanks a lot in advance!