Milestone 2

irisr · May 9, 2023, 8:56am

Hey,
We are doing the Milestone 2 and have a question about the assignment. It’s said that " You will upload (1) docker images with an updated version of your dataset containing qrels "

Which qrels should we upload for milestone 2? The ones that were sent to us or run.txt in BM25-output?

maik_froebe · May 9, 2023, 9:14am

Dear @irisr,

Thanks for reaching out!
It was intended that you produce qrels for the run.txt that was sent to you.
(Should be roughly 50 documents per topic.)

Best regards,

Maik

ybrenning · May 16, 2023, 9:32am

Hey,

after running steps 0-2, we have generated a working HTML file as well as a run.txt in the first format listed in the assigment (qid 0 docno relevance).

The .txt we recieved via e-mail is in the format of the second one listed in the assigment (qid Q0 docno rank score tag).

After following the steps of the tutorial, we still aren’t really sure where this text file we recieved via mail is supposed to come into play, since you said we produce the qrels for the run file we have been sent? I don’t really see in which steps of the tutorial this file actually gets used?

Thanks, Yannick

maik_froebe · May 20, 2023, 8:50am

Dear Yannick,

Sorry for the late response, I somehow overlooked your message.
The instructions are indeed not very clear.

I will discuss with @theelstner and @hscells how we can improve this, and will report back when we have clarified the instructions.

Best regards,

Maik

maik_froebe · May 23, 2023, 12:47pm

Dear @ybrenning,

We have now clarified and updated the assignment, thanks again for pointing us to the problem.
Theresa and Harry will write a mail with the updated assignment in a moment

Best regards,

Maik

ybrenning · June 1, 2023, 12:55pm

Hey,

if I understand correctly, we are meant to create a new qrels.txt using the file we were sent at the beginning of milestone 2 (judgment-pool-top-10-ir-lab-sose2023-information-retrievers.txt) and by manually checking whether we think the documents are relevant to the assigned topics?

Then we register qrels to the ir_dataset and re-do step 2 in which we previously generated the bm25-output and send that updated run.txt along with the qrels.txt via e-mail?

Thanks,
Yannick

maik_froebe · June 1, 2023, 1:37pm

Dear Yannick,

Yes, that is correct, you create the qrels.txt by manually judging if the documents are relevant to the assigned topic, and register this to ir_datasets (and add this to your Docker image). At the end of step 2, you use this updated ir_datasets image to calculate the effectiveness of your system (e.g., BM25) with measures like nDCG. Please submit the qrels.txt file and also the effectiveness score of your run (e.g., its nDCG score, you can just add this as text in your mail).

Best regards,

Maik

ybrenning · June 2, 2023, 9:03am

Thanks for the clarification. An issue we noticed when updating the docker image with qrels is that the current version of the tutorial has some naming conflicts with the original milestone which might be confusing.
For instance, the tira-run command assumes the dataset is in a different directory that what it was supposed to be originally named in the first milestone, and the ir_datasets_id argument is set to iranthology-<YOUR-GROUP-NAME>, whereas in milestone 1 we were told to do this in the format of iranthology-ir-lab-sose2023-<YOUR-GROUP-NAME>.

I’m not sure whether the different namings are intentional or not but for us, it didn’t work with the ones from the tutorial so we had to go back and find what they were called in step one for the command to work which might be confusing for some groups.

Are we meant to rename the different IDs and tags in the format of these new steps (obviously we renamed the updated image such that it contains -qrels at the end but the rest of the namings as well)? Or is it fine the way we have done it here since the tira-run seems to have worked?

Thanks,
Yannick

maik_froebe · June 2, 2023, 10:00am

Hi,

Ah, ok, this might indeed be confusing, thanks for notifying us, we did not recognize this ambiguity.
As far as I see, there are two lines that we have to update in the tutorial, right?

https://github.com/mam10eks/ir-lab-sose-2023/blob/main/milestone-02/README.md?plain=1#L30
- iranthology-<YOUR-GROUP-NAME> to iranthology-ir-lab-sose2023-<YOUR-GROUP-NAME>
https://github.com/mam10eks/ir-lab-sose-2023/blob/main/milestone-02/README.md?plain=1#L109
- iranthology-<YOUR-GROUP-NAME> to iranthology-ir-lab-sose2023-<YOUR-GROUP-NAME>

Is this correct?

Thanks!

Best regards,

Maik

ybrenning · June 2, 2023, 10:10am

It might make sense to update the tutorial, but I’m not sure if the other groups also followed this naming convention and have this problem as a result or if it’s just us – it might be worth just putting a disclaimer there so people know what names to go with for these steps so it stays consistent.

If you’re updating it to the original namings, then the renaming should also be in the --datasets argument here:

github.com

mam10eks/ir-lab-sose-2023/blob/main/milestone-02/README.md?plain=1#LL93C5-L93C5


      
          --command 'diffir --dataset iranthology-<YOUR-GROUP-NAME> --web $outputDir/run.txt > $outputDir/run.html'

… at least that’s what we also had to update for us.

Thanks

maik_froebe · June 4, 2023, 9:58am

Thanks, maḱes perfect sense!

Best regards,

Maik

maik_froebe · June 4, 2023, 10:01am

I now have updated the tutorial accordingly: Update README.md · mam10eks/ir-lab-sose-2023@50d8df8 · GitHub

Best regards,

Maik