Multi-author writing style analysis path variables

hamed_hashemi · May 26, 2023, 5:51pm

Hi there,
I have a question regarding the instructions for the multi-author writing style task in PAN. According to the instructions provided on the task’s website, the command should be formatted as follows:
mySoftware -i INPUT-DIRECTORY -o OUTPUT-DIRECTORY
Based on the instructions, the INPUT-DIRECTORY contains three folders (dataset1, dataset2, dataset3). However, I am confused about the $InputDataset and $OutputDir variables in the context of TIRA. Specifically, if our program is supposed to read from three directories (one for each dataset) and write the outputs to three subfolders, what does it mean when we choose only one dataset in TIRA for running the software? I would appreciate it if you could clarify which paths the variables $InputDataset and $OutputDir represent, given the instructions on the task’s website.

maik_froebe · May 27, 2023, 6:52am

Dear @hamed_hashemi,

Thanks for reaching out and for participating in the task!
From the description on the web page, I was thinking that the dataset provides the following structure:

$inputDataset/
  dataset1/
    problem-1.txt
    ...
    problem-m.txt
  dataset2/
    problem-1.txt
    ...
    problem-n.txt
  dataset3/
    problem-1.txt
    ...
    problem-o.txt

I.e., that $inputDataset is a directory that itself contains the three directories, and that the output should follow the same structure:

$outputDir/
  dataset1/
    solution-problem-1.json
    ...
    solution-problem-m.json
  dataset2/
    solution-problem-1.json
    ...
    solution-problem-n.json
  dataset3/
    solution-problem-1.json
    ...
    solution-problem-o.json

But looking at the actual data, the format seems to be flat, like this:

The input:

$inputDataset/
  problem-1.txt
  ...
  problem-n.txt

The output:

$outputDir/
  solution-problem-1.json
  ...
  solution-problem-n.json

I will reach out to the organizer of the subtask so that we can clarify this.
Sorry for the inconvenience, we will report back as soon as possible.

Best regards,

Maik

AbeerSaad · May 27, 2023, 1:41pm

Dear Maik,
I have the same question for the same task (the multi-author writing style task).
Thank you for replying and I am looking forward to hearing from you.

Related to this question, is there a way to send our solution via email instead of Docker?
Building the image takes a long time and may not be completed before the deadline.

Thanks in advance.

Regards,
Abeer

maik_froebe · May 29, 2023, 8:22am

Dear Hamed, Dear Abeer,

There was indeed an outdated version of the input/output format on the web page. This is now updated: PAN at CLEF 2023 - Style Change Detection and Multi-Author Writing Style Analysis

I.e., the data format is as follows:

The Input Format

$inputDataset/
  problem-1.txt
  ...
  problem-n.txt

The Output Format

$outputDir/
  solution-problem-1.json
  ...
  solution-problem-n.json
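
For readers implementing this, here is a minimal Python sketch of the flat layout: it reads every problem-X.txt from the input directory and writes a matching solution-problem-X.json to the output directory, using the -i/-o flags from the task instructions. The function names, the placeholder predictor, and the {"changes": [...]} JSON schema are assumptions for illustration and should be checked against the task page.

```python
import argparse
import json
from pathlib import Path
from typing import List


def predict_changes(text: str) -> List[int]:
    """Hypothetical placeholder: predicts 'no change' (0) between each
    pair of consecutive non-empty lines. Replace with a real model."""
    paragraphs = [line for line in text.split("\n") if line.strip()]
    return [0] * max(len(paragraphs) - 1, 0)


def run(input_dir: Path, output_dir: Path) -> int:
    """Write one solution-problem-X.json per problem-X.txt (flat layout).

    Returns the number of problems processed.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for problem in sorted(input_dir.glob("problem-*.txt")):
        text = problem.read_text(encoding="utf-8")
        # Assumed schema; verify the exact keys on the task page.
        solution = {"changes": predict_changes(text)}
        out_path = output_dir / f"solution-{problem.stem}.json"
        out_path.write_text(json.dumps(solution), encoding="utf-8")
        count += 1
    return count


def main(argv=None) -> None:
    """Command-line entry point mirroring 'mySoftware -i IN -o OUT'."""
    parser = argparse.ArgumentParser(description="Flat-layout runner sketch")
    parser.add_argument("-i", "--input", required=True, type=Path)
    parser.add_argument("-o", "--output", required=True, type=Path)
    args = parser.parse_args(argv)
    print(f"Wrote {run(args.input, args.output)} solution files")
```

In a real script, main() would be called under an if __name__ == "__main__": guard, so that TIRA can pass $inputDataset and $outputDir as the -i and -o arguments.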

@AbeerSaad Regarding your question: in most cases it should be no problem to upload the solutions, since TIRA allows run submissions. @nikolay.kolyada, @mmayerl, @MattiWiegmann, can you please clarify this?

Best regards,

Maik

AbeerSaad · May 29, 2023, 8:28pm

Thank you for your reply and the clarification.

Sorry for bothering you, but this is the first time I have used Docker, and I still have some issues. I started with Jupyter, and building an image took a long time, sometimes up to 18 hours. So I switched to PowerShell, which seems to be faster.
I have now built the image. As a next step, according to the instructions in the video, I have to test whether the image works using the run command below. But another issue related to the arguments appeared, and I have been working on it for more than 12 hours without success.

docker run --rm <my-registry>

After that, I have to check whether the image will work on TIRA (after resolving the submission page problem) before the deadline.

So I am a little afraid that I will face another problem with Docker that prevents me from participating. It would be very helpful if another way to submit the solution could be allowed.

Thank you again. I really appreciate your help and consideration.

Regards,
Abeer

maik_froebe · May 30, 2023, 5:44am

Dear Abeer,

Thanks for reaching out!
I am sure we will find a solution. I think in that case it makes sense to give you the test data without the ground truth so that you can generate the predictions on your side and upload them.

@nikolay.kolyada, @mmayerl, @MattiWiegmann are involved in this task, so they will contact you with the next steps for this.

Best regards,

Maik

maik_froebe · May 30, 2023, 4:52pm

Dear Abeer,

To keep you in the loop: the deadline for software submissions has now been extended (please note that the deadline for submitting the notebook is now before the software submission deadline).

Best regards,

Maik

maik_froebe · May 30, 2023, 6:52pm

Dear Abeer,

I wanted to update you that our cloud provider has successfully resolved the issue with the Ceph filesystem. TIRA restarted itself without supervision, and we have also run extensive tests to confirm that everything works again.

It looks very good, and you can now make submissions again.

Best regards,

Maik

mmayerl · May 30, 2023, 9:28pm

Hi everyone!

Apologies for the confusion; the description on the task page is indeed outdated. This year, the dataset folders are “flat”, i.e., they contain the relevant files directly, and so should the output.

@AbeerSaad I will contact you via mail and send you the test sets without ground truth, so you can run the inference on your own machine.
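
When running inference locally and uploading the results, it may help to confirm that every problem file has a matching solution file before submitting. The following is a minimal sketch of such a check; the function name is hypothetical, and it only assumes the flat problem-X.txt / solution-problem-X.json naming described in this thread.

```python
from pathlib import Path
from typing import List


def missing_solutions(input_dir: Path, output_dir: Path) -> List[str]:
    """Return the names of expected solution files that are absent
    from output_dir, sorted alphabetically."""
    expected = {
        f"solution-{problem.stem}.json"
        for problem in input_dir.glob("problem-*.txt")
    }
    present = {p.name for p in output_dir.glob("solution-problem-*.json")}
    return sorted(expected - present)
```

An empty list means that each input problem has a corresponding output file; it does not validate the contents of those files.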