Number of spans in the "multi" setting is ambiguous

Hi, we are currently looking into the dataset and have a question about the “multi” setting.
For some entries, we found fewer answer spans than the number stated in postText.
For example, even when the postText says “10 secrets for the beauty”, we only have 5 or so answer spans.
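
A rough way to surface such entries automatically is sketched below. It is only a heuristic under stated assumptions: the field names `uuid`, `postText` (a list of strings) and `spoiler` (a list of answer spans) are assumed, the two entries are toy data, and taking the first integer in the post text will produce false positives (years, "#10 is a must", and so on).

```python
import re

# Assumed entry layout (not confirmed in this thread): "uuid",
# "postText" as a list of strings, "spoiler" as a list of answer spans.

def stated_count(post_text):
    """Return the first integer mentioned in the post text, or None."""
    match = re.search(r"\d+", post_text)
    return int(match.group()) if match else None

def find_short_entries(entries):
    """Yield (uuid, stated, actual) when fewer spans exist than the post announces."""
    for entry in entries:
        text = " ".join(entry["postText"])
        stated = stated_count(text)
        actual = len(entry["spoiler"])
        if stated is not None and actual < stated:
            yield entry["uuid"], stated, actual

# Toy entries for illustration only, not taken from the dataset.
entries = [
    {"uuid": "example-1", "postText": ["10 secrets for the beauty"],
     "spoiler": ["a", "b", "c", "d", "e"]},
    {"uuid": "example-2", "postText": ["3 tips for better sleep"],
     "spoiler": ["x", "y", "z"]},
]

print(list(find_short_entries(entries)))  # [('example-1', 10, 5)]
```

Any hits would still need a manual check, since the number in the post text does not always refer to the length of the list.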

Is there a reason for restricting the number of answer spans?

Thanks in advance!
Hiroto

Dear Hiroto,

Indeed, this seems like a mistake in the annotation.
I could not find this example in the data via grep. Could you please give me the uuid of the example?

I have started a discussion with the team that annotated the data to see if this was done on purpose and will report back soon!

Thanks for bringing this up!

Best Regards,

Maik

Sorry if this is too long, but these are the cases we are aware of so far:

e26fa6ed-e364-4666-af8e-20fbada53839
37d02016-a798-4ef3-9649-92fb8a7beba8
09de38dd-b1cb-4be4-98b1-0713ebf95d81
e920b3d9-eb55-4968-a125-e9444a23cf7d
f993f923-0c81-4031-b472-af03627d82ea
2ddbf430-b040-4757-8bd2-94bf889a799f
62ee9cc8-b6bc-46e0-90b0-cf206917549b
1b2fc11f-1efe-445b-9550-abbea6331c70
f7dd4922-bb06-4b78-939e-1f2043f4bd0a
6ee11517-8aa0-46b6-bc17-6603b03106a6
c462dc86-ddf5-4696-9471-4762fc26e864
78547857-7d02-44c4-9710-6953bc63416b
5b3e05b9-9239-4f26-b54f-aa66d86fd6ba
db217424-e00c-4022-a763-d71567a32e1b
24fe5481-0a88-400d-affc-3eccb875a3d1
e7f47e84-d64c-458d-b202-4eb04f757e49
56f9e86b-5d40-4cfd-9030-3e4372c1d46d
fc0f025f-6795-44af-920a-4f5651db765b
4fc4e3cf-84bc-4825-b578-cccc0f38b556
7fc75f6b-b642-4487-99c4-dba37c445116
c1d46a94-3d76-44b6-a506-cdd324c5aeb1
4d1f9190-67bb-41eb-99ee-6fae6f96396f
7dc8a6d1-0018-422b-81f4-3139432c1a77
fe58fc73-a128-4877-b5d8-de156aba3b3e
140b8388-8a52-4e28-8d6b-4450f0837842
d0085653-9ac6-47f2-812e-13b74117c514
de6c3687-74b7-4ad3-afa3-54a80edd1ef1
e131536c-67c6-4417-bc2e-5fbbc19a5cf2
dac86f53-6841-4411-9ca7-852dfa85e5ff
7945d61c-d0c0-4ff8-a6a0-78fe1a0ad37e
c47617ae-7df9-4d4e-b2fc-9ad387b1bd60
e8a1cdcd-1d5e-4d9e-979a-214308a1d5a1
d774ff4f-204e-449d-93d6-989741156e6b
2d4dadd7-abc0-4ddf-b829-e458d98763ac
e9691ab3-7a37-4bd0-9b28-10bebf890ef3
d470f76e-cc47-41a0-97e7-c1531ba1d989
e1bbeddf-1b67-41ea-b864-ab5636704423
47d5dca9-fd4a-4be1-a760-63012f055497
23b580a3-9c9b-498b-a96f-919867301642
65781a1e-43c9-42ca-88d5-c179c8c83c3f
3947c856-8624-4754-9be4-56d45091933a
a01ecb09-7d19-4be6-8f18-129a0a34547b
7eb18a55-8839-4801-84b8-b053ad89f913
5f716570-c8b5-41f9-8511-8a4d1f80bd05
bac12bcd-2f02-4d34-8fc5-c96d6aa41858
08a8ff7c-c020-4752-8b29-180f25992bde
c81dd1ac-22a2-4667-a751-9ae046365c74
8d348922-b756-45c9-bc22-ed66719c37ff
03c2bcef-eade-47c4-aa70-54e9d6f7ff59
f6dcfe21-a82c-4c42-8b64-a9d7da45e97f
bd093282-c1b7-44eb-931d-4f1cd6fdb091
632182d1-febe-4561-bc05-67b555b4e4a9
eef3b641-3774-4fe7-a04b-da3b6118f6d3
95878179-f5dd-4b22-a18b-b060fe010f1f
2396a34f-256e-4c73-8cb7-56267321c28e
47fbb26f-38c8-47e0-8b84-9041df3828ee
2cd7bc63-149a-42f9-9062-fc93b3df9dfe
e704d337-2199-4414-9447-1aaf90d3a653
0d7808d5-9404-4e8f-89e1-f8254d50dd46
307ecdff-e027-4d75-9ef7-47fa98dfd78c
aec3502e-ae4a-486d-a21c-58964756ccad
1d846809-340a-41f8-8701-361d2adcc5c4
c61cf416-49f2-45bc-8bbe-03fd5f07f0e8
468d384e-7de9-4ff4-ad8b-96c4db519ad4
0c51e596-4d7d-4dbf-aa83-f884f719aacb
0d37143e-b6d3-4550-b5ac-5ece02558c4f
e1aeb6dc-cd6b-4140-9dfd-61ca5727ab5f
05b8ebf5-c8ac-455d-884a-a7ae28296943
1d4f59dc-58e0-458a-b77e-054070305497
622fd9d5-f781-44fd-ae41-60cc19b04ff8
8f8df563-0dcf-4a9e-a3b4-0db0aa161d77
4b0a5d6a-bf2a-459b-9aac-2d734311ab1e
69d9a473-9969-4165-a617-896cda57826b
3dd12d6e-eaff-4874-b828-b3a33500e650
021d6a01-8acc-46b8-92ab-7d4d93e8f026
886a6aeb-878f-4702-9513-9d39d2000f0f
c31b03d7-0b30-46aa-8401-c8e90de4f554
f7c029f2-3921-4e34-904f-0120992ee99c
760f6ed3-40bb-48bc-8e98-064b7644c219
68d1bbfa-90c5-4bc5-90ac-3d51b77686a1
0bb6a3b9-fc6c-4fc1-811a-1aa799e43904
1ac8e821-6e30-4e34-9db9-22597cc86ffc
4fafab3a-7215-4fc9-ac7a-1c7a3ff67e09
094ef558-7fa9-4de1-8c8c-18795f65490c
0a856399-6932-44cd-9cd0-19da37461b3b
29e4c789-b315-4358-a98f-9676dca27497
95eb323e-ccc2-43e6-9d09-8f084c947377
be9ec7b8-f7ef-499e-8788-1cd6bc1cf3cf
bc403500-41ee-48a0-8a66-edcda44d305f
18f0855d-12f0-4a59-8df2-6bc0b4885cd4
e32aa473-7a98-476a-a477-fe20780a03e6
4c2643e4-f776-4ca6-8198-799fd4e428ac
2883c48b-5e27-4d1b-b094-47805cd244f2
af0088fa-1149-401e-9e84-edd352bfd809
b0160363-131c-4cd8-8d8b-139899f3052a
4287926f-5fa2-46d7-a831-d2dfe1f5c529
c9b83c06-0c80-453e-9c64-67d4505129e2
7e74d931-ddd8-4699-a383-f3c18e6550b6
0c4cac4d-37b2-4dbf-87c8-26d7ca2ede5c
359830ce-2054-4848-bc07-a2896bc85067
8e4f4af8-c546-4dea-abd7-4c4989c84223
d65e4bf1-9e72-45c7-87c7-f7f820b04189
24e52e7e-9807-4178-926e-bde31f4b9071
b1879177-9ac4-4406-aced-540af912d20a
1287ec6b-320f-41f2-9d03-3839896e0b29
f3a38069-4754-4a40-9812-fcfb4f361da7
56bded40-8bfa-419a-a2ea-085d3b8bbd82
a43a4424-79ba-4a79-9b0e-98bb8adb4857
09b66f5a-e2ce-4b29-b119-82ed7b759515
ec50c4b7-e516-48dc-b95b-e2a3970ffe0a
1a958952-24b6-4f31-8dd1-3134a374483c
d745179e-0856-47d5-8f56-bd0232e01f8f
758c5b4a-c7d5-4b05-9df2-bab6742889bd
6f1e2d7d-e8e7-4d79-abd0-378fe3bda2aa
bf6c290f-0c72-40bc-84a4-b2b1683bed62
3c089e84-9d62-42ab-9892-56ca5fb0de9c
a333e6b2-44df-43ff-8316-a18529a86169
fefbf383-1a53-4386-b056-ee9d1b5c6763
e9070088-9d7b-4a6f-a695-4408e404f4b0
e10e3e41-ef65-4b77-97b4-65e91ee15a77
c724e671-dc78-4264-950b-b511d8376e01
0a5ce014-003e-451c-89b4-69c9595db13e
69aeba89-fddf-46ce-bf20-4c157ccbc809
ebb1c646-f519-419e-80f3-e960cc5dc385

Also, these are the cases where there is only one span even though the entry is classified as “multi”:

a71e629a-e3d3-47ba-983a-6b05612f15c3
949efa66-6c8b-4f41-9228-dccd210eb613
c7742ca3-d4f2-4bac-b298-affe0663bdd1
02681a8a-2ab8-467c-81f2-20dfc992c5ce
bfcb996f-cacb-4c0a-8cc3-91031b7a0ca1
cf063554-4d89-465f-9a95-8fea31cdcb20
0f6cdb4a-2970-4052-bf09-07fce87ac640
25a13e35-79b2-444a-b266-16f890431892
be465ee4-29b6-4318-a768-d9f7e5399bab
226491ea-f5d7-47c8-93e7-c0e93403c93f
044bf920-d061-489a-b23b-ffea441dcc6e
5eacc669-401e-479e-9d63-e1691bd319c6
d26ebf6e-c0e9-4e68-9f10-0883676e2911
3f231175-fd0e-4a7d-bb77-419e95c6e2a8
edff3c60-d458-4781-ba0b-e73e4514641b
f8f15c9a-abce-4583-8487-53d207f0de7d
1fb8f7dd-1220-4bf8-9331-85e63ee6c943
1854dcdf-893c-4247-98c3-f1fb3523818a
169b55e0-3062-4472-ae0a-cc085003ec83
a568705e-68ad-476a-bce2-d028d647e2eb
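
For anyone who wants to reproduce both lists, a minimal sketch follows. It assumes each entry carries `uuid`, `tags` (for example `["multi"]`) and `spoiler` (a list of answer spans); these field names and the toy entries are assumptions for illustration, not confirmed details of the data format.

```python
from collections import Counter

# Assumed entry layout: "uuid", "tags" (list of strings), "spoiler" (list of spans).

def span_count_histogram(entries, tag="multi"):
    """Count how many entries with the given tag have 1, 2, 3, ... spans."""
    return Counter(len(e["spoiler"]) for e in entries if tag in e["tags"])

def single_span_entries(entries, tag="multi"):
    """Return the uuids of entries with the given tag that have exactly one span."""
    return [e["uuid"] for e in entries
            if tag in e["tags"] and len(e["spoiler"]) == 1]

# Toy entries for illustration only, not taken from the dataset.
toy = [
    {"uuid": "toy-a", "tags": ["multi"], "spoiler": ["one"]},
    {"uuid": "toy-b", "tags": ["multi"], "spoiler": ["1", "2", "3", "4", "5"]},
    {"uuid": "toy-c", "tags": ["phrase"], "spoiler": ["x"]},
]

print(span_count_histogram(toy))
print(single_span_entries(toy))  # ['toy-a']
```

The histogram also makes it easy to see whether any "multi" entry has more than five spans.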

In fact, it seems that there are (almost) always at most 5 answers to a multi-spoiler query like “the x …”.
Most of the time these are the first 5 answers, but there are exceptions.

uuid: 3947c856-8624-4754-9be4-56d45091933a
postText: 30 things to stop doing to yourself. #10 is an absolute must.
answers: #10, #1

This raises the question of how this is going to be evaluated. It seems that after finding all the correct answers, one has to work out which 5 are the required answers to maximise any score.

Did the data generation or the labelers just cut off after 5 answers? Getting the full lists would be hugely helpful.

Simon

Dear Hiroto, Dear Simon,

I can now report back: Each multi-part spoiler contains the first five spoilers. If the spoilers in a document are organized in a numbered listicle (e.g., “the 10 most beautiful beaches”, which starts at position ten and ends at position one), those five spoilers follow the numbers in the listicle. In all other cases, those five spoilers appear in the order of their appearance in the linked document.

The intention behind this is that we aimed to have rather short multi-spoilers that still provide the gist.
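
One possible reading of this rule, written out as a small sketch: it assumes that "follow the numbers in the listicle" means ascending listicle number, which is an interpretation and not something confirmed by the annotators. The function name `select_spoilers`, the `(number, text)` input format, and the toy data are all hypothetical.

```python
from typing import List, Optional, Tuple

MAX_SPOILERS = 5  # cut-off described in the rule

def select_spoilers(items: List[Tuple[Optional[int], str]]) -> List[str]:
    """Pick up to five spoilers from (listicle_number or None, text) pairs.

    `items` is given in order of appearance in the linked document.
    """
    numbered = bool(items) and all(number is not None for number, _ in items)
    if numbered:
        # Assumed reading: numbered listicles are ordered by their numbers.
        ordered = sorted(items, key=lambda pair: pair[0])
    else:
        # Otherwise keep the order of appearance.
        ordered = items
    return [text for _, text in ordered[:MAX_SPOILERS]]

# Toy countdown listicle that starts at position ten and ends at position one.
countdown = [(n, f"beach {n}") for n in range(10, 0, -1)]
print(select_spoilers(countdown))
# ['beach 1', 'beach 2', 'beach 3', 'beach 4', 'beach 5']
```

Under a different reading (for example, keeping the countdown order), only the sort key would change; the five-item cut-off stays the same.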

Does this answer your question?

Best regards,

Maik