Rationale: Responses can contain hallucinated content. When a response must be correctly grounded in external references, the time and effort a user needs to verify the generated content can vary widely. For example, verifying Barack Obama's birth date might take 10 seconds, whereas checking whether an LLM hallucinated an event that never happened might take hours.
Unit of annotation: ‘turn’
Multi-label: yes
Required attributes: [list the attributes that are required of a conversation or turn so that it can be annotated according to the guidelines below]
Guidelines: TODO: [general instructions on what annotators should pay attention to when annotating a conversation].
- [name of first label]: TODO: [instruction on when to choose the first label for a conversation or turn; may include a concise example]
- [name of second label]: TODO: [instruction on when to choose the second label for a conversation or turn; may include a concise example]
- [add more labels as needed]