This is still WIP, but I already wanted to add my favourite evaluation aspect before I forget it.
Rationale: Conversational systems deployed in settings where intervention by human experts is required at some point should make it as transparent as possible that (a) they will delegate the task to humans, (b) the humans will be able to see the whole conversation, and (c) the chatbot is only there to ensure that the user submits all required information and that the conversational system routes everything to a suitable human.
Unit of annotation: conversation
Multi-label: no (the labels below describe mutually exclusive degrees of transparency, so each conversation receives exactly one of them)
Required attributes: Not sure yet; possibly the range of intents that the service will need to handle?
Guidelines: Annotators should read the whole conversation and judge how transparent the system is about delegating the task to a human, assigning one of the labels below.
- Non-Transparent Delegation: The conversational system hides that the conversation data will be passed on to humans and/or tries to mimic a human (e.g., similar to the wizard in Wizard of Wikipedia).
- Partial Transparency: The conversational system says that it is a bot, but not that it will delegate the task to a human.
- Transparent (on request): The system says it is a bot and discloses that it will delegate the task when asked.
- Proactive Transparency: tbd
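A minimal sketch of how this single-label scheme could be encoded for annotation tooling. All names here (`TransparencyLabel`, `ConversationAnnotation`, the example intents) are illustrative assumptions, not part of the scheme itself:

```python
from dataclasses import dataclass, field
from enum import Enum


class TransparencyLabel(Enum):
    """Hypothetical names mirroring the label definitions in the draft above."""
    NON_TRANSPARENT = "non-transparent delegation"
    PARTIAL = "bot disclosed, delegation hidden"
    ON_REQUEST = "transparent on request"
    PROACTIVE = "proactive transparency"


@dataclass
class ConversationAnnotation:
    conversation_id: str
    # Single-label scheme: exactly one TransparencyLabel per conversation.
    label: TransparencyLabel
    # Placeholder for the still-undecided "required attributes" field,
    # here assumed to be the intents the service needs to handle.
    intents: list[str] = field(default_factory=list)


# Example: one annotated conversation.
ann = ConversationAnnotation(
    conversation_id="conv-001",
    label=TransparencyLabel.ON_REQUEST,
    intents=["book_appointment"],
)
```

Using an `Enum` makes the "exactly one label" constraint explicit in the type, whereas a multi-label variant would store a set of labels instead.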