HPD
In recent years, dialogue-style large language models (LLMs) such as ChatGPT and GPT-4 have demonstrated immense potential for building open-domain dialogue agents. However, aligning these agents with specific characters or individuals remains a considerable challenge, because character representation is complex and comprehensive annotations are lacking. In this paper, we introduce the Harry Potter Dialogue (HPD) dataset, designed to advance the study of dialogue agents and character alignment.
HPD encompasses all dialogue sessions from the Harry Potter novels, in both the English and Chinese versions. In total, we obtain 1042 dialogue sessions for training (each containing exactly one positive response) and 149 sessions for testing (containing 1-3 positive responses and 9 negative responses on average). We also annotate each conversation with background information that we believe is useful for aligning dialogue agents with Harry, including dialogue scenes, speakers, character relationships, and attributes. To give a full picture of the speakers in each dialogue, we label each speaker with 12 types of relationships with Harry Potter and 13 types of attributes. Note that even when the same speaker appears in two different dialogues, their relationships and attributes may differ because of the story's progression.
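
To make the annotation layout concrete, the following is a minimal sketch of how one annotated session could be represented in memory. It is an illustration only: the class names, field names, value types, and placeholder strings are assumptions made for this example and are not the dataset's actual schema or file format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SpeakerAnnotation:
    """Per-dialogue annotation for one speaker (hypothetical field names)."""
    name: str
    # Relationship type -> value, relative to Harry (the text mentions 12 types).
    relationships: Dict[str, str] = field(default_factory=dict)
    # Attribute type -> value (the text mentions 13 types).
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class DialogueSession:
    """One dialogue session with its background annotations (hypothetical layout)."""
    scene: str
    speakers: List[SpeakerAnnotation]
    history: List[str]
    positive_responses: List[str]
    # Assumed empty for training sessions, populated for test sessions.
    negative_responses: List[str] = field(default_factory=list)

    def candidates(self) -> List[Tuple[str, int]]:
        """Return (response, label) pairs: 1 for positive, 0 for negative."""
        positives = [(r, 1) for r in self.positive_responses]
        negatives = [(r, 0) for r in self.negative_responses]
        return positives + negatives


# Placeholder values only; no real dataset content is reproduced here.
example = DialogueSession(
    scene="<scene description>",
    speakers=[SpeakerAnnotation(name="<speaker name>")],
    history=["<previous utterance>"],
    positive_responses=["<positive response>"],
    negative_responses=["<negative response 1>", "<negative response 2>"],
)
labels = [label for _, label in example.candidates()]
```

Storing the speaker annotations inside each session, rather than in one global table per character, reflects the point above that a speaker's relationships and attributes can change from one dialogue to the next.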