---
language:
- en
tags:
- role-playing
- character simulation
- llama
- llama-3.1
- persona
license: mit
datasets:
- Neph0s/CoSER
---

# CoSER Models

CoSER models are state-of-the-art models for role-playing language agents (RPLAs), built upon LLaMA-3.1 base models (8B and 70B). These models are trained on the [CoSER dataset](https://huggingface.co/datasets/Neph0s/CoSER), which contains authentic multi-turn, multi-character dialogues extracted from 771 renowned novels.

CoSER models exhibit excellent role-playing capabilities. They produce highly human-like responses across a wide range of personas, including both established fictional characters and original characters. They excel at capturing nuanced personalities, maintaining consistent character traits, and adapting to diverse role-playing scenarios. Extensive experiments demonstrate that CoSER models achieve state-of-the-art role-playing performance across multiple benchmarks.

### Model Variants

- **CoSER-8B**: Fine-tuned from LLaMA-3.1-8B
- **CoSER-70B**: Fine-tuned from LLaMA-3.1-70B

## Training Data

The models are trained on the [CoSER dataset](https://huggingface.co/datasets/Neph0s/CoSER), which differs from existing RPLA datasets in two fundamental ways:

1. It extracts authentic multi-turn, multi-character dialogues from acclaimed literary works, maintaining high source fidelity while exhibiting greater quality and complexity.

2. It incorporates comprehensive types of data:
   - Character profiles, dialogues, plot summaries, character experiences, and conversation backgrounds.
   - Conversations that capture characters' internal thoughts and physical actions beyond surface-level speech.
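
Because each dataset message can carry thoughts and actions alongside speech, downstream code often needs to separate the three channels. A minimal sketch, assuming a hypothetical markup where inner thoughts appear in square brackets and physical actions in parentheses (the actual dataset schema may differ):

```python
import re

def split_message(text: str) -> dict:
    """Split a role-play message into thoughts, actions, and speech.

    Assumes a simple markup where inner thoughts are wrapped in square
    brackets and physical actions in parentheses; everything else is
    treated as spoken dialogue.
    """
    thoughts = re.findall(r"\[(.*?)\]", text)
    actions = re.findall(r"\((.*?)\)", text)
    # Remove the annotated spans, leaving only the spoken words.
    speech = re.sub(r"\[.*?\]|\(.*?\)", "", text)
    speech = " ".join(speech.split())  # collapse leftover whitespace
    return {"thoughts": thoughts, "actions": actions, "speech": speech}

msg = "[I must not show fear] (steps forward) I will go with you."
parts = split_message(msg)
```

The bracket conventions here are illustrative only; consult the dataset card for the exact message format.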

## Training Methodology

Our training approach is based on "given-circumstance acting" (GCA):

Given a conversation with messages M, characters C, and setting S, the actor LLM sequentially portrays each character c ∈ C to recreate the conversation. During training, for each character c, we optimize the language modeling loss on that character's messages.
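
The per-character loss masking above can be sketched as follows. This is an illustrative reconstruction, not the released training code; `loss_mask` marks which turns contribute to the loss when the model acts as a given character:

```python
def gca_training_examples(messages, characters):
    """Build one training example per character, as in given-circumstance acting.

    For each character c, every message in the conversation is kept as
    context, but the language-modeling loss is applied only to c's own
    messages (mask = 1); all other turns are context-only (mask = 0).
    `messages` is a list of (speaker, text) pairs.
    """
    examples = []
    for c in characters:
        masks = [1 if speaker == c else 0 for speaker, _ in messages]
        examples.append({"character": c, "messages": messages, "loss_mask": masks})
    return examples

convo = [("Alice", "Who goes there?"), ("Bob", "A friend."), ("Alice", "Enter.")]
examples = gca_training_examples(convo, ["Alice", "Bob"])
```

In an actual fine-tuning pipeline the mask would be expanded to token level and multiplied into the cross-entropy loss.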

## Performance and Evaluation

We evaluate our models via GCA evaluation, a comprehensive approach that combines multi-agent simulation with penalty-based LLM assessment:

1. We generate conversations via multi-agent simulation, where the actor LLM portrays each character within a given setting, coordinated by a next-actor-prediction model that manages turn-taking.

2. We assess the generated conversations using penalty-based LLM judges, which are provided with detailed rubrics and the original conversations for reference.
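
The simulation step can be sketched as a plain turn-taking loop. The `next_actor` and `actor` callables stand in for the LLM calls (these names are ours, for illustration); passing them as parameters lets the coordination logic stand alone:

```python
def simulate_conversation(characters, setting, next_actor, actor, max_turns=8):
    """Multi-agent simulation loop for GCA evaluation.

    `next_actor(setting, history)` picks who speaks next (or None to stop),
    and `actor(character, setting, history)` produces that character's
    message. Both would be LLM calls in practice; here they are plain
    callables. Illustrative sketch only, not the authors' harness.
    """
    history = []
    for _ in range(max_turns):
        speaker = next_actor(setting, history)
        if speaker is None:  # the coordinator can end the scene early
            break
        history.append((speaker, actor(speaker, setting, history)))
    return history

# Toy stand-ins: alternate speakers for four turns, emit canned lines.
cast = ["Alice", "Bob"]
pick = lambda setting, h: cast[len(h) % 2] if len(h) < 4 else None
say = lambda c, setting, h: f"{c} speaks (turn {len(h) + 1})"
transcript = simulate_conversation(cast, "a moonlit garden", pick, say)
```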

### Performance on Given-Circumstance Acting

CoSER models outperform existing open-source LLMs on multiple RPLA benchmarks and are comparable to state-of-the-art closed-source models like GPT-4o.

| Model | Storyline Consistency | Anthropomorphism | Character Fidelity | Storyline Quality | Average Score | BLEU | ROUGE-L |
|-------|----------------------|------------------|-------------------|------------------|--------------|------|---------|
| **Closed-source Models** | | | | | | | |
| Abab7-preview | 56.81 | 44.23 | 43.83 | 74.83 | 54.92 | 4.96 | 11.50 |
| Doubao-pro | 60.95 | 49.72 | 47.02 | 79.28 | 59.24 | 6.38 | 12.95 |
| Step-1-Flash | 57.75 | 48.12 | 44.48 | 75.93 | 56.57 | 5.95 | 12.71 |
| Step-2 | 61.43 | 49.06 | 47.33 | 77.96 | 58.94 | 5.75 | 12.50 |
| GPT-3.5 | 57.22 | 43.30 | 42.29 | 73.91 | 54.18 | 4.58 | 11.80 |
| GPT-4o | **61.59** | 48.93 | **48.95** | **80.33** | **59.95** | 5.90 | 12.11 |
| GPT-4o Mini | 60.09 | 48.21 | 44.88 | 78.55 | 57.93 | 3.90 | 10.81 |
| Gemini Pro | 59.11 | 52.41 | 47.83 | 77.59 | 59.24 | 5.39 | 11.65 |
| Claude-3-Haiku | 58.18 | 44.66 | 41.88 | 74.14 | 54.71 | 4.80 | 12.02 |
| Claude-3.5-Sonnet | 57.45 | 48.50 | 45.69 | 77.23 | 57.22 | 5.17 | 11.45 |
| **Open-source Models** | | | | | | | |
| Mistral-7B | 59.90 | 40.00 | 44.75 | 61.93 | 51.64 | 2.71 | 9.28 |
| Qwen-2-7B | 51.96 | 35.48 | 31.51 | 63.18 | 45.53 | 4.21 | 10.71 |
| LLaMA-3.1-8B | 54.10 | 45.36 | 40.22 | 72.29 | 52.99 | 4.59 | 10.18 |
| CoSER-8B | 58.61 | 47.23 | 46.90 | 73.04 | 56.45 | 9.40 | 14.21 |
| Vicuna-13B-1.5 | 52.75 | 39.12 | 38.04 | 60.43 | 47.58 | 1.67 | 5.59 |
| Mixtral-8x7B | 51.25 | 38.44 | 36.92 | 67.69 | 48.58 | 5.28 | 11.66 |
| Qwen-2-72B | 57.75 | 47.28 | 46.62 | 76.60 | 57.06 | 5.38 | 11.85 |
| LLaMA-3.1-70B | 57.46 | 45.95 | 43.72 | 74.84 | 55.49 | 4.82 | 10.98 |
| Higgs-Llama-3-70B | 57.10 | 43.82 | 42.41 | 75.62 | 54.74 | 3.99 | 10.92 |
| CoSER-70B | 58.66 | **53.33** | 48.75 | 75.49 | 59.06 | **10.10** | **14.78** |
| DeepSeek-V3 | 56.40 | 47.87 | 44.02 | 76.66 | 56.24 | 4.54 | 11.02 |

*Note: Bold values indicate best performance across all models.*

### Performance on Existing RPLA Benchmarks

| Model | InCharacter Dim | InCharacter Full | Life Choice | CroSS MR |
|-------|----------------|------------------|-------------|----------|
| LLaMA-3.1-8B | 64.97 | 15.62 | 61.10 | 30.15 |
| CoSER-8B | 75.80 | 21.88 | 69.54 | 44.94 |
| *CoSER-8B trained w/o I.T.* | 70.70 | 15.62 | 59.92 | 43.14 |
| LLaMA-3.1-70B | 72.16 | 31.25 | 86.48 | 61.30 |
| Higgs-Llama-3-70B | 74.52 | 28.12 | 74.03 | 60.12 |
| CoSER-70B | 75.80 | **34.38** | **93.47** | **64.49** |
| *CoSER-70B trained w/o I.T.* | 73.12 | 32.14 | 93.18 | 63.14 |
| Qwen-2-72B | 74.52 | 31.25 | 81.14 | 62.57 |
| GPT-3.5 | 71.20 | 21.88 | 78.07 | 30.09 |
| GPT-4o | **76.54** | 32.62 | 75.96 | **64.49** |
| Claude-3.5-Sonnet | 72.61 | 21.88 | 86.07 | 30.59 |

*Note: Bold values indicate best performance. I.T. denotes inner thoughts. For InCharacter, we report accuracy on individual dimensions (Dim) and the full profile (Full) of the BFI.*

## Ethical Considerations

We have conducted safety checks on the training dataset and removed potentially problematic content. However, users should be aware that:

- The models may still generate content that reflects biases present in the literary works they were trained on.
- Role-playing as certain characters might involve generating content that includes negative traits or behaviors.
- Users should implement appropriate safeguards when deploying these models in applications.

## Citation

If you use CoSER models in your research, please cite our paper:

```bibtex
@misc{wang2025cosercoordinatingllmbasedpersona,
      title={CoSER: Coordinating LLM-Based Persona Simulation of Established Roles},
      author={Xintao Wang and Heng Wang and Yifei Zhang and Xinfeng Yuan and Rui Xu and Jen-tse Huang and Siyu Yuan and Haoran Guo and Jiangjie Chen and Wei Wang and Yanghua Xiao and Shuchang Zhou},
      year={2025},
      eprint={2502.09082},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.09082},
}
```