Update README.md

We release ChatQA-1.5, which excels at RAG-based conversational question answering. Results on the ConvRAG benchmark are as follows:

| | ChatQA-1.0-7B | Command-R-Plus | Llama-3-instruct-70b | GPT-4-0613 | ChatQA-1.0-70B | ChatQA-1.5-8B | ChatQA-1.5-70B |
| -- |:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| Doc2Dial | 37.88 | 33.51 | 37.88 | 34.16 | 38.9 | 39.33 | 41.26 |
| QuAC | 29.69 | 34.16 | 36.96 | 40.29 | 41.82 | 39.73 | 38.82 |
| QReCC | 46.97 | 49.77 | 51.34 | 52.01 | 48.05 | 49.03 | 51.40 |
| Average (all) | 47.71 | 50.93 | 52.52 | 53.90 | 54.14 | 55.17 | 58.25 |
| Average (exclude HybriDial) | 46.96 | 51.40 | 52.95 | 54.35 | 53.89 | 53.99 | 57.14 |

Note that ChatQA-1.5 used some samples from the HybriDial training dataset. To ensure a fair comparison, we also report average scores excluding HybriDial. The data and evaluation scripts for ConvRAG can be found here.

## Prompt Format
<pre>
System: {System}

{Context}

User: {Question}

Assistant: {Response}

User: {Question}

Assistant:
</pre>
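
As a rough illustration of this template, the sketch below assembles a multi-turn prompt string in the format above. It is not code from the repository: `build_prompt` and its placeholder arguments are invented for the example.

```python
# Hypothetical helper, shown only to illustrate the prompt template above.
def build_prompt(system, context, turns):
    """turns: list of {"role": "user"|"assistant", "content": "..."} dicts, ending with a user turn."""
    lines = [f"System: {system}", "", context, ""]
    for turn in turns:
        speaker = "User" if turn["role"] == "user" else "Assistant"
        lines.append(f"{speaker}: {turn['content']}")
        lines.append("")
    lines.append("Assistant:")  # trailing tag asks the model for the next response
    return "\n".join(lines)

# Placeholder system text, retrieved context, and user turn.
prompt = build_prompt(
    system="This is a chat between a user and an AI assistant.",
    context="Document: ChatQA-1.5 is a model for RAG-based conversational question answering.",
    turns=[{"role": "user", "content": "What is ChatQA-1.5 designed for?"}],
)
print(prompt)
```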

## How to use
```python
def get_formatted_input(messages, context):
    # ...

    for item in messages:
        if item['role'] == "user":
            ## only apply this instruction for the first user turn
            item['content'] = instruction + " " + item['content']
            break

    # ...

response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```
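
Only fragments of the usage code are visible above. As a hedged sketch of how `get_formatted_input` could be wired into generation with Hugging Face `transformers`: the checkpoint name, message list, and document string below are assumptions for illustration, and the sketch presumes `get_formatted_input` from the snippet above is fully defined.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama3-ChatQA-1.5-8B"  # assumed checkpoint name; substitute the released model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

messages = [{"role": "user", "content": "What is the document about?"}]  # placeholder conversation turn
document = "ChatQA-1.5 excels at RAG-based conversational question answering."  # placeholder context

# Format the conversation, tokenize, and generate a continuation.
formatted_input = get_formatted_input(messages, document)
input_ids = tokenizer(formatted_input, return_tensors="pt").input_ids.to(model.device)
outputs = model.generate(input_ids=input_ids, max_new_tokens=128)

# Keep only the newly generated tokens and decode them, as in the snippet above.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```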

## Correspondence to
Zihan Liu ([email protected]), Wei Ping ([email protected])

## Citation
<pre>
@article{liu2024chatqa,
  title={ChatQA: Building GPT-4 Level Conversational QA Models},
  author={Liu, Zihan and Ping, Wei and Roy, Rajarshi and Xu, Peng and Lee, Chankyu and Shoeybi, Mohammad and Catanzaro, Bryan},
  journal={arXiv preprint arXiv:2401.10225},
  year={2024}}
</pre>

## License