|
---
license: mit
language:
- zh
- en
---
|
|
|
We fine-tuned ChemGPT2-QA-72B from the Qwen2-72B-Instruct model. Our training data, ChemGPT-2.0-Data, is open-sourced and available at https://huggingface.co/datasets/ALmonster/ChemGPT-2.0-Data.
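If you want to inspect the training data, the snippet below is a minimal sketch of loading it with the Hugging Face `datasets` library; the split name and column layout are assumptions, so check the dataset card for the actual schema.

```python
from datasets import load_dataset

# Load ChemGPT-2.0-Data from the Hugging Face Hub.
# The "train" split and the printed fields are assumptions;
# see the dataset card for the actual splits and columns.
dataset = load_dataset("ALmonster/ChemGPT-2.0-Data", split="train")
print(dataset)     # overview of size and columns
print(dataset[0])  # first training example
```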
|
We evaluated our model on the three chemistry tasks of C-Eval and compared it with GPT-3.5 and GPT-4. The results are as follows: |
|
|
|
|
|
## C-Eval |
|
|
|
| Models | college_chemistry | high_school_chemistry | middle_school_chemistry | AVG |
|--------|-------------------|-----------------------|-------------------------|-----|
| GPT-3.5 | 0.397 | 0.529 | 0.714 | 0.547 |
| GPT-4 | 0.594 | 0.558 | 0.811 | 0.654 |
| ChemGPT2-QA-72B | 0.710 | 0.936 | 0.995 | 0.880 |
|
|
|
|
|
## Quickstart |
|
|
|
The following code snippet shows how to load the tokenizer and model and how to generate content with `apply_chat_template`.
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

# Load the model and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained(
    "ALmonster/ChemGPT2-QA-72B",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("ALmonster/ChemGPT2-QA-72B")

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

# Build the chat prompt and tokenize it
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

# Generate, then strip the prompt tokens from the output
generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
|
|
|
## vLLM
|
|
|
We recommend deploying the model on 4 A100 GPUs. You can start the vLLM OpenAI-compatible server with the following command in a terminal:
|
|
|
```bash
python -m vllm.entrypoints.openai.api_server \
    --served-model-name chemgpt \
    --model path/to/chemgpt \
    --gpu-memory-utilization 0.98 \
    --tensor-parallel-size 4 \
    --port 6000
```
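Once the server is up, you can optionally verify that it is serving the model by querying the OpenAI-compatible `/v1/models` endpoint. This is a minimal sketch, assuming the server from the command above is running locally on port 6000.

```python
import requests

# Quick sanity check: list the models served by the vLLM server.
# Assumes the server started above is running on localhost:6000.
resp = requests.get("http://localhost:6000/v1/models")
resp.raise_for_status()
print(resp.json())  # should list "chemgpt" among the served models
```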
|
|
|
Then, you can use the following client code to stream responses from the server:
|
|
|
```python
import requests
import json


def general_chemgpt_stream(inputs, history):
    url = 'http://localhost:6000/v1/chat/completions'
    headers = {"User-Agent": "vLLM Client"}

    # Append the new user turn to the running conversation history
    history += [{"role": "user", "content": inputs}]

    pload = {
        "model": "chemgpt",
        "stream": True,
        "messages": history
    }
    response = requests.post(url,
                             headers=headers,
                             json=pload,
                             stream=True)

    assistant_reply = ''
    for chunk in response.iter_lines(chunk_size=1,
                                     decode_unicode=False,
                                     delimiter=b"\n"):
        if chunk:
            string_data = chunk.decode("utf-8")
            try:
                # Each SSE line looks like "data: {...}"; strip the "data: " prefix
                json_data = json.loads(string_data[6:])
                delta_content = json_data["choices"][0]["delta"]["content"]
                assistant_reply += delta_content
                yield delta_content
            except KeyError:
                # The first chunk carries only the role, no content
                delta_content = json_data["choices"][0]["delta"]["role"]
            except json.JSONDecodeError:
                # The final chunk is "data: [DONE]"; store the full reply in history
                history += [{
                    "role": "assistant",
                    "content": assistant_reply,
                    "tool_calls": []
                }]
                assert '[DONE]' == string_data[6:]


inputs = '介绍一下NaOH'  # "Give me an introduction to NaOH"
history_chem = []
for response_text in general_chemgpt_stream(inputs, history_chem):
    print(response_text, end='')
```
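Since the vLLM server exposes an OpenAI-compatible API, you can also query it with the `openai` Python client instead of parsing the SSE stream by hand. This is a minimal sketch, not part of the original card; it assumes the server from the command above is running on localhost:6000 and that the `openai` package (v1+) is installed.

```python
from openai import OpenAI

# Point the OpenAI client at the local vLLM server; the API key is unused but required.
client = OpenAI(base_url="http://localhost:6000/v1", api_key="EMPTY")

# Stream a chat completion from the served "chemgpt" model
stream = client.chat.completions.create(
    model="chemgpt",
    messages=[{"role": "user", "content": "Give me an introduction to NaOH"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="")
```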