---
license: other
---

![Aquila_logo](./log.jpeg)

<h4 align="center">
    <p>
        <b>English</b> |
        <a href="https://huggingface.co/BAAI/Aquila-7B/blob/main/README_zh.md">简体中文</a>
    </p>
</h4>


The Aquila language model is the first open-source language model that combines Chinese and English knowledge, a license that permits commercial use, and compliance with Chinese domestic data regulations.

- 🌟 **Supports open-source commercial licensing**. The source code of the Aquila series models is released under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0), while the model weights are released under the [BAAI Aquila Model License Agreement](https://huggingface.co/BAAI/Aquila-7B/resolve/main/BAAI%20Aquila%20Model%20License%20Agreement.pdf). Users may use the models commercially as long as they comply with the license terms.

- ✍️ **Possesses Chinese and English knowledge**. The Aquila series models are trained from scratch on a high-quality Chinese and English corpus, with Chinese text accounting for about 40%. This ensures that the models acquire native Chinese world knowledge during pre-training, rather than knowledge translated from other languages.

- 👮‍♀️ **Complies with domestic data regulations**. The Chinese corpora of the Aquila series models come from the Chinese datasets that BAAI has accumulated over the years, including Chinese internet data from over 10,000 sources (more than 99% of which are domestic sources), as well as high-quality Chinese literature and book data supported by authoritative domestic organizations. We will continue to accumulate high-quality, diverse datasets and incorporate them into subsequent training of the Aquila base models.

- 🎯 **Continuous improvement and open sourcing**. We will continue to improve the training data, optimize the training methods, and enhance model performance, grow a flourishing "model tree" from a stronger base model, and keep releasing updated open-source versions.

Further details of the Aquila models will be presented in the official technical report. Please stay tuned for updates on the official channels, including the [FlagAI GitHub repository](https://github.com/FlagAI-Open/FlagAI/), [FlagAI's Zhihu account](https://www.zhihu.com/people/95-22-20-18) and [FlagAI's official technical communication group](https://github.com/FlagAI-Open/FlagAI/blob/master/wechat-qrcode.jpg).


| Model | Model Type | Description | Status | GPUs Used |
| :---- | :--------- | :---------- | :----- | :-------- |
| Aquila-7B | Base model, 7 billion parameters | **Aquila Base Model** inherits the architectural design advantages of GPT-3 and LLaMA. It swaps in a set of more efficient low-level operator implementations, uses a redesigned bilingual tokenizer, upgrades the BMTrain parallel training method, and achieves nearly 8 times the training efficiency of Megatron+DeepSpeed ZeRO-2. | Released | Nvidia-A100 |
| Aquila-33B | Base model, 33 billion parameters | Same as above | Coming soon | Nvidia-A100 |
| AquilaChat-7B | SFT model, fine-tuned with SFT and RL on top of Aquila-7B | **AquilaChat Dialog Model** supports fluent text dialogue and a range of language generation tasks. By defining an extensible special-instruction specification, AquilaChat can call other models and tools. For example, calling BAAI's open-source **[AltDiffusion](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltDiffusion-m18) multilingual text-to-image generation model** gives it smooth image generation capability, and combining it with BAAI's **InstructFace multi-step controllable text-to-image model** enables multi-step controllable editing of face images. | Released | Nvidia-A100 |
| AquilaChat-33B | SFT model, fine-tuned with SFT and RL on top of Aquila-33B | Same as above | Coming soon | Nvidia-A100 |
| AquilaCode-7B-NV | Base model, "text-code" generation model, further pre-trained from Aquila-7B, trained on Nvidia chips | AquilaCode-7B achieves high performance with a small dataset and a small parameter count, and is currently the best open-source code model that supports both Chinese and English. It is trained on code data with compliant open-source licenses after high-quality filtering. AquilaCode-7B has been trained on both Nvidia and domestic chips. | Released on GitHub | Nvidia-A100 |
| AquilaCode-7B-TS | Base model, "text-code" generation model, further pre-trained from Aquila-7B, trained on Tianshu chips | Same as above | Released on GitHub | Tianshu-BI-V100 |



We will continue to release improved versions of the Aquila models as open source.

- 2023/07/14: released v0.8
  - Aquila-7B-01  md5: b14329f7314c05dd79d44b2838c315aa
  - Aquila-7B-02  md5: 88aa286283c7b7dd78c0fbb7fae6327d
  - AquilaChat-7B-01 md5: 0a77901af35d3e5ed16eeafa622e2173
  - AquilaChat-7B-02 md5: 6e84423fe2837c79c0ced6817c316bd4
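
The checksums above can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library. The source does not state which file each checksum covers, so the helper names `file_md5` and `matches_release` and any path passed to them are illustrative assumptions, not part of the Aquila tooling.

```python
import hashlib

# v0.8 checksums copied from the release list above.
EXPECTED_MD5 = {
    "Aquila-7B-01": "b14329f7314c05dd79d44b2838c315aa",
    "Aquila-7B-02": "88aa286283c7b7dd78c0fbb7fae6327d",
    "AquilaChat-7B-01": "0a77901af35d3e5ed16eeafa622e2173",
    "AquilaChat-7B-02": "6e84423fe2837c79c0ced6817c316bd4",
}


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_release(path, release_name):
    """Check a local file (hypothetical path) against a published checksum."""
    return file_md5(path) == EXPECTED_MD5[release_name]
```

For example, `matches_release("path/to/downloaded_file", "Aquila-7B-01")` returns `True` only if the local file hashes to the published value. Which file to pass for each entry is an assumption the reader needs to confirm against the repository.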


Aquila-7B v0.8 improves on v0.7 in the FlagEval large-model evaluation ("Objective"), with gains of approximately 10.07% on MMLU_Chinese, 14.84% on TruthfulQA, and 7.94% on MMLU. For detailed evaluation results, see http://flageval.baai.ac.cn.
For the detailed version history, see the [Change Log](https://huggingface.co/BAAI/Aquila-7B/blob/main/change_log.log).


## Quick Start: Aquila-7B

### 1. Inference

```python
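from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_info = "BAAI/Aquila-7B"
device = "cuda:0"  # requires a CUDA-capable GPU

# trust_remote_code=True allows transformers to run the model code shipped in the repo
tokenizer = AutoTokenizer.from_pretrained(model_info, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_info, trust_remote_code=True)
model.eval()
model.to(device)

text = "汽车EDR是什么"  # "What is an automotive EDR?"

# Encode the prompt and drop the last token id before generation
tokens = tokenizer.encode_plus(text)['input_ids'][:-1]

# Add a batch dimension and move the ids to the GPU
tokens = torch.tensor(tokens)[None,].to(device)

with torch.no_grad():
    # Sample until max_length (prompt included) or until token id 100007,
    # which is passed as the end-of-sequence id
    out = model.generate(tokens, do_sample=True, max_length=512, eos_token_id=100007)[0]

# Decode the returned ids (prompt followed by the continuation) back into text
out = tokenizer.decode(out.cpu().numpy().tolist())
print(out)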
```
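
For decoder-only models, `generate` returns the prompt ids followed by the newly generated ids, so the decoded text starts with the prompt. If only the continuation is wanted, the prompt prefix can be sliced off before decoding. The helper below is a small sketch on plain Python lists; the name `continuation_ids` is hypothetical and not part of `transformers` or the Aquila code.

```python
def continuation_ids(output_ids, prompt_ids):
    """Return only the ids that follow the prompt in a generated sequence."""
    output_ids = list(output_ids)
    prompt_ids = list(prompt_ids)
    # Guard against passing a sequence that was not generated from this prompt.
    if output_ids[:len(prompt_ids)] != prompt_ids:
        raise ValueError("output does not start with the prompt ids")
    return output_ids[len(prompt_ids):]
```

With the example above, this would be called on the generated ids and the prompt ids after both have been converted to lists, and the result passed to `tokenizer.decode`.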


## License

The Aquila-7B and AquilaChat-33B open-source models are licensed under the [BAAI Aquila Model License Agreement](https://huggingface.co/BAAI/Aquila-7B/resolve/main/BAAI%20Aquila%20Model%20License%20Agreement.pdf).