Update README.md
README.md

## How to use PLM

Here we introduce some methods to use PLM models.

### Hugging Face

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("PLM-Team/PLM-1.8B-Instruct")
model = AutoModelForCausalLM.from_pretrained("PLM-Team/PLM-1.8B-Instruct", torch_dtype=torch.bfloat16)

# Input text
input_text = "Tell me something about reinforcement learning."
inputs = tokenizer(input_text, return_tensors="pt")

# Completion
output = model.generate(inputs["input_ids"], max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
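
For chat-style use of the instruct model, the prompt is usually built through the tokenizer's chat template rather than passed as raw text. Below is a minimal sketch that continues the example above, assuming the PLM-Team/PLM-1.8B-Instruct tokenizer ships a chat template (the message content is illustrative):

```python
# Build the prompt with the tokenizer's chat template (assumes one is defined).
messages = [
    {"role": "user", "content": "Tell me something about reinforcement learning."}
]
prompt_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(prompt_ids, max_new_tokens=100)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][prompt_ids.shape[-1]:], skip_special_tokens=True))
```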
### llama.cpp

The original contribution to the llama.cpp framework is [Si1w/llama.cpp](https://github.com/Si1w/llama.cpp). Here is the usage:

```bash
git clone https://github.com/Si1w/llama.cpp.git
cd llama.cpp
pip install -r requirements.txt
```

Then we can build for CPU or GPU (e.g. Orin). The build is based on `cmake`.

- For CPU

```bash
cmake -B build
cmake --build build --config Release
```

- For GPU

```bash
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
```

Don't forget to download the GGUF files of PLM. We use the quantization methods in `llama.cpp` to generate the quantized PLM.

```bash
huggingface-cli download --resume-download PLM-Team/PLM-1.8B-Instruct-gguf --local-dir PLM-Team/PLM-1.8B-Instruct-gguf
```
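
If you prefer to reproduce the quantized files instead of downloading them, llama.cpp ships a conversion script and a quantization tool. The following is a rough sketch, assuming the Si1w/llama.cpp fork's `convert_hf_to_gguf.py` supports the PLM architecture and that the original Hugging Face checkpoint is available locally; the file names and the `Q8_0` preset are illustrative:

```bash
# Convert the Hugging Face checkpoint to a GGUF file (assumes PLM support in the fork).
python convert_hf_to_gguf.py ./PLM-Team/PLM-1.8B-Instruct \
    --outfile PLM-1.8B-Instruct-F16.gguf --outtype f16

# Quantize with one of llama.cpp's built-in presets (here Q8_0).
./build/bin/llama-quantize PLM-1.8B-Instruct-F16.gguf PLM-1.8B-Instruct-Q8_0.gguf Q8_0
```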

After building `llama.cpp`, we can use `llama-cli` to launch PLM.

```bash
./build/bin/llama-cli -m ./PLM-Team/PLM-1.8B-Instruct-gguf/PLM-1.8B-Instruct-Q8_0.gguf -cnv -p "hello!" -n 128
```
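
In addition to the interactive CLI, the same build produces `llama-server`, which exposes the model over an OpenAI-compatible HTTP API. A sketch with illustrative host, port, and context-size settings:

```bash
# Serve the quantized PLM with llama.cpp's built-in HTTP server.
./build/bin/llama-server \
    -m ./PLM-Team/PLM-1.8B-Instruct-gguf/PLM-1.8B-Instruct-Q8_0.gguf \
    --host 0.0.0.0 --port 8080 -c 4096
```

Requests can then be sent to `http://localhost:8080/v1/chat/completions`.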
## Future works
- [ ] Release vLLM, SGLang, and PowerInfer inference scripts for PLM.