Update README.md
README.md
## How to use PLM

Here we introduce some methods to use PLM models.

### Hugging Face

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("PLM-Team/PLM-1.8B-Instruct")
model = AutoModelForCausalLM.from_pretrained("PLM-Team/PLM-1.8B-Instruct", torch_dtype=torch.bfloat16)

# Input text
input_text = "Tell me something about reinforcement learning."
inputs = tokenizer(input_text, return_tensors="pt")

# Completion
output = model.generate(inputs["input_ids"], max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
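Since PLM-1.8B-Instruct is an instruction-tuned model, chat-style prompts generally work better when they are formatted with the tokenizer's chat template. The following is a minimal sketch of that pattern; it assumes the released tokenizer ships a chat template, which is not verified here.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("PLM-Team/PLM-1.8B-Instruct")
model = AutoModelForCausalLM.from_pretrained("PLM-Team/PLM-1.8B-Instruct", torch_dtype=torch.bfloat16)

# Format the conversation with the tokenizer's chat template (assumed to be bundled
# with the released tokenizer) and append the generation prompt for the assistant turn.
messages = [{"role": "user", "content": "Tell me something about reinforcement learning."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

# Generate and decode only the newly produced tokens.
output = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```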
### llama.cpp
The original contribution to the llama.cpp framework is [Si1w/llama.cpp](https://github.com/Si1w/llama.cpp). Here is the usage:

```bash
git clone https://github.com/Si1w/llama.cpp.git
cd llama.cpp
pip install -r requirements.txt
```

Then, we can build with CPU or GPU (e.g. Orin). The build is based on `cmake`.
- For CPU

```bash
cmake -B build
cmake --build build --config Release
```

- For GPU

```bash
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
```
Don't forget to download the GGUF files of PLM. We use the quantization methods in `llama.cpp` to generate the quantized PLM.

```bash
huggingface-cli download --resume-download PLM-Team/PLM-1.8B-Instruct-gguf --local-dir PLM-Team/PLM-1.8B-Instruct-gguf
```
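The same download can also be scripted from Python with `huggingface_hub`; here is a small sketch that mirrors the CLI command above (same repo id and local directory).

```python
from huggingface_hub import snapshot_download

# Download the GGUF repo to the same local directory used by the CLI command above.
snapshot_download(
    repo_id="PLM-Team/PLM-1.8B-Instruct-gguf",
    local_dir="PLM-Team/PLM-1.8B-Instruct-gguf",
)
```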
After building `llama.cpp`, we can use the `llama-cli` binary to launch PLM.

```bash
./build/bin/llama-cli -m ./PLM-Team/PLM-1.8B-Instruct-gguf/PLM-1.8B-Instruct-Q8_0.gguf -cnv -p "hello!" -n 128
```
## Future works
- [ ] Release vLLM, SGLang, and PowerInfer inference scripts for PLM.