Text Generation · PyTorch · English · Chinese · plm · conversational · custom_code

daven3 committed · Commit 014ed68 · verified · 1 Parent(s): 935be54

Update README.md

Files changed (1): README.md (+34 -1)

README.md CHANGED
@@ -94,6 +94,27 @@ PLM-1.8B is a strong and reliable model, particularly in basic knowledge underst

## How to use PLM

+ Here we introduce several ways to use the PLM models.
+
+ ### Hugging Face
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ # Load model and tokenizer
+ tokenizer = AutoTokenizer.from_pretrained("PLM-Team/PLM-1.8B-Instruct")
+ model = AutoModelForCausalLM.from_pretrained("PLM-Team/PLM-1.8B-Instruct", torch_dtype=torch.bfloat16)
+
+ # Input text
+ input_text = "Tell me something about reinforcement learning."
+ inputs = tokenizer(input_text, return_tensors="pt")
+
+ # Completion
+ output = model.generate(inputs["input_ids"], max_new_tokens=100)
+ print(tokenizer.decode(output[0], skip_special_tokens=True))
+ ```
+
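The example above sends raw text to the model. Since PLM-1.8B-Instruct is a chat model, the prompt can also be built with the tokenizer's chat template; the sketch below assumes the PLM tokenizer ships such a template and uses an illustrative message list, so treat it as a starting point rather than the README's own recipe.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("PLM-Team/PLM-1.8B-Instruct")
model = AutoModelForCausalLM.from_pretrained("PLM-Team/PLM-1.8B-Instruct", torch_dtype=torch.bfloat16)

# Build a chat-formatted prompt; assumes the tokenizer defines a chat template.
messages = [{"role": "user", "content": "Tell me something about reinforcement learning."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

output = model.generate(input_ids, max_new_tokens=100)
# Decode only the tokens generated after the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
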
### llama.cpp

The original contribution to the llama.cpp framework is [Si1w/llama.cpp](https://github.com/Si1w/llama.cpp). Here is the usage:

@@ -104,7 +125,7 @@ cd llama.cpp
pip install -r requirements.txt
```

- Then we can build with CPU of GPU (e.g. Orin). The build is based on `cmake`.
+ Then, we can build with CPU or GPU (e.g. Orin). The build is based on `cmake`.

- For CPU

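Only fragments of the build blocks appear in the hunks shown here, so for reference, a typical `llama.cpp` cmake build looks like the sketch below. The CPU invocation is the standard upstream one and is an assumption, not copied from this README; the CUDA flag matches the `cmake -B build -DGGML_CUDA=ON` line visible in the next hunk.

```bash
# CPU-only build
cmake -B build
cmake --build build --config Release

# CUDA build (e.g. on an NVIDIA Orin)
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
```
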
 
 
@@ -120,6 +141,18 @@ cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
```

+ Don't forget to download the GGUF files of PLM. We use the quantization methods in `llama.cpp` to generate the quantized PLM.
+
+ ```bash
+ huggingface-cli download --resume-download PLM-Team/PLM-1.8B-Instruct-gguf --local-dir PLM-Team/PLM-1.8B-Instruct-gguf
+ ```
+
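The download above fetches ready-made quantized GGUF files (such as the Q8_0 file used below). If you want to produce other quantization levels yourself, the usual `llama.cpp` flow is to convert the Hugging Face checkpoint to GGUF and then quantize it; the sketch below uses the upstream script and binary names and an illustrative local path, both assumed to carry over unchanged to the Si1w fork.

```bash
# Convert a locally downloaded Hugging Face checkpoint to GGUF (path is illustrative).
python convert_hf_to_gguf.py ./PLM-1.8B-Instruct --outfile PLM-1.8B-Instruct-F16.gguf

# Quantize the GGUF file, e.g. to 4-bit Q4_K_M.
./build/bin/llama-quantize PLM-1.8B-Instruct-F16.gguf PLM-1.8B-Instruct-Q4_K_M.gguf Q4_K_M
```
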
150
+ After build the `llama.cpp`, we can use `llama-cli` script to launch the PLM.
151
+
152
+ ```bash
153
+ ./build/bin/llama-cli -m ./PLM-Team/PLM-1.8B-Instruct-gguf/PLM-1.8B-Instruct-Q8_0.gguf -cnv -p "hello!" -n 128
154
+ ```
155
+
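Beyond the interactive `llama-cli` session, the same GGUF file can be served over HTTP with `llama-server`, llama.cpp's OpenAI-compatible server; a minimal sketch, assuming the fork builds that binary as upstream llama.cpp does:

```bash
# Start an OpenAI-compatible HTTP server on port 8080.
./build/bin/llama-server -m ./PLM-Team/PLM-1.8B-Instruct-gguf/PLM-1.8B-Instruct-Q8_0.gguf --port 8080

# Query it from another shell.
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "messages": [{"role": "user", "content": "Tell me something about reinforcement learning."}],
  "max_tokens": 100
}'
```
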
## Future works

  - [ ] Release vLLM, SGLang, and PowerInfer inference scripts for PLM.