---
license: mit
---

Base Model: MPT-7B

This is an experimental Hermes Lite version that excludes the Nous Instruct training data that the full Hermes model was also trained on.

Big thanks to the BitTensor foundation for the compute to attempt this experiment!

There seems to have been some problem with the training that I cannot identify: while the model does seem improved over the base model, it does not appear to have learned nearly as much as Llama learned in training Hermes.

Typically, the model would respond with long answers when asked, be much more contextually intelligent, and answer in a thoughtful way. However, for whatever reason - likely something to do with not training with LLM-Foundry - this model does not like longer responses and typically responds quite briefly.

You should load the model and tokenizer like so:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')
tokenizer.pad_token = "<|padding|>"
model = AutoModelForCausalLM.from_pretrained(
    "./Hermes-MPT7b/",
    torch_dtype=torch.float16,
    device_map='auto',
    trust_remote_code=True
)
```

You should use the `eos_token_id` parameter in the `generate` function, and `skip_special_tokens=True` in the tokenizer decode:

```python
# input_ids is the tokenized prompt, e.g.:
# input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
generated_ids = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    top_p=0.5,
    top_k=0,
    repetition_penalty=1.1,
    min_new_tokens=100,
    eos_token_id=tokenizer.eos_token_id,
)
response = tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
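
The slice `generated_ids[0][input_ids.shape[-1]:]` is there because `generate` returns the prompt tokens followed by the newly generated ones; slicing off the first `input_ids.shape[-1]` positions decodes only the new text. A toy illustration with made-up token IDs (plain lists standing in for tensors):

```python
# Toy stand-ins for the real tensors: one sequence of token IDs per batch row.
input_ids = [[101, 102, 103, 104]]               # 4 prompt tokens
generated_ids = [[101, 102, 103, 104, 7, 8, 9]]  # prompt tokens + 3 new tokens

# generate() echoes the prompt, so drop the first len(prompt) positions
# before decoding to get only the model's answer.
new_tokens = generated_ids[0][len(input_ids[0]):]
print(new_tokens)  # [7, 8, 9]
```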

While the model is not quite where I'd like it to be, it could be useful for learning how the MPT architecture works, and for some uses, so it is uploaded here.