legolasyiu committed on
Commit c1b8056
1 Parent(s): 49092a4

Update README.md

Files changed (1):
1. README.md +0 -21
README.md CHANGED
@@ -114,27 +114,6 @@ outputs = model.generate(**inputs, max_new_tokens=20)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
 
-## 4-bit inference example
-[4-bit inference notebook](https://colab.research.google.com/drive/1e1QbonIhSNuv7nUhMU7MQV6FcSKTVzCN?usp=sharing)
-
-```py
-import torch
-from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
-
-# Load the model in 4-bit precision via bitsandbytes
-quantization_config = BitsAndBytesConfig(
-    load_in_4bit=True,
-    bnb_4bit_compute_dtype=torch.float16
-)
-
-model_id = "EpistemeAI/Fireball-12B-v1.0f"
-tokenizer = AutoTokenizer.from_pretrained(model_id)
-model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto", quantization_config=quantization_config)
-inputs = tokenizer("Should we prepay our private student loans, given our particular profile?", return_tensors="pt")
-outputs = model.generate(**inputs, max_new_tokens=120)
-print(tokenizer.decode(outputs[0], skip_special_tokens=True))
-```
-
 
 > [!TIP]
 > Unlike previous Mistral models, Mistral Nemo requires smaller temperatures. We recommend using a temperature of 0.3.
 
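Since the tip recommends a temperature of 0.3, here is a minimal sketch of how that setting would be passed to `generate` (an illustration, not part of this commit; it reuses the `model`, `tokenizer`, and `inputs` from the README example above, and assumes sampling is enabled, since `temperature` only applies when `do_sample=True`):

```py
# Minimal sketch (assumption, not from this commit): generate with the
# recommended temperature of 0.3, reusing model/tokenizer/inputs from above.
outputs = model.generate(
    **inputs,
    max_new_tokens=120,
    do_sample=True,   # temperature only takes effect when sampling
    temperature=0.3,  # recommended for Mistral Nemo-based models
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```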