Gemma-7B in 8-bit with bitsandbytes
This is the repository for Gemma-7B quantized to 8-bit using bitsandbytes. The original model card and license for Gemma-7B can be found here. This is the base model; it is not instruction fine-tuned.
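For reference, a quantization like this one can be produced with transformers and bitsandbytes roughly as follows. This is a minimal sketch assuming the standard BitsAndBytesConfig path, not necessarily the exact commands used to build this repository:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the original weights in 8-bit (requires the bitsandbytes package).
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b",
    quantization_config=quantization_config,
    device_map="auto",
)

# Push the quantized weights to a Hub repository.
model.push_to_hub("merve/gemma-7b-8bit")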
Usage
Please visit the original Gemma-7B model card for intended uses and limitations.
You can use this model as follows:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer from the original Gemma-7B repository.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b")

# Load the 8-bit quantized weights; device_map='auto' places them on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    "merve/gemma-7b-8bit",
    device_map='auto'
)

input_text = "Write me a poem about Machine Learning."
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0]))
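To confirm the memory savings from 8-bit loading, you can check the model's footprint after loading; get_memory_footprint() is a standard transformers method on loaded models:

# Report how much memory the quantized weights occupy (in GB).
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")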