Llama3-70b-Instruct-4bit
This model is a 4-bit quantized version of meta-llama/Meta-Llama-3-70B-Instruct.
Libraries to Install
- pip install transformers torch
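The script below loads the checkpoint with device_map, which generally also requires accelerate, and 4-bit weights are typically handled through bitsandbytes. These two packages are an assumption on top of the original list (which only mentions transformers and torch); if they are missing, the following should cover it:
- pip install accelerate bitsandbytes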
Authentication is required before running the script.
Run the following in a terminal or a Jupyter notebook:
Terminal: huggingface-cli login
Jupyter notebook:
>>> from huggingface_hub import notebook_login
>>> notebook_login()
NOTE: Paste the token from your Hugging Face account (Settings > Access Tokens > create a new token or copy an existing one).
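Alternatively, the token can be supplied programmatically with huggingface_hub's login helper, or exported as the HF_TOKEN environment variable before launching the script. A minimal sketch, where "hf_xxx" is a placeholder for your own token:
>>> from huggingface_hub import login
>>> login(token="hf_xxx")  # placeholder; use your own Hugging Face access token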
Script
>>> from transformers import AutoTokenizer, AutoModelForCausalLM
>>> import torch

>>> # Load the tokenizer and the 4-bit quantized model
>>> model_id = "screevoai/llama3-70b-instruct-4bit"
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> model = AutoModelForCausalLM.from_pretrained(
...     model_id,
...     torch_dtype=torch.bfloat16,
...     device_map="cuda:0"
... )

>>> # Chat messages
>>> messages = [
...     {"role": "system", "content": "You are a personal assistant chatbot, so respond accordingly"},
...     {"role": "user", "content": "What is Machine Learning?"},
... ]

>>> # Apply the Llama 3 chat template and move the input ids to the model's device
>>> input_ids = tokenizer.apply_chat_template(
...     messages,
...     add_generation_prompt=True,
...     return_tensors="pt"
... ).to(model.device)

>>> # Stop generation at the EOS token or Llama 3's end-of-turn token
>>> terminators = [
...     tokenizer.eos_token_id,
...     tokenizer.convert_tokens_to_ids("<|eot_id|>")
... ]

>>> # Generate predictions using the model
>>> outputs = model.generate(
...     input_ids,
...     max_new_tokens=512,
...     eos_token_id=terminators,
...     do_sample=True,
...     temperature=0.6,
...     top_p=0.9,
... )

>>> # Decode only the newly generated tokens (everything after the prompt)
>>> response = outputs[0][input_ids.shape[-1]:]
>>> print(tokenizer.decode(response, skip_special_tokens=True))
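If you prefer to see tokens printed as they are generated rather than only after generation finishes, transformers' TextStreamer can be wired into the same generate call. This is an optional sketch on top of the original script, reusing the model, tokenizer, input_ids, and terminators defined above:
>>> from transformers import TextStreamer
>>> streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
>>> _ = model.generate(
...     input_ids,
...     max_new_tokens=512,
...     eos_token_id=terminators,
...     do_sample=True,
...     temperature=0.6,
...     top_p=0.9,
...     streamer=streamer,
... )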
Model tree for screevoai/llama3-70b-instruct-4bit
- Base model: meta-llama/Meta-Llama-3-70B
- Finetuned: meta-llama/Meta-Llama-3-70B-Instruct