Model Card for fsaudm/Meta-Llama-3.1-70B-Instruct-INT8

This is an 8-bit quantized version of Llama 3.1 70B Instruct, produced with bitsandbytes and accelerate.

  • Developed by: Farid Saud @ DSRS
  • License: llama3.1
  • Base Model: meta-llama/Meta-Llama-3.1-70B-Instruct
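
For readers curious how an 8-bit checkpoint like this one can be produced, here is a minimal sketch using the bitsandbytes integration in transformers. It is not necessarily the exact procedure used for this model. The output directory name and the function name are hypothetical, and the code assumes access to the gated base model plus enough GPU memory to load it.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3.1-70B-Instruct"
OUTPUT_DIR = "Meta-Llama-3.1-70B-Instruct-INT8"  # hypothetical local path


def quantize_to_int8(base_model_id=BASE_MODEL_ID, output_dir=OUTPUT_DIR):
    """Load the base model in 8-bit via bitsandbytes and save the result.

    Assumption: this mirrors the general approach described in this card,
    not the author's exact script.
    """
    # Heavy imports are kept local so this file can be imported
    # on a machine without the GPU stack installed.
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    quant_config = BitsAndBytesConfig(load_in_8bit=True)
    model = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        quantization_config=quant_config,
        device_map="auto",  # accelerate spreads layers over available devices
    )
    tokenizer = AutoTokenizer.from_pretrained(base_model_id)

    # Write the quantized weights and tokenizer files to disk.
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)
    return output_dir
```

Calling `quantize_to_int8()` downloads the full base model first, so it is only practical on a machine with substantial disk space and GPU memory.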

Use this model

Use a pipeline as a high-level helper:

# Use a pipeline as a high-level helper
from transformers import pipeline

messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe = pipeline("text-generation", model="fsaudm/Meta-Llama-3.1-70B-Instruct-INT8")
pipe(messages)
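
The call above returns a result object rather than plain text. The sketch below shows one way to pull out the assistant's reply, assuming a recent transformers release in which chat-style input yields the whole conversation as a list of messages under `generated_text`. The helper names `last_assistant_reply` and `ask`, and the `max_new_tokens` value, are illustrative choices and not part of this card.

```python
MODEL_ID = "fsaudm/Meta-Llama-3.1-70B-Instruct-INT8"


def last_assistant_reply(outputs):
    """Return the text of the final message in a chat-style pipeline result.

    Assumes `outputs` looks like
    [{"generated_text": [{"role": ..., "content": ...}, ...]}].
    """
    conversation = outputs[0]["generated_text"]
    return conversation[-1]["content"]


def ask(prompt, max_new_tokens=128):
    """Send one user message through the pipeline and return the reply text."""
    # Imported lazily because loading transformers (and the model) is expensive.
    from transformers import pipeline

    pipe = pipeline("text-generation", model=MODEL_ID, device_map="auto")
    messages = [{"role": "user", "content": prompt}]
    outputs = pipe(messages, max_new_tokens=max_new_tokens)
    return last_assistant_reply(outputs)
```

For example, `print(ask("Who are you?"))` would print only the generated answer. `device_map="auto"` asks accelerate to place the model across whatever devices are available.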

Load the model directly:

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("fsaudm/Meta-Llama-3.1-70B-Instruct-INT8")
model = AutoModelForCausalLM.from_pretrained("fsaudm/Meta-Llama-3.1-70B-Instruct-INT8")
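
Once the tokenizer and model are loaded, generation could look roughly like the sketch below. It assumes the tokenizer ships the Llama 3.1 chat template and that the model sits on a device with enough memory. The function names and the `max_new_tokens` default are illustrative.

```python
def build_messages(prompt):
    """Wrap a single user prompt in the chat-message format."""
    return [{"role": "user", "content": prompt}]


def generate_reply(model, tokenizer, prompt, max_new_tokens=128):
    """Generate a reply to `prompt` with an already-loaded model and tokenizer."""
    # Render the conversation with the model's chat template and tokenize it.
    inputs = tokenizer.apply_chat_template(
        build_messages(prompt),
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    ).to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Keep only the newly generated tokens, dropping the prompt tokens.
    prompt_length = inputs["input_ids"].shape[-1]
    new_tokens = output_ids[0, prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

With the `tokenizer` and `model` objects from the snippet above, `generate_reply(model, tokenizer, "Who are you?")` returns the decoded answer as a string. Passing `device_map="auto"` to `from_pretrained` is a common way to let accelerate distribute a model of this size across GPUs.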

Information about the base model can be found in the original meta-llama/Meta-Llama-3.1-70B-Instruct model card.

Model size: 70.6B params (Safetensors)
Tensor types: F32 · FP16 · I8
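
As a rough guide to why 8-bit quantization matters at this scale, the sketch below estimates weight memory for 70.6B parameters at different precisions. It is a back-of-the-envelope calculation that assumes every parameter is stored at a single precision. The real checkpoint mixes tensor types, and activations and the KV cache are not counted, so actual usage will be higher.

```python
NUM_PARAMS = 70.6e9  # parameter count reported for this model

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("fp32", "fp16", "int8"):
    print(f"{dtype}: ~{weight_memory_gb(NUM_PARAMS, dtype):.1f} GB")
```

Under these assumptions, 8-bit weights need about half the memory of FP16 weights and a quarter of FP32.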