Llama-3-70B-Instruct-ov-fp16-int4-sym
Built with Meta Llama 3
Model Description
This is a version of the original Meta-Llama-3-70B-Instruct model converted to OpenVINO™ IR (Intermediate Representation) format for optimized inference on Intel® hardware. The model is created using the examples shown in OpenVINO™ Notebooks repository.
Intended Use
This model is designed for advanced natural language understanding and generation tasks, ideal for academic researchers and developers in commercial settings looking to integrate efficient AI capabilities into their applications. It is not to be used for creating or promoting harmful or illegal content as per the guidelines outlined in the Meta Llama 3 Acceptable Use Policy.
Licensing and Redistribution
This model is released under the Meta Llama 3 Community License. Redistribution requires inclusion of this license and a citation to the original model. Modifications and derivative works must prominently display "Built with Meta Llama 3" and adhere to the redistribution policies detailed in the original model license terms.
Weight Compression Parameters
For more information on the parameters, refer to the OpenVINO™ 2024.1.0 documentation
- mode: INT4_ASYM
- group_size: 128
- ratio: 0.8
Running Model Inference
Install packages required for using Optimum Intel integration with the OpenVINO™ backend:
pip install --upgrade --upgrade-strategy eager "optimum[openvino]"
Run model inference:
from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer
model_id = "nsbendre25/Llama-3-70B-Instruct-ov_fp16-int4_sym"
# Initialize the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id)
pipeline = transformers.pipeline("text-generation", model=model_id, model_kwargs={"torch_dtype": torch.bfloat16}, device_map="auto")
pipeline("i am in paris, plan me a 2 week trip")
- Downloads last month
- 5