SandLogic Technologies - Quantized Phi-3.1-mini-4k-instruct Models
Model Description
We have quantized the Phi-3.1-mini-4k-instruct model into three variants:
- Q5_K_M
- Q4_K_M
- IQ4_XS
These quantized models offer improved efficiency while maintaining performance.
Discover our full range of quantized language models by visiting our SandLogic Lexicon GitHub. To learn more about our company and services, check out our website at SandLogic.
Original Model Information
- Name: Phi-3.1-mini-4k-instruct
- Developer: Microsoft
- Model Type: Open-source language model
- Parameters: 3.8 billion
- Context Length: 4,096 tokens (4K)
- Training Data: 3.3 trillion tokens, including curated public documents, synthetic "textbook-like" data, and high-quality chat data
- Language: English
Model Capabilities
The Phi-3.1-mini-4k-instruct model is designed for a variety of commercial and research applications, particularly in environments with limited memory or computational resources, scenarios requiring low latency, and tasks involving robust reasoning capabilities, such as mathematics and logic.
The model's key capabilities include:
- Instruction following
- Structured output generation
- High-quality multi-turn conversations
- Explicit support for the <|system|> tag
- Improved reasoning capabilities
Use Cases
- Environments with Limited Resources: Suitable for deployment on devices with limited memory or computational power, such as laptops, desktops, or edge devices.
- Low-Latency Applications: Ideal for use cases where quick responses are critical, such as customer service chatbots or real-time text generation.
- Mathematics and Logic-Based Tasks: Performs well on tasks requiring robust reasoning capabilities, including math problem-solving and logical inference.
- Processing and Analyzing Long-Form Text: Able to handle and analyze text efficiently, up to the model's 4K-token context window.
Model Variants
We offer three quantized versions of the Phi-3.1-mini-4k-instruct model:
- Q5_K_M: 5-bit quantization using the K-quant medium (K_M) method
- Q4_K_M: 4-bit quantization using the K-quant medium (K_M) method
- IQ4_XS: 4-bit quantization using the IQ4_XS method
These quantized models aim to reduce model size and improve inference speed while maintaining performance as close to the original model as possible.
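As a rough back-of-envelope check, on-disk size scales with parameters times bits per weight. The bits-per-weight figures below are approximations (K-quants store per-block metadata, so the effective rate is a bit above the nominal bit width); they are illustrative, not measured file sizes:

```python
def estimated_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough on-disk size: parameters * bits / 8, in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 3.8e9  # Phi-3.1-mini parameter count

# Approximate effective bits per weight for each variant (assumed values)
for name, bpw in [("Q5_K_M", 5.5), ("Q4_K_M", 4.8), ("IQ4_XS", 4.25)]:
    print(f"{name}: ~{estimated_size_gb(N_PARAMS, bpw):.1f} GB")
```

For comparison, the unquantized model at 16 bits per weight would be roughly 7.6 GB, so even the 5-bit variant cuts the footprint by well over half.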
Input and Output
- Input: Text string (e.g., instructions, prompts, or long-form text)
- Output: Generated text following the input, with structured output, improved reasoning, and adherence to the <|system|> tag
Usage
```
pip install llama-cpp-python
```
Please refer to the llama-cpp-python documentation to install with GPU support.
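For example, on a system with CUDA, the package can typically be rebuilt with GPU offloading enabled via CMake flags (the exact flag has changed across llama-cpp-python releases, so check the project's README for your version and backend):

```shell
# Recent releases use the GGML_CUDA flag; older ones used -DLLAMA_CUBLAS=on
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --no-cache-dir
```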
Basic Text Completion
Here's an example demonstrating how to use the high-level API for basic text completion:
```python
from llama_cpp import Llama

llm = Llama(
    model_path="./Phi-3.1-mini-4k-instruct-Q5_K_M.gguf",  # path to your GGUF file
    n_ctx=4096,       # max sequence length; longer sequences require more memory
    n_threads=8,      # number of CPU threads; tune to your system
    n_gpu_layers=35,  # layers to offload to GPU; set to 0 if no GPU acceleration is available
)

prompt = "How to explain Internet to a medieval knight?"

# Simple inference example
output = llm(
    f"<|user|>\n{prompt}<|end|>\n<|assistant|>",
    max_tokens=256,    # generate up to 256 tokens
    stop=["<|end|>"],  # stop at the end-of-turn tag
    echo=True,         # whether to echo the prompt in the output
)
print(output["choices"][0]["text"])
```
Download
You can download GGUF models directly from Hugging Face using the `from_pretrained` method. This feature requires the `huggingface-hub` package. To install it, run:

```
pip install huggingface-hub
```
```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="SandLogicTechnologies/Phi-3.1-mini-4k-instruct-GGUF",
    filename="*Phi-3.1-mini-4k-instruct-Q5_K_M.gguf",
    verbose=False,
)
```
By default, `from_pretrained` downloads the model to the Hugging Face cache directory. You can inspect and manage cached model files with the `huggingface-cli` tool.
License
Phi-3-mini-4k-instruct license - Phi-3
Acknowledgements
We thank the Microsoft team for developing and releasing the original Phi-3.1-mini-4k-instruct model. Special thanks to Georgi Gerganov and the entire llama.cpp development team for their outstanding contributions.
Contact
For any inquiries or support, please contact us at [email protected] or visit our support page.