This repository is a quantized version of the original model microsoft/Phi-3.5-MoE-instruct which is the FP16 half-precision official version released by Microsoft.

Model Summary

Phi-3.5-MoE is a lightweight, state-of-the-art open model built upon datasets used for Phi-3 - synthetic data and filtered publicly available documents - with a focus on very high-quality, reasoning dense data. The model supports multilingual and comes with 128K context length (in tokens). The model underwent a rigorous enhancement process, incorporating supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures.

🏑 Phi-3 Portal
πŸ“° Phi-3 Microsoft Blog
πŸ“– Phi-3 Technical Report
πŸ‘©β€πŸ³ Phi-3 Cookbook
πŸ–₯️ Try It

MoE references: πŸ“œPhi-3.5-MoE Blog | 😁GRIN MoE

Phi-3.5: [mini-instruct]; [MoE-instruct] ; [vision-instruct]

Running πŸƒ

TGI

model=danieldk/Phi-3.5-MoE-instruct-AWQ-INT4
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:2.4.0 \
    --model-id $model --num-shard 2

Quantization Reproduction

Soon (need to upstream an AutoAWQ patch).

Downloads last month
5
Safetensors
Model size
5.83B params
Tensor type
I32
Β·
F16
Β·
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support