# Phi4 MoE 2x14B Instruct
A Mixture of Experts (MoE) model built from two Phi-4 14B Instruct experts.

- 14.2B parameters (4-bit quantization with bitsandbytes; see the loading sketch below)
- BF16-U8 (Dynamic Quants by Unsloth, using bnb-4bit)
- Architecture: Phi-4 (Phi-3 / Llama lineage)
- Instruct-tuned
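
The quantization setup above can be reproduced at load time. The sketch below is a minimal example assuming the standard `transformers` + `bitsandbytes` stack; the repository id is a placeholder, and NF4 with BF16 compute is an assumed configuration consistent with the bnb-4bit / BF16 notes, not a confirmed detail of this release.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Placeholder repo id -- substitute the actual Hugging Face path of this model.
MODEL_ID = "your-org/phi4-moe-2x14b-instruct"

# 4-bit quantization via bitsandbytes; NF4 + BF16 compute is an assumed
# configuration consistent with the bnb-4bit / BF16 notes in this card.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",  # shard across available GPUs / offload as needed
)
```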
## Model Summary

| | |
|---|---|
| **Developers** | Microsoft Research |
| **Description** | phi-4 is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public-domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small, capable models were trained with data focused on high quality and advanced reasoning. phi-4 underwent a rigorous enhancement and alignment process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures. |
| **Architecture** | 14B parameters, dense decoder-only Transformer model |
| **Inputs** | Text, best suited for prompts in the chat format |
| **Context length** | 16K tokens |
| **GPUs** | 1920 H100-80G |
| **Training time** | 21 days |
| **Training data** | 9.8T tokens |
| **Outputs** | Generated text in response to input |
| **Dates** | October 2024 – November 2024 |
| **Status** | Static model trained on an offline dataset with cutoff dates of June 2024 and earlier for publicly available data |
| **Release date** | December 12, 2024 |
| **License** | MIT |
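
Since the model is best suited to chat-format prompts (see the Inputs row above), the sketch below, continuing from the loading example, shows one way to run inference. The chat structure is taken from the tokenizer's own template rather than hand-written special tokens, and the message contents are purely illustrative.

```python
# Build a chat-format prompt; apply_chat_template renders the messages with
# the template shipped in the tokenizer, so no special tokens are hard-coded.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize mixture-of-experts routing in two sentences."},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# 16K-token context per the table above; keep prompt + generation within that budget.
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```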