magicunicorn committed
Commit 2c74cf2 · verified · 1 Parent(s): 12aaae2

🦄 Upload NPU+iGPU unicorn-execution-engine-models model

Files changed (2)
  1. README.md +14 -0
  2. model_placeholder.txt +42 -0
README.md ADDED
@@ -0,0 +1,14 @@
+ ---
+ tags:
+ - unicorn-execution-engine
+ - npu
+ - igpu
+ - framework
+ - documentation
+ ---
+
+ # 🦄 Unicorn Execution Engine Model Collection
+
+ Hardware Requirements: NPU Phoenix + AMD Radeon 780M
+ Size: 0.001GB
+ Framework: documentation
model_placeholder.txt ADDED
@@ -0,0 +1,42 @@
+ # Model Placeholder
+
+ This repository is ready to host optimized model variants for the Unicorn Execution Engine.
+
+ ## Planned Model Files
+
+ ### Gemma 3n E2B Variants
+ - `gemma3n-e2b-fp16-npu.safetensors` (MatFormer FP16 optimized)
+ - `gemma3n-e2b-int8-npu.safetensors` (MatFormer INT8 quantized)
+ - `gemma3n-e2b-config.json` (Model configuration)
+ - `gemma3n-e2b-tokenizer.json` (Tokenizer configuration)
+
+ ### Qwen2.5-7B Variants
+ - `qwen25-7b-fp16-hybrid.safetensors` (Hybrid execution FP16)
+ - `qwen25-7b-int8-hybrid.safetensors` (Hybrid execution INT8)
+ - `qwen25-7b-config.json` (Model configuration)
+ - `qwen25-7b-tokenizer.json` (Tokenizer configuration)
+
+ ### NPU Optimization Files
+ - `npu_attention_kernels.mlir` (MLIR-AIE kernels)
+ - `igpu_optimization_configs.json` (ROCm configurations)
+ - `performance_profiles.json` (Turbo mode profiles)
+
+ ## Model Sizes (Estimated)
+ - **Gemma 3n E2B FP16**: ~4GB
+ - **Gemma 3n E2B INT8**: ~2GB
+ - **Qwen2.5-7B FP16**: ~14GB
+ - **Qwen2.5-7B INT8**: ~7GB
+
+ ## Performance Targets
+ - **Gemma 3n E2B**: 100+ TPS with turbo mode
+ - **Qwen2.5-7B**: 60+ TPS with hybrid execution
+ - **Memory Usage**: <10GB total system budget
+ - **Latency**: <30ms time to first token
+
+ To create actual optimized models, run the Unicorn Execution Engine quantization pipeline:
+
+ ```bash
+ cd Unicorn-Execution-Engine
+ python quantization_engine.py --model gemma3n-e2b --precision fp16 --target npu
+ python quantization_engine.py --model qwen25-7b --precision int8 --target hybrid
+ ```
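
Once the pipeline has produced and uploaded the planned artifacts, a quick header check can confirm that a file matches the size and precision estimates above before it is wired into the engine. The sketch below is hypothetical: the repo id is inferred from the commit message, the filename is taken from the planned list in `model_placeholder.txt`, and neither is guaranteed to exist yet.

```python
# Hypothetical sketch -- not part of this commit. The repo id is inferred
# from the commit message and the filename is a planned artifact from
# model_placeholder.txt; neither may exist yet.
from huggingface_hub import hf_hub_download
from safetensors import safe_open

path = hf_hub_download(
    repo_id="magicunicorn/unicorn-execution-engine-models",  # assumed repo id
    filename="gemma3n-e2b-int8-npu.safetensors",             # planned file
)

# Iterate tensor by tensor so peak memory stays near the largest single
# tensor rather than the full checkpoint.
total_bytes = 0
n_tensors = 0
with safe_open(path, framework="pt") as f:
    for name in f.keys():
        t = f.get_tensor(name)
        total_bytes += t.numel() * t.element_size()
        n_tensors += 1

print(f"{n_tensors} tensors, {total_bytes / 1e9:.2f} GB")
```

Comparing the reported total against the estimates above (~2GB for the INT8 E2B variant, roughly one byte per parameter; about two bytes per parameter for FP16) is a cheap signal that quantization ran at the intended precision.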