🦄 Upload NPU+iGPU unicorn-execution-engine-models model
- README.md +14 -0
- model_placeholder.txt +42 -0
README.md
ADDED
@@ -0,0 +1,14 @@
---
tags:
- unicorn-execution-engine
- npu
- igpu
- framework
- documentation
---

# 🦄 Unicorn Execution Engine Model Collection

Hardware Requirements: NPU Phoenix + AMD Radeon 780M
Size: 0.001GB
Framework: documentation
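
Before pulling any models, it can help to confirm this hardware is actually visible to the OS. The sketch below is a rough Linux-only check; the device paths (`/dev/accel/accel0` for the amdxdna NPU driver, `/dev/kfd` for ROCm compute) are assumptions about a typical driver setup, not requirements stated by this repository:

```python
# Rough Linux check that the Phoenix NPU and the ROCm iGPU stack are visible.
# Device paths are assumptions about common driver setups (amdxdna, ROCm);
# adjust for your system.
import os

def have_npu() -> bool:
    return os.path.exists("/dev/accel/accel0")  # amdxdna accel device

def have_igpu() -> bool:
    return os.path.exists("/dev/kfd")           # ROCm compute interface

if __name__ == "__main__":
    print(f"NPU Phoenix visible: {have_npu()}")
    print(f"Radeon 780M (ROCm):  {have_igpu()}")
```
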
model_placeholder.txt
ADDED
@@ -0,0 +1,42 @@
# Model Placeholder

This repository is ready to host optimized model variants for the Unicorn Execution Engine.

## Planned Model Files

### Gemma 3n E2B Variants
- `gemma3n-e2b-fp16-npu.safetensors` (MatFormer FP16 optimized)
- `gemma3n-e2b-int8-npu.safetensors` (MatFormer INT8 quantized)
- `gemma3n-e2b-config.json` (Model configuration)
- `gemma3n-e2b-tokenizer.json` (Tokenizer configuration)

### Qwen2.5-7B Variants
- `qwen25-7b-fp16-hybrid.safetensors` (Hybrid execution FP16)
- `qwen25-7b-int8-hybrid.safetensors` (Hybrid execution INT8)
- `qwen25-7b-config.json` (Model configuration)
- `qwen25-7b-tokenizer.json` (Tokenizer configuration)

### NPU Optimization Files
- `npu_attention_kernels.mlir` (MLIR-AIE kernels)
- `igpu_optimization_configs.json` (ROCm configurations)
- `performance_profiles.json` (Turbo mode profiles)
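
As a hedged illustration of how these variants would be consumed once uploaded, the sketch below opens one planned file pair with the standard `safetensors` and `json` libraries. The file names come from the lists above; the loading pattern is generic, not a Unicorn Execution Engine API.

```python
# Minimal loading sketch for a planned model variant (files not yet uploaded).
# Uses the standard safetensors API; nothing here is engine-specific.
import json
from safetensors import safe_open

CONFIG = "gemma3n-e2b-config.json"           # planned file from the list above
WEIGHTS = "gemma3n-e2b-fp16-npu.safetensors" # planned file from the list above

with open(CONFIG) as f:
    config = json.load(f)                    # model hyperparameters

with safe_open(WEIGHTS, framework="pt", device="cpu") as st:
    names = st.keys()                        # tensor names in the checkpoint
    tensor = st.get_tensor(names[0])
    print(f"{len(names)} tensors; '{names[0]}' shape {tuple(tensor.shape)}")
```
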

## Model Sizes (Estimated)
- **Gemma 3n E2B FP16**: ~4GB
- **Gemma 3n E2B INT8**: ~2GB
- **Qwen2.5-7B FP16**: ~14GB
- **Qwen2.5-7B INT8**: ~7GB
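
These estimates follow straightforward bytes-per-parameter arithmetic: roughly 2 bytes per weight at FP16 and 1 byte at INT8, ignoring embeddings and file overhead. A quick sanity check, assuming ~2B effective parameters for Gemma 3n E2B and ~7B for Qwen2.5-7B (the parameter counts are assumptions, not values from this repository):

```python
# Back-of-envelope size check: parameter count x bytes per weight.
# Parameter counts (~2B effective, ~7B) are assumptions; real files also
# carry headers, embeddings, and any layers kept at higher precision.
BYTES_PER_WEIGHT = {"FP16": 2, "INT8": 1}

for name, billions in [("Gemma 3n E2B", 2.0), ("Qwen2.5-7B", 7.0)]:
    for precision, nbytes in BYTES_PER_WEIGHT.items():
        gb = billions * nbytes               # 1e9 params x bytes / 1e9 = GB
        print(f"{name} {precision}: ~{gb:.0f}GB")
```
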

## Performance Targets
- **Gemma 3n E2B**: 100+ TPS with turbo mode
- **Qwen2.5-7B**: 60+ TPS with hybrid execution
- **Memory Usage**: <10GB total system budget
- **Latency**: <30ms time to first token
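
Once real weights land, targets like these can be checked with plain timers around any streaming generate loop. In the sketch below, `generate_stream` is a hypothetical stand-in for whatever token-streaming call the engine ends up exposing:

```python
# Measure time-to-first-token (TTFT) and tokens per second (TPS).
# `generate_stream` is a hypothetical placeholder for the engine's real
# streaming API; it should yield one token at a time.
import time

def benchmark(generate_stream, prompt: str) -> None:
    start = time.perf_counter()
    ttft = None
    tokens = 0
    for _ in generate_stream(prompt):
        if ttft is None:
            ttft = time.perf_counter() - start  # first-token latency
        tokens += 1
    elapsed = time.perf_counter() - start
    assert tokens > 0 and ttft is not None, "generator produced no tokens"
    print(f"TTFT: {ttft * 1e3:.1f} ms")          # target: <30 ms
    print(f"TPS:  {tokens / elapsed:.1f}")       # targets: 100+ / 60+
```
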

To create actual optimized models, run the Unicorn Execution Engine quantization pipeline:

```bash
cd Unicorn-Execution-Engine
python quantization_engine.py --model gemma3n-e2b --precision fp16 --target npu
python quantization_engine.py --model qwen25-7b --precision int8 --target hybrid
```