# Model Performance Testing Methodology
This document outlines the methodology used for testing various LLMs through Ollama on a "GPU-poor" setup.
## Hardware Specifications
### GPU
- Model: AMD Radeon RX 7600 XT 16GB
- Note: Currently the most affordable (GPU-poorest) graphics card with 16GB VRAM on the market, making it an excellent choice for budget-conscious AI enthusiasts
### System Specifications
- CPU: AMD Ryzen 7 5700X (8 cores / 16 threads) @ 4.66 GHz
- Motherboard: B550 Pro4
- RAM: 64GB
- OS: Debian 12 Bookworm
- Kernel: Linux 6.8.12-8
- Testing Environment: Ollama with ROCm backend
## Testing Methodology
Each model is tested using a consistent creative writing prompt designed to evaluate both the model's performance and creative capabilities. The testing process includes:
1. Model Loading: Each model is loaded fresh before testing
2. Initial Warmup: A small test prompt is run to ensure the model is properly loaded
3. Main Test: A comprehensive creative writing prompt is processed
4. Performance Metrics Collection: Various metrics are gathered during generation
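The steps above can be sketched against Ollama's HTTP API. This is a minimal sketch assuming the default endpoint on `localhost:11434`; the helper names are illustrative, not taken from the actual testing script:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_request(model: str, prompt: str, options: dict) -> dict:
    """Assemble the JSON payload for /api/generate (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False, "options": options}

def run_prompt(model: str, prompt: str, options: dict) -> dict:
    """Send one prompt and return the full response, which carries the
    timing fields (eval_count, eval_duration, ...) used for metrics."""
    data = json.dumps(build_request(model, prompt, options)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The warmup in step 2 is then just a `run_prompt(model, "Hello", options)` call before the main creative-writing prompt.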
### Test Prompt
The following creative writing prompt is used to test all models:
```
You are a creative writing assistant. Write a short story about a futuristic city where:
1. The city is powered by a mysterious energy source
2. The inhabitants have developed unique abilities
3. There's a hidden conflict between different factions
4. The protagonist discovers a shocking truth about the city's origins
Make the story engaging and include vivid descriptions of the city's architecture and technology.
```
This prompt was chosen because it:
- Requires creative thinking and complex reasoning
- Generates substantial output (typically 500-1000 tokens)
- Tests both context understanding and generation capabilities
- Produces consistent length outputs for fair comparison
## Metrics Collected
For each model, we collect and analyze:
1. Performance Metrics:
- Tokens per second (overall)
- Generation tokens per second
- Total response time
- Total tokens generated
2. Resource Usage:
- VRAM usage
- Model size
- Parameter count
3. Model Information:
- Quantization level
- Model format
- Model family
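As a rough sketch of how the throughput figures above can be derived from an `/api/generate` response: Ollama reports its durations in nanoseconds, and the field names below follow its API, but the helper itself is illustrative rather than the script's actual code.

```python
def throughput_metrics(resp: dict) -> dict:
    """Turn the raw timing fields of an /api/generate response into the
    performance metrics listed above. Durations arrive in nanoseconds."""
    ns = 1e9
    total_tokens = resp.get("prompt_eval_count", 0) + resp["eval_count"]
    return {
        "generation_tps": resp["eval_count"] / (resp["eval_duration"] / ns),
        "overall_tps": total_tokens / (resp["total_duration"] / ns),
        "total_tokens": total_tokens,
        "total_seconds": resp["total_duration"] / ns,
    }
```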
## Testing Parameters
All tests are run with consistent generation parameters:
- Temperature: 0.7
- Top P: 0.9
- Top K: 40
- Max Tokens: 1000
- Repetition Penalty: 1.0
- Seed: 42 (for reproducibility)
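In Ollama's option naming, these parameters map to the following `options` dictionary (`num_predict` is Ollama's name for the max-token limit); a sketch:

```python
# Generation parameters in Ollama's option naming.
GENERATION_OPTIONS = {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "num_predict": 1000,   # max tokens to generate
    "repeat_penalty": 1.0,
    "seed": 42,            # fixed seed for reproducibility
}
```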
## Notes
- Tests are run sequentially to ensure no resource contention
- A 3-second cooldown period is maintained between tests
- Models are unloaded after each test to ensure clean state
- Results are saved both in detailed and summary formats
- The testing script automatically handles model pulling and cleanup
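The sequential run with cooldown can be sketched as follows; the `run_test` callable is a placeholder for the per-model test described above, and the real script additionally handles model pulling and unloading:

```python
import time

def run_benchmark(models, run_test, cooldown_s=3):
    """Test each model in sequence, pausing between runs so results
    are not skewed by resource contention."""
    results = {}
    for i, model in enumerate(models):
        results[model] = run_test(model)
        if i < len(models) - 1:   # no cooldown needed after the last model
            time.sleep(cooldown_s)
    return results
```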