
Model Performance Testing Methodology

This document outlines the methodology used for testing various LLM models through Ollama on a GPU Poor setup.

Hardware Specifications

GPU

  • Model: AMD Radeon RX 7600 XT 16GB
  • Note: Currently the most affordable (GPU-poorest) graphics card with 16GB VRAM on the market, making it an excellent choice for budget-conscious AI enthusiasts

System Specifications

  • CPU: AMD Ryzen 7 5700X (8 cores / 16 threads) @ 4.66 GHz
  • Motherboard: B550 Pro4
  • RAM: 64GB
  • OS: Debian 12 Bookworm
  • Kernel: Linux 6.8.12-8
  • Testing Environment: Ollama with ROCm backend

Testing Methodology

Each model is tested using a consistent creative writing prompt designed to evaluate both the model's performance and its creative capabilities. The testing process includes the following steps (a minimal sketch of the flow follows the list):

  1. Model Loading: Each model is loaded fresh before testing
  2. Initial Warmup: A small test prompt is run to ensure the model is properly loaded
  3. Main Test: A comprehensive creative writing prompt is processed
  4. Performance Metrics Collection: Various metrics are gathered during generation
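
A minimal sketch of this flow against Ollama's HTTP API (the endpoint is Ollama's default; the helper names, warmup prompt, and `TEST_PROMPT` placeholder are illustrative, not the arena's actual script):

```python
# Minimal sketch of the per-model test flow, assuming a local Ollama
# server on the default port. Helper names are illustrative.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def generate(model: str, prompt: str, num_predict: int) -> dict:
    """Send a non-streaming generation request and return the raw response."""
    resp = requests.post(OLLAMA_URL, json={
        "model": model,
        "prompt": prompt,
        "stream": False,
        # Full generation options are shown under "Testing Parameters".
        "options": {"num_predict": num_predict},
    })
    resp.raise_for_status()
    return resp.json()

def test_model(model: str, test_prompt: str) -> dict:
    # Steps 1-2: loading + warmup. A tiny prompt forces Ollama to load the model.
    generate(model, "Hello", num_predict=8)
    # Step 3: main test with the full creative writing prompt.
    result = generate(model, test_prompt, num_predict=1000)
    # Step 4: metrics are read from the response (see "Metrics Collected").
    return result
```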

Test Prompt

The following creative writing prompt is used to test all models:

You are a creative writing assistant. Write a short story about a futuristic city where:
1. The city is powered by a mysterious energy source
2. The inhabitants have developed unique abilities
3. There's a hidden conflict between different factions
4. The protagonist discovers a shocking truth about the city's origins

Make the story engaging and include vivid descriptions of the city's architecture and technology.

This prompt was chosen because it:

  • Requires creative thinking and complex reasoning
  • Generates substantial output (typically 500-1000 tokens)
  • Tests both context understanding and generation capabilities
  • Produces outputs of consistent length for fair comparison

Metrics Collected

For each model, we collect and analyze the following (a sketch of how the throughput figures can be derived from Ollama's response appears after this list):

  1. Performance Metrics:
     • Tokens per second (overall)
     • Generation tokens per second
     • Total response time
     • Total tokens generated
  2. Resource Usage:
     • VRAM usage
     • Model size
     • Parameter count
  3. Model Information:
     • Quantization level
     • Model format
     • Model family
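
Ollama's non-streaming /api/generate response reports token counts and nanosecond durations, from which the throughput figures can be derived. A minimal sketch, assuming "overall" means all tokens over total wall time and "generation" means output tokens over the generation phase only (the field names are Ollama's; the summary keys are illustrative):

```python
def extract_metrics(result: dict) -> dict:
    """Derive throughput metrics from a non-streaming Ollama response.

    Ollama reports all durations in nanoseconds.
    """
    total_s = result["total_duration"] / 1e9   # includes load + prompt eval
    gen_s = result["eval_duration"] / 1e9      # generation phase only
    prompt_tokens = result.get("prompt_eval_count", 0)  # may be absent if cached
    gen_tokens = result["eval_count"]
    return {
        "total_response_time_s": total_s,
        "total_tokens": prompt_tokens + gen_tokens,
        "tokens_per_second_overall": (prompt_tokens + gen_tokens) / total_s,
        "generation_tokens_per_second": gen_tokens / gen_s,
    }
```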

Testing Parameters

All tests are run with consistent generation parameters (the corresponding Ollama options are sketched after this list):

  • Temperature: 0.7
  • Top P: 0.9
  • Top K: 40
  • Max Tokens: 1000
  • Repetition Penalty: 1.0
  • Seed: 42 (for reproducibility)
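
These map onto Ollama's generation options as sketched below (the option names are Ollama's; "Max Tokens" corresponds to num_predict and "Repetition Penalty" to repeat_penalty):

```python
# Generation options matching the parameters above, in Ollama's naming.
GENERATION_OPTIONS = {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "num_predict": 1000,    # Max Tokens
    "repeat_penalty": 1.0,  # Repetition Penalty
    "seed": 42,             # fixed seed for reproducibility
}

# Passed with every request, e.g.:
# requests.post(OLLAMA_URL, json={"model": model, "prompt": prompt,
#                                 "stream": False, "options": GENERATION_OPTIONS})
```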

Notes

  • Tests are run sequentially to ensure no resource contention
  • A 3-second cooldown period is maintained between tests
  • Models are unloaded after each test to ensure a clean state (see the sketch below)
  • Results are saved both in detailed and summary formats
  • The testing script automatically handles model pulling and cleanup
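
A minimal sketch of the loop implied by these notes, reusing test_model() and TEST_PROMPT from the earlier sketch; the model list and output file naming are illustrative, while sending an empty prompt with keep_alive=0 is Ollama's documented way to unload a model immediately:

```python
# Sequential test loop with cooldown and unloading.
import json
import time

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
MODELS = ["llama3.2:3b", "qwen2.5:7b"]  # example tags, not the full arena list

for model in MODELS:
    result = test_model(model, TEST_PROMPT)  # from the earlier sketch
    # Save the raw response; a summary could be derived with extract_metrics().
    with open(f"results_{model.replace(':', '_')}.json", "w") as f:
        json.dump(result, f, indent=2)
    # Unload the model so the next test starts from a clean state.
    requests.post(OLLAMA_URL, json={"model": model, "prompt": "", "keep_alive": 0})
    time.sleep(3)  # 3-second cooldown between tests
```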