README.md CHANGED

@@ -69,7 +69,7 @@ python app.py
 - Fine-tuning: 2.5x (moderate overhead)
 
 ### Node Calculation
-- **H100
+- **H100 Node**: 8 × H100 GPUs per node = 640GB HBM3 total (576GB usable per node)
 - **Model Parallelism**: Automatic consideration for large models
 - **Memory Efficiency**: Optimal distribution across nodes
 
@@ -78,9 +78,9 @@ python app.py
 | Model | Tokens (In/Out) | Batch Size | Use Case | Precision | Estimated Nodes |
 |-------|----------------|------------|----------|-----------|----------------|
 | LLaMA-3-8B | 2048/512 | 1 | Inference | FP16 | 1 |
-| LLaMA-3-70B | 4096/1024 | 4 | Inference | FP16 |
-| Qwen2.5-72B | 8192/2048 | 2 | Fine-tuning | BF16 |
-| Nemotron-4-340B | 2048/1024 | 1 | Inference | INT8 |
+| LLaMA-3-70B | 4096/1024 | 4 | Inference | FP16 | 1 |
+| Qwen2.5-72B | 8192/2048 | 2 | Fine-tuning | BF16 | 1 |
+| Nemotron-4-340B | 2048/1024 | 1 | Inference | INT8 | 1-2 |
 
 ## CUDA Recommendations
 
@@ -139,6 +139,8 @@ This project is licensed under the MIT License - see the LICENSE file for detail
 
 ## Notes
 
+- **Node Configuration**: Each H100 node contains 8 × H100 GPUs (640GB total memory)
 - For production deployments, consider adding a 10-20% buffer to estimates
 - Network bandwidth and storage requirements are not included in calculations
 - Estimates assume optimal memory layout and efficient implementations
+- Multi-node setups require high-speed interconnects (InfiniBand/NVLink) for optimal performance
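The estimated-node column can be sanity-checked against the sizing formula app.py applies (model weights plus KV cache, scaled by a use-case overhead, divided by the usable memory per node). Below is a minimal sketch of that arithmetic for the LLaMA-3-70B row; the per-token KV-cache size is an assumption, since app.py's KV-cache formula is not visible in this diff, so the numbers are illustrative only.

```python
import math

# Constants mirrored from the app.py hunks below
H100_MEMORY_GB = 80                                          # memory per GPU
H100_GPUS_PER_NODE = 8                                       # GPUs per node
H100_NODE_MEMORY_GB = H100_MEMORY_GB * H100_GPUS_PER_NODE    # 640 GB per node
USABLE_PER_NODE_GB = H100_NODE_MEMORY_GB * 0.9               # 576 GB usable (10% reserved)

# LLaMA-3-70B row from the table; the KV-cache size per token is an
# assumption, not a value taken from app.py
params_billion = 70
bytes_per_param = 2                  # FP16 weights
kv_gb_per_token = 0.00032            # ~0.32 MB/token, rough FP16 KV-cache figure for a 70B model
tokens = 4096 + 1024                 # input + output tokens
batch_size = 4
overhead = 1.2                       # default inference overhead from app.py

model_memory_gb = params_billion * bytes_per_param     # ~140 GB of weights
kv_cache_gb = kv_gb_per_token * tokens * batch_size    # ~6.6 GB of KV cache
total_gb = (model_memory_gb + kv_cache_gb) * overhead  # ~176 GB

nodes_needed = max(1, math.ceil(total_gb / USABLE_PER_NODE_GB))
print(f"{total_gb:.0f} GB required -> {nodes_needed} H100 node(s)")  # ~176 GB -> 1 node
```

At FP16 the 70B weights alone are roughly 140 GB, comfortably inside the 576 GB usable per node, which is why the table lists a single node even at batch size 4.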
app.py CHANGED

@@ -30,7 +30,9 @@ MODEL_SPECS = {
 }
 
 # H100 specifications
-H100_MEMORY_GB = 80
+H100_MEMORY_GB = 80  # Memory per GPU
+H100_GPUS_PER_NODE = 8  # GPUs per node
+H100_NODE_MEMORY_GB = H100_MEMORY_GB * H100_GPUS_PER_NODE  # 640GB per node
 H100_COMPUTE_CAPABILITY = "9.0"
 
 # CUDA version recommendations based on model and use case
@@ -120,7 +122,7 @@ def estimate_h100_nodes(
     total_memory_per_instance = (model_memory + kv_cache_memory) * overhead_multiplier.get(use_case, 1.2)
 
     # Calculate nodes needed
-    memory_per_node =
+    memory_per_node = H100_NODE_MEMORY_GB * 0.9  # Reserve 10% for system (576GB usable per node)
     nodes_needed = max(1, int(np.ceil(total_memory_per_instance / memory_per_node)))
 
     # For very large models, consider model parallelism
@@ -138,9 +140,10 @@ def estimate_h100_nodes(
 • **KV Cache Memory**: {kv_cache_memory:.1f} GB (for {total_tokens:,} tokens × {batch_size} batch size)
 • **Use Case Overhead**: {overhead_multiplier.get(use_case, 1.2):.1f}x ({use_case})
 • **Total Memory Required**: {total_memory_per_instance:.1f} GB
-• **H100
+• **H100 Node Specs**: {H100_GPUS_PER_NODE} × {H100_MEMORY_GB}GB = {H100_NODE_MEMORY_GB}GB per node
+• **Usable Memory**: {memory_per_node:.1f} GB per node (10% reserved)
 
-**Recommendation**: {nodes_needed} H100 node(s)
+**Recommendation**: {nodes_needed} H100 node(s) ({nodes_needed * H100_GPUS_PER_NODE} H100 GPUs total)
 """
 
     breakdown = {
@@ -167,6 +170,7 @@ def get_cuda_recommendation(use_case: str) -> str:
 **Additional Requirements:**
 • **Driver Version**: 525.60.13+ (Linux) / 527.41+ (Windows)
 • **Compute Capability**: {H100_COMPUTE_CAPABILITY} (H100 native)
+• **Node Configuration**: {H100_GPUS_PER_NODE} × H100 GPUs per node ({H100_NODE_MEMORY_GB}GB total)
 • **Memory**: ECC enabled recommended for production
 """
 
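Taken together, the new constants make the node count a direct function of the memory estimate. The sketch below is a minimal, self-contained rendering of that step; `nodes_for` is a hypothetical helper name, and the totals fed to it are placeholders rather than outputs of app.py.

```python
import numpy as np

# Constants introduced by this change (mirrored from app.py)
H100_MEMORY_GB = 80                                          # memory per GPU
H100_GPUS_PER_NODE = 8                                       # GPUs per node
H100_NODE_MEMORY_GB = H100_MEMORY_GB * H100_GPUS_PER_NODE    # 640 GB per node

def nodes_for(total_memory_gb: float) -> int:
    """Map a total memory requirement (GB) to a whole number of H100 nodes."""
    # Reserve 10% of each node for the system, as in the patched estimate_h100_nodes()
    memory_per_node = H100_NODE_MEMORY_GB * 0.9              # 576 GB usable per node
    return max(1, int(np.ceil(total_memory_gb / memory_per_node)))

# Placeholder totals in GB, illustrative only
for total_gb in (176.0, 700.0):
    nodes = nodes_for(total_gb)
    print(f"{total_gb:.0f} GB -> {nodes} node(s) ({nodes * H100_GPUS_PER_NODE} H100 GPUs)")
```

Anything above 576 GB spills to a second node; the separate model-parallelism adjustment hinted at in estimate_h100_nodes() is not reproduced here.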