Commit 8c53520 by Muennighoff
Parent(s) (1): 4cb792b
Update README.md
README.md
CHANGED
@@ -217,25 +217,25 @@ The performance may vary depending on the prompt. For BLOOMZ models, we recommen
 
 ## Model
 
-- Architecture
-- Finetuning steps
-- Finetuning tokens
-- Finetuning layout
-- Precision
+- **Architecture:** Same as [bloom](https://huggingface.co/bigscience/bloom), also refer to the `config.json` file
+- **Finetuning steps:** 498
+- **Finetuning tokens:** 2.09 billion
+- **Finetuning layout:** 72x pipeline parallel, 1x tensor parallel, 4x data parallel
+- **Precision:** bfloat16
 
 ## Hardware
 
--
-- 8 GPUs per node using NVLink 4 inter-gpu connects, 4 OmniPath links
-- NCCL-communications network
-
+- **CPUs:** AMD CPUs with 512GB memory per node
+- **GPUs:** 288 A100 80GB GPUs (36 nodes) with 8 GPUs per node using NVLink 4 inter-gpu connects, 4 OmniPath links
+- **Communication:** NCCL-communications network with a fully dedicated subnet
+
 
 ## Software
 
-- [Megatron-DeepSpeed](https://github.com/bigscience-workshop/Megatron-DeepSpeed)
-- [DeepSpeed](https://github.com/microsoft/DeepSpeed)
-- [PyTorch](https://github.com/pytorch/pytorch) (pytorch-1.11 w/ CUDA-11.5)
-- [apex](https://github.com/NVIDIA/apex)
+- **Orchestration:** [Megatron-DeepSpeed](https://github.com/bigscience-workshop/Megatron-DeepSpeed)
+- **Optimizer & parallelism:** [DeepSpeed](https://github.com/microsoft/DeepSpeed)
+- **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch) (pytorch-1.11 w/ CUDA-11.5)
+- **FP16 if applicable:** [apex](https://github.com/NVIDIA/apex)
 
 # Evaluation
 
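The figures added in this commit can be cross-checked with a little arithmetic. The sketch below is not part of the README. It only uses the numbers stated in the added lines and assumes that the GPU count of a layout is the product of its pipeline, tensor, and data parallel degrees. The tokens-per-step value is a derived average that assumes every step processed the same number of tokens, which the README does not state. All variable and function names are hypothetical.

```python
# Figures copied from the lines added to README.md in this commit.
PIPELINE_PARALLEL = 72
TENSOR_PARALLEL = 1
DATA_PARALLEL = 4
NODES = 36
GPUS_PER_NODE = 8
STATED_TOTAL_GPUS = 288
FINETUNING_STEPS = 498
FINETUNING_TOKENS = 2.09e9  # "2.09 billion"


def gpus_from_layout(pipeline: int, tensor: int, data: int) -> int:
    """GPUs implied by a 3D-parallel layout (assumption: product of the three degrees)."""
    return pipeline * tensor * data


layout_gpus = gpus_from_layout(PIPELINE_PARALLEL, TENSOR_PARALLEL, DATA_PARALLEL)
hardware_gpus = NODES * GPUS_PER_NODE

# Average only; assumes a uniform number of tokens per step.
avg_tokens_per_step = FINETUNING_TOKENS / FINETUNING_STEPS

print(f"GPUs implied by layout:   {layout_gpus}")
print(f"GPUs implied by hardware: {hardware_gpus}")
print(f"Stated GPU count:         {STATED_TOTAL_GPUS}")
print(f"Average tokens per step:  {avg_tokens_per_step:,.0f}")
```

Under these assumptions the layout and the hardware description both come out to the stated 288 GPUs, and the average works out to roughly 4.2 million tokens per finetuning step.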