Commit 53f0739 (1 parent: 31098a9)
Committed by VictorSanh
Update readme and doc from the 80b repo

README.md CHANGED
@@ -305,11 +305,15 @@ Similarly to the base IDEFICS models, we performed checkpoint selection to stop
 
 ## Hardware
 
-The IDEFICS models were trained on an AWS SageMaker cluster
+The IDEFICS models were trained on an AWS SageMaker cluster with 8x80GB A100 GPUs nodes and EFA network.
+
+- IDEFICS-80B took ~28 days of training on 64 nodes (512 GPUs).
+- IDEFICS-80b-instruct finetuned the base model for ~3 days on 48 nodes (384 GPUs).
+
 
 ## Software
 
-The training software is built on top of HuggingFace Transformers + Accelerate, and DeepSpeed ZeRO-3 for training, and [WebDataset](https://github.com/webdataset/webdataset) for data loading.
+The training software is built on top of HuggingFace Transformers + Accelerate, and [DeepSpeed ZeRO-3](https://github.com/microsoft/DeepSpeed) for training, and [WebDataset](https://github.com/webdataset/webdataset) for data loading.
 
 
 # Bias, Risks, and Limitations
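Review note: the node counts and durations added to the Hardware section can be sanity-checked with a little arithmetic. The sketch below is only an illustration. It assumes 8 GPUs per node (from the "8x80GB A100" wording) and that every node was in use for the full quoted duration, which the README does not state. The helper name `gpu_hours` and the `runs` table are hypothetical, and the resulting GPU-hour totals are rough estimates, not figures reported by the authors.

```python
# Rough sanity check of the figures quoted in the README change.
# Assumptions (not stated in the source): 8 GPUs per node, and all nodes
# busy for the whole quoted duration. Durations are approximate ("~").
GPUS_PER_NODE = 8

# Hypothetical table built from the two bullet points in the diff.
runs = {
    "IDEFICS-80B": {"days": 28, "nodes": 64},
    "IDEFICS-80b-instruct": {"days": 3, "nodes": 48},
}


def gpu_hours(days, nodes, gpus_per_node=GPUS_PER_NODE):
    """Upper-bound estimate: days * 24 hours * total GPU count."""
    return days * 24 * nodes * gpus_per_node


for name, run in runs.items():
    total_gpus = run["nodes"] * GPUS_PER_NODE
    print(f"{name}: {total_gpus} GPUs, ~{gpu_hours(**run):,} GPU-hours")
```

The GPU counts computed here (64 x 8 and 48 x 8) agree with the 512 and 384 GPUs quoted in the diff.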