[email protected] committed
Commit 07a7f16 · Parent(s): 5a57d92
Add training code link
README.md CHANGED
@@ -34,7 +34,7 @@ In this repo, we are open-sourcing NVLM-1.0-D-72B (decoder-only architecture), t
 
 
 ## Reference(s)
-[Paper](https://arxiv.org/abs/2409.11402)   [Inference Code (HF)](https://huggingface.co/nvidia/NVLM-D-72B/tree/main)   [Training Code
+[Paper](https://arxiv.org/abs/2409.11402)   [Inference Code (HF)](https://huggingface.co/nvidia/NVLM-D-72B/tree/main)   [Training Code](https://github.com/NVIDIA/Megatron-LM/tree/NVLM-1.0/examples/multimodal/nvlm)   [Website](https://research.nvidia.com/labs/adlr/NVLM-1/)
 
 ## Benchmark Results
 We train our model with legacy [Megatron-LM](https://github.com/NVIDIA/Megatron-LM/tree/main/megatron/legacy) and adapt the codebase to Huggingface for model hosting, reproducibility, and inference.
@@ -103,7 +103,7 @@ Results (as of September 17th, 2024) in the multimodal benchmarks are as follows
 When converting Megatron checkpoint to Huggingface, we adapt [InternVL codebase](https://huggingface.co/OpenGVLab/InternVL2-Llama3-76B) to support model loading and multi-GPU inference in HF.
 We also use the tokenizer from [Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/tree/main) when adapting the tokenizer to Huggingface, as it contains extra special tokens for vision tasks, e.g., `<|vision_pad|>`.
 We train NVLM-1.0-D-72B based on the [Qwen2-72B-Instruct](https://huggingface.co/Qwen/Qwen2-72B-Instruct/tree/main) text-only model and [InternViT-6B-448px-V1-5](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5) ViT model with our large-scale high-quality multimodal dataset.
-For training code, please refer to [Megatron-
+For training code, please refer to [Megatron-Core](https://github.com/NVIDIA/Megatron-LM/tree/NVLM-1.0/examples/multimodal/nvlm).
 
 
 ### Prepare the environment
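For context on the "multi-GPU inference in HF" and tokenizer notes in the changed README lines, here is a minimal, hypothetical sketch of loading the HF-hosted checkpoint and checking the vision special token. It is not part of this commit; the specific arguments (`trust_remote_code=True`, `device_map="auto"`, `use_fast=False`) are assumptions based on common Transformers usage rather than text from this diff.

```python
# Hypothetical sketch (not from this commit): load the HF-hosted NVLM-D-72B
# checkpoint across multiple GPUs and confirm the vision special token that the
# README attributes to the Qwen2.5-72B-Instruct tokenizer.
import torch
from transformers import AutoModel, AutoTokenizer

path = "nvidia/NVLM-D-72B"

# The repo ships custom modeling code adapted from InternVL, so remote code is
# enabled; device_map="auto" (assumed here) shards the 72B weights across GPUs.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map="auto",
).eval()

# Tokenizer adapted from Qwen2.5-72B-Instruct with extra vision special tokens.
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)
print("<|vision_pad|>" in tokenizer.get_vocab())  # expected True if the token is present
```

If this sketch differs from the repo's own inference example, defer to the Inference Code (HF) link referenced in the diff above.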