Text Generation · English · Megatron-LM · nvidia · Retro · InstructRetro · 48B

Commit 9fe3db3 · verified · committed by boxin-wbx · 1 Parent(s): 70f2354

Update README.md

Files changed (1)
  1. README.md +10 -6
README.md CHANGED
@@ -17,6 +17,16 @@ library_name: Megatron-LM
 
 # InstructRetro
 
+ [Documentation](https://github.com/NVIDIA/Megatron-LM/tree/InstructRetro/tools/retro)   [Paper](https://arxiv.org/abs/2310.07713)   [Evaluation Data](https://drive.google.com/drive/folders/1xw-N0LJR_lIWnH6BKzHIb49quVCS_V72?usp=drive_link)   [Model Weights](https://huggingface.co/collections/nvidia/instructretro-65837ea76b60651e01faec8d)
+
+ InstructRetro [(Wang et al., 2023b)](https://arxiv.org/abs/2310.07713) scales up the size of Retro to 48B, featuring the largest LLM pretrained with retrieval (as of December 2023).
+ The obtained foundation model, Retro 48B, largely outperforms the GPT counterpart in terms of perplexity.
+ With instruction tuning on Retro, InstructRetro demonstrates significant improvement over the instruction-tuned GPT on downstream tasks in the zero-shot setting. Specifically, InstructRetro improves over its GPT counterpart by 7% on average across 8 short-form QA tasks, and by 10% across 4 challenging long-form QA tasks. We also find that one can ablate the encoder from the InstructRetro architecture and directly use the InstructRetro decoder backbone as GPT, while achieving comparable results.
+
+ **For more information about InstructRetro, check the [Documentation](https://github.com/NVIDIA/Megatron-LM/tree/InstructRetro/tools/retro)!**
+
+ ## Background
+
 Retro [(Borgeaud et al., 2022)](https://arxiv.org/abs/2112.04426) is an autoregressive decoder-only language model (LM) pretrained with retrieval augmentation.
 Retro features practical scalability, supporting large-scale pretraining from scratch by retrieving from trillions of tokens.
 Pretraining with retrieval provides a more efficient storage mechanism for factual knowledge than storing it implicitly within the network's parameters, largely reducing model parameters while achieving lower perplexity than standard GPT.
@@ -24,12 +34,6 @@ Retro also provides the flexibility to update the
 knowledge stored in LMs [(Wang et al., 2023a)](https://arxiv.org/abs/2304.06762)
 by updating the retrieval database without training LMs again.
 
- InstructRetro [(Wang et al., 2023b)](https://arxiv.org/abs/2310.07713) further scales up the size of Retro to 48B, featuring the largest LLM pretrained with retrieval (as of December 2023).
- The obtained foundation model, Retro 48B, largely outperforms the GPT counterpart in terms of perplexity.
- With instruction tuning on Retro, InstructRetro demonstrates significant improvement over the instruction tuned GPT on downstream tasks in the zero-shot setting. Specifically, the average improvement of InstructRetro is 7% over its GPT counterpart across 8 short-form QA tasks, and 10% over GPT across 4 challenging long-form QA tasks. We also find that one can ablate the encoder from InstructRetro architecture and directly use the InstructRetro decoder backbone as GPT, while achieving comparable results.
-
- ## Model Overview
-
 ### License
 
 The use of this model is governed by the [NVIDIA AI Foundation Models Community License Agreement](https://developer.nvidia.com/downloads/nv-ai-foundation-models-license).
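As an aside, the **Model Weights** link above points to a Hugging Face collection of InstructRetro checkpoints. A minimal sketch of fetching one of them for use with Megatron-LM, assuming the standard `huggingface_hub` client, is shown below; the repo id is a placeholder, not a confirmed repository name.

```python
# Minimal sketch: download an InstructRetro checkpoint from the Hugging Face
# collection linked in the README. The repo id is a placeholder -- substitute
# the actual model repository name from the collection page.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="nvidia/<instructretro-checkpoint>",  # placeholder repo id
    local_dir="checkpoints/instructretro",        # target directory for the Megatron-LM checkpoint files
)
print(f"Checkpoint downloaded to {local_path}")
```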