Committed by srvm
Commit 311a6c1 · 1 Parent(s): 517bdfc

Add link to tech report. Fix typo in usage example #2

  1. README.md +5 -3
README.md CHANGED
@@ -10,7 +10,7 @@ license_link: >-
 
 Llama-3.1-Minitron-4B-Width-Base is a base text-to-text model that can be adopted for a variety of natural language generation tasks.
 It is obtained by pruning Llama-3.1-8B; specifically, we prune model embedding size and MLP intermediate dimension.
-Following pruning, we perform continued training with distillation using 94 billion tokens to arrive at the final model; we use the continuous pre-training data corpus used in Nemotron-4 15B for this purpose.
+Following pruning, we perform continued training with distillation using 94 billion tokens to arrive at the final model; we use the continuous pre-training data corpus used in Nemotron-4 15B for this purpose. Please refer to our [technical report](https://arxiv.org/abs/2408.11796) for more details.
 
 This model is ready for commercial use.
 
@@ -59,7 +59,7 @@ import torch
 from transformers import AutoTokenizer, LlamaForCausalLM
 
 # Load the tokenizer and model
-model_path = "nvidia/Llama3.1-Minitron-4B-Width-Base"
+model_path = "nvidia/Llama-3.1-Minitron-4B-Width-Base"
 tokenizer = AutoTokenizer.from_pretrained(model_path)
 
 device = 'cuda'
@@ -143,4 +143,6 @@ NVIDIA believes Trustworthy AI is a shared responsibility and we have establishe
 Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
 
 ## References
-* [Compact Language Models via Pruning and Knowledge Distillation](https://arxiv.org/abs/2407.14679)
+
+* [Compact Language Models via Pruning and Knowledge Distillation](https://arxiv.org/abs/2407.14679)
+* [LLM Pruning and Distillation in Practice: The Minitron Approach](https://arxiv.org/abs/2408.11796)
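
The README text in this diff says the model is obtained by pruning the embedding size and the MLP intermediate dimension. As a rough illustration of what pruning along those two width axes means for a single MLP block, here is a minimal sketch. It is not the procedure from the technical report: the weight-norm importance score is a stand-in chosen for simplicity, and `prune_mlp_width`, `top_k_indices`, and the toy matrices are hypothetical names and values introduced only for this example.

```python
import math


def l2_norm(values):
    """Euclidean norm of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values))


def top_k_indices(scores, k):
    """Indices of the k largest scores, returned in ascending index order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def prune_mlp_width(w_up, w_down, keep_hidden, keep_inter):
    """Shrink one MLP block along both width axes.

    w_up   has shape (intermediate, hidden): the hidden -> intermediate projection.
    w_down has shape (hidden, intermediate): the intermediate -> hidden projection.

    ASSUMPTION: importance is the L2 norm of the weights in w_up. This is an
    illustrative stand-in, not the criterion used to build the released model.
    """
    inter_scores = [l2_norm(row) for row in w_up]          # one score per intermediate unit
    hidden_scores = [l2_norm(col) for col in zip(*w_up)]   # one score per hidden channel
    inter_idx = top_k_indices(inter_scores, keep_inter)
    hidden_idx = top_k_indices(hidden_scores, keep_hidden)

    # Slice both matrices with the same index sets so they stay compatible.
    w_up_pruned = [[w_up[i][h] for h in hidden_idx] for i in inter_idx]
    w_down_pruned = [[w_down[h][i] for i in inter_idx] for h in hidden_idx]
    return w_up_pruned, w_down_pruned, hidden_idx, inter_idx


# Toy block: hidden size 4, intermediate size 3, pruned to 2 and 2.
w_up = [
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.1, 0.0],
    [3.0, 0.0, 1.0, 0.5],
]
w_down = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [10.0, 11.0, 12.0],
]
up_small, down_small, hidden_kept, inter_kept = prune_mlp_width(
    w_up, w_down, keep_hidden=2, keep_inter=2
)
```

The same hidden-channel indices would have to be applied to every layer that reads or writes the residual stream (embeddings, attention projections, norms), which this single-block sketch leaves out. The continued training with distillation mentioned in the README is what recovers accuracy after such a cut.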