rwmasood committed on
Commit 4a137e1 · verified · 1 Parent(s): 2ac7171

Update README.md

Files changed (1)
  1. README.md +4 -1
README.md CHANGED
@@ -27,7 +27,10 @@ license: llama3.1
  Bigger models, more data, and better hardware have consistently improved deep learning performance. Whether in NLP or computer vision, larger models have led to major breakthroughs. However, most cutting-edge models are still trained from scratch, meaning they start with randomly initialized weights. The problem? Training costs are skyrocketing.
  To address the escalating computational costs of training large-scale models, various approaches have been proposed.
  We present our results validating depth up-scaling—a method that combines depthwise scaling with continued pretraining. Unlike other LLM up-scaling approaches that rely on mixture-of-experts, DUS requires no complex modifications for efficient training and inference, making it a simple yet effective strategy for scaling high-performance LLMs from smaller models.
- In this work, we take a step toward realizing such an approach. Specifically, we extend an existing **8B**-parameter model to **10B** parameters by initializing the additional layers with pretrained weights, followed by continued pretraining on a smaller dataset across multiple epochs. Due to budget constraints, we were unable to surpass the foundational model on the **EleutherAI** evaluation benchmark. However, our approach yielded improved performance in terms of **perplexity**, demonstrating potential for cost-efficient scaling strategies in large language model development.
+
+ In this work, we take a step toward realizing such an approach. Specifically, we extend an existing **8B**-parameter model to **10B** parameters by initializing the
+ additional layers with pretrained weights, followed by continued pretraining on a smaller dataset across multiple epochs. Due to budget constraints, we were unable to
+ surpass the foundational model on the **EleutherAI** evaluation benchmark. However, the average scores are very close, demonstrating potential for cost-efficient scaling strategies in large language model development.
 
 
  ## Usage
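
For context on the depth up-scaling step the edited paragraph describes, below is a minimal sketch of how a deeper model could be initialized from pretrained weights before continued pretraining. The base checkpoint ID (`meta-llama/Llama-3.1-8B`), the number of dropped layers (`DROP = 12`, giving 2 * (32 - 12) = 40 decoder layers), and the output path are assumptions for illustration only, not the exact recipe behind this checkpoint.

```python
# Minimal sketch of depthwise up-scaling (DUS-style initialization), assuming a
# 32-layer Llama-family base and a 40-layer target. Checkpoint ID, layer counts,
# and output path are illustrative assumptions, not the recipe used here.
import torch
from transformers import AutoConfig, AutoModelForCausalLM

BASE_ID = "meta-llama/Llama-3.1-8B"  # assumed base checkpoint
DROP = 12                            # hypothetical: layers dropped from each copy

base = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype=torch.bfloat16)
n = base.config.num_hidden_layers    # 32 for an 8B Llama 3.1 model

# Deeper config: 2 * (n - DROP) decoder layers, other hyperparameters unchanged.
cfg = AutoConfig.from_pretrained(BASE_ID)
cfg.num_hidden_layers = 2 * (n - DROP)
scaled = AutoModelForCausalLM.from_config(cfg).to(torch.bfloat16)

# Reuse the base model's embeddings, final norm, and LM head.
scaled.model.embed_tokens.load_state_dict(base.model.embed_tokens.state_dict())
scaled.model.norm.load_state_dict(base.model.norm.state_dict())
scaled.lm_head.load_state_dict(base.lm_head.state_dict())

# Initialize every new layer from pretrained weights: the bottom (n - DROP)
# layers of one copy followed by the top (n - DROP) layers of another, so the
# middle layers are duplicated and nothing starts from random initialization.
src_layers = list(base.model.layers[: n - DROP]) + list(base.model.layers[DROP:])
for dst, src in zip(scaled.model.layers, src_layers):
    dst.load_state_dict(src.state_dict())

scaled.save_pretrained("llama-8b-to-10b-dus-init")  # continued pretraining follows
```

The choice of `DROP` sets the target depth and hence the parameter count; the values above are only meant to show the mechanics of initializing the added layers from pretrained weights rather than from random ones.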