dittops committed on
Commit f8f24b5
2 Parent(s): fb5bedd 5a4b00b

Merge branch 'main' of https://huggingface.co/budecosystem/boomer-1b into main

Files changed (1)
  1. README.md +106 -0
README.md CHANGED
@@ -1,3 +1,109 @@
---
license: apache-2.0
+ language:
+ - en
+ library_name: transformers
---
+
+ <div align="center"><img src="https://accubits-assests.s3.ap-south-1.amazonaws.com/boomer/Boomer-Png.png" width=200></div>
+
+
+ <p align="center"><i>Democratizing access to LLMs for the open-source community.<br>Let's advance AI, together.</i></p>
+
+ ----
+
+ ## Introduction 🎉
+
+ We are open-sourcing one of our early pretraining experiments with a custom architecture and dataset. This 1.1B parameter model is pre-trained from scratch on a custom-curated dataset of 41B tokens. The architectural experiments include the addition of flash attention and a larger intermediate dimension in the MLP layer. The dataset is a combination of wiki, stories, arxiv, math and code data. The model is available on Hugging Face as [Boomer1B](https://huggingface.co/budecosystem/boomer-1b).
+
+ <div align="center"><img src="https://accubits-assests.s3.ap-south-1.amazonaws.com/boomer/boomer-arch.jpg" width=500></div>
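+
+ Because the checkpoint is published on the Hugging Face Hub with `library_name: transformers`, the architecture values listed in the model details table below can be checked against the hosted config. The following is a minimal sketch under the assumption that the checkpoint exposes the standard `transformers` config attributes; a custom architecture may name these fields differently.
+
+ ```python
+ from transformers import AutoConfig
+
+ # Minimal sketch: read the hosted config and compare it with the values
+ # reported in the model details table (assumes standard attribute names).
+ config = AutoConfig.from_pretrained("budecosystem/boomer-1b")
+ print(config.num_hidden_layers)   # expected 4 (n_layers)
+ print(config.hidden_size)         # expected 4096 (d_model)
+ print(config.intermediate_size)   # expected 11008 (MLP intermediate size)
+ print(config.vocab_size)          # expected 32000
+ ```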
+
+ ## Getting Started on GitHub 💻
+
+ Ready to dive in? Here's how you can get started with our models on GitHub.
+
+ Install the necessary dependencies with the following command:
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### Generate responses
+
+ You can generate responses with the pre-trained model using our generate.py script, which downloads the model from the Hugging Face model hub and runs inference on a specified prompt. Here's an example of usage:
+
+ ```bash
+ python generate.py --base_model 'budecosystem/boomer-1b' --prompt="the president of India is"
+ ```
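+
+ If you would rather not use the script, the same prompt can be run directly through the `transformers` API. This is a minimal sketch assuming the checkpoint loads with the standard Auto classes; the dtype, device placement, and generation settings are illustrative and not taken from generate.py.
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Minimal sketch of direct inference with transformers
+ # (illustrative settings; not the repository's generate.py).
+ model_id = "budecosystem/boomer-1b"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
+
+ inputs = tokenizer("the president of India is", return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```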
+
+ ### Fine-tuning 🎯
+
+ It's time to upgrade the model by fine-tuning it on your own data. You can do this using the provided fine-tuning script. Here's an example command:
+
+ ```bash
+ torchrun --nproc_per_node 4 train.py \
+     --base_model budecosystem/boomer-1b \
+     --data_path dataset.json \
+     --output_dir output \
+     --per_device_train_batch_size 2 \
+     --gradient_accumulation_steps 2 \
+     --num_train_epochs 1 \
+     --learning_rate 2e-5 \
+     --fp16 True \
+     --logging_steps 10 \
+     --deepspeed ds_config.json
+ ```
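+
+ The command above points DeepSpeed at a ds_config.json file that isn't reproduced in this card. As a reference, the sketch below writes one minimal possibility using standard DeepSpeed keys, with "auto" values so the Hugging Face Trainer can fill them in from the command-line arguments; the actual config shipped with the repository may differ.
+
+ ```python
+ import json
+
+ # Illustrative DeepSpeed config (NOT the repository's actual ds_config.json).
+ # "auto" values assume the training script uses the Hugging Face Trainer's
+ # DeepSpeed integration, which resolves them from TrainingArguments.
+ ds_config = {
+     "fp16": {"enabled": "auto"},
+     "zero_optimization": {"stage": 2},
+     "train_micro_batch_size_per_gpu": "auto",
+     "gradient_accumulation_steps": "auto",
+ }
+
+ with open("ds_config.json", "w") as f:
+     json.dump(ds_config, f, indent=2)
+ ```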
+
+ ## Model details
+
+ | Parameter | Value |
+ | :------------- | :----: |
+ | n_layers | 4 |
+ | n_heads | 32 |
+ | d_model | 4096 |
+ | vocab size | 32000 |
+ | sequence length | 4096 |
+ | intermediate size | 11008 |
+
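+ As a quick sanity check, these values are consistent with the stated 1.1B parameter count. The arithmetic below assumes untied input/output embeddings and a gated MLP with three projection matrices, and ignores biases and normalization parameters; these are assumptions for illustration, not details stated in this card.
+
+ ```python
+ # Rough parameter count from the table above (illustrative assumptions:
+ # untied input/output embeddings, gated MLP with 3 projections,
+ # biases and norm parameters ignored).
+ d_model, n_layers, d_ff, vocab = 4096, 4, 11008, 32000
+
+ attn_per_layer = 4 * d_model * d_model   # Q, K, V, O projections
+ mlp_per_layer = 3 * d_model * d_ff       # gate, up, down projections
+ embeddings = 2 * vocab * d_model         # input + output embeddings
+
+ total = n_layers * (attn_per_layer + mlp_per_layer) + embeddings
+ print(f"{total / 1e9:.2f}B")             # ~1.07B, close to the stated 1.1B
+ ```
+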
+ ### Tokenizer
+
+ We used the SentencePiece tokenizer during the fine-tuning process. This tokenizer is known for its capability to handle open-vocabulary language tasks efficiently.
+
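+ The tokenizer ships with the checkpoint, so it can be loaded directly from the Hub. The snippet below is a small sketch assuming compatibility with the standard `AutoTokenizer` class; the example sentence is illustrative.
+
+ ```python
+ from transformers import AutoTokenizer
+
+ # Minimal sketch: load the SentencePiece-based tokenizer that ships with
+ # the checkpoint (assumes AutoTokenizer compatibility).
+ tokenizer = AutoTokenizer.from_pretrained("budecosystem/boomer-1b")
+
+ text = "Democratizing access to LLMs for the open-source community."
+ ids = tokenizer(text)["input_ids"]
+ print(len(ids))                              # number of tokens for this sentence
+ print(tokenizer.convert_ids_to_tokens(ids))  # SentencePiece pieces ('▁' marks word starts)
+ ```
+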
+ ### Training details
+
+ The model was trained on 4 A100 80GB GPUs for approximately 250 hours.
+
+ | Hyperparameter | Value |
+ | :----------------------------| :-----: |
+ | per_device_train_batch_size | 2 |
+ | gradient_accumulation_steps | 2 |
+ | learning_rate | 2e-4 |
+ | optimizer | adamw |
+ | betas | 0.9, 0.95 |
+ | fp16 | True |
+ | GPU | 4 A100 80GB |
+
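+ For reference, these hyperparameters map directly onto Hugging Face `TrainingArguments`. The sketch below is illustrative rather than the repository's actual training setup; the optimizer name `adamw_torch` and the output path are assumptions.
+
+ ```python
+ from transformers import TrainingArguments
+
+ # Illustrative mapping of the hyperparameter table onto TrainingArguments
+ # (not the repository's actual training configuration).
+ args = TrainingArguments(
+     output_dir="output",
+     per_device_train_batch_size=2,
+     gradient_accumulation_steps=2,
+     learning_rate=2e-4,
+     optim="adamw_torch",
+     adam_beta1=0.9,
+     adam_beta2=0.95,
+     fp16=True,
+ )
+ ```
+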
+
+ ## Evaluations
+
+ We have evaluated the pre-trained model on a few standard benchmarks.
+
+ | Model Name | ARC | MMLU | Human Eval | Hellaswag | BBH | DROP | GSM8K |
+ |:----------:|:--------:|:----:|:----------:|:---------:|:-----:|:-----:|:----:|
+ | Boomer1B | 22.35 | 25.92 | 6.1 | 31.66 | 28.65 | 6.13 | 1.5 |
+
+ ### Why use BOOMER?
+
+ - Retrieval augmentation
+ - Inference at the edge
+ - Language modeling use cases
+
+ ### Final thought on Boomer!
+
+ This isn't the end. It's just the beginning of a journey towards creating more advanced, more efficient, and more accessible language models. We invite you to join us on this exciting journey.
+
+
+ ### Acknowledgements
+
+ We'd like to thank the open-source community and the researchers whose foundational work paved the way for BOOMER. Special shoutout to our dedicated team, who have worked relentlessly to curate the dataset and fine-tune the model to perfection.