---
license: apache-2.0
language:
- en
library_name: transformers
---

<div align="center"><img src="https://accubits-assests.s3.ap-south-1.amazonaws.com/boomer/Boomer-Png.png" width=200></div>

<p align="center"><i>Democratizing access to LLMs for the open-source community.<br>Let's advance AI, together.</i></p>

----

## Introduction 🎉

We are open-sourcing one of our early experiments in pretraining with a custom architecture and dataset. This 1.1B-parameter model is pre-trained from scratch on a custom-curated dataset of 41B tokens. The architectural experiments include the addition of flash attention and a larger intermediate dimension in the MLP layer. The dataset is a combination of wiki, stories, arXiv, math, and code. The model is available on Hugging Face: [Boomer1B](https://huggingface.co/budecosystem/boomer-1b).

<div align="center"><img src="https://accubits-assests.s3.ap-south-1.amazonaws.com/boomer/boomer-arch.jpg" width=500></div>

## Getting Started on GitHub 💻

Ready to dive in? Here's how you can get started with our models on GitHub.

Install the necessary dependencies with the following command:

```bash
pip install -r requirements.txt
```

### Generate responses

You can generate responses from the model using our generate.py script, which pulls the model from the Hugging Face model hub and runs inference on a specified prompt. Here's an example of usage:

```bash
python generate.py --base_model 'budecosystem/boomer-1b' --prompt="the president of India is"
```
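
If you prefer to skip the helper script, the checkpoint can also be loaded directly with the `transformers` library. This is a minimal sketch; the generation settings (`max_new_tokens`, sampling temperature) are illustrative defaults, not the values used by generate.py.

```python
# Minimal sketch: load the checkpoint directly with transformers.
# Generation settings below are illustrative, not those used by generate.py.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "budecosystem/boomer-1b"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)
model.eval()

prompt = "the president of India is"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```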

### Fine-tuning 🎯

You can adapt the model to your own data by fine-tuning it with the provided fine-tuning script. Here's an example command:

```bash
torchrun --nproc_per_node 4 train.py \
   --base_model budecosystem/boomer-1b \
   --data_path dataset.json \
   --output_dir output \
   --per_device_train_batch_size 2 \
   --gradient_accumulation_steps 2 \
   --num_train_epochs 1 \
   --learning_rate 2e-5 \
   --fp16 True \
   --logging_steps 10 \
   --deepspeed ds_config.json
```
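
The schema expected in `dataset.json` is defined by the training script and is not documented here. As a purely hypothetical illustration, an instruction-style file could be assembled like this (the field names `instruction`, `input`, and `output` are assumptions, not the script's actual contract):

```python
# Hypothetical sketch only: the real schema expected by the training script may differ.
# Field names ("instruction", "input", "output") are assumptions for illustration.
import json

records = [
    {
        "instruction": "Summarize the following paragraph.",
        "input": "Large language models are trained on large text corpora...",
        "output": "LLMs learn language patterns from large text corpora.",
    },
]

with open("dataset.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```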

## Model details

| Parameters | Value |
| :---------------- | :----: |
| n_layers | 4 |
| n_heads | 32 |
| d_model | 4096 |
| vocab size | 32000 |
| sequence length | 4096 |
| intermediate size | 11008 |
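
Assuming the released checkpoint ships a standard Hugging Face config (the attribute names below follow the usual decoder-only naming and are an assumption, not something stated in this card), these values can be cross-checked programmatically:

```python
# Sketch for cross-checking the table above against the published config.
# Attribute names (hidden_size, num_hidden_layers, ...) assume a standard
# decoder-only transformers config; the custom architecture may differ.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("budecosystem/boomer-1b")
print("layers:", getattr(config, "num_hidden_layers", None))
print("heads:", getattr(config, "num_attention_heads", None))
print("d_model:", getattr(config, "hidden_size", None))
print("vocab size:", getattr(config, "vocab_size", None))
print("sequence length:", getattr(config, "max_position_embeddings", None))
print("intermediate size:", getattr(config, "intermediate_size", None))
```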

### Tokenizer

We used the SentencePiece tokenizer during the fine-tuning process. This tokenizer is known for its capability to handle open-vocabulary language tasks efficiently.
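
A quick sketch of inspecting the tokenizer through `transformers` (the example sentence is arbitrary):

```python
# Quick look at the SentencePiece-based tokenizer via transformers.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("budecosystem/boomer-1b")

text = "Democratizing access to LLMs for the open-source community."
token_ids = tokenizer.encode(text)
print("vocab size:", tokenizer.vocab_size)   # expected: 32000 (see model details)
print("tokens:", tokenizer.convert_ids_to_tokens(token_ids))
print("round trip:", tokenizer.decode(token_ids))
```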

### Training details

The model was trained on 4 A100 80GB GPUs for approximately 250 hours.

| Hyperparameters | Value |
| :---------------------------- | :-----: |
| per_device_train_batch_size | 2 |
| gradient_accumulation_steps | 2 |
| learning_rate | 2e-4 |
| optimizer | adamw |
| beta | 0.9, 0.95 |
| fp16 | True |
| GPU | 4 A100 80GB |

## Evaluations

We have evaluated the pre-trained model on a few standard benchmarks:

| Model Name | ARC | MMLU | Human Eval | Hellaswag | BBH | DROP | GSM8K |
| :--------: | :---: | :---: | :--------: | :-------: | :---: | :---: | :---: |
| Boomer1B | 22.35 | 25.92 | 6.1 | 31.66 | 28.65 | 6.13 | 1.5 |

### Why use BOOMER?

- Retrieval augmentation
- Inference at the edge
- Language modeling use cases

### Final thought on Boomer!

This isn't the end. It's just the beginning of a journey towards creating more advanced, more efficient, and more accessible language models. We invite you to join us on this exciting journey.

### Acknowledgements

We'd like to thank the open-source community and the researchers whose foundational work laid the path for BOOMER. Special shoutout to our dedicated team, who have worked relentlessly to curate the dataset and fine-tune the model to perfection.