crumb committed on
Commit 81f2047 · 1 Parent(s): af087dc

Update README.md

Files changed (1)
  1. README.md +0 -23
README.md CHANGED
@@ -7,29 +7,6 @@ sdk: static
  pinned: false
  ---
 
- ## Gale Models: {Small, Medium, Large}
-
- Gale comprises three decoder-only transformer models derived from [Mistral](https://huggingface.co/mistralai/Mistral-7B-v0.1). Each drops a contiguous slice of layers from the original 32-layer Mistral-7B model: `[15:-8]`, `[10:-3]`, and `[2:-2]` for large, medium, and small respectively. The models were then fine-tuned with high-rank adapters on a small randomized subset of high-quality web documents to ensure coherent text generation.
-
- The Crumbly 'Horizon' dataset used to train the Gale models consists of updated English text and code, and is meant for fine-tuning models like Gale that need to "set" their architectural changes in place. This is more efficient than training from scratch because it leverages the knowledge the original model already has. Only a small subset of Horizon (2%, 3%, and 9% for large, medium, and small respectively), sampled as random 1k-token windows, is used to set the Gale models, because training on more data takes too long on Crumbly's compute setup (a 2xA6000 Lambda Labs Vector workstation). The dataset isn't publicly shared.
-
- | Model | Parameters | Retained Layers |
- | --- | --- | --- |
- | [Gale-Large](https://hf.co/crumbly/Gale-large) | 5.1B | 23/32 |
- | [Gale-Medium](https://hf.co/crumbly/Gale-medium) | 3B | 13/32 |
- | [Gale-Small](https://hf.co/crumbly/Gale-small) | 1B | 4/32 |
-
- | Horizon Subset | Token % |
- | --- | --- |
- | Papers | 21.65% |
- | GitHub | 35.34% |
- | Books | 23.08% |
- | Wiki | 3.56% |
- | Webtext | 16.36% |
-
- ### **Bias**
- The Horizon dataset contains internet-sourced text, including potentially offensive content. Take measures to mitigate bias when running inference with Gale models.
-
  ### Support us (me)
 
  ***btc** (btc network) 3JB6if8iTpWBbBGBdnoYZxpg3CZoLUUvYe <br> **eth** (eth network) 0x32df00b0a9ecee8cc9c73e4ce53ea79fad802028 <br> contact [crumb](https://twitter.com/aicrumb)*
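
The layer ranges quoted in the removed README text can be checked against the "Retained Layers" column with a few lines of Python. This is only a sketch of the arithmetic, assuming the slices index the layers that were *dropped* from a 32-layer stack; the names `DROPPED` and `retained_layers` are illustrative and do not come from the Gale code.

```python
# Assumption: each slice marks the layers that are dropped from a
# 32-layer decoder stack, as the README text suggests.
NUM_LAYERS = 32

# Dropped slices as quoted in the README text.
DROPPED = {
    "large": slice(15, -8),
    "medium": slice(10, -3),
    "small": slice(2, -2),
}


def retained_layers(dropped, num_layers=NUM_LAYERS):
    """Return the indices of the layers that remain after dropping a slice."""
    all_layers = list(range(num_layers))
    removed = set(all_layers[dropped])
    return [i for i in all_layers if i not in removed]


for name, dropped in DROPPED.items():
    kept = retained_layers(dropped)
    print(f"{name}: {len(kept)}/{NUM_LAYERS} layers kept")
```

Under that assumption the counts come out as 23, 13, and 4, which matches the table. How the index list was applied to the actual model weights is not described in the README, so that step is left out here.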
 
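
The README text mentions training on "random 1k token windows" covering a small percentage of Horizon. One way such sampling could look is sketched below. Everything here is an assumption for illustration: the window length of 1024, the function name `sample_windows`, the fixed seed, and the integer list standing in for a tokenized corpus. The actual Horizon pipeline is not public.

```python
import random

WINDOW = 1024  # assumption: "1k" is read as 1024 tokens


def sample_windows(tokens, fraction, window=WINDOW, seed=0):
    """Pick random fixed-length windows covering roughly `fraction` of `tokens`.

    Windows are drawn independently, so they may overlap.
    """
    rng = random.Random(seed)
    n_windows = int(len(tokens) * fraction) // window
    max_start = len(tokens) - window
    if n_windows == 0 or max_start < 0:
        return []
    starts = [rng.randint(0, max_start) for _ in range(n_windows)]
    return [tokens[s:s + window] for s in starts]


# Stand-in for a tokenized corpus: one million dummy token ids.
corpus = list(range(1_000_000))
windows = sample_windows(corpus, fraction=0.02)  # 2%, as quoted for Gale-Large
print(len(windows), len(windows[0]))
```

Sampling windows rather than whole documents keeps every training example the same length, which is probably convenient on a small two-GPU setup, though the README does not state the reason.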