crumb committed on
Commit 6cd43ca · 1 Parent(s): 65bf5c4

Update README.md

Files changed (1)
  1. README.md +9 -20
README.md CHANGED
@@ -7,23 +7,20 @@ sdk: static
  pinned: false
  ---

- ## Dante Models: {Small, Medium, Large}
+ ## Gale Models: {Small, Medium, Large}

- Dante comprises three decoder-only transformer models derived from [Mistral](https://huggingface.co/mistralai/Mistral-7B-v0.1), with varying layers dropped from the original Mistral-7b model: `[15:-8]`, `[10:-3]`, and `[2:-2]` for large, medium, and small respectively.
-
- ![](graphic.png)
+ Gale comprises three decoder-only transformer models derived from [Mistral](https://huggingface.co/mistralai/Mistral-7B-v0.1), with varying layers dropped from the original Mistral-7b model: `[15:-8]`, `[10:-3]`, and `[2:-2]` for large, medium, and small respectively. Models were fine-tuned with high-rank adapters on a small randomized subset of high-quality web documents to ensure coherent text generation.

  | Model | Parameters | Retained Layers |
  | --- | --- | --- |
- | [Dante-Large](https://hf.co/crumbly/dante-large) | 5.1B | 23/32 |
- | [Dante-Medium](https://hf.co/crumbly/dante-medium) | 3B | 13/32 |
- | [Dante-Small](https://hf.co/crumbly/dante-small) | 1B | 4/32 |
+ | [Gale-Large](https://hf.co/crumbly/Gale-large) | 5.1B | 23/32 |
+ | [Gale-Medium](https://hf.co/crumbly/Gale-medium) | 3B | 13/32 |
+ | [Gale-Small](https://hf.co/crumbly/Gale-small) | 1B | 4/32 |

- Models were fine-tuned with high-rank adapters on a small randomized subset of high-quality web documents to ensure coherent text generation.

- ## Virgil Dataset
+ ## Horizon Dataset

- Virgil dataset, by Crumbly, consists of updated English text and code to fine-tune models like Dante which need to "set" their architectural changes in place. It's an efficient approach to leverage prior model knowledge instead of starting from scratch.
+ The dataset used to train the Gale models consists of updated English text and code to fine-tune models like Dante which need to "set" their architectural changes in place. It's an efficient approach to leverage prior model knowledge instead of starting from scratch.

  | Subset | Token % |
  | --- | --- |
@@ -35,14 +32,6 @@ Virgil dataset, by Crumbly, consists of updated English text and code to fine-tu

  **Bias Alert**: Contains internet-sourced text including potentially offensive content. Measures should be taken to mitigate biases during inference.

- A small 2% subset of Virgil, specifically random 1k token windows, is used to set the Dante models due to the extensive time required to train on larger datasets with Crumbly's compute setup (2xA6000 Lambdalabs Vector Workstation). The dataset isn't publicly shared.
+ A small 2% subset of Horizon, specifically random 1k token windows, is used to set the Gale models due to the extensive time required to train on larger datasets with Crumbly's compute setup (2xA6000 Lambdalabs Vector Workstation). The dataset isn't publicly shared.

- ---
-
- **btc** (btc network only) 3JB6if8iTpWBbBGBdnoYZxpg3CZoLUUvYe
-
- **eth** (eth network only) 0x32df00b0a9ecee8cc9c73e4ce53ea79fad802028
-
- **xtz** tz1ULvqesQA8SnopRzuQKFQj2jdLGBatJoC3
-
- can accept most cryptos, contact [crumb](https://twitter.com/aicrumb)
+ ***btc** (btc network) 3JB6if8iTpWBbBGBdnoYZxpg3CZoLUUvYe **eth** (eth network) 0x32df00b0a9ecee8cc9c73e4ce53ea79fad802028 contact [crumb](https://twitter.com/aicrumb)*
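A note on the slice notation in the README paragraph above: read against the table, the slices appear to name the block of decoder layers *removed* from Mistral-7B's 32 (`[15:-8]` drops layers 15 through 23, leaving 23/32; `[10:-3]` leaves 13/32; `[2:-2]` leaves 4/32). Below is a minimal sketch of that operation with `transformers`, under that reading; the loading details are illustrative, not the authors' actual script.

```python
# Hedged sketch: drop a contiguous slice of Mistral-7B's decoder layers.
# The slice values come from the README; everything else is assumed.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
)

# README slices name the layers to DROP:
# large [15:-8], medium [10:-3], small [2:-2].
drop = slice(15, -8)  # "large": removes layers 15..23 of 32

layers = model.model.layers            # nn.ModuleList of 32 decoder blocks
dropped = range(*drop.indices(len(layers)))
model.model.layers = torch.nn.ModuleList(
    layer for i, layer in enumerate(layers) if i not in dropped
)
model.config.num_hidden_layers = len(model.model.layers)  # keep config consistent

print(f"retained {model.config.num_hidden_layers}/32 layers")  # -> 23/32 for "large"
```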
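The README also mentions high-rank adapters and random 1k-token training windows without further detail. A hedged sketch of how those two pieces might fit together, using the `peft` library; the rank, alpha, and target modules are illustrative assumptions, not published settings.

```python
# Hedged sketch: attach high-rank LoRA adapters to the layer-dropped model
# and sample random 1k-token windows from tokenized documents for training.
# All hyperparameters here are illustrative assumptions.
import random
import torch
from peft import LoraConfig, get_peft_model

def random_window(token_ids, window=1024):
    """One random `window`-token slice of a tokenized document
    (the README's "random 1k token windows")."""
    start = random.randint(0, max(0, len(token_ids) - window))
    return torch.tensor(token_ids[start:start + window])

lora_cfg = LoraConfig(
    r=256,              # "high-rank": well above the usual r=8..64 (assumed value)
    lora_alpha=256,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)  # `model` from the layer-drop sketch above
model.print_trainable_parameters()
```

A plausible reason for the unusually high rank: once whole decoder blocks are removed, the surviving layers need enough adapter capacity to absorb ("set") the architectural change, which a typical low-rank update may not provide.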