BerenMillidge committed · verified
Commit b274e6c · 1 Parent(s): db63e6c

Update README.md

Files changed (1)
  1. README.md +9 -4
README.md CHANGED
@@ -3,10 +3,15 @@ license: apache-2.0
 ---
 # Model Card for Zamba v2 3B
 
- Zamba-3B-v2 is a hybrid model between Mamba2, a state-space model, and transformers. It uses a mamba2 backbone with a shared transformer layer every 6 blocks. Zamba was trained using next-token prediction. It uses the Mistral v0.1 tokenizer. We came to this architecture after a series of ablations at small scales. Zamba-3B-v2 was pre-trained on 3T tokens of text and code data sourced from open web-datasets. Subsequently in a second phase, Zamba was annealed on a mixture of 100B high-quality tokens.
+ Zamba2-2.7B is a hybrid model between state-space models and transformers. It broadly follows the [Zamba architecture](https://huggingface.co/Zyphra/Zamba-7B-v1), which consists of a Mamba backbone alternating with shared transformer blocks. Zamba2-2.7B possesses three major improvements over Zamba1:
 
- Note: this is a temporary HuggingFace implementation of Zamba 3B and is designed for specific use cases. It may not be fully compatible with all frameworks and tools intended to interface with HuggingFace models.
+ 1.) Mamba1 blocks have been replaced with Mamba2 blocks.
+ 2.) Instead of a single shared attention block, we utilize two shared attention blocks, which are interleaved in an ABAB pattern throughout the network.
+ 3.) We apply a LoRA projector to each shared MLP block, allowing the network to specialize the MLPs at each shared layer with a minimal increase in total parameter count.
+
+ Zamba2-2.7B was trained using next-token prediction. It uses the Mistral v0.1 tokenizer. Zamba2-2.7B was pre-trained on 3T tokens of text and code data sourced from open web datasets. Subsequently, in a second phase, it was annealed on a mixture of 100B high-quality tokens.
 
+ Note: this is a temporary HuggingFace implementation of Zamba 3B and is designed for specific use cases. It may not be fully compatible with all frameworks and tools intended to interface with HuggingFace models.
 
 ## Quick start
 
@@ -47,7 +52,7 @@ outputs = model.generate(**input_ids, max_new_tokens=100)
 print(tokenizer.decode(outputs[0]))
 ```
 
- ## Model Details
+ ## Model Details [to update!]
 
 Zamba utilizes a unique hybrid SSM architecture. This architecture consists of a backbone of Mamba layers interspersed with a shared attention layer. This attention has shared weights to minimize the parameter cost of the model. We find that concatenating the original model embeddings to the input to this attention block improves performance, likely due to better maintenance of information across depth.
 
@@ -88,4 +93,4 @@ If you find Zamba useful in your work please cite it as:
 
 ## Notice
 
- Zamba is a pretrained base model and therefore does not have any moderation mechanism. In addition, one should not expect good chat performance, as this model was not fine-tuned for chat.
+ Zamba2-2.7B is a pretrained base model and therefore does not have any moderation mechanism. In addition, one should not expect good chat performance, as this model was not fine-tuned for chat.
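
The three improvements listed in the updated description are architectural, so a small sketch may help make them concrete. The following is an illustrative Python sketch only, not Zyphra's implementation: the sharing period, hidden width, LoRA rank, and module names are all assumptions. It shows (a) a shared MLP specialized per layer by a LoRA projector and (b) two shared attention blocks interleaved through a Mamba2 backbone in an ABAB pattern.

```python
# Illustrative sketch only: the sharing period, sizes, rank, and module
# names are assumptions, not the actual Zamba2 implementation.
import torch
import torch.nn as nn


class LoRASpecializedSharedMLP(nn.Module):
    """One set of shared MLP weights reused at every shared block, plus a
    small per-layer low-rank (LoRA) correction so each occurrence can
    specialize at minimal parameter cost."""

    def __init__(self, shared_linear: nn.Linear, rank: int = 8):
        super().__init__()
        self.shared_linear = shared_linear  # shared across all shared blocks
        self.lora_down = nn.Linear(shared_linear.in_features, rank, bias=False)  # per-layer
        self.lora_up = nn.Linear(rank, shared_linear.out_features, bias=False)   # per-layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.shared_linear(x) + self.lora_up(self.lora_down(x))


def interleave_shared_attention(num_mamba2_blocks: int, period: int = 6):
    """Lay out the backbone: a Mamba2 block at every position and, after every
    `period` blocks, one of the two shared attention blocks, alternating A/B."""
    layout, use_b = [], False
    for i in range(num_mamba2_blocks):
        layout.append(("mamba2", i))
        if (i + 1) % period == 0:
            layout.append(("shared_attention", "B" if use_b else "A"))
            use_b = not use_b  # ABAB pattern
    return layout


if __name__ == "__main__":
    shared = nn.Linear(2560, 2560)  # assumed width; one set of shared weights
    per_layer_mlps = [LoRASpecializedSharedMLP(shared) for _ in range(4)]
    x = torch.randn(1, 8, 2560)
    print(per_layer_mlps[0](x).shape)                 # torch.Size([1, 8, 2560])
    print(interleave_shared_attention(12, period=6))  # ends with ('shared_attention', 'B')
```

The point of the LoRA projector in this sketch is that the per-layer `lora_down`/`lora_up` matrices add only on the order of rank x width parameters per shared layer, so specializing each occurrence of the shared MLP is cheap compared with duplicating it.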
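
Only the last two lines of the Quick start snippet appear as diff context above. For orientation, a self-contained version of that flow might look like the sketch below; the repository id, prompt, dtype, and device placement are assumptions, and the model card's own Quick start (including any custom transformers install it calls for) is authoritative.

```python
# Hypothetical, self-contained version of the Quick start flow. The repo id,
# prompt, dtype, and device placement are assumptions; the model card's own
# snippet (and any custom transformers build this temporary implementation
# requires) takes precedence.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zyphra/Zamba2-2.7B"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

input_ids = tokenizer("What factors contributed to the fall of the Roman Empire?",
                      return_tensors="pt").to(model.device)

# The two unchanged context lines visible in the diff above:
outputs = model.generate(**input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```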
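
The Model Details paragraph mentions concatenating the original token embeddings onto the input of the shared attention block. A minimal sketch of that wiring, with an assumed model width and an assumed linear projection back down to it:

```python
# Sketch of the embedding-concatenation idea only; the model width and the
# linear projection back down to it are assumptions.
import torch
import torch.nn as nn

hidden_dim = 2560                                      # assumed model width
hidden_states = torch.randn(1, 16, hidden_dim)         # residual stream entering the shared block
original_embeddings = torch.randn(1, 16, hidden_dim)   # token embeddings from the input layer

attn_in_proj = nn.Linear(2 * hidden_dim, hidden_dim)   # maps the concatenation back to model width
attn_input = attn_in_proj(torch.cat([hidden_states, original_embeddings], dim=-1))
print(attn_input.shape)                                # torch.Size([1, 16, 2560])
```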