BerenMillidge committed on
Commit
d7170f2
1 Parent(s): 88bc14a

Update README.md

  1. README.md +11 -0
README.md CHANGED
@@ -51,6 +51,17 @@ Zamba2-2.7B-Instruct punches dramatically above its weight, achieving extremely
 
  <img src="https://cdn-uploads.huggingface.co/production/uploads/64e40335c0edca443ef8af3e/wXFMLXZA2-xz2PDyUMwTI.png" width="600"/>
 
+
+ | Model | MT-Bench | IFEval
+ |-------------|-----|----|---------------|--------------|
+ | **Zamba2-2.6B-Instruct** | 2.6B | **72.40** | **53.96** |
+ | Mistral-7B-Instruct | 7B | 72.40 | 66.4 | 45.3 |
+ | Gemma2-2B-Instruct | 2.7B | 51.69 | 48.8 |
+ | H2O-Danube-4B-Chat | 4B | 52.57 | 45.44 |
+ | StableLM-Zephyr-3B | 3B | 66.43 | 36.83 |
+
+
+
  Moreover, due to its unique hybrid SSM architecture, Zamba2-2.7B-Instruct achieves extremely low inference latency and rapid generation with a significantly smaller memory footprint than comparable transformer based models.
 
  Time to First Token (TTFT) | Output Generation