bartowski committed
Commit
03a288a
1 Parent(s): 0f5ee7f

Update VRAM estimates

Files changed (1): README.md (+9 -14)
README.md CHANGED
@@ -12,10 +12,6 @@ Using <a href="https://github.com/turboderp/exllamav2/releases/tag/v0.0.18">turb
 
  Each branch contains an individual bits per weight, with the main one containing only the measurement.json for further conversions.
 
- Conversion was done using the default calibration dataset.
-
- Default arguments were used, except when the bits per weight is above 6.0; at that point the lm_head layer is quantized at 8 bits per weight instead of the default 6.
-
  Original model: https://huggingface.co/Vezora/Mistral-22B-v0.2
 
  ## Prompt Format
@@ -26,17 +22,16 @@ Original model: https://huggingface.co/Vezora/Mistral-22B-v0.2
  ### Assistant:
  ```
 
- <a href="https://huggingface.co/bartowski/Mistral-22B-v0.2-exl2/tree/8_0">8.0 bits per weight</a>
-
- <a href="https://huggingface.co/bartowski/Mistral-22B-v0.2-exl2/tree/6_5">6.5 bits per weight</a>
-
- <a href="https://huggingface.co/bartowski/Mistral-22B-v0.2-exl2/tree/5_0">5.0 bits per weight</a>
-
- <a href="https://huggingface.co/bartowski/Mistral-22B-v0.2-exl2/tree/4_25">4.25 bits per weight</a>
-
- <a href="https://huggingface.co/bartowski/Mistral-22B-v0.2-exl2/tree/3_5">3.5 bits per weight</a>
+ ## Available sizes
 
- <a href="https://huggingface.co/bartowski/Mistral-22B-v0.2-exl2/tree/3_0">3.0 bits per weight</a>
+ | Branch | Bits | lm_head bits | VRAM (4k ctx) | VRAM (16k ctx) | VRAM (32k ctx) | Description |
+ | ------ | ---- | ------------ | ------------- | -------------- | -------------- | ----------- |
+ | [8_0](https://huggingface.co/bartowski/Mistral-22B-v0.2-exl2/tree/8_0) | 8.0 | 8.0 | 23.5 GB | 26.0 GB | 29.5 GB | Near-unquantized performance; max quality ExLlamaV2 can create. |
+ | [6_5](https://huggingface.co/bartowski/Mistral-22B-v0.2-exl2/tree/6_5) | 6.5 | 8.0 | 19.4 GB | 21.9 GB | 25.4 GB | Near-unquantized performance at vastly reduced size, **recommended**. |
+ | [5_0](https://huggingface.co/bartowski/Mistral-22B-v0.2-exl2/tree/5_0) | 5.0 | 6.0 | 15.5 GB | 18.0 GB | 21.5 GB | Smaller size, lower quality, still very high performance, **recommended**. |
+ | [4_25](https://huggingface.co/bartowski/Mistral-22B-v0.2-exl2/tree/4_25) | 4.25 | 6.0 | 13.3 GB | 15.8 GB | 19.3 GB | GPTQ-equivalent bits per weight, slightly higher quality. |
+ | [3_5](https://huggingface.co/bartowski/Mistral-22B-v0.2-exl2/tree/3_5) | 3.5 | 6.0 | 11.6 GB | 14.1 GB | 17.6 GB | Lower quality, only use if you have to. |
+ | [3_0](https://huggingface.co/bartowski/Mistral-22B-v0.2-exl2/tree/3_0) | 3.0 | 6.0 | 9.8 GB | 12.3 GB | 15.8 GB | Very low quality. Usable on a 12 GB card with low context, or 16 GB at 32k. |
 
 
  ## Download instructions
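Since every size in the table lives on its own branch, a minimal sketch of fetching just one of them with the `huggingface_hub` Python API (the branch `6_5` and the local directory name here are illustrative choices, not part of this commit):

```python
# Minimal sketch: fetch a single quant branch with huggingface_hub.
# The branch ("6_5") and local directory are example values.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="bartowski/Mistral-22B-v0.2-exl2",
    revision="6_5",                          # quant branch from the table above
    local_dir="Mistral-22B-v0.2-exl2-6_5",   # hypothetical local path
)
```

Each branch is a complete standalone quant, so downloading only the revision you need avoids pulling every size at once.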
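And a sketch of loading the result with the ExLlamaV2 Python API, assuming the v0.0.18-era loader pattern from the exllamav2 examples; the model path, context length, and prompt string are assumptions, with `max_seq_len = 4096` chosen to match the table's 4k column:

```python
# Minimal sketch: load an exl2 quant and generate (v0.0.18-era API assumed).
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "Mistral-22B-v0.2-exl2-6_5"  # hypothetical path from the download sketch
config.prepare()
config.max_seq_len = 4096                       # 4k context, per the "VRAM (4k ctx)" column

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)        # allocate the KV cache as layers load
model.load_autosplit(cache)                     # split weights across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7

# Prompt shape guessed from the README's Prompt Format section.
print(generator.generate_simple("### Human: Hello!\n### Assistant:", settings, 128))
```

The VRAM columns above should roughly bound what this load uses at each context length, since the cache is sized by `max_seq_len`.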