minyichen committed on
Commit
cde0124
·
verified ·
1 Parent(s): d852af8

Update README.md

Files changed (1)
  1. README.md +39 -3
README.md CHANGED
@@ -1,3 +1,39 @@
- ---
- license: llama3
- ---
+ ---
+ base_model: yentinglin/Llama-3-Taiwan-70B-Instruct
+ language:
+ - zh
+ - en
+ license: llama3
+ model_creator: yentinglin
+ model_name: Llama-3-Taiwan-70B-Instruct
+ model_type: llama
+ pipeline_tag: text-generation
+ quantized_by: minyichen
+ tags:
+ - llama-3
+ ---
+
+ # Llama-3-Taiwan-70B-Instruct-fp8
+ - Model creator: [Yen-Ting Lin](https://huggingface.co/yentinglin)
+ - Original model: [Llama-3-Taiwan-70B-Instruct](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct)
+
+ <!-- description start -->
+ ## Description
+
+ This repo contains fp8 model files for [Llama-3-Taiwan-70B-Instruct](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct).
+
+ <!-- description end -->
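For a rough sense of what fp8 files mean for size, the arithmetic below estimates weight storage at 1 byte per parameter for fp8 versus 2 bytes for a 16-bit format. It is a back-of-the-envelope sketch only: the 70e9 parameter count is read off the "70B" in the model name, the 16-bit baseline is an assumption about the original checkpoint, and the unquantized `lm_head`, scale tensors, and file overhead are ignored.

```python
# Nominal parameter count taken from the "70B" in the model name (approximate).
NUM_PARAMS = 70e9

# Bytes per weight; the 16-bit baseline is an assumption, not stated in the README.
BYTES_PER_PARAM = {
    "16-bit (assumed baseline)": 2,
    "fp8": 1,
}


def weight_size_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for fmt, nbytes in BYTES_PER_PARAM.items():
        print(f"{fmt}: ~{weight_size_gb(NUM_PARAMS, nbytes):.0f} GB")
```

Actual file sizes will differ somewhat, since the real parameter count is not exactly 70e9 and some layers stay unquantized.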
+ <!-- repositories-available start -->
+ * [GPTQ models for GPU inference](https://huggingface.co/minyichen/Llama-3-Taiwan-70B-Instruct-GPTQ)
+ * [Yen-Ting Lin's original unquantized model](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct)
+ <!-- repositories-available end -->
+
+ ## Quantization parameters
+
+ - activation_scheme : static
+ - quant_method : fp8
+ - ignored_layers : lm_head
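The three parameters listed above can be pictured as a small configuration mapping. The sketch below is a minimal illustration in plain Python, assuming they are stored under a `quantization_config` key in the checkpoint's `config.json`; that layout, the `is_quantized` helper, and the example layer names are assumptions for illustration and are not taken from this repo.

```python
import json

# Values copied from the parameter list above. Storing them under a
# "quantization_config" key is an assumption about the checkpoint layout.
quantization_config = {
    "quant_method": "fp8",
    "activation_scheme": "static",
    "ignored_layers": ["lm_head"],
}


def is_quantized(layer_name: str, config: dict) -> bool:
    """Hypothetical helper: a layer is quantized unless it is listed in ignored_layers."""
    return layer_name not in config["ignored_layers"]


if __name__ == "__main__":
    print(json.dumps({"quantization_config": quantization_config}, indent=2))
    # Example layer names are illustrative only.
    for name in ("model.layers.0.self_attn.q_proj", "lm_head"):
        kind = "fp8" if is_quantized(name, quantization_config) else "unquantized"
        print(f"{name} -> {kind}")
```

In fp8 tooling, a static activation scheme usually means that activation scales are calibrated ahead of time and saved with the weights, rather than computed at inference time.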
+
+ It took about 8.5 hours to quantize on H100.
+
+