---
base_model: yentinglin/Llama-3-Taiwan-70B-Instruct
language:
- zh
- en
license: llama3
model_creator: yentinglin
model_name: Llama-3-Taiwan-70B-Instruct
model_type: llama
pipeline_tag: text-generation
quantized_by: minyichen
tags:
- llama-3
---

# Llama-3-Taiwan-70B-Instruct-fp8
- Model creator: [Yen-Ting Lin](https://huggingface.co/yentinglin)
- Original model: [Llama-3-Taiwan-70B-Instruct](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct)

<!-- description start -->
## Description

This repo contains fp8 model files for [Llama-3-Taiwan-70B-Instruct](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct).

<!-- description end -->
<!-- repositories-available start -->
* [GPTQ models for GPU inference](https://huggingface.co/minyichen/Llama-3-Taiwan-70B-Instruct-GPTQ)
* [Yen-Ting Lin's original unquantized model](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct)
<!-- repositories-available end -->

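fp8 checkpoints like this one are meant for inference engines with native fp8 support. Below is a minimal serving sketch using vLLM; it is not from the original card, and the repo id `minyichen/Llama-3-Taiwan-70B-Instruct-fp8` plus the 2-GPU setup are assumptions for illustration.

```python
# Minimal vLLM serving sketch (not part of the original card).
# Assumptions: the fp8 weights are published as "minyichen/Llama-3-Taiwan-70B-Instruct-fp8"
# and two 80GB-class GPUs are available for tensor parallelism.
from vllm import LLM, SamplingParams

llm = LLM(
    model="minyichen/Llama-3-Taiwan-70B-Instruct-fp8",  # assumed repo id
    tensor_parallel_size=2,   # split the 70B model across 2 GPUs
    quantization="fp8",       # load the pre-quantized fp8 checkpoint
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["請簡單介紹台灣的夜市文化。"], params)
print(outputs[0].outputs[0].text)
```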
31 |
+
## Quantization parameter
|
32 |
+
|
33 |
+
- activation_scheme : static
|
34 |
+
- quant_method : fp8
|
35 |
+
- ignored_layers : lm_head
|
36 |
+
|
37 |
+
It tooks about 8.5 hrs to quantize on H100.
|
38 |
+
|
39 |
+
|
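These parameters match the configuration written out by fp8 weight-and-activation quantization tools such as AutoFP8. The sketch below illustrates how a checkpoint with `activation_scheme: static` and an unquantized `lm_head` might be produced; it is not the exact script used for this repo, the calibration prompts are placeholders, and the `ignore_patterns` argument name is an assumption based on AutoFP8's published examples.

```python
# Illustrative fp8 quantization sketch (NOT the script used for this repo).
# Assumes the AutoFP8 library; calibration data is needed because
# activation_scheme="static" bakes fixed activation scales into the checkpoint.
from transformers import AutoTokenizer
from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig

model_id = "yentinglin/Llama-3-Taiwan-70B-Instruct"
save_dir = "Llama-3-Taiwan-70B-Instruct-fp8"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# A single placeholder calibration prompt; a real run would use a larger sample.
calib = tokenizer(["台灣最高的山是玉山。"], return_tensors="pt").to("cuda")

quantize_config = BaseQuantizeConfig(
    quant_method="fp8",
    activation_scheme="static",        # static activation scales, as in this card
    ignore_patterns=["re:.*lm_head"],  # keep lm_head unquantized (argument name assumed)
)

model = AutoFP8ForCausalLM.from_pretrained(model_id, quantize_config=quantize_config)
model.quantize(calib)           # calibrate activation scales on the sample batch
model.save_quantized(save_dir)  # writes fp8 weights plus the quantization config
```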