zhiyucheng and omrialmog committed
Commit f2ce704 · verified · 1 parent: 81c5f9f

Update README.md (#1)


- Update README.md (e25ad2eb492e34d20d601af48442a8a4f2122f4c)
- Update README.md (eb9e064cceb4ce3be30d9cd1c79c98238d9711bb)


Co-authored-by: Omri Almog <[email protected]>

Files changed (1)
  1. README.md +25 -18
README.md CHANGED
@@ -1,6 +1,9 @@
 ---
 base_model:
 - meta-llama/Llama-3.1-8B-Instruct
+license: llama3.1
+pipeline_tag: text-generation
+library_name: transformers
 ---
 # Model Overview
 
@@ -75,39 +78,37 @@ python examples/llama/convert_checkpoint.py --model_dir Llama-3.1-8B-Instruct-FP
 trtllm-build --checkpoint_dir /ckpt --output_dir /engine
 ```
 
-* Accuracy evaluation:
-
- 1) Prepare the MMLU dataset:
- ```sh
- mkdir data; wget https://people.eecs.berkeley.edu/~hendrycks/data.tar -O data/mmlu.tar
- tar -xf data/mmlu.tar -C data && mv data/data data/mmlu
- ```
-
- 2) Measure MMLU:
-
- ```sh
- python examples/mmlu.py --engine_dir ./engine --tokenizer_dir Llama-3.1-8B-Instruct-FP8/ --test_trt_llm --data_dir data/mmlu
- ```
-
 * Throughputs evaluation:
 
 Please refer to the [TensorRT-LLM benchmarking documentation](https://github.com/NVIDIA/TensorRT-LLM/blob/main/benchmarks/Suite.md) for details.
 
 ## Evaluation
-The accuracy (MMLU, 5-shot) and throughputs (tokens per second, TPS) benchmark results are presented in the table below:
+
 <table>
 <tr>
 <td><strong>Precision</strong>
 </td>
 <td><strong>MMLU</strong>
 </td>
+<td><strong>GSM8K (CoT)</strong>
+</td>
+<td><strong>ARC Challenge</strong>
+</td>
+<td><strong>IFEVAL</strong>
+</td>
 <td><strong>TPS</strong>
 </td>
 </tr>
 <tr>
-<td>FP16
+<td>BF16
+</td>
+<td>69.4
+</td>
+<td>84.5
 </td>
-<td>68.6
+<td>83.4
+</td>
+<td>80.4
 </td>
 <td>8,579.93
 </td>
@@ -115,7 +116,13 @@ The accuracy (MMLU, 5-shot) and throughputs (tokens per second, TPS) benchmark r
 <tr>
 <td>FP8
 </td>
-<td>68.3
+<td>68.7
+</td>
+<td>83.1
+</td>
+<td>83.3
+</td>
+<td>81.8
 </td>
 <td>11,062.90
 </td>
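As a quick sanity check on the TPS column in the updated table (a standalone sketch, not part of the README itself), the FP8-over-BF16 throughput speedup implied by the two TPS values in the diff can be computed directly:

```python
# Throughput (tokens per second) values taken from the README's
# evaluation table in the diff above.
bf16_tps = 8579.93
fp8_tps = 11062.90

# Relative speedup of the FP8 engine over the BF16 baseline.
speedup = fp8_tps / bf16_tps
print(f"FP8 speedup over BF16: {speedup:.2f}x")  # ~1.29x
```

So the FP8 engine delivers roughly a 29% throughput improvement while the accuracy columns stay within about a point of the BF16 baseline.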