Update README.md
README.md CHANGED
@@ -38,19 +38,16 @@ The model's deep understanding of SEC filings and related financial data makes i
 
 To ensure the robustness and effectiveness of Llama-3-SEC, the model has undergone rigorous evaluation on both domain-specific and general benchmarks. Key evaluation metrics include:
 
-- Extractive numerical reasoning tasks, using subsets of TAT-QA and ConvFinQA datasets
-
-- General evaluation metrics, such as BIG-bench, AGIEval, GPT4all, and TruthfulQA, to assess the model's performance on a wide range of tasks
-
-- General perplexity on various datasets, including bigcode/starcoderdata, open-web-math/open-web-math, allenai/peS2o, mattymchen/refinedweb-3m, and Wikitext
+<table>
+<tr>
+<td><img src="https://i.ibb.co/xGHRfLf/Screenshot-2024-06-11-at-10-23-59-PM.png" alt="Domain Specific Perplexity of Model Variants" width="300"></td>
+<td><img src="https://i.ibb.co/2v6PdDx/Screenshot-2024-06-11-at-10-25-03-PM.png" alt="Domain Specific Evaluations of Model Variants" width="300"></td>
+</tr>
+<tr>
+<td colspan="2" style="text-align:center;"><img src="https://i.ibb.co/K5d0wMh/Screenshot-2024-06-11-at-10-23-18-PM.png" alt="General Evaluations of Model Variants" width="600"></td>
+</tr>
+</table>
 
 The evaluation results demonstrate significant improvements in domain-specific performance while maintaining strong general capabilities, thanks to the use of advanced CPT and model merging techniques.
 
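For readers unfamiliar with the perplexity numbers shown in the first figure above, the snippet below is a minimal sketch of how token-level perplexity can be computed with Hugging Face `transformers`. The checkpoint name, the sample passage, and the single-pass (no sliding-window) evaluation are illustrative assumptions, not the evaluation harness used to produce the results referenced in this README.

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint name -- substitute the actual Llama-3-SEC model ID.
MODEL_ID = "your-org/llama-3-sec-checkpoint"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()


def perplexity(text: str, max_length: int = 4096) -> float:
    """Token-level perplexity of `text` in a single forward pass (no sliding window)."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_length)
    enc = {k: v.to(model.device) for k, v in enc.items()}
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the mean cross-entropy
        # over next-token predictions; exponentiating that loss gives perplexity.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())


# Hypothetical SEC-filing excerpt, used only to illustrate the call.
sample = "Item 7. Management's Discussion and Analysis of Financial Condition and Results of Operations."
print(f"Perplexity: {perplexity(sample):.2f}")
```

Benchmark scores such as those in the other two figures are typically produced with an evaluation harness (for example, EleutherAI's lm-evaluation-harness) rather than a hand-rolled loop like the one above.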