Update README.md
README.md
CHANGED
@@ -65,9 +65,26 @@ print(response)
 
 ## Model Limitations and Biases
 
-This model is specifically trained on astronomy literature
+This model is specifically trained on astronomy literature (abstracts, introductions, and conclusions) and may not generalize well to other domains. Users should be aware of potential biases in the training data, which may reflect historical trends and biases in astronomical research publications. Additionally, the regex-based extraction method used for processing the LaTeX source files may introduce some biases or inconsistencies in the training data.
+
+Importantly, this model has been superseded by more advanced versions. Here's a performance comparison chart based upon the astronomical benchmarking Q&A as described in [Ting et al. 2024](https://arxiv.org/abs/2407.11194), and Pan et al. 2024.
+
+| Model | Score (%) |
+|-------|-----------|
+| **AstroLLaMA-3.1-8B-Plus (AstroMLab)** | **77.2** |
+| LLaMA-3.1-8B | 73.7 |
+| **AstroLLaMA-2-70B (AstroMLab)** | **72.3** |
+| Gemma-2-9B | 71.5 |
+| Qwen-2.5-7B | 70.4 |
+| Yi-1.5-9B | 68.4 |
+| InternLM-2.5-7B | 64.0 |
+| Mistral-7B-v0.3 | 63.9 |
+| ChatGLM3-6B | 50.4 |
+| <span style="color:red">AstroLLaMA-2-7B-AIC</span> | <span style="color:red">44.3</span> |
+| AstroLLaMA-2-7B-Abstract | 43.5 |
+
+As shown, AstroLLaMA-2-7B series are outperformed by newer models. For state-of-the-art performance, we recommend using the latest models.
 
-Importantly, this model has been superseded by more advanced versions. For state-of-the-art performance, we recommend using the latest models from AstroMLab.
 
 ## Ethical Considerations
 
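
The added README text notes that regex-based extraction from LaTeX sources may introduce inconsistencies. The following is a minimal sketch of how that can happen, not the authors' actual pipeline: the function name `extract_aic`, both patterns, and the sample paper are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical patterns for illustration; the real extraction code is not shown in this commit.
ABSTRACT_RE = re.compile(r"\\begin\{abstract\}(.*?)\\end\{abstract\}", re.DOTALL)
SECTION_RE = re.compile(
    r"\\section\*?\{(?P<title>[^}]*)\}(?P<body>.*?)"
    r"(?=\\section\*?\{|\\end\{document\}|\Z)",
    re.DOTALL,
)


def extract_aic(latex_source: str) -> dict:
    """Collect abstract, introduction, and conclusion text from a LaTeX string."""
    parts = {"abstract": None, "introduction": None, "conclusion": None}

    match = ABSTRACT_RE.search(latex_source)
    if match:
        parts["abstract"] = match.group(1).strip()

    # Match sections by title prefix; titles that differ are silently skipped.
    for section in SECTION_RE.finditer(latex_source):
        title = section.group("title").strip().lower()
        body = section.group("body").strip()
        if title.startswith("introduction"):
            parts["introduction"] = body
        elif title.startswith("conclusion"):
            parts["conclusion"] = body
    return parts


# A made-up paper whose final section is titled "Summary" rather than "Conclusion".
sample = r"""
\begin{abstract}We study galaxies.\end{abstract}
\section{Introduction}
Galaxies are common.
\section{Methods}
We used a telescope.
\section{Summary}
Galaxies remain common.
\end{document}
"""

parts = extract_aic(sample)
# The "Summary" section is not recognized, so parts["conclusion"] stays None.
```

In this sketch a paper that labels its closing section "Summary" contributes no conclusion text, which is one way a title-matching regex could skew the resulting training data.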