---
license: mit
language:
- en
pipeline_tag: text-generation
tags:
- llama-3
- astronomy
- astrophysics
- arxiv
inference: false
base_model:
- meta-llama/Llama-3-8b-hf
---
# AstroLLaMA-3-8B-Base_AIC
AstroLLaMA-3-8B is a specialized base language model for astronomy, developed by fine-tuning Meta's LLaMA-3-8b architecture on astronomical literature. This model was developed by the AstroMLab team. It is designed for next token prediction tasks and is not an instruct/chat model.
## Model Details
- **Base Architecture**: LLaMA-3-8b
- **Training Data**: Abstract, Introduction, and Conclusion (AIC) sections from arXiv's astro-ph category papers
- **Data Processing**: Optical character recognition (OCR) on PDF files using the Nougat tool, followed by summarization using Qwen-2-8B and LLaMA-3.1-8B.
- **Fine-tuning Method**: Continual Pre-Training (CPT) using the LMFlow framework
- **Training Details** (see the illustrative configuration sketch after this list):
- Learning rate: 2 × 10⁻⁵
- Total batch size: 96
- Maximum token length: 512
- Warmup ratio: 0.03
- No gradient accumulation
- BF16 format
- Cosine decay schedule for learning rate reduction
- Training duration: 1 epoch
- **Primary Use**: Next token prediction for astronomy-related text generation and analysis
- **Reference**: [Pan et al. 2024](https://arxiv.org/abs/2409.19750)
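The training hyperparameters listed above were used with the LMFlow framework. As a rough illustration only (option names and the per-GPU batch size are assumptions, chosen so that an 8-GPU run reaches the total batch size of 96), here is how they might map onto Hugging Face `TrainingArguments` for a continual pre-training run:
```python
# Illustrative sketch only: AstroLLaMA-3-8B was trained with LMFlow, not this script.
# It shows how the hyperparameters listed above might look as Hugging Face
# TrainingArguments for a continual pre-training (CPT) run.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./astrollama-3-8b-cpt",  # hypothetical output directory
    learning_rate=2e-5,                  # 2 x 10^-5
    per_device_train_batch_size=12,      # assumption: 12 per GPU x 8 GPUs = total batch size 96
    gradient_accumulation_steps=1,       # no gradient accumulation
    warmup_ratio=0.03,                   # warmup ratio
    lr_scheduler_type="cosine",          # cosine decay schedule for the learning rate
    bf16=True,                           # BF16 format
    num_train_epochs=1,                  # single epoch over the AIC corpus
    # The 512-token maximum length is applied when tokenizing/packing the corpus,
    # not via TrainingArguments.
)
```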
## Generating text from a prompt
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch
# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("AstroMLab/astrollama-3-8b-base_aic")
model = AutoModelForCausalLM.from_pretrained("AstroMLab/astrollama-3-8b-base_aic", device_map="auto")
# Create the text-generation pipeline with explicit truncation
generator = pipeline(
"text-generation",
model=model,
tokenizer=tokenizer,
device_map="auto",
truncation=True,
max_length=512
)
# Example prompt from an astronomy paper
prompt = "In this letter, we report the discovery of the highest redshift, " \
"heavily obscured, radio-loud QSO candidate selected using JWST NIRCam/MIRI, " \
"mid-IR, sub-mm, and radio imaging in the COSMOS-Web field. "
# Set seed for reproducibility
torch.manual_seed(42)
# Generate text
generated_text = generator(prompt, do_sample=True)
print(generated_text[0]['generated_text'])
```
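Because this is a base model intended for next token prediction, it can also be used to score astronomy text directly. The snippet below is a minimal sketch (not part of the original card) that reuses the `model` and `tokenizer` loaded above to compute the perplexity of a short passage; the example sentence is arbitrary.
```python
# Minimal perplexity sketch, reusing the model and tokenizer loaded above.
import torch

text = "The cosmic microwave background provides a snapshot of the early universe."  # arbitrary example
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])  # causal LM loss over the passage
perplexity = torch.exp(outputs.loss)
print(f"Perplexity: {perplexity.item():.2f}")
```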
## Model Limitations and Biases
A key limitation identified during the development of this model is that training solely on astro-ph data may not be sufficient to significantly improve performance over the base model, especially for the already highly performant LLaMA-3 series. This suggests that to achieve substantial gains, future iterations may need to incorporate a broader range of high-quality astronomical data beyond arXiv, such as textbooks, Wikipedia, and curated summaries.
Here is a performance comparison based on the astronomy benchmarking Q&A described in [Ting et al. 2024](https://arxiv.org/abs/2407.11194):
| Model | Score (%) |
|-------|-----------|
| **AstroSage-LLaMA-3.1-8B (AstroMLab)** | **80.9** |
| LLaMA-3.1-8B | 73.7 |
| LLaMA-3-8B | 72.9 |
| **<span style="color:green">AstroLLaMA-3-8B-Base_AIC (AstroMLab)</span>** | **<span style="color:green">71.9</span>** |
| Gemma-2-9B | 71.5 |
| Qwen-2.5-7B | 70.4 |
| Yi-1.5-9B | 68.4 |
| InternLM-2.5-7B | 64.5 |
| Mistral-7B-v0.3 | 63.9 |
| ChatGLM3-6B | 50.4 |
| AstroLLaMA-2-7B-AIC | 44.3 |
| AstroLLaMA-2-7B-Abstract | 43.5 |
As shown, while AstroLLaMA-3-8B performs competitively among models in its class, it does not surpass the performance of the base LLaMA-3-8B model. This underscores the challenges in developing specialized models and the need for more diverse and comprehensive training data.
This model is released primarily for reproducibility purposes, allowing researchers to track the development process and compare different iterations of AstroLLaMA models.
For optimal performance and the most up-to-date capabilities in astronomy-related tasks, we recommend using AstroSage-LLaMA-3.1-8B, which addresses these limitations. The newer model incorporates training data beyond astro-ph and a greatly expanded fine-tuning process, resulting in significantly improved performance.
## Ethical Considerations
While this model is designed for scientific use, users should be mindful of potential misuse, such as generating misleading scientific content. Always verify model outputs against peer-reviewed sources for critical applications.
## Citation
If you use this model in your research, please cite:
```
@ARTICLE{2024arXiv240919750P,
author = {{Pan}, Rui and {Dung Nguyen}, Tuan and {Arora}, Hardik and {Accomazzi}, Alberto and {Ghosal}, Tirthankar and {Ting}, Yuan-Sen},
title = "{AstroMLab 2: AstroLLaMA-2-70B Model and Benchmarking Specialised LLMs for Astronomy}",
journal = {arXiv e-prints},
keywords = {Astrophysics - Instrumentation and Methods for Astrophysics, Computer Science - Computation and Language},
year = 2024,
month = sep,
eid = {arXiv:2409.19750},
pages = {arXiv:2409.19750},
doi = {10.48550/arXiv.2409.19750},
archivePrefix = {arXiv},
eprint = {2409.19750},
primaryClass = {astro-ph.IM},
adsurl = {https://ui.adsabs.harvard.edu/abs/2024arXiv240919750P},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}
``` |