AstroMLab/astrollama-3-8b-base_aic

tingyuansen committed on Sep 29, 2024

Commit bfa7b8b · verified · 1 Parent(s): 3576e8d

Update README.md

Files changed (1)

README.md +2 -2

@@ -31,7 +31,7 @@ AstroLLaMA-3-8B is a specialized base language model for astronomy, developed by
   - No gradient accumulation
   - BF16 format
   - Cosine decay schedule for learning rate reduction
-  - Training duration: 1 epoch (approximately 32 A100 GPU hours)
+  - Training duration: 1 epoch
 - **Primary Use**: Next token prediction for astronomy-related text generation and analysis
 - **Reference**: Pan et al. 2024 [Link to be added]
@@ -78,8 +78,8 @@ Here's a performance comparison chart based upon the astronomical benchmarking Q
 | Model | Score (%) |
 |-------|-----------|
 | LLaMA-3.1-8B | 73.7 |
-| **<span style="color:red">AstroLLaMA-3-8B-Base_AIC (AstroMLab)</span>** | **<span style="color:red">71.9</span>** |
 | LLaMA-3-8B | 72.0 |
+| **<span style="color:red">AstroLLaMA-3-8B-Base_AIC (AstroMLab)</span>** | **<span style="color:red">71.9</span>** |
 | Gemma-2-9B | 71.5 |
 | Qwen-2.5-7B | 70.4 |
 | Yi-1.5-9B | 68.4 |
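
The table hunk moves the AstroLLaMA row from above LLaMA-3-8B (72.0) to below it, which puts the visible rows in descending score order. A minimal Python sketch of that ordering check follows. It assumes only the six rows shown in the hunk (the full README table may contain more), and the names `rows_before`, `rows_after`, and `is_descending` are hypothetical helpers for illustration, not part of the repository.

```python
# Rows visible in the hunk, as (model, score in percent).
# Assumption: only the rows shown in the diff are listed here.
rows_before = [
    ("LLaMA-3.1-8B", 73.7),
    ("AstroLLaMA-3-8B-Base_AIC (AstroMLab)", 71.9),
    ("LLaMA-3-8B", 72.0),
    ("Gemma-2-9B", 71.5),
    ("Qwen-2.5-7B", 70.4),
    ("Yi-1.5-9B", 68.4),
]


def is_descending(rows):
    """Return True when the score never increases from one row to the next."""
    scores = [score for _, score in rows]
    return all(a >= b for a, b in zip(scores, scores[1:]))


# Sorting by score (highest first) reproduces the row order after the commit.
rows_after = sorted(rows_before, key=lambda row: row[1], reverse=True)

print(is_descending(rows_before))  # order before the commit
print(is_descending(rows_after))   # order after the commit
for model, score in rows_after:
    print(f"| {model} | {score} |")
```

The sorted output drops the red `<span>` highlighting used in the README; it only illustrates the row order.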