Create README.md
README.md
ADDED
@@ -0,0 +1,47 @@
---
license: mit
datasets:
- onurkeles/econ_paper_abstracts
language:
- en
metrics:
- bleu
- rouge
library_name: transformers
---
# LLaMA-2-Econ: Title Generation Model

## Model Description
LLaMA-2-Econ is a fine-tuned version of LLaMA-2-7B that generates titles for economics research papers from their abstracts. It was trained with Quantized Low-Rank Adaptation (QLoRA), a parameter-efficient fine-tuning (PEFT) technique, with the aim of producing more relevant and creative titles.

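As an illustration of how a title could be generated from an abstract, here is a minimal sketch using the `transformers` library named in the metadata. The prompt template in `build_prompt`, the `model_id` argument, and the `max_new_tokens` default are assumptions for this example; this card does not state the prompt format used during fine-tuning or the repository ID.

```python
def build_prompt(abstract: str) -> str:
    """Wrap an abstract in a simple prompt (hypothetical template, not from this card)."""
    cleaned = " ".join(abstract.split())  # collapse newlines and repeated spaces
    return f"Abstract: {cleaned}\nTitle:"


def generate_title(abstract: str, model_id: str, max_new_tokens: int = 32) -> str:
    """Generate one title with a causal LM checkpoint identified by model_id."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    prompt = build_prompt(abstract)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Keep only the newly generated tokens and use their first line as the title.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    text = tokenizer.decode(new_tokens, skip_special_tokens=True)
    lines = text.strip().splitlines()
    return lines[0].strip() if lines else ""
```

Calling `generate_title(abstract, model_id="<this repository's ID>")` would download the checkpoint on first use; a 7B model generally needs a GPU or a quantized load to run at a practical speed.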
## Intended Uses & Limitations
This model is intended to help researchers draft relevant titles for economics research papers from their abstracts. It may reproduce biases present in the training data, and generated titles should be reviewed by a human for appropriateness and accuracy.

## Training and Evaluation Data
The model was fine-tuned on pairs of economics paper abstracts and their titles, collected through the arXiv API and covering a wide range of economics subfields.

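For readers who want to collect similar abstract and title pairs, the sketch below builds an arXiv API query URL and parses the Atom feed it returns. The category `econ.EM` and the helper names are assumptions for illustration; the card does not say which categories or query parameters were used.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # XML namespace used by arXiv's Atom feed


def build_query_url(category: str, start: int = 0, max_results: int = 100) -> str:
    """Build a query URL for one arXiv category, e.g. 'econ.EM' (assumed example)."""
    params = {
        "search_query": f"cat:{category}",
        "start": start,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_feed(xml_text: str) -> list:
    """Extract title/abstract pairs from an Atom feed returned by the arXiv API."""
    root = ET.fromstring(xml_text)
    pairs = []
    for entry in root.iter(f"{ATOM}entry"):
        title = entry.findtext(f"{ATOM}title", default="")
        abstract = entry.findtext(f"{ATOM}summary", default="")
        # arXiv wraps long fields over several lines; collapse the whitespace.
        pairs.append({
            "title": " ".join(title.split()),
            "abstract": " ".join(abstract.split()),
        })
    return pairs
```

The feed text can be fetched with `urllib.request.urlopen(build_query_url("econ.EM")).read().decode("utf-8")` and passed to `parse_feed`; paging through results is done by increasing `start`.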
### Training Hyperparameters
- **QLoRA Settings:**
  - LoRA rank (`lora_r`): 64
  - `lora_dropout`: 0.1
- **Precision & Quantization:**
  - Precision: 4-bit
  - Computation dtype: float16
  - Quantization type: "nf4", with nested quantization
- **Training Schedule:**
  - Epochs: up to 8, with early stopping (patience of 2 epochs)
  - bf16 training enabled
- **Optimizer & Learning Rate:**
  - Optimizer: paged AdamW with 32-bit precision
  - Learning rate: 2e-4, with a cosine learning rate scheduler
  - Warmup ratio: 0.03
- **Additional Settings:**
  - Gradient checkpointing enabled, with a maximum gradient norm of 0.3
  - Sequences grouped by length for training efficiency
  - PEFT adapters merged into the base model after training, so the model loads without a separate adapter

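The settings listed above map fairly directly onto argument names in `transformers` and `peft`. The sketch below is one possible mapping, not the author's training script: `task_type`, `output_dir`, and the helper `build_configs` are assumptions, and values the card does not report (such as `lora_alpha` or the target modules) are left at the library defaults.

```python
# Reported settings, keyed by the corresponding library argument names.
QUANT_KWARGS = {
    "load_in_4bit": True,                 # 4-bit precision
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_compute_dtype": "float16",
    "bnb_4bit_use_double_quant": True,    # nested quantization
}
LORA_KWARGS = {
    "r": 64,
    "lora_dropout": 0.1,
    "task_type": "CAUSAL_LM",             # assumption: not stated in the card
}
TRAINING_KWARGS = {
    "num_train_epochs": 8,
    "bf16": True,
    "optim": "paged_adamw_32bit",
    "learning_rate": 2e-4,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.03,
    "gradient_checkpointing": True,
    "max_grad_norm": 0.3,
    "group_by_length": True,
}
EARLY_STOPPING_PATIENCE = 2


def build_configs(output_dir: str = "./llama2-econ-title"):
    """Build library config objects from the dictionaries above (hypothetical helper)."""
    # Imported lazily so the dictionaries can be inspected without these packages.
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig, EarlyStoppingCallback, TrainingArguments

    quant = BitsAndBytesConfig(**QUANT_KWARGS)
    lora = LoraConfig(**LORA_KWARGS)
    args = TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
    # Early stopping also needs per-epoch evaluation and load_best_model_at_end
    # on the training arguments; those options are not reported, so not set here.
    early_stop = EarlyStoppingCallback(early_stopping_patience=EARLY_STOPPING_PATIENCE)
    return quant, lora, args, early_stop
```

Keeping the values in plain dictionaries makes them easy to log or compare across runs; argument names can differ between library versions, so they should be checked against the installed release.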
## Evaluation Results
- BLEU: 0.16
- ROUGE-1: 0.45
- ROUGE-2: 0.24
- ROUGE-L: 0.41

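To give a sense of what the ROUGE-1 and ROUGE-2 scores measure, here is a simplified ROUGE-N F1 computation based on n-gram overlap between a reference title and a generated one. It is an illustrative reimplementation with lowercase whitespace tokenization and no stemming, not the evaluation package used for the numbers above, so its scores will not match them exactly.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference: str, candidate: str, n: int = 1) -> float:
    """Simplified ROUGE-N F1 between a reference and a candidate string."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as it appears in both.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_n("monetary policy and inflation expectations", "inflation expectations and monetary policy shocks", n=1)` shares five unigrams, giving precision 5/6, recall 1, and an F1 of 10/11; with `n=2` only two bigrams overlap and the F1 drops to 4/9, which shows why ROUGE-2 is typically lower than ROUGE-1.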