---
license: mit
datasets:
- onurkeles/econ_paper_abstracts
language:
- en
metrics:
- bleu
- rouge
library_name: transformers
---
# LLaMA-2-Econ: Title Generation Model

## Model Description
A fine-tuned version of the LLaMA-2-7B model for generating titles for economics research papers. The model was fine-tuned with Quantized Low-Rank Adaptation (QLoRA) through the Parameter-Efficient Fine-Tuning (PEFT) library, and it generates candidate titles conditioned on the abstracts of economics papers.

## Intended Uses & Limitations
This model is intended to help researchers draft relevant candidate titles for their economics papers from the papers' abstracts. Generated titles may reflect biases present in the training data and should be reviewed by a human for appropriateness and accuracy before use.
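
Below is a minimal inference sketch using the `transformers` pipeline API. The repository id and the prompt format are assumptions, not documented on this card; replace them with the actual checkpoint name and the prompt template used during fine-tuning.

```python
# Minimal inference sketch; the repo id and prompt template below are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "onurkeles/llama-2-econ-title-generation"  # assumption: replace with the actual repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

abstract = (
    "We study the effect of minimum wage increases on small-firm employment "
    "using a difference-in-differences design on county-level panel data."
)

# Assumed instruction-style prompt; the template used in training may differ.
prompt = f"Generate a title for the following economics paper abstract:\n{abstract}\nTitle:"

output = generator(prompt, max_new_tokens=32, do_sample=True, temperature=0.7)
print(output[0]["generated_text"])
```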

## Training and Evaluation Data
The model was fine-tuned on a collection of economics paper abstracts and their corresponding titles, obtained through the arXiv API and covering a wide range of economic subfields.
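
As an illustration only, the sketch below shows one way such abstract-title pairs could be collected with the third-party `arxiv` Python package; the actual collection script, query categories, and filtering used for `onurkeles/econ_paper_abstracts` are not documented here and are assumptions.

```python
# Hypothetical data-collection sketch using the `arxiv` package (pip install arxiv).
# The categories and result limit are assumptions, not the card's documented procedure.
import arxiv

client = arxiv.Client()
search = arxiv.Search(
    query="cat:econ.EM OR cat:econ.GN OR cat:econ.TH",  # assumed economics subfields
    max_results=1000,
    sort_by=arxiv.SortCriterion.SubmittedDate,
)

pairs = []
for result in client.results(search):
    # Keep (abstract, title) pairs for supervised fine-tuning.
    pairs.append({"abstract": result.summary.replace("\n", " "), "title": result.title})

print(f"Collected {len(pairs)} abstract-title pairs")
```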

### Training Hyperparameters
- **QLoRA Settings:**
  - `lora_r` (LoRA rank): 64
  - `lora_dropout`: 0.1
- **Precision & Quantization:**
  - Precision: 4-bit
  - Computation dtype: float16
  - Quantization type: "nf4", with nested quantization
- **Training Schedule:**
  - Epochs: 8, with early stopping (patience of 2 epochs)
  - bf16 training enabled
- **Optimizer & Learning Rate:**
  - Optimizer: paged AdamW (32-bit)
  - Learning rate: 2e-4 with a cosine learning-rate scheduler
  - Warmup ratio: 0.03
- **Additional Settings:**
  - Gradient checkpointing and a maximum gradient norm of 0.3
  - Sequences grouped by length for training efficiency
  - PEFT adapters merged into the base model after training (see the configuration sketch below)
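
For reference, the hyperparameters above roughly map onto the following `transformers`/`peft` configuration. This is a non-authoritative sketch: values not stated on this card (for example `lora_alpha`, `target_modules`, and batch sizes) are assumptions.

```python
# Configuration sketch reconstructing the listed hyperparameters.
# lora_alpha, target_modules, and output_dir are assumptions; they are not stated on this card.
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit precision
    bnb_4bit_quant_type="nf4",             # "nf4" quantization
    bnb_4bit_use_double_quant=True,        # nested quantization
    bnb_4bit_compute_dtype=torch.float16,  # computation dtype
)

lora_config = LoraConfig(
    r=64,                                  # lora_r
    lora_alpha=16,                         # assumption
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],   # assumption
)

training_args = TrainingArguments(
    output_dir="llama2-econ-title",
    num_train_epochs=8,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    optim="paged_adamw_32bit",
    max_grad_norm=0.3,
    gradient_checkpointing=True,
    group_by_length=True,
    bf16=True,
)

# Early stopping (patience 2) would be added via transformers.EarlyStoppingCallback
# when constructing the Trainer.
```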

## Evaluation Results
- BLEU: 0.16
- ROUGE-1: 0.45
- ROUGE-2: 0.24
- ROUGE-L: 0.41
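
The card does not document the exact evaluation script; below is a minimal sketch of how BLEU and ROUGE scores like these can be computed with the Hugging Face `evaluate` library. The predictions and references shown are placeholders, not the actual test set.

```python
# Sketch of BLEU/ROUGE scoring with the `evaluate` library (pip install evaluate rouge_score).
import evaluate

predictions = ["Minimum Wage Increases and Small-Firm Employment"]  # placeholder model outputs
references = ["The Effect of Minimum Wages on Small-Firm Employment"]  # placeholder gold titles

bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")

bleu_score = bleu.compute(predictions=predictions, references=[[r] for r in references])
rouge_scores = rouge.compute(predictions=predictions, references=references)

print(bleu_score["bleu"])
print(rouge_scores["rouge1"], rouge_scores["rouge2"], rouge_scores["rougeL"])
```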