---
license: apache-2.0
datasets:
- financial_phrasebank
- pauri32/fiqa-2018
- zeroshot/twitter-financial-news-sentiment
language:
- en
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- finance
---
We collect financial domain terms from Investopedia's Financial Terms Dictionary, NYSSCPA's Accounting Terminology Guide, and Harvey's Hypertextual Finance Glossary to expand RoBERTa's vocabulary.
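The vocabulary expansion described above can be sketched roughly as follows. This is a minimal illustration, not the authors' actual script: the helper names `select_new_terms` and `expand_vocab` are hypothetical, and it assumes a Hugging Face style tokenizer and model that expose `get_vocab`, `add_tokens`, and `resize_token_embeddings`.

```python
def select_new_terms(terms, vocab):
    """Return cleaned glossary terms that are not already in `vocab`.

    Whitespace is normalized, empty strings are dropped, and duplicates
    are removed while preserving the original order.
    Note: RoBERTa's byte-level BPE vocabulary stores many tokens with a
    leading "Ġ" marker, so this exact-match check is only approximate.
    """
    seen = set()
    new_terms = []
    for term in terms:
        cleaned = " ".join(term.split())
        if not cleaned or cleaned in vocab or cleaned in seen:
            continue
        seen.add(cleaned)
        new_terms.append(cleaned)
    return new_terms


def expand_vocab(tokenizer, model, terms):
    """Add new terms to the tokenizer and resize the model's embeddings.

    Assumes `tokenizer` and `model` follow the Hugging Face interface.
    Returns the number of tokens the tokenizer reports as added.
    """
    new_terms = select_new_terms(terms, tokenizer.get_vocab())
    num_added = tokenizer.add_tokens(new_terms)
    # The embedding matrix must grow to cover the newly added token ids.
    model.resize_token_embeddings(len(tokenizer))
    return num_added


# Hypothetical usage (downloads roberta-base; the term list is made up):
# from transformers import AutoTokenizer, AutoModelForMaskedLM
# tokenizer = AutoTokenizer.from_pretrained("roberta-base")
# model = AutoModelForMaskedLM.from_pretrained("roberta-base")
# expand_vocab(tokenizer, model, ["EBITDA", "amortization"])
```

The embeddings of newly added tokens start out untrained, which is one reason a continual pretraining pass follows the vocabulary expansion.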
Starting from this RoBERTa with added financial terms, we continued pretraining our model on multiple financial corpora:
- Financial Terms
- Financial Datasets
- Earnings Calls: 2016-2023 earnings call transcripts of NASDAQ 100 component stocks.
In the continual pretraining step, we applied the following experimental settings to achieve better fine-tuned results on four financial datasets:
- Masking Probability: 0.4 (instead of default 0.15)
- Warmup Steps: 0 (this gave better results than models trained with warmup steps)
- Epochs: 1 (one epoch is enough and guards against overfitting)
- weight_decay: 0.01
- Train Batch Size: 64
- FP16
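
To make the masking setting concrete, here is a minimal pure-Python sketch of masked language modeling with a masking probability of 0.4. It is not the training code used for this model: the function name `mask_tokens` and its parameters are illustrative, and the 80/10/10 split between mask, random, and unchanged tokens is the standard BERT/RoBERTa convention, assumed here rather than confirmed by the settings above.

```python
import random

IGNORE_INDEX = -100  # label value conventionally ignored by the loss


def mask_tokens(token_ids, mask_token_id, vocab_size,
                special_ids=(), mlm_probability=0.4, rng=None):
    """Return (inputs, labels) for one masked-language-modeling example.

    Each non-special token is selected with probability `mlm_probability`.
    A selected token is replaced by the mask token 80% of the time, by a
    random token 10% of the time, and left unchanged 10% of the time.
    Labels hold the original id at selected positions and IGNORE_INDEX
    everywhere else.
    """
    rng = rng or random.Random()
    special = set(special_ids)
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, tok in enumerate(token_ids):
        if tok in special or rng.random() >= mlm_probability:
            continue  # position not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_token_id
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)
        # otherwise keep the original token in place
    return inputs, labels


# Example with made-up ids: roughly 40% of positions receive a label.
ids = list(range(10, 30))
masked, labels = mask_tokens(ids, mask_token_id=4, vocab_size=1000,
                             rng=random.Random(0))
```

If the Hugging Face `transformers` library is used, the same setting would typically be passed as `mlm_probability=0.4` to `DataCollatorForLanguageModeling`, with the remaining values mapped onto `TrainingArguments` (`warmup_steps=0`, `num_train_epochs=1`, `weight_decay=0.01`, `per_device_train_batch_size=64`, `fp16=True`). Whether the batch size of 64 is per device or global is not stated above, so that mapping is an assumption.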