---
license: apache-2.0
datasets:
- briefai/LongShort-Dataset
language:
- en
pipeline_tag: text-generation
tags:
- pytorch
- dolly
- Gen-AI
- Finance
- KPI Extraction
---
# LongShort-Dolly-2-7B
### Model Description
LongShort-Dolly-2-7B is a large language model fine-tuned on earnings call documents to extract financial KPIs from them. It is based on the Dolly-2-7B architecture.
- Model creator: [Brief AI](https://huggingface.co/briefai)
- Original model: [Dolly-2-7B](https://huggingface.co/databricks/dolly-v2-7b)
### Dataset Description
- Data Source: Factiva
- Data Description: 28K+ Earnings Call Documents
- Data Scope: 1K+ public companies
- Fine Tuning Data: Collection of 60K+ samples.
## Prompt template: LongShort-Dolly-2-7B
```
[INST]Given the context, answer the question.
### Question:
Extract all the finance-based performance indicators and evaluation metrics.
### Context:
{context}
### Answer:
[/INST]
```
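As a minimal sketch, the template above can be filled in with plain string formatting. The names `PROMPT_TEMPLATE` and `build_prompt` are illustrative and not part of the model repository, and stripping surrounding whitespace from the context is an assumption rather than a documented requirement.

```python
# Illustrative helper (not part of the model repository): fills the
# prompt template from this section with an earnings call excerpt.
PROMPT_TEMPLATE = (
    "[INST]Given the context, answer the question.\n"
    "### Question:\n"
    "Extract all the finance-based performance indicators and evaluation metrics.\n"
    "### Context:\n"
    "{context}\n"
    "### Answer:\n"
    "[/INST]"
)


def build_prompt(context: str) -> str:
    """Return the full prompt for one earnings call excerpt."""
    # Assumption: leading/trailing whitespace in the excerpt is not meaningful.
    return PROMPT_TEMPLATE.format(context=context.strip())


if __name__ == "__main__":
    print(build_prompt("Revenue grew 12% year over year."))
```

Keeping the template in one constant makes it easier to guarantee that inference prompts match the layout shown above.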
## Basics
*This section provides information about the model type, version, license, funders, release date, developers, and contact information.*
*It is useful for anyone who wants to reference the model.*
**Developed by:** [Brief AI Team](https://huggingface.co/briefai)
**Model Type:** Transformer-based Large Language Model
**Version:** 1.0.0
**Languages:** English
**License:** Apache 2.0
**Release Date Estimate:** Wednesday, 29.November.2023
**Send Questions to:** [email protected]
**Cite as:** Brief AI LongShort Language Model
**Funded by:** UChicago Data Science Institute
**Mentored by:** Nick Kadochnikov
## Technical Specifications
*This section includes details about the model objective and architecture, and the compute infrastructure.*
*It is useful for people interested in model development.*
Please see [the LongShort training README](https://github.com/brief-ai-uchicago/LongShort-Dataset) for full details on replicating training.
### Model Architecture and Objective
* Modified from Dolly-2-7B
**Objective:** Financial KPI extraction from earnings call documents.
### Hardware and Software - Compute Infrastructure
* 4 NVIDIA L4 GPUs & 48 vCPUs
* Environment: PyTorch (pytorch-2.0 w/ CUDA-11.8; see [GitHub link](https://github.com/pytorch/pytorch))
* CPU: GCP G2 Standard 48 (Platform: Intel Cascade Lake) (Accelerator Optimized)
* CPU memory: 192GB RAM
* GPU memory: 30GB per GPU
## Training
*This section provides information about the training.*
*It is useful for people who want to learn more about the model inputs and training footprint.*
The following `bitsandbytes` quantization config was used during training:
* quant_method: bitsandbytes
* load_in_8bit: False
* load_in_4bit: True
* llm_int8_threshold: 6.0
* llm_int8_skip_modules: None
* llm_int8_enable_fp32_cpu_offload: False
* llm_int8_has_fp16_weight: False
* bnb_4bit_quant_type: nf4
* bnb_4bit_use_double_quant: True
* bnb_4bit_compute_dtype: float16
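For readers who want to reproduce this setup, the values listed above map onto the `BitsAndBytesConfig` class in `transformers` roughly as follows. This is a sketch, not the authors' training script: `QUANT_KWARGS` and `build_bnb_config` are hypothetical names, and `quant_method` is omitted because it is not passed as a constructor argument.

```python
# Values copied from the quantization settings listed above.
QUANT_KWARGS = {
    "load_in_8bit": False,
    "load_in_4bit": True,
    "llm_int8_threshold": 6.0,
    "llm_int8_skip_modules": None,
    "llm_int8_enable_fp32_cpu_offload": False,
    "llm_int8_has_fp16_weight": False,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "bnb_4bit_compute_dtype": "float16",
}


def build_bnb_config():
    """Build a BitsAndBytesConfig from the values above (hypothetical helper)."""
    # Imported lazily so this file can be read or imported without
    # torch, transformers, and bitsandbytes installed.
    import torch
    from transformers import BitsAndBytesConfig

    kwargs = dict(QUANT_KWARGS)
    # Pass the compute dtype as a torch dtype object.
    kwargs["bnb_4bit_compute_dtype"] = torch.float16
    return BitsAndBytesConfig(**kwargs)
```

The resulting object would typically be passed as `quantization_config=build_bnb_config()` to `AutoModelForCausalLM.from_pretrained` when loading the base model for fine-tuning.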
Framework versions:
* PEFT 0.4.0
### Training Data
*This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.*
Details of the dataset can be found in the [LongShort Dataset](https://github.com/brief-ai-uchicago/LongShort-Dataset) repository.
Training data includes:
- 5000 Earnings Call Documents
## How to use
The model can be used and deployed through the Hugging Face ecosystem. It requires `transformers` and `accelerate` to be installed. The model is available at:
[LongShort-Dolly-2-7B](https://huggingface.co/briefai/LongShort-Dolly-2-7B)
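A minimal loading and generation sketch follows. It assumes the repository hosts weights that `AutoModelForCausalLM.from_pretrained` can load directly; if it hosts only a PEFT adapter, the `peft` package would also be needed. The helper names and the `max_new_tokens` default are illustrative choices, not documented settings.

```python
MODEL_ID = "briefai/LongShort-Dolly-2-7B"


def load_model_and_tokenizer(model_id: str = MODEL_ID):
    """Download and load the model and tokenizer (hypothetical helper)."""
    # Imported lazily so the heavy dependencies are only needed when called.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # device_map="auto" relies on `accelerate` to place weights on available devices.
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    return model, tokenizer


def generate(model, tokenizer, prompt: str, max_new_tokens: int = 256) -> str:
    """Generate a completion and return only the newly generated text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the echoed prompt tokens so only the answer remains.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

A typical call would be `model, tokenizer = load_model_and_tokenizer()` followed by `generate(model, tokenizer, prompt)`, where `prompt` follows the prompt template given earlier in this card.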
## Intended Use
This model was created to enable public research on large language models (LLMs). LLMs are intended for language generation or as pre-trained base models that can be further fine-tuned for specific tasks. The use cases below are not exhaustive.
### Direct Use
- Text generation
- Exploring characteristics of language generated by a language model
- Examples: Cloze tests, counterfactuals, generations with reframings
### Downstream Use
- Tasks that leverage language models include: Information Extraction, Question Answering, Summarization
#### Out-of-scope Uses
Using the model in [high-stakes](#high-stakes) settings is out of scope. The model is not designed for [critical decisions](#critical-decisions) or for uses with material consequences for an individual's livelihood or wellbeing. The model outputs content that appears factual but may not be correct.
Out-of-scope Uses Include:
- Usage for evaluating or scoring individuals, such as for employment, education, or credit
- Applying the model for critical automatic decisions, generating factual content, creating reliable summaries, or generating predictions that must be correct
#### Misuse
Intentionally using the model for harm, violating [human rights](#human-rights), or other kinds of malicious activities, is a misuse of this model. This includes:
- Spam generation
- Disinformation and influence operations
- Disparagement and defamation
- Harassment and abuse
- [Deception](#deception)
- Unconsented impersonation and imitation
- Unconsented surveillance
- Generating content without attribution to the model, as specified in the [RAIL License, Use Restrictions](https://huggingface.co/spaces/bigscience/license)
## Intended Users
### Direct Users
- General Public
- Researchers
- Students
- Educators
- Engineers/developers
- Non-commercial entities
- Financial Industry
## Risks and Limitations
*This section identifies foreseeable harms and misunderstandings.*
Model may:
- Overrepresent some viewpoints and underrepresent others
- Contain stereotypes
- Contain [personal information](#personal-data-and-information)
- Generate:
    - Hateful, abusive, or violent language
    - Discriminatory or prejudicial language
    - Content that may not be appropriate for all settings, including sexual content
- Make errors, including producing incorrect information as if it were factual
- Generate irrelevant or repetitive outputs
- Induce users into attributing human traits to it, such as sentience or consciousness
## Evaluation
*This section describes the evaluation protocols and provides the results.*
Result: LongShort-Falcon-7B achieves 45.4% accuracy on a validation set comprising 10% of the original training dataset.
**Train-time Evaluation:**
Final checkpoint after 700 epochs:
- Training Loss: 1.645
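The card does not say how the 10% validation split or the accuracy score were computed. The sketch below shows one plausible setup under stated assumptions: a seeded random hold-out and exact-match accuracy. Both function names are hypothetical, and the authors' actual protocol may differ.

```python
import random


def train_val_split(samples, val_fraction=0.10, seed=0):
    """Randomly hold out a fraction of samples for validation (assumed protocol)."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_val = int(round(len(samples) * val_fraction))
    val_idx = set(indices[:n_val])
    train = [s for i, s in enumerate(samples) if i not in val_idx]
    val = [s for i, s in enumerate(samples) if i in val_idx]
    return train, val


def exact_match_accuracy(predictions, references):
    """Fraction of predictions equal to their reference after trimming whitespace.

    Exact match is an assumption; the card does not define its accuracy metric.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)
```

Fixing the seed keeps the validation set stable across runs, so scores from different checkpoints remain comparable.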
## Recommendations
*This section provides information on warnings and potential mitigations.*
- Indirect users should be made aware when the content they're working with is created by the LLM.
- Users should be aware of [Risks and Limitations](#risks-and-limitations), and include an appropriate age disclaimer or blocking interface as necessary.
- Users of the model should provide mechanisms for those affected to provide feedback, such as an email address for comments.
## Model Card Authors
Vishal Parameshwaran, Garima Sohi, Jose Gerala, Sanchit Narayan Kumar