# Model Card for SVLM
This model is a Seq2Seq Language Model (SVLM) fine-tuned to answer questions from the ACL research paper dataset. It generates responses to academic research questions, making it useful for research and scholarly inquiry.
## Model Details
### Model Description
- **Developed by:** @binarybardakshat
- **Model type:** Seq2Seq Language Model (BART-based)
- **Language(s) (NLP):** English
- **License:** [More Information Needed]
- **Finetuned from model:** facebook/bart-base
### Model Sources
- **Repository:** [More Information Needed]
## Uses
### Direct Use
This model can be directly used to answer questions based on research data from ACL papers. It is suitable for academic and research purposes.
### Out-of-Scope Use
The model may not work well for general conversation or non-research-related queries.
## Bias, Risks, and Limitations
The model may carry biases present in the training data, which consists of ACL research papers. It might not generalize well outside this domain.
### Recommendations
Users should be aware of these biases and verify the model's outputs against primary sources before relying on them for academic work.
## How to Get Started with the Model
Use the code below to get started with the model:
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path_to_your_tokenizer")
model = AutoModelForSeq2SeqLM.from_pretrained("path_to_your_model")
```
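Once the tokenizer and model are loaded, questions can be answered with a standard seq2seq generation call. The snippet below is a minimal sketch: the example question and the generation settings (`max_length`, `num_beams`) are illustrative choices, not values prescribed by this model card.

```python
# Minimal inference sketch (illustrative question and generation settings).
question = "What metrics are commonly used to evaluate machine translation?"

inputs = tokenizer(question, return_tensors="pt", truncation=True, max_length=512)
output_ids = model.generate(**inputs, max_length=128, num_beams=4)
answer = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(answer)
```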
## Training Details
### Training Data
The model was trained using the ACL dataset, which consists of research papers focused on computational linguistics.
### Training Procedure
#### Training Hyperparameters
- **Training regime:** fp32
- **Learning rate:** 2e-5
- **Epochs:** 3
- **Batch size:** 8
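As a rough guide to reproducing this setup, the hyperparameters above map onto a standard `transformers` fine-tuning configuration. The sketch below only illustrates that mapping, assuming the `Seq2SeqTrainingArguments` API; the output directory is a placeholder, and the original training script may have been organized differently.

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative mapping of the hyperparameters above onto a transformers
# training configuration (output_dir is a placeholder, not the original path).
training_args = Seq2SeqTrainingArguments(
    output_dir="./svlm-bart-acl",       # placeholder output directory
    learning_rate=2e-5,                 # learning rate listed above
    num_train_epochs=3,                 # epochs listed above
    per_device_train_batch_size=8,      # batch size listed above
    fp16=False,                         # fp32 training regime (no mixed precision)
)
```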
## Evaluation
### Testing Data
The model was evaluated on a subset of the ACL dataset, focusing on research-related questions.
### Metrics
- **Accuracy**
- **Loss**
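The evaluation script itself is not published here, so the sketch below only illustrates one plausible reading of these metrics for a seq2seq model: cross-entropy loss on the reference answer and token-level accuracy of the greedy predictions. The question/answer pair is hypothetical, and `tokenizer`/`model` are the objects loaded in the getting-started snippet above.

```python
import torch

# Hypothetical example pair; the real evaluation used held-out ACL questions.
question = "Which corpus does the shared task use?"
reference = "The shared task uses a corpus of annotated research abstracts."

inputs = tokenizer(question, return_tensors="pt")
labels = tokenizer(reference, return_tensors="pt")["input_ids"]

with torch.no_grad():
    outputs = model(**inputs, labels=labels)

loss = outputs.loss                                # cross-entropy over reference tokens
predictions = outputs.logits.argmax(dim=-1)        # greedy token predictions
accuracy = (predictions == labels).float().mean()  # token-level accuracy
print(f"loss={loss.item():.3f}, accuracy={accuracy.item():.3f}")
```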
### Results
The model performs best on research-related question-answering tasks within the ACL domain. Further evaluation results will be added as the model sees wider use.
## Environmental Impact
- **Hardware Type:** GPU (NVIDIA V100)
- **Hours used:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]
## Technical Specifications
### Model Architecture and Objective
The model is based on the BART encoder-decoder architecture, which is designed for sequence-to-sequence tasks such as summarization and translation; here it is fine-tuned to map a research question to a generated answer.
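For reference, the encoder-decoder layout of the base checkpoint (facebook/bart-base, as listed in the Model Description) can be inspected directly from its configuration; this is a small illustration, not part of the original training code.

```python
from transformers import AutoConfig

# Inspect the encoder-decoder layout of the base checkpoint (facebook/bart-base).
config = AutoConfig.from_pretrained("facebook/bart-base")
print(config.model_type)       # "bart"
print(config.encoder_layers)   # 6 encoder layers
print(config.decoder_layers)   # 6 decoder layers
print(config.d_model)          # 768-dimensional hidden states
```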
### Compute Infrastructure
#### Hardware
- **NVIDIA V100 GPU**
#### Software
- **TensorFlow**
- **Transformers**
- **Safetensors**