# Model Card for SVLM
This model is a Seq2Seq Language Model (SVLM) fine-tuned to answer questions from the ACL research paper dataset. It generates responses to academic research questions, making it useful for research and scholarly inquiry.
## Model Details
### Model Description
- **Developed by:** @binarybardakshat
- **Model type:** Seq2Seq Language Model (BART-based)
- **Language(s) (NLP):** English
- **License:** [More Information Needed]
- **Finetuned from model:** facebook/bart-base
### Model Sources
- **Repository:** [More Information Needed]
## Uses
### Direct Use
This model can be directly used to answer questions based on research data from ACL papers. It is suitable for academic and research purposes.
### Out-of-Scope Use
The model may not work well for general conversation or non-research-related queries.
## Bias, Risks, and Limitations
The model may carry biases present in the training data, which consists of ACL research papers. It might not generalize well outside this domain.
### Recommendations
Users should be aware of these biases and verify the model's outputs against primary sources before relying on them for academic work.
## How to Get Started with the Model
Use the code below to get started with the model:
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path_to_your_tokenizer")
model = AutoModelForSeq2SeqLM.from_pretrained("path_to_your_model")
```
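Once the tokenizer and model are loaded, questions can be answered with a standard seq2seq generation call. The snippet below is a minimal sketch: the example question and the generation settings (`max_length`, `num_beams`) are illustrative choices, not values prescribed by this model card.

```python
# Minimal inference sketch (illustrative question and generation settings).
question = "What metrics are commonly used to evaluate machine translation?"

inputs = tokenizer(question, return_tensors="pt", truncation=True, max_length=512)
output_ids = model.generate(**inputs, max_length=128, num_beams=4)
answer = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(answer)
```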
## Training Details
### Training Data
The model was trained using the ACL dataset, which consists of research papers focused on computational linguistics.
### Training Procedure
#### Training Hyperparameters
- **Training regime:** fp32
- **Learning rate:** 2e-5
- **Epochs:** 3
- **Batch size:** 8
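As a rough guide to reproducing this setup, the hyperparameters above map onto a standard `transformers` fine-tuning configuration. The sketch below only illustrates that mapping, assuming the `Seq2SeqTrainingArguments` API; the output directory is a placeholder, and the original training script may have been organized differently.

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative mapping of the hyperparameters above onto a transformers
# training configuration (output_dir is a placeholder, not the original path).
training_args = Seq2SeqTrainingArguments(
    output_dir="./svlm-bart-acl",       # placeholder output directory
    learning_rate=2e-5,                 # learning rate listed above
    num_train_epochs=3,                 # epochs listed above
    per_device_train_batch_size=8,      # batch size listed above
    fp16=False,                         # fp32 training regime (no mixed precision)
)
```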
## Evaluation
### Testing Data
The model was evaluated on a subset of the ACL dataset, focusing on research-related questions.
### Metrics
- **Accuracy**
- **Loss**
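The evaluation script itself is not published here, so the sketch below only illustrates one plausible reading of these metrics for a seq2seq model: cross-entropy loss on the reference answer and token-level accuracy of the greedy predictions. The question/answer pair is hypothetical, and `tokenizer`/`model` are the objects loaded in the getting-started snippet above.

```python
import torch

# Hypothetical example pair; the real evaluation used held-out ACL questions.
question = "Which corpus does the shared task use?"
reference = "The shared task uses a corpus of annotated research abstracts."

inputs = tokenizer(question, return_tensors="pt")
labels = tokenizer(reference, return_tensors="pt")["input_ids"]

with torch.no_grad():
    outputs = model(**inputs, labels=labels)

loss = outputs.loss                                # cross-entropy over reference tokens
predictions = outputs.logits.argmax(dim=-1)        # greedy token predictions
accuracy = (predictions == labels).float().mean()  # token-level accuracy
print(f"loss={loss.item():.3f}, accuracy={accuracy.item():.3f}")
```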
### Results
The model performs best on research-related question-answering tasks within the ACL domain. Further evaluation results will be added as the model sees wider use.
## Environmental Impact
- **Hardware Type:** GPU (NVIDIA V100)
- **Hours used:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]
## Technical Specifications
### Model Architecture and Objective
The model is based on the BART encoder-decoder architecture, which is designed for sequence-to-sequence tasks such as summarization and translation; here it is fine-tuned to map a research question to a generated answer.
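For reference, the encoder-decoder layout of the base checkpoint (facebook/bart-base, as listed in the Model Description) can be inspected directly from its configuration; this is a small illustration, not part of the original training code.

```python
from transformers import AutoConfig

# Inspect the encoder-decoder layout of the base checkpoint (facebook/bart-base).
config = AutoConfig.from_pretrained("facebook/bart-base")
print(config.model_type)       # "bart"
print(config.encoder_layers)   # 6 encoder layers
print(config.decoder_layers)   # 6 decoder layers
print(config.d_model)          # 768-dimensional hidden states
```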
### Compute Infrastructure
#### Hardware
- **NVIDIA V100 GPU**
#### Software
- **TensorFlow**
- **Transformers**
- **Safetensors**