---
license: mit
datasets:
- neural-bridge/rag-dataset-12000
language:
- en
---

# RAGPT: Fine-tuned GPT-2 for Context-Based Question Answering

## Model Description

RAGPT is a fine-tuned version of GPT-2 small, adapted for context-based question answering. Given a context passage and a question, it generates an answer from that context, playing the role of the generation step in a Retrieval-Augmented Generation (RAG) system. It does not perform retrieval itself; the context must be supplied in the prompt.

### Key Features

- Based on the GPT-2 small architecture (124M parameters)
- Fine-tuned on the "neural-bridge/rag-dataset-12000" dataset from Hugging Face
- Capable of generating answers based on provided context and questions
- Suitable for various question-answering applications

## Training Data

The model was fine-tuned using the "neural-bridge/rag-dataset-12000" dataset, which contains:
- Context passages
- Questions related to the context
- Corresponding answers

## Fine-tuning Process

The fine-tuning process involved:
1. Loading the pre-trained GPT-2 small model
2. Preprocessing the dataset to combine context, question, and answer into a single text
3. Training the model to predict the next token given the context and question
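
The preprocessing step (combining context, question, and answer into a single text) might look roughly like the sketch below. It is an illustration under assumptions, not the actual training script: the template is borrowed from the prompt in the Usage section, the record keys `context`, `question`, and `answer` are assumed column names, and the helpers `build_prompt` and `build_training_text` are hypothetical.

```python
# Hypothetical helpers: the template mirrors the prompt in the Usage section,
# but the exact training format is an assumption, not something this card states.
PROMPT_TEMPLATE = "Contexto: {context}\nPregunta: {question}\nRespuesta:"


def build_prompt(context: str, question: str) -> str:
    """Format a context/question pair the way the Usage example does."""
    return PROMPT_TEMPLATE.format(context=context.strip(), question=question.strip())


def build_training_text(example: dict, eos_token: str = "<|endoftext|>") -> str:
    """Join prompt and answer into one string for causal language modeling."""
    prompt = build_prompt(example["context"], example["question"])
    # Appending GPT-2's end-of-text string lets the model learn where an answer ends.
    return f"{prompt} {example['answer'].strip()}{eos_token}"


# Assumed record layout: one dict per dataset row.
example = {
    "context": "The Eiffel Tower is in Paris.",
    "question": "Where is the Eiffel Tower?",
    "answer": "In Paris.",
}
text = build_training_text(example)
print(text)
```

Keeping the prompt builder separate from the training-text builder means the same template can be reused at inference time, so the two formats cannot drift apart.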

### Hyperparameters

- Base model: GPT-2 small
- Number of training epochs: 3
- Batch size: 4
- Learning rate: Default AdamW optimizer settings
- Max sequence length: 512 tokens
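
To get a feel for what these settings imply, the sketch below collects them in a small config object and works out the number of optimizer steps. The class `FineTuneConfig`, the function `total_optimizer_steps`, and the training-set size of 9,600 are illustrative assumptions; the calculation also assumes a single device and no gradient accumulation, neither of which is stated here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Values from the list above; the field names are illustrative."""

    base_model: str = "gpt2"  # Hugging Face Hub id of GPT-2 small
    num_epochs: int = 3
    batch_size: int = 4
    max_seq_length: int = 512


def total_optimizer_steps(num_examples: int, cfg: FineTuneConfig) -> int:
    """Batches per epoch (a final partial batch counts) times epochs."""
    steps_per_epoch = math.ceil(num_examples / cfg.batch_size)
    return steps_per_epoch * cfg.num_epochs


cfg = FineTuneConfig()
# 9,600 is a hypothetical training-set size, used only for the arithmetic.
print(total_optimizer_steps(9_600, cfg))  # 2400 batches per epoch * 3 epochs = 7200
```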

## Usage

To use the model:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "BueormLLC/RAGPT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Prepare input (the prompt labels are Spanish for "Context", "Question", "Answer")
context = "Your context here"
question = "Your question here"
input_text = f"Contexto: {context}\nPregunta: {question}\nRespuesta:"

# Generate answer
inputs = tokenizer(input_text, return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=150,  # budget for the answer only; max_length would also count the prompt
    num_return_sequences=1,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 defines no pad token
)

# Decode only the newly generated tokens, skipping the echoed prompt
prompt_length = inputs["input_ids"].shape[1]
answer = tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True)
print(answer.strip())
```
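
A small causal model may keep writing after the answer, for example by starting a new `Contexto:` block, and the decoded text may still contain the prompt. The helper below is a minimal post-processing sketch for both cases. The function name `clean_answer` is hypothetical, and the stop labels are an assumption based on the prompt template above.

```python
ANSWER_LABEL = "Respuesta:"
# Labels that would signal the start of a new, unrequested example (assumed).
STOP_LABELS = ("Contexto:", "Pregunta:", "Respuesta:")


def clean_answer(generated: str) -> str:
    """Return only the answer portion of a decoded generation."""
    # If the decoded text still contains the prompt, keep only what follows
    # the first answer label; otherwise treat the whole string as the answer.
    if ANSWER_LABEL in generated:
        generated = generated.split(ANSWER_LABEL, 1)[1]
    # Cut at the earliest label that appears after the answer.
    cut = len(generated)
    for label in STOP_LABELS:
        index = generated.find(label)
        if index != -1:
            cut = min(cut, index)
    return generated[:cut].strip()


raw = (
    "Contexto: Paris is in France.\n"
    "Pregunta: Where is Paris?\n"
    "Respuesta: In France.\n"
    "Contexto: Berlin"
)
print(clean_answer(raw))  # In France.
```

The function accepts either the full decoded sequence or only the newly generated text, so it can be applied to `answer` regardless of how the output was decoded.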

## Limitations

- The model's knowledge is limited to its training data and the base GPT-2 model.
- It may sometimes generate irrelevant or incorrect answers, especially for topics outside its training domain.
- The model does not have access to external information or real-time data.

## Ethical Considerations

Users should be aware that this model, like all language models, may reflect biases present in its training data. It should not be used as the sole source of information for critical decisions.

## Future Improvements

- Fine-tuning on a larger and more diverse dataset
- Experimenting with larger base models (e.g., GPT-2 medium or large)
- Implementing techniques to improve factual accuracy and reduce hallucinations

## Support us

- [PayPal](https://paypal.me/bueorm)
- [Patreon](https://patreon.com/bueorm)
### We appreciate your support; without you, we could not do what we do.

## Citation

If you use this model in your research, please cite:

```
@misc{RAGPT,
  author = {Bueorm},
  title = {RAGPT: Fine-tuned GPT-2 for Context-Based Question Answering},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/BueormLLC/RAGPT}}
}
```