---
license: mit
datasets:
- neural-bridge/rag-dataset-12000
- neural-bridge/rag-dataset-1200
language:
- en
---

# VERY IMPORTANT

This model is in the alpha phase and is NOT yet recommended for use.

# RAGPT-2 (alpha v1): Fine-tuned GPT-2 for Context-Based Question Answering

## Model Description

RAGPT-2 is a fine-tuned version of [GPT-2 small](https://huggingface.co/BueormLLC/CleanGPT), adapted for context-based question answering. It is trained to generate an answer from a given context and question, similar to the generation step of a Retrieval-Augmented Generation (RAG) system.

### Key Features

- Based on the GPT-2 small architecture (124M parameters)
- Fine-tuned on the "neural-bridge/rag-dataset-12000" and "neural-bridge/rag-dataset-1200" datasets from Hugging Face
- Capable of generating answers based on a provided context and question
- Suitable for various question-answering applications

## Training Data

The model was fine-tuned on the "neural-bridge/rag-dataset-12000" and "neural-bridge/rag-dataset-1200" datasets, which contain:

- Context passages
- Questions related to the context
- Corresponding answers

## Fine-tuning Process

The fine-tuning process involved:

1. Loading the pre-trained GPT-2 small model
2. Preprocessing the dataset to combine context, question, and answer into a single text
3. Training the model to predict the next token given the context and question

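
The preprocessing in step 2 could look roughly like the sketch below. It is not the actual training script: the helper name `build_training_text` is hypothetical, the template is assumed to mirror the inference prompt in the Usage section (`Context:`, `question:`, `answer:`), and appending GPT-2's `<|endoftext|>` token is likewise an assumption.

```python
# Hypothetical helper: the template is assumed from the inference prompt
# in the Usage section, not taken from the actual training script.
EOS_TOKEN = "<|endoftext|>"  # GPT-2's end-of-text token


def build_training_text(context: str, question: str, answer: str) -> str:
    """Combine one dataset record into a single training string."""
    prompt = f"Context: {context.strip()}\nquestion: {question.strip()}\nanswer:"
    # The answer follows the cue; the end-of-text marker (an assumption)
    # would let the model learn where an answer stops.
    return f"{prompt} {answer.strip()}{EOS_TOKEN}"


example = build_training_text(
    context="Mount Everest is the highest mountain in the world, with a height of 8,848 meters.",
    question="What is the height of Mount Everest?",
    answer="8,848 meters.",
)
print(example)
```

Keeping the training template identical to the inference prompt matters here, since the model only sees the `answer:` cue in the form it was trained on.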
### Hyperparameters

- Base model: GPT-2 small
- Number of training epochs: 8
- Batch size: 4
- Learning rate: Default AdamW optimizer settings
- Max sequence length: 512 tokens

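
To make these settings concrete, the sketch below collects them in a small config object and derives the number of optimizer steps for a run. The `FineTuneConfig` class and `total_steps` helper are hypothetical names, the example count passed in is a placeholder rather than the real dataset size, and the calculation assumes a single device with no gradient accumulation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters listed above; the class itself is illustrative."""
    base_model: str = "GPT-2 small"
    num_epochs: int = 8
    batch_size: int = 4
    max_seq_length: int = 512  # tokens


def total_steps(num_examples: int, config: FineTuneConfig) -> int:
    """Optimizer steps for a full run (one device, no gradient accumulation)."""
    steps_per_epoch = math.ceil(num_examples / config.batch_size)
    return steps_per_epoch * config.num_epochs


config = FineTuneConfig()
# 1,000 is a placeholder example count, not the real dataset size
print(total_steps(1000, config))  # 250 steps per epoch * 8 epochs = 2000
```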
## Usage

To use the model:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("BueormLLC/RAGPT-2")
model = AutoModelForCausalLM.from_pretrained("BueormLLC/RAGPT-2")

context = "Mount Everest is the highest mountain in the world, with a height of 8,848 meters."
question = "What is the height of Mount Everest?"
# Build the prompt from the context and question, ending with the answer cue
input_text = f"Context: {context}\nquestion: {question}\nanswer:"

# Tokenize the prompt; this returns both the input IDs and the attention mask
inputs = tokenizer(input_text, return_tensors="pt")
# GPT-2 tokenizers typically define no pad token, so the end-of-sequence token is reused
output = model.generate(**inputs, max_length=150, num_return_sequences=1, pad_token_id=tokenizer.eos_token_id)
# Decode the full sequence (the prompt followed by the generated continuation)
answer = tokenizer.decode(output[0], skip_special_tokens=True)

print(f"Generated answer: {answer}")
```

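
Because `generate` returns the prompt followed by the continuation, the decoded text contains the context and question as well as the answer. A small post-processing step can isolate the answer, sketched here with a hypothetical `extract_answer` helper. It assumes the `answer:` cue does not also appear inside the context or question, and it keeps only the first line of the continuation, which is a design choice rather than documented model behavior. The sample string is illustrative, not real model output.

```python
def extract_answer(decoded: str, cue: str = "answer:") -> str:
    """Return the first line of text that follows the first answer cue."""
    # Split at the first cue; everything after it is the model's continuation
    _, found, continuation = decoded.partition(cue)
    if not found:
        # Cue missing: fall back to the whole text
        continuation = decoded
    lines = continuation.strip().splitlines()
    return lines[0].strip() if lines else ""


# Illustrative decoded text, not real model output
decoded = (
    "Context: Mount Everest is the highest mountain in the world, "
    "with a height of 8,848 meters.\n"
    "question: What is the height of Mount Everest?\n"
    "answer: 8,848 meters.\nquestion: What is"
)
print(extract_answer(decoded))  # 8,848 meters.
```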
## Limitations

- The model's knowledge is limited to its training data and the base GPT-2 model.
- It may sometimes generate irrelevant or incorrect answers, especially for topics outside its training domain.
- The model does not have access to external information or real-time data.

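
Since the model has no retrieval component, the caller has to supply the context. One possible way to do that, sketched below under loud assumptions, is to pick the passage that shares the most words with the question before building the prompt. The `pick_context` helper, the sample passages, and the word-overlap scoring are all illustrative; a real system would more likely use an embedding-based or BM25 retriever.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_context(question: str, passages: list[str]) -> str:
    """Return the passage with the largest word overlap with the question."""
    question_words = tokenize(question)
    return max(passages, key=lambda p: len(question_words & tokenize(p)))


# Illustrative passages standing in for an external document store
passages = [
    "Mount Everest is the highest mountain in the world, with a height of 8,848 meters.",
    "The Nile is a major river in northeastern Africa.",
]
question = "What is the height of Mount Everest?"
context = pick_context(question, passages)

# Same prompt layout as in the Usage section
input_text = f"Context: {context}\nquestion: {question}\nanswer:"
print(input_text)
```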
## Ethical Considerations

Users should be aware that this model, like all language models, may reflect biases present in its training data. It should not be used as the sole source of information for critical decisions.

## Future Improvements

- Fine-tuning on a larger and more diverse dataset
- Experimenting with larger base models (e.g., GPT-2 medium or large)
- Implementing techniques to improve factual accuracy and reduce hallucinations

## Support us

- [PayPal](https://paypal.me/bueorm)
- [Patreon](https://patreon.com/bueorm)

### We appreciate your support; without you, we could not do what we do.

## Citation

If you use this model in your research, please cite:

```
@misc{RAGPT,
  author = {Bueorm},
  title = {RAGPT-2: Fine-tuned GPT-2 for Context-Based Question Answering},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/BueormLLC/RAGPT-2}}
}
```