Mistral-7B-WikiFineTuned
This project fine-tunes the Mistral-7B-Instruct model on the Wikitext-103-raw-v1 dataset, a corpus drawn from Wikipedia articles. The goal is a model that generates accurate, informative text that is coherent and well structured.
Model Description
- Base Model: Mistral-7B
- Fine-Tuned on: Wikitext-103-raw-v1
- Purpose: To capture as much information as possible within a short training run, while keeping the generated text accurate, informative, coherent, and well structured.
- License: MIT
How to Use
To use this model, load it with the Hugging Face transformers library. Below is a basic example of text generation:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

model_id = "Mesutby/mistral-7B-wikitext-finetuned"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the model in 4-bit precision (requires bitsandbytes and a CUDA GPU)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

# Create the pipeline
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Generate text
prompt = "The future of AI is"
output = generator(prompt, max_new_tokens=50)
print(output[0]["generated_text"])
Inference API
You can also use the model directly via the Hugging Face Inference API:
import requests

API_URL = "https://api-inference.huggingface.co/models/Mesutby/mistral-7B-wikitext-finetuned"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}  # replace with your own token

def query(payload):
    # Send the payload to the hosted model and return the decoded JSON response
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({"inputs": "The future of AI is"})
print(output)
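The request above sends only the prompt. As a minimal sketch, the helpers below build a payload that also passes a generation-length limit, and unpack the two response shapes the API commonly returns for text generation: a list of results, or a dictionary with an `error` key (for example while the model is still loading). The helper names `build_payload` and `extract_text` are hypothetical, and the exact response format may differ between API versions.

```python
def build_payload(prompt, max_new_tokens=50):
    """Build a text-generation request body with a generation-length limit."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}


def extract_text(response_json):
    """Return the generated text, or raise if the API reported an error."""
    if isinstance(response_json, dict) and "error" in response_json:
        raise RuntimeError(f"Inference API error: {response_json['error']}")
    return response_json[0]["generated_text"]
```

With the `query` function defined earlier, this would be called as `extract_text(query(build_payload("The future of AI is")))`.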
Training Details
- Framework Used: PyTorch
- Optimization Techniques:
  - 4-bit quantization using bitsandbytes to reduce memory usage.
  - Training accelerated using peft and accelerate.
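To give a sense of why 4-bit quantization matters here, the sketch below estimates the memory needed for the model weights alone at different precisions. It assumes roughly 7.24 billion parameters for Mistral-7B and ignores quantization constants, activations, and any layers kept in higher precision, so real usage will be somewhat higher.

```python
PARAM_COUNT = 7.24e9  # assumed approximate parameter count for Mistral-7B


def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


for label, bits in [("float32", 32), ("float16", 16), ("4-bit", 4)]:
    print(f"{label:>8}: {weight_memory_gb(PARAM_COUNT, bits):.1f} GB")
```

Under these assumptions the weights shrink from about 14.5 GB in float16 to about 3.6 GB in 4-bit.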
Dataset
The model was fine-tuned on the Wikitext-103-raw-v1 dataset, split into training and evaluation subsets.
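As an illustration, the dataset can be loaded with the Hugging Face datasets library roughly as follows. The sketch assumes the `Salesforce/wikitext` repository ID and uses the dataset's own `train` and `validation` splits. The subset sizes and helper names are hypothetical, since the split sizes actually used are not stated.

```python
def split_spec(split, num_rows):
    """Build a datasets slice expression such as 'train[:2000]'."""
    return f"{split}[:{num_rows}]"


def load_wikitext_subsets(train_rows=2000, eval_rows=200):
    """Load small training and evaluation subsets (sizes are placeholders)."""
    # Imported lazily so the helper above works without the library installed.
    from datasets import load_dataset

    name, config = "Salesforce/wikitext", "wikitext-103-raw-v1"
    train = load_dataset(name, config, split=split_spec("train", train_rows))
    evaluation = load_dataset(name, config, split=split_spec("validation", eval_rows))
    return train, evaluation
```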
Training Configuration
- Learning Rate: 2e-4
- Batch Size: 4 (with gradient accumulation)
- Max Steps: 125 (for demonstration; should ideally be higher, e.g., 1000)
- Optimizer: Paged AdamW (32-bit)
- Evaluation Strategy: Evaluation every 25 steps
- PEFT Configuration: LoRA with rank 8 and dropout of 0.1
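The settings above map onto peft and transformers roughly as sketched below. This is not the author's training script: the `output_dir` value and the names `LORA_KWARGS`, `TRAINING_KWARGS`, and `build_configs` are placeholders, `task_type` is assumed to be causal language modeling, and the batch size is assumed to be per device. Values the card does not state (LoRA alpha, target modules, gradient accumulation steps) are left at library defaults. Recent transformers releases name the evaluation argument `eval_strategy`; older ones use `evaluation_strategy`.

```python
# Values taken from the configuration list above, except where marked.
LORA_KWARGS = {"r": 8, "lora_dropout": 0.1, "task_type": "CAUSAL_LM"}

TRAINING_KWARGS = {
    "output_dir": "outputs",            # placeholder path
    "learning_rate": 2e-4,
    "per_device_train_batch_size": 4,   # assumes the stated batch size is per device
    "max_steps": 125,
    "optim": "paged_adamw_32bit",
    "eval_strategy": "steps",           # `evaluation_strategy` in older transformers
    "eval_steps": 25,
}


def build_configs():
    """Create the peft and transformers config objects (libraries imported lazily)."""
    from peft import LoraConfig
    from transformers import TrainingArguments

    return LoraConfig(**LORA_KWARGS), TrainingArguments(**TRAINING_KWARGS)
```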
Hyperparameters
- Learning Rate: 2e-4
- Batch Size: 4
- Max Steps: 125 (demo)
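To put the demo step count in perspective, the sketch below counts how many training sequences are processed for a given number of optimizer steps on a single device. The gradient accumulation factor is not stated in this card, so it is a parameter here, with 1 as a placeholder default.

```python
def sequences_seen(max_steps, batch_size, grad_accum_steps=1):
    """Sequences processed = optimizer steps x batch size x accumulation steps."""
    return max_steps * batch_size * grad_accum_steps


print(sequences_seen(125, 4))    # 500
print(sequences_seen(1000, 4))   # 4000
```

At typical sequence lengths, either count covers only a small fraction of the roughly 103 million tokens in Wikitext-103.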
Evaluation
The model was evaluated on a subset of the Wikitext-103-raw-v1 dataset. Metrics were logged every 25 steps during training; no final values are reported here.
Limitations and Biases
While the model performs well on a variety of text generation tasks, it may still exhibit biases present in the training data. Users should be cautious when deploying this model in sensitive or high-stakes applications.
License
This model is licensed under the MIT License. See the LICENSE file for more details.
Contact
For any questions or issues, please contact [email protected].