---
language: en
license: mit
tags:
  - text-generation
  - causal-lm
  - mistral
  - wikipedia
inference: true
model_name: Mistral-7B-WikiFineTuned
model_type: CausalLM
pipeline_tag: text-generation
---

# Mistral-7B-WikiFineTuned

This project fine-tunes [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on Wikitext-103-raw-v1, a dataset of Wikipedia text. The goal is a model that generates accurate, informative text in coherent, well-structured language.

## Model Description

- **Base Model:** [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
- **Fine-Tuned on:** Wikitext-103-raw-v1
- **Purpose:** Provide accurate, informative text generation with coherent, well-structured output while keeping training time as short as possible.
- **License:** MIT

## How to Use

Load the model with the Hugging Face `transformers` library. Below is a basic example of text generation:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Mesutby/mistral-7B-wikitext-finetuned")

# Load the model in 4-bit (requires the `bitsandbytes` package)
model = AutoModelForCausalLM.from_pretrained(
    "Mesutby/mistral-7B-wikitext-finetuned",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

# Create the text-generation pipeline
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Generate text
prompt = "The future of AI is"
output = generator(prompt, max_new_tokens=50)
print(output[0]["generated_text"])
```

### Inference API

You can also query the model directly via the Hugging Face Inference API:

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/Mesutby/mistral-7B-wikitext-finetuned"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}  # replace with your Hugging Face token

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({"inputs": "The future of AI is"})
print(output)
```

## Training Details

- **Framework Used:** PyTorch
- **Optimization Techniques:**
  - 4-bit quantization using `bitsandbytes` to reduce memory usage.
  - Training accelerated using `peft` and `accelerate`.

### Dataset

The model was fine-tuned on the Wikitext-103-raw-v1 dataset, split into training and evaluation subsets.

### Training Configuration

- **Learning Rate:** 2e-4
- **Batch Size:** 4 (with gradient accumulation)
- **Max Steps:** 125 (for demonstration; should ideally be higher, e.g., 1000)
- **Optimizer:** Paged AdamW (32-bit)
- **Evaluation Strategy:** Evaluation every 25 steps
- **PEFT Configuration:** LoRA with rank 8 and dropout 0.1

A minimal code sketch of this setup is included at the end of this card.

## Evaluation

The model was evaluated on the evaluation split of Wikitext-103-raw-v1 every 25 steps; detailed metrics are available in the training logs.

## Limitations and Biases

While the model performs well on a variety of text generation tasks, it may still exhibit biases present in the training data. Users should be cautious when deploying this model in sensitive or high-stakes applications.

## License

This model is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.

## Contact

For any questions or issues, please contact [bymuhammedmesut@gmail.com](mailto:bymuhammedmesut@gmail.com).
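
## Appendix: Fine-Tuning Setup Sketch

The original training script is not part of this card. The sketch below shows one way to reproduce the configuration listed under *Training Configuration* using `transformers`, `peft`, `bitsandbytes`, and `datasets`. The gradient accumulation value, the LoRA `lora_alpha` and `target_modules`, and the tokenization `max_length` are not stated in this card and are placeholder assumptions.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "mistralai/Mistral-7B-Instruct-v0.2"

# 4-bit quantization (bitsandbytes) to fit the 7B model in limited GPU memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# LoRA: rank 8 and dropout 0.1 as stated in the card; alpha and target modules are assumptions
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Wikitext-103-raw-v1, with the blank lines of the raw files dropped,
# tokenized for causal language modeling
dataset = load_dataset("wikitext", "wikitext-103-raw-v1")
dataset = dataset.filter(lambda ex: len(ex["text"].strip()) > 0)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="mistral-7B-wikitext-finetuned",
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # assumption: exact value not stated in the card
    max_steps=125,                   # demo value; increase (e.g., to 1000) for real training
    eval_strategy="steps",           # named `evaluation_strategy` in older transformers releases
    eval_steps=25,
    optim="paged_adamw_32bit",
    logging_steps=25,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```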