---
language: en
license: mit
tags:
  - text-generation
  - causal-lm
  - mistral
  - wikipedia
inference: true
model_name: Mistral-7B-WikiFineTuned
model_type: CausalLM
pipeline_tag: text-generation
---

# Mistral-7B-WikiFineTuned

This project fine-tunes [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on Wikitext-103-raw-v1, a dataset of Wikipedia text. The goal is a model that generates accurate, informative text in coherent, well-structured language.

## Model Description

- **Base Model:** [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
- **Fine-Tuned on:** Wikitext-103-raw-v1
- **Purpose:** Provide accurate, informative text generation with coherent, well-structured output while keeping training time as short as possible.
- **License:** MIT

## How to Use

Load the model with the Hugging Face `transformers` library. Below is a basic example of text generation:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Mesutby/mistral-7B-wikitext-finetuned")

# Load the model in 4-bit (requires the `bitsandbytes` package)
model = AutoModelForCausalLM.from_pretrained(
    "Mesutby/mistral-7B-wikitext-finetuned",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

# Create the text-generation pipeline
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Generate text
prompt = "The future of AI is"
output = generator(prompt, max_new_tokens=50)
print(output[0]["generated_text"])
```

### Inference API

You can also query the model directly via the Hugging Face Inference API:

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/Mesutby/mistral-7B-wikitext-finetuned"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}  # replace with your Hugging Face token

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({"inputs": "The future of AI is"})
print(output)
```

## Training Details

- **Framework Used:** PyTorch
- **Optimization Techniques:**
  - 4-bit quantization using `bitsandbytes` to reduce memory usage.
  - Training accelerated using `peft` and `accelerate`.

### Dataset

The model was fine-tuned on the Wikitext-103-raw-v1 dataset, split into training and evaluation subsets.

### Training Configuration

- **Learning Rate:** 2e-4
- **Batch Size:** 4 (with gradient accumulation)
- **Max Steps:** 125 (for demonstration; should ideally be higher, e.g., 1000)
- **Optimizer:** Paged AdamW (32-bit)
- **Evaluation Strategy:** Evaluation every 25 steps
- **PEFT Configuration:** LoRA with rank 8 and dropout 0.1

A minimal code sketch of this setup is included at the end of this card.

## Evaluation

The model was evaluated on the evaluation split of Wikitext-103-raw-v1 every 25 steps; detailed metrics are available in the training logs.

## Limitations and Biases

While the model performs well on a variety of text generation tasks, it may still exhibit biases present in the training data. Users should be cautious when deploying this model in sensitive or high-stakes applications.

## License

This model is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.

## Contact

For any questions or issues, please contact [bymuhammedmesut@gmail.com](mailto:bymuhammedmesut@gmail.com).
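
## Appendix: Fine-Tuning Setup Sketch

The original training script is not part of this card. The sketch below shows one way to reproduce the configuration listed under *Training Configuration* using `transformers`, `peft`, `bitsandbytes`, and `datasets`. The gradient accumulation value, the LoRA `lora_alpha` and `target_modules`, and the tokenization `max_length` are not stated in this card and are placeholder assumptions.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "mistralai/Mistral-7B-Instruct-v0.2"

# 4-bit quantization (bitsandbytes) to fit the 7B model in limited GPU memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# LoRA: rank 8 and dropout 0.1 as stated in the card; alpha and target modules are assumptions
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Wikitext-103-raw-v1, with the blank lines of the raw files dropped,
# tokenized for causal language modeling
dataset = load_dataset("wikitext", "wikitext-103-raw-v1")
dataset = dataset.filter(lambda ex: len(ex["text"].strip()) > 0)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="mistral-7B-wikitext-finetuned",
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # assumption: exact value not stated in the card
    max_steps=125,                   # demo value; increase (e.g., to 1000) for real training
    eval_strategy="steps",           # named `evaluation_strategy` in older transformers releases
    eval_steps=25,
    optim="paged_adamw_32bit",
    logging_steps=25,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```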