## Model Overview

This model is a fine-tuned version of Llama-3.2-3B, trained on a curated dataset derived through Facility Location (FL) selection. The base model, Llama-3.2-3B, is a state-of-the-art large language model designed for a range of natural language processing tasks, and it has been further adapted here to improve its task-specific performance.
## Dataset Details

**Original Dataset:** The dataset initially consisted of 10,000 samples, combining diverse conversational pairs for instruction tuning and response generation tasks.

**Data Selection Process:** The Facility Location (FL) algorithm was applied to the original dataset to identify the most representative and diverse samples. This method maximizes dataset utility by ensuring a balanced and informative subset while preserving the richness of the original data. As a result, the dataset was reduced to 7,000 high-quality samples, retaining only the most relevant and representative data points.

**Dataset Characteristics:**
- **Chosen-Response Pairs:** 7,000 question-response pairs refined to optimize learning efficiency.
- **Diversity & Balance:** The FL algorithm ensures the dataset captures diverse language usage and contexts without redundancy.
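For reference, facility location selection is typically run greedily over pairwise similarities between sample embeddings: at each step, add the candidate that most improves the best-similarity coverage of the full dataset. The sketch below is illustrative only, not the exact pipeline used for this model; the cosine-similarity kernel and the `facility_location_select` helper are assumptions for the example.

```python
import numpy as np

def facility_location_select(X, k):
    """Greedy facility location subset selection (illustrative sketch).

    X: (n, d) array of sample embeddings (assumed; any featurization works).
    k: number of samples to keep.
    Returns the list of selected sample indices.
    """
    # Pairwise cosine similarity between all samples.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    n = S.shape[0]

    selected = []
    # best[i] = similarity of sample i to its closest already-selected sample.
    best = np.zeros(n)
    for _ in range(k):
        # Marginal gain of adding candidate j:
        # sum_i max(S[i, j] - best[i], 0)
        gains = np.maximum(S - best[:, None], 0.0).sum(axis=0)
        gains[selected] = -np.inf  # never reselect a chosen sample
        j = int(np.argmax(gains))
        selected.append(j)
        best = np.maximum(best, S[:, j])
    return selected

# Example: keep 5 of 20 synthetic samples.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(20, 8))
subset = facility_location_select(embeddings, 5)
```

Because the facility location objective is submodular, this greedy procedure carries a (1 - 1/e) approximation guarantee, which is why it is a common choice for reducing a 10,000-sample pool to a representative 7,000-sample subset.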