|
--- |
|
library_name: transformers |
|
license: llama3.2 |
|
base_model: meta-llama/Llama-3.2-1B |
|
tags: |
|
- generated_from_trainer |
|
model-index: |
|
- name: Intellecta |
|
results: [] |
|
datasets: |
|
- fka/awesome-chatgpt-prompts |
|
- BAAI/Infinity-Instruct |
|
- allenai/WildChat-1M |
|
- lavita/ChatDoctor-HealthCareMagic-100k |
|
- zjunlp/Mol-Instructions |
|
- garage-bAInd/Open-Platypus |
|
language: |
|
- en |
|
--- |
|
|
|
|
|
|
# Intellecta |
|
|
|
This model is a fine-tuned version of [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) on a mix of instruction-following and conversational datasets; see [Training and evaluation data](#training-and-evaluation-data) for the full list.
|
|
|
## Model description |
|
|
|
The model is based on LLaMA (Large Language Model Meta AI), a family of state-of-the-art language models developed for natural language understanding and generation. This specific implementation uses the LLaMA 3.2-1B model, which is fine-tuned for general-purpose conversational AI tasks. |
|
|
|
- **Architecture:** Transformer-based causal language model.
- **Tokenization:** Uses the `AutoTokenizer` compatible with the LLaMA model, with adjustments to ensure proper padding.
- **Pre-trained foundation:** Builds on the pre-trained weights of LLaMA, focusing on improving performance for conversational and instruction-based tasks.
- **Implementation:** Developed with Hugging Face's Transformers library for extensibility and ease of use.
|
|
|
## Intended uses & limitations |
|
|
|
**Intended uses**

- **Instruction-following tasks:** Answering questions, summarizing, and generating text.
- **Conversational agents:** Chatbots and virtual assistants, including those in specialized domains such as healthcare or education.
- **Research and development:** Fine-tuning and benchmarking against datasets for downstream tasks.
|
|
|
## Training and evaluation data |
|
|
|
**Datasets used**

- [fka/awesome-chatgpt-prompts](https://huggingface.co/datasets/fka/awesome-chatgpt-prompts): General-purpose instruction-following and conversational prompts based on GPT-like interactions.
- [BAAI/Infinity-Instruct](https://huggingface.co/datasets/BAAI/Infinity-Instruct) (3M): A large instruction dataset containing a wide variety of tasks and instructions.
- [allenai/WildChat-1M](https://huggingface.co/datasets/allenai/WildChat-1M): Open-ended conversational data.
- [lavita/ChatDoctor-HealthCareMagic-100k](https://huggingface.co/datasets/lavita/ChatDoctor-HealthCareMagic-100k): Healthcare-specific data for medical conversational agents.
- [zjunlp/Mol-Instructions](https://huggingface.co/datasets/zjunlp/Mol-Instructions): Molecular biology-related instructions.
- [garage-bAInd/Open-Platypus](https://huggingface.co/datasets/garage-bAInd/Open-Platypus): General-purpose, open-domain reasoning.

**Data preprocessing**

- Text prompts and responses are tokenized with padding and truncation.
- Labels are derived from the input tokens, with padding positions masked to -100 so they are excluded from the loss computation (see the preprocessing sketch under [Training procedure](#training-procedure)).
|
|
|
## Training procedure |
|
The training procedure fine-tunes the pre-trained Llama 3.2-1B model on the datasets listed above, with a focus on instruction-following and conversational tasks. The key steps are outlined below.
|
|
|
### 1. Preprocessing

**Tokenization**

- Input prompts and their responses are tokenized using the `AutoTokenizer` configured for LLaMA.
- Padding tokens are explicitly handled via `pad_token` (set to the `eos_token` if undefined).
- Inputs are truncated to a maximum length of 512 tokens to fit model constraints.

**Label preparation**

- Input IDs are cloned to create labels for supervised learning.
- Padding tokens in the labels are masked with -100 so they are ignored during loss computation.

**Dataset mapping**

- Each dataset's `prompt` field is tokenized and reformatted into the model's required input-output structure; a sketch follows this list.
- Datasets without a `prompt` column are skipped to avoid errors.
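The exact preprocessing script is not included in this card; the following is a minimal sketch of the steps described above, using one of the listed datasets as an example (the batched mapping and dropping of original columns are assumptions).

```python
# Preprocessing sketch: tokenize prompts, pad/truncate to 512 tokens,
# and mask padding positions in the labels with -100.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # fall back to EOS for padding

def tokenize_fn(batch):
    tokens = tokenizer(
        batch["prompt"],
        truncation=True,
        padding="max_length",
        max_length=512,
    )
    # Clone the input IDs as labels and ignore padding in the loss.
    tokens["labels"] = [
        [tok if tok != tokenizer.pad_token_id else -100 for tok in ids]
        for ids in tokens["input_ids"]
    ]
    return tokens

raw = load_dataset("fka/awesome-chatgpt-prompts", split="train")
tokenized = raw.map(tokenize_fn, batched=True, remove_columns=raw.column_names)
```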
|
|
|
### 2. Model setup

**Pre-trained model**

- The base model, `meta-llama/Llama-3.2-1B`, is loaded with pre-trained weights.
- It is fine-tuned for causal language modeling, focusing on instruction-based outputs.

**Tokenizer setup**

- The tokenizer ensures consistent encoding and decoding for the model.
- Padding is fixed by using `eos_token` as a fallback `pad_token`, as in the sketch below.
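A minimal setup sketch using the standard Transformers loading API (device placement and dtype options are omitted):

```python
# Load the base model and tokenizer for causal language modeling.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Ensure a padding token exists; fall back to the EOS token.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    model.config.pad_token_id = tokenizer.eos_token_id
```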
|
|
|
### 3. Training configuration

**TrainingArguments**

The Hugging Face `TrainingArguments` object configures the training process:

- **Output directory:** `llama_output` stores the model checkpoints and logs.
- **Epochs:** 4, balancing training time and generalization.
- **Batch size:** 4 examples per device to handle memory constraints.
- **Gradient accumulation:** 4 steps to simulate a larger effective batch size.
- **Learning rate:** 1e-4 with a 500-step warmup phase for stable optimization.
- **Weight decay:** 0.01 to mitigate overfitting.
- **Mixed precision:** FP16 (half precision) for faster training and reduced memory usage.
- **Logging:** Logs are generated every 10 steps to monitor training progress.
- **Checkpointing:** Model checkpoints are saved at the end of each epoch.
- **Push to Hub:** The fine-tuned model is uploaded to the Hugging Face Hub as `kssrikar4/Intellecta`.

**Data collator**

`DataCollatorForSeq2Seq` dynamically pads batches for efficiency during training. A configuration sketch follows.
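A sketch of how these settings map to `TrainingArguments` (argument names follow the standard Transformers API; `tokenizer` and `model` come from the setup sketch above):

```python
from transformers import DataCollatorForSeq2Seq, TrainingArguments

training_args = TrainingArguments(
    output_dir="llama_output",
    num_train_epochs=4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch size of 16
    learning_rate=1e-4,
    warmup_steps=500,
    weight_decay=0.01,
    fp16=True,                       # mixed-precision training
    logging_steps=10,
    save_strategy="epoch",
    push_to_hub=True,
    hub_model_id="kssrikar4/Intellecta",
)

# Dynamically pad each batch instead of relying on fixed-length padding alone.
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)
```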
|
|
|
### 4. Fine-tuning process

**Trainer**

- The Hugging Face `Trainer` class orchestrates training, combining the model, data, and training configuration (see the sketch after this step).
- The loss for each batch is computed from the model's outputs (logits) and the prepared labels.
- The optimizer and learning-rate scheduler are managed internally by the `Trainer`.

**Training loop**

During each epoch:

- The model processes batches of tokenized prompts and computes the causal language modeling (CLM) loss.
- Gradients are accumulated over multiple steps to simulate a larger batch size.
- Optimizer updates are applied after gradient accumulation.

**Validation**

- Validation data is not explicitly defined in the training script, but the `Trainer` supports evaluation if an `eval_dataset` is provided.
- Saving checkpoints at each epoch allows the model to be evaluated post-training.
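A minimal `Trainer` sketch tying the earlier pieces together (assumes `model`, `tokenized`, `training_args`, and `data_collator` from the sketches above; no `eval_dataset` is passed, matching the description):

```python
from transformers import Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized,
    data_collator=data_collator,
)

trainer.train()  # runs the causal-LM fine-tuning loop
```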
|
### 5. Post-training

**Push to Hub**

- The trained model, along with its tokenizer and configuration, is pushed to the Hugging Face Hub under the ID `kssrikar4/Intellecta`.

**Usage**

- The fine-tuned model can be downloaded and used directly for inference or further fine-tuning, as in the example below.
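An inference sketch using the published checkpoint via the `text-generation` pipeline (the prompt and generation settings are illustrative):

```python
from transformers import pipeline

generator = pipeline("text-generation", model="kssrikar4/Intellecta")
result = generator("Explain what a causal language model is.", max_new_tokens=100)
print(result[0]["generated_text"])
```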
|
|
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 0.0001 |
|
- train_batch_size: 4 |
|
- eval_batch_size: 8 |
|
- seed: 42 |
|
- gradient_accumulation_steps: 4 |
|
- total_train_batch_size: 16 |
|
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
|
- lr_scheduler_type: linear |
|
- lr_scheduler_warmup_steps: 500 |
|
- num_epochs: 4 |
|
- mixed_precision_training: Native AMP |
|
|
|
### Training results

No evaluation results are reported for this fine-tuning run.
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.48.0 |
|
- Pytorch 2.5.1+cpu |
|
- Datasets 3.2.0 |
|
- Tokenizers 0.21.0 |