---
license: apache-2.0
datasets:
- HuggingFaceH4/ultrachat_200k
base_model:
- HuggingFaceTB/SmolLM2-1.7B
library_name: peft
---
# SmolLM2-1.7B-UltraChat_200k

A Quantized Low-Rank Adaptation (QLoRA) adapter finetuned from HuggingFaceTB/SmolLM2-1.7B on the UltraChat 200k dataset.

Serves as an exercise in LLM post-training.
## Model Details

- **Developed by:** Andrew Melbourne
- **Model type:** Language Model
- **License:** Apache 2.0
- **Finetuned from model:** HuggingFaceTB/SmolLM2-1.7B
### Model Sources

Training and inference scripts are available in the repository below.

- **Repository:** [SmolLM2-1.7B-UltraChat_200k on GitHub](https://github.com/Melbourneandrew/SmolLM2-1.7B-UltraChat_200k)
## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Loading the adapter repository directly requires `peft` to be installed;
# transformers fetches the base model and attaches the LoRA adapter automatically.
model = AutoModelForCausalLM.from_pretrained("M3LBY/SmolLM2-1.7B-UltraChat_200k")
tokenizer = AutoTokenizer.from_pretrained("M3LBY/SmolLM2-1.7B-UltraChat_200k")

# Build a chat-formatted prompt and generate a response.
messages = [{"role": "user", "content": "How far away is the sun?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=256)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
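Because this repository hosts a PEFT adapter rather than full model weights, you can also load the base model and attach the adapter explicitly, for example to keep the base model in 4-bit as it was during training. The 4-bit settings below are common QLoRA-style defaults and are assumptions rather than values recorded in this card:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumed 4-bit loading settings (typical QLoRA-style defaults); adjust as needed.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-1.7B",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "M3LBY/SmolLM2-1.7B-UltraChat_200k")
tokenizer = AutoTokenizer.from_pretrained("M3LBY/SmolLM2-1.7B-UltraChat_200k")

# To merge the LoRA weights into the base model for adapter-free inference,
# load the base model in full precision instead of 4-bit and call:
# model = model.merge_and_unload()
```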
## Training Details

The adapter model was trained using Supervised Fine-Tuning (SFT) with the following configuration (a rough script sketch follows the list):

- Base model: SmolLM2-1.7B
- Mixed precision: bfloat16
- Learning rate: 2e-5 with linear scheduler
- Warmup ratio: 0.1
- Training epochs: 1
- Effective batch size: 32
- Sequence length: 512 tokens
- Flash Attention 2 enabled
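The sketch below shows roughly what this setup looks like with `transformers`, `peft`, and `trl`; the authoritative scripts are in the GitHub repository linked above. The LoRA rank, alpha, dropout, target modules, and the per-device batch size / gradient-accumulation split are illustrative assumptions not reported in this card, and some `trl` argument names differ between releases:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# Base model loaded in 4-bit (QLoRA) with Flash Attention 2.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-1.7B",
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B")

# LoRA adapter settings -- rank, alpha, dropout, and target modules are assumptions.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# UltraChat 200k SFT split; one epoch at an effective batch size of 32
# gives roughly the 6,496 steps reported below.
dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")

# Hyperparameters from the card: lr 2e-5 with linear schedule, warmup ratio 0.1,
# 1 epoch, effective batch size 32 (assumed here as 4 x 8 accumulation),
# 512-token sequences, bfloat16 mixed precision.
training_args = SFTConfig(
    output_dir="SmolLM2-1.7B-UltraChat_200k",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    bf16=True,
    max_seq_length=512,  # renamed to max_length in newer trl releases
    logging_steps=50,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
    processing_class=tokenizer,  # use tokenizer= on older trl versions
)
trainer.train()
```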
Trained to a loss of 1.6965 after 6,496 steps.

Elapsed time: 2 hours 37 minutes.

Consumed ~22 Colab Compute Units, for an estimated cost of about $2.21.
## Evaluation

[More Information Needed]
### Framework versions

- PEFT 0.14.0