---
license: apache-2.0
datasets:
- HuggingFaceH4/ultrachat_200k
base_model:
- HuggingFaceTB/SmolLM2-1.7B
library_name: peft
---
# SmolLM2-1.7B-UltraChat_200k
![SmolLM2-1.7B-UltraChat_200k](https://imagedelivery.net/tQa_QONPmkASFny9ZSDT4A/7b7d93d3-72fb-4a22-e4cf-d762d314c100/public)
A Quantized Low-Rank Adaptation (QLoRA) adapter fine-tuned from HuggingFaceTB/SmolLM2-1.7B on the UltraChat 200k dataset.
This model serves as an exercise in LLM post-training.
## Model Details
- **Developed by:** Andrew Melbourne
- **Model type:** Language Model
- **License:** Apache 2.0
- **Finetuned from model:** HuggingFaceTB/SmolLM2-1.7B
### Model Sources
Training and inference scripts are available in the repository below.
- **Repository:** [SmolLM2-1.7B-ultrachat_200k on Github](https://github.com/Melbourneandrew/SmolLM2-1.7B-UltraChat_200k)
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the base model with the LoRA adapter applied
model = AutoPeftModelForCausalLM.from_pretrained("M3LBY/SmolLM2-1.7B-UltraChat_200k")
tokenizer = AutoTokenizer.from_pretrained("M3LBY/SmolLM2-1.7B-UltraChat_200k")

# Format the conversation with the model's chat template
messages = [{"role": "user", "content": "How far away is the sun?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
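Since this repository ships only the LoRA adapter, you may want to fold it into the base weights for adapter-free deployment. A minimal sketch using peft's `merge_and_unload`, continuing from the snippet above (the output directory name is illustrative):

```python
# Merge the LoRA weights into the base model and save a standalone checkpoint.
# "SmolLM2-1.7B-UltraChat_200k-merged" is an illustrative path, not an official artifact.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("SmolLM2-1.7B-UltraChat_200k-merged")
tokenizer.save_pretrained("SmolLM2-1.7B-UltraChat_200k-merged")
```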
## Training Details
The adapter model was trained using Supervised Fine-Tuning (SFT) with the following configuration:
- Base model: SmolLM2-1.7B
- Mixed precision: bfloat16
- Learning rate: 2e-5 with linear scheduler
- Warmup ratio: 0.1
- Training epochs: 1
- Effective batch size: 32
- Sequence length: 512 tokens
- Flash Attention 2 enabled
Training reached a loss of 1.6965 after 6,496 steps, took 2 hours 37 minutes, and consumed ~22 Colab Compute Units, for an estimated cost of $2.21.
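For reference, a configuration like the one listed above could be reproduced with trl's `SFTTrainer` roughly as follows. This is a hedged sketch, not the actual training script (which lives in the GitHub repository above): the LoRA rank and alpha, the 4-bit quantization settings, and the per-device batch / gradient-accumulation split are assumptions, and some argument names vary slightly across trl versions.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# QLoRA: 4-bit quantized base weights with bfloat16 compute (assumed NF4 quantization)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-1.7B",
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # Flash Attention 2 enabled
)
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B")

# LoRA adapter config; rank/alpha here are assumptions, not confirmed values
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32)

dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")

training_args = SFTConfig(
    output_dir="SmolLM2-1.7B-UltraChat_200k",
    learning_rate=2e-5,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=1,
    per_device_train_batch_size=4,   # 4 x 8 accumulation steps = effective batch 32 (assumed split)
    gradient_accumulation_steps=8,
    max_seq_length=512,
    bf16=True,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```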
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
## Citation
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
**BibTeX:**
[More Information Needed]
**APA:**
[More Information Needed]
### Framework versions
- PEFT 0.14.0