Haleshot committed
Commit 90127cd · verified · 1 Parent(s): f20490d

Create README.md

Files changed (1)
  1. README.md +81 -0
README.md ADDED
@@ -0,0 +1,81 @@
---
base_model: Haleshot/Mathmate-7B-DELLA
tags:
- finetuned
- orpo
- math
- preference-learning
datasets:
- argilla/distilabel-math-preference-dpo
license: apache-2.0
---

# Mathmate-7B-DELLA-ORPO

Mathmate-7B-DELLA-ORPO is a finetuned version of [Haleshot/Mathmate-7B-DELLA](https://huggingface.co/Haleshot/Mathmate-7B-DELLA) trained with ORPO (Odds Ratio Preference Optimization). It is tuned on math preference data to improve performance on mathematical reasoning tasks.

## Model Details

- **Base Model:** [Haleshot/Mathmate-7B-DELLA](https://huggingface.co/Haleshot/Mathmate-7B-DELLA)
- **Finetuning Method:** ORPO (Odds Ratio Preference Optimization)
- **Training Dataset:** [argilla/distilabel-math-preference-dpo](https://huggingface.co/datasets/argilla/distilabel-math-preference-dpo)

## Finetuning

This model was finetuned with ORPO (Odds Ratio Preference Optimization), which folds preference alignment into supervised finetuning: the usual language-modeling loss on the preferred response is combined with an odds-ratio penalty on the rejected one, so, unlike DPO (Direct Preference Optimization), no separate reference model is needed. The process was adapted from the tutorial ["Fine-tune Llama 3 with ORPO"](https://mlabonne.github.io/blog/posts/2024-04-19_Fine_tune_Llama_3_with_ORPO.html) by Maxime Labonne, with some custom modifications to the code.
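The exact training script is not included in this repository, but a minimal ORPO run along these lines can be sketched with TRL's `ORPOTrainer`. Everything below is illustrative: the source column names assumed for the preference dataset, the hyperparameters, and the absence of LoRA/quantization are assumptions, not the configuration actually used for this model.

```python
# Illustrative sketch of an ORPO run with TRL; NOT the exact script used for this model.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base_model = "Haleshot/Mathmate-7B-DELLA"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# ORPOTrainer expects prompt/chosen/rejected columns; the source column names
# below are assumptions -- check the dataset schema before running.
dataset = load_dataset("argilla/distilabel-math-preference-dpo", split="train")

def to_orpo_format(example):
    return {
        "prompt": example["instruction"],
        "chosen": example["chosen_response"],
        "rejected": example["rejected_response"],
    }

dataset = dataset.map(to_orpo_format, remove_columns=dataset.column_names)

config = ORPOConfig(
    output_dir="mathmate-7b-della-orpo",
    beta=0.1,                         # weight of the odds-ratio (preference) term
    learning_rate=8e-6,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    max_length=1024,                  # prompt + response token budget
    max_prompt_length=512,
    logging_steps=10,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,  # newer TRL releases take processing_class= instead
)
trainer.train()
trainer.save_model()
```

In practice the prompt and responses are usually wrapped in the tokenizer's chat template before training, and a 7B model typically calls for LoRA/QLoRA-style adapters to fit in memory; the tutorial linked above walks through those details.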
## Dataset

The model was finetuned on the [argilla/distilabel-math-preference-dpo](https://huggingface.co/datasets/argilla/distilabel-math-preference-dpo) dataset, which pairs mathematical problems with a preferred (chosen) and a dispreferred (rejected) solution, with preference annotations produced using Argilla's distilabel framework. Training on these pairs teaches the model which kinds of mathematical explanations and solutions are preferred.
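For a quick look at what the model was trained on, the dataset can be loaded and inspected directly; the snippet below only loads and prints a record, so the field names it reveals are whatever the dataset actually ships with.

```python
from datasets import load_dataset

# Load the preference pairs used for finetuning and inspect one record
dataset = load_dataset("argilla/distilabel-math-preference-dpo", split="train")

print(dataset.column_names)  # shows the actual field names (prompt/chosen/rejected-style columns)
print(dataset[0])            # one math problem with its preferred and rejected responses
```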
## Usage

Here's an example of how to use the model:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "Haleshot/Mathmate-7B-DELLA-ORPO"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

def generate_response(prompt, max_new_tokens=512):
    # Tokenize the prompt and move it to the same device as the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Sample one completion; max_new_tokens bounds the generated text, not the prompt
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, num_return_sequences=1, do_sample=True, temperature=0.7)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage
prompt = "Solve the following equation: 2x + 5 = 13"
response = generate_response(prompt)
print(response)
```
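The base model may also expect a chat-formatted prompt rather than raw text. Whether this checkpoint ships a chat template is not confirmed here, but if it does, the prompt can be built with `apply_chat_template` (continuing from the snippet above):

```python
# Optional, continuing from the snippet above: if the tokenizer defines a chat
# template (not verified for this checkpoint), format the prompt with it first.
messages = [{"role": "user", "content": "Solve the following equation: 2x + 5 = 13"}]
if tokenizer.chat_template is not None:
    chat_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    print(generate_response(chat_prompt))
```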
## Limitations

While this model has been finetuned on mathematical problems, it may still make mistakes or provide incorrect solutions. Always verify the model's output, especially for critical applications or complex mathematical problems.

## References

1. Maxime Labonne. (2024). [Fine-tune Llama 3 with ORPO](https://mlabonne.github.io/blog/posts/2024-04-19_Fine_tune_Llama_3_with_ORPO.html)
2. Argilla. [distilabel-math-preference-dpo dataset](https://huggingface.co/datasets/argilla/distilabel-math-preference-dpo)
3. Haleshot. [Mathmate-7B-DELLA](https://huggingface.co/Haleshot/Mathmate-7B-DELLA)

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{mathmate-7b-della-orpo,
  author       = {Haleshot},
  title        = {Mathmate-7B-DELLA-ORPO},
  year         = {2024},
  publisher    = {Hugging Face},
  journal      = {Hugging Face Hub},
  howpublished = {\url{https://huggingface.co/Haleshot/Mathmate-7B-DELLA-ORPO}},
}
```

## Acknowledgements

Special thanks to Maxime Labonne for the ORPO finetuning tutorial, and to the Argilla team for providing the dataset used in this finetuning process.