MathGenie/Mistral-7B-Ours-SFT-SCDPO


luzimu committed on Jul 4

Commit 39d5582 (1 parent: a1a8d9b)

modify readme

Files changed (2)

README.md +15 -14
eval.png +0 -0

README.md CHANGED

@@ -1,22 +1,23 @@
 ---
-base_model: /mnt/cache/luzimu/rlhf_math/alignment-handbook/outs/Mistral-7B-v0.1-lce
 tags:
-- alignment-handbook
-- generated_from_trainer
-datasets:
-- /mnt/cache/luzimu/rlhf_math/data/controled_steps_math_gsm8k_lce_dpo_ascend_lim2_lim3_add_dpo1x1
 model-index:
-- name: Mistral-7B-v0.1-lce_controled_steps_dpo_ascend_lim2_lim3_add_dpo1x1
   results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# Mistral-7B-v0.1-lce_controled_steps_dpo_ascend_lim2_lim3_add_dpo1x1
-This model is a fine-tuned version of [/mnt/cache/luzimu/rlhf_math/alignment-handbook/outs/Mistral-7B-v0.1-lce](https://huggingface.co//mnt/cache/luzimu/rlhf_math/alignment-handbook/outs/Mistral-7B-v0.1-lce) on the /mnt/cache/luzimu/rlhf_math/data/controled_steps_math_gsm8k_lce_dpo_ascend_lim2_lim3_add_dpo1x1 dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.1793
 - Rewards/chosen: 0.2587
 - Rewards/rejected: -7.0301
@@ -29,15 +30,15 @@ It achieves the following results on the evaluation set:
 ## Model description
-More information needed
 ## Intended uses & limitations
-More information needed
 ## Training and evaluation data
-More information needed
 ## Training procedure

 ---
+base_model: MathGenie/Mistral-7B-Ours-SFT
 tags:
+- math
 model-index:
+- name: Mistral-7B-Ours-SFT-SCDPO
   results: []
+license: apache-2.0
+language:
+- en
+metrics:
+- accuracy
+pipeline_tag: text-generation
 ---
+# Mistral-7B-Ours-SFT-SCDPO
+This model is a fine-tuned version of MathGenie/Mistral-7B-Ours-SFT.
 It achieves the following results on the evaluation set:
 - Loss: 0.1793
 - Rewards/chosen: 0.2587
 - Rewards/rejected: -7.0301
 ## Model description
+This is a model fine-tuned for mathematical problem-solving.
 ## Intended uses & limitations
+The model is intended for solving math problems.
 ## Training and evaluation data
+![eval](../Mistral-7B-Ours-SFT/eval.png)
 ## Training procedure

eval.png ADDED
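
The README in this commit reports `Rewards/chosen` and `Rewards/rejected` on the evaluation set. These look like the metrics logged during preference-optimization training such as DPO, which the model name hints at, although the commit itself does not say so. Assuming that reading, the gap between the two values indicates how strongly the model separates preferred from rejected responses. The sketch below computes that gap from the two numbers shown in the diff; the helper name `reward_margin` is hypothetical, and the elided part of the README may already list this value.

```python
# Values copied from the README's evaluation summary in this commit.
rewards_chosen = 0.2587
rewards_rejected = -7.0301


def reward_margin(chosen: float, rejected: float) -> float:
    """Return the gap between the chosen and rejected rewards.

    Assumption: the two inputs are mean rewards over the same
    evaluation set, so their difference is the mean reward margin.
    """
    return chosen - rejected


margin = reward_margin(rewards_chosen, rewards_rejected)
print(f"reward margin: {margin:.4f}")  # prints "reward margin: 7.2888"
```

A large positive margin only says that chosen responses score well above rejected ones under the training objective; it is not an accuracy figure, which is the metric the updated front matter declares.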