Tags: Text Generation · Transformers · Safetensors · mistral · alignment-handbook · Generated from Trainer · text-generation-inference
fblgit committed
Commit 0c8a4b2 · 1 Parent(s): e16b6d5

Update README.md

Files changed (1): README.md (+4 −3)
README.md CHANGED
@@ -11,9 +11,9 @@ model-index:
 license: artistic-2.0
 ---
 
-# juanako-7b-v1
+# juanako-7b-v1 (UNA: Uniform Neural Alignment)
 
-This model is a fine-tuned version of [fblgit/zephyr-lora-dpo-b1](https://huggingface.co/fblgit/zephyr-lora-dpo-b1) on the HuggingFaceH4/ultrafeedback_binarized dataset.
+This model uses uniform neural alignment (UNA) for the DPO training phases and is a fine-tuned version of [fblgit/zephyr-lora-dpo-b1](https://huggingface.co/fblgit/zephyr-lora-dpo-b1) on the HuggingFaceH4/ultrafeedback_binarized dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.4594
 - Rewards/chosen: -1.1095
@@ -27,7 +27,7 @@ It achieves the following results on the evaluation set:
 
 Followed [alignment-handbook](https://github.com/huggingface/alignment-handbook) to perform DPO (Phase 2) over Zephyr-SFT model.
 
-**Please feel free to run more tests and commit the results. Also if you are interested to participate in [UNA's paper research or GPU sponsorship](mailto:[email protected])**
+**Please feel free to run more tests and commit the results. Also if you are interested to participate in [UNA's paper research or GPU sponsorship](mailto:[email protected]) to support UNA research, feel free to contact.**
 
 Special thanks to [TheBloke](https://huggingface.co/TheBloke) for converting the model into multiple formats and overall his enormous contribution to the community.
 Here are the models:
@@ -263,6 +263,7 @@ hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: No
 | - stem |N/A |none |acc |0.5217|± |0.1149|
 
 ### Citations
+Please feel free to raise a PR if there is any missing citation.
 
 @misc{tunstall2023zephyr,
 title={Zephyr: Direct Distillation of LM Alignment},
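The `Rewards/chosen` figure in the diff above (-1.1095) refers to DPO's implicit reward: the β-scaled log-probability ratio between the policy and the reference model on the chosen completion. A minimal sketch of how that reward and the DPO loss are computed from per-sequence log-probabilities; the function name and the β = 0.1 default are illustrative assumptions, not values taken from this model's training config:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Compute the DPO loss for one preference pair.

    Implicit rewards are the beta-scaled log-ratio of the policy's
    sequence log-prob vs. the frozen reference model's log-prob.
    """
    r_chosen = beta * (logp_chosen - ref_logp_chosen)
    r_rejected = beta * (logp_rejected - ref_logp_rejected)
    # DPO objective: -log sigmoid(reward margin between chosen and rejected)
    margin = r_chosen - r_rejected
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return loss, r_chosen, r_rejected

# With policy == reference, both rewards are 0 and the loss is log(2).
loss, r_c, r_r = dpo_loss(-1.0, -2.0, -1.0, -2.0)
print(round(loss, 4))  # 0.6931
```

Training drives the margin positive, so the loss falls below log(2) whenever the policy prefers the chosen completion more strongly than the reference does, even while both rewards may be negative, as in the reported metrics.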