Tags: Text Generation · Transformers · Safetensors · mistral · alignment-handbook · Generated from Trainer · text-generation-inference
fblgit committed
Commit 0c8a4b2 · 1 Parent(s): e16b6d5

Update README.md

Files changed (1): README.md (+4 −3)
README.md CHANGED
@@ -11,9 +11,9 @@ model-index:
 license: artistic-2.0
 ---
 
-# juanako-7b-v1
+# juanako-7b-v1 (UNA: Uniform Neural Alignment)
 
-This model is a fine-tuned version of [fblgit/zephyr-lora-dpo-b1](https://huggingface.co/fblgit/zephyr-lora-dpo-b1) on the HuggingFaceH4/ultrafeedback_binarized dataset.
+This model uses uniform neural alignment (UNA) for the DPO training phases and is a fine-tuned version of [fblgit/zephyr-lora-dpo-b1](https://huggingface.co/fblgit/zephyr-lora-dpo-b1) on the HuggingFaceH4/ultrafeedback_binarized dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.4594
 - Rewards/chosen: -1.1095
@@ -27,7 +27,7 @@ It achieves the following results on the evaluation set:
 
 Followed [alignment-handbook](https://github.com/huggingface/alignment-handbook) to perform DPO (Phase 2) over Zephyr-SFT model.
 
-**Please feel free to run more tests and commit the results. Also if you are interested to participate in [UNA's paper research or GPU sponsorship](mailto:[email protected])**
+**Please feel free to run more tests and commit the results. Also if you are interested to participate in [UNA's paper research or GPU sponsorship](mailto:[email protected]) to support UNA research, feel free to contact.**
 
 Special thanks to [TheBloke](https://huggingface.co/TheBloke) for converting the model into multiple formats and overall his enormous contribution to the community.
 Here are the models:
@@ -263,6 +263,7 @@ hf (pretrained=fblgit/juanako-7b-v1,load_in_4bit=False,dtype=float16), limit: No
 | - stem |N/A |none |acc |0.5217|± |0.1149|
 
 ### Citations
+Please feel free to raise a PR if there is any missing citation.
 
 @misc{tunstall2023zephyr,
 title={Zephyr: Direct Distillation of LM Alignment},
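The `Rewards/chosen` figure in the diff above (-1.1095) refers to DPO's implicit reward: the β-scaled log-probability ratio between the policy and the reference model on the chosen completion. A minimal sketch of how that reward and the DPO loss are computed from per-sequence log-probabilities; the function name and the β = 0.1 default are illustrative assumptions, not values taken from this model's training config:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Compute the DPO loss for one preference pair.

    Implicit rewards are the beta-scaled log-ratio of the policy's
    sequence log-prob vs. the frozen reference model's log-prob.
    """
    r_chosen = beta * (logp_chosen - ref_logp_chosen)
    r_rejected = beta * (logp_rejected - ref_logp_rejected)
    # DPO objective: -log sigmoid(reward margin between chosen and rejected)
    margin = r_chosen - r_rejected
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return loss, r_chosen, r_rejected

# With policy == reference, both rewards are 0 and the loss is log(2).
loss, r_c, r_r = dpo_loss(-1.0, -2.0, -1.0, -2.0)
print(round(loss, 4))  # 0.6931
```

Training drives the margin positive, so the loss falls below log(2) whenever the policy prefers the chosen completion more strongly than the reference does, even while both rewards may be negative, as in the reported metrics.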