---
license: apache-2.0
base_model: one-man-army/una-neural-chat-v3-3-P1-OMA
tags:
  - alignment-handbook
  - generated_from_trainer
datasets:
  - allenai/ultrafeedback_binarized_cleaned
model-index:
  - name: una-neural-chat-v3-3-P2
    results: []
---

OMA (OneManArmy) presents una-neural-chat-v3-3, PHASE 2. Powered by UNA (Uniform Neural Alignment), using the zephyr trainer and the cleaned allenai/ultrafeedback dataset... and JUST THAT. It outperforms its base model without adding any data, just the UNA algorithm on the Transformers lib. UNA settings (a hypothetical config sketch follows the list):

- MLP: 0.05
- ATT: 0.03
- LNOR: 0.02
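
UNA itself ships as a patched trainer (note the Transformers 4.35.0-UNA build under Framework versions) and its internals are not documented in this card. Purely as an illustration of how per-module coefficients like these could be wired into a config, here is a hypothetical sketch; the `una_settings` name, the module-name matching, and the coefficient semantics are all assumptions, not the actual OMA implementation.

```python
# Hypothetical illustration only: how the per-module UNA coefficients from this
# card might be expressed and looked up in a training config. Names, structure,
# and semantics are assumed; the real UNA trainer is not documented here.
una_settings = {
    "mlp": 0.05,   # coefficient for MLP blocks
    "att": 0.03,   # coefficient for attention blocks
    "lnor": 0.02,  # coefficient for layer-norm parameters
}

def una_coefficient(param_name: str) -> float:
    """Map a parameter name to its UNA coefficient (illustrative heuristic)."""
    name = param_name.lower()
    if "mlp" in name:
        return una_settings["mlp"]
    if "attn" in name or "attention" in name:
        return una_settings["att"]
    if "norm" in name:
        return una_settings["lnor"]
    return 0.0

# Example: which coefficient would apply to a typical Mistral-style parameter name.
print(una_coefficient("model.layers.0.mlp.gate_proj.weight"))  # -> 0.05
```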

# una-neural-chat-v3-3-phase2

This model is a fine-tuned version of Intel/neural-chat-7b-v3-3 on the allenai/ultrafeedback_binarized_cleaned dataset. It achieves the following results on the evaluation set:

- Loss: 0.4524
- Rewards/chosen: -0.7101
- Rewards/rejected: -2.0953
- Rewards/accuracies: 0.7831
- Rewards/margins: 1.3852
- Logps/rejected: -321.5471
- Logps/chosen: -327.5048
- Logits/rejected: -2.6445
- Logits/chosen: -2.6674
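
The card does not include a usage example. Below is a minimal sketch assuming the standard `transformers` API, that the tokenizer defines a chat template, and that the model is published under the repo id `fblgit/una-neural-chat-v3-3-P2` (an assumption; check the actual repository).

```python
# Minimal usage sketch. Assumptions: the repo id and the presence of a chat
# template on the tokenizer; verify both against the actual repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fblgit/una-neural-chat-v3-3-P2"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize what Uniform Neural Alignment does."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```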

## Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5431        | 0.2   | 380  | 0.4900          | -0.6823        | -1.6613          | 0.7607             | 0.9790          | -317.2069      | -327.2263    | -2.6478         | -2.6651       |
| 0.4369        | 0.4   | 760  | 0.4783          | -0.7562        | -2.1298          | 0.7719             | 1.3737          | -321.8924      | -327.9652    | -2.7370         | -2.7562       |
| 0.4005        | 0.6   | 1140 | 0.4697          | -0.6913        | -2.0134          | 0.7770             | 1.3221          | -320.7278      | -327.3167    | -2.7067         | -2.7224       |
| 0.3759        | 0.8   | 1520 | 0.4568          | -0.7387        | -2.0643          | 0.7882             | 1.3256          | -321.2370      | -327.7909    | -2.6626         | -2.6829       |
| 0.5213        | 1.0   | 1900 | 0.4524          | -0.7101        | -2.0953          | 0.7831             | 1.3852          | -321.5471      | -327.5048    | -2.6445         | -2.6674       |
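
These columns are the standard preference-optimization reward metrics logged by the alignment-handbook/zephyr-style trainer (inferred from the tags above, not stated explicitly in the card). In that convention the margin is simply the chosen reward minus the rejected reward, which the final checkpoint row reproduces:

$$
\text{Rewards/margins} = \text{Rewards/chosen} - \text{Rewards/rejected} = -0.7101 - (-2.0953) = 1.3852
$$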

## Framework versions

- Transformers 4.35.0-UNA
- Pytorch 2.1.0
- Datasets 2.14.6
- Tokenizers 0.14.1