---
license: apache-2.0
base_model: one-man-army/una-neural-chat-v3-3-P1-OMA
tags:
- alignment-handbook
- generated_from_trainer
datasets:
- allenai/ultrafeedback_binarized_cleaned
model-index:
- name: una-neural-chat-v3-3-P2
  results: []
---

```
--- . ____ ----- ______ ----- .
___ / \ ..................... ____ / \
.' '. -- ..:::::''''''''''''''''':::::.. .' '.
--- | ^ ^ | .::::'''' (_ ''''::::. -- | ^ ^ '
| ^ ^ | .::'' _) ''::. | ^ ^ | --
____ '...' .::' .-. (_ '::. '...'
.-.!_ .::' _) / \ '::. ! ____
/ / `-`.:' '-.-' _) ':..""".
-- ' | '.|:' _) .'. (_ ':/' | \
| | |'. _/^---^\_ | . --
___ \ . '| \-------../ (_ \ '.'
' : ' _) '.\:::/.' (_ )_ |' || ___
| | .| _( | | |'| / ' . |
-- | '. | \ '.\ /.' '. | |--
|'. '| |[ ]| (_ | .' |____
__ .'\ | .'\ '.^.' \ |. .
.'-.\'. | | _) (:) | ||| |
.' \'..' . _..--'''--.._ (_ /'-._.-'| ---
| `-..'. .-' '-. | .-'.
\ `-. .' .. .. '. .'-._.-' `.
-- ) `-./ '::. .::' \ _.-' /
'._/-.. / '::. .::' \-' .-'
::.`-. '' ':: ::' '' _..-\_.'
::: '._ | \ ' ' / | .-' .:: _____
____ ::: `-.| ' .----..___..----. ' | .-' :::
::: \ | _..--. .--.._ | /-' ::: ---
::: _) | ' / | | \ ' | ( :::
-- ::: ) | _.' '._ | ( )_ :::____
____ ::: /'. \_.' )\ /( '._/ .'\ (_ :::
::: .-'| `-->-@ / \ @->--' |-. :::
::: .-' \ | / \ | / `-. ::: ---
---- '' _.-' | )/ \( | `-. ::: _____
_.-=--..-' . \ /\ /\ / `-. ''
/.._ `. .-' .\ '-.\.\\.
| `. \ .-' | '. .' | `-. \
\ _\. `.-' | '-././.\.\.-' | `. |
`.-' | /::::::::::: \ /::::::::`. ,-. /
- | / /LGB ---- '-. .-' ---- `. | \_.'
__ \ | .' _____ '-._._._._.-' ____ | | |
`--' `-. '._ / --
`...-'
```

MESS WITH THE BEST, DIE LIKE THE REST

--=- D*D - R****1911 - F***L***T - P***D*X -=--

THE WORLD NEED US BACK :)

OMA (OneManArmy) presents `una-neural-chat-v3-3` **PHASE 2**. Powered by UNA (Uniform Neural Alignment), trained with the zephyr trainer on the cleaned allenai/ultrafeedback dataset... and JUST THAT.
It outperforms its base model without adding any data: just the UNA algorithm on the Transformers library.

UNA Settings:

* MLP : 0.05
* ATT : 0.03
* LNOR : 0.02

# una-neural-chat-v3-3-phase2

This model is a fine-tuned version of [Intel/neural-chat-7b-v3-3](https://huggingface.co/Intel/neural-chat-7b-v3-3) on the allenai/ultrafeedback_binarized_cleaned dataset.
It achieves the following results on the evaluation set:

- Loss: 0.4524
- Rewards/chosen: -0.7101
- Rewards/rejected: -2.0953
- Rewards/accuracies: 0.7831
- Rewards/margins: 1.3852
- Logps/rejected: -321.5471
- Logps/chosen: -327.5048
- Logits/rejected: -2.6445
- Logits/chosen: -2.6674

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5431        | 0.2   | 380  | 0.4900          | -0.6823        | -1.6613          | 0.7607             | 0.9790          | -317.2069      | -327.2263    | -2.6478         | -2.6651       |
| 0.4369        | 0.4   | 760  | 0.4783          | -0.7562        | -2.1298          | 0.7719             | 1.3737          | -321.8924      | -327.9652    | -2.7370         | -2.7562       |
| 0.4005        | 0.6   | 1140 | 0.4697          | -0.6913        | -2.0134          | 0.7770             | 1.3221          | -320.7278      | -327.3167    | -2.7067         | -2.7224       |
| 0.3759        | 0.8   | 1520 | 0.4568          | -0.7387        | -2.0643          | 0.7882             | 1.3256          | -321.2370      | -327.7909    | -2.6626         | -2.6829       |
| 0.5213        | 1.0   | 1900 | 0.4524          | -0.7101        | -2.0953          | 0.7831             | 1.3852          | -321.5471      | -327.5048    | -2.6445         | -2.6674       |

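A note on reading the reward columns: assuming these are the preference-optimization metrics logged by the alignment-handbook trainer (the card does not say so explicitly), `Rewards/margins` should equal `Rewards/chosen` minus `Rewards/rejected`, up to rounding. The sketch below recomputes the margin for each evaluation step from the table values. The names `rows`, `margin_gap`, and `TOLERANCE` are illustrative and not part of any library.

```python
# (chosen reward, rejected reward, reported margin) per evaluation step,
# copied from the "Training results" table.
rows = {
    380: (-0.6823, -1.6613, 0.9790),
    760: (-0.7562, -2.1298, 1.3737),
    1140: (-0.6913, -2.0134, 1.3221),
    1520: (-0.7387, -2.0643, 1.3256),
    1900: (-0.7101, -2.0953, 1.3852),
}

# Assumed tolerance: the table is rounded to four decimals, so allow a
# small rounding gap between the recomputed and the reported margin.
TOLERANCE = 2e-4


def margin_gap(chosen, rejected, reported):
    """Absolute difference between (chosen - rejected) and the reported margin."""
    return abs((chosen - rejected) - reported)


gaps = {step: margin_gap(*values) for step, values in rows.items()}

for step, gap in gaps.items():
    status = "consistent" if gap <= TOLERANCE else "mismatch"
    print(f"step {step}: gap {gap:.4f} ({status})")
```

The last row (step 1900) carries the same figures as the evaluation summary at the top of the card, so the same check applies to those values.
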
### Framework versions

- Transformers 4.35.0-UNA
- PyTorch 2.1.0
- Datasets 2.14.6
- Tokenizers 0.14.1