---
base_model:
- Lambent/threebird-scribe-alpha0.3-7B
- Lambent/bigbird-scribe-7B
- Lambent/aetherbird-scribe-7B
- Lambent/songbird-scribe-7B
- Lambent/codebird-scribe-7B
library_name: transformers
tags:
- mergekit
- merge
license: apache-2.0
datasets:
- Lambent/storytellers-32k
- Doctor-Shotgun/no-robots-sharegpt
- BEE-spoke-data/sarcasm-scrolls
- TheSkullery/Aether-Lite-V1.6
- vishnupriyavr/spotify-million-song-dataset
- TheSkullery/Gryphe-Opus-WritingPrompts-merged
- bigcode/the-stack-smol-xs
- bjoernp/Vezora_Tested-22k-Python-Alpaca-sharegpt-filtered
- thesven/code_bagel_35k
- practical-dreamer/RPGPT_PublicDomain-ShareGPT
- Undi95/Capybara-ShareGPT
---
# fourbirdstock

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details

| Tasks  |Version|Filter|n-shot|     Metric      |   | Value  |   |Stderr|
|--------|------:|------|-----:|-----------------|---|-------:|---|-----:|
|eq_bench|    2.1|none  |     0|eqbench          |↑  | 78.7955|±  |1.4668|
|        |       |none  |     0|percent_parseable|↑  |100.0000|±  |0.0000|

Version 0.3 (the base, threebird-scribe-alpha0.3-7B) involved three separate tunes, Model Stock merged, on overlapping datasets for long-context writing, multi-turn conversation, and RP, with a touch of poetry and code.
From there, each of the four threads was task-tuned separately on two datasets.
Various methods of combining those via merge were tested; this one scored highest on EQ-Bench, which served as the indicator.

My understanding of the Model Stock merge method is that it reduces task adaptation to a significant degree, but also significantly limits the forgetting caused by training.
I hope that the adaptation, especially over two stages, is still sufficient to aid in the longer contexts and multi-turn conversations inherited from the ancestor models, and to add some individual style while retaining a fair amount of their capability.
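
As a rough illustration of why Model Stock pulls a merge back toward its base, here is a minimal per-layer sketch in plain Python. It assumes the interpolation ratio `t = k*cos / (1 + (k-1)*cos)` from the Model Stock paper, with `cos` taken as the average pairwise cosine similarity between the fine-tuned models' offsets from the base. The function name `model_stock_layer` and the flat-list weight representation are hypothetical, and mergekit's actual implementation may differ in detail.

```python
import math
from itertools import combinations


def model_stock_layer(base, tuned):
    """Merge one flattened layer.

    base  -- list of floats (the base model's weights for this layer)
    tuned -- list of such lists, one per fine-tuned model (at least two)
    Returns (merged_weights, t). Hypothetical helper, for illustration only.
    """
    k = len(tuned)
    if k < 2:
        raise ValueError("need at least two fine-tuned models")

    # Task vectors: how far each fine-tuned layer moved away from the base.
    deltas = [[w - b for w, b in zip(t, base)] for t in tuned]

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    # Assumption: use the mean pairwise cosine between task vectors.
    pairs = list(combinations(deltas, 2))
    cos_theta = sum(cosine(u, v) for u, v in pairs) / len(pairs)

    # Interpolation ratio; undefined when the denominator is zero.
    t = k * cos_theta / (1 + (k - 1) * cos_theta)

    # Interpolate between the plain average of the tuned weights and the base.
    avg = [sum(col) / k for col in zip(*tuned)]
    merged = [t * a + (1 - t) * b for a, b in zip(avg, base)]
    return merged, t
```

When the fine-tuned offsets point in unrelated directions (cosine near 0), `t` shrinks toward 0 and the result stays close to the base, which is the "limits forgetting" side of the trade-off. When the offsets agree, `t` approaches 1 and the result approaches their plain average.
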

This model's refusals are ... not nonexistent, but certainly don't rely on them.
To my knowledge it has no particular refusal behavior for simply NSFW content, but I haven't exactly exhaustively tested which OSHA violations it will aid and abet.

### Merge Method

This model was merged using the [Model Stock](https://arxiv.org/abs/2403.19522) merge method, with [Lambent/threebird-scribe-alpha0.3-7B](https://huggingface.co/Lambent/threebird-scribe-alpha0.3-7B) as the base.

### Models Merged

The following models were included in the merge:
* [Lambent/bigbird-scribe-7B](https://huggingface.co/Lambent/bigbird-scribe-7B)
* [Lambent/aetherbird-scribe-7B](https://huggingface.co/Lambent/aetherbird-scribe-7B)
* [Lambent/songbird-scribe-7B](https://huggingface.co/Lambent/songbird-scribe-7B)
* [Lambent/codebird-scribe-7B](https://huggingface.co/Lambent/codebird-scribe-7B)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: Lambent/codebird-scribe-7B
  - model: Lambent/songbird-scribe-7B
  - model: Lambent/aetherbird-scribe-7B
  - model: Lambent/bigbird-scribe-7B
base_model: Lambent/threebird-scribe-alpha0.3-7B
merge_method: model_stock
parameters:
  filter_wise: false
tokenizer_source: union
dtype: float16
```
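
To reproduce the merge, the configuration above can be saved to a file and passed to mergekit's `mergekit-yaml` command. The following is a minimal sketch, assuming mergekit is installed and that two-space indentation under `models:` and `parameters:` is the intended layout. The file name `fourbirdstock.yml`, the output directory, and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path

# The merge configuration from this card, as YAML text.
CONFIG = """\
models:
  - model: Lambent/codebird-scribe-7B
  - model: Lambent/songbird-scribe-7B
  - model: Lambent/aetherbird-scribe-7B
  - model: Lambent/bigbird-scribe-7B
base_model: Lambent/threebird-scribe-alpha0.3-7B
merge_method: model_stock
parameters:
  filter_wise: false
tokenizer_source: union
dtype: float16
"""


def build_merge_command(workdir=".", out_dir="fourbirdstock-merged"):
    """Write the config to disk and return the mergekit-yaml command line."""
    work = Path(workdir)
    config_path = work / "fourbirdstock.yml"  # hypothetical file name
    config_path.write_text(CONFIG)
    return ["mergekit-yaml", str(config_path), str(work / out_dir)]


def run_merge(workdir="."):
    """Run the merge. This fetches all five 7B models, so it is not called here."""
    subprocess.run(build_merge_command(workdir), check=True)
```

Calling `run_merge()` needs enough disk space and memory for five 7B checkpoints; `build_merge_command()` alone only writes the config file and returns the command to inspect.
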