---
license: llama2
language:
- en
pipeline_tag: conversational
tags:
- merge
---

# Athena 120b

An auto-regressive causal LM created by combining three finetuned models into one via passthrough merging of slices in a stacked order. This is my first early prototype at expanding on existing work: it takes the creative, RP-focused [Venus-120b](https://huggingface.co/nsfwthrowitaway69/Venus-120b-v1.1) and swaps out [Euryale-1.3-L2-70B](https://huggingface.co/Sao10K/Euryale-1.3-L2-70B) for [WinterGoddess-1.4x-70B-L2](https://huggingface.co/Sao10K/WinterGoddess-1.4x-70B-L2), aiming to inherit the stronger task focus and instruction-following mentioned on the WinterGoddess card. My goals are much more work- and task-oriented, so I wanted a model that is still creative but less focused on RP and more focused on writing and documentation tasks.

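For anyone new to this style of merge, the sketch below shows the general shape of a mergekit passthrough config that stacks slices from multiple source models. It is illustrative only, not the actual Athena 120b recipe: the layer ranges are placeholders, only two of the three source models are shown, and the exact Xwin repo id is an assumption.

```yaml
# Illustrative passthrough slice-stacking config in the mergekit style.
# NOT the actual Athena 120b recipe: layer ranges are placeholders and
# only two of the three source models are shown.
slices:
  - sources:
      - model: Sao10K/WinterGoddess-1.4x-70B-L2
        layer_range: [0, 40]
  - sources:
      - model: Xwin-LM/Xwin-LM-70B-V0.1   # assumed Xwin variant
        layer_range: [20, 60]
  - sources:
      - model: Sao10K/WinterGoddess-1.4x-70B-L2
        layer_range: [40, 80]
  - sources:
      - model: Xwin-LM/Xwin-LM-70B-V0.1   # final slices from Xwin, hence the Vicuna leaning noted below
        layer_range: [60, 80]
merge_method: passthrough
dtype: float16
```

With mergekit installed, a config like this is typically run with `mergekit-yaml config.yml ./output-model` to produce the stacked model.
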
# Prompting Format

Both the Vicuna and Alpaca prompt formats will work, but Vicuna is likely the better fit because the final layers of the merge belong primarily to Xwin, which uses the Vicuna format; examples of both are shown below.

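For reference, here are the standard templates for the two formats. The system lines shown are the common defaults, not something prescribed by this merge, so feel free to replace them with your own.

Vicuna:

```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

USER: {prompt}
ASSISTANT:
```

Alpaca:

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
```
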
# Benchmarks

Coming soon.

# Acknowledgements

[@chargoddard](https://huggingface.co/chargoddard) - for [mergekit](https://github.com/cg123/mergekit).

[@migtissera](https://huggingface.co/migtissera) - for [Tess-XL](https://huggingface.co/migtissera/Tess-XL-v1.0), which inspired me to believe that open models can compete with the big commercial models on logic tasks.

[@alpindale](https://huggingface.co/alpindale) - for [Goliath-120B](https://huggingface.co/alpindale/goliath-120b), which started this crazy endeavor for us all.

[@nsfwthrowitaway69](https://huggingface.co/nsfwthrowitaway69) - for sharing the merge config for [Venus-120B](https://huggingface.co/nsfwthrowitaway69/Venus-120b-v1.1) and getting me off the starting block by answering some questions on mergekit and tokenizers.

Keep it open and keep sharing, everyone! With Mixtral and the MoE changes to mergekit, coupled with these larger merged models, I think the sky is the limit for us all. I can only imagine what would happen if we took a group of these 120b models, fine-tuned each of them a bit, and applied the Mixtral-style MoE merge method to them. And if a clever VC came along and funded that work? The people you need are right here on Hugging Face; all they need is the equipment to do it on.