---
license: llama2
language:
- en
pipeline_tag: conversational
tags:
- merge
---

# Athena 120b

An auto-regressive causal LM created by combining three finetuned models into one via passthrough merging of slices in a stacked order. This is my first early prototype at expanding on existing work: it takes the creative, RP-focused [Venus-120b](https://huggingface.co/nsfwthrowitaway69/Venus-120b-v1.1) and swaps out [Euryale-1.3-L2-70B](https://huggingface.co/Sao10K/Euryale-1.3-L2-70B) for [WinterGoddess-1.4x-70B-L2](https://huggingface.co/Sao10K/WinterGoddess-1.4x-70B-L2), aiming to inherit the stronger task focus and instruction-following mentioned on the WinterGoddess card. My goals are much more work- and task-oriented, so I wanted a model that is still creative but less focused on RP and more focused on writing and documentation tasks.

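For anyone new to this style of merge, the sketch below shows the general shape of a mergekit passthrough config that stacks slices from multiple source models. It is illustrative only, not the actual Athena 120b recipe: the layer ranges are placeholders, only two of the three source models are shown, and the exact Xwin repo id is an assumption.

```yaml
# Illustrative passthrough slice-stacking config in the mergekit style.
# NOT the actual Athena 120b recipe: layer ranges are placeholders and
# only two of the three source models are shown.
slices:
  - sources:
      - model: Sao10K/WinterGoddess-1.4x-70B-L2
        layer_range: [0, 40]
  - sources:
      - model: Xwin-LM/Xwin-LM-70B-V0.1   # assumed Xwin variant
        layer_range: [20, 60]
  - sources:
      - model: Sao10K/WinterGoddess-1.4x-70B-L2
        layer_range: [40, 80]
  - sources:
      - model: Xwin-LM/Xwin-LM-70B-V0.1   # final slices from Xwin, hence the Vicuna leaning noted below
        layer_range: [60, 80]
merge_method: passthrough
dtype: float16
```

With mergekit installed, a config like this is typically run with `mergekit-yaml config.yml ./output-model` to produce the stacked model.
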
# Prompting Format

Both the Vicuna and Alpaca prompt formats will work, but Vicuna is likely the better fit because the final layers of the merge belong primarily to Xwin, which uses the Vicuna format; examples of both are shown below.

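For reference, here are the standard templates for the two formats. The system lines shown are the common defaults, not something prescribed by this merge, so feel free to replace them with your own.

Vicuna:

```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

USER: {prompt}
ASSISTANT:
```

Alpaca:

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
```
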
# Benchmarks

Coming soon.

# Acknowledgements

[@chargoddard](https://huggingface.co/chargoddard) - for [mergekit](https://github.com/cg123/mergekit).

[@migtissera](https://huggingface.co/migtissera) - for [Tess-XL](https://huggingface.co/migtissera/Tess-XL-v1.0), which inspired me to believe that open models can compete with the big commercial models on logic tasks.

[@alpindale](https://huggingface.co/alpindale) - for [Goliath-120B](https://huggingface.co/alpindale/goliath-120b), which started this crazy endeavor for us all.

[@nsfwthrowitaway69](https://huggingface.co/nsfwthrowitaway69) - for sharing the merge config for [Venus-120B](https://huggingface.co/nsfwthrowitaway69/Venus-120b-v1.1) and getting me off the starting block by answering some questions on mergekit and tokenizers.

Keep it open and keep sharing, everyone! With Mixtral and the MoE changes to mergekit, coupled with these larger merged models, I think the sky is the limit for us all. I can only imagine what would happen if we took a group of these 120b models, fine-tuned each of them a bit, and applied the Mixtral-style MoE merge method to them. And if a clever VC came along and funded that work? The people you need are right here on Hugging Face; all they need is the equipment to do it on.