---
license: apache-2.0
language:
- fr
- it
- de
- es
- en
tags:
- moe
- mixtral
- sharegpt
- axolotl
library_name: transformers
base_model: v2ray/Mixtral-8x22B-v0.2
inference: false
model_creator: MaziyarPanahi
model_name: Goku-8x22B-v0.2
pipeline_tag: text-generation
quantized_by: MaziyarPanahi
datasets:
- microsoft/orca-math-word-problems-200k
- teknium/OpenHermes-2.5
---
<img src="./Goku-8x22b-v0.1.webp" alt="Goku 8x22B v0.1 Logo" width="500" style="margin-left: auto; margin-right: auto; display: block;"/>
# Goku-8x22B-v0.2 (Goku 141b-A35b)
A fine-tuned version of the [v2ray/Mixtral-8x22B-v0.2](https://huggingface.co/v2ray/Mixtral-8x22B-v0.2) model, trained on the following datasets:
- teknium/OpenHermes-2.5
- WizardLM/WizardLM_evol_instruct_V2_196k
- microsoft/orca-math-word-problems-200k
This model has 141B total parameters, of which only 35B are active. The major difference in this version is that the model was trained on more datasets and with a sequence length of `8192`. As a result, the model can generate longer and more coherent responses.
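To put the parameter count in practical terms, here is a rough sketch of how much memory the weights alone would occupy at common precisions. The bytes-per-parameter figures are the standard sizes for each format and are an assumption about how you might load the model, not something stated in this card. The estimate ignores activations, the KV cache, and framework overhead. In a mixture-of-experts model, all 141B parameters generally need to be resident even though only a subset is active.

```python
TOTAL_PARAMS = 141e9  # total parameter count stated in this model card

# Bytes per parameter for common weight formats
# (standard sizes; an assumption, not taken from the model card).
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_memory_gb(num_params, dtype):
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: ~{weight_memory_gb(TOTAL_PARAMS, dtype):.0f} GB")
```

These numbers are lower bounds for planning hardware; actual usage depends on context length, batch size, and the runtime.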
## How to use it
**Use a pipeline as a high-level helper:**
```python
from transformers import pipeline
pipe = pipeline("text-generation", model="MaziyarPanahi/Goku-8x22B-v0.2")
```
**Load model directly:**
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("MaziyarPanahi/Goku-8x22B-v0.2")
model = AutoModelForCausalLM.from_pretrained("MaziyarPanahi/Goku-8x22B-v0.2")
```
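**Generate text (a minimal sketch):**

The snippets above only load the model. The following sketch shows one way to generate a completion from a plain-text prompt. The `load_kwargs` and `generate` helpers are hypothetical names introduced here for illustration. The settings `device_map="auto"` (which requires the `accelerate` package) and `torch_dtype="auto"` are assumptions about a reasonable setup for a model of this size, not settings documented in this card. No prompt format is assumed either, so the prompt is passed through as-is.

```python
MODEL_ID = "MaziyarPanahi/Goku-8x22B-v0.2"


def load_kwargs():
    """Keyword arguments for from_pretrained (assumed settings, not from the model card)."""
    return {
        "device_map": "auto",   # spread weights across available devices; needs `accelerate`
        "torch_dtype": "auto",  # use the dtype stored in the checkpoint
    }


def generate(prompt, max_new_tokens=256):
    """Load the model and return only the newly generated continuation of `prompt`."""
    # Imported lazily so this module can be inspected without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, **load_kwargs())

    # Tokenize the raw prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the generated text is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("Explain mixture-of-experts models in one paragraph.")` downloads and loads the full checkpoint, so it is only practical on hardware with enough memory for the weights.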