Mixtral-8x7B-v0.1: Model 3

Model Description

This is the third standalone model extracted from mistralai/Mixtral-8x7B-v0.1 using the Mixtral Model Expert Extractor tool I made. It is constructed by selecting the first expert from each Mixture of Experts (MoE) layer. The extraction is experimental, and the resulting model is expected to perform worse than Mistral-7B.

Model Architecture

The architecture of this model includes:

  • Multi-head attention layers derived from the base Mixtral model.
  • The first expert from each MoE layer, intended to provide a balanced approach to language understanding and generation tasks.
  • Additional layers and components as required to ensure the model's functionality outside the MoE framework.
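To make the construction described above concrete, here is a minimal, assumption-laden sketch of the extraction idea. It is not the actual Mixtral Model Expert Extractor tool: it assumes the parameter naming of the Hugging Face transformers implementations of Mixtral (block_sparse_moe.experts.{i}.w1/w2/w3) and Mistral (mlp.gate_proj/down_proj/up_proj), assumes the w1→gate_proj, w2→down_proj, w3→up_proj correspondence, and hard-codes expert index 0 and the output path purely for illustration.

# Sketch only, NOT the author's tool; parameter names, the expert index,
# and the output path are assumptions (see the note above).
import torch
from transformers import MistralConfig, MistralForCausalLM, MixtralForCausalLM

EXPERT_INDEX = 0  # which expert to keep from every MoE layer (illustrative)

# Loading the full Mixtral checkpoint requires enough CPU RAM to hold it.
moe = MixtralForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1", torch_dtype=torch.bfloat16
)
cfg = moe.config

# Build an empty dense Mistral-style model with matching dimensions.
dense = MistralForCausalLM(MistralConfig(
    vocab_size=cfg.vocab_size,
    hidden_size=cfg.hidden_size,
    intermediate_size=cfg.intermediate_size,
    num_hidden_layers=cfg.num_hidden_layers,
    num_attention_heads=cfg.num_attention_heads,
    num_key_value_heads=cfg.num_key_value_heads,
    max_position_embeddings=cfg.max_position_embeddings,
    rms_norm_eps=cfg.rms_norm_eps,
    rope_theta=cfg.rope_theta,
)).to(torch.bfloat16)

proj_map = {"w1": "gate_proj", "w2": "down_proj", "w3": "up_proj"}
new_state = {}
for name, tensor in moe.state_dict().items():
    if "block_sparse_moe.gate" in name:
        continue  # the MoE router has no counterpart in a dense model
    if "block_sparse_moe.experts" in name:
        # e.g. model.layers.7.block_sparse_moe.experts.0.w1.weight
        parts = name.split(".")
        layer, expert, proj = parts[2], int(parts[5]), parts[6]
        if expert != EXPERT_INDEX:
            continue  # drop every other expert
        name = f"model.layers.{layer}.mlp.{proj_map[proj]}.weight"
    # attention, layer norms, embeddings and lm_head carry over unchanged
    new_state[name] = tensor

dense.load_state_dict(new_state)
dense.save_pretrained("mistral-expert0-from-mixtral")  # hypothetical output path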

Example

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "DrNicefellow/Mistral-3-from-Mixtral-8x7B-v0.1"

# Load the tokenizer and the extracted dense model.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Encode a prompt and generate a continuation.
text = "Today is a pleasant"
input_ids = tokenizer.encode(text, return_tensors='pt')
output = model.generate(input_ids)

print(tokenizer.decode(output[0], skip_special_tokens=True))
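With no extra arguments, generate() uses greedy decoding and a short default length, so the continuation above will be brief and deterministic. Standard transformers generation arguments can be passed for longer or sampled output; the values below are illustrative, not tuned recommendations for this model:

output = model.generate(
    input_ids,
    max_new_tokens=100,  # lengthen the continuation beyond the short default
    do_sample=True,      # sample instead of greedy decoding
    temperature=0.7,     # illustrative value, not tuned for this model
    top_p=0.9,           # illustrative value, not tuned for this model
)
print(tokenizer.decode(output[0], skip_special_tokens=True))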

License

This model is open-sourced under the Apache 2.0 License. See the LICENSE file for more details.

Discord Server

Join our Discord server here.
