---
language: en
tags:
- llama
- text-generation
- model-merging
license: mit
base_model:
- meta-llama/Meta-Llama-3-8B
library_name: transformers
---
|
|
|
# llama-3-8b-merged-linear
|
|
|
## Overview

This model is a linear merge of three Llama 3 8B fine-tunes, produced with the mergekit tool. The goal of the merge is to combine the distinct strengths of each base model, such as multilingual capability and specialized domain knowledge, into a single, more versatile language model.
|
|
|
By merging the models linearly, their expertise is combined into a unified model that performs well across a range of tasks, including text generation, multilingual understanding, and domain-specific applications.
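
Conceptually, a linear merge computes a weighted average of corresponding parameters across the input models. The sketch below illustrates the idea in plain PyTorch; it is a simplified illustration for intuition, not mergekit's actual implementation (the `state_dicts` and `weights` inputs are hypothetical):

```python
import torch

def linear_merge(state_dicts, weights):
    """Illustrative linear merge: each parameter of the merged model is the
    normalized weighted sum of the same parameter in every input model.
    A minimal sketch, not mergekit's actual code."""
    total = sum(weights)
    merged = {}
    for name in state_dicts[0]:
        # Accumulate in float32 for numerical stability, then cast down.
        merged[name] = sum(
            w * sd[name].to(torch.float32) for w, sd in zip(weights, state_dicts)
        ) / total
        merged[name] = merged[name].to(torch.float16)  # final dtype, as in this merge
    return merged
```

With equal weights of 1.0, as used here, this reduces to a plain average of the three models' parameters.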
|
|
|
## Model Details

### Model Description

- **Models Used**:
  - Danielbrdz/Barcenas-Llama3-8b-ORPO
  - DeepMount00/Llama-3-8b-Ita
  - lightblue/suzume-llama-3-8B-multilingual
- **Merging Tool**: mergekit
- **Merge Method**: Linear merge with equal weighting (1.0) for each model
- **Tokenizer Source**: Union of the base models' tokenizers
- **Data Type**: float16 (FP16) precision
- **License**: MIT
- **Languages Supported**: Multilingual, including English and Italian, with potentially others inherited from the multilingual base model
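
Since the merge is distributed as a standard `transformers` checkpoint, it can be loaded like any other Llama 3 model. The snippet below is a minimal usage sketch; the repo id is a placeholder, so substitute this model's actual Hub path:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id; replace with this model's actual Hub path.
model_id = "your-username/llama-3-8b-merged-linear"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Translate to Italian: Good morning!"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```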
|
|
|
## Configuration

The following YAML configuration was used to produce this model:
|
|
|
```yaml
models:
  - model: Danielbrdz/Barcenas-Llama3-8b-ORPO
    parameters:
      weight: 1.0
  - model: DeepMount00/Llama-3-8b-Ita
    parameters:
      weight: 1.0
  - model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 1.0
merge_method: linear
tokenizer_source: union
dtype: float16
```
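
To reproduce the merge, mergekit can run this configuration directly: with mergekit installed (`pip install mergekit`) and the configuration saved as, say, `config.yaml`, a typical invocation is `mergekit-yaml config.yaml ./output-model-directory`.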