---
language: en
tags:
  - llama
  - text-generation
  - model-merging
license: mit
base_model:
  - meta-llama/Meta-Llama-3-8B
library_name: transformers
---

# llama-3-8b-merged-linear

## Overview

This model is a linear merge of three distinct Llama 3 8B models, produced with the Mergekit tool. The primary goal of the merge is to combine the unique strengths of each base model, such as multilingual capability and specialized domain knowledge, into a more versatile, general-purpose language model.

Merging these models linearly combines their expertise into a unified model that performs well across a range of tasks, including text generation, multilingual understanding, and domain-specific applications.
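Conceptually, a linear merge is just a weighted average of corresponding parameter tensors across the source checkpoints. The following is a minimal PyTorch sketch of that idea; it is illustrative only and not Mergekit's actual implementation, which also handles sharded checkpoints, tokenizer alignment, and memory-efficient streaming.

```python
import torch

def linear_merge(state_dicts, weights):
    """Weighted average of matching parameter tensors across checkpoints.

    Illustrative sketch only: real merging tooling also deals with
    mismatched vocabularies, sharded weights, and large-model memory use.
    """
    total = sum(weights)
    merged = {}
    for name in state_dicts[0]:
        # Accumulate in float32 for numerical stability, then cast back.
        acc = sum(w * sd[name].float() for sd, w in zip(state_dicts, weights))
        merged[name] = (acc / total).to(torch.float16)  # final dtype matches the config
    return merged

# With equal weights of 1.0 for all three models, as in this merge,
# the result reduces to a plain elementwise mean of the checkpoints.
```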

## Model Details

### Model Description

- **Models Used:**
  - Danielbrdz/Barcenas-Llama3-8b-ORPO
  - DeepMount00/Llama-3-8b-Ita
  - lightblue/suzume-llama-3-8B-multilingual
- **Merging Tool:** Mergekit
- **Merge Method:** Linear merge with equal weighting (1.0) for all models
- **Tokenizer Source:** Union
- **Data Type:** float16 (FP16) precision
- **License:** MIT License
- **Languages Supported:** Multilingual, including English, Italian, and potentially others from the multilingual base models

## Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: Danielbrdz/Barcenas-Llama3-8b-ORPO
    parameters:
      weight: 1.0
  - model: DeepMount00/Llama-3-8b-Ita
    parameters:
      weight: 1.0
  - model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 1.0
merge_method: linear
tokenizer_source: union
dtype: float16
```
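
With this configuration saved as, e.g., `config.yaml`, the merge can be reproduced with Mergekit's CLI (`mergekit-yaml config.yaml ./merged-model`). Below is a minimal usage sketch for the merged model with `transformers`; the repository id is assumed from this card's title, and the loading and sampling options are typical choices rather than values confirmed here.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id, taken from this card's title; adjust if the model lives elsewhere.
model_id = "vhab10/llama-3-8b-merged-linear"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # the merge was exported in FP16
    device_map="auto",
)

# Italian prompt, to exercise the multilingual side of the merge.
prompt = "Scrivi una breve poesia sull'autunno."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```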