leafspark
/

IridiumLlama-72B-v0.1

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

IridiumLlama-72B-v0.1 / README.md

leafspark's picture

docs: add mradermacher's GGUFs

9e3c635 verified 7 months ago

|

history blame contribute delete

1.77 kB

metadata

license: other
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen2-72B-Instruct/blob/main/LICENSE
pipeline_tag: text-generation
language:
  - en
  - zh
library_name: transformers
tags:
  - mergekit
  - llama

IridiumLlama-72B-v0.1

Model Description

IridiumLlama is a 72B parameter language model created through a merge of Qwen2-72B-Instruct, calme2.1-72b, and magnum-72b-v1 using model_stock.

This is converted from leafspark/Iridium-72B-v0.1 (currently private)

Features

72 billion parameters
Sharded in 31 files (unlike Iridium, which has 963 shards due to the merging process)
Combines Magnum prose with Calam smarts
Llamaified for easy use

Technical Specifications

Architecture

LlamaForCasualLM
Models: Qwen2-72B-Instruct (base), calme2.1-72b, magnum-72b-v1
Merged layers: 80
Total tensors: 1,043
Context length: 32k

Tensor Distribution

Attention layers: 560 files
MLP layers: 240 files
Layer norms: 160 files
Miscellaneous (embeddings, output): 162 files

Merging

Custom script utilizing safetensors library.

Usage

Loading the Model

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained("leafspark/IridiumLlama-72B-v0.1", 
                                             device_map="auto", 
                                             torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("leafspark/IridiumLlama-72B-v0.1")

GGUFs

Find them here: mradermacher/IridiumLlama-72B-v0.1-GGUF

Hardware Requirements

At least ~150GB of free space
~150GB VRAM