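It has been reported that self-merging Meta-Llama-3-70b to expand its parameter count to 120B yields a higher-performing model.
To further improve the accuracy of karakuri-ai/karakuri-lm-8x7b-chat-v0.1, a high-quality Japanese LLM, we performed a self-expansion merge that increases "num_hidden_layers" from 32 to 56.
In terms of the slice interval used for the merge, this model (Ex-karakuri-8x12B-chat-v2) is configured with a non-merged portion of 4 layers, whereas Ex-karakuri-8x12B-chat-v1 is configured with 8 layers.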
It was inspired by large merges like:
- Meta-Llama-3-120B-Instruct
- alpindale/goliath-120b
- nsfwthrowitaway69/Venus-120b-v1.0
- cognitivecomputations/MegaDolphin-120b
- wolfram/miquliz-120b-v2.0.
slices:
- sources:
  - layer_range: [0, 4]
    model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
- sources:
  - layer_range: [2, 6]
    model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
- sources:
  - layer_range: [4, 8]
    model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
- sources:
  - layer_range: [6, 10]
    model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
- sources:
  - layer_range: [8, 12]
    model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
- sources:
  - layer_range: [10, 14]
    model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
- sources:
  - layer_range: [12, 16]
    model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
- sources:
  - layer_range: [14, 18]
    model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
- sources:
  - layer_range: [16, 20]
    model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
- sources:
  - layer_range: [18, 22]
    model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
- sources:
  - layer_range: [20, 24]
    model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
- sources:
  - layer_range: [22, 26]
    model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
- sources:
  - layer_range: [24, 28]
    model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
- sources:
  - layer_range: [26, 30]
    model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
- sources:
  - layer_range: [28, 32]
    model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
merge_method: passthrough
dtype: bfloat16
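The slice list above follows a regular pattern: a 4-layer window that advances 2 layers at a time across the 32 layers of the base model. As a minimal sketch (the helper name `build_slices` and its parameters are hypothetical, not part of mergekit or of the author's tooling), the same ranges can be generated and the stacked layer count checked like this:

```python
# Hypothetical helper for illustration only; it is not a mergekit API.
def build_slices(num_layers, width, stride, model):
    """Return mergekit-style slice entries using overlapping layer windows."""
    slices = []
    # Each window is [start, start + width); the last window ends at num_layers.
    for start in range(0, num_layers - width + 1, stride):
        slices.append(
            {"sources": [{"layer_range": [start, start + width], "model": model}]}
        )
    return slices


MODEL = "karakuri-ai/karakuri-lm-8x7b-chat-v0.1"

# Window of 4 layers, stride of 2, over the 32 layers of the base model.
slices = build_slices(num_layers=32, width=4, stride=2, model=MODEL)

# A passthrough merge stacks the slices, so the result has the sum of all window sizes.
total_layers = sum(
    s["sources"][0]["layer_range"][1] - s["sources"][0]["layer_range"][0]
    for s in slices
)

print(len(slices), "slices,", total_layers, "stacked layers")
```

Run as written, the listed ranges come to 15 slices and 60 stacked layers, which differs from the 56 mentioned in the description above; the sketch only reports what the listed configuration adds up to and does not say which figure reflects the released model.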