---
base_model: llm-jp/llm-jp-3-13b
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: apache-2.0
language:
- en
---


## Overview

This model applies DPO (Direct Preference Optimization) to [Toki-AI/llm-jp-3-13b-finetune-241202](https://huggingface.co/Toki-AI/llm-jp-3-13b-finetune-241202), which is itself a LoRA (Low-Rank Adaptation) fine-tune of [llm-jp/llm-jp-3-13b](https://huggingface.co/llm-jp/llm-jp-3-13b).

- **Developed by:** Toki-AI  
- **License:** apache-2.0  
- **Finetuned from model:** [llm-jp/llm-jp-3-13b](https://huggingface.co/llm-jp/llm-jp-3-13b)  

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

## Datasets used

- [weblab-GENIAC/aya-ja-nemotron-dpo-masked](https://huggingface.co/datasets/weblab-GENIAC/aya-ja-nemotron-dpo-masked)

## Preparing the inference environment

Install the libraries as follows, adjusting as needed for your environment.

```bash
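pip install unsloth        # provides FastLanguageModel, which the loading code imports
pip install transformers
pip install datasets
pip install accelerate
pip install trl
pip install peft
pip install bitsandbytes   # needed for 4-bit quantization; install according to your GPU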
```

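Run `pip list` to confirm that the dependencies were installed correctly. In a notebook such as Google Colab, prefix the command with `!` (`!pip list`).

```bash
pip list
```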
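
The same check can also be done from Python, which is convenient inside a notebook. The following is a minimal sketch using the standard-library `importlib.metadata`; the helper name `installed_versions` and the `REQUIRED` list are illustrative assumptions, not part of the original instructions.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages from the install step, plus unsloth, which the loading code imports.
# This list is an assumption for illustration; adjust it to your environment.
REQUIRED = ["unsloth", "transformers", "datasets", "accelerate", "trl", "peft", "bitsandbytes"]


def installed_versions(packages):
    """Return {package name: version string, or None if the package is missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` should be installed before moving on to the loading step.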


## Inference

### 1. Loading the model

```python
from unsloth import FastLanguageModel
import torch

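max_seq_length = 2048  # maximum sequence length used at inference time
dtype = None           # None = auto-detect (float16 on Tesla T4/V100, bfloat16 on Ampere or newer)
load_in_4bit = True    # use 4-bit quantization to reduce memory usage
HF_TOKEN = "your_token"  # replace with your Hugging Face access token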

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Toki-AI/llm-jp-3-13b-finetune-241202",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    token = HF_TOKEN
)
```

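### 2. Logging in to the Hugging Face Hub

```python
from huggingface_hub import login
# Stores the token for later Hub access; the load in step 1 already received it via `token=`
login(HF_TOKEN)  # pass your Hugging Face access token
```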
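
Hard-coding a token in a notebook makes it easy to leak when the notebook is shared. As one alternative, the sketch below reads the token from an environment variable before logging in. The variable name `HF_TOKEN` and the helpers `read_hf_token` and `login_from_env` are assumptions chosen for this example.

```python
import os


def read_hf_token(env_var="HF_TOKEN"):
    """Return the access token stored in the given environment variable."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your access token.")
    return token


def login_from_env(env_var="HF_TOKEN"):
    """Log in to the Hugging Face Hub with the token read from the environment."""
    # Imported lazily so that read_hf_token works even without huggingface_hub installed.
    from huggingface_hub import login

    login(token=read_hf_token(env_var))


# Usage: call once per session after exporting the variable.
# login_from_env()
```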

### 3. Inference example (ELYZA-tasks-100-TV)

The following example runs inference on the tasks in `elyza-tasks-100-TV_0.jsonl` and collects the model's answers.

```python
import json
from tqdm import tqdm
from unsloth import FastLanguageModel

# Switch the model to inference mode
FastLanguageModel.for_inference(model)

# Load the JSONL file (read as UTF-8 because the tasks contain Japanese text)
datasets = []
with open("/content/elyza-tasks-100-TV_0.jsonl", "r", encoding="utf-8") as f:
    item = ""
    for line in f:
        line = line.strip()
        item += line
        if item.endswith("}"):
            datasets.append(json.loads(item))
            item = ""

results = []
for dt in tqdm(datasets):
    input_text = dt["input"]
    # Prompt template; the Japanese markers mean "instruction" (指示) and "answer" (回答)
    prompt = f"""### 指示\n{input_text}\n### 回答\n"""

    # Tokenize and generate
    inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        use_cache=True,
        do_sample=False,
        repetition_penalty=1.2
    )
    prediction = tokenizer.decode(outputs[0], skip_special_tokens=True).split('\n### 回答')[-1]
    
    results.append({
        "task_id": dt["task_id"],
        "input": input_text,
        "output": prediction
    })

# Inspect the first few results
for r in results[:3]:
    print(r)
```
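
To keep the generated answers, the collected records can be written back out as JSONL. The following is a minimal sketch; the helper `save_results` and the output file name `results_example.jsonl` are hypothetical, and the dummy record only mimics the shape of the entries in `results`.

```python
import json


def save_results(results, path):
    """Write one JSON object per line; ensure_ascii=False keeps Japanese text readable."""
    with open(path, "w", encoding="utf-8") as f:
        for record in results:
            json.dump(record, f, ensure_ascii=False)
            f.write("\n")


# Example with a dummy record shaped like the entries in `results`
save_results(
    [{"task_id": 0, "input": "質問の例", "output": "回答の例"}],
    "results_example.jsonl",
)
```

With the real data, pass the `results` list from the inference loop in place of the dummy record.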

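The `model.generate` call above uses `repetition_penalty=1.2`; adjust the generation parameters (for example `temperature`, `top_p`, and `max_new_tokens`) to suit your use case. Note that `temperature` and `top_p` only take effect when `do_sample=True`.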
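
As an illustration of switching between greedy decoding and sampling, the sketch below builds the keyword arguments for `model.generate`. The helper `generation_kwargs` and the specific values (`temperature=0.7`, `top_p=0.9`, `max_new_tokens=512`) are assumptions for this example, not settings recommended by the model's authors.

```python
def generation_kwargs(sample=False, max_new_tokens=512):
    """Build keyword arguments for model.generate (hypothetical helper)."""
    kwargs = {
        "max_new_tokens": max_new_tokens,
        "use_cache": True,
        "repetition_penalty": 1.2,  # same value as in the inference example
        "do_sample": sample,
    }
    if sample:
        # temperature and top_p are only used when do_sample=True
        kwargs.update(temperature=0.7, top_p=0.9)
    return kwargs


# Usage, assuming `model` and `inputs` exist as in the inference example:
# outputs = model.generate(**inputs, **generation_kwargs(sample=True))
print(generation_kwargs(sample=True))
```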


## License

This model is distributed under the [Apache License 2.0](./LICENSE).  
Please also check the license and terms of use of the base model, [llm-jp/llm-jp-3-13b](https://huggingface.co/llm-jp/llm-jp-3-13b).

When using this model, comply with all applicable terms.

---