---
base_model: llm-jp/llm-jp-3-13b
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - llama
  - trl
license: apache-2.0
language:
  - en
---

# Uploaded model

- **Developed by:** holy0516
- **License:** apache-2.0
- **Finetuned from model:** llm-jp/llm-jp-3-13b

This llama model was trained 2x faster with Unsloth and Huggingface's TRL library.

## Steps to generate outputs

1. Install and import the required libraries
2. Specify the base model and the LoRA adapter
3. Set the Hugging Face token
4. Load the base model
5. Attach the LoRA adapter
6. Load the evaluation tasks
7. Run inference
8. Save the outputs

## Code

### 1. Install and import the required libraries

```python
!pip uninstall unsloth -y
!pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --upgrade torch
!pip install --upgrade xformers
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from unsloth import FastLanguageModel
from peft import PeftModel  # needed to attach the LoRA adapter in step 5
import torch
from tqdm import tqdm
import json
```

### 2. Specify the base model and the LoRA adapter

```python
model_id = "llm-jp/llm-jp-3-13b"
adapter_id = "holy0516/llm-jp-3-13b-it-r1_elyza100-r4"
```

### 3. Set the Hugging Face token

(omitted)
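The original omits this step. A minimal sketch, assuming the notebook runs on Google Colab and the token is stored in the Colab secrets manager under the name `HF_TOKEN` (the secret name is an assumption); any other way of defining `HF_TOKEN` as a string works equally well:

```python
# Assumption: running on Google Colab with the token saved as a Colab secret named "HF_TOKEN".
from google.colab import userdata

HF_TOKEN = userdata.get("HF_TOKEN")  # used later when downloading the LoRA adapter
```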

### 4. Load the base model

```python
dtype = None         # None lets Unsloth choose the dtype automatically
load_in_4bit = True  # True because we load a 13B model in 4-bit

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_id,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    trust_remote_code=True,
)
```

### 5. Attach the LoRA adapter

```python
model = PeftModel.from_pretrained(model, adapter_id, token=HF_TOKEN)
```

### 6. Load the evaluation tasks

```python
datasets = []
with open("/content/elyza-tasks-100-TV_0.jsonl", "r") as f:
    item = ""
    for line in f:
        line = line.strip()
        item += line
        if item.endswith("}"):
            datasets.append(json.loads(item))
            item = ""
```
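For reference, each parsed record is expected to carry at least `task_id` and `input` fields, since the inference loop below reads both; a hypothetical example (the values are placeholders, not real task data):

```python
# Hypothetical shape of one parsed record; only task_id and input are used below.
example_record = {"task_id": 0, "input": "<task instruction text>"}
```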

### 7. Run inference

```python
FastLanguageModel.for_inference(model)  # switch the Unsloth model to inference mode

results = []
for dt in tqdm(datasets):
    input = dt["input"]

    prompt = f"""### 指示\n{input}\n### 回答\n"""

    inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        use_cache=True,
        do_sample=False,
        repetition_penalty=1.2,
    )
    # Keep only the text after the "### 回答" (answer) marker.
    prediction = tokenizer.decode(outputs[0], skip_special_tokens=True).split('\n### 回答')[-1]

    results.append({"task_id": dt["task_id"], "input": input, "output": prediction})
```

### 8. Save the outputs

```python
with open("output.jsonl", "w", encoding="utf-8") as f:
    for result in results:
        json.dump(result, f, ensure_ascii=False)
        f.write("\n")
```
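As a quick sanity check (not part of the original code), the saved file can be read back with the standard json module:

```python
import json  # already imported above; repeated here so the snippet is self-contained

# Reload the predictions and print a quick summary of the first record.
with open("output.jsonl", encoding="utf-8") as f:
    reloaded = [json.loads(line) for line in f]

print(len(reloaded), "records saved")
print(reloaded[0]["task_id"], reloaded[0]["output"][:100])
```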