---
base_model: llm-jp/llm-jp-3-13b
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: apache-2.0
language:
- en
---
<!--
README.md for Hugging Face model card
Author: zcsn
Description: A llama-based model trained with Unsloth and TRL.
Comments have been added inline in HTML comments to describe various sections.
-->
# Uploaded Model
<!--
Section stating basic model information: developer, license, and base model.
-->
- **Developed by:** zcsn
- **License:** apache-2.0
- **Finetuned from model:** [llm-jp/llm-jp-3-13b](https://huggingface.co/llm-jp/llm-jp-3-13b)
---
## Overview
<!--
Brief description of the model's purpose and key features.
-->
- This model was fine-tuned from [llm-jp/llm-jp-3-13b](https://huggingface.co/llm-jp/llm-jp-3-13b) using Unsloth and Hugging Face's [TRL](https://github.com/lvwerra/trl) library for efficient training.
- It uses **RAG (Retrieval-Augmented Generation)**: for each question, the most similar Q&A pairs are retrieved via similarity search and passed to the model as few-shot examples (see the sketch below).
- The training data consists of 300 Q&A pairs, hand-crafted based on `elyza/ELYZA-tasks-100`.
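A minimal sketch of the retrieval-and-prompting flow described above (the full pipeline appears under Usage; `db` stands for a FAISS index over the Q&A pairs, and the `### 指示` / `### 回答` template mirrors the one used below):

```python
def build_rag_prompt(db, question: str) -> str:
    """Sketch: retrieve the two nearest Q&A pairs from a FAISS index `db`
    and format them as few-shot examples ahead of the actual question."""
    fewshots = "\n".join(
        f"### 指示\n{d.page_content}\n### 回答\n{d.metadata['output']}"
        for d in db.similarity_search(question, k=2)
    )
    return f"{fewshots}\n### 指示\n{question}\n### 回答\n"
```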
---
## Usage
<!--
Overview of how to use the model.
-->
1. **Requirements**
   - A Python environment (e.g. Google Colab or a local machine)
   - A Hugging Face access token (`HF_TOKEN`); see the token setup sketch after the installation commands
2. **Installation**
   The required libraries can be installed with the following commands:
```bash
!pip install --upgrade --no-cache-dir "unsloth[cu121-torch250] @ git+https://github.com/unslothai/unsloth.git"
!pip install transformers
!pip uninstall unsloth_zoo -y
!pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth-zoo.git
!pip install -qU langchain-community faiss-gpu
!pip install -qU langchain-openai
!pip install langchain
!pip install tiktoken
```
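One hedged way to provide the Hugging Face access token mentioned under Requirements (both variants are assumptions about your setup, not part of the original workflow):

```python
import os

# Option 1: read the token from an environment variable set beforehand.
HF_TOKEN = os.environ.get("HF_TOKEN", "")

# Option 2 (Google Colab): fetch it from Colab's secret manager instead.
# from google.colab import userdata
# HF_TOKEN = userdata.get("HF_TOKEN")
```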
3. **Model Loading**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from unsloth import FastLanguageModel
import torch

max_seq_length = 512
dtype = None          # auto-detect (float16 / bfloat16)
load_in_4bit = True   # 4-bit quantization to reduce VRAM usage

model_id = "llm-jp/llm-jp-3-13b"
new_model_id = "llm-jp-3-13b-finetune-9"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_id,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    trust_remote_code=True,
)
```
4. **Fine-tuning (SFT)**
```python
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=32,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
    max_seq_length=max_seq_length,
)

HF_TOKEN = ""  # Set your Hugging Face token here

from datasets import load_dataset

dataset = load_dataset("json", data_files="noanswer.json")

prompt = """### 指示
{}
### 回答
{}"""

EOS_TOKEN = tokenizer.eos_token

def formatting_prompts_func(examples):
    input = examples["text"]
    output = examples["output"]
    text = prompt.format(input, output) + EOS_TOKEN
    return {"formatted_text": text}

dataset = dataset.map(
    formatting_prompts_func,
    num_proc=4,
)

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset["train"],
    max_seq_length=max_seq_length,
    dataset_text_field="formatted_text",
    packing=False,
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=3,
        logging_steps=1,
        warmup_steps=10,
        save_steps=50,
        save_total_limit=2,
        max_steps=-1,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        group_by_length=True,
        seed=3407,
        output_dir="outputs",
        report_to="none",
    ),
)

trainer_stats = trainer.train()
```
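The fields referenced above (`text`, `output`) and in the retrieval index below (`ID`) suggest that `noanswer.json` is a JSON array of records like the following. This is a hedged reconstruction of the schema with made-up records, not the actual training data:

```python
import json

# Hypothetical records illustrating the schema noanswer.json appears to follow
# (fields ID / text / output, inferred from the training and retrieval code).
sample = [
    {"ID": 1, "text": "日本で一番高い山は?", "output": "富士山です。"},
    {"ID": 2, "text": "水の沸点は?", "output": "1気圧ではおよそ100°Cです。"},
]
with open("noanswer_sample.json", "w", encoding="utf-8") as f:
    json.dump(sample, f, ensure_ascii=False, indent=2)
```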
5. **Inference (RAG-based Workflow)**
   Following the steps below runs inference on the input file (`elyza-tasks-100-TV_0.jsonl`) with the model on Hugging Face and writes the results to a file named `{new_model_id}_rag_output.jsonl`.
```python
import json
import os
import time
from tqdm import tqdm
from langchain_openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.docstore.document import Document
from transformers import AutoTokenizer, AutoModelForCausalLM
from unsloth import FastLanguageModel

# Load elyza-tasks-100-TV_0.jsonl
datasets = []
with open("./elyza-tasks-100-TV_0.jsonl", "r", encoding="utf-8") as f:
    item = ""
    for line in f:
        line = line.strip()
        item += line
        if item.endswith("}"):
            datasets.append(json.loads(item))
            item = ""

# Build a FAISS index for RAG
with open("noanswer.json", "r", encoding="utf-8") as f:
    nhk_data = json.load(f)

documents = []
for item in nhk_data:
    doc = Document(
        page_content=item["text"],
        metadata={"ID": item["ID"], "output": item["output"]}
    )
    documents.append(doc)

embeddings = OpenAIEmbeddings(
    openai_api_key="YOUR_OPENAI_API_KEY",  # For Azure, use AzureOpenAIEmbeddings instead
    chunk_size=1
)
db = FAISS.from_documents(documents, embeddings)

# Switch the PEFT-adapted model to inference mode
FastLanguageModel.for_inference(model)

results = []
fewshotresults = []
for dt in tqdm(datasets):
    input_query = dt["input"]

    # Retrieve the most similar documents with FAISS
    similar_docs = db.similarity_search(input_query, k=2)
    if len(similar_docs) > 0:
        fewshot_text_1 = similar_docs[0].page_content
        fewshot_output_1 = similar_docs[0].metadata.get("output", "出力例がありません")
    else:
        fewshot_text_1 = "該当する例が見つかりませんでした。"
        fewshot_output_1 = "申し訳ありませんが、回答が見つかりません。"
    if len(similar_docs) > 1:
        fewshot_text_2 = similar_docs[1].page_content
        fewshot_output_2 = similar_docs[1].metadata.get("output", "出力例がありません")
    else:
        fewshot_text_2 = "該当する例が見つかりませんでした。"
        fewshot_output_2 = "申し訳ありませんが、回答が見つかりません。"

    # Build the prompt from the template
    prompt = f"""### あなたは指示に対して正確に回答するヘルプデスクの担当者です。
指示に従って回答してください
Let’s think step by step
また回答方法は以下の===で囲まれた例を参考にしてください。
### 例
===
### 指示
{fewshot_text_1}
### 回答
{fewshot_output_1}
### 指示
{fewshot_text_2}
### 回答
{fewshot_output_2}
===
### 指示
{input_query}
### 回答
"""

    inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        use_cache=True,
        do_sample=False,
        repetition_penalty=1.2
    )
    prediction = tokenizer.decode(outputs[0], skip_special_tokens=True).split('### 回答')[-1].strip()

    results.append({"task_id": dt["task_id"], "input": input_query, "output": prediction})
    fewshot_str = f"{input_query}{fewshot_text_1}{fewshot_text_2}"
    fewshotresults.append({
        "task_id": dt["task_id"],
        "fewshotresults": fewshot_str
    })

# Inspect which few-shot examples were used
print(fewshotresults)

# Save the inference results as JSONL
with open(f"{new_model_id}_rag_output.jsonl", 'w', encoding='utf-8') as f:
    for result in results:
        json.dump(result, f, ensure_ascii=False)
        f.write('\n')
```
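A small follow-up sketch (purely illustrative) to verify the generated file:

```python
import json

new_model_id = "llm-jp-3-13b-finetune-9"  # as defined under Model Loading

# Read the JSONL back and count the tasks as a sanity check.
with open(f"{new_model_id}_rag_output.jsonl", encoding="utf-8") as f:
    rows = [json.loads(line) for line in f]
print(f"{len(rows)} tasks written")
```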
6. **Push to Hugging Face Hub**
```python
model.push_to_hub_merged(
    new_model_id,
    tokenizer=tokenizer,
    save_method="lora",
    token=HF_TOKEN,
    private=True
)
model.push_to_hub(new_model_id, token=HF_TOKEN, private=True)
tokenizer.push_to_hub(new_model_id, token=HF_TOKEN)
```
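To reuse the pushed adapter later, a minimal sketch (the repo path `zcsn/llm-jp-3-13b-finetune-9` is an assumption; substitute your own account and repo name):

```python
from unsloth import FastLanguageModel

# Reload the pushed LoRA adapter from the Hub (repo path is an assumption).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="zcsn/llm-jp-3-13b-finetune-9",
    load_in_4bit=True,
    token="YOUR_HF_TOKEN",
)
FastLanguageModel.for_inference(model)
```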
© 2024 zcsn. Released under the Apache-2.0 License.