---
base_model: llm-jp/llm-jp-3-13b
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - llama
  - trl
license: apache-2.0
language:
  - ja
  - en
---

<!--
README.md for Hugging Face model card
Author: zcsn
Description: A llama-based model trained with Unsloth and TRL.
Comments have been added inline in HTML comments to describe various sections.
-->

# Uploaded Model

<!--
Section stating basic model information such as the developer and license
-->
- **Developed by:** zcsn  
- **License:** apache-2.0  
- **Finetuned from model:** [llm-jp/llm-jp-3-13b](https://huggingface.co/llm-jp/llm-jp-3-13b)

---

## Overview

<!--
Section giving a brief overview of the model's purpose and characteristics
-->
- This model was fine-tuned from [llm-jp/llm-jp-3-13b](https://huggingface.co/llm-jp/llm-jp-3-13b) using Unsloth and Hugging Face's [TRL](https://github.com/lvwerra/trl) library for efficient training (a minimal loading sketch is shown below).  
- At inference time it uses **RAG (Retrieval-Augmented Generation)**: the Q&A pairs most similar to the input question are retrieved via similarity search and supplied to the model as few-shot examples.  
- The training data consists of 300 Q&A pairs created by hand, based on `elyza/ELYZA-tasks-100`.
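
A minimal inference sketch, assuming the LoRA adapter described under Usage was pushed to the Hub under the repo id `zcsn/llm-jp-3-13b-finetune-9` (the `new_model_id` used below); adjust the repo id to wherever the adapter actually lives:

```python
# Minimal sketch: load the published LoRA adapter with Unsloth and run one prompt.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="zcsn/llm-jp-3-13b-finetune-9",  # assumed adapter repo id
    max_seq_length=512,
    dtype=None,           # auto-select (bfloat16 where supported)
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch to inference mode

# The prompt format matches the fine-tuning template ("### 指示" / "### 回答").
prompt = "### 指示\n日本で一番高い山は何ですか?\n### 回答\n"
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True).split("### 回答")[-1].strip())
```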

---

## Usage

<!--
Section outlining how to use the model
-->
1. **Requirements**  
   - A Python environment (e.g., Google Colab or a local machine)  
   - A Hugging Face access token (`HF_TOKEN`)  
   - An OpenAI API key (used by `OpenAIEmbeddings` in the RAG inference step below)

2. **Installation**  
   Install the required libraries with the following commands:
   ```bash
   !pip install --upgrade --no-cache-dir "unsloth[cu121-torch250] @ git+https://github.com/unslothai/unsloth.git"
   !pip install transformers
   !pip uninstall unsloth_zoo -y
   !pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth-zoo.git
   !pip install -qU langchain-community faiss-gpu
   !pip install -qU langchain-openai
   !pip install langchain
   !pip install tiktoken
   ```
3. **Model Loading**

    ```python
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
    from unsloth import FastLanguageModel
    import torch
    
    max_seq_length = 512
    dtype = None
    load_in_4bit = True
    
    model_id = "llm-jp/llm-jp-3-13b"
    new_model_id = "llm-jp-3-13b-finetune-9"
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=model_id,
        max_seq_length=max_seq_length,
        dtype=dtype,
        load_in_4bit=load_in_4bit,
        trust_remote_code=True,
    )
    
    ```

4. **Fine-tuning (SFT)**

    ```python
    model = FastLanguageModel.get_peft_model(
        model,
        r = 64,
        target_modules = ["q_proj", "k_proj", "v_proj", "o_proj","gate_proj", "up_proj", "down_proj"],
        lora_alpha = 32,
        lora_dropout = 0,
        bias = "none",
        use_gradient_checkpointing = "unsloth",
        random_state = 3407,
        use_rslora = False,
        loftq_config = None,
        max_seq_length = max_seq_length,
    )
    
    HF_TOKEN = ""  # Insert your Hugging Face access token here
    
    from datasets import load_dataset
    dataset = load_dataset("json", data_files="noanswer.json")
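    # Each record in noanswer.json is assumed to contain the fields
    # "ID", "text" (instruction) and "output" (reference answer),
    # as used by formatting_prompts_func below and by the RAG index at inference time.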
    
    prompt = """### 指示
    {}
    ### 回答
    {}"""
    
    EOS_TOKEN = tokenizer.eos_token
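    # EOS is appended to every formatted example so the model learns to stop after the answer.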
    
    def formatting_prompts_func(examples):
        input = examples["text"]
        output = examples["output"]
        text = prompt.format(input, output) + EOS_TOKEN
        return {"formatted_text": text}
    
    dataset = dataset.map(
        formatting_prompts_func,
        num_proc=4,
    )
    
    from trl import SFTTrainer
    from transformers import TrainingArguments
    from unsloth import is_bfloat16_supported
    
    trainer = SFTTrainer(
        model = model,
        tokenizer = tokenizer,
        train_dataset=dataset["train"],
        max_seq_length = max_seq_length,
        dataset_text_field="formatted_text",
        packing = False,
        args = TrainingArguments(
            per_device_train_batch_size = 1,
            gradient_accumulation_steps = 8,
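            # effective batch size = 1 (per device) × 8 (accumulation) = 8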
            num_train_epochs = 3,
            logging_steps = 1,
            warmup_steps = 10,
            save_steps=50,
            save_total_limit=2,
            max_steps=-1,
            learning_rate = 2e-4,
            fp16 = not is_bfloat16_supported(),
            bf16 = is_bfloat16_supported(),
            group_by_length=True,
            seed = 3407,
            output_dir = "outputs",
            report_to = "none",
        ),
    )
    
    trainer_stats = trainer.train()
    

    ```

5. **Inference (RAG-based Workflow)**

Following the steps below, you can run inference with the model on the input data (`elyza-tasks-100-TV_0.jsonl`) and write the results to a file named `{new_model_id}_rag_output.jsonl`.

```python

import json
import os
import time
from tqdm import tqdm
from langchain_openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.docstore.document import Document
from transformers import AutoTokenizer, AutoModelForCausalLM
from unsloth import FastLanguageModel

# Load elyza-tasks-100-TV_0.jsonl
datasets = []
with open("./elyza-tasks-100-TV_0.jsonl", "r", encoding="utf-8") as f:
    item = ""
    for line in f:
        line = line.strip()
        item += line
        if item.endswith("}"):
            datasets.append(json.loads(item))
            item = ""

# Build a FAISS index over the training Q&A pairs for RAG
with open("noanswer.json", "r", encoding="utf-8") as f:
    nhk_data = json.load(f)

documents = []
for item in nhk_data:
    doc = Document(
        page_content=item["text"],
        metadata={"ID": item["ID"], "output": item["output"]}
    )
    documents.append(doc)

embeddings = OpenAIEmbeddings(
    openai_api_key="YOUR_OPENAI_API_KEY",  # replace with AzureOpenAIEmbeddings if you go through Azure
    chunk_size=1
)
db = FAISS.from_documents(documents, embeddings)

# Switch the fine-tuned (PEFT-adapted) model to inference mode
FastLanguageModel.for_inference(model)

results = []
fewshotresults = []

for dt in tqdm(datasets):
    input_query = dt["input"]

    # Retrieve the most similar documents with FAISS
    similar_docs = db.similarity_search(input_query, k=2)
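    # The two nearest training Q&A pairs become the few-shot examples in the prompt below.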
    if len(similar_docs) > 0:
        fewshot_text_1 = similar_docs[0].page_content
        fewshot_output_1 = similar_docs[0].metadata.get("output", "出力例がありません")
    else:
        fewshot_text_1 = "該当する例が見つかりませんでした。"
        fewshot_output_1 = "申し訳ありませんが、回答が見つかりません。"

    if len(similar_docs) > 1:
        fewshot_text_2 = similar_docs[1].page_content
        fewshot_output_2 = similar_docs[1].metadata.get("output", "出力例がありません")
    else:
        fewshot_text_2 = "該当する例が見つかりませんでした。"
        fewshot_output_2 = "申し訳ありませんが、回答が見つかりません。"

    # Build the prompt from the template
    prompt = f"""### あなたは指示に対して正確に回答するヘルプデスクの担当者です。
指示に従って回答してください
Let’s think step by step
また回答方法は以下の===で囲まれた例を参考にしてください。

### 例
===
### 指示
{fewshot_text_1}
### 回答
{fewshot_output_1}
### 指示
{fewshot_text_2}
### 回答
{fewshot_output_2}
===

### 指示
{input_query}
### 回答
"""

    inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
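    # Greedy decoding (do_sample=False) with a mild repetition penalty.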
    outputs = model.generate(
        **inputs, 
        max_new_tokens=512, 
        use_cache=True, 
        do_sample=False,
        repetition_penalty=1.2
    )
    prediction = tokenizer.decode(outputs[0], skip_special_tokens=True).split('### 回答')[-1].strip()

    results.append({"task_id": dt["task_id"], "input": input_query, "output": prediction})
    fewshot_str = f"{input_query}{fewshot_text_1}{fewshot_text_2}"
    fewshotresults.append({
        "task_id": dt["task_id"],
        "fewshotresults": fewshot_str
    })

# Check the retrieved few-shot examples
print(fewshotresults)

# Save the inference results as JSONL
with open(f"{new_model_id}_rag_output.jsonl", 'w', encoding='utf-8') as f:
    for result in results:
        json.dump(result, f, ensure_ascii=False)
        f.write('\n')

```

6. **Push to Hugging Face Hub**

```python
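# Push the trained LoRA adapter (save_method="lora") together with the tokenizer.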

model.push_to_hub_merged(
    new_model_id,
    tokenizer=tokenizer,
    save_method="lora",
    token=HF_TOKEN,
    private=True
)

model.push_to_hub(new_model_id, token=HF_TOKEN, private=True)
tokenizer.push_to_hub(new_model_id, token=HF_TOKEN)

```

---

© 2024 zcsn. Released under the Apache-2.0 License.
