---
base_model: llm-jp/llm-jp-3-13b
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: cc-by-nc-sa-4.0
language:
- en
---

## How to Generate Output with This LLM

Generate using the sample code: use `LoRa_inference.py` in this repository.

## Development Steps

1. Base: `model_id = "llm-jp/llm-jp-3-13b"`
2. SFT training with ichikara-instruction-003-001-1.json
3. DPO training with cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental

SFT: `LoRa.py`
DPO: `DPO.py`

---

# Uploaded model

- **Developed by:** hiro877
- **License:** apache-2.0
- **Finetuned from model:** llm-jp/llm-jp-3-13b

This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

---

# DPO Trained Model

This repository contains a DPO (Direct Preference Optimization) trained model. The model was fine-tuned using **Ichikara Instruction** for SFT (Supervised Fine-Tuning) and subsequently trained with **cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental** for DPO training. It is optimized for generating high-quality chatbot responses in Japanese.

---

## 🚀 Model Overview

- **Model Name**: DPO Trained Model (1215 version)
- **Base Model**: [llm-jp/llm-jp-3-13b](https://huggingface.co/llm-jp/llm-jp-3-13b)
- **Training Steps**:
  1. **SFT (Supervised Fine-Tuning)** using [Ichikara Instruction](https://huggingface.co/datasets/ichikara/instruction)
     - **License**: CC-BY-NC-SA 4.0
  2. **DPO Training** using [cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental](https://huggingface.co/datasets/cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental)
     - **License**: CC-BY 4.0

---

## 🔧 How to Use

1. Load the model using Hugging Face Transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("hiro877/dpo_trained_model_1215")
tokenizer = AutoTokenizer.from_pretrained("hiro877/dpo_trained_model_1215")
```

2. Generate text:

```python
input_text = "こんにちは、今日はどうされましたか?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## 📊 Training Details

### 1. **Datasets Used**

1. **Ichikara Instruction**
   - Source: [Ichikara Instruction Dataset](https://huggingface.co/datasets/ichikara/instruction)
   - **Purpose**: Supervised fine-tuning (SFT)
   - **License**: CC-BY-NC-SA 4.0
2. **cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental**
   - Source: [CyberAgent Dataset](https://huggingface.co/datasets/cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental)
   - **Purpose**: DPO training
   - **License**: CC-BY 4.0

### 2. **Pre-trained Model**

- **Base Model**: [llm-jp/llm-jp-3-13b](https://huggingface.co/llm-jp/llm-jp-3-13b)
- **License**: Apache License 2.0

---

## 🔒 License

This repository is licensed under **CC-BY-NC-SA 4.0**.
You are free to share and adapt the material under the following terms:

1. **Attribution**: Provide appropriate credit.
2. **NonCommercial**: You may not use the material for commercial purposes.
3. **ShareAlike**: Distribute your contributions under the same license.

For more details, see the full license text here: [CC-BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/).

---

## ⚙️ Acknowledgements

- Special thanks to [CyberAgent](https://huggingface.co/cyberagent) and [Ichikara](https://huggingface.co/ichikara) for providing high-quality datasets.
- The base model was developed by [llm-jp](https://huggingface.co/llm-jp).

---

## 🔄 Update Log

### [2024-12-15] Initial Release

- Fine-tuned with Ichikara Instruction (SFT)
- Trained with cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental (DPO)
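The DPO stage described above consumes preference pairs: for each prompt, a preferred ("chosen") and a dispreferred ("rejected") response. As a rough illustration of the record shape TRL's `DPOTrainer` expects, here is a minimal sketch of mapping a raw pairwise-comparison example into that triple. The raw field names (`instruction`, `better_response`, `worse_response`) are placeholders for illustration, not the actual schema of the CyberAgent dataset.

```python
# Minimal sketch: shaping a raw pairwise-comparison record into the
# (prompt, chosen, rejected) triple consumed by DPO training.
# The input field names are hypothetical, not the dataset's real schema.

def to_dpo_record(example: dict) -> dict:
    """Map one comparison example to a DPO preference triple."""
    return {
        "prompt": example["instruction"],
        "chosen": example["better_response"],
        "rejected": example["worse_response"],
    }

raw = {
    "instruction": "日本の首都はどこですか?",
    "better_response": "日本の首都は東京です。",
    "worse_response": "わかりません。",
}

record = to_dpo_record(raw)
print(record["prompt"])   # 日本の首都はどこですか?
print(record["chosen"])   # 日本の首都は東京です。
```

In practice one would apply such a mapping over the whole dataset (e.g. with `datasets.Dataset.map`) before passing it to the trainer.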