---
base_model: llm-jp/llm-jp-3-13b
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: cc-by-nc-sa-4.0
language:
- en
---

## How to Generate Output with This LLM

Generate using the sample code: use `LoRa_inference.py` in this repository.

## Development Steps

1. Base: `model_id = "llm-jp/llm-jp-3-13b"`
2. SFT training with ichikara-instruction-003-001-1.json
3. DPO training with cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental

SFT: `LoRa.py`
DPO: `DPO.py`

---

# Uploaded model

- **Developed by:** hiro877
- **License:** apache-2.0
- **Finetuned from model:** llm-jp/llm-jp-3-13b

This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

---

# DPO Trained Model

This repository contains a DPO (Direct Preference Optimization) trained model. The model was fine-tuned using **Ichikara Instruction** for SFT (Supervised Fine-Tuning) and subsequently trained with **cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental** for DPO training. It is optimized for generating high-quality chatbot responses in Japanese.

---

## 🚀 Model Overview

- **Model Name**: DPO Trained Model (1215 version)
- **Base Model**: [llm-jp/llm-jp-3-13b](https://huggingface.co/llm-jp/llm-jp-3-13b)
- **Training Steps**:
  1. **SFT (Supervised Fine-Tuning)** using [Ichikara Instruction](https://huggingface.co/datasets/ichikara/instruction)
     - **License**: CC-BY-NC-SA 4.0
  2. **DPO Training** using [cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental](https://huggingface.co/datasets/cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental)
     - **License**: CC-BY 4.0

---

## 🔧 How to Use

1. Load the model using Hugging Face Transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("hiro877/dpo_trained_model_1215")
tokenizer = AutoTokenizer.from_pretrained("hiro877/dpo_trained_model_1215")
```

2. Generate text:

```python
input_text = "こんにちは、今日はどうされましたか?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## 📊 Training Details

### 1. **Datasets Used**

1. **Ichikara Instruction**
   - Source: [Ichikara Instruction Dataset](https://huggingface.co/datasets/ichikara/instruction)
   - **Purpose**: Supervised fine-tuning (SFT)
   - **License**: CC-BY-NC-SA 4.0
2. **cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental**
   - Source: [CyberAgent Dataset](https://huggingface.co/datasets/cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental)
   - **Purpose**: DPO training
   - **License**: CC-BY 4.0

### 2. **Pre-trained Model**

- **Base Model**: [llm-jp/llm-jp-3-13b](https://huggingface.co/llm-jp/llm-jp-3-13b)
- **License**: Apache License 2.0

---

## 🔒 License

This repository is licensed under **CC-BY-NC-SA 4.0**.
You are free to share and adapt the material under the following terms:

1. **Attribution**: Provide appropriate credit.
2. **NonCommercial**: You may not use the material for commercial purposes.
3. **ShareAlike**: Distribute your contributions under the same license.

For more details, see the full license text here: [CC-BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/).

---

## ⚙️ Acknowledgements

- Special thanks to [CyberAgent](https://huggingface.co/cyberagent) and [Ichikara](https://huggingface.co/ichikara) for providing high-quality datasets.
- The base model was developed by [llm-jp](https://huggingface.co/llm-jp).

---

## 🔄 Update Log

### [2024-12-15] Initial Release

- Fine-tuned with Ichikara Instruction (SFT)
- Trained with cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental (DPO)
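The DPO stage described above consumes preference pairs: for each prompt, a preferred ("chosen") and a dispreferred ("rejected") response. As a rough illustration of the record shape TRL's `DPOTrainer` expects, here is a minimal sketch of mapping a raw pairwise-comparison example into that triple. The raw field names (`instruction`, `better_response`, `worse_response`) are placeholders for illustration, not the actual schema of the CyberAgent dataset.

```python
# Minimal sketch: shaping a raw pairwise-comparison record into the
# (prompt, chosen, rejected) triple consumed by DPO training.
# The input field names are hypothetical, not the dataset's real schema.

def to_dpo_record(example: dict) -> dict:
    """Map one comparison example to a DPO preference triple."""
    return {
        "prompt": example["instruction"],
        "chosen": example["better_response"],
        "rejected": example["worse_response"],
    }

raw = {
    "instruction": "日本の首都はどこですか?",
    "better_response": "日本の首都は東京です。",
    "worse_response": "わかりません。",
}

record = to_dpo_record(raw)
print(record["prompt"])   # 日本の首都はどこですか?
print(record["chosen"])   # 日本の首都は東京です。
```

In practice one would apply such a mapping over the whole dataset (e.g. with `datasets.Dataset.map`) before passing it to the trainer.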