	---
	base_model: llm-jp/llm-jp-3-13b
	tags:
	- text-generation-inference
	- transformers
	- unsloth
	- llama
	- trl
	license: apache-2.0
	language:
	- ja
	---

	## Overview
	A Japanese LLM fine-tuned from [LLM-jp-3-13b](https://huggingface.co/llm-jp/llm-jp-3-13b) with LoRA (QLoRA), using [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face [TRL](https://github.com/huggingface/trl) for fast training. It was created and published as part of preparing a submission model for the competition in the Matsuo Lab Large Language Model Course 2024.


	- Datasets:
	  - Ichikara Instruction (several datasets combined)
	  - elyza/ELYZA-tasks-100
	  - izumi-lab/wikipedia-ja-20230720

	[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)


	---


	## Environment Requirements

	- Python 3.10 or later recommended
	- GPU: 24 GB or more of VRAM (e.g., NVIDIA L4 or A5000)
	- Required packages (example):
	  - `transformers`
	  - `torch`
	  - `unsloth`
	  - `bitsandbytes`
	  - `accelerate`
	  - `peft`

	You can install them all at once with a command like the following (adjust it for your environment):

	```bash
	pip install transformers torch unsloth bitsandbytes accelerate peft
	```
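
After installing, it can be useful to confirm that the environment matches the requirements listed above. The following is a minimal sketch, not part of the original model card: the helper names `missing_packages` and `python_is_recent` are hypothetical, and the check only tests whether each package can be located, not whether its version is compatible.

```python
import importlib.util
import sys

# Package names copied from the requirements list above.
# Assumption: for these packages the import name matches the pip name.
REQUIRED_PACKAGES = [
    "transformers", "torch", "unsloth", "bitsandbytes", "accelerate", "peft",
]


def missing_packages(names):
    """Return the names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_recent(minimum=(3, 10)):
    """Check the running interpreter against the recommended minimum version."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("Python 3.10 or later:", python_is_recent())
    print("Missing packages:", missing_packages(REQUIRED_PACKAGES) or "none")
```

An empty "missing" list only means the packages are importable; GPU and CUDA compatibility still have to be checked separately.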


	On Google Colab, run the following commands:
	```bash
	!pip uninstall unsloth -y
	!pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
	```


	---

	## Model Loading and Inference

	### 1. Load the model

	```python
	from unsloth import FastLanguageModel
	import torch

	# Example settings
	max_seq_length = 2048
	dtype = None  # None auto-detects (fp16 or bfloat16, depending on GPU generation)
	load_in_4bit = True  # Enable 4-bit quantization to save memory

	HF_TOKEN = "your_token"  # Hugging Face access token

	model, tokenizer = FastLanguageModel.from_pretrained(
	    model_name="Toki-AI/llm-jp-3-13b-finetune-241202",
	    max_seq_length=max_seq_length,
	    dtype=dtype,
	    load_in_4bit=load_in_4bit,
	    token=HF_TOKEN,
	)
	```
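
As a rough illustration of why `load_in_4bit = True` matters on a 24 GB GPU, the sketch below estimates the memory taken by the weights alone at different precisions. It assumes a nominal 13 billion parameters (inferred from the "13b" in the model name, not an exact count) and ignores quantization metadata, activations, and the KV cache, so real usage will be higher. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: nominal parameter count implied by the "13b" model name.
NOMINAL_PARAMS = 13e9

for bits in (16, 4):
    size = weight_memory_gb(NOMINAL_PARAMS, bits)
    print(f"{bits}-bit weights: ~{size:.1f} GB")
```

Under these assumptions, 16-bit weights alone come to roughly 26 GB, which exceeds 24 GB of VRAM, while 4-bit weights need roughly 6.5 GB and leave room for activations and the cache.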

	### 2. Example inference code

	```python
	from unsloth import FastLanguageModel
	from tqdm import tqdm
	import json

	# Switch to inference mode
	FastLanguageModel.for_inference(model)

	# Example: load the tasks to run inference on from a JSONL file
	datasets = []
	with open("elyza-tasks-100-TV_0.jsonl", "r", encoding="utf-8") as f:
	    for line in f:
	        if line.strip():
	            datasets.append(json.loads(line))

	# Run inference
	results = []
	for dt in tqdm(datasets):
	    input_text = dt["input"]
	    # Example prompt
	    prompt = f"""### 指示
	{input_text}
	### 回答
	"""

	    inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
	    outputs = model.generate(
	        **inputs,
	        max_new_tokens=512,
	        use_cache=True,
	        do_sample=False,
	        repetition_penalty=1.2,
	    )

	    # Format the output: keep only the text after the answer header
	    prediction = tokenizer.decode(outputs[0], skip_special_tokens=True).split('\n### 回答')[-1]
	    results.append({"task_id": dt["task_id"], "input": input_text, "output": prediction})

	# Check the inference results (first 3 entries)
	for res in results[:3]:
	    print(res)
	```
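
The prompt format and the answer extraction used in the loop above can be factored into two small helpers, which makes them easy to test without loading the model. This is a sketch only: `build_prompt` and `extract_answer` are hypothetical names, and the `.strip()` call is an addition that the original loop does not perform.

```python
# Section headers of the prompt template used in the inference loop.
INSTRUCTION_HEADER = "### 指示"
ANSWER_HEADER = "### 回答"


def build_prompt(input_text: str) -> str:
    """Wrap the task input in the instruction/answer template."""
    return f"{INSTRUCTION_HEADER}\n{input_text}\n{ANSWER_HEADER}\n"


def extract_answer(decoded: str) -> str:
    """Return the text after the last answer header in the decoded output."""
    # strip() is an addition here; it drops the newline that follows the header.
    return decoded.split(f"\n{ANSWER_HEADER}")[-1].strip()


# The decoded output contains the prompt followed by the generated text.
decoded = build_prompt("日本の首都は？") + "東京です。"
print(extract_answer(decoded))
```

Because the decoded sequence includes the prompt itself, splitting on the answer header is what separates the generated text from the input.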

	Note: adjust the inference parameters (`max_new_tokens`, `do_sample`, `repetition_penalty`, `temperature`, `top_p`, etc.) to suit your task.
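
One way to keep these parameters in a single place is to build the keyword arguments for `model.generate` with a helper. The sketch below is illustrative: `generation_kwargs` is a hypothetical name, and the `temperature` and `top_p` values are placeholder assumptions rather than settings recommended by the model card. They are added only when sampling is enabled, since they have no effect on greedy decoding (`do_sample=False`).

```python
def generation_kwargs(sample: bool, max_new_tokens: int = 512) -> dict:
    """Build keyword arguments for model.generate()."""
    kwargs = {
        "max_new_tokens": max_new_tokens,
        "use_cache": True,
        "repetition_penalty": 1.2,  # value used in the example above
        "do_sample": sample,
    }
    if sample:
        # Illustrative values only; sampling parameters apply when do_sample=True.
        kwargs.update(temperature=0.7, top_p=0.9)
    return kwargs


print(generation_kwargs(sample=False))
print(generation_kwargs(sample=True))
```

With a loaded model, the result would be passed as `model.generate(**inputs, **generation_kwargs(sample=True))`.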

	---

	## License

	This model is distributed under the [Apache License 2.0](./LICENSE).
	Please also review the terms of use and license of the base model, [llm-jp/llm-jp-3-13b](https://huggingface.co/llm-jp/llm-jp-3-13b).


	---