kattyan/llm-jp-3-13b-finetune3


kattyan committed on Dec 16, 2024

Commit 9ee41db (verified) · 1 Parent(s): ec730b6

update readme

Files changed (1)

README.md +47 -4


@@ -8,15 +8,58 @@ tags:
 - trl
 license: apache-2.0
 language:
-- en
 ---
-# Uploaded  model
 - **Developed by:** kattyan
 - **License:** apache-2.0
-- **Finetuned from model :** llm-jp/llm-jp-3-13b
 This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
 [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

 - trl
 license: apache-2.0
 language:
+- ja
 ---
+# Uploaded Model
 - **Developed by:** kattyan
 - **License:** apache-2.0
+- **Finetuned from model:** llm-jp/llm-jp-3-13b
 This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
 [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+# Required Libraries and Their Versions
+- torch>=2.3.0
+- transformers>=4.40.1
+- tokenizers>=0.19.1
+- accelerate>=0.29.3
+- flash-attn>=2.5.8
+# Usage
+```python
+from unsloth import FastLanguageModel
+model_name = "llm-jp/llm-jp-3-13b"  # Model name
+max_seq_length = 512  # Maximum sequence length
+dtype = None  # Data type (None selects it automatically)
+load_in_4bit = True  # Use 4-bit quantization
+# Load the model and tokenizer
+model, tokenizer = FastLanguageModel.from_pretrained(
+    model_name=model_name,
+    max_seq_length=max_seq_length,
+    dtype=dtype,
+    load_in_4bit=load_in_4bit,
+    token="YOUR_HUGGING_FACE_TOKEN",  # Specify your Hugging Face token
+)
+# Prepare the model for inference
+FastLanguageModel.for_inference(model)
+# Set the prompt ("What is an LLM?")
+prompt = "LLMとはなんですか？"
+# Encode the input with the tokenizer
+inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
+# Generate with the model
+outputs = model.generate(**inputs, max_new_tokens=512, use_cache=True, do_sample=False, repetition_penalty=1.2)
+# Decode the output and keep the text after the "### 回答" ("answer") marker
+prediction = tokenizer.decode(outputs[0], skip_special_tokens=True).split('\n### 回答')[-1]
+print(prediction)
+```
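
The last step of the usage snippet keeps only the text that follows the `### 回答` ("answer") marker. Below is a minimal sketch of that post-processing as a standalone function, assuming the decoded text contains such a marker. `extract_answer` is a hypothetical helper name (not part of Unsloth or Transformers), the sample string is made up for illustration, and the `.strip()` call is an addition that the README's one-liner does not have.

```python
ANSWER_MARKER = "\n### 回答"  # same marker string as the README's split() call


def extract_answer(decoded: str, marker: str = ANSWER_MARKER) -> str:
    """Return the text after the last occurrence of `marker`.

    If the marker is absent, str.split returns the whole string as its only
    element, so the decoded text comes back unchanged (prompt included).
    """
    # .strip() is an assumption added here to drop the newline after the marker.
    return decoded.split(marker)[-1].strip()


# Made-up decoded output, for illustration only.
sample = "### 指示\nLLMとはなんですか？\n### 回答\nLLMは大規模言語モデルです。"
print(extract_answer(sample))
print(extract_answer("no marker here"))
```

One consequence worth noting: when the prompt passed to the model does not itself contain the marker, the split is a no-op and the returned string still begins with the prompt.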