LLMJP3-13B-IT2
Overview
LLMJP3-13B-IT2 is a fine-tuned language model built on top of the llm-jp/llm-jp-3-13b base model. It is optimized for Japanese text generation and understanding tasks; training was accelerated with Unsloth and Hugging Face's TRL library, and the model can be loaded in 4-bit for efficient inference.
Key Features
- Base Model: llm-jp/llm-jp-3-13b
- Fine-tuning Dataset: DeL-TaiseiOzaki/Tengentoppa-sft-v1.0
- Training Acceleration: Unsloth and Hugging Face's TRL library were used to achieve a roughly 2x faster training process (a sketch of such a setup follows this list).
- Developer: tshyk
- License: Apache-2.0
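The model card does not include the training script itself; the following is a minimal sketch of what an Unsloth + TRL supervised fine-tuning setup of this kind typically looks like. All hyperparameters, the LoRA configuration, and the `text` field name are illustrative assumptions, not the values actually used; depending on your TRL version, some arguments may belong in `SFTConfig` rather than being passed to `SFTTrainer` directly.

```python
# Illustrative sketch only -- not the actual training script for this model.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the base model in 4-bit with Unsloth's optimized loader.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="llm-jp/llm-jp-3-13b",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small fraction of weights is trained.
# r, alpha, and target_modules are assumed values.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

dataset = load_dataset("DeL-TaiseiOzaki/Tengentoppa-sft-v1.0", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumed field name; adjust to the dataset schema
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```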
Dataset
The model was fine-tuned using the Tengentoppa-sft-v1.0 dataset, which is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).
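To inspect the dataset yourself, you can load it with the Hugging Face `datasets` library. A minimal sketch, assuming the usual `train` split:

```python
from datasets import load_dataset

# Load the fine-tuning dataset from the Hugging Face Hub.
ds = load_dataset("DeL-TaiseiOzaki/Tengentoppa-sft-v1.0", split="train")
print(ds)     # number of rows and column names
print(ds[0])  # first training example
```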
License
This project is distributed under the Apache License 2.0. Please review the license terms before using the model.
How to Use
Installation and Setup
Follow the steps below to set up the environment and run inference using the model.
Colab Setup
```python
# -*- coding: utf-8 -*-
"""myModel_Inference_Template_unsloth.ipynb"""

# Install dependencies
!pip install unsloth
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

from unsloth import FastLanguageModel
import torch
import json

model_name = "tshyk/llmjp3-13b-it2"

# Model configuration
max_seq_length = 2048
dtype = None          # auto-detect dtype
load_in_4bit = True   # load weights in 4-bit to fit Colab GPU memory

# Load model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    token="your_huggingface_token",  # replace with your Hugging Face access token
)
FastLanguageModel.for_inference(model)  # switch to optimized inference mode

# Mount Google Drive to read the evaluation data
from google.colab import drive
drive.mount('/content/drive')

# Load dataset (JSONL; records may span multiple lines, so accumulate until "}")
datasets = []
with open("/content/drive/MyDrive/elyza100_assignment/elyza-tasks-100-TV_0.jsonl", "r") as f:
    item = ""
    for line in f:
        line = line.strip()
        item += line
        if item.endswith("}"):
            datasets.append(json.loads(item))
            item = ""

from tqdm import tqdm

# Inference ("指示" = instruction, "回答" = answer)
results = []
for dt in tqdm(datasets):
    input_text = dt["input"]
    prompt = f"""### 指示
{input_text}
### 回答
"""
    inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512, use_cache=True,
                             do_sample=False, repetition_penalty=1.2)
    # Keep only the text after the answer marker
    prediction = tokenizer.decode(outputs[0], skip_special_tokens=True).split('\n### 回答')[-1]
    results.append({
        "task_id": dt["task_id"],
        "input": input_text,
        "output": prediction,
    })

# Write predictions as JSONL
with open("/content/model_output.jsonl", 'w', encoding='utf-8') as f:
    for result in results:
        json.dump(result, f, ensure_ascii=False)
        f.write('\n')
```
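As an optional sanity check, you can read the output file back and confirm that the expected number of records was written:

```python
import json

# Read the predictions back and count them.
with open("/content/model_output.jsonl", encoding="utf-8") as f:
    records = [json.loads(line) for line in f]
print(f"{len(records)} predictions written")
```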
Notes
- Replace your_huggingface_token with your own Hugging Face access token (see the sketch below for a way to avoid hardcoding it).
- Ensure the dataset files are properly mounted on Colab or available in your local environment.
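A safer pattern than pasting the token into the notebook is to read it from the environment. A minimal sketch, assuming the token is exported as HF_TOKEN (the variable name is an assumption):

```python
import os
from huggingface_hub import login

# Authenticate without hardcoding the token; login() prompts
# interactively if the environment variable is unset.
login(token=os.environ.get("HF_TOKEN"))
```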
Citation
If you use this model, please cite as follows:
```bibtex
@misc{tshyk2024llmjp,
  author = {tshyk},
  title  = {LLMJP3-13B-IT2},
  year   = {2024},
  url    = {https://huggingface.co/tshyk/llmjp3-13b-it2},
  note   = {Fine-tuned using the Tengentoppa-sft-v1.0 dataset.}
}
```
For further inquiries or contributions, feel free to contact tshyk via Hugging Face.