LLMJP3-13B-IT2

Overview

LLMJP3-13B-IT2 is a fine-tuned language model built on the "llm-jp/llm-jp-3-13b" base model. It is optimized for Japanese text generation and understanding tasks, was trained with Unsloth for accelerated fine-tuning, and can be loaded in 4-bit for memory-efficient inference.

Key Features

  • Base Model: llm-jp/llm-jp-3-13b
  • Fine-tuned Dataset: DeL-TaiseiOzaki/Tengentoppa-sft-v1.0
  • Training Acceleration: Fine-tuned with Unsloth and Hugging Face's TRL library, roughly doubling training speed (see the training sketch in the Dataset section below).
  • Developer: tshyk
  • License: Apache-2.0

Dataset

The model was fine-tuned using the Tengentoppa-sft-v1.0 dataset, which is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).
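
The exact training script and hyperparameters have not been published. The following is a minimal sketch of how such a fine-tune is typically set up with Unsloth and TRL's SFTTrainer; the LoRA configuration, hyperparameters, and the dataset text field name are illustrative assumptions, and API details vary across TRL versions.

# Hypothetical fine-tuning sketch with Unsloth + TRL. All hyperparameters,
# the LoRA settings, and the dataset field name are illustrative assumptions.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="llm-jp/llm-jp-3-13b",  # base model
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small set of weights is trained
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("DeL-TaiseiOzaki/Tengentoppa-sft-v1.0", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumed field name; adjust to the dataset schema
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        logging_steps=10,
        output_dir="outputs",
    ),
)
trainer.train()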

License

This project is distributed under the Apache License 2.0. Please review the license terms before using the model.

How to Use

Installation and Setup

The Colab notebook below installs the dependencies, loads the model, and runs inference over a JSONL task file.

Colab Setup

# -*- coding: utf-8 -*-
"""myModel_Inference_Template_unsloth.ipynb"""

# Install dependencies: install Unsloth first to pull in its requirements,
# then replace it with the latest Colab build from GitHub
!pip install unsloth
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

from unsloth import FastLanguageModel
import torch
import json

model_name = "tshyk/llmjp3-13b-it2"

# Model configuration
max_seq_length = 2048  # maximum context length for inference
dtype = None           # None lets Unsloth auto-detect the best dtype for the GPU
load_in_4bit = True    # 4-bit quantization so the 13B model fits in Colab GPU memory

# Load the model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    token="your_huggingface_token",  # replace with your Hugging Face access token
)
FastLanguageModel.for_inference(model)  # enable Unsloth's optimized inference mode

# Mount Google Drive to access the evaluation dataset
from google.colab import drive
drive.mount('/content/drive')

# Load the evaluation tasks (JSONL; a record may span multiple lines,
# so accumulate lines until a complete JSON object closes with "}")
datasets = []
with open("/content/drive/MyDrive/elyza100_assignment/elyza-tasks-100-TV_0.jsonl", "r") as f:
    item = ""
    for line in f:
        line = line.strip()
        item += line
        if item.endswith("}"):
            datasets.append(json.loads(item))
            item = ""

from tqdm import tqdm

# Run deterministic (greedy) inference over every task
results = []
for dt in tqdm(datasets):
    input_text = dt["input"]  # renamed from "input" to avoid shadowing the built-in

    # Prompt template used at fine-tuning time ("指示" = instruction, "回答" = answer)
    prompt = f"""### 指示
{input_text}
### 回答
"""

    inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        use_cache=True,
        do_sample=False,         # greedy decoding for reproducible outputs
        repetition_penalty=1.2,  # discourage repeated phrases
    )
    # Keep only the text generated after the response header
    prediction = tokenizer.decode(outputs[0], skip_special_tokens=True).split('\n### 回答')[-1]

    results.append({
        "task_id": dt["task_id"],
        "input": input_text,
        "output": prediction
    })

with open(f"/content/model_output.jsonl", 'w', encoding='utf-8') as f:
    for result in results:
        json.dump(result, f, ensure_ascii=False)
        f.write('\n')
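
To sanity-check the output file, the records can be read back line by line. A minimal sketch:

# Quick sanity check: read the results back and inspect the first record
with open("/content/model_output.jsonl", "r", encoding="utf-8") as f:
    records = [json.loads(line) for line in f]

print(f"{len(records)} records written")
print(records[0]["task_id"], records[0]["output"][:100])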

Notes

  • Replace your_huggingface_token with your Hugging Face access token (a safer pattern using Colab secrets is sketched below).
  • Ensure the dataset file is available on Google Drive, or adjust the path for your local environment.
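
Rather than hard-coding the token, you can store it in Colab's Secrets panel. A minimal sketch, assuming a secret named HF_TOKEN has been created and granted notebook access:

# Read the Hugging Face token from Colab's Secrets panel
# (assumes a secret named HF_TOKEN exists; the name is an assumption)
from google.colab import userdata

hf_token = userdata.get('HF_TOKEN')

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    token=hf_token,
)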

Citation

If you use this model, please cite as follows:

@misc{tshyk2024llmjp,
  author = {tshyk},
  title = {LLMJP3-13B-IT2},
  year = {2024},
  url = {https://huggingface.co/tshyk/llmjp3-13b-it2},
  note = {Fine-tuned using the Tengentoppa-sft-v1.0 dataset.}
}

For further inquiries or contributions, feel free to contact tshyk via Hugging Face.
