ce-lery/japanese-mistral-300m-instruction-GGUF

Quantized GGUF model files for japanese-mistral-300m-instruction from ce-lery
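These files can be run with llama.cpp or any GGUF-compatible runtime. Below is a minimal sketch using llama-cpp-python; the GGUF file name and generation settings are assumptions, and the prompt follows the instruction template from the original model card further down (note that "\n" and "[SEP]" appear as literal text in that template).

from llama_cpp import Llama

# Minimal sketch, assuming llama-cpp-python is installed (pip install llama-cpp-python).
# The file name below is a placeholder; use whichever quantization you downloaded.
llm = Llama(model_path="japanese-mistral-300m-instruction.q4_k_m.gguf", n_ctx=1024)

# Prompt built in the no-input format from the original card ("\n" and "[SEP]" are literal text).
prompt = (
    "<s>\\n以下は、タスクを説明する指示です。要求を適切に満たす応答を書きなさい。"
    "\\n[SEP]\\n指示:\\n日本で一番高い山は?\\n[SEP]\\n応答:\\n"
)

out = llm(prompt, max_tokens=100, temperature=0.4, top_p=0.95)
print(out["choices"][0]["text"])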

Original Model Card:

japanese-mistral-300m-instruction

Overview

Welcome to my model card!

This model's features are ...

Yukkuri shite ittene! (Take it easy!)

How to use the model

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import os

MODEL_NAME = "ce-lery/japanese-mistral-300m-instruction"
torch.set_float32_matmul_precision('high')

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, trust_remote_code=True).to(device)

MAX_ASSISTANT_LENGTH = 100
MAX_INPUT_LENGTH = 128
# Prompt templates (raw strings: "\n" and "[SEP]" appear literally, matching the fine-tuning format)
INPUT_PROMPT = r'<s>\n以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。要求を適切に満たす応答を書きなさい。\n[SEP]\n指示:\n{instruction}\n[SEP]\n入力:\n{input}\n[SEP]\n応答:\n'
NO_INPUT_PROMPT = r'<s>\n以下は、タスクを説明する指示です。要求を適切に満たす応答を書きなさい。\n[SEP]\n指示:\n{instruction}\n[SEP]\n応答:\n'

def prepare_input(instruction, input_text):
    if input_text != "":
        prompt = INPUT_PROMPT.format(instruction=instruction, input=input_text)
    else:
        prompt = NO_INPUT_PROMPT.format(instruction=instruction)
    return prompt

def format_output(output):
    output = output.lstrip("<s>").rstrip("</s>").replace("[SEP]", "").replace("\\n", "\n")
    return output

def generate_response(instruction, input_text):
    prompt = prepare_input(instruction, input_text)
    token_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
    n = len(token_ids[0])
    # print(n)

    with torch.no_grad():
        output_ids = model.generate(
            token_ids.to(model.device),
            min_length=n,
            max_length=min(MAX_INPUT_LENGTH, n + MAX_ASSISTANT_LENGTH),
            top_p=0.95,
            top_k=50,
            temperature=0.4,
            do_sample=True,
            no_repeat_ngram_size=2,
            num_beams=3,
            pad_token_id=tokenizer.pad_token_id,
            bos_token_id=tokenizer.bos_token_id,
            eos_token_id=tokenizer.eos_token_id,
            bad_words_ids=[[tokenizer.unk_token_id]]
        )

    output = tokenizer.decode(output_ids.tolist()[0])
    formatted_output_all = format_output(output)
    response = f"Assistant:{formatted_output_all.split('応答:')[-1].strip()}"

    return formatted_output_all, response 

instruction = "あなたは何でも正確に答えられるAIです。"  # "You are an AI that can answer anything accurately."
questions = [
    "日本で一番高い山は?",      # What is the highest mountain in Japan?
    "日本で一番広い湖は?",      # What is the largest lake in Japan?
    "世界で一番高い山は?",      # What is the highest mountain in the world?
    "世界で一番広い湖は?",      # What is the largest lake in the world?
    "冗談を言ってください。",    # Tell me a joke.
]

# Generate and print a response for each question
for question in questions:
    formatted_output_all, response = generate_response(instruction, question)
    print(response)

Recipe

If you want to reproduce this model, you can refer to this GitHub repository.

I wrote the recipe for building this model. For example:

  • Preprocess with sentencepiece (a minimal sketch follows this list)
  • Pretraining with flash attention2 and torch.compile and DeepSpeed
  • Fine-tuning with databricks-dolly-15k-ja
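As an illustration of the sentencepiece step above, here is a minimal training sketch; the corpus path, vocabulary size, and other settings are assumptions, not the values used for this model.

import sentencepiece as spm

# Minimal sketch: train a sentencepiece tokenizer on a plain-text corpus.
# corpus.txt, vocab_size, and character_coverage are placeholder assumptions.
spm.SentencePieceTrainer.train(
    input="corpus.txt",            # one sentence per line (hypothetical path)
    model_prefix="spm_tokenizer",  # writes spm_tokenizer.model / spm_tokenizer.vocab
    vocab_size=32000,
    character_coverage=0.9995,     # common setting for Japanese text
)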

If you find any mistakes, errors, etc., please open an issue. If you create a pull request, I'm very happy!

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 64
  • total_train_batch_size: 256
  • optimizer: Adam with betas=(0.9,0.95) and epsilon=0.0001
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 200
  • mixed_precision_training: Native AMP
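For orientation, these settings map roughly onto Hugging Face TrainingArguments as sketched below; the output directory and precision flag are assumptions, and the DeepSpeed/multi-GPU launch configuration is omitted.

from transformers import TrainingArguments

# Rough mapping of the hyperparameters listed above (a sketch, not the exact config used).
training_args = TrainingArguments(
    output_dir="japanese-mistral-300m-instruction",  # hypothetical output path
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=64,
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-4,
    lr_scheduler_type="cosine",
    warmup_steps=1000,
    num_train_epochs=200,
    fp16=True,  # "Native AMP" mixed precision (assumption: fp16 rather than bf16)
)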

Training results

Training Loss Epoch Step Validation Loss
3.595 3.51 40 3.5299
3.4769 7.02 80 3.3722
3.3037 10.53 120 3.1871
3.1255 14.05 160 3.0088
2.9615 17.56 200 2.8684
2.8468 21.07 240 2.7808
2.7699 24.58 280 2.7205
2.7139 28.09 320 2.6793
2.6712 31.6 360 2.6509
2.6356 35.12 400 2.6294
2.6048 38.63 440 2.6120
2.5823 42.14 480 2.5974
2.5536 45.65 520 2.5849
2.5293 49.16 560 2.5740
2.5058 52.67 600 2.5644
2.482 56.19 640 2.5556
2.4575 59.7 680 2.5477
2.4339 63.21 720 2.5405
2.4073 66.72 760 2.5350
2.3845 70.23 800 2.5303
2.3606 73.74 840 2.5253
2.329 77.26 880 2.5215
2.3071 80.77 920 2.5185
2.2768 84.28 960 2.5155
2.2479 87.79 1000 2.5144
2.2181 91.3 1040 2.5151
2.1901 94.81 1080 2.5139
2.1571 98.33 1120 2.5148
2.1308 101.84 1160 2.5166
2.1032 105.35 1200 2.5193
2.0761 108.86 1240 2.5204
2.0495 112.37 1280 2.5269
2.0231 115.88 1320 2.5285
2.0021 119.4 1360 2.5328
1.9793 122.91 1400 2.5383
1.9575 126.42 1440 2.5442
1.9368 129.93 1480 2.5488
1.9216 133.44 1520 2.5534
1.902 136.95 1560 2.5584
1.8885 140.47 1600 2.5609
1.8728 143.98 1640 2.5657
1.8605 147.49 1680 2.5697
1.8476 151.0 1720 2.5741
1.8402 154.51 1760 2.5770
1.8274 158.02 1800 2.5803
1.8218 161.54 1840 2.5829
1.8144 165.05 1880 2.5847
1.8097 168.56 1920 2.5867
1.8076 172.07 1960 2.5883
1.8014 175.58 2000 2.5892
1.8001 179.09 2040 2.5899
1.7987 182.61 2080 2.5903
1.7971 186.12 2120 2.5906
1.7979 189.63 2160 2.5907
1.7975 193.14 2200 2.5907

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.1+cu121
  • Datasets 2.14.5
  • Tokenizers 0.14.1
GGUF

  • Model size: 355M params
  • Architecture: llama
  • Available quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit
