Uploaded model
- Developed by: erikomaru
- License: apache-2.0
- Finetuned from model: llm-jp/llm-jp-3-13b

This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
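For reference, below is a minimal sketch of how such an adapter can be trained with Unsloth and TRL's SFTTrainer. The dataset path, LoRA rank, and training hyperparameters are illustrative assumptions rather than the exact settings used for this model, and the SFTTrainer keyword arguments follow the TRL versions commonly paired with Unsloth.

import torch
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

max_seq_length = 512

# Load the base model in 4-bit with Unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="llm-jp/llm-jp-3-13b",
    max_seq_length=max_seq_length,
    dtype=None,  # auto-detect (bfloat16 on recent GPUs)
    load_in_4bit=True,
)

# Attach LoRA adapters (rank and alpha are illustrative values)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# The training data is assumed to be a JSONL file with a pre-formatted "text" column
dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        bf16=torch.cuda.is_bf16_supported(),
        fp16=not torch.cuda.is_bf16_supported(),
        logging_steps=10,
        output_dir="outputs",
    ),
)
trainer.train()

# Push only the LoRA adapter weights to the Hub
model.push_to_hub("erikomaru/llm-jp-3-13b-it", token="your-token")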
Sample use
This script uses a QLoRA adapter trained with the Unsloth library to generate outputs for the ELYZA-tasks-100-TV benchmark tasks. It assumes the adapter has been uploaded to Hugging Face. The base model is loaded with 4-bit NF4 quantization via bitsandbytes, the LoRA adapter weights are applied on top with PEFT, and Unsloth's FastLanguageModel.for_inference switches the combined model into inference mode. The setup targets Japanese natural-language tasks that require precise instruction following.
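The script assumes the following libraries are installed (an unpinned, illustrative install command; exact versions are not specified in this card):

pip install -U transformers peft bitsandbytes accelerate unsloth tqdm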
# Import necessary libraries
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)
from peft import PeftModel
from unsloth import FastLanguageModel
import torch
from tqdm import tqdm
import json

# Set your Hugging Face token, base model ID, and adapter ID
HF_TOKEN = "your-token"  # Replace with your token
model_id = "llm-jp/llm-jp-3-13b"          # base model
adapter_id = "erikomaru/llm-jp-3-13b-it"  # fine-tuned LoRA adapter
# Step 1: Configure 4-bit quantization settings
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
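# NF4 ("NormalFloat 4") stores the base weights in 4 bits while computations run in
# bfloat16, cutting the 13B model's weight memory to roughly a quarter of fp16.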
# Step 2: Load the base model with 4-bit quantization
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    token=HF_TOKEN,
)

# Step 3: Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True, token=HF_TOKEN)

# Integrate the LoRA adapter into the base model.
model = PeftModel.from_pretrained(model, adapter_id, token=HF_TOKEN)
# Step 4: Load dataset
datasets = []
with open("./elyza-tasks-100-TV_0.jsonl", "r") as f:
    item = ""
    for line in f:
        line = line.strip()
        item += line
        if item.endswith("}"):
            datasets.append(json.loads(item))
            item = ""
# Perform inference using the model.
# Switch the model to inference mode
FastLanguageModel.for_inference(model)
# Step 5: Run inference on dataset
results = []
for data in tqdm(datasets):
    input = data["input"]

    # Construct the prompt
    prompt = f"""### 指示
{input}
### 回答
"""

    # Tokenize input
    tokenized_input = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
    attention_mask = torch.ones_like(tokenized_input)
    # Generate output
    with torch.no_grad():
        outputs = model.generate(
            tokenized_input,
            attention_mask=attention_mask,
            max_new_tokens=100,
            do_sample=False,
            repetition_penalty=1.2,
            pad_token_id=tokenizer.eos_token_id,
        )[0]
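    # Decode only the newly generated tokens, skipping the prompt portion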
    output = tokenizer.decode(outputs[tokenized_input.size(1):], skip_special_tokens=True)
    results.append({"task_id": data["task_id"], "input": input, "output": output})
# Step 6: Save results to a JSONL file
import re

jsonl_id = re.sub(".*/", "", adapter_id)
with open(f"./{jsonl_id}-outputs.jsonl", 'w', encoding='utf-8') as f:
    for result in results:
        json.dump(result, f, ensure_ascii=False)  # ensure_ascii=False for handling non-ASCII characters
        f.write('\n')
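The output file name is derived from the adapter ID (with the adapter ID above, ./llm-jp-3-13b-it-outputs.jsonl). Each line holds one JSON record of the form {"task_id": ..., "input": ..., "output": ...}, where output is the model's generated answer for that task.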