
Model Card for Deita Complexity Scorer

GitHub | Paper

Deita is an open-source project designed to facilitate Automatic Data Selection for instruction tuning in Large Language Models (LLMs).

Deita Complexity Scorer is a tool for automatically annotating the instruction complexity of SFT (supervised fine-tuning) data.

Model description

  • Model type: A model fine-tuned to automatically annotate the instruction complexity of SFT data
  • Language(s) (NLP): Primarily English
  • Finetuned from model: Llama-1-13b-hf

Model Sources

Performance

| Model | Align | Data Size | MT-Bench | AlpacaEval(%) | OpenLLM (Avg.) |
|---|---|---|---|---|---|
| Proprietary Models | | | | | |
| GPT-4-Turbo | ? | -- | 9.32 | 97.70 | -- |
| GPT-4 | SFT + PPO | -- | 8.99 | 95.03 | -- |
| Claude-2 | SFT + PPO | -- | 8.06 | 91.36 | -- |
| GPT-3.5-turbo | SFT + PPO | -- | 7.94 | 89.37 | -- |
| Open-sourced Models based on LLaMA-1-13B | | | | | |
| LIMA | SFT | 1K SFT | 4.29 | 41.98 | 59.82 |
| WizardLM-13B | SFT | 70K SFT | 6.35 | 75.31 | 58.96 |
| Vicuna-13B-v1.3 | SFT | 125K SFT | 6.39 | 82.11 | 60.01 |
| Random | SFT | 10K SFT | 6.03 | 71.52 | 60.14 |
| DEITA-LLaMA1-13B-v1.0-sft | SFT | 10K SFT | 6.60 | 78.01 | 64.27 |
| Open-sourced Models based on LLaMA-2-13B | | | | | |
| Tulu-2-13B | SFT | 326K SFT | 6.70 | 78.90 | -- |
| Tulu-2-13B+DPO | SFT + DPO | 326K SFT + 60K DPO | 7.00 | 89.50 | -- |
| LLaMA2-13B-Chat | SFT + PPO | -- | 6.65 | 81.09 | -- |
| WizardLM-13B-v1.2 | SFT | >70K SFT | 7.09 | 89.17 | -- |
| Vicuna-13B-v1.5 | SFT | 125K SFT | 6.57 | 78.80 | 61.63 |
| Random | SFT | 10K SFT | 5.78 | 65.19 | 61.32 |
| DEITA-LLaMA2-13B-v1.0-sft | SFT | 10K SFT | 6.79 | 81.09 | 62.71 |
| Open-sourced Models based on Mistral-7B | | | | | |
| Mistral-7B-Instruct-v0.1 | -- | -- | 6.84 | 69.65 | 60.45 |
| Zephyr-7B-sft | SFT | 200K SFT | 5.32 | 75.12 | 60.93 |
| Zephyr-7B-β | SFT + DPO | 200K SFT + 60K DPO | 7.34 | 90.60 | 66.36 |
| OpenChat-3.5 | C-RLFT | > 70K C-RLFT | 7.81 | 88.51 | -- |
| Starling-7B | C-RLFT + APA | > 70K C-RLFT + 183K APA | 8.09 | 91.99 | -- |
| Random | SFT | 10K SFT | 5.89 | 56.90 | 61.72 |
| DEITA-7B-v1.0-sft (6K) | SFT | 6K SFT | 7.22 | 80.78 | 64.94 |
| DEITA-7B-v1.0-sft (10K) | SFT | 10K SFT | 7.32 | 81.67 | 64.00 |
| DEITA-7B-v1.0 | SFT + DPO | 6K SFT + 10K DPO | 7.55 | 90.06 | 69.86 |

Usage

Please use the following code to score the complexity of an instruction:

from transformers import AutoTokenizer, AutoModelForCausalLM
import numpy as np
from scipy.special import softmax

model_name = "hkust-nlp/deita-complexity-scorer"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)


def infer_complexity(model, tokenizer, input_text):
    # Wrap the instruction in the prompt template the scorer expects.
    complexity_template = ("You are a helpful assistant. Please identify the complexity score of the following user query. \n##Query: {instruction}  \n##Complexity: ")
    user_input = complexity_template.format(instruction=input_text)
    input_ids = tokenizer.encode(user_input, return_tensors="pt")
    max_length = 512
    # Generate with output_scores=True so we can read the logits of the
    # first generated token, which encodes the predicted complexity rating.
    outputs = model.generate(input_ids, max_length=max_length, num_return_sequences=1, return_dict_in_generate=True, output_scores=True)
    logprobs_list = outputs.scores[0][0]
    score_logits = []
    # Token ids of the digits "1" through "6" in the Llama tokenizer.
    id2score = {
        29896: "1",
        29906: "2",
        29941: "3",
        29946: "4",
        29945: "5",
        29953: "6"
    }
    score_template = np.array([1, 2, 3, 4, 5, 6])
    # Collect the logits of the six score tokens, normalize them with softmax,
    # and return the probability-weighted average score.
    for k in id2score:
        score_logits.append(logprobs_list[k])
    score_logits = np.array(score_logits)
    score_npy = softmax(score_logits, axis=0)
    score_npy = score_npy * score_template
    score_npy = np.sum(score_npy, axis=0)
    return score_npy

# example input
input_text = "write a performance review for a junior data scientist"
complexity_score = infer_complexity(model, tokenizer, input_text)

print(complexity_score)
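
The returned score is the probability-weighted average of the six candidate ratings: the logits of the first generated token are restricted to the digit tokens "1" through "6", normalized with softmax, and combined as $\sum_{i=1}^{6} i \cdot p_i$, so the output is a float between 1 and 6 (higher means more complex).

Below is a minimal, hypothetical sketch of how such scores could be used to rank a small pool of instructions. The variable names (instructions, min_complexity) and the threshold are illustrative only; the full Deita selection pipeline described in the paper combines complexity with quality and diversity signals rather than using complexity alone.

# Hypothetical sketch: rank a small pool of instructions by complexity and
# keep the ones above an illustrative threshold. Reuses `model`, `tokenizer`,
# and `infer_complexity` from the snippet above.
instructions = [
    "translate 'good morning' into French",
    "write a performance review for a junior data scientist",
    "design a fault-tolerant distributed cache and justify the trade-offs",
]

min_complexity = 3.0  # illustrative threshold, not a value from the paper

scored = [(text, float(infer_complexity(model, tokenizer, text))) for text in instructions]
scored.sort(key=lambda pair: pair[1], reverse=True)

for text, score in scored:
    decision = "keep" if score >= min_complexity else "drop"
    print(f"{score:.2f} [{decision}] {text}")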

Citation

If you find the content of this project helpful, please cite our paper as follows:

@misc{liu2023what,
      title={What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning}, 
      author={Wei Liu and Weihao Zeng and Keqing He and Yong Jiang and Junxian He},
      year={2023},
      eprint={2312.15685},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}