Quick critique

#1
by ttkciar - opened

Thank you very much for making your hard work available for general use! I have been using Medalpaca-13B as a medical research assistant for about a year, and have tried other models to see if they do a better job (Asclepius-13B, Meditron-7B). Today I tried medicine-LLM-13B.

This model is very good at explaining anatomy and biochemistry, but is not very knowledgeable about diseases (in particular diabetes and ulcerative colitis), and tends to get distracted by minor wording differences in its prompt. For example, it dealt much better with "dedifferentiation theory" than with "genetic dedifferentiation theory", even though they are exactly the same thing -- with the latter, it went off on unrelated tangents about genetic factors.

It also has some funny inference glitches, which make me think its training format might not have been strictly consistent. It will sometimes infer additional prompt questions followed by "[/SYS]" or "[/INST]" before answering the question (and its own question).

That having been said, this is a much better medical inference model than Asclepius-13B or Meditron-7B, hands-down. I think it might be a better model than Medalpaca-13B for some applications, but for my particular use-case I am not yet convinced it can replace Medalpaca-13B. I will continue to use it alongside Medalpaca-13B for a while to figure out its strengths.

Hi, thank you for sharing your detailed feedback! 💗

Regarding the inference glitches you've noticed, it seems like there might be a misunderstanding about the prompt template. For the chat models, the "[/SYS]" or "[/INST]" tags are indeed part of the training template.
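For readers unfamiliar with where such tags come from, here is a minimal sketch of the standard Llama-2 chat layout. It assumes the chat models follow that layout unchanged, which should be checked against their model card; `build_chat_prompt` is a hypothetical helper name, not part of any library. Note that in the standard layout the system prompt is closed with `<</SYS>>`.

```python
# Sketch of the standard Llama-2 chat layout. Whether the chat models
# discussed here use exactly this layout is an assumption.
def build_chat_prompt(user_input: str, system_prompt: str = "") -> str:
    """Wrap a single user turn in Llama-2 style [INST] ... [/INST] tags."""
    if system_prompt:
        # The system prompt sits inside <<SYS>> ... <</SYS>> in the first turn.
        return f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_input} [/INST]"
    return f"[INST] {user_input} [/INST]"


print(build_chat_prompt("Which karyotype is an example of monosomy?"))
```

The beginning-of-sequence token is left to the tokenizer here rather than written into the string.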

However, for the 13B base model, you don't need to follow that prompt template. Instead, you can simply pass your request as the input, as illustrated in the example below:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("AdaptLLM/medicine-LLM-13B")
tokenizer = AutoTokenizer.from_pretrained("AdaptLLM/medicine-LLM-13B", use_fast=False)

# Put your input here:
user_input = '''Question: Which of the following is an example of monosomy?
Options:
- 46,XX
- 47,XXX
- 69,XYY
- 45,X

Please provide your choice first and then provide explanations if possible.'''

# NOTE: you do NOT need to follow the prompt template for chat models here
prompt = user_input

inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).input_ids.to(model.device)

outputs = model.generate(input_ids=inputs, max_length=2048)[0]

answer_start = int(inputs.shape[-1])
pred = tokenizer.decode(outputs[answer_start:], skip_special_tokens=True)

print(f'### User Input:\n{user_input}\n\n### Assistant Output:\n{pred}')
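If stray "[/INST]" or "[/SYS]" markers still show up in the decoded text, one possible workaround is to post-process it. The sketch below assumes the useful answer follows the last stray marker, which may not hold for every output; `strip_invented_prompt` and `STRAY_MARKERS` are hypothetical names introduced only for illustration.

```python
# Markers reported in this thread; extend the tuple if others show up.
STRAY_MARKERS = ("[/INST]", "[/SYS]")


def strip_invented_prompt(text: str, markers=STRAY_MARKERS) -> str:
    """Return the text after the last stray marker, or the text unchanged."""
    cut = -1
    for marker in markers:
        idx = text.rfind(marker)
        if idx != -1:
            # Remember the position just past the right-most marker seen so far.
            cut = max(cut, idx + len(marker))
    return text[cut:].strip() if cut != -1 else text.strip()


sample = "What about 47,XXY? [/INST] 45,X is an example of monosomy."
print(strip_invented_prompt(sample))
```

In the example above, this could be applied as `pred = strip_invented_prompt(pred)` right after decoding.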

We hope this clarifies things! Looking forward to your continued feedback!

BTW, 🤗 we highly recommend switching to the chat model developed from llama-2-chat-7b for better response quality!
