MedFalcon 40b LoRA

Model Description

Architecture

nmitchko/medfalcon-40b-lora is a LoRA adapter for a large language model, fine-tuned for medical-domain tasks. It is based on Falcon-40b-instruct, which has 40 billion parameters.

The primary goal of this model is to improve performance on medical question-answering and dialogue tasks. It was trained with QLoRA, a LoRA variant that fine-tunes on top of a quantized base model, to reduce the memory footprint.

This LoRA supports 4-bit and 8-bit modes.
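
To give a rough sense of why the quantized modes matter, the sketch below estimates the memory taken by the weights alone at 16, 8, and 4 bits per parameter. It assumes the 40 billion parameter figure stated above and 1 GB = 1e9 bytes, and it ignores activations, the KV cache, and quantization metadata, so real usage will be higher. The function name `weight_memory_gb` is illustrative and not part of any library.

```python
# Rough weights-only memory estimate for a 40-billion-parameter model.
# Assumptions: 1 GB = 1e9 bytes; activations, KV cache, and quantization
# metadata are ignored, so actual memory use will be higher.

PARAMS = 40_000_000_000  # parameter count stated for Falcon-40b-instruct

BITS_PER_PARAM = {"float16": 16, "8-bit": 8, "4-bit": 4}


def weight_memory_gb(num_params: int, bits: int) -> float:
    """Return the approximate size of the weights in gigabytes."""
    return num_params * bits / 8 / 1e9


if __name__ == "__main__":
    for mode, bits in BITS_PER_PARAM.items():
        print(f"{mode}: ~{weight_memory_gb(PARAMS, bits):.0f} GB")
```

Under these assumptions the weights shrink from about 80 GB in float16 to about 40 GB in 8-bit and about 20 GB in 4-bit.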

Requirements

bitsandbytes>=0.39.0
peft
transformers
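
Before loading the model, it can help to confirm that these packages are installed. The following is a minimal sketch using only the standard library. The helper names `parse_version` and `check_requirements` are hypothetical, and the version parsing is deliberately simple (it reads only the leading numeric parts), so treat it as a quick sanity check, not a full version resolver.

```python
from importlib import metadata

# Minimum versions taken from the requirements list; None means any version.
REQUIREMENTS = {"bitsandbytes": (0, 39, 0), "peft": None, "transformers": None}


def parse_version(text: str) -> tuple:
    """Turn '0.39.1' into (0, 39, 1); non-numeric suffixes are dropped."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '0.39' so tuple comparison behaves as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_requirements(requirements=REQUIREMENTS) -> dict:
    """Map each package name to 'ok', 'missing', or 'too old'."""
    status = {}
    for name, minimum in requirements.items():
        try:
            installed = parse_version(metadata.version(name))
        except metadata.PackageNotFoundError:
            status[name] = "missing"
            continue
        too_old = minimum is not None and installed < minimum
        status[name] = "too old" if too_old else "ok"
    return status


if __name__ == "__main__":
    for package, state in check_requirements().items():
        print(f"{package}: {state}")
```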

Steps to load this model:

  1. Load the quantized base model, as in QLoRA
  2. Apply the LoRA adapter using peft
# Load the base model in 8-bit, apply the LoRA adapter, then generate text.
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import transformers
import torch

base_model = "tiiuae/falcon-40b-instruct"
lora_adapter = "nmitchko/medfalcon-40b-lora"

# Set to False to load the base model in float16 without 8-bit quantization.
load_8bit = True

tokenizer = AutoTokenizer.from_pretrained(base_model)

# Step 1: load the base model, placing its layers on the available devices.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=load_8bit,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map="auto",
)

# Step 2: apply the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(model, lora_adapter)

# The model is already loaded, so dtype and device placement are set above.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

sequences = pipeline(
    "What does the drug ceftriaxone do?\nDoctor:",
    max_length=200,
    do_sample=True,
    top_k=40,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)

for seq in sequences:
    print(f"Result: {seq['generated_text']}")
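
The example prompt above ends with a newline followed by "Doctor:". The sketch below wraps that pattern in two small helpers: one builds the prompt, the other strips the echoed prompt from the result, since the text-generation pipeline returns the prompt together with the continuation by default. The names `build_prompt` and `extract_answer` are hypothetical, and the prompt format is inferred from the single example here, not from a documented template.

```python
def build_prompt(question: str, speaker: str = "Doctor") -> str:
    """Format a question like the example prompt: question, newline, 'Doctor:'."""
    return f"{question.strip()}\n{speaker}:"


def extract_answer(generated_text: str, prompt: str) -> str:
    """Remove the echoed prompt from a generated string, if it is present."""
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    return generated_text.strip()


if __name__ == "__main__":
    prompt = build_prompt("What does the drug ceftriaxone do?")
    # Placeholder standing in for seq["generated_text"]; not real model output.
    placeholder_output = prompt + " <model continuation goes here>"
    print(extract_answer(placeholder_output, prompt))
```

In the loading example, `build_prompt(...)` would replace the hard-coded prompt string, and `extract_answer(seq["generated_text"], prompt)` would be applied to each result.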