---
library_name: peft
base_model: yahma/llama-7b-hf
language:
- en
pipeline_tag: text-generation
tags:
- text-generation-inference
---

# About :
AlpaRA 7B is a model for medical dialogue understanding. It was fine-tuned from `yahma/llama-7b-hf` using the Alpaca configuration on a curated dataset of 5,000 instructions that captures the nuances of patient-doctor conversations. Training used Parameter-Efficient Fine-Tuning (PEFT) with Low-Rank Adaptation (LoRA), which makes the model practical to run on consumer-grade GPUs.
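
To give a rough sense of why LoRA is light on memory, the sketch below counts trainable parameters for a single weight matrix. LoRA keeps the original matrix frozen and trains two small low-rank factors instead. The helper name `lora_param_counts` is hypothetical, and the rank of 8 is an assumed, illustrative value; this card does not state the rank used for AlpaRA.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_out * d_in           # updating the dense matrix W directly
    lora = rank * (d_out + d_in)  # factor B is d_out x rank, factor A is rank x d_in
    return full, lora


# 4096 is the hidden size of LLaMA 7B; rank=8 is an assumption for illustration.
full, lora = lora_param_counts(4096, 4096, rank=8)
print(f"full: {full:,}  LoRA: {lora:,}  ratio: {lora / full:.2%}")
```

Under these assumptions the adapter trains well under 1% of the parameters of each adapted matrix, which is why only the small adapter weights need to be stored and loaded on top of the base model.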

## How to Use :
## Load the AlpaRA model

```python
from peft import PeftModel
from transformers import LlamaTokenizer, LlamaForCausalLM, GenerationConfig

tokenizer = LlamaTokenizer.from_pretrained("yahma/llama-7b-hf")

model = LlamaForCausalLM.from_pretrained(
    "yahma/llama-7b-hf",
    load_in_8bit=True,
    device_map="auto"
)
model = PeftModel.from_pretrained(model, "KalbeDigitalLab/alpara-7b-peft")
```

## Prompt Template :

The instruction below is only an example; replace it with your own.

```python
PROMPT = """Below is an instruction that describes a task. Write a response that appropriately completes the request.


### Instruction:
"how to cure flu?"

### Response:"""
```
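If you want to try several instructions, a small helper can fill the template for you. This is a minimal sketch: `ALPACA_TEMPLATE` and `build_prompt` are hypothetical names, and the helper assumes the standard Alpaca layout (one blank line between sections, no quotes around the instruction), which differs slightly from the literal string above.

```python
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:"
)


def build_prompt(instruction: str) -> str:
    """Insert a user instruction into the Alpaca-style template."""
    return ALPACA_TEMPLATE.format(instruction=instruction.strip())


PROMPT = build_prompt("how to cure flu?")
print(PROMPT)
```

If the exact spacing of the original prompt gives better results for you, adjust the template string accordingly.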

## Inference

```python
inputs = tokenizer(
    PROMPT,
    return_tensors="pt"
)
# Move the inputs to wherever device_map="auto" placed the model
input_ids = inputs["input_ids"].to(model.device)

print("Generating...")
generation_output = model.generate(
    input_ids=input_ids,
    return_dict_in_generate=True,
    output_scores=True,
    max_new_tokens=512,
)
for s in generation_output.sequences:
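    # Drop special tokens such as <s> and </s>, then keep only the text after the marker
    result = tokenizer.decode(s, skip_special_tokens=True).split("### Response:")[1].strip()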
    print(result)
```
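Indexing `[1]` after splitting raises an `IndexError` if the marker is missing from the decoded text, for example when the prompt was changed. A slightly more defensive way to pull out the answer might look like the sketch below. `extract_response` is a hypothetical helper, not part of `transformers` or `peft`; it also strips LLaMA's `</s>` end-of-sequence token in case special tokens were not skipped while decoding.

```python
def extract_response(decoded: str, marker: str = "### Response:", eos: str = "</s>") -> str:
    """Return the text after the response marker, or the whole text if it is absent."""
    _, found, tail = decoded.partition(marker)
    text = tail if found else decoded
    return text.replace(eos, "").strip()


sample = "### Instruction:\nhow to cure flu?\n\n### Response: Rest and drink fluids.</s>"
print(extract_response(sample))
```

In the generation loop you could then call `extract_response(tokenizer.decode(s))` in place of the split.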