---
license: cc
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- medical
inference: false
---

# medalpaca-13B GPTQ 4bit

This is a [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) 4bit quantisation of [medalpaca-13b](https://huggingface.co/medalpaca/medalpaca-13b).

## GIBBERISH OUTPUT IN `text-generation-webui`?

Please read the Provided Files section below. You should use `medalpaca-13B-GPTQ-4bit-128g.no-act-order.safetensors` unless you are able to update GPTQ-for-LLaMa.

## Provided files

Two files are provided. **The second file will not work unless you use recent GPTQ-for-LLaMa code.**

Specifically, the second file uses `--act-order` for maximum quantisation quality, and will not work with oobabooga's fork of GPTQ-for-LLaMa. Therefore, at this time, it will also not work with the `text-generation-webui` one-click installers.

Unless you are able to use the latest GPTQ-for-LLaMa code, please use `medalpaca-13B-GPTQ-4bit-128g.no-act-order.safetensors`.

* `medalpaca-13B-GPTQ-4bit-128g.no-act-order.safetensors`
  * Works with all versions of GPTQ-for-LLaMa code
  * Works with text-generation-webui one-click-installers
  * Parameters: Groupsize = 128g. No act-order.
  * Command:
    ```
    CUDA_VISIBLE_DEVICES=0 python3 llama.py medalpaca-13b c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors medalpaca-13B-GPTQ-4bit-128g.no-act-order.safetensors
    ```
* `medalpaca-13B-GPTQ-4bit-128g.safetensors`
  * Only works with the latest GPTQ-for-LLaMa code
  * **Does not** work with text-generation-webui one-click-installers
  * Parameters: Groupsize = 128g. act-order.
  * Offers highest quality quantisation, but requires recent GPTQ-for-LLaMa code
  * Command:
    ```
    CUDA_VISIBLE_DEVICES=0 python3 llama.py medalpaca-13b c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors medalpaca-13B-GPTQ-4bit-128g.safetensors
    ```
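
If you want to load one of the quantised `safetensors` files from your own Python code rather than through a UI, a rough sketch is below. It assumes the `load_quant` helper defined in GPTQ-for-LLaMa's `llama.py`, run from inside that repository checkout; the helper's exact name and signature differ between branches, so treat this as illustrative only:

```python
# Run from inside the GPTQ-for-LLaMa checkout so that `llama.py` is importable.
import torch
from transformers import AutoTokenizer
from llama import load_quant  # helper provided by GPTQ-for-LLaMa's llama.py

# First argument is the unquantised base model (used for its config),
# second is the quantised checkpoint, then wbits and groupsize.
model = load_quant(
    "medalpaca/medalpaca-13b",
    "medalpaca-13B-GPTQ-4bit-128g.no-act-order.safetensors",
    4,
    128,
)
model.to("cuda")

tokenizer = AutoTokenizer.from_pretrained("medalpaca/medalpaca-13b")
prompt = "What are the symptoms of diabetes?"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```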

## How to run in `text-generation-webui`

File `medalpaca-13B-GPTQ-4bit-128g.no-act-order.safetensors` can be loaded the same as any other GPTQ file, without requiring any updates to [oobabooga's text-generation-webui](https://github.com/oobabooga/text-generation-webui).

[Instructions on using GPTQ 4bit files in text-generation-webui are here](https://github.com/oobabooga/text-generation-webui/wiki/GPTQ-models-\(4-bit-mode\)).

The other `safetensors` model file was created with the latest GPTQ code, and uses `--act-order` to give the maximum possible quantisation quality, but this means it requires that the latest GPTQ-for-LLaMa code is used inside the UI.

If you want to use the act-order `safetensors` file and need to update GPTQ-for-LLaMa, here are the commands I used to clone the Triton branch of GPTQ-for-LLaMa, clone text-generation-webui, and install GPTQ into the UI:
```
# Clone text-generation-webui, if you don't already have it
git clone https://github.com/oobabooga/text-generation-webui
# Make a repositories directory
mkdir text-generation-webui/repositories
cd text-generation-webui/repositories
# Clone the latest GPTQ-for-LLaMa code inside text-generation-webui
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
```

Then install this model into `text-generation-webui/models` and launch the UI as follows:
```
cd text-generation-webui
python server.py --model medalpaca-13B-GPTQ-4bit --wbits 4 --groupsize 128 --model_type Llama # add any other command line args you want
```

The above commands assume you have installed all dependencies for GPTQ-for-LLaMa and text-generation-webui. Please see their respective repositories for further information.

If you are on Windows, or cannot use the Triton branch of GPTQ for any other reason, you can try the CUDA branch instead:
```
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b cuda
cd GPTQ-for-LLaMa
python setup_cuda.py install
```
Then link that into `text-generation-webui/repositories` as described above.

However, I have heard reports that the CUDA code may run quite slowly.

Or just use `medalpaca-13B-GPTQ-4bit-128g.no-act-order.safetensors` as mentioned above, which should work without any upgrades to text-generation-webui.

# Original model card: MedAlpaca 13b


## Table of Contents

- [Model Description](#model-description)
  - [Architecture](#architecture)
  - [Training Data](#training-data)
- [Model Usage](#model-usage)
- [Limitations](#limitations)

## Model Description

### Architecture

`medalpaca-13b` is a large language model specifically fine-tuned for medical domain tasks. It is based on LLaMA (Large Language Model Meta AI) and contains 13 billion parameters. The primary goal of this model is to improve question-answering and medical dialogue tasks.
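
If you want to confirm the underlying architecture from Python, the model configuration can be inspected without downloading the full weights. A quick check (this assumes network access to the Hugging Face Hub; the commented values are the usual 13B LLaMA sizes, not guarantees):

```python
from transformers import AutoConfig

# Downloads only the small config.json, not the 13B weights.
config = AutoConfig.from_pretrained("medalpaca/medalpaca-13b")
print(config.model_type)            # expected: "llama"
print(config.hidden_size,           # typically 5120 for a 13B LLaMA
      config.num_hidden_layers,     # typically 40
      config.num_attention_heads)   # typically 40
```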

### Training Data

The training data for this project was sourced from various resources. Firstly, we used Anki flashcards to automatically generate questions from the front of the cards and answers from the back of the cards. Secondly, we generated medical question-answer pairs from [Wikidoc](https://www.wikidoc.org/index.php/Main_Page): we extracted paragraphs with relevant headings, and used ChatGPT 3.5 to generate questions from the headings, using the corresponding paragraphs as answers. This dataset is still under development, and we believe that approximately 70% of these question-answer pairs are factually correct. Thirdly, we used StackExchange to extract question-answer pairs, taking the top-rated questions from five categories: Academia, Bioinformatics, Biology, Fitness, and Health. Additionally, we used a dataset from [ChatDoctor](https://arxiv.org/abs/2303.14070) consisting of 200,000 question-answer pairs, available at https://github.com/Kent0n-Li/ChatDoctor.
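
As a rough illustration of the first step, the snippet below shows how front/back flashcard pairs might be turned into question-answer records. The file name and column layout are invented for the example; this is not the authors' actual preprocessing pipeline:

```python
import csv
import json

# Hypothetical input: a TSV export of Anki cards with "front" and "back" columns.
INPUT_FILE = "anki_cards.tsv"      # assumed file name, not part of the released data
OUTPUT_FILE = "anki_qa_pairs.jsonl"

with open(INPUT_FILE, newline="", encoding="utf-8") as src, \
     open(OUTPUT_FILE, "w", encoding="utf-8") as dst:
    reader = csv.DictReader(src, delimiter="\t")
    for row in reader:
        record = {
            "question": row["front"].strip(),  # card front -> question
            "answer": row["back"].strip(),     # card back  -> answer
        }
        dst.write(json.dumps(record, ensure_ascii=False) + "\n")
```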

| Source                        | n items |
|-------------------------------|---------|
| ChatDoc large                 | 200000  |
| wikidoc                       | 67704   |
| Stackexchange academia        | 40865   |
| Anki flashcards               | 33955   |
| Stackexchange biology         | 27887   |
| Stackexchange fitness         | 9833    |
| Stackexchange health          | 7721    |
| Wikidoc patient information   | 5942    |
| Stackexchange bioinformatics  | 5407    |

## Model Usage

To evaluate the performance of the model on a specific dataset, you can use the Hugging Face Transformers library's built-in evaluation scripts. Please refer to the evaluation guide for more information.
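
The card does not point to a specific evaluation script. As one simple, illustrative option (not the authors' evaluation protocol), you could compute perplexity over a few held-out medical Q&A strings with plain `transformers` and `torch`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "medalpaca/medalpaca-13b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" requires the `accelerate` package and enough GPU/CPU memory.
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

# Replace with your own held-out examples.
texts = [
    "Question: What are the symptoms of diabetes?\n"
    "Answer: Increased thirst, frequent urination, and unexplained weight loss.",
]

losses = []
for text in texts:
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    losses.append(out.loss.item())

print("mean perplexity:", torch.exp(torch.tensor(losses).mean()).item())
```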

### Inference

You can use the model for inference tasks like question-answering and medical dialogues using the Hugging Face Transformers library. Here's an example of how to use the model for a question-answering task:

```python
from transformers import pipeline

# medalpaca-13b is a causal (text-generation) model, so we use the
# text-generation pipeline and put the context and question into one prompt.
qa_pipeline = pipeline(
    "text-generation",
    model="medalpaca/medalpaca-13b",
    tokenizer="medalpaca/medalpaca-13b",
)
question = "What are the symptoms of diabetes?"
context = "Diabetes is a metabolic disease that causes high blood sugar. The symptoms include increased thirst, frequent urination, and unexplained weight loss."
answer = qa_pipeline(
    f"Context: {context}\n\nQuestion: {question}\n\nAnswer: ",
    max_new_tokens=128,
)
print(answer)
```
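
If you need more control over decoding (sampling, temperature, output length) than the pipeline gives you, the same prompt can be run through the model directly. A minimal sketch, assuming a GPU with enough memory for the fp16 weights and the `accelerate` package installed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("medalpaca/medalpaca-13b")
model = AutoModelForCausalLM.from_pretrained(
    "medalpaca/medalpaca-13b",
    torch_dtype=torch.float16,
    device_map="auto",  # requires `accelerate`
)

prompt = (
    "Context: Diabetes is a metabolic disease that causes high blood sugar.\n\n"
    "Question: What are the symptoms of diabetes?\n\n"
    "Answer: "
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.7,
    )
print(tokenizer.decode(output[0], skip_special_tokens=True))
```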

## Limitations

The model may not perform effectively outside the scope of the medical domain. The training data primarily targets the knowledge level of medical students, which may result in limitations when addressing the needs of board-certified physicians. The model has not been tested in real-world applications, so its efficacy and accuracy are currently unknown. It should never be used as a substitute for a doctor's opinion and must be treated as a research tool only.