---
license: apache-2.0
datasets:
  - Flmc/DISC-Med-SFT
language:
  - zh
pipeline_tag: text-generation
tags:
  - baichuan
  - medical
  - ggml
---

This repository contains quantized versions of DISC-MedLLM, which uses Baichuan-13B-Base as its base model.

The weights were converted to GGML format using baichuan13b.cpp (based on llama.cpp).

| Model               | GGML quantize method | HDD size |
|---------------------|----------------------|----------|
| ggml-model-q4_0.bin | q4_0                 | 7.55 GB  |
| ggml-model-q4_1.bin | q4_1                 | 8.36 GB  |
| ggml-model-q5_0.bin | q5_0                 | 9.17 GB  |
| ggml-model-q5_1.bin | q5_1                 | 9.97 GB  |

## How to run inference

  1. Compile baichuan13b. This generates a main executable `baichuan13b/build/bin/main` and a server `baichuan13b/build/bin/server`.

  2. Download the weights from this repository into `baichuan13b/build/bin/` (see the Python download sketch after this list for a scripted alternative).

  3. For the command-line interface, the following command is useful. You can also read the documentation covering the other command-line parameters:

    cd baichuan13b/build/bin/
    ./main -m ggml-model-q4_0.bin --prompt "I feel sick. Nausea and Vomiting."
    
  4. For the API interface, the following command is useful. You can also read the documentation on the server's command-line options:

    cd baichuan13b/build/bin/
    ./server -m ggml-model-q4_0.bin -c 2048
    
  5. To test the API, you can use curl:

    curl --request POST \
        --url http://localhost:8080/completion \
        --data '{"prompt": "I feel sick. Nausea and Vomiting.", "n_predict": 512}'
    

## Use it in Python

To use it in a Python script such as cli_demo.py, all you need to do is replace the model.chat() call with a requests POST to localhost:8080 that sends the payload as JSON and decodes the HTTP response:

    import requests

    # POST the prompt as JSON; requests' json= argument encodes the body
    # and sets the Content-Type header.
    llm_output = requests.post(
        "http://localhost:8080/completion",
        json={
            "prompt": "I feel sick. Nausea and Vomiting.",
            "n_predict": 512,
        },
    ).json()
    # The generated text is typically under the "content" key of the response.
    print(llm_output)
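
If you want a drop-in replacement for model.chat() in cli_demo.py, a minimal sketch could look like the following. The function signature and the "content" response field are assumptions based on llama.cpp's server API, not something this repository guarantees.

    import requests

    # Hypothetical drop-in replacement for model.chat(); the signature and
    # the "content" response field are assumptions based on llama.cpp's
    # server API.
    def chat(prompt: str, n_predict: int = 512) -> str:
        response = requests.post(
            "http://localhost:8080/completion",
            json={"prompt": prompt, "n_predict": n_predict},
            timeout=600,  # generation can be slow on CPU
        )
        response.raise_for_status()
        return response.json().get("content", "")

    print(chat("I feel sick. Nausea and Vomiting."))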