<div align="center">
  <img src="https://raw.githubusercontent.com/AI4Bread/.github/main/goumang_logoall.png" width="600"/>
  <br /><br />

  
  🔍 Explore our models on 
  [![Static Badge](https://img.shields.io/badge/-gery?style=social&label=🤗%20Huggingface)](https://huggingface.co/AI4Bread/GouMang/)
  

</div>

# GouMang Agriculture Large Language Model

## This is the official repository of GouMang Agriculture Large Language Model.

## We used Xtuner Framework to train and finetune the model.


## 🎉 News

- **\[2024/06\]** [GouMang_7B](https://huggingface.co/AI4Bread/GouMang) is released! Click [here](https://huggingface.co/AI4Bread/GouMang) for details!
- **\[2024/06\]** Support [Llama 3](xtuner/configs/llama) models!

## Usage


## DEMO

Install the dependencies required for the web demo

lmdeploy 没有安装，我们接下来手动安装一下，建议安装最新的稳定版。
如果是在 InternStudio 开发环境，需要先运行下面的命令，否则会报错。


```bash
# 解决 ModuleNotFoundError: No module named 'packaging' 问题
pip install packaging
# 使用 flash_attn 的预编译包解决安装过慢问题
pip install /root/share/wheels/flash_attn-2.4.2+cu118torch2.0cxx11abiTRUE-cp310-cp310-linux_x86_64.whl
```

```bash
pip install 'lmdeploy[all]==v0.4.2'
```
由于默认安装的是 runtime 依赖包，但是我们这里还需要部署和量化，所以，这里选择 `[all]`。


### Model convert

Convert  lmdeploy TurboMind

```bash
# 转换模型（FastTransformer格式） TurboMind
lmdeploy convert internlm-chat-7b /path/to/internlm-chat-7b
```

这里我们使用我们训练好的提供的模型文件，就在用户根目录执行，如下所示。

```bash
lmdeploy convert internlm2-chat-7b /root/autodl-tmp/agri_intern/GouMang --tokenizer-path ./GouMang/tokenizer.json
```

执行完成后将会在当前目录生成一个 `workspace` 的文件夹。这里面包含的就是 TurboMind 和 Triton “模型推理”需要到的文件。

### Chat Locally

```bash
lmdeploy chat turbomind ./workspace
```

### 2.3 TurboMind推理+API服务

在上面的部分我们尝试了直接用命令行启动 Client，接下来我们尝试如何运用 lmdepoy 进行服务化。

”模型推理/服务“目前提供了 Turbomind 和 TritonServer 两种服务化方式。此时，Server 是 TurboMind 或 TritonServer，API Server 可以提供对外的 API 服务。我们推荐使用 TurboMind，TritonServer 使用方式详见《附录1》。

首先，通过下面命令启动服务。


```bash
# ApiServer+Turbomind   api_server => AsyncEngine => TurboMind
lmdeploy serve api_server ./workspace \
	--server_name 0.0.0.0 \
	--server-port 23333 \
	--instance_num 64 \
	--tp 1
```

上面的参数中 `server_name` 和 `server_port` 分别表示服务地址和端口，`tp` 参数我们之前已经提到过了，表示 Tensor 并行。还剩下一个 `instance_num` 参数，表示实例数，可以理解成 Batch 的大小。执行后如下图所示。

### 2.4 网页 Demo 演示

这一部分主要是将 Gradio 作为前端 Demo 演示。在上一节的基础上，我们不执行后面的 `api_client` 或 `triton_client`，而是执行 `gradio`。

> 由于 Gradio 需要本地访问展示界面，因此也需要通过 ssh 将数据转发到本地。命令如下：
>
> ssh -CNg -L 6006:127.0.0.1:6006 root@ssh.intern-ai.org.cn -p <你的 ssh 端口号>

#### 2.4.1 TurboMind 服务作为后端

API Server 的启动和上一节一样，这里直接启动作为前端的 Gradio。

```bash
# Gradio+ApiServer。必须先开启 Server，此时 Gradio 为 Client
lmdeploy serve gradio http://0.0.0.0:23333 --server-port 6006
```

#### 2.4.2 TurboMind 推理作为后端

当然，Gradio 也可以直接和 TurboMind 连接，如下所示。

```bash
# Gradio+Turbomind(local)
lmdeploy serve gradio ./workspace
```

可以直接启动 Gradio，此时没有 API Server，TurboMind 直接与 Gradio 通信。


```bash
pip install streamlit==1.24.0
```

Download the [GouMang](https://huggingface.co/AI4Bread/GouMang) project model (please Star if you like it)


Replace the model path in `web_demo.py` with the path where the downloaded parameters of `GouMang` are stored 

Run the `web_demo.py` file in the directory, and after entering the following command, [**check this tutorial 5.2 for local port configuration**](https://github.com/InternLM/tutorial/blob/main/helloworld/hello_world.md#52-%E9%85%8D%E7%BD%AE%E6%9C%AC%E5%9C%B0%E7%AB%AF%E5%8F%A3)，to map the port to your local machine. Enter `http://127.0.0.1:6006` in your local browser. 

```
streamlit run /root/personal_assistant/code/InternLM/web_demo.py --server.address 127.0.0.1 --server.port 6006
```

Note: The model will load only after you open the `http://127.0.0.1:6006` page in your browser. 
Once the model is loaded, you can start conversing with GouMang.