---
title: ControlLLM
emoji: π
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 4.9.1
app_file: app.py
pinned: false
license: apache-2.0
---
<img src="https://github.com/liu-zhy/graph-of-thought/assets/26198430/4e8b2511-ce69-4c1a-95a1-5aed4d432a82" width=10% align="left" />
# ControlLLM
ControlLLM: Augmenting Large Language Models with Tools by Searching on Graphs [[Paper](https://arxiv.org/abs/2310.17796)]
We present ControlLLM, a novel framework that enables large language models (LLMs) to utilize multi-modal tools for solving complex real-world tasks. Despite the remarkable performance of LLMs, they still struggle with tool invocation due to ambiguous user prompts, inaccurate tool selection and parameterization, and inefficient tool scheduling. To overcome these challenges, our framework comprises three key components: (1) a $\textit{task decomposer}$ that breaks down a complex task into clear subtasks with well-defined inputs and outputs; (2) a $\textit{Thoughts-on-Graph (ToG) paradigm}$ that searches the optimal solution path on a pre-built tool graph, which specifies the parameter and dependency relations among different tools; and (3) an $\textit{execution engine with a rich toolbox}$ that interprets the solution path and runs the tools efficiently on different computational devices. We evaluate our framework on diverse tasks involving image, audio, and video processing, demonstrating its superior accuracy, efficiency, and versatility compared to existing methods.
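For intuition, here is a minimal sketch of the kind of search ToG performs: tools form a graph whose edges encode input/output type compatibility, and solution paths are enumerated from the user's resources to the requested output type. The toolbox, type names, and plain depth-first strategy below are illustrative assumptions, not the paper's implementation (which also models parameter dependencies and lets the LLM guide the search).
```python
# Illustrative Thoughts-on-Graph-style search over a toy tool graph.
# Hypothetical toolbox: tool name -> (required input types, output type).
TOOLS = {
    "image_captioning":   ({"image"}, "text"),
    "text_to_image":      ({"text"}, "image"),
    "image_segmentation": ({"image"}, "mask"),
    "text_to_speech":     ({"text"}, "audio"),
}

def search_paths(available, target, path=(), depth=0, max_depth=4):
    """Enumerate tool chains turning `available` resource types into `target`."""
    if target in available:
        yield path
        return
    if depth == max_depth:  # bound the search
        return
    for name, (inputs, output) in TOOLS.items():
        # A tool is applicable once all of its input types are available.
        if name not in path and inputs <= available:
            yield from search_paths(available | {output}, target,
                                    path + (name,), depth + 1, max_depth)

# Example: the user supplies an image and asks for audio. Multiple valid
# chains are found, e.g. image_captioning -> text_to_speech.
for solution in search_paths({"image"}, "audio"):
    print(" -> ".join(solution))
```
Because the search enumerates every valid chain rather than stopping at the first, it naturally produces the multi-solution behavior listed under the features below.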
## 🤖 Video Demo
<!-- <table>
<tr>
<td><img width="450" src="https://github.com/liu-zhy/graph-of-thought/assets/26198430/7fe7d1ec-e37e-4ea8-8201-dc639c82ba66" alt="Image 1"></td>
<td><img width="450" src="https://github.com/liu-zhy/graph-of-thought/assets/26198430/a8bc6644-368b-42e3-844a-9962fdc9bd01" alt="Image 2"></td>
</tr>
</table>
-->
https://github.com/OpenGVLab/ControlLLM/assets/13723743/cf72861e-0e7b-4c15-89ee-7fa1d838d00f
## 🏠 System Overview

## 🎁 Major Features
- Image Perception
- Image Editing
- Image Generation
- Video Perception
- Video Editing
- Video Generation
- Audio Perception
- Audio Generation
- Multi-Solution
- Pointing Inputs
- Resource Type Awareness
## 🗓️ Schedule
- [ ] Launch online demo
## 🛠️ Installation
### Basic requirements
* Linux
* Python 3.10+
* PyTorch 2.0+
* CUDA 11.8+
### Clone project
Execute the following command in the root directory:
```bash
git clone https://github.com/OpenGVLab/ControlLLM.git
```
### Install dependencies
Set up the environment:
```bash
conda create -n cllm python=3.10
conda activate cllm
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
```
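Optionally, sanity-check that PyTorch sees the GPU and matches the CUDA requirement (a generic check, not a project-specific command):
```bash
# Expected output given the pinned install above: 2.0.1, True, 11.8.
python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.version.cuda)"
```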
Install [LLaVA](https://github.com/haotian-liu/LLaVA?tab=readme-ov-file):
```bash
pip install git+https://github.com/haotian-liu/LLaVA.git
```
Then install other dependencies:
```bash
cd controlllm
pip install -r requirements.txt
```
## 👨‍🏫 Get Started
### Launch tool services
Please put your personal OpenAI key and [Weather API key](https://www.visualcrossing.com/weather-api) into the corresponding environment variables.
```bash
cd ./controlllm
# openai key
export OPENAI_API_KEY="..."
# openai base
export OPENAI_BASE_URL="..."
# weather api key
export WEATHER_API_KEY="..."
python -m cllm.services.launch --port 10011 --host 0.0.0.0
```
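Before moving on, you can confirm the services are accepting connections. This is a plain TCP probe for illustration, not part of the ControlLLM API:
```python
import socket

# Generic TCP probe: succeeds only if something is listening on the port.
with socket.create_connection(("127.0.0.1", 10011), timeout=5):
    print("tool services are listening on port 10011")
```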
### Launch ToG service
```bash
cd ./controlllm
export TOG_SERVICES_PORT=10011
export OPENAI_BASE_URL="..."
export OPENAI_API_KEY="..."
python -m cllm.services.tog.launch --port 10012 --host 0.0.0.0
```
### Launch gradio demo
Use `openssl` to generate a self-signed certificate:
```shell
mkdir certificate
openssl req -x509 -newkey rsa:4096 -keyout certificate/key.pem -out certificate/cert.pem -sha256 -days 365 -nodes
```
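Optionally, confirm the certificate's subject and validity window:
```shell
# Print the subject and the notBefore/notAfter dates of the new certificate.
openssl x509 -in certificate/cert.pem -noout -subject -dates
```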
Launch gradio demo:
```bash
cd ./controlllm
export TOG_PORT=10012
export TOG_SERVICES_PORT=10011
export RESOURCE_ROOT="./client_resources"
export GRADIO_TEMP_DIR="$HOME/.tmp"
export OPENAI_BASE_URL="..."
export OPENAI_API_KEY="..."
python -m cllm.app.gradio --controller "cllm.agents.tog.Controller" --server_port 10024
```
### Tools as Services
Taking image generation as an example, we first launch the service:
```bash
python -m cllm.services.image_generation.launch --port 10011 --host 0.0.0.0
```
Then, we call the service via the Python API:
```python
from cllm.services.image_generation.api import *

# Point the client at the running service, then generate an image from text.
setup(port=10011)
text2image('A horse')
```
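The return type of `text2image` depends on the service wrapper, so treat the following as a rough sketch only: it assumes the call returns either raw image bytes or a `PIL.Image`-like object (an assumption to verify against `cllm/services/image_generation/api.py`).
```python
import io
from PIL import Image

result = text2image('A horse')

# Assumption: the wrapper returns raw bytes or a PIL.Image-like object.
if isinstance(result, (bytes, bytearray)):
    Image.open(io.BytesIO(result)).save('horse.png')
else:
    result.save('horse.png')
```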
🎬 Alternatively, launch all services in one endpoint:
```bash
python -m cllm.services.launch --port 10011 --host 0.0.0.0
```
## 🛠️ Supported Tools
See [Tools](TOOL.md)
## 🎫 License
This project is released under the [Apache 2.0 license](LICENSE).
## 🖊️ Citation
If you find this project useful in your research, please cite our paper:
```BibTeX
@article{2023controlllm,
  title={ControlLLM: Augmenting Large Language Models with Tools by Searching on Graphs},
  author={Liu, Zhaoyang and Lai, Zeqiang and Gao, Zhangwei and Cui, Erfei and Li, Zhiheng and Zhu, Xizhou and Lu, Lewei and Chen, Qifeng and Qiao, Yu and Dai, Jifeng and Wang, Wenhai},
  journal={arXiv preprint arXiv:2310.17796},
  year={2023}
}
```
## 🤝 Acknowledgement
Thanks to the following open-source projects:
[Hugging Face](https://github.com/huggingface)  
[LangChain](https://github.com/hwchase17/langchain)  
[SAM](https://github.com/facebookresearch/segment-anything)  
[Stable Diffusion](https://github.com/CompVis/stable-diffusion)  
[ControlNet](https://github.com/lllyasviel/ControlNet)  
[InstructPix2Pix](https://github.com/timothybrooks/instruct-pix2pix)  
[EasyOCR](https://github.com/JaidedAI/EasyOCR) 
[ImageBind](https://github.com/facebookresearch/ImageBind)  
[PixArt-alpha](https://github.com/PixArt-alpha/PixArt-alpha)  
[LLaVA](https://github.com/haotian-liu/LLaVA?tab=readme-ov-file)  
[Modelscope](https://modelscope.cn/my/overview)  
[AudioCraft](https://github.com/facebookresearch/audiocraft)  
[Whisper](https://github.com/openai/whisper)  
[Llama 2](https://github.com/facebookresearch/llama)  
[LLaMA](https://github.com/facebookresearch/llama/tree/llama_v1) 
---
If you want to join our WeChat group, please scan the following QR code to add our assistant as a WeChat friend:
<p align="center"><img width="300" alt="image" src="https://github.com/OpenGVLab/DragGAN/assets/26198430/e3f0807f-956a-474e-8fd2-1f7c22d73997"></p>