yi-01-ai committed on
Commit 5f67243
1 Parent(s): 49602fc

Auto Sync from git://github.com/01-ai/Yi.git/commit/a7cc760cb87e7c21a330ddca0d14c14565e72b14

Files changed (1): README.md (+39 -11)
README.md CHANGED
@@ -81,7 +81,9 @@ pipeline_tag: text-generation
 - [🟢 How to use Yi?](#-how-to-use-yi)
   - [Quick start](#quick-start)
     - [Choose your path](#choose-your-parth)
-    - [Tutorial](#tutorial)
+    - [pip](#pip)
+    - [llama.cpp](https://github.com/01-ai/Yi/blob/main/docs/yi_llama.cpp.md)
+    - [Web demo](#web-demo)
   - [Fine tune](#fine-tune)
   - [Quantization](#quantization)
   - [Deployment](https://github.com/01-ai/Yi/blob/main/docs/deployment.md)
@@ -231,7 +233,9 @@ sequence length and can be extended to 32K during inference time.
 
   - [Quick start](#quick-start)
     - [Choose your path](#choose-your-parth)
-    - [Tutorial](#tutorial)
+    - [pip](#pip)
+    - [llama.cpp](https://github.com/01-ai/Yi/blob/main/docs/yi_llama.cpp.md)
+    - [Web demo](#web-demo)
   - [Fine tune](#fine-tune)
   - [Quantization](#quantization)
   - [Deployment](https://github.com/01-ai/Yi/blob/main/docs/deployment.md)
@@ -252,7 +256,7 @@ Select one of the following paths to begin your journey with Yi!
 If you prefer to deploy Yi models locally,
 
 - 🙋‍♀️ and you have **sufficient** resources (for example, NVIDIA A800 80GB), you can choose one of the following methods:
-  - [pip](#tutorial)
+  - [pip](#pip)
   - [Docker](https://github.com/01-ai/Yi/blob/main/docs/README_legacy.md#11-docker)
   - [conda-lock](https://github.com/01-ai/Yi/blob/main/docs/README_legacy.md#12-local-development-environment)
 
@@ -290,18 +294,18 @@ If you want to chat with Yi with more customizable options (e.g., system prompt,
 - [Yi-34B-Chat](https://platform.lingyiwanwu.com/) (Yi official beta)
   - Access is available through a whitelist. Welcome to apply (fill out a form in [English](https://cn.mikecrm.com/l91ODJf) or [Chinese](https://cn.mikecrm.com/gnEZjiQ)).
 
-## Tutorial
+### pip
 
 This tutorial guides you through every step of running Yi (Yi-34B-Chat) locally and then performing inference.
 
-### Step 0: Prerequisites
+#### Step 0: Prerequisites
 
 - This tutorial assumes you are running the **Yi-34B-Chat** with an **A800 (80G)** GPU.
   - For detailed deployment requirements to run Yi models, see [hardware requirements](https://github.com/01-ai/Yi/blob/main/docs/deployment.md).
 
 - Make sure Python 3.10 or a later version is installed.
 
-### Step 1: Prepare environment
+#### Step 1: Prepare your environment
 
 To set up the environment and install the required packages, execute the following command.
 
@@ -311,7 +315,7 @@ cd yi
 pip install -r requirements.txt
 ```
 
-### Step 2: Download Yi model
+#### Step 2: Download the Yi model
 
 You can download the weights and tokenizer of Yi models from the following sources:
 
@@ -319,11 +323,11 @@ You can download the weights and tokenizer of Yi models from the following sourc
 - [ModelScope](https://www.modelscope.cn/organization/01ai/)
 - [WiseModel](https://wisemodel.cn/organization/01.AI)
 
-### Step 3: Perform inference
+#### Step 3: Perform inference
 
 You can perform inference with Yi chat or base models as below.
 
-#### Perform inference with Yi chat model
+##### Perform inference with Yi chat model
 
 1. Create a file named `quick_start.py` and copy the following content to it.
 
@@ -366,7 +370,7 @@ You can perform inference with Yi chat or base models as below.
 Hello! How can I assist you today?
 ```
 
-#### Perform inference with Yi base model
+##### Perform inference with Yi base model
 
 The steps are similar to [Run Yi chat model](#run-yi-chat-model).
 
@@ -390,6 +394,30 @@ Then you can see an output similar to the one below. 🥳
 
 </details>
 
+### Run Yi with llama.cpp
+
+If you have limited resources, you can try [llama.cpp](https://github.com/ggerganov/llama.cpp) or [ollama](https://ollama.ai/) (especially for Chinese users) to run Yi models locally in a few minutes.
+
+For a step-by-step tutorial, see [Run Yi with llama.cpp](https://github.com/01-ai/Yi/blob/main/docs/yi_llama.cpp.md).
+
+### Web demo
+
+You can build a web UI demo for Yi **chat** models (note that Yi base models are not supported in this scenario).
+
+[Step 1: Prepare your environment](#step-1-prepare-your-environment).
+
+[Step 2: Download the Yi model](#step-2-download-the-yi-model).
+
+Step 3. To start a web service locally, run the following command.
+
+```bash
+python demo/web_demo.py --checkpoint-path <your-model-path>
+```
+
+You can access the web UI by entering the address provided in the console into your browser.
+
+![Quick start - web demo](./assets/img/yi_34b_chat_web_demo.gif)
+
 ### Finetuning
 
 ```bash
@@ -685,4 +713,4 @@ are fully open for academic research and free commercial usage with permission
 via applications. All usage must adhere to the [Yi Series Models Community License Agreement 2.1](https://github.com/01-ai/Yi/blob/main/MODEL_LICENSE_AGREEMENT.txt).
 For free commercial use, you only need to send an email to [get official commercial permission](https://www.lingyiwanwu.com/yi-license).
 
-<div align="right"> [ <a href="#building-the-next-generation-of-open-source-and-bilingual-llms">Back to top ⬆️ </a> ] </div>
+<div align="right"> [ <a href="#building-the-next-generation-of-open-source-and-bilingual-llms">Back to top ⬆️ </a> ] </div>
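The diff above references a `quick_start.py` file whose body lies outside the changed hunks, so it is not shown here. As a rough illustration only (not the repository's actual file), a chat-inference script along the lines the tutorial describes might look like the sketch below. The model path, `device_map`, dtype, and generation length are assumptions, and `tokenizer.apply_chat_template` requires a reasonably recent `transformers` release.

```python
# Hypothetical sketch of a quick_start.py-style chat script; the real file
# lives in the Yi repository and may differ in its details.


def build_messages(prompt: str) -> list:
    """Wrap a user prompt in the chat-message format consumed by
    tokenizer.apply_chat_template()."""
    return [{"role": "user", "content": prompt}]


def chat(model_path: str, prompt: str, max_new_tokens: int = 256) -> str:
    """Load a Yi chat checkpoint from model_path and generate one reply."""
    # Imported lazily so build_messages() stays usable without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
    model = AutoModelForCausalLM.from_pretrained(
        model_path, device_map="auto", torch_dtype="auto"
    )
    # Tokenize the conversation and append the assistant-turn prompt.
    input_ids = tokenizer.apply_chat_template(
        build_messages(prompt),
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens and decode only the newly generated reply.
    reply_ids = output_ids[0][input_ids.shape[-1]:]
    return tokenizer.decode(reply_ids, skip_special_tokens=True)
```

Hardware permitting, `print(chat("<your-model-path>", "hi"))` would then mirror the tutorial's expected greeting.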