- .github/workflows/build-with-latex.yml +44 -0
- Dockerfile +7 -3
- README.md +64 -60
- app.py +3 -3
- colorful.py +25 -55
- config.py +8 -0
- crazy_functional.py +179 -71
- crazy_functions/Langchain知识库.py +107 -0
- crazy_functions/Latex全文润色.py +3 -0
- crazy_functions/Latex输出PDF结果.py +300 -0
- crazy_functions/crazy_functions_test.py +102 -6
- crazy_functions/crazy_utils.py +140 -0
- crazy_functions/latex_utils.py +773 -0
- crazy_functions/对话历史存档.py +3 -4
- crazy_functions/数学动画生成manim.py +1 -1
- crazy_functions/理解PDF文档内容.py +3 -1
- crazy_functions/联网的ChatGPT_bing版.py +102 -0
- crazy_functions/虚空终端.py +131 -0
- docker-compose.yml +27 -0
- docs/Dockerfile+NoLocal+Latex +27 -0
- docs/GithubAction+NoLocal+Latex +25 -0
- docs/README.md.Italian.md +13 -7
- docs/README.md.Korean.md +4 -2
- docs/README.md.Portuguese.md +8 -4
- docs/translate_english.json +2 -0
- docs/use_azure.md +152 -0
- request_llm/bridge_all.py +40 -0
- request_llm/bridge_azure_test.py +241 -0
- toolbox.py +75 -16
- version +2 -2
.github/workflows/build-with-latex.yml
ADDED
@@ -0,0 +1,44 @@
+# https://docs.github.com/en/actions/publishing-packages/publishing-docker-images#publishing-images-to-github-packages
+name: Create and publish a Docker image for Latex support
+
+on:
+  push:
+    branches:
+      - 'master'
+
+env:
+  REGISTRY: ghcr.io
+  IMAGE_NAME: ${{ github.repository }}_with_latex
+
+jobs:
+  build-and-push-image:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      packages: write
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v3
+
+      - name: Log in to the Container registry
+        uses: docker/login-action@v2
+        with:
+          registry: ${{ env.REGISTRY }}
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Extract metadata (tags, labels) for Docker
+        id: meta
+        uses: docker/metadata-action@v4
+        with:
+          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
+
+      - name: Build and push Docker image
+        uses: docker/build-push-action@v4
+        with:
+          context: .
+          push: true
+          file: docs/GithubAction+NoLocal+Latex
+          tags: ${{ steps.meta.outputs.tags }}
+          labels: ${{ steps.meta.outputs.labels }}
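For reference, the published image name is assembled from the workflow's `env` block; a quick sketch (the repository name is taken from this repo's own GitHub URLs, and the actual tags/labels are chosen by docker/metadata-action, not shown here):

```python
# Sketch: how REGISTRY and IMAGE_NAME compose into the pushed image reference.
REGISTRY = "ghcr.io"
github_repository = "binary-husky/gpt_academic"  # value of ${{ github.repository }}
IMAGE_NAME = f"{github_repository}_with_latex"   # matches the workflow's env block
image_ref = f"{REGISTRY}/{IMAGE_NAME}"
print(image_ref)  # ghcr.io/binary-husky/gpt_academic_with_latex
```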
Dockerfile
CHANGED
@@ -10,12 +10,16 @@ RUN echo '[global]' > /etc/pip.conf && \
 
 WORKDIR /gpt
 
 # 安装依赖
+COPY requirements.txt ./
+COPY ./docs/gradio-3.32.2-py3-none-any.whl ./docs/gradio-3.32.2-py3-none-any.whl
+RUN pip3 install -r requirements.txt
+# 装载项目文件
+COPY . .
 RUN pip3 install -r requirements.txt
 
 # 可选步骤,用于预热模块
 RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'
README.md
CHANGED
@@ -12,9 +12,9 @@ pinned: false
 # ChatGPT 学术优化
 > **Note**
 >
+> 2023.5.27 对Gradio依赖进行了调整,Fork并解决了官方Gradio的若干Bugs。请及时**更新代码**并重新更新pip依赖。安装依赖时,请严格选择`requirements.txt`中**指定的版本**:
 >
+> `pip install -r requirements.txt`
 >
 
 # <img src="docs/logo.png" width="40" > GPT 学术优化 (GPT Academic)
@@ -28,7 +28,7 @@ To translate this project to arbitary language with GPT, read and run [`multi_la
 >
 > 1.请注意只有**红颜色**标识的函数插件(按钮)才支持读取文件,部分插件位于插件区的**下拉菜单**中。另外我们以**最高优先级**欢迎和处理任何新插件的PR!
 >
+> 2.本项目中每个文件的功能都在自译解[`self_analysis.md`](https://github.com/binary-husky/gpt_academic/wiki/chatgpt-academic%E9%A1%B9%E7%9B%AE%E8%87%AA%E8%AF%91%E8%A7%A3%E6%8A%A5%E5%91%8A)详细说明。随着版本的迭代,您也可以随时自行点击相关函数插件,调用GPT重新生成项目的自我解析报告。常见问题汇总在[`wiki`](https://github.com/binary-husky/gpt_academic/wiki/%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98)当中。[安装方法](#installation)。
 >
 > 3.本项目兼容并鼓励尝试国产大语言模型chatglm和RWKV, 盘古等等。支持多个api-key共存,可在配置文件中填写如`API_KEY="openai-key1,openai-key2,api2d-key3"`。需要临时更换`API_KEY`时,在输入区输入临时的`API_KEY`然后回车键提交后即可生效。
 
@@ -43,22 +43,23 @@ To translate this project to arbitary language with GPT, read and run [`multi_la
 一键中英互译 | 一键中英互译
 一键代码解释 | 显示代码、解释代码、生成代码、给代码加注释
 [自定义快捷键](https://www.bilibili.com/video/BV14s4y1E7jN) | 支持自定义快捷键
+模块化设计 | 支持自定义强大的[函数插件](https://github.com/binary-husky/gpt_academic/tree/master/crazy_functions),插件支持[热更新](https://github.com/binary-husky/gpt_academic/wiki/%E5%87%BD%E6%95%B0%E6%8F%92%E4%BB%B6%E6%8C%87%E5%8D%97)
+[自我程序剖析](https://www.bilibili.com/video/BV1cj411A7VW) | [函数插件] [一键读懂](https://github.com/binary-husky/gpt_academic/wiki/chatgpt-academic%E9%A1%B9%E7%9B%AE%E8%87%AA%E8%AF%91%E8%A7%A3%E6%8A%A5%E5%91%8A)本项目的源代码
 [程序剖析](https://www.bilibili.com/video/BV1cj411A7VW) | [函数插件] 一键可以剖析其他Python/C/C++/Java/Lua/...项目树
 读论文、[翻译](https://www.bilibili.com/video/BV1KT411x7Wn)论文 | [函数插件] 一键解读latex/pdf论文全文并生成摘要
 Latex全文[翻译](https://www.bilibili.com/video/BV1nk4y1Y7Js/)、[润色](https://www.bilibili.com/video/BV1FT411H7c5/) | [函数插件] 一键翻译或润色latex论文
 批量注释生成 | [函数插件] 一键批量生成函数注释
+Markdown[中英互译](https://www.bilibili.com/video/BV1yo4y157jV/) | [函数插件] 看到上面5种语言的[README](https://github.com/binary-husky/gpt_academic/blob/master/docs/README_EN.md)了吗?
 chat分析报告生成 | [函数插件] 运行后自动生成总结汇报
 [PDF论文全文翻译功能](https://www.bilibili.com/video/BV1KT411x7Wn) | [函数插件] PDF论文提取题目&摘要+翻译全文(多线程)
 [Arxiv小助手](https://www.bilibili.com/video/BV1LM4y1279X) | [函数插件] 输入arxiv文章url即可一键翻译摘要+下载PDF
 [谷歌学术统合小助手](https://www.bilibili.com/video/BV19L411U7ia) | [函数插件] 给定任意谷歌学术搜索页面URL,让gpt帮你[写relatedworks](https://www.bilibili.com/video/BV1GP411U7Az/)
 互联网信息聚合+GPT | [函数插件] 一键[让GPT先从互联网获取信息](https://www.bilibili.com/video/BV1om4y127ck),再回答问题,让信息永不过时
+⭐Arxiv论文精细翻译 | [函数插件] 一键[以超高质量翻译arxiv论文](https://www.bilibili.com/video/BV1dz4y1v77A/),迄今为止最好的论文翻译工具⭐
 公式/图片/表格显示 | 可以同时显示公式的[tex形式和渲染形式](https://user-images.githubusercontent.com/96192199/230598842-1d7fcddd-815d-40ee-af60-baf488a199df.png),支持公式、代码高亮
 多线程函数插件支持 | 支持多线调用chatgpt,一键处理[海量文本](https://www.bilibili.com/video/BV1FT411H7c5/)或程序
+启动暗色gradio[主题](https://github.com/binary-husky/gpt_academic/issues/173) | 在浏览器url后面添加```/?__theme=dark```可以切换dark主题
+[多LLM模型](https://www.bilibili.com/video/BV1wT411p7yf)支持 | 同时被GPT3.5、GPT4、[清华ChatGLM](https://github.com/THUDM/ChatGLM-6B)、[复旦MOSS](https://github.com/OpenLMLab/MOSS)同时伺候的感觉一定会很不错吧?
 更多LLM模型接入,支持[huggingface部署](https://huggingface.co/spaces/qingxu98/gpt-academic) | 加入Newbing接口(新必应),引入清华[Jittorllms](https://github.com/Jittor/JittorLLMs)支持[LLaMA](https://github.com/facebookresearch/llama),[RWKV](https://github.com/BlinkDL/ChatRWKV)和[盘古α](https://openi.org.cn/pangu/)
 更多新功能展示(图像生成等) …… | 见本文档结尾处 ……
 
@@ -102,13 +103,13 @@ chat分析报告生成 | [函数插件] 运行后自动生成总结汇报
 
 1. 下载项目
 ```sh
+git clone https://github.com/binary-husky/gpt_academic.git
+cd gpt_academic
 ```
 
 2. 配置API_KEY
 
+在`config.py`中,配置API KEY等设置,[点击查看特殊网络环境设置方法](https://github.com/binary-husky/gpt_academic/issues/1) 。
 
 (P.S. 程序运行时会优先检查是否存在名为`config_private.py`的私密配置文件,并用其中的配置覆盖`config.py`的同名配置。因此,如果您能理解我们的配置读取逻辑,我们强烈建议您在`config.py`旁边创建一个名为`config_private.py`的新配置文件,并把`config.py`中的配置转移(复制)到`config_private.py`中。`config_private.py`不受git管控,可以让您的隐私信息更加安全。P.S.项目同样支持通过`环境变量`配置大多数选项,环境变量的书写格式参考`docker-compose`文件。读取优先级: `环境变量` > `config_private.py` > `config.py`)
 
@@ -124,6 +125,7 @@ conda activate gptac_venv # 激活anaconda环境
 python -m pip install -r requirements.txt # 这个步骤和pip安装一样的步骤
 ```
 
+
 <details><summary>如果需要支持清华ChatGLM/复旦MOSS作为后端,请点击展开此处</summary>
 <p>
 
@@ -150,19 +152,13 @@ AVAIL_LLM_MODELS = ["gpt-3.5-turbo", "api2d-gpt-3.5-turbo", "gpt-4", "api2d-gpt-
 python main.py
 ```
 
-5. 测试函数插件
-```
-- 测试函数插件模板函数(要求gpt回答历史上的今天发生了什么),您可以根据此函数为模板,实现更复杂的功能
-点击 "[函数插件模板Demo] 历史上的今天"
-```
-
 ## 安装-方法2:使用Docker
 
+1. 仅ChatGPT(推荐大多数人选择,等价于docker-compose方案1)
 
 ``` sh
+git clone https://github.com/binary-husky/gpt_academic.git # 下载项目
+cd gpt_academic # 进入路径
 nano config.py # 用任意文本编辑器编辑config.py, 配置 “Proxy”, “API_KEY” 以及 “WEB_PORT” (例如50923) 等
 docker build -t gpt-academic . # 安装
 
@@ -171,37 +167,45 @@ docker run --rm -it --net=host gpt-academic
 #(最后一步-选择2)在macOS/windows环境下,只能用-p选项将容器上的端口(例如50923)暴露给主机上的端口
 docker run --rm -it -e WEB_PORT=50923 -p 50923:50923 gpt-academic
 ```
+P.S. 如果需要依赖Latex的插件功能,请见Wiki。另外,您也可以直接使用docker-compose获取Latex功能(修改docker-compose.yml,保留方案4并删除其他方案)。
 
 2. ChatGPT + ChatGLM + MOSS(需要熟悉Docker)
 
 ``` sh
+# 修改docker-compose.yml,保留方案2并删除其他方案。修改docker-compose.yml中方案2的配置,参考其中注释即可
 docker-compose up
 ```
 
 3. ChatGPT + LLAMA + 盘古 + RWKV(需要熟悉Docker)
 ``` sh
+# 修改docker-compose.yml,保留方案3并删除其他方案。修改docker-compose.yml中方案3的配置,参考其中注释即可
 docker-compose up
 ```
 
 
 ## 安装-方法3:其他部署姿势
+1. 一键运行脚本。
+完全不熟悉python环境的Windows用户可以下载[Release](https://github.com/binary-husky/gpt_academic/releases)中发布的一键运行脚本安装无本地模型的版本。
+脚本的贡献来源是[oobabooga](https://github.com/oobabooga/one-click-installers)。
+
+2. 使用docker-compose运行。
+请阅读docker-compose.yml后,按照其中的提示操作即可
 
+3. 如何使用反代URL
 按照`config.py`中的说明配置API_URL_REDIRECT即可。
 
+4. 微软云AzureAPI
+按照`config.py`中的说明配置即可(AZURE_ENDPOINT等四个配置)
 
+5. 远程云服务器部署(需要云服务器知识与经验)。
+请访问[部署wiki-1](https://github.com/binary-husky/gpt_academic/wiki/%E4%BA%91%E6%9C%8D%E5%8A%A1%E5%99%A8%E8%BF%9C%E7%A8%8B%E9%83%A8%E7%BD%B2%E6%8C%87%E5%8D%97)
 
+6. 使用WSL2(Windows Subsystem for Linux 子系统)。
+请访问[部署wiki-2](https://github.com/binary-husky/gpt_academic/wiki/%E4%BD%BF%E7%94%A8WSL2%EF%BC%88Windows-Subsystem-for-Linux-%E5%AD%90%E7%B3%BB%E7%BB%9F%EF%BC%89%E9%83%A8%E7%BD%B2)
+
+7. 如何在二级网址(如`http://localhost/subpath`)下运行。
 请访问[FastAPI运行说明](docs/WithFastapi.md)
 
 ---
 # Advanced Usage
 ## 自定义新的便捷按钮 / 自定义函数插件
@@ -226,7 +230,7 @@ docker-compose up
 
 编写强大的函数插件来执行任何你想得到的和想不到的任务。
 本项目的插件编写、调试难度很低,只要您具备一定的python基础知识,就可以仿照我们提供的模板实现自己的插件功能。
+详情请参考[函数插件指南](https://github.com/binary-husky/gpt_academic/wiki/%E5%87%BD%E6%95%B0%E6%8F%92%E4%BB%B6%E6%8C%87%E5%8D%97)。
 
 ---
 # Latest Update
@@ -234,38 +238,33 @@ docker-compose up
 
 1. 对话保存功能。在函数插件区调用 `保存当前的对话` 即可将当前对话保存为可读+可复原的html文件,
 另外在函数插件区(下拉菜单)调用 `载入对话历史存档` ,即可还原之前的会话。
+Tip:不指定文件直接点击 `载入对话历史存档` 可以查看历史html存档缓存。
 <div align="center">
 <img src="https://user-images.githubusercontent.com/96192199/235222390-24a9acc0-680f-49f5-bc81-2f3161f1e049.png" width="500" >
 </div>
 
+2. ⭐Latex/Arxiv论文翻译功能⭐
 <div align="center">
+<img src="https://github.com/binary-husky/gpt_academic/assets/96192199/002a1a75-ace0-4e6a-94e2-ec1406a746f1" height="250" > ===>
+<img src="https://github.com/binary-husky/gpt_academic/assets/96192199/9fdcc391-f823-464f-9322-f8719677043b" height="250" >
 </div>
 
+3. 生成报告。大部分插件都会在执行结束后,生成工作报告
 <div align="center">
-<img src="https://user-images.githubusercontent.com/96192199/227504005-efeaefe0-b687-49d0-bf95-2d7b7e66c348.png" height="300" >
+<img src="https://user-images.githubusercontent.com/96192199/227503770-fe29ce2c-53fd-47b0-b0ff-93805f0c2ff4.png" height="250" >
+<img src="https://user-images.githubusercontent.com/96192199/227504617-7a497bb3-0a2a-4b50-9a8a-95ae60ea7afd.png" height="250" >
 </div>
 
+4. 模块化功能设计,简单的接口却能支持强大的功能
 <div align="center">
+<img src="https://user-images.githubusercontent.com/96192199/229288270-093643c1-0018-487a-81e6-1d7809b6e90f.png" height="400" >
+<img src="https://user-images.githubusercontent.com/96192199/227504931-19955f78-45cd-4d1c-adac-e71e50957915.png" height="400" >
 </div>
 
-5. 译解其他开源项目,不在话下
+5. 译解其他开源项目
 <div align="center">
-<img src="https://user-images.githubusercontent.com/96192199/226935232-6b6a73ce-8900-4aee-93f9-733c7e6fef53.png" width="500" >
+<img src="https://user-images.githubusercontent.com/96192199/226935232-6b6a73ce-8900-4aee-93f9-733c7e6fef53.png" height="250" >
+<img src="https://user-images.githubusercontent.com/96192199/226969067-968a27c1-1b9c-486b-8b81-ab2de8d3f88a.png" height="250" >
 </div>
 
 6. 装饰[live2d](https://github.com/fghrsh/live2d_demo)的小功能(默认关闭,需要修改`config.py`)
@@ -290,13 +289,15 @@ Tip:不指定文件直接点击 `载入对话历史存档` 可以查看历史h
 
 10. Latex全文校对纠错
 <div align="center">
+<img src="https://github.com/binary-husky/gpt_academic/assets/96192199/651ccd98-02c9-4464-91e1-77a6b7d1b033" height="200" > ===>
+<img src="https://github.com/binary-husky/gpt_academic/assets/96192199/476f66d9-7716-4537-b5c1-735372c25adb" height="200">
 </div>
 
 
+
 ## 版本:
 - version 3.5(Todo): 使用自然语言调用本项目的所有函数插件(高优先级)
+- version 3.4: +arxiv论文翻译、latex论文批改功能
 - version 3.3: +互联网信息综合功能
 - version 3.2: 函数插件支持更多参数接口 (保存对话功能, 解读任意语言代码+同时询问任意的LLM组合)
 - version 3.1: 支持同时问询多个gpt模型!支持api2d,支持多个apikey负载均衡
@@ -314,29 +315,32 @@ gpt_academic开发者QQ群-2:610599535
 
 - 已知问题
 - 某些浏览器翻译插件干扰此软件前端的运行
+- 官方Gradio目前有很多兼容性Bug,请务必使用`requirement.txt`安装Gradio
 
 ## 参考与学习
 
 ```
+代码中参考了很多其他优秀项目中的设计,顺序不分先后:
 
+# 清华ChatGLM-6B:
 https://github.com/THUDM/ChatGLM-6B
 
+# 清华JittorLLMs:
 https://github.com/Jittor/JittorLLMs
 
+# ChatPaper:
+https://github.com/kaixindelele/ChatPaper
+
+# Edge-GPT:
 https://github.com/acheong08/EdgeGPT
 
+# ChuanhuChatGPT:
 https://github.com/GaiZhenbiao/ChuanhuChatGPT
 
+# Oobabooga one-click installer:
+https://github.com/oobabooga/one-click-installers
 
+# More:
 https://github.com/gradio-app/gradio
 https://github.com/fghrsh/live2d_demo
 ```
app.py
CHANGED
@@ -2,7 +2,7 @@ import os; os.environ['no_proxy'] = '*' # 避免代理网络产生意外污染
 
 def main():
     import subprocess, sys
-    subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-
+    subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'gradio-stable-fork'])
     import gradio as gr
     if gr.__version__ not in ['3.28.3','3.32.3']: assert False, "请用 pip install -r requirements.txt 安装依赖"
     from request_llm.bridge_all import predict
@@ -158,7 +158,7 @@ def main():
     for k in crazy_fns:
         if not crazy_fns[k].get("AsButton", True): continue
         click_handle = crazy_fns[k]["Button"].click(ArgsGeneralWrapper(crazy_fns[k]["Function"]), [*input_combo, gr.State(PORT)], output_combo)
-        click_handle.then(on_report_generated, [file_upload, chatbot], [file_upload, chatbot])
+        click_handle.then(on_report_generated, [cookies, file_upload, chatbot], [cookies, file_upload, chatbot])
        cancel_handles.append(click_handle)
     # 函数插件-下拉菜单与随变按钮的互动
     def on_dropdown_changed(k):
@@ -178,7 +178,7 @@ def main():
         if k in [r"打开插件列表", r"请先从插件列表中选择"]: return
         yield from ArgsGeneralWrapper(crazy_fns[k]["Function"])(*args, **kwargs)
     click_handle = switchy_bt.click(route,[switchy_bt, *input_combo, gr.State(PORT)], output_combo)
-    click_handle.then(on_report_generated, [file_upload, chatbot], [file_upload, chatbot])
+    click_handle.then(on_report_generated, [cookies, file_upload, chatbot], [cookies, file_upload, chatbot])
     cancel_handles.append(click_handle)
     # 终止按钮的回调函数注册
     stopBtn.click(fn=None, inputs=None, outputs=None, cancels=cancel_handles)
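The app.py change above threads the `cookies` state through the post-click callback, so per-session data survives the `.click(...).then(...)` chain. A gradio-free stand-in of the new signature (the body here is purely illustrative; the real `on_report_generated` lives elsewhere in the project):

```python
# Stand-in for the new callback shape: `cookies` is now both an input and an
# output, alongside `file_upload` and `chatbot` (list of [question, answer] pairs).
def on_report_generated(cookies, file_upload, chatbot):
    # Hypothetical body: promote any generated report files into the upload area.
    report_files = cookies.get("files_to_promote", [])
    if report_files:
        chatbot = chatbot + [["报告如何远程获取?", "报告已经添加到右侧文件上传区。"]]
    return cookies, report_files, chatbot

cookies = {"files_to_promote": ["gpt_log/report.md"]}
new_cookies, files, chat = on_report_generated(cookies, [], [])
```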
colorful.py
CHANGED
@@ -34,58 +34,28 @@ def print亮紫(*kw,**kargs):
 def print亮靛(*kw,**kargs):
     print("\033[1;36m",*kw,"\033[0m",**kargs)
 
-print_bold_blue = print亮蓝
-print_bold_purple = print亮紫
-print_bold_indigo = print亮靛
-
-if not stdout.isatty():
-    # redirection, avoid a fucked up log file
-    print红 = print
-    print绿 = print
-    print黄 = print
-    print蓝 = print
-    print紫 = print
-    print靛 = print
-    print亮红 = print
-    print亮绿 = print
-    print亮黄 = print
-    print亮蓝 = print
-    print亮紫 = print
-    print亮靛 = print
-    print_red = print
-    print_green = print
-    print_yellow = print
-    print_blue = print
-    print_purple = print
-    print_indigo = print
-    print_bold_red = print
-    print_bold_green = print
-    print_bold_yellow = print
-    print_bold_blue = print
-    print_bold_purple = print
-    print_bold_indigo = print
+# Do you like the elegance of Chinese characters?
+def sprint红(*kw):
+    return "\033[0;31m"+' '.join(kw)+"\033[0m"
+def sprint绿(*kw):
+    return "\033[0;32m"+' '.join(kw)+"\033[0m"
+def sprint黄(*kw):
+    return "\033[0;33m"+' '.join(kw)+"\033[0m"
+def sprint蓝(*kw):
+    return "\033[0;34m"+' '.join(kw)+"\033[0m"
+def sprint紫(*kw):
+    return "\033[0;35m"+' '.join(kw)+"\033[0m"
+def sprint靛(*kw):
+    return "\033[0;36m"+' '.join(kw)+"\033[0m"
+def sprint亮红(*kw):
+    return "\033[1;31m"+' '.join(kw)+"\033[0m"
+def sprint亮绿(*kw):
+    return "\033[1;32m"+' '.join(kw)+"\033[0m"
+def sprint亮黄(*kw):
+    return "\033[1;33m"+' '.join(kw)+"\033[0m"
+def sprint亮蓝(*kw):
+    return "\033[1;34m"+' '.join(kw)+"\033[0m"
+def sprint亮紫(*kw):
+    return "\033[1;35m"+' '.join(kw)+"\033[0m"
+def sprint亮靛(*kw):
+    return "\033[1;36m"+' '.join(kw)+"\033[0m"
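The new `sprint*` helpers differ from the existing `print*` helpers only in returning the ANSI-wrapped string instead of printing it, which lets callers embed colored fragments in larger messages. An ASCII-named equivalent of `sprint红`:

```python
# Equivalent of sprint红: wrap the joined arguments in red ANSI escape codes
# ("0;31" = regular red; the bright variants use "1;" instead of "0;").
def sprint_red(*kw):
    return "\033[0;31m" + ' '.join(kw) + "\033[0m"

colored = sprint_red("hello", "world")
```

Like the originals, this assumes string arguments (`' '.join` raises on non-strings, whereas `print` would not).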
config.py
CHANGED
@@ -1,6 +1,7 @@
 # [step 1]>> 例如: API_KEY = "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" (此key无效)
 API_KEY = "sk-此处填API密钥"    # 可同时填写多个API-KEY,用英文逗号分割,例如API_KEY = "sk-openaikey1,sk-openaikey2,fkxxxx-api2dkey1,fkxxxx-api2dkey2"
 
+
 # [step 2]>> 改为True应用代理,如果直接在海外服务器部署,此处不修改
 USE_PROXY = False
 if USE_PROXY:
@@ -80,3 +81,10 @@ your bing cookies here
 # 如果需要使用Slack Claude,使用教程详情见 request_llm/README.md
 SLACK_CLAUDE_BOT_ID = ''
 SLACK_CLAUDE_USER_TOKEN = ''
+
+
+# 如果需要使用AZURE 详情请见额外文档 docs\use_azure.md
+AZURE_ENDPOINT = "https://你的api名称.openai.azure.com/"
+AZURE_API_KEY = "填入azure openai api的密钥"
+AZURE_API_VERSION = "填入api版本"
+AZURE_ENGINE = "填入ENGINE"
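The four new `AZURE_*` options correspond to the fields an Azure OpenAI client needs. A sketch of how such settings typically map onto the openai 0.x SDK's Azure mode (all values below are examples; the project's actual wiring is documented in docs/use_azure.md):

```python
# Example values standing in for the AZURE_* entries in config.py.
config = {
    "AZURE_ENDPOINT": "https://my-resource.openai.azure.com/",
    "AZURE_API_KEY": "sk-...",
    "AZURE_API_VERSION": "2023-05-15",
    "AZURE_ENGINE": "my-gpt35-deployment",
}

# Typical mapping onto the openai 0.x SDK: Azure selects models by deployment
# "engine" rather than by "model" name.
openai_kwargs = {
    "api_type": "azure",
    "api_base": config["AZURE_ENDPOINT"],
    "api_key": config["AZURE_API_KEY"],
    "api_version": config["AZURE_API_VERSION"],
    "engine": config["AZURE_ENGINE"],
}
```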
crazy_functional.py
CHANGED
@@ -112,11 +112,11 @@ def get_crazy_functions():
         "AsButton": False, # 加入下拉菜单中
         "Function": HotReload(解析项目本身)
     },
+    # "[老旧的Demo] 把本项目源代码切换成全英文": {
+    #     # HotReload 的意思是热更新,修改函数插件代码后,不需要重启程序,代码直接生效
+    #     "AsButton": False, # 加入下拉菜单中
+    #     "Function": HotReload(全项目切换英文)
+    # },
     "[插件demo] 历史上的今天": {
         # HotReload 的意思是热更新,修改函数插件代码后,不需要重启程序,代码直接生效
         "Function": HotReload(高阶功能模板函数)
@@ -126,7 +126,7 @@ def get_crazy_functions():
     ###################### 第二组插件 ###########################
     # [第二组插件]: 经过充分测试
     from crazy_functions.批量总结PDF文档 import 批量总结PDF文档
-    from crazy_functions.批量总结PDF文档pdfminer import 批量总结PDF文档pdfminer
+    # from crazy_functions.批量总结PDF文档pdfminer import 批量总结PDF文档pdfminer
     from crazy_functions.批量翻译PDF文档_多线程 import 批量翻译PDF文档
     from crazy_functions.谷歌检索小助手 import 谷歌检索小助手
     from crazy_functions.理解PDF文档内容 import 理解PDF文档内容标准文件输入
@@ -152,17 +152,16 @@ def get_crazy_functions():
         # HotReload 的意思是热更新,修改函数插件代码后,不需要重启程序,代码直接生效
         "Function": HotReload(批量总结PDF文档)
     },
-    "[测试功能] 批量总结PDF文档pdfminer": {
-    },
     "谷歌学术检索助手(输入谷歌学术搜索页url)": {
         "Color": "stop",
         "AsButton": False, # 加入下拉菜单中
         "Function": HotReload(谷歌检索小助手)
     },
-
     "理解PDF文档内容 (模仿ChatPDF)": {
         # HotReload 的意思是热更新,修改函数插件代码后,不需要重启程序,代码直接生效
         "Color": "stop",
@@ -181,7 +180,7 @@ def get_crazy_functions():
         "AsButton": False, # 加入下拉菜单中
         "Function": HotReload(Latex英文纠错)
     },
-    "
         # HotReload 的意思是热更新,修改函数插件代码后,不需要重启程序,代码直接生效
         "Color": "stop",
         "AsButton": False, # 加入下拉菜单中
@@ -210,65 +209,96 @@ def get_crazy_functions():
     })
 
     ###################### 第三组插件 ###########################
-    # [第三组插件]:
-    from crazy_functions.下载arxiv论文翻译摘要 import 下载arxiv论文并翻译摘要
-    function_plugins.update({
-        "一键下载arxiv论文并翻译摘要(先在input输入编号,如1812.10695)": {
-            "Color": "stop",
-            "AsButton": False, # 加入下拉菜单中
-            "Function": HotReload(下载arxiv论文并翻译摘要)
-        }
-    })
 
-    from crazy_functions.解析项目源代码 import 解析任意code项目
-    function_plugins.update({
-        "解析项目源代码(手动指定和筛选源代码文件类型)": {
-            "Color": "stop",
-            "AsButton": False,
-            "AdvancedArgs": True, # 调用时,唤起高级参数输入区(默认False)
-            "ArgsReminder": "输入时用逗号隔开, *代表通配符, 加了^代表不匹配; 不输入代表全部匹配。例如: \"*.c, ^*.cpp, config.toml, ^*.toml\"", # 高级参数输入区的显示提示
-            "Function": HotReload(解析任意code项目)
-        },
-    })
-    from crazy_functions.询问多个大语言模型 import 同时问询_指定模型
-    function_plugins.update({
-        "询问多个GPT模型(手动指定询问哪些模型)": {
-            "Color": "stop",
-            "AsButton": False,
-            "AdvancedArgs": True, # 调用时,唤起高级参数输入区(默认False)
-            "ArgsReminder": "支持任意数量的llm接口,用&符号分隔。例如chatglm&gpt-3.5-turbo&api2d-gpt-4", # 高级参数输入区的显示提示
-            "Function": HotReload(同时问询_指定模型)
-        },
-    })
-    from crazy_functions.图片生成 import 图片生成
-    function_plugins.update({
-        "图片生成(先切换模型到openai或api2d)": {
-            "Color": "stop",
-            "AsButton": False,
-            "AdvancedArgs": True, # 调用时,唤起高级参数输入区(默认False)
-            "ArgsReminder": "在这里输入分辨率, 如256x256(默认)", # 高级参数输入区的显示提示
-            "Function": HotReload(图片生成)
-        },
-    })
-    from crazy_functions.总结音视频 import 总结音视频
-    function_plugins.update({
-        "批量总结音视频(输入路径或上传压缩包)": {
-            "Color": "stop",
-            "AsButton": False,
-            "AdvancedArgs": True,
-            "ArgsReminder": "调用openai api 使用whisper-1模型, 目前支持的格式:mp4, m4a, wav, mpga, mpeg, mp3。此处可以输入解析提示,例如:解析为简体中文(默认)。",
-            "Function": HotReload(总结音视频)
-        }
-    })
     try:
         from crazy_functions.数学动画生成manim import 动画生成
         function_plugins.update({
@@ -295,5 +325,83 @@ def get_crazy_functions():
     except:
         print('Load function plugin failed')
 
-
     return function_plugins
|
|
152 |
# HotReload 的意思是热更新,修改函数插件代码后,不需要重启程序,代码直接生效
|
153 |
"Function": HotReload(批量总结PDF文档)
|
154 |
},
|
155 |
+
# "[测试功能] 批量总结PDF文档pdfminer": {
|
156 |
+
# "Color": "stop",
|
157 |
+
# "AsButton": False, # 加入下拉菜单中
|
158 |
+
# "Function": HotReload(批量总结PDF文档pdfminer)
|
159 |
+
# },
|
160 |
"谷歌学术检索助手(输入谷歌学术搜索页url)": {
|
161 |
"Color": "stop",
|
162 |
"AsButton": False, # 加入下拉菜单中
|
163 |
"Function": HotReload(谷歌检索小助手)
|
164 |
},
|
|
|
165 |
"理解PDF文档内容 (模仿ChatPDF)": {
|
166 |
# HotReload 的意思是热更新,修改函数插件代码后,不需要重启程序,代码直接生效
|
167 |
"Color": "stop",
|
|
|
180 |
"AsButton": False, # 加入下拉菜单中
|
181 |
"Function": HotReload(Latex英文纠错)
|
182 |
},
|
183 |
+
"中文Latex项目全文润色(输入路径或上传压缩包)": {
|
184 |
# HotReload 的意思是热更新,修改函数插件代码后,不需要重启程序,代码直接生效
|
185 |
"Color": "stop",
|
186 |
"AsButton": False, # 加入下拉菜单中
|
|
|
209 |
})
|
210 |
|
211 |
###################### 第三组插件 ###########################
|
212 |
+
# [第三组插件]: 尚未充分测试的函数插件
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
213 |
|
214 |
+
try:
|
215 |
+
from crazy_functions.下载arxiv论文翻译摘要 import 下载arxiv论文并翻译摘要
|
216 |
+
function_plugins.update({
|
217 |
+
"一键下载arxiv论文并翻译摘要(先在input输入编号,如1812.10695)": {
|
218 |
+
"Color": "stop",
|
219 |
+
"AsButton": False, # 加入下拉菜单中
|
220 |
+
"Function": HotReload(下载arxiv论文并翻译摘要)
|
221 |
+
}
|
222 |
+
})
|
223 |
+
except:
|
224 |
+
print('Load function plugin failed')
|
225 |
+
|
226 |
+
try:
|
227 |
+
from crazy_functions.联网的ChatGPT import 连接网络回答问题
|
228 |
+
function_plugins.update({
|
229 |
+
"连接网络回答问题(输入问题后点击该插件,需要访问谷歌)": {
|
230 |
+
"Color": "stop",
|
231 |
+
"AsButton": False, # 加入下拉菜单中
|
232 |
+
"Function": HotReload(连接网络回答问题)
|
233 |
+
}
|
234 |
+
})
|
235 |
+
from crazy_functions.联网的ChatGPT_bing版 import 连接bing搜索回答问题
|
236 |
+
function_plugins.update({
|
237 |
+
"连接网络回答问题(中文Bing版,输入问题后点击该插件)": {
|
238 |
+
"Color": "stop",
|
239 |
+
"AsButton": False, # 加入下拉菜单中
|
240 |
+
"Function": HotReload(连接bing搜索回答问题)
|
241 |
+
}
|
242 |
+
})
|
243 |
+
except:
|
244 |
+
print('Load function plugin failed')
|
245 |
+
|
246 |
+
try:
|
247 |
+
from crazy_functions.解析项目源代码 import 解析任意code项目
|
248 |
+
function_plugins.update({
|
249 |
+
"解析项目源代码(手动指定和筛选源代码文件类型)": {
|
250 |
+
"Color": "stop",
|
251 |
+
"AsButton": False,
|
252 |
+
"AdvancedArgs": True, # 调用时,唤起高级参数输入区(默认False)
|
253 |
+
"ArgsReminder": "输入时用逗号隔开, *代表通配符, 加了^代表不匹配; 不输入代表全部匹配。例如: \"*.c, ^*.cpp, config.toml, ^*.toml\"", # 高级参数输入区的显示提示
|
254 |
+
"Function": HotReload(解析任意code项目)
|
255 |
+
},
|
256 |
+
})
|
257 |
+
except:
|
258 |
+
print('Load function plugin failed')
|
259 |
+
|
260 |
+
try:
|
261 |
+
from crazy_functions.询问多个大语言模型 import 同时问询_指定模型
|
262 |
+
function_plugins.update({
|
263 |
+
"询问多个GPT模型(手动指定询问哪些模型)": {
|
264 |
+
"Color": "stop",
|
265 |
+
"AsButton": False,
|
266 |
+
"AdvancedArgs": True, # 调用时,唤起高级参数输入区(默认False)
|
267 |
+
"ArgsReminder": "支持任意数量的llm接口,用&符号分隔。例如chatglm&gpt-3.5-turbo&api2d-gpt-4", # 高级参数输入区的显示提示
|
268 |
+
"Function": HotReload(同时问询_指定模型)
|
269 |
+
},
|
270 |
+
})
|
271 |
+
except:
|
272 |
+
print('Load function plugin failed')
|
273 |
+
|
274 |
+
try:
|
275 |
+
from crazy_functions.图片生成 import 图片生成
|
276 |
+
function_plugins.update({
|
277 |
+
"图片生成(先切换模型到openai或api2d)": {
|
278 |
+
"Color": "stop",
|
279 |
+
"AsButton": False,
|
280 |
+
"AdvancedArgs": True, # 调用时,唤起高级参数输入区(默认False)
|
281 |
+
"ArgsReminder": "在这里输入分辨率, 如256x256(默认)", # 高级参数输入区的显示提示
|
282 |
+
"Function": HotReload(图片生成)
|
283 |
+
},
|
284 |
+
})
|
285 |
+
except:
|
286 |
+
print('Load function plugin failed')
|
287 |
+
|
288 |
+
try:
|
289 |
+
from crazy_functions.总结音视频 import 总结音视频
|
290 |
+
function_plugins.update({
|
291 |
+
"批量总结音视频(输入路径或上传压缩包)": {
|
292 |
+
"Color": "stop",
|
293 |
+
"AsButton": False,
|
294 |
+
"AdvancedArgs": True,
|
295 |
+
"ArgsReminder": "调用openai api 使用whisper-1模型, 目前支持的格式:mp4, m4a, wav, mpga, mpeg, mp3。此处可以输入解析提示,例如:解析为简体中文(默认)。",
|
296 |
+
"Function": HotReload(总结音视频)
|
297 |
+
}
|
298 |
+
})
|
299 |
+
except:
|
300 |
+
print('Load function plugin failed')
|
301 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
302 |
try:
|
303 |
from crazy_functions.数学动画生成manim import 动画生成
|
304 |
function_plugins.update({
|
|
|
325 |
except:
|
326 |
print('Load function plugin failed')
|
327 |
|
328 |
+
try:
|
329 |
+
from crazy_functions.Langchain知识库 import 知识库问答
|
330 |
+
function_plugins.update({
|
331 |
+
"[功能尚不稳定] 构建知识库(请先上传文件素材)": {
|
332 |
+
"Color": "stop",
|
333 |
+
"AsButton": False,
|
334 |
+
"AdvancedArgs": True,
|
335 |
+
"ArgsReminder": "待注入的知识库名称id, 默认为default",
|
336 |
+
"Function": HotReload(知识库问答)
|
337 |
+
}
|
338 |
+
})
|
339 |
+
except:
|
340 |
+
print('Load function plugin failed')
|
341 |
+
|
342 |
+
try:
|
343 |
+
from crazy_functions.Langchain知识库 import 读取知识库作答
|
344 |
+
function_plugins.update({
|
345 |
+
"[功能尚不稳定] 知识库问答": {
|
346 |
+
"Color": "stop",
|
347 |
+
"AsButton": False,
|
348 |
+
"AdvancedArgs": True,
|
349 |
+
"ArgsReminder": "待提取的知识库名称id, 默认为default, 您需要首先调用构建知识库",
|
350 |
+
"Function": HotReload(读取知识库作答)
|
351 |
+
}
|
352 |
+
})
|
353 |
+
except:
|
354 |
+
print('Load function plugin failed')
|
355 |
+
|
356 |
+
try:
|
357 |
+
from crazy_functions.Latex输出PDF结果 import Latex英文纠错加PDF对比
|
358 |
+
function_plugins.update({
|
359 |
+
"Latex英文纠错+高亮修正位置 [需Latex]": {
|
360 |
+
"Color": "stop",
|
361 |
+
"AsButton": False,
|
362 |
+
"AdvancedArgs": True,
|
363 |
+
"ArgsReminder": "如果有必要, 请在此处追加更细致的矫错指令(使用英文)。",
|
364 |
+
"Function": HotReload(Latex英文纠错加PDF对比)
|
365 |
+
}
|
366 |
+
})
|
367 |
+
from crazy_functions.Latex输出PDF结果 import Latex翻译中文并重新编译PDF
|
368 |
+
function_plugins.update({
|
369 |
+
"Arixv翻译(输入arxivID)[需Latex]": {
|
370 |
+
"Color": "stop",
|
371 |
+
"AsButton": False,
|
372 |
+
"AdvancedArgs": True,
|
373 |
+
"ArgsReminder":
|
374 |
+
"如果有必要, 请在此处给出自定义翻译命令, 解决部分词汇翻译不准确的问题。 "+
|
375 |
+
"例如当单词'agent'翻译不准确时, 请尝试把以下指令复制到高级参数区: " + 'If the term "agent" is used in this section, it should be translated to "智能体". ',
|
376 |
+
"Function": HotReload(Latex翻译中文并重新编译PDF)
|
377 |
+
}
|
378 |
+
})
|
379 |
+
function_plugins.update({
|
380 |
+
"本地论文翻译(上传Latex压缩包)[需Latex]": {
|
381 |
+
"Color": "stop",
|
382 |
+
"AsButton": False,
|
383 |
+
"AdvancedArgs": True,
|
384 |
+
"ArgsReminder":
|
385 |
+
"如果有必要, 请在此处给出自定义翻译命令, 解决部分词汇翻译不准确的问题。 "+
|
386 |
+
"例如当单词'agent'翻译不准确时, 请尝试把以下指令复制到高级参数区: " + 'If the term "agent" is used in this section, it should be translated to "智能体". ',
|
387 |
+
"Function": HotReload(Latex翻译中文并重新编译PDF)
|
388 |
+
}
|
389 |
+
})
|
390 |
+
except:
|
391 |
+
print('Load function plugin failed')
|
392 |
+
|
393 |
+
# try:
|
394 |
+
# from crazy_functions.虚空终端 import 终端
|
395 |
+
# function_plugins.update({
|
396 |
+
# "超级终端": {
|
397 |
+
# "Color": "stop",
|
398 |
+
# "AsButton": False,
|
399 |
+
# # "AdvancedArgs": True,
|
400 |
+
# # "ArgsReminder": "",
|
401 |
+
# "Function": HotReload(终端)
|
402 |
+
# }
|
403 |
+
# })
|
404 |
+
# except:
|
405 |
+
# print('Load function plugin failed')
|
406 |
+
|
407 |
return function_plugins
|
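The diff above replaces flat plugin registration with one try/except per optional plugin, so a single missing dependency no longer breaks the whole menu, and every entry is wrapped in `HotReload`. A minimal self-contained sketch of that pattern (the registry keys mirror the diff; `hot_reload`, `get_plugins`, and the stdlib stand-in modules are illustrative, not the project's real implementation):

```python
import importlib

def hot_reload(module_name, func_name):
    """Re-import the module on every call so edited plugin code
    takes effect without restarting the app (the idea behind HotReload)."""
    def wrapper(*args, **kwargs):
        module = importlib.reload(importlib.import_module(module_name))
        return getattr(module, func_name)(*args, **kwargs)
    return wrapper

def get_plugins():
    candidates = {
        "Demo plugin": ("string", "capwords"),        # stdlib stand-in for a real plugin
        "Broken plugin": ("no_such_module", "main"),  # this import will fail
    }
    plugins = {}
    for title, (mod, fn) in candidates.items():
        try:
            importlib.import_module(mod)  # fail fast if the dependency is missing
            plugins[title] = {
                "Color": "stop",
                "AsButton": False,  # dropdown entry rather than a button
                "Function": hot_reload(mod, fn),
            }
        except Exception:
            # one broken plugin must not take down the rest of the menu
            print('Load function plugin failed')
    return plugins
```

Registering each plugin in its own try/except trades a little verbosity for resilience: the menu degrades gracefully instead of crashing at import time.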
crazy_functions/Langchain知识库.py ADDED
@@ -0,0 +1,107 @@
+from toolbox import CatchException, update_ui, ProxyNetworkActivate
+from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, get_files_from_everything
+
+
+
+@CatchException
+def 知识库问答(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
+    """
+    txt             输入栏用户输入的文本,例如需要翻译的一段话,再例如一个包含了待处理文件的路径
+    llm_kwargs      gpt模型参数, 如温度和top_p等, 一般原样传递下去就行
+    plugin_kwargs   插件模型的参数,暂时没有用武之地
+    chatbot         聊天显示框的句柄,用于显示给用户
+    history         聊天历史,前情提要
+    system_prompt   给gpt的静默提醒
+    web_port        当前软件运行的端口号
+    """
+    history = []    # 清空历史,以免输入溢出
+    chatbot.append(("这是什么功能?", "[Local Message] 从一批文件(txt, md, tex)中读取数据构建知识库, 然后进行问答。"))
+    yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
+
+    # resolve deps
+    try:
+        from zh_langchain import construct_vector_store
+        from langchain.embeddings.huggingface import HuggingFaceEmbeddings
+        from .crazy_utils import knowledge_archive_interface
+    except Exception as e:
+        chatbot.append(
+            ["依赖不足",
+             "导入依赖失败。正在尝试自动安装,请查看终端的输出或耐心等待..."]
+        )
+        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
+        from .crazy_utils import try_install_deps
+        try_install_deps(['zh_langchain==0.2.1'])
+
+    # < --------------------读取参数--------------- >
+    if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
+    kai_id = plugin_kwargs.get("advanced_arg", 'default')
+
+    # < --------------------读取文件--------------- >
+    file_manifest = []
+    spl = ["txt", "doc", "docx", "email", "epub", "html", "json", "md", "msg", "pdf", "ppt", "pptx", "rtf"]
+    for sp in spl:
+        _, file_manifest_tmp, _ = get_files_from_everything(txt, type=f'.{sp}')
+        file_manifest += file_manifest_tmp
+
+    if len(file_manifest) == 0:
+        chatbot.append(["没有找到任何可读取文件", "当前支持的格式包括: txt, md, docx, pptx, pdf, json等"])
+        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
+        return
+
+    # < -------------------预热文本向量化模组--------------- >
+    chatbot.append(['<br/>'.join(file_manifest), "正在预热文本向量化模组, 如果是第一次运行, 将消耗较长时间下载中文向量化模型..."])
+    yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
+    print('Checking Text2vec ...')
+    from langchain.embeddings.huggingface import HuggingFaceEmbeddings
+    with ProxyNetworkActivate():    # 临时地激活代理网络
+        HuggingFaceEmbeddings(model_name="GanymedeNil/text2vec-large-chinese")
+
+    # < -------------------构建知识库--------------- >
+    chatbot.append(['<br/>'.join(file_manifest), "正在构建知识库..."])
+    yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
+    print('Establishing knowledge archive ...')
+    with ProxyNetworkActivate():    # 临时地激活代理网络
+        kai = knowledge_archive_interface()
+        kai.feed_archive(file_manifest=file_manifest, id=kai_id)
+    kai_files = kai.get_loaded_file()
+    kai_files = '<br/>'.join(kai_files)
+    # chatbot.append(['知识库构建成功', "正在将知识库存储至cookie中"])
+    # yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
+    # chatbot._cookies['langchain_plugin_embedding'] = kai.get_current_archive_id()
+    # chatbot._cookies['lock_plugin'] = 'crazy_functions.Langchain知识库->读取知识库作答'
+    # chatbot.append(['完成', "“根据知识库作答”函数插件已经接管问答系统, 提问吧! 但注意, 您接下来不能再使用其他插件了,刷新页面即可以退出知识库问答模式。"])
+    chatbot.append(['构建完成', f"当前知识库内的有效文件:\n\n---\n\n{kai_files}\n\n---\n\n请切换至“知识库问答”插件进行知识库访问, 或者使用此插件继续上传更多文件。"])
+    yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 # 由于请求gpt需要一段时间,我们先及时地做一次界面更新
+
+@CatchException
+def 读取知识库作答(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port=-1):
+    # resolve deps
+    try:
+        from zh_langchain import construct_vector_store
+        from langchain.embeddings.huggingface import HuggingFaceEmbeddings
+        from .crazy_utils import knowledge_archive_interface
+    except Exception as e:
+        chatbot.append(["依赖不足", "导入依赖失败。正在尝试自动安装,请查看终端的输出或耐心等待..."])
+        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
+        from .crazy_utils import try_install_deps
+        try_install_deps(['zh_langchain==0.2.1'])
+
+    # < ------------------- --------------- >
+    kai = knowledge_archive_interface()
+
+    if 'langchain_plugin_embedding' in chatbot._cookies:
+        resp, prompt = kai.answer_with_archive_by_id(txt, chatbot._cookies['langchain_plugin_embedding'])
+    else:
+        if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
+        kai_id = plugin_kwargs.get("advanced_arg", 'default')
+        resp, prompt = kai.answer_with_archive_by_id(txt, kai_id)
+
+    chatbot.append((txt, '[Local Message] ' + prompt))
+    yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 # 由于请求gpt需要一段时间,我们先及时地做一次界面更新
+    gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
+        inputs=prompt, inputs_show_user=txt,
+        llm_kwargs=llm_kwargs, chatbot=chatbot, history=[],
+        sys_prompt=system_prompt
+    )
+    history.extend((prompt, gpt_say))
+    yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 # 由于请求gpt需要一段时间,我们先及时地做一次界面更新
crazy_functions/Latex全文润色.py CHANGED
@@ -238,3 +238,6 @@ def Latex英文纠错(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_p
         yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
         return
     yield from 多文件润色(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, language='en', mode='proofread')
+
+
+
crazy_functions/Latex输出PDF结果.py
ADDED
@@ -0,0 +1,300 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
from toolbox import update_ui, trimmed_format_exc, get_conf, objdump, objload, promote_file_to_downloadzone
|
2 |
+
from toolbox import CatchException, report_execption, update_ui_lastest_msg, zip_result, gen_time_str
|
3 |
+
from functools import partial
|
4 |
+
import glob, os, requests, time
|
5 |
+
pj = os.path.join
|
6 |
+
ARXIV_CACHE_DIR = os.path.expanduser(f"~/arxiv_cache/")
|
7 |
+
|
8 |
+
# =================================== 工具函数 ===============================================
|
9 |
+
专业词汇声明 = 'If the term "agent" is used in this section, it should be translated to "智能体". '
|
10 |
+
def switch_prompt(pfg, mode, more_requirement):
|
11 |
+
"""
|
12 |
+
Generate prompts and system prompts based on the mode for proofreading or translating.
|
13 |
+
Args:
|
14 |
+
- pfg: Proofreader or Translator instance.
|
15 |
+
- mode: A string specifying the mode, either 'proofread' or 'translate_zh'.
|
16 |
+
|
17 |
+
Returns:
|
18 |
+
- inputs_array: A list of strings containing prompts for users to respond to.
|
19 |
+
- sys_prompt_array: A list of strings containing prompts for system prompts.
|
20 |
+
"""
|
21 |
+
n_split = len(pfg.sp_file_contents)
|
22 |
+
if mode == 'proofread_en':
|
23 |
+
inputs_array = [r"Below is a section from an academic paper, proofread this section." +
|
24 |
+
r"Do not modify any latex command such as \section, \cite, \begin, \item and equations. " + more_requirement +
|
25 |
+
r"Answer me only with the revised text:" +
|
26 |
+
f"\n\n{frag}" for frag in pfg.sp_file_contents]
|
27 |
+
sys_prompt_array = ["You are a professional academic paper writer." for _ in range(n_split)]
|
28 |
+
elif mode == 'translate_zh':
|
29 |
+
inputs_array = [r"Below is a section from an English academic paper, translate it into Chinese. " + more_requirement +
|
30 |
+
r"Do not modify any latex command such as \section, \cite, \begin, \item and equations. " +
|
31 |
+
r"Answer me only with the translated text:" +
|
32 |
+
f"\n\n{frag}" for frag in pfg.sp_file_contents]
|
33 |
+
sys_prompt_array = ["You are a professional translator." for _ in range(n_split)]
|
34 |
+
else:
|
35 |
+
assert False, "未知指令"
|
36 |
+
return inputs_array, sys_prompt_array
|
37 |
+
|
38 |
+
def desend_to_extracted_folder_if_exist(project_folder):
|
39 |
+
"""
|
40 |
+
Descend into the extracted folder if it exists, otherwise return the original folder.
|
41 |
+
|
42 |
+
Args:
|
43 |
+
- project_folder: A string specifying the folder path.
|
44 |
+
|
45 |
+
Returns:
|
46 |
+
- A string specifying the path to the extracted folder, or the original folder if there is no extracted folder.
|
47 |
+
"""
|
48 |
+
maybe_dir = [f for f in glob.glob(f'{project_folder}/*') if os.path.isdir(f)]
|
49 |
+
if len(maybe_dir) == 0: return project_folder
|
50 |
+
if maybe_dir[0].endswith('.extract'): return maybe_dir[0]
|
51 |
+
return project_folder
|
52 |
+
|
53 |
+
def move_project(project_folder, arxiv_id=None):
|
54 |
+
"""
|
55 |
+
Create a new work folder and copy the project folder to it.
|
56 |
+
|
57 |
+
Args:
|
58 |
+
- project_folder: A string specifying the folder path of the project.
|
59 |
+
|
60 |
+
Returns:
|
61 |
+
- A string specifying the path to the new work folder.
|
62 |
+
"""
|
63 |
+
import shutil, time
|
64 |
+
time.sleep(2) # avoid time string conflict
|
65 |
+
if arxiv_id is not None:
|
66 |
+
new_workfolder = pj(ARXIV_CACHE_DIR, arxiv_id, 'workfolder')
|
67 |
+
else:
|
68 |
+
new_workfolder = f'gpt_log/{gen_time_str()}'
|
69 |
+
try:
|
70 |
+
shutil.rmtree(new_workfolder)
|
71 |
+
except:
|
72 |
+
pass
|
73 |
+
|
74 |
+
# align subfolder if there is a folder wrapper
|
75 |
+
items = glob.glob(pj(project_folder,'*'))
|
76 |
+
if len(glob.glob(pj(project_folder,'*.tex'))) == 0 and len(items) == 1:
|
77 |
+
if os.path.isdir(items[0]): project_folder = items[0]
|
78 |
+
|
79 |
+
shutil.copytree(src=project_folder, dst=new_workfolder)
|
80 |
+
return new_workfolder
|
81 |
+
|
82 |
+
def arxiv_download(chatbot, history, txt):
|
83 |
+
def check_cached_translation_pdf(arxiv_id):
|
84 |
+
translation_dir = pj(ARXIV_CACHE_DIR, arxiv_id, 'translation')
|
85 |
+
if not os.path.exists(translation_dir):
|
86 |
+
os.makedirs(translation_dir)
|
87 |
+
target_file = pj(translation_dir, 'translate_zh.pdf')
|
88 |
+
if os.path.exists(target_file):
|
89 |
+
promote_file_to_downloadzone(target_file, rename_file=None, chatbot=chatbot)
|
90 |
+
return target_file
|
91 |
+
return False
|
92 |
+
def is_float(s):
|
93 |
+
try:
|
94 |
+
float(s)
|
95 |
+
return True
|
96 |
+
except ValueError:
|
97 |
+
return False
|
98 |
+
if ('.' in txt) and ('/' not in txt) and is_float(txt): # is arxiv ID
|
99 |
+
txt = 'https://arxiv.org/abs/' + txt.strip()
|
100 |
+
if ('.' in txt) and ('/' not in txt) and is_float(txt[:10]): # is arxiv ID
|
101 |
+
txt = 'https://arxiv.org/abs/' + txt[:10]
|
102 |
+
if not txt.startswith('https://arxiv.org'):
|
103 |
+
return txt, None
|
104 |
+
|
105 |
+
# <-------------- inspect format ------------->
|
106 |
+
chatbot.append([f"检测到arxiv文档连接", '尝试下载 ...'])
|
107 |
+
yield from update_ui(chatbot=chatbot, history=history)
|
108 |
+
time.sleep(1) # 刷新界面
|
109 |
+
|
110 |
+
url_ = txt # https://arxiv.org/abs/1707.06690
|
111 |
+
if not txt.startswith('https://arxiv.org/abs/'):
|
112 |
+
msg = f"解析arxiv网址失败, 期望格式例如: https://arxiv.org/abs/1707.06690。实际得到格式: {url_}"
|
113 |
+
yield from update_ui_lastest_msg(msg, chatbot=chatbot, history=history) # 刷新界面
|
114 |
+
return msg, None
|
115 |
+
# <-------------- set format ------------->
|
116 |
+
arxiv_id = url_.split('/abs/')[-1]
|
117 |
+
if 'v' in arxiv_id: arxiv_id = arxiv_id[:10]
|
118 |
+
cached_translation_pdf = check_cached_translation_pdf(arxiv_id)
|
119 |
+
if cached_translation_pdf: return cached_translation_pdf, arxiv_id
|
120 |
+
|
121 |
+
url_tar = url_.replace('/abs/', '/e-print/')
|
122 |
+
translation_dir = pj(ARXIV_CACHE_DIR, arxiv_id, 'e-print')
|
123 |
+
extract_dst = pj(ARXIV_CACHE_DIR, arxiv_id, 'extract')
|
124 |
+
os.makedirs(translation_dir, exist_ok=True)
|
125 |
+
|
126 |
+
# <-------------- download arxiv source file ------------->
|
127 |
+
dst = pj(translation_dir, arxiv_id+'.tar')
|
128 |
+
if os.path.exists(dst):
|
129 |
+
yield from update_ui_lastest_msg("调用缓存", chatbot=chatbot, history=history) # 刷新界面
|
130 |
+
else:
|
131 |
+
yield from update_ui_lastest_msg("开始下载", chatbot=chatbot, history=history) # 刷新界面
|
132 |
+
proxies, = get_conf('proxies')
|
133 |
+
r = requests.get(url_tar, proxies=proxies)
|
134 |
+
with open(dst, 'wb+') as f:
|
135 |
+
f.write(r.content)
|
136 |
+
# <-------------- extract file ------------->
|
137 |
+
yield from update_ui_lastest_msg("下载完成", chatbot=chatbot, history=history) # 刷新界面
|
138 |
+
from toolbox import extract_archive
|
139 |
+
extract_archive(file_path=dst, dest_dir=extract_dst)
|
140 |
+
return extract_dst, arxiv_id
|
141 |
+
# ========================================= 插件主程序1 =====================================================
|
142 |
+
|
143 |
+
|
144 |
+
@CatchException
|
145 |
+
def Latex英文纠错加PDF对比(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
|
146 |
+
# <-------------- information about this plugin ------------->
|
147 |
+
chatbot.append([ "函数插件功能?",
|
148 |
+
"对整个Latex项目进行纠错, 用latex编译为PDF对修正处做高亮。函数插件贡献者: Binary-Husky。注意事项: 目前仅支持GPT3.5/GPT4,其他模型转化效果未知。目前对机器学习类文献转化效果最好,其他类型文献转化效果未知。仅在Windows系统进行了测试,其他操作系统表现未知。"])
|
149 |
+
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
|
150 |
+
|
151 |
+
# <-------------- more requirements ------------->
|
152 |
+
if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
|
153 |
+
more_req = plugin_kwargs.get("advanced_arg", "")
|
154 |
+
_switch_prompt_ = partial(switch_prompt, more_requirement=more_req)
|
155 |
+
|
156 |
+
# <-------------- check deps ------------->
|
157 |
+
try:
|
158 |
+
import glob, os, time, subprocess
|
159 |
+
subprocess.Popen(['pdflatex', '-version'])
|
160 |
+
from .latex_utils import Latex精细分解与转化, 编译Latex
|
161 |
+
except Exception as e:
|
162 |
+
chatbot.append([ f"解析项目: {txt}",
|
163 |
+
f"尝试执行Latex指令失败。Latex没有安装, 或者不在环境变量PATH中。安装方法https://tug.org/texlive/。报错信息\n\n```\n\n{trimmed_format_exc()}\n\n```\n\n"])
|
164 |
+
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
|
165 |
+
return
|
166 |
+
|
167 |
+
|
168 |
+
# <-------------- clear history and read input ------------->
|
169 |
+
history = []
|
170 |
+
if os.path.exists(txt):
|
171 |
+
project_folder = txt
|
172 |
+
else:
|
173 |
+
if txt == "": txt = '空空如也的输入栏'
|
174 |
+
report_execption(chatbot, history, a = f"解析项目: {txt}", b = f"找不到本地项目或无权访问: {txt}")
|
175 |
+
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
|
176 |
+
return
|
177 |
+
file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.tex', recursive=True)]
|
178 |
+
if len(file_manifest) == 0:
|
179 |
+
report_execption(chatbot, history, a = f"解析项目: {txt}", b = f"找不到任何.tex文件: {txt}")
|
180 |
+
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
|
181 |
+
return
|
182 |
+
|
183 |
+
|
184 |
+
# <-------------- if is a zip/tar file ------------->
|
185 |
+
project_folder = desend_to_extracted_folder_if_exist(project_folder)
|
186 |
+
|
187 |
+
|
188 |
+
# <-------------- move latex project away from temp folder ------------->
|
189 |
+
project_folder = move_project(project_folder, arxiv_id=None)
|
190 |
+
|
191 |
+
|
192 |
+
# <-------------- if merge_translate_zh is already generated, skip gpt req ------------->
|
193 |
+
if not os.path.exists(project_folder + '/merge_proofread_en.tex'):
|
194 |
+
yield from Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
|
195 |
+
chatbot, history, system_prompt, mode='proofread_en', switch_prompt=_switch_prompt_)
|
196 |
+
|
197 |
+
|
198 |
+
# <-------------- compile PDF ------------->
|
199 |
+
success = yield from 编译Latex(chatbot, history, main_file_original='merge', main_file_modified='merge_proofread_en',
|
200 |
+
work_folder_original=project_folder, work_folder_modified=project_folder, work_folder=project_folder)
|
201 |
+
|
202 |
+
|
203 |
+
# <-------------- zip PDF ------------->
|
204 |
+
zip_res = zip_result(project_folder)
|
205 |
+
if success:
|
206 |
+
chatbot.append((f"成功啦", '请查收结果(压缩包)...'))
|
207 |
+
yield from update_ui(chatbot=chatbot, history=history); time.sleep(1) # 刷新界面
|
208 |
+
promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
|
209 |
+
else:
|
210 |
+
chatbot.append((f"失败了", '虽然PDF生成失败了, 但请查收结果(压缩包), 内含已经翻译的Tex文档, 也是可读的, 您可以到Github Issue区, 用该压缩包+对话历史存档进行反馈 ...'))
|
211 |
+
yield from update_ui(chatbot=chatbot, history=history); time.sleep(1) # 刷新界面
|
212 |
+
promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
|
213 |
+
|
214 |
+
# <-------------- we are done ------------->
|
215 |
+
return success
|
216 |
+
|
217 |
+
|
218 |
+
# ========================================= 插件主程序2 =====================================================
|
219 |
+
|
220 |
+
@CatchException
|
221 |
+
def Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
|
222 |
+
# <-------------- information about this plugin ------------->
|
223 |
+
chatbot.append([
|
224 |
+
"函数插件功能?",
|
225 |
+
"对整个Latex项目进行翻译, 生成中文PDF。函数插件贡献者: Binary-Husky。注意事项: 此插件Windows支持最佳,Linux下必须使用Docker安装,详见项目主README.md。目前仅支持GPT3.5/GPT4,其他模型转化效果未知。目前对机器学习类文献转化效果最好,其他类型文献转化效果未知。"])
|
226 |
+
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
|
227 |
+
|
228 |
+
# <-------------- more requirements ------------->
|
229 |
+
if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
|
230 |
+
more_req = plugin_kwargs.get("advanced_arg", "")
|
231 |
+
_switch_prompt_ = partial(switch_prompt, more_requirement=more_req)
|
232 |
+
|
233 |
+
# <-------------- check deps ------------->
|
234 |
+
try:
|
235 |
+
import glob, os, time, subprocess
|
236 |
+
subprocess.Popen(['pdflatex', '-version'])
|
237 |
+
from .latex_utils import Latex精细分解与转化, 编译Latex
|
238 |
+
except Exception as e:
|
239 |
+
chatbot.append([ f"解析项目: {txt}",
|
240 |
+
f"尝试执行Latex指令失败。Latex没有安装, 或者不在环境变量PATH中。安装方法https://tug.org/texlive/。报错信息\n\n```\n\n{trimmed_format_exc()}\n\n```\n\n"])
|
241 |
+
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
|
242 |
+
return
|
243 |
+
|
244 |
+
|
245 |
+
# <-------------- clear history and read input ------------->
|
246 |
+
history = []
|
247 |
+
txt, arxiv_id = yield from arxiv_download(chatbot, history, txt)
|
248 |
+
if txt.endswith('.pdf'):
|
249 |
+
report_execption(chatbot, history, a = f"解析项目: {txt}", b = f"发现已经存在翻译好的PDF文档")
|
250 |
+
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
|
251 |
+
return
|
252 |
+
|
253 |
+
|
254 |
+
if os.path.exists(txt):
|
255 |
+
project_folder = txt
|
256 |
+
else:
|
257 |
+
if txt == "": txt = '空空如也的输入栏'
|
258 |
+
report_execption(chatbot, history, a = f"解析项目: {txt}", b = f"找不到本地项目或无权访问: {txt}")
|
259 |
+
+        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
+        return
+
+    file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.tex', recursive=True)]
+    if len(file_manifest) == 0:
+        report_execption(chatbot, history, a = f"解析项目: {txt}", b = f"找不到任何.tex文件: {txt}")
+        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
+        return
+
+    # <-------------- if is a zip/tar file ------------->
+    project_folder = desend_to_extracted_folder_if_exist(project_folder)
+
+    # <-------------- move latex project away from temp folder ------------->
+    project_folder = move_project(project_folder, arxiv_id)
+
+    # <-------------- if merge_translate_zh is already generated, skip gpt req ------------->
+    if not os.path.exists(project_folder + '/merge_translate_zh.tex'):
+        yield from Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
+                                      chatbot, history, system_prompt, mode='translate_zh', switch_prompt=_switch_prompt_)
+
+    # <-------------- compile PDF ------------->
+    success = yield from 编译Latex(chatbot, history, main_file_original='merge', main_file_modified='merge_translate_zh', mode='translate_zh',
+                                   work_folder_original=project_folder, work_folder_modified=project_folder, work_folder=project_folder)
+
+    # <-------------- zip PDF ------------->
+    zip_res = zip_result(project_folder)
+    if success:
+        chatbot.append((f"成功啦", '请查收结果(压缩包)...'))
+        yield from update_ui(chatbot=chatbot, history=history); time.sleep(1) # 刷新界面
+        promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
+    else:
+        chatbot.append((f"失败了", '虽然PDF生成失败了, 但请查收结果(压缩包), 内含已经翻译的Tex文档, 也是可读的, 您可以到Github Issue区, 用该压缩包+对话历史存档进行反馈 ...'))
+        yield from update_ui(chatbot=chatbot, history=history); time.sleep(1) # 刷新界面
+        promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
+
+    # <-------------- we are done ------------->
+    return success
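The plugin above is one long generator: each stage `yield from`s status updates to the UI, and the compile stage hands a success flag back through the generator's return value, captured by `yield from`. A minimal sketch of that control flow (step names are hypothetical, not from the repository):

```python
def compile_step():
    # a sub-stage: yields status strings for the UI, returns a success flag
    yield "compiling ..."
    return True  # generator return value, captured by `yield from`

def pipeline():
    yield "preprocessing ..."
    success = yield from compile_step()  # re-yields statuses, captures the flag
    yield "zipping results ..."
    return success

def run(gen):
    # drive the pipeline to exhaustion; StopIteration.value carries the result
    messages = []
    while True:
        try:
            messages.append(next(gen))
        except StopIteration as e:
            return messages, e.value
```

This is why `success = yield from 编译Latex(...)` works: a generator's `return` value surfaces as `StopIteration.value` and is what `yield from` evaluates to.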
crazy_functions/crazy_functions_test.py
CHANGED
@@ -3,6 +3,8 @@
 这个文件用于函数插件的单元测试
 运行方法 python crazy_functions/crazy_functions_test.py
 """
 
 def validate_path():
     import os, sys
@@ -10,10 +12,16 @@ def validate_path():
     root_dir_assume = os.path.abspath(os.path.dirname(__file__) + '/..')
     os.chdir(root_dir_assume)
     sys.path.append(root_dir_assume)
-
 validate_path() # validate path so you can run from base directory
 from colorful import *
 from toolbox import get_conf, ChatBotWithCookies
 proxies, WEB_PORT, LLM_MODEL, CONCURRENT_COUNT, AUTHENTICATION, CHATBOT_HEIGHT, LAYOUT, API_KEY = \
     get_conf('proxies', 'WEB_PORT', 'LLM_MODEL', 'CONCURRENT_COUNT', 'AUTHENTICATION', 'CHATBOT_HEIGHT', 'LAYOUT', 'API_KEY')
@@ -30,7 +38,43 @@ history = []
 system_prompt = "Serve me as a writing and programming assistant."
 web_port = 1024
-
 def test_解析一个Python项目():
     from crazy_functions.解析项目源代码 import 解析一个Python项目
     txt = "crazy_functions/test_project/python/dqn"
@@ -116,6 +160,56 @@ def test_Markdown多语言():
     for cookies, cb, hist, msg in Markdown翻译指定语言(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
         print(cb)
 # test_解析一个Python项目()
@@ -129,7 +223,9 @@ def test_Markdown多语言():
 # test_联网回答问题()
 # test_解析ipynb文件()
 # test_数学动画生成manim()
-
-
-
-
 这个文件用于函数插件的单元测试
 运行方法 python crazy_functions/crazy_functions_test.py
 """
+
+# ==============================================================================================================================
 
 def validate_path():
     import os, sys
     root_dir_assume = os.path.abspath(os.path.dirname(__file__) + '/..')
     os.chdir(root_dir_assume)
     sys.path.append(root_dir_assume)
 validate_path() # validate path so you can run from base directory
+
+# ==============================================================================================================================
+
 from colorful import *
 from toolbox import get_conf, ChatBotWithCookies
+import contextlib
+import os
+import sys
+from functools import wraps
 proxies, WEB_PORT, LLM_MODEL, CONCURRENT_COUNT, AUTHENTICATION, CHATBOT_HEIGHT, LAYOUT, API_KEY = \
     get_conf('proxies', 'WEB_PORT', 'LLM_MODEL', 'CONCURRENT_COUNT', 'AUTHENTICATION', 'CHATBOT_HEIGHT', 'LAYOUT', 'API_KEY')
 
 system_prompt = "Serve me as a writing and programming assistant."
 web_port = 1024
 
+# ==============================================================================================================================
+
+def silence_stdout(func):
+    @wraps(func)
+    def wrapper(*args, **kwargs):
+        _original_stdout = sys.stdout
+        sys.stdout = open(os.devnull, 'w')
+        for q in func(*args, **kwargs):
+            sys.stdout = _original_stdout
+            yield q
+            sys.stdout = open(os.devnull, 'w')
+        sys.stdout.close()
+        sys.stdout = _original_stdout
+    return wrapper
+
+class CLI_Printer():
+    def __init__(self) -> None:
+        self.pre_buf = ""
+
+    def print(self, buf):
+        bufp = ""
+        for index, chat in enumerate(buf):
+            a, b = chat
+            bufp += sprint亮靛('[Me]:' + a) + '\n'
+            bufp += '[GPT]:' + b
+            if index < len(buf)-1:
+                bufp += '\n'
+
+        if self.pre_buf != "" and bufp.startswith(self.pre_buf):
+            print(bufp[len(self.pre_buf):], end='')
+        else:
+            print('\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n'+bufp, end='')
+        self.pre_buf = bufp
+        return
+
+cli_printer = CLI_Printer()
+# ==============================================================================================================================
 def test_解析一个Python项目():
     from crazy_functions.解析项目源代码 import 解析一个Python项目
     txt = "crazy_functions/test_project/python/dqn"
 
     for cookies, cb, hist, msg in Markdown翻译指定语言(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
         print(cb)
 
+def test_Langchain知识库():
+    from crazy_functions.Langchain知识库 import 知识库问答
+    txt = "./"
+    chatbot = ChatBotWithCookies(llm_kwargs)
+    for cookies, cb, hist, msg in silence_stdout(知识库问答)(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
+        cli_printer.print(cb) # print(cb)
+
+    chatbot = ChatBotWithCookies(cookies)
+    from crazy_functions.Langchain知识库 import 读取知识库作答
+    txt = "What is the installation method?"
+    for cookies, cb, hist, msg in silence_stdout(读取知识库作答)(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
+        cli_printer.print(cb) # print(cb)
+
+def test_Langchain知识库读取():
+    from crazy_functions.Langchain知识库 import 读取知识库作答
+    txt = "远程云服务器部署?"
+    for cookies, cb, hist, msg in silence_stdout(读取知识库作答)(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
+        cli_printer.print(cb) # print(cb)
+
+def test_Latex():
+    from crazy_functions.Latex输出PDF结果 import Latex英文纠错加PDF对比, Latex翻译中文并重新编译PDF
+
+    # txt = r"https://arxiv.org/abs/1706.03762"
+    # txt = r"https://arxiv.org/abs/1902.03185"
+    # txt = r"https://arxiv.org/abs/2305.18290"
+    # txt = r"https://arxiv.org/abs/2305.17608"
+    # txt = r"https://arxiv.org/abs/2211.16068"  # ACE
+    # txt = r"C:\Users\x\arxiv_cache\2211.16068\workfolder"  # ACE
+    # txt = r"https://arxiv.org/abs/2002.09253"
+    # txt = r"https://arxiv.org/abs/2306.07831"
+    # txt = r"https://arxiv.org/abs/2212.10156"
+    # txt = r"https://arxiv.org/abs/2211.11559"
+    # txt = r"https://arxiv.org/abs/2303.08774"
+    txt = r"https://arxiv.org/abs/2303.12712"
+    # txt = r"C:\Users\fuqingxu\arxiv_cache\2303.12712\workfolder"
+
+    for cookies, cb, hist, msg in (Latex翻译中文并重新编译PDF)(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
+        cli_printer.print(cb) # print(cb)
+
+    # txt = "2302.02948.tar"
+    # print(txt)
+    # main_tex, work_folder = Latex预处理(txt)
+    # print('main tex:', main_tex)
+    # res = 编译Latex(main_tex, work_folder)
+    # # for cookies, cb, hist, msg in silence_stdout(编译Latex)(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
+    #     cli_printer.print(cb) # print(cb)
 
 # test_解析一个Python项目()
 
 # test_联网回答问题()
 # test_解析ipynb文件()
 # test_数学动画生成manim()
+# test_Langchain知识库()
+# test_Langchain知识库读取()
+if __name__ == "__main__":
+    test_Latex()
+    input("程序完成,回车退出。")
+    print("退出。")
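The `silence_stdout` decorator added above wraps a *generator*: stdout is redirected to `os.devnull` while the wrapped generator's body runs, and restored around each `yield` so the consumer can still print. A simplified, self-contained version of the same idea (the context-manager-free style mirrors the diff; this is a sketch, not the repository's exact code):

```python
import io
import os
import sys
from functools import wraps

def silence_stdout(func):
    # redirect stdout to devnull while the generator body runs,
    # restoring it around each yield so the consumer can print normally
    @wraps(func)
    def wrapper(*args, **kwargs):
        original = sys.stdout
        devnull = open(os.devnull, 'w')
        sys.stdout = devnull
        try:
            for item in func(*args, **kwargs):
                sys.stdout = original   # give stdout back to the caller
                yield item
                sys.stdout = devnull    # re-silence for the next step
        finally:
            devnull.close()
            sys.stdout = original
    return wrapper

@silence_stdout
def noisy_steps():
    for i in range(3):
        print(f"internal log {i}")  # suppressed by the decorator
        yield i
```

Consuming `noisy_steps()` yields `0, 1, 2` while the internal prints never reach the terminal.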
crazy_functions/crazy_utils.py
CHANGED
@@ -1,4 +1,5 @@
 from toolbox import update_ui, get_conf, trimmed_format_exc
 
 def input_clipping(inputs, history, max_token_limit):
     import numpy as np
@@ -606,3 +607,142 @@ def get_files_from_everything(txt, type): # type='.md'
         success = False
 
     return success, file_manifest, project_folder
 from toolbox import update_ui, get_conf, trimmed_format_exc
+import threading
 
 def input_clipping(inputs, history, max_token_limit):
     import numpy as np
 
         success = False
 
     return success, file_manifest, project_folder
+
+
+def Singleton(cls):
+    _instance = {}
+
+    def _singleton(*args, **kargs):
+        if cls not in _instance:
+            _instance[cls] = cls(*args, **kargs)
+        return _instance[cls]
+
+    return _singleton
+
+
+@Singleton
+class knowledge_archive_interface():
+    def __init__(self) -> None:
+        self.threadLock = threading.Lock()
+        self.current_id = ""
+        self.kai_path = None
+        self.qa_handle = None
+        self.text2vec_large_chinese = None
+
+    def get_chinese_text2vec(self):
+        if self.text2vec_large_chinese is None:
+            # < -------------------预热文本向量化模组--------------- >
+            from toolbox import ProxyNetworkActivate
+            print('Checking Text2vec ...')
+            from langchain.embeddings.huggingface import HuggingFaceEmbeddings
+            with ProxyNetworkActivate(): # 临时地激活代理网络
+                self.text2vec_large_chinese = HuggingFaceEmbeddings(model_name="GanymedeNil/text2vec-large-chinese")
+        return self.text2vec_large_chinese
+
+    def feed_archive(self, file_manifest, id="default"):
+        self.threadLock.acquire()
+        # import uuid
+        self.current_id = id
+        from zh_langchain import construct_vector_store
+        self.qa_handle, self.kai_path = construct_vector_store(
+            vs_id=self.current_id,
+            files=file_manifest,
+            sentence_size=100,
+            history=[],
+            one_conent="",
+            one_content_segmentation="",
+            text2vec = self.get_chinese_text2vec(),
+        )
+        self.threadLock.release()
+
+    def get_current_archive_id(self):
+        return self.current_id
+
+    def get_loaded_file(self):
+        return self.qa_handle.get_loaded_file()
+
+    def answer_with_archive_by_id(self, txt, id):
+        self.threadLock.acquire()
+        if not self.current_id == id:
+            self.current_id = id
+            from zh_langchain import construct_vector_store
+            self.qa_handle, self.kai_path = construct_vector_store(
+                vs_id=self.current_id,
+                files=[],
+                sentence_size=100,
+                history=[],
+                one_conent="",
+                one_content_segmentation="",
+                text2vec = self.get_chinese_text2vec(),
+            )
+        VECTOR_SEARCH_SCORE_THRESHOLD = 0
+        VECTOR_SEARCH_TOP_K = 4
+        CHUNK_SIZE = 512
+        resp, prompt = self.qa_handle.get_knowledge_based_conent_test(
+            query = txt,
+            vs_path = self.kai_path,
+            score_threshold=VECTOR_SEARCH_SCORE_THRESHOLD,
+            vector_search_top_k=VECTOR_SEARCH_TOP_K,
+            chunk_conent=True,
+            chunk_size=CHUNK_SIZE,
+            text2vec = self.get_chinese_text2vec(),
+        )
+        self.threadLock.release()
+        return resp, prompt
+
+def try_install_deps(deps):
+    for dep in deps:
+        import subprocess, sys
+        subprocess.check_call([sys.executable, '-m', 'pip', 'install', '--user', dep])
+
+
+class construct_html():
+    def __init__(self) -> None:
+        self.css = """
+.row {
+  display: flex;
+  flex-wrap: wrap;
+}
+
+.column {
+  flex: 1;
+  padding: 10px;
+}
+
+.table-header {
+  font-weight: bold;
+  border-bottom: 1px solid black;
+}
+
+.table-row {
+  border-bottom: 1px solid lightgray;
+}
+
+.table-cell {
+  padding: 5px;
+}
+"""
+        self.html_string = f'<!DOCTYPE html><head><meta charset="utf-8"><title>翻译结果</title><style>{self.css}</style></head>'
+
+    def add_row(self, a, b):
+        tmp = """
+<div class="row table-row">
+    <div class="column table-cell">REPLACE_A</div>
+    <div class="column table-cell">REPLACE_B</div>
+</div>
+"""
+        from toolbox import markdown_convertion
+        tmp = tmp.replace('REPLACE_A', markdown_convertion(a))
+        tmp = tmp.replace('REPLACE_B', markdown_convertion(b))
+        self.html_string += tmp
+
+    def save_file(self, file_name):
+        with open(f'./gpt_log/{file_name}', 'w', encoding='utf8') as f:
+            f.write(self.html_string.encode('utf-8', 'ignore').decode())
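The `Singleton` added to crazy_utils.py above is a plain decorator-based singleton: a dict keyed by the decorated class caches the first instance, so every later call returns the same object (later constructor arguments are silently ignored). A standalone sketch of the same shape, with a hypothetical `Archive` class standing in for `knowledge_archive_interface`:

```python
def Singleton(cls):
    # cache the first instance per decorated class
    _instance = {}

    def _singleton(*args, **kwargs):
        if cls not in _instance:
            _instance[cls] = cls(*args, **kwargs)
        return _instance[cls]

    return _singleton

@Singleton
class Archive:
    def __init__(self):
        self.loaded = []  # shared state: every caller sees the same list
```

This matters for the knowledge-base plugin: every plugin invocation gets the same vector-store handle and the same `threading.Lock`, which is what makes the `acquire`/`release` pairs above actually serialize access.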
crazy_functions/latex_utils.py
ADDED
@@ -0,0 +1,773 @@
+from toolbox import update_ui, update_ui_lastest_msg # 刷新Gradio前端界面
+from toolbox import zip_folder, objdump, objload, promote_file_to_downloadzone
+import os, shutil
+import re
+import numpy as np
+pj = os.path.join
+
+"""
+========================================================================
+Part One
+Latex segmentation with a binary mask (PRESERVE=0, TRANSFORM=1)
+========================================================================
+"""
+PRESERVE = 0
+TRANSFORM = 1
+
+def set_forbidden_text(text, mask, pattern, flags=0):
+    """
+    Add a preserve text area in this paper.
+    e.g. with pattern = r"\\begin\{algorithm\}(.*?)\\end\{algorithm\}"
+    you can mask out (mask = PRESERVE, so that the text becomes untouchable for GPT)
+    everything between "\begin{equation}" and "\end{equation}"
+    """
+    if isinstance(pattern, list): pattern = '|'.join(pattern)
+    pattern_compile = re.compile(pattern, flags)
+    for res in pattern_compile.finditer(text):
+        mask[res.span()[0]:res.span()[1]] = PRESERVE
+    return text, mask
+
+def set_forbidden_text_careful_brace(text, mask, pattern, flags=0):
+    """
+    Add a preserve text area in this paper (the text becomes untouchable for GPT).
+    Count the braces so as to catch the complete text area.
+    e.g.
+    \caption{blablablablabla\texbf{blablabla}blablabla.}
+    """
+    pattern_compile = re.compile(pattern, flags)
+    for res in pattern_compile.finditer(text):
+        brace_level = -1
+        p = begin = end = res.regs[0][0]
+        for _ in range(1024*16):
+            if text[p] == '}' and brace_level == 0: break
+            elif text[p] == '}': brace_level -= 1
+            elif text[p] == '{': brace_level += 1
+            p += 1
+        end = p+1
+        mask[begin:end] = PRESERVE
+    return text, mask
+
+def reverse_forbidden_text_careful_brace(text, mask, pattern, flags=0, forbid_wrapper=True):
+    """
+    Move an area out of the preserve area (make the text editable for GPT).
+    Count the braces so as to catch the complete text area.
+    e.g.
+    \caption{blablablablabla\texbf{blablabla}blablabla.}
+    """
+    pattern_compile = re.compile(pattern, flags)
+    for res in pattern_compile.finditer(text):
+        brace_level = 0
+        p = begin = end = res.regs[1][0]
+        for _ in range(1024*16):
+            if text[p] == '}' and brace_level == 0: break
+            elif text[p] == '}': brace_level -= 1
+            elif text[p] == '{': brace_level += 1
+            p += 1
+        end = p
+        mask[begin:end] = TRANSFORM
+        if forbid_wrapper:
+            mask[res.regs[0][0]:begin] = PRESERVE
+            mask[end:res.regs[0][1]] = PRESERVE
+    return text, mask
+
+def set_forbidden_text_begin_end(text, mask, pattern, flags=0, limit_n_lines=42):
+    """
+    Find all \begin{} ... \end{} text blocks with fewer than limit_n_lines lines.
+    Add them to the preserve area.
+    """
+    pattern_compile = re.compile(pattern, flags)
+    def search_with_line_limit(text, mask):
+        for res in pattern_compile.finditer(text):
+            cmd = res.group(1)  # begin{what}
+            this = res.group(2) # content between begin and end
+            this_mask = mask[res.regs[2][0]:res.regs[2][1]]
+            white_list = ['document', 'abstract', 'lemma', 'definition', 'sproof',
+                          'em', 'emph', 'textit', 'textbf', 'itemize', 'enumerate']
+            if (cmd in white_list) or this.count('\n') >= limit_n_lines: # use a magical number 42
+                this, this_mask = search_with_line_limit(this, this_mask)
+                mask[res.regs[2][0]:res.regs[2][1]] = this_mask
+            else:
+                mask[res.regs[0][0]:res.regs[0][1]] = PRESERVE
+        return text, mask
+    return search_with_line_limit(text, mask)
+
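The masking functions above share one idea: an array the same length as the text, initialized to TRANSFORM, with every regex match stamped back to PRESERVE so it is never sent to the model. A compact sketch of `set_forbidden_text` (the repository uses a numpy `uint8` array; a plain list behaves the same for this illustration):

```python
import re

PRESERVE = 0   # text GPT must not touch
TRANSFORM = 1  # text GPT may rewrite

def set_forbidden_text(text, mask, pattern, flags=0):
    # stamp every regex match back to PRESERVE
    if isinstance(pattern, list):
        pattern = '|'.join(pattern)
    for res in re.finditer(pattern, text, flags):
        for i in range(res.start(), res.end()):
            mask[i] = PRESERVE
    return text, mask

text = r"intro \begin{equation}x^2\end{equation} outro"
mask = [TRANSFORM] * len(text)
text, mask = set_forbidden_text(
    text, mask, r"\\begin\{equation\}(.*?)\\end\{equation\}", re.DOTALL)
```

After the call, only "intro " and " outro" remain marked TRANSFORM; the equation environment (characters 6 through 38) is frozen.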
+class LinkedListNode():
+    """
+    Linked List Node
+    """
+    def __init__(self, string, preserve=True) -> None:
+        self.string = string
+        self.preserve = preserve
+        self.next = None
+        # self.begin_line = 0
+        # self.begin_char = 0
+
+def convert_to_linklist(text, mask):
+    root = LinkedListNode("", preserve=True)
+    current_node = root
+    for c, m, i in zip(text, mask, range(len(text))):
+        if (m==PRESERVE and current_node.preserve) \
+            or (m==TRANSFORM and not current_node.preserve):
+            # add
+            current_node.string += c
+        else:
+            current_node.next = LinkedListNode(c, preserve=(m==PRESERVE))
+            current_node = current_node.next
+    return root
+
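`convert_to_linklist` then walks the text and mask together, starting a new node whenever the preserve flag flips. The same pass expressed with plain `(string, preserve)` tuples instead of linked-list nodes, as a sketch:

```python
def mask_to_segments(text, mask, PRESERVE=0):
    # group consecutive characters sharing a preserve flag into (string, preserve) pairs
    segments = []
    for c, m in zip(text, mask):
        preserve = (m == PRESERVE)
        if segments and segments[-1][1] == preserve:
            segments[-1][0] += c  # same flag: extend the current segment
        else:
            segments.append([c, preserve])  # flag flipped: start a new segment
    return [(s, p) for s, p in segments]

segs = mask_to_segments("aaBBa", [0, 0, 1, 1, 0])
```

Only the `preserve=False` segments are later shipped to the model; the `True` segments are stitched back verbatim.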
+"""
+========================================================================
+Latex Merge File
+========================================================================
+"""
+
+def 寻找Latex主文件(file_manifest, mode):
+    """
+    在多Tex文档中,寻找主文件,必须包含documentclass,返回找到的第一个。
+    P.S. 但愿没人把latex模板放在里面传进来 (6.25 加入判定latex模板的代码)
+    """
+    canidates = []
+    for texf in file_manifest:
+        if os.path.basename(texf).startswith('merge'):
+            continue
+        with open(texf, 'r', encoding='utf8') as f:
+            file_content = f.read()
+        if r'\documentclass' in file_content:
+            canidates.append(texf)
+        else:
+            continue
+
+    if len(canidates) == 0:
+        raise RuntimeError('无法找到一个主Tex文件(包含documentclass关键字)')
+    elif len(canidates) == 1:
+        return canidates[0]
+    else: # if len(canidates) >= 2 通过一些Latex模板中常见(但通常不会出现在正文)的单词,对不同latex源文件扣分,取评分最高者返回
+        canidates_score = []
+        # 给出一些判定模板文档的词作为扣分项
+        unexpected_words = ['\LaTeX', 'manuscript', 'Guidelines', 'font', 'citations', 'rejected', 'blind review', 'reviewers']
+        expected_words = ['\input', '\ref', '\cite']
+        for texf in canidates:
+            canidates_score.append(0)
+            with open(texf, 'r', encoding='utf8') as f:
+                file_content = f.read()
+            for uw in unexpected_words:
+                if uw in file_content:
+                    canidates_score[-1] -= 1
+            for uw in expected_words:
+                if uw in file_content:
+                    canidates_score[-1] += 1
+        select = np.argmax(canidates_score) # 取评分最高者返回
+        return canidates[select]
+
+def rm_comments(main_file):
+    new_file_remove_comment_lines = []
+    for l in main_file.splitlines():
+        # 删除整行的空注释
+        if l.lstrip().startswith("%"):
+            pass
+        else:
+            new_file_remove_comment_lines.append(l)
+    main_file = '\n'.join(new_file_remove_comment_lines)
+    # main_file = re.sub(r"\\include{(.*?)}", r"\\input{\1}", main_file)  # 将 \include 命令转换为 \input 命令
+    main_file = re.sub(r'(?<!\\)%.*', '', main_file)  # 使用正则表达式查找半行注释, 并替换为空字符串
+    return main_file
+
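`rm_comments` relies on the negative lookbehind `(?<!\\)%` so that escaped percent signs (`\%`, a literal percent in LaTeX) survive while true comments are cut. A minimal self-contained check of that regex behavior:

```python
import re

def strip_latex_comments(src):
    kept = []
    for line in src.splitlines():
        if line.lstrip().startswith('%'):
            continue  # whole-line comment: drop the line entirely
        # half-line comment: cut from the first % that is not escaped as \%
        kept.append(re.sub(r'(?<!\\)%.*', '', line))
    return '\n'.join(kept)

cleaned = strip_latex_comments("a \\% literal % comment\n% whole line\nplain")
```

The escaped `\%` is kept, the trailing `% comment` and the full comment line are removed.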
+def merge_tex_files_(project_foler, main_file, mode):
+    """
+    Merge Tex project recursively
+    """
+    main_file = rm_comments(main_file)
+    for s in reversed([q for q in re.finditer(r"\\input\{(.*?)\}", main_file, re.M)]):
+        f = s.group(1)
+        fp = os.path.join(project_foler, f)
+        if os.path.exists(fp):
+            # e.g., \input{srcs/07_appendix.tex}
+            with open(fp, 'r', encoding='utf-8', errors='replace') as fx:
+                c = fx.read()
+        else:
+            # e.g., \input{srcs/07_appendix}
+            with open(fp+'.tex', 'r', encoding='utf-8', errors='replace') as fx:
+                c = fx.read()
+        c = merge_tex_files_(project_foler, c, mode)
+        main_file = main_file[:s.span()[0]] + c + main_file[s.span()[1]:]
+    return main_file
+
+def merge_tex_files(project_foler, main_file, mode):
+    """
+    Merge Tex project recursively
+    P.S. 顺便把CTEX塞进去以支持中文
+    P.S. 顺便把Latex的注释去除
+    """
+    main_file = merge_tex_files_(project_foler, main_file, mode)
+    main_file = rm_comments(main_file)
+
+    if mode == 'translate_zh':
+        # find paper documentclass
+        pattern = re.compile(r'\\documentclass.*\n')
+        match = pattern.search(main_file)
+        assert match is not None, "Cannot find documentclass statement!"
+        position = match.end()
+        add_ctex = '\\usepackage{ctex}\n'
+        add_url = '\\usepackage{url}\n' if '{url}' not in main_file else ''
+        main_file = main_file[:position] + add_ctex + add_url + main_file[position:]
+        # fontset=windows
+        import platform
+        main_file = re.sub(r"\\documentclass\[(.*?)\]{(.*?)}", r"\\documentclass[\1,fontset=windows,UTF8]{\2}", main_file)
+        main_file = re.sub(r"\\documentclass{(.*?)}", r"\\documentclass[fontset=windows,UTF8]{\1}", main_file)
+        # find paper abstract
+        pattern_opt1 = re.compile(r'\\begin\{abstract\}.*\n')
+        pattern_opt2 = re.compile(r"\\abstract\{(.*?)\}", flags=re.DOTALL)
+        match_opt1 = pattern_opt1.search(main_file)
+        match_opt2 = pattern_opt2.search(main_file)
+        assert (match_opt1 is not None) or (match_opt2 is not None), "Cannot find paper abstract section!"
+    return main_file
+
+
+
+"""
+========================================================================
+Post process
+========================================================================
+"""
+def mod_inbraket(match):
+    """
+    为啥chatgpt会把cite里面的逗号换成中文逗号呀
+    """
+    # get the matched string
+    cmd = match.group(1)
+    str_to_modify = match.group(2)
+    # modify the matched string
+    str_to_modify = str_to_modify.replace(':', ':') # 前面是中文冒号,后面是英文冒号
+    str_to_modify = str_to_modify.replace(',', ',') # 前面是中文逗号,后面是英文逗号
+    # str_to_modify = 'BOOM'
+    return "\\" + cmd + "{" + str_to_modify + "}"
+
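`mod_inbraket` is used as a `re.sub` callback: it rewrites the argument of commands like `\cite{...}` where the model has swapped ASCII commas and colons for their full-width CJK forms. The same fix in isolation (a sketch; the pattern matches any lowercase 2-10 letter command with a brace argument, just as in `fix_content` below):

```python
import re

def mod_inbraket(match):
    # restore ASCII punctuation that the model swapped for full-width CJK forms
    cmd = match.group(1)
    arg = match.group(2)
    arg = arg.replace(':', ':')  # full-width colon -> ASCII colon
    arg = arg.replace(',', ',')  # full-width comma -> ASCII comma
    return "\\" + cmd + "{" + arg + "}"

def fix_inbraket_punct(tex):
    # apply the callback to every \cmd{...} argument
    return re.sub(r"\\([a-z]{2,10})\{([^\}]*?)\}", mod_inbraket, tex)

fixed = fix_inbraket_punct("\\cite{ref1,ref2}")
```

Punctuation outside a command argument is deliberately left alone, since full-width punctuation in translated body text is correct Chinese.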
+def fix_content(final_tex, node_string):
+    """
+    Fix common GPT errors to increase success rate
+    """
+    final_tex = re.sub(r"(?<!\\)%", "\\%", final_tex)
+    final_tex = re.sub(r"\\([a-z]{2,10})\ \{", r"\\\1{", string=final_tex)
+    final_tex = re.sub(r"\\\ ([a-z]{2,10})\{", r"\\\1{", string=final_tex)
+    final_tex = re.sub(r"\\([a-z]{2,10})\{([^\}]*?)\}", mod_inbraket, string=final_tex)
+
+    if "Traceback" in final_tex and "[Local Message]" in final_tex:
+        final_tex = node_string # 出问题了,还原原文
+    if node_string.count('\\begin') != final_tex.count('\\begin'):
+        final_tex = node_string # 出问题了,还原原文
+    if node_string.count('\_') > 0 and node_string.count('\_') > final_tex.count('\_'):
+        # walk and replace any _ without \
+        final_tex = re.sub(r"(?<!\\)_", "\\_", final_tex)
+
+    def compute_brace_level(string):
+        # this function counts the number of { and }
+        brace_level = 0
+        for c in string:
+            if c == "{": brace_level += 1
+            elif c == "}": brace_level -= 1
+        return brace_level
+    def join_most(tex_t, tex_o):
+        # this function joins the translated string and the original string when something goes wrong
+        p_t = 0
+        p_o = 0
+        def find_next(string, chars, begin):
+            p = begin
+            while p < len(string):
+                if string[p] in chars: return p, string[p]
+                p += 1
+            return None, None
+        while True:
+            res1, char = find_next(tex_o, ['{','}'], p_o)
+            if res1 is None: break
+            res2, char = find_next(tex_t, [char], p_t)
+            if res2 is None: break
+            p_o = res1 + 1
+            p_t = res2 + 1
+        return tex_t[:p_t] + tex_o[p_o:]
+
+    if compute_brace_level(final_tex) != compute_brace_level(node_string):
+        # 出问题了,还原部分原文,保证括号正确
+        final_tex = join_most(final_tex, node_string)
+    return final_tex
+
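`fix_content` finally compares brace balance between the model's output and the original node, falling back to a partial merge when they disagree. The level computation on its own, as a sketch:

```python
def compute_brace_level(string):
    # net number of '{' minus '}' in the string;
    # 0 means braces are balanced, nonzero means the model dropped or added one
    level = 0
    for c in string:
        if c == '{':
            level += 1
        elif c == '}':
            level -= 1
    return level
```

A translated node whose level differs from the original's is the classic failure mode that makes the merged `.tex` uncompilable, which is why the code prefers restoring original text over keeping a broken translation.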
+def split_subprocess(txt, project_folder, return_dict, opts):
+    """
+    Break down a latex file into a linked list;
+    each node uses a preserve flag to indicate whether it should
+    be processed by GPT.
+    """
+    text = txt
+    mask = np.zeros(len(txt), dtype=np.uint8) + TRANSFORM
+
+    # 吸收title与作者以上的部分
+    text, mask = set_forbidden_text(text, mask, r"(.*?)\\maketitle", re.DOTALL)
+    # 吸收iffalse注释
+    text, mask = set_forbidden_text(text, mask, r"\\iffalse(.*?)\\fi", re.DOTALL)
+    # 吸收在42行以内的begin-end组合
+    text, mask = set_forbidden_text_begin_end(text, mask, r"\\begin\{([a-z\*]*)\}(.*?)\\end\{\1\}", re.DOTALL, limit_n_lines=42)
+    # 吸收匿名公式
+    text, mask = set_forbidden_text(text, mask, [ r"\$\$(.*?)\$\$", r"\\\[.*?\\\]" ], re.DOTALL)
+    # 吸收其他杂项
+    text, mask = set_forbidden_text(text, mask, [ r"\\section\{(.*?)\}", r"\\section\*\{(.*?)\}", r"\\subsection\{(.*?)\}", r"\\subsubsection\{(.*?)\}" ])
+    text, mask = set_forbidden_text(text, mask, [ r"\\bibliography\{(.*?)\}", r"\\bibliographystyle\{(.*?)\}" ])
+    text, mask = set_forbidden_text(text, mask, r"\\begin\{thebibliography\}.*?\\end\{thebibliography\}", re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, r"\\begin\{lstlisting\}(.*?)\\end\{lstlisting\}", re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, r"\\begin\{wraptable\}(.*?)\\end\{wraptable\}", re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, r"\\begin\{algorithm\}(.*?)\\end\{algorithm\}", re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, [r"\\begin\{wrapfigure\}(.*?)\\end\{wrapfigure\}", r"\\begin\{wrapfigure\*\}(.*?)\\end\{wrapfigure\*\}"], re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, [r"\\begin\{figure\}(.*?)\\end\{figure\}", r"\\begin\{figure\*\}(.*?)\\end\{figure\*\}"], re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, [r"\\begin\{multline\}(.*?)\\end\{multline\}", r"\\begin\{multline\*\}(.*?)\\end\{multline\*\}"], re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, [r"\\begin\{table\}(.*?)\\end\{table\}", r"\\begin\{table\*\}(.*?)\\end\{table\*\}"], re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, [r"\\begin\{minipage\}(.*?)\\end\{minipage\}", r"\\begin\{minipage\*\}(.*?)\\end\{minipage\*\}"], re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, [r"\\begin\{align\*\}(.*?)\\end\{align\*\}", r"\\begin\{align\}(.*?)\\end\{align\}"], re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, [r"\\begin\{equation\}(.*?)\\end\{equation\}", r"\\begin\{equation\*\}(.*?)\\end\{equation\*\}"], re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, [r"\\includepdf\[(.*?)\]\{(.*?)\}", r"\\clearpage", r"\\newpage", r"\\appendix", r"\\tableofcontents", r"\\include\{(.*?)\}"])
+    text, mask = set_forbidden_text(text, mask, [r"\\vspace\{(.*?)\}", r"\\hspace\{(.*?)\}", r"\\label\{(.*?)\}", r"\\begin\{(.*?)\}", r"\\end\{(.*?)\}", r"\\item "])
+    text, mask = set_forbidden_text_careful_brace(text, mask, r"\\hl\{(.*?)\}", re.DOTALL)
+    # reverse 操作必须放在最后
+    text, mask = reverse_forbidden_text_careful_brace(text, mask, r"\\caption\{(.*?)\}", re.DOTALL, forbid_wrapper=True)
+    text, mask = reverse_forbidden_text_careful_brace(text, mask, r"\\abstract\{(.*?)\}", re.DOTALL, forbid_wrapper=True)
+    root = convert_to_linklist(text, mask)
+
+    # 修复括号
+    node = root
+    while True:
+        string = node.string
+        if node.preserve:
+            node = node.next
+            if node is None: break
+            continue
+        def break_check(string):
+            str_stack = [""] # (lv, index)
+            for i, c in enumerate(string):
+                if c == '{':
+                    str_stack.append('{')
+                elif c == '}':
+                    if len(str_stack) == 1:
+                        print('stack fix')
+                        return i
+                    str_stack.pop(-1)
+                else:
+                    str_stack[-1] += c
+            return -1
+        bp = break_check(string)
+
+        if bp == -1:
+            pass
+        elif bp == 0:
+            node.string = string[:1]
+            q = LinkedListNode(string[1:], False)
+            q.next = node.next
+            node.next = q
+        else:
+            node.string = string[:bp]
+            q = LinkedListNode(string[bp:], False)
+            q.next = node.next
+            node.next = q
+
+        node = node.next
+        if node is None: break
+
+    # 屏蔽空行和太短的句子
+    node = root
+    while True:
+        if len(node.string.strip('\n').strip(''))==0: node.preserve = True
+        if len(node.string.strip('\n').strip(''))<42: node.preserve = True
+        node = node.next
+        if node is None: break
+    node = root
+    while True:
+        if node.next and node.preserve and node.next.preserve:
+            node.string += node.next.string
+            node.next = node.next.next
+        node = node.next
+        if node is None: break
|
384 |
+
|
385 |
+
# 将前后断行符脱离
|
386 |
+
node = root
|
387 |
+
prev_node = None
|
388 |
+
while True:
|
389 |
+
if not node.preserve:
|
390 |
+
lstriped_ = node.string.lstrip().lstrip('\n')
|
391 |
+
if (prev_node is not None) and (prev_node.preserve) and (len(lstriped_)!=len(node.string)):
|
392 |
+
prev_node.string += node.string[:-len(lstriped_)]
|
393 |
+
node.string = lstriped_
|
394 |
+
rstriped_ = node.string.rstrip().rstrip('\n')
|
395 |
+
if (node.next is not None) and (node.next.preserve) and (len(rstriped_)!=len(node.string)):
|
396 |
+
node.next.string = node.string[len(rstriped_):] + node.next.string
|
397 |
+
node.string = rstriped_
|
398 |
+
# =====
|
399 |
+
prev_node = node
|
400 |
+
node = node.next
|
401 |
+
if node is None: break
|
402 |
+
# 输出html调试文件,用红色标注处保留区(PRESERVE),用黑色标注转换区(TRANSFORM)
|
403 |
+
with open(pj(project_folder, 'debug_log.html'), 'w', encoding='utf8') as f:
|
404 |
+
segment_parts_for_gpt = []
|
405 |
+
nodes = []
|
406 |
+
node = root
|
407 |
+
while True:
|
408 |
+
nodes.append(node)
|
409 |
+
show_html = node.string.replace('\n','<br/>')
|
410 |
+
if not node.preserve:
|
411 |
+
segment_parts_for_gpt.append(node.string)
|
412 |
+
f.write(f'<p style="color:black;">#{show_html}#</p>')
|
413 |
+
else:
|
414 |
+
f.write(f'<p style="color:red;">{show_html}</p>')
|
415 |
+
node = node.next
|
416 |
+
if node is None: break
|
417 |
+
|
418 |
+
for n in nodes: n.next = None # break
|
419 |
+
return_dict['nodes'] = nodes
|
420 |
+
return_dict['segment_parts_for_gpt'] = segment_parts_for_gpt
|
421 |
+
return return_dict
|
422 |
+
|
423 |
+
|
424 |
+
|
425 |
+
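The `break_check` helper above scans a node's text for a `}` that has no matching `{`, so the node can be cut at that point. The core scan reduces to a simple depth counter; a minimal self-contained sketch:

```python
def first_unmatched_close(s: str) -> int:
    """Return the index of the first unmatched '}' in s, or -1 if braces balance."""
    depth = 0
    for i, c in enumerate(s):
        if c == '{':
            depth += 1
        elif c == '}':
            if depth == 0:
                return i   # a '}' with no open '{' before it
            depth -= 1
    return -1

print(first_unmatched_close(r"\textbf{ok}"))    # -1 (balanced)
print(first_unmatched_close("dangling} rest"))  # 8
```

When the index is non-negative, the linked-list code splits the node there and marks the remainder as a new node, which keeps each GPT-visible fragment brace-balanced.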
class LatexPaperSplit():
    """
    Break a LaTeX file down into a linked list;
    each node uses a preserve flag to indicate whether it should
    be processed by GPT.
    """
    def __init__(self) -> None:
        self.nodes = None
        self.msg = "*{\\scriptsize\\textbf{警告:该PDF由GPT-Academic开源项目调用大语言模型+Latex翻译插件一键生成," + \
            "版权归原文作者所有。翻译内容可靠性无保障,请仔细鉴别并以原文为准。" + \
            "项目Github地址 \\url{https://github.com/binary-husky/gpt_academic/}。"
        # Please do not remove or modify this warning unless you are the original author of the paper
        # (original authors are welcome to contact the developers via the QQ group listed in the README)
        self.msg_declare = "为了防止大语言模型的意外谬误产生扩散影响,禁止移除或修改此警告。}}\\\\"

    def merge_result(self, arr, mode, msg):
        """
        Merge the results after the GPT process has completed.
        """
        result_string = ""
        p = 0
        for node in self.nodes:
            if node.preserve:
                result_string += node.string
            else:
                result_string += fix_content(arr[p], node.string)
                p += 1
        if mode == 'translate_zh':
            pattern = re.compile(r'\\begin\{abstract\}.*\n')
            match = pattern.search(result_string)
            if not match:
                # match \abstract{xxxx}
                pattern_compile = re.compile(r"\\abstract\{(.*?)\}", flags=re.DOTALL)
                match = pattern_compile.search(result_string)
                position = match.regs[1][0]
            else:
                # match \begin{abstract}xxxx\end{abstract}
                position = match.end()
            result_string = result_string[:position] + self.msg + msg + self.msg_declare + result_string[position:]
        return result_string

    def split(self, txt, project_folder, opts):
        """
        Break a LaTeX file down into a linked list;
        each node uses a preserve flag to indicate whether it should
        be processed by GPT.
        P.S. use multiprocessing to avoid timeout errors
        """
        import multiprocessing
        manager = multiprocessing.Manager()
        return_dict = manager.dict()
        p = multiprocessing.Process(
            target=split_subprocess,
            args=(txt, project_folder, return_dict, opts))
        p.start()
        p.join()
        p.close()
        self.nodes = return_dict['nodes']
        self.sp = return_dict['segment_parts_for_gpt']
        return self.sp

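`merge_result` locates the abstract so the warning banner can be spliced in right after it. The positioning logic can be sketched on its own (the banner string here is a hypothetical placeholder, not the project's real message):

```python
import re

tex = "\\documentclass{article}\n\\begin{abstract}\nWe study widgets.\n\\end{abstract}"
banner = "[AUTO-GENERATED TRANSLATION] "  # hypothetical placeholder banner

# prefer \begin{abstract}...; fall back to \abstract{...}, as in merge_result
m = re.search(r'\\begin\{abstract\}.*\n', tex)
if m:
    pos = m.end()                  # insertion point: right after the \begin{abstract} line
else:
    pos = re.search(r"\\abstract\{(.*?)\}", tex, flags=re.DOTALL).regs[1][0]

result = tex[:pos] + banner + tex[pos:]
print(result.splitlines()[2])  # [AUTO-GENERATED TRANSLATION] We study widgets.
```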
class LatexPaperFileGroup():
    """
    Use the tokenizer to break text down according to max_token_limit.
    """
    def __init__(self):
        self.file_paths = []
        self.file_contents = []
        self.sp_file_contents = []
        self.sp_file_index = []
        self.sp_file_tag = []

        # count_token
        from request_llm.bridge_all import model_info
        enc = model_info["gpt-3.5-turbo"]['tokenizer']
        def get_token_num(txt): return len(enc.encode(txt, disallowed_special=()))
        self.get_token_num = get_token_num

    def run_file_split(self, max_token_limit=1900):
        """
        Use the tokenizer to break text down according to max_token_limit.
        """
        for index, file_content in enumerate(self.file_contents):
            if self.get_token_num(file_content) < max_token_limit:
                self.sp_file_contents.append(file_content)
                self.sp_file_index.append(index)
                self.sp_file_tag.append(self.file_paths[index])
            else:
                from .crazy_utils import breakdown_txt_to_satisfy_token_limit_for_pdf
                segments = breakdown_txt_to_satisfy_token_limit_for_pdf(file_content, self.get_token_num, max_token_limit)
                for j, segment in enumerate(segments):
                    self.sp_file_contents.append(segment)
                    self.sp_file_index.append(index)
                    self.sp_file_tag.append(self.file_paths[index] + f".part-{j}.tex")
        print('Segmentation: done')

    def merge_result(self):
        self.file_result = ["" for _ in range(len(self.file_paths))]
        for r, k in zip(self.sp_file_result, self.sp_file_index):
            self.file_result[k] += r

    def write_result(self):
        manifest = []
        for path, res in zip(self.file_paths, self.file_result):
            with open(path + '.polish.tex', 'w', encoding='utf8') as f:
                manifest.append(path + '.polish.tex')
                f.write(res)
        return manifest

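`run_file_split` delegates the actual splitting to `breakdown_txt_to_satisfy_token_limit_for_pdf`. The core idea, greedy packing of paragraphs under a token budget, can be sketched with a word-count stand-in for the real tiktoken tokenizer:

```python
def split_by_token_limit(text, get_token_num, limit):
    """Greedily pack paragraphs into segments that each stay within `limit` tokens."""
    segments, current = [], ""
    for para in text.split("\n\n"):
        candidate = current + "\n\n" + para if current else para
        if get_token_num(candidate) <= limit:
            current = candidate           # paragraph still fits; keep packing
        else:
            if current: segments.append(current)
            current = para                # start a new segment
    if current: segments.append(current)
    return segments

count_words = lambda s: len(s.split())    # stand-in tokenizer for illustration
paras = "\n\n".join(["alpha beta", "gamma delta", "epsilon zeta eta"])
print(split_by_token_limit(paras, count_words, 4))
```

In the real plugin, `get_token_num` is the tiktoken-based counter built in `__init__`, and the fallback splitter also handles single paragraphs that exceed the limit on their own.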
def write_html(sp_file_contents, sp_file_result, chatbot, project_folder):

    # write html
    try:
        import shutil
        from .crazy_utils import construct_html
        from toolbox import gen_time_str
        ch = construct_html()
        orig = ""
        trans = ""
        final = []
        for c,r in zip(sp_file_contents, sp_file_result):
            final.append(c)
            final.append(r)
        for i, k in enumerate(final):
            if i%2==0:
                orig = k
            if i%2==1:
                trans = k
                ch.add_row(a=orig, b=trans)
        create_report_file_name = f"{gen_time_str()}.trans.html"
        ch.save_file(create_report_file_name)
        shutil.copyfile(pj('./gpt_log/', create_report_file_name), pj(project_folder, create_report_file_name))
        promote_file_to_downloadzone(file=f'./gpt_log/{create_report_file_name}', chatbot=chatbot)
    except:
        from toolbox import trimmed_format_exc
        print('writing html result failed:', trimmed_format_exc())

def Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, mode='proofread', switch_prompt=None, opts=[]):
    import time, os, re
    from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
    from .latex_utils import LatexPaperFileGroup, merge_tex_files, LatexPaperSplit, 寻找Latex主文件

    # <-------- locate the main tex file ---------->
    maintex = 寻找Latex主文件(file_manifest, mode)
    chatbot.append((f"定位主Latex文件", f'[Local Message] 分析结果:该项目的Latex主文件是{maintex}, 如果分析错误, 请立即终止程序, 删除或修改歧义文件, 然后重试。主程序即将开始, 请稍候。'))
    yield from update_ui(chatbot=chatbot, history=history)  # refresh UI
    time.sleep(3)

    # <-------- read the LaTeX files, merging the multi-file tex project into one giant tex ---------->
    main_tex_basename = os.path.basename(maintex)
    assert main_tex_basename.endswith('.tex')
    main_tex_basename_bare = main_tex_basename[:-4]
    may_exist_bbl = pj(project_folder, f'{main_tex_basename_bare}.bbl')
    if os.path.exists(may_exist_bbl):
        shutil.copyfile(may_exist_bbl, pj(project_folder, f'merge.bbl'))
        shutil.copyfile(may_exist_bbl, pj(project_folder, f'merge_{mode}.bbl'))
        shutil.copyfile(may_exist_bbl, pj(project_folder, f'merge_diff.bbl'))

    with open(maintex, 'r', encoding='utf-8', errors='replace') as f:
        content = f.read()
        merged_content = merge_tex_files(project_folder, content, mode)

    with open(project_folder + '/merge.tex', 'w', encoding='utf-8', errors='replace') as f:
        f.write(merged_content)

    # <-------- finely split the latex file ---------->
    chatbot.append((f"Latex文件融合完成", f'[Local Message] 正在精细切分latex文件,这需要一段时间计算,文档越长耗时越长,请耐心等待。'))
    yield from update_ui(chatbot=chatbot, history=history)  # refresh UI
    lps = LatexPaperSplit()
    res = lps.split(merged_content, project_folder, opts)  # time-consuming call

    # <-------- split latex fragments that are too long ---------->
    pfg = LatexPaperFileGroup()
    for index, r in enumerate(res):
        pfg.file_paths.append('segment-' + str(index))
        pfg.file_contents.append(r)

    pfg.run_file_split(max_token_limit=1024)
    n_split = len(pfg.sp_file_contents)

    # <-------- switch prompts as needed ---------->
    inputs_array, sys_prompt_array = switch_prompt(pfg, mode)
    inputs_show_user_array = [f"{mode} {f}" for f in pfg.sp_file_tag]

    if os.path.exists(pj(project_folder,'temp.pkl')):

        # <-------- [debug only] if a debug cache file exists, skip the GPT request step ---------->
        pfg = objload(file=pj(project_folder,'temp.pkl'))

    else:
        # <-------- multi-threaded GPT requests ---------->
        gpt_response_collection = yield from request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
            inputs_array=inputs_array,
            inputs_show_user_array=inputs_show_user_array,
            llm_kwargs=llm_kwargs,
            chatbot=chatbot,
            history_array=[[""] for _ in range(n_split)],
            sys_prompt_array=sys_prompt_array,
            # max_workers=5,  # cap on parallel tasks: at most 5 run at once, the rest queue
            scroller_max_len = 40
        )

        # <-------- reassemble the text fragments into complete tex snippets ---------->
        pfg.sp_file_result = []
        for i_say, gpt_say, orig_content in zip(gpt_response_collection[0::2], gpt_response_collection[1::2], pfg.sp_file_contents):
            pfg.sp_file_result.append(gpt_say)
        pfg.merge_result()

        # <-------- temporary cache for debugging ---------->
        pfg.get_token_num = None
        objdump(pfg, file=pj(project_folder,'temp.pkl'))

    write_html(pfg.sp_file_contents, pfg.sp_file_result, chatbot=chatbot, project_folder=project_folder)

    # <-------- write out the file ---------->
    msg = f"当前大语言模型: {llm_kwargs['llm_model']},当前语言模型温度设定: {llm_kwargs['temperature']}。"
    final_tex = lps.merge_result(pfg.file_result, mode, msg)
    with open(project_folder + f'/merge_{mode}.tex', 'w', encoding='utf-8', errors='replace') as f:
        if mode != 'translate_zh' or "binary" in final_tex: f.write(final_tex)

    # <-------- tidy up results, exit ---------->
    chatbot.append((f"完成了吗?", 'GPT结果已输出, 正在编译PDF'))
    yield from update_ui(chatbot=chatbot, history=history)  # refresh UI

    # <-------- return ---------->
    return project_folder + f'/merge_{mode}.tex'

def remove_buggy_lines(file_path, log_path, tex_name, tex_name_pure, n_fix, work_folder_modified):
    try:
        with open(log_path, 'r', encoding='utf-8', errors='replace') as f:
            log = f.read()
        with open(file_path, 'r', encoding='utf-8', errors='replace') as f:
            file_lines = f.readlines()
        import re
        buggy_lines = re.findall(tex_name+':([0-9]{1,5}):', log)
        buggy_lines = [int(l) for l in buggy_lines]
        buggy_lines = sorted(buggy_lines)
        print("removing lines that have errors", buggy_lines)
        file_lines.pop(buggy_lines[0]-1)
        with open(pj(work_folder_modified, f"{tex_name_pure}_fix_{n_fix}.tex"), 'w', encoding='utf-8', errors='replace') as f:
            f.writelines(file_lines)
        return True, f"{tex_name_pure}_fix_{n_fix}", buggy_lines
    except:
        print("Fatal error occurred, but we cannot identify the error; please download the zip, read the latex log, and compile manually.")
        return False, -1, [-1]

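`remove_buggy_lines` relies on pdflatex's `-file-line-error` option, which reports each error as `file:line: message`. The extraction of error line numbers reduces to one regex pass (the log text here is illustrative):

```python
import re

# sample log in the `file:line: message` format produced by -file-line-error (illustrative)
log = ("main.tex:12: Undefined control sequence.\n"
       "main.tex:40: Missing $ inserted.")
buggy_lines = sorted(int(n) for n in re.findall(r'main\.tex:([0-9]{1,5}):', log))
print(buggy_lines)  # [12, 40]
```

The function above then deletes only the first offending line and recompiles, letting the outer retry loop converge one error at a time.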
def compile_latex_with_timeout(command, timeout=60):
    import subprocess
    process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        stdout, stderr = process.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        process.kill()
        stdout, stderr = process.communicate()
        print("Process timed out!")
        return False
    return True

def 编译Latex(chatbot, history, main_file_original, main_file_modified, work_folder_original, work_folder_modified, work_folder, mode='default'):
    import os, time
    current_dir = os.getcwd()
    n_fix = 1
    max_try = 32
    chatbot.append([f"正在编译PDF文档", f'编译已经开始。当前工作路径为{work_folder},如果程序停顿5分钟以上,请直接去该路径下取回翻译结果,或者重启之后再度尝试 ...']); yield from update_ui(chatbot=chatbot, history=history)
    chatbot.append([f"正在编译PDF文档", '...']); yield from update_ui(chatbot=chatbot, history=history); time.sleep(1); chatbot[-1] = list(chatbot[-1])  # refresh UI
    yield from update_ui_lastest_msg('编译已经开始...', chatbot, history)  # refresh the Gradio front-end

    while True:
        import os

        # https://stackoverflow.com/questions/738755/dont-make-me-manually-abort-a-latex-compile-when-theres-an-error
        yield from update_ui_lastest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 编译原始PDF ...', chatbot, history)  # refresh the Gradio front-end
        os.chdir(work_folder_original); ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error {main_file_original}.tex'); os.chdir(current_dir)

        yield from update_ui_lastest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 编译转化后的PDF ...', chatbot, history)  # refresh the Gradio front-end
        os.chdir(work_folder_modified); ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error {main_file_modified}.tex'); os.chdir(current_dir)

        if ok and os.path.exists(pj(work_folder_modified, f'{main_file_modified}.pdf')):
            # only proceed with the steps below if the second compile succeeded
            yield from update_ui_lastest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 编译BibTex ...', chatbot, history)  # refresh the Gradio front-end
            if not os.path.exists(pj(work_folder_original, f'{main_file_original}.bbl')):
                os.chdir(work_folder_original); ok = compile_latex_with_timeout(f'bibtex {main_file_original}.aux'); os.chdir(current_dir)
            if not os.path.exists(pj(work_folder_modified, f'{main_file_modified}.bbl')):
                os.chdir(work_folder_modified); ok = compile_latex_with_timeout(f'bibtex {main_file_modified}.aux'); os.chdir(current_dir)

            yield from update_ui_lastest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 编译文献交叉引用 ...', chatbot, history)  # refresh the Gradio front-end
            os.chdir(work_folder_original); ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error {main_file_original}.tex'); os.chdir(current_dir)
            os.chdir(work_folder_modified); ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error {main_file_modified}.tex'); os.chdir(current_dir)
            os.chdir(work_folder_original); ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error {main_file_original}.tex'); os.chdir(current_dir)
            os.chdir(work_folder_modified); ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error {main_file_modified}.tex'); os.chdir(current_dir)

            if mode!='translate_zh':
                yield from update_ui_lastest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 使用latexdiff生成论文转化前后对比 ...', chatbot, history)  # refresh the Gradio front-end
                print( f'latexdiff --encoding=utf8 --append-safecmd=subfile {work_folder_original}/{main_file_original}.tex {work_folder_modified}/{main_file_modified}.tex --flatten > {work_folder}/merge_diff.tex')
                ok = compile_latex_with_timeout(f'latexdiff --encoding=utf8 --append-safecmd=subfile {work_folder_original}/{main_file_original}.tex {work_folder_modified}/{main_file_modified}.tex --flatten > {work_folder}/merge_diff.tex')

                yield from update_ui_lastest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 正在编译对比PDF ...', chatbot, history)  # refresh the Gradio front-end
                os.chdir(work_folder); ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error merge_diff.tex'); os.chdir(current_dir)
                os.chdir(work_folder); ok = compile_latex_with_timeout(f'bibtex merge_diff.aux'); os.chdir(current_dir)
                os.chdir(work_folder); ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error merge_diff.tex'); os.chdir(current_dir)
                os.chdir(work_folder); ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error merge_diff.tex'); os.chdir(current_dir)

        # <--------------------->
        os.chdir(current_dir)

        # <---------- check results ----------->
        results_ = ""
        original_pdf_success = os.path.exists(pj(work_folder_original, f'{main_file_original}.pdf'))
        modified_pdf_success = os.path.exists(pj(work_folder_modified, f'{main_file_modified}.pdf'))
        diff_pdf_success = os.path.exists(pj(work_folder, f'merge_diff.pdf'))
        results_ += f"原始PDF编译是否成功: {original_pdf_success};"
        results_ += f"转化PDF编译是否成功: {modified_pdf_success};"
        results_ += f"对比PDF编译是否成功: {diff_pdf_success};"
        yield from update_ui_lastest_msg(f'第{n_fix}编译结束:<br/>{results_}...', chatbot, history)  # refresh the Gradio front-end

        if diff_pdf_success:
            result_pdf = pj(work_folder_modified, f'merge_diff.pdf')  # get pdf path
            promote_file_to_downloadzone(result_pdf, rename_file=None, chatbot=chatbot)  # promote file to web UI
        if modified_pdf_success:
            yield from update_ui_lastest_msg(f'转化PDF编译已经成功, 即将退出 ...', chatbot, history)  # refresh the Gradio front-end
            result_pdf = pj(work_folder_modified, f'{main_file_modified}.pdf')  # get pdf path
            if os.path.exists(pj(work_folder, '..', 'translation')):
                shutil.copyfile(result_pdf, pj(work_folder, '..', 'translation', 'translate_zh.pdf'))
            promote_file_to_downloadzone(result_pdf, rename_file=None, chatbot=chatbot)  # promote file to web UI
            return True  # success
        else:
            if n_fix>=max_try: break
            n_fix += 1
            can_retry, main_file_modified, buggy_lines = remove_buggy_lines(
                file_path=pj(work_folder_modified, f'{main_file_modified}.tex'),
                log_path=pj(work_folder_modified, f'{main_file_modified}.log'),
                tex_name=f'{main_file_modified}.tex',
                tex_name_pure=f'{main_file_modified}',
                n_fix=n_fix,
                work_folder_modified=work_folder_modified,
            )
            yield from update_ui_lastest_msg(f'由于最为关键的转化PDF编译失败, 将根据报错信息修正tex源文件并重试, 当前报错的latex代码处于第{buggy_lines}行 ...', chatbot, history)  # refresh the Gradio front-end
            if not can_retry: break

    os.chdir(current_dir)
    return False  # failure

crazy_functions/对话历史存档.py
CHANGED
@@ -1,4 +1,4 @@
-from toolbox import CatchException, update_ui
+from toolbox import CatchException, update_ui, promote_file_to_downloadzone
 from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
 import re
 
@@ -29,9 +29,8 @@ def write_chat_to_file(chatbot, history=None, file_name=None):
     for h in history:
         f.write("\n>>>" + h)
     f.write('</code>')
-
-
-    return res
+    promote_file_to_downloadzone(f'./gpt_log/{file_name}', rename_file=file_name, chatbot=chatbot)
+    return '对话历史写入:' + os.path.abspath(f'./gpt_log/{file_name}')
 
 def gen_file_preview(file_name):
     try:
crazy_functions/数学动画生成manim.py
CHANGED
@@ -8,7 +8,7 @@ def inspect_dependency(chatbot, history):
         import manim
         return True
     except:
-        chatbot.append(["导入依赖失败", "使用该模块需要额外依赖,安装方法:```pip install manimgl```"])
+        chatbot.append(["导入依赖失败", "使用该模块需要额外依赖,安装方法:```pip install manim manimgl```"])
         yield from update_ui(chatbot=chatbot, history=history)  # refresh UI
         return False
 
crazy_functions/理解PDF文档内容.py
CHANGED
@@ -13,7 +13,9 @@ def 解析PDF(file_name, llm_kwargs, plugin_kwargs, chatbot, history, system_pro
     # recursively split the PDF: each chunk (ideally one complete section such as introduction or experiment, split further only when necessary)
     # must be shorter than 2500 tokens
     file_content, page_one = read_and_clean_pdf_text(file_name)  # (try to) split the PDF by section
-
+    file_content = file_content.encode('utf-8', 'ignore').decode()  # avoid reading non-utf8 chars
+    page_one = str(page_one).encode('utf-8', 'ignore').decode()  # avoid reading non-utf8 chars
+
     TOKEN_LIMIT_PER_FRAGMENT = 2500
 
     from .crazy_utils import breakdown_txt_to_satisfy_token_limit_for_pdf
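The added encode/decode round-trip drops any characters that cannot be represented in UTF-8, such as lone surrogates that broken PDF extraction sometimes produces. A minimal illustration:

```python
def strip_non_utf8(text: str) -> str:
    """Drop characters that cannot be encoded as UTF-8 (e.g. lone surrogates)."""
    return text.encode('utf-8', 'ignore').decode()

broken = 'intro\ud800duction'  # '\ud800' is a lone surrogate left over from bad extraction
print(strip_non_utf8(broken))  # introduction
```

Without this sanitization, writing the extracted text back out (or sending it to an API) can raise `UnicodeEncodeError`.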
crazy_functions/联网的ChatGPT_bing版.py
ADDED
@@ -0,0 +1,102 @@
from toolbox import CatchException, update_ui
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, input_clipping
import requests
from bs4 import BeautifulSoup
from request_llm.bridge_all import model_info


def bing_search(query, proxies=None):
    query = query
    url = f"https://cn.bing.com/search?q={query}"
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36'}
    response = requests.get(url, headers=headers, proxies=proxies)
    soup = BeautifulSoup(response.content, 'html.parser')
    results = []
    for g in soup.find_all('li', class_='b_algo'):
        anchors = g.find_all('a')
        if anchors:
            link = anchors[0]['href']
            if not link.startswith('http'):
                continue
            title = g.find('h2').text
            item = {'title': title, 'link': link}
            results.append(item)

    for r in results:
        print(r['link'])
    return results


def scrape_text(url, proxies) -> str:
    """Scrape text from a webpage

    Args:
        url (str): The URL to scrape text from

    Returns:
        str: The scraped text
    """
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36',
        'Content-Type': 'text/plain',
    }
    try:
        response = requests.get(url, headers=headers, proxies=proxies, timeout=8)
        if response.encoding == "ISO-8859-1": response.encoding = response.apparent_encoding
    except:
        return "无法连接到该网页"
    soup = BeautifulSoup(response.text, "html.parser")
    for script in soup(["script", "style"]):
        script.extract()
    text = soup.get_text()
    lines = (line.strip() for line in text.splitlines())
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    text = "\n".join(chunk for chunk in chunks if chunk)
    return text

@CatchException
def 连接bing搜索回答问题(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
    """
    txt             text entered by the user in the input box, e.g. a passage to translate or a path containing files to process
    llm_kwargs      GPT model parameters such as temperature and top_p, usually passed through as-is
    plugin_kwargs   plugin parameters, currently unused
    chatbot         handle of the chat display box, used to show output to the user
    history         chat history, i.e. the context so far
    system_prompt   silent system prompt for GPT
    web_port        port the application is currently running on
    """
    history = []    # clear history to avoid input overflow
    chatbot.append((f"请结合互联网信息回答以下问题:{txt}",
                    "[Local Message] 请注意,您正在调用一个[函数插件]的模板,该模板可以实现ChatGPT联网信息综合。该函数面向希望实现更多有趣功能的开发者,它可以作为创建新功能函数的模板。您若希望分享新的功能模组,请不吝PR!"))
    yield from update_ui(chatbot=chatbot, history=history)  # refresh the UI early, since the GPT request will take a while

    # ------------- < step 1: crawl search-engine results > -------------
    from toolbox import get_conf
    proxies, = get_conf('proxies')
    urls = bing_search(txt, proxies)
    history = []

    # ------------- < step 2: visit each page in turn > -------------
    max_search_result = 8   # collect at most this many pages
    for index, url in enumerate(urls[:max_search_result]):
        res = scrape_text(url['link'], proxies)
        history.extend([f"第{index}份搜索结果:", res])
        chatbot.append([f"第{index}份搜索结果:", res[:500]+"......"])
        yield from update_ui(chatbot=chatbot, history=history)  # refresh the UI early, since the GPT request will take a while

    # ------------- < step 3: let ChatGPT synthesize > -------------
    i_say = f"从以上搜索结果中抽取信息,然后回答问题:{txt}"
    i_say, history = input_clipping(    # clip the input starting from the longest entries to avoid blowing the token budget
        inputs=i_say,
        history=history,
        max_token_limit=model_info[llm_kwargs['llm_model']]['max_token']*3//4
    )
    gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
        inputs=i_say, inputs_show_user=i_say,
        llm_kwargs=llm_kwargs, chatbot=chatbot, history=history,
        sys_prompt="请从给定的若干条搜索结果中抽取信息,对最相关的两个搜索结果进行总结,然后回答问题。"
    )
    chatbot[-1] = (i_say, gpt_say)
    history.append(i_say);history.append(gpt_say)
    yield from update_ui(chatbot=chatbot, history=history)  # refresh UI

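`scrape_text` strips `<script>` and `<style>` nodes with BeautifulSoup before collecting visible text. The same skip-script/style idea can be sketched with only the standard library, in case bs4 is unavailable:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0          # depth counter for script/style nesting
    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

p = TextExtractor()
p.feed('<html><style>b{color:red}</style><p>Hello</p><script>x=1</script><p>World</p></html>')
print(' '.join(p.chunks))  # Hello World
```

BeautifulSoup's `script.extract()` plus `soup.get_text()` does the same job more robustly against malformed markup, which is why the plugin uses it.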
crazy_functions/虚空终端.py
ADDED
@@ -0,0 +1,131 @@
from toolbox import CatchException, update_ui, gen_time_str
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from .crazy_utils import input_clipping


prompt = """
I have to achieve some functionalities by calling one of the functions below.
Your job is to find the correct function to use to satisfy my requirement,
and then write python code to call this function with correct parameters.

These are functions you are allowed to choose from:
1.
    功能描述: 总结音视频内容
    调用函数: ConcludeAudioContent(txt, llm_kwargs)
    参数说明:
        txt: 音频文件的路径
        llm_kwargs: 模型参数, 永远给定None
2.
    功能描述: 将每次对话记录写入Markdown格式的文件中
    调用函数: WriteMarkdown()
3.
    功能描述: 将指定目录下的PDF文件从英文翻译成中文
    调用函数: BatchTranslatePDFDocuments_MultiThreaded(txt, llm_kwargs)
    参数说明:
        txt: PDF文件所在的路径
        llm_kwargs: 模型参数, 永远给定None
4.
    功能描述: 根据文本使用GPT模型生成相应的图像
    调用函数: ImageGeneration(txt, llm_kwargs)
    参数说明:
        txt: 图像生成所用到的提示文本
        llm_kwargs: 模型参数, 永远给定None
5.
    功能描述: 对输入的word文档进行摘要生成
    调用函数: SummarizingWordDocuments(input_path, output_path)
    参数说明:
        input_path: 待处理的word文档路径
        output_path: 摘要生成后的文档路径


You should always answer with the following format:
----------------
Code:
```
class AutoAcademic(object):
    def __init__(self):
        self.selected_function = "FILL_CORRECT_FUNCTION_HERE"  # e.g., "GenerateImage"
        self.txt = "FILL_MAIN_PARAMETER_HERE"  # e.g., "荷叶上的蜻蜓"
        self.llm_kwargs = None
```
Explanation:
只有GenerateImage和生成图像相关, 因此选择GenerateImage函数。
----------------

Now, this is my requirement:

"""
def get_fn_lib():
    return {
        "BatchTranslatePDFDocuments_MultiThreaded": ("crazy_functions.批量翻译PDF文档_多线程", "批量翻译PDF文档"),
        "SummarizingWordDocuments": ("crazy_functions.总结word文档", "总结word文档"),
        "ImageGeneration": ("crazy_functions.图片生成", "图片生成"),
        "TranslateMarkdownFromEnglishToChinese": ("crazy_functions.批量Markdown翻译", "Markdown中译英"),
        "SummaryAudioVideo": ("crazy_functions.总结音视频", "总结音视频"),
    }

def inspect_dependency(chatbot, history):
    return True

def eval_code(code, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
    import subprocess, sys, os, shutil, importlib

    with open('gpt_log/void_terminal_runtime.py', 'w', encoding='utf8') as f:
        f.write(code)

    try:
        AutoAcademic = getattr(importlib.import_module('gpt_log.void_terminal_runtime', 'AutoAcademic'), 'AutoAcademic')
        # importlib.reload(AutoAcademic)
        auto_dict = AutoAcademic()
        selected_function = auto_dict.selected_function
        txt = auto_dict.txt
        fp, fn = get_fn_lib()[selected_function]
        fn_plugin = getattr(importlib.import_module(fp, fn), fn)
        yield from fn_plugin(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port)
    except:
        from toolbox import trimmed_format_exc
        chatbot.append(["执行错误", f"\n```\n{trimmed_format_exc()}\n```\n"])
        yield from update_ui(chatbot=chatbot, history=history)  # refresh UI

def get_code_block(reply):
    import re
    pattern = r"```([\s\S]*?)```"  # regex pattern to match code blocks
    matches = re.findall(pattern, reply)  # find all code blocks in text
    if len(matches) != 1:
        raise RuntimeError("GPT is not generating proper code.")
    return matches[0].strip('python')  # code block
@CatchException
|
99 |
+
def 终端(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
|
100 |
+
"""
|
101 |
+
txt 输入栏用户输入的文本, 例如需要翻译的一段话, 再例如一个包含了待处理文件的路径
|
102 |
+
llm_kwargs gpt模型参数, 如温度和top_p等, 一般原样传递下去就行
|
103 |
+
plugin_kwargs 插件模型的参数, 暂时没有用武之地
|
104 |
+
chatbot 聊天显示框的句柄, 用于显示给用户
|
105 |
+
history 聊天历史, 前情提要
|
106 |
+
system_prompt 给gpt的静默提醒
|
107 |
+
web_port 当前软件运行的端口号
|
108 |
+
"""
|
109 |
+
# 清空历史, 以免输入溢出
|
110 |
+
history = []
|
111 |
+
|
112 |
+
# 基本信息:功能、贡献者
|
113 |
+
chatbot.append(["函数插件功能?", "根据自然语言执行插件命令, 作者: binary-husky, 插件初始化中 ..."])
|
114 |
+
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
|
115 |
+
|
116 |
+
# # 尝试导入依赖, 如果缺少依赖, 则给出安装建议
|
117 |
+
# dep_ok = yield from inspect_dependency(chatbot=chatbot, history=history) # 刷新界面
|
118 |
+
# if not dep_ok: return
|
119 |
+
|
120 |
+
# 输入
|
121 |
+
i_say = prompt + txt
|
122 |
+
# 开始
|
123 |
+
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
|
124 |
+
inputs=i_say, inputs_show_user=txt,
|
125 |
+
llm_kwargs=llm_kwargs, chatbot=chatbot, history=[],
|
126 |
+
sys_prompt=""
|
127 |
+
)
|
128 |
+
|
129 |
+
# 将代码转为动画
|
130 |
+
code = get_code_block(gpt_say)
|
131 |
+
yield from eval_code(code, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port)
|
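The flow above — ask GPT for a tiny class, pull the single fenced code block out of the reply, then dispatch to the chosen plugin — can be sketched without the UI machinery. This is an illustrative stand-in, not project code: it mirrors `get_code_block` for extraction but uses a scratch `exec()` namespace instead of writing to `gpt_log/` and importing.

```python
import re

def extract_single_code_block(reply: str) -> str:
    """Return the body of the single fenced code block in `reply`.

    Mirrors get_code_block: exactly one ```...``` block is expected,
    and a leading 'python' language tag is stripped.
    """
    matches = re.findall(r"```([\s\S]*?)```", reply)
    if len(matches) != 1:
        raise RuntimeError("expected exactly one fenced code block")
    return matches[0].removeprefix("python").strip("\n")

def dispatch(code: str, fn_lib: dict):
    """Execute the generated class and look up the selected plugin.

    Simplified stand-in for eval_code: exec() the code in a scratch
    namespace, then read the attributes the prompt asks GPT to fill in.
    """
    ns = {}
    exec(code, ns)                      # defines class AutoAcademic
    auto = ns["AutoAcademic"]()
    return fn_lib[auto.selected_function], auto.txt

reply = (
    "Code:\n"
    "```python\n"
    "class AutoAcademic(object):\n"
    "    def __init__(self):\n"
    "        self.selected_function = 'ImageGeneration'\n"
    "        self.txt = 'a dragonfly on a lotus leaf'\n"
    "```\n"
)
code = extract_single_code_block(reply)
target, arg = dispatch(code, {"ImageGeneration": "crazy_functions.图片生成"})
```

The `len(matches) != 1` check is what makes the protocol robust: a reply with zero or several fenced blocks is rejected outright instead of guessing which block to execute.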
docker-compose.yml
CHANGED
@@ -103,3 +103,30 @@ services:
       echo '[jittorllms] pulling the latest code from github ...' &&
       git --git-dir=request_llm/jittorllms/.git --work-tree=request_llm/jittorllms pull --force &&
       python3 -u main.py"
+
+
+## ===================================================
+## [Option 4] chatgpt + Latex
+## ===================================================
+version: '3'
+services:
+  gpt_academic_with_latex:
+    image: ghcr.io/binary-husky/gpt_academic_with_latex:master
+    environment:
+      # see `config.py` for all configuration options
+      API_KEY:                  ' sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx '
+      USE_PROXY:                ' True '
+      proxies:                  ' { "http": "socks5h://localhost:10880", "https": "socks5h://localhost:10880", } '
+      LLM_MODEL:                ' gpt-3.5-turbo '
+      AVAIL_LLM_MODELS:         ' ["gpt-3.5-turbo", "gpt-4"] '
+      LOCAL_MODEL_DEVICE:       ' cuda '
+      DEFAULT_WORKER_NUM:       ' 10 '
+      WEB_PORT:                 ' 12303 '
+
+    # share the host's network
+    network_mode: "host"
+
+    # do not pull the latest code through a proxy network
+    command: >
+      bash -c "python3 -u main.py"
+
docs/Dockerfile+NoLocal+Latex
ADDED
@@ -0,0 +1,27 @@
+# This Dockerfile builds an environment without local models; to use local models such as chatglm, see docs/Dockerfile+ChatGLM
+# - 1 edit `config.py`
+# - 2 build:  docker build -t gpt-academic-nolocal-latex -f docs/Dockerfile+NoLocal+Latex .
+# - 3 run:    docker run -v /home/fuqingxu/arxiv_cache:/root/arxiv_cache --rm -it --net=host gpt-academic-nolocal-latex
+
+FROM fuqingxu/python311_texlive_ctex:latest
+
+# working directory
+WORKDIR /gpt
+
+ARG useProxyNetwork=''
+
+RUN $useProxyNetwork pip3 install gradio openai numpy arxiv rich -i https://pypi.douban.com/simple/
+RUN $useProxyNetwork pip3 install colorama Markdown pygments pymupdf -i https://pypi.douban.com/simple/
+
+# copy the project files
+COPY . .
+
+
+# install dependencies
+RUN $useProxyNetwork pip3 install -r requirements.txt -i https://pypi.douban.com/simple/
+
+# optional step: warm up the modules
+RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'
+
+# start the application
+CMD ["python3", "-u", "main.py"]
docs/GithubAction+NoLocal+Latex
ADDED
@@ -0,0 +1,25 @@
+# This Dockerfile builds an environment without local models; to use local models such as chatglm, see docs/Dockerfile+ChatGLM
+# - 1 edit `config.py`
+# - 2 build:  docker build -t gpt-academic-nolocal-latex -f docs/Dockerfile+NoLocal+Latex .
+# - 3 run:    docker run -v /home/fuqingxu/arxiv_cache:/root/arxiv_cache --rm -it --net=host gpt-academic-nolocal-latex
+
+FROM fuqingxu/python311_texlive_ctex:latest
+
+# working directory
+WORKDIR /gpt
+
+RUN pip3 install gradio openai numpy arxiv rich
+RUN pip3 install colorama Markdown pygments pymupdf
+
+# copy the project files
+COPY . .
+
+
+# install dependencies
+RUN pip3 install -r requirements.txt
+
+# optional step: warm up the modules
+RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'
+
+# start the application
+CMD ["python3", "-u", "main.py"]
docs/README.md.Italian.md
CHANGED
@@ -2,11 +2,11 @@
 >
 > Durante l'installazione delle dipendenze, selezionare rigorosamente le **versioni specificate** nel file requirements.txt.
 >
-> ` pip install -r requirements.txt
+> ` pip install -r requirements.txt`
 
-# <img src="
+# <img src="logo.png" width="40" > GPT Ottimizzazione Accademica (GPT Academic)
 
-**Se ti piace questo progetto, ti preghiamo di dargli una stella. Se hai sviluppato scorciatoie accademiche o plugin funzionali più utili, non esitare ad aprire una issue o pull request. Abbiamo anche una README in [Inglese|](
+**Se ti piace questo progetto, ti preghiamo di dargli una stella. Se hai sviluppato scorciatoie accademiche o plugin funzionali più utili, non esitare ad aprire una issue o pull request. Abbiamo anche una README in [Inglese|](README_EN.md)[Giapponese|](README_JP.md)[Coreano|](https://github.com/mldljyh/ko_gpt_academic)[Russo|](README_RS.md)[Francese](README_FR.md) tradotta da questo stesso progetto.
 Per tradurre questo progetto in qualsiasi lingua con GPT, leggere e eseguire [`multi_language.py`](multi_language.py) (sperimentale).
 
 > **Nota**
@@ -17,7 +17,9 @@ Per tradurre questo progetto in qualsiasi lingua con GPT, leggere e eseguire [`m
 >
 > 3. Questo progetto è compatibile e incoraggia l'utilizzo di grandi modelli di linguaggio di produzione nazionale come chatglm, RWKV, Pangu ecc. Supporta la coesistenza di più api-key e può essere compilato nel file di configurazione come `API_KEY="openai-key1,openai-key2,api2d-key3"`. Per sostituire temporaneamente `API_KEY`, inserire `API_KEY` temporaneo nell'area di input e premere Invio per renderlo effettivo.
 
 <div align="center">
+
+Funzione | Descrizione
 --- | ---
 Correzione immediata | Supporta correzione immediata e ricerca degli errori di grammatica del documento con un solo clic
 Traduzione cinese-inglese immediata | Traduzione cinese-inglese immediata con un solo clic
@@ -41,6 +43,8 @@ Avvia il tema di gradio [scuro](https://github.com/binary-husky/chatgpt_academic
 Supporto per maggiori modelli LLM, supporto API2D | Sentirsi serviti simultaneamente da GPT3.5, GPT4, [Tsinghua ChatGLM](https://github.com/THUDM/ChatGLM-6B), [Fudan MOSS](https://github.com/OpenLMLab/MOSS) deve essere una grande sensazione, giusto?
 Ulteriori modelli LLM supportati, supporto per l'implementazione di Huggingface | Aggiunta di un'interfaccia Newbing (Nuovo Bing), introdotta la compatibilità con Tsinghua [Jittorllms](https://github.com/Jittor/JittorLLMs), [LLaMA](https://github.com/facebookresearch/llama), [RWKV](https://github.com/BlinkDL/ChatRWKV) e [PanGu-α](https://openi.org.cn/pangu/)
 Ulteriori dimostrazioni di nuove funzionalità (generazione di immagini, ecc.)... | Vedere la fine di questo documento...
+</div>
+
 
 - Nuova interfaccia (modificare l'opzione LAYOUT in `config.py` per passare dal layout a sinistra e a destra al layout superiore e inferiore)
 <div align="center">
@@ -202,11 +206,13 @@ ad esempio
 2. Plugin di funzione personalizzati
 
 Scrivi plugin di funzione personalizzati e esegui tutte le attività che desideri o non hai mai pensato di fare.
-La difficoltà di scrittura e debug dei plugin del nostro progetto è molto bassa. Se si dispone di una certa conoscenza di base di Python, è possibile realizzare la propria funzione del plugin seguendo il nostro modello. Per maggiori dettagli, consultare la [guida al plugin per funzioni]
+La difficoltà di scrittura e debug dei plugin del nostro progetto è molto bassa. Se si dispone di una certa conoscenza di base di Python, è possibile realizzare la propria funzione del plugin seguendo il nostro modello. Per maggiori dettagli, consultare la [guida al plugin per funzioni](https://github.com/binary-husky/chatgpt_academic/wiki/%E5%87%BD%E6%95%B0%E6%8F%92%E4%BB%B6%E6%8C%87%E5%8D%97).
 
 ---
 # Ultimo aggiornamento
-## Nuove funzionalità
+## Nuove funzionalità dinamiche
+
+1. Funzionalità di salvataggio della conversazione. Nell'area dei plugin della funzione, fare clic su "Salva la conversazione corrente" per salvare la conversazione corrente come file html leggibile e ripristinabile, inoltre, nell'area dei plugin della funzione (menu a discesa), fare clic su "Carica la cronologia della conversazione archiviata" per ripristinare la conversazione precedente. Suggerimento: fare clic su "Carica la cronologia della conversazione archiviata" senza specificare il file consente di visualizzare la cache degli archivi html di cronologia, fare clic su "Elimina tutti i record di cronologia delle conversazioni locali" per eliminare tutte le cache degli archivi html.
 <div align="center">
 <img src="https://user-images.githubusercontent.com/96192199/235222390-24a9acc0-680f-49f5-bc81-2f3161f1e049.png" width="500" >
 </div>
@@ -307,4 +313,4 @@ https://github.com/kaixindelele/ChatPaper
 # Altro:
 https://github.com/gradio-app/gradio
 https://github.com/fghrsh/live2d_demo
-```
+```
docs/README.md.Korean.md
CHANGED
@@ -17,7 +17,9 @@ GPT를 이용하여 프로젝트를 임의의 언어로 번역하려면 [`multi_
 >
 > 3. 이 프로젝트는 국내 언어 모델 chatglm과 RWKV, 판고 등의 시도와 호환 가능합니다. 여러 개의 api-key를 지원하며 설정 파일에 "API_KEY="openai-key1,openai-key2,api2d-key3""와 같이 작성할 수 있습니다. `API_KEY`를 임시로 변경해야하는 경우 입력 영역에 임시 `API_KEY`를 입력 한 후 엔터 키를 누르면 즉시 적용됩니다.
 
-<div align="center"
+<div align="center">
+
+기능 | 설명
 --- | ---
 원 키워드 | 원 키워드 및 논문 문법 오류를 찾는 기능 지원
 한-영 키워드 | 한-영 키워드 지원
@@ -265,4 +267,4 @@ https://github.com/kaixindelele/ChatPaper
 # 더 많은 :
 https://github.com/gradio-app/gradio
 https://github.com/fghrsh/live2d_demo
-```
+```
docs/README.md.Portuguese.md
CHANGED
@@ -2,7 +2,7 @@
 >
 > Ao instalar as dependências, por favor, selecione rigorosamente as versões **especificadas** no arquivo requirements.txt.
 >
-> `pip install -r requirements.txt
+> `pip install -r requirements.txt`
 >
 
 # <img src="logo.png" width="40" > Otimização acadêmica GPT (GPT Academic)
@@ -18,7 +18,9 @@ Para traduzir este projeto para qualquer idioma com o GPT, leia e execute [`mult
 >
 > 3. Este projeto é compatível com e incentiva o uso de modelos de linguagem nacionais, como chatglm e RWKV, Pangolin, etc. Suporta a coexistência de várias chaves de API e pode ser preenchido no arquivo de configuração como `API_KEY="openai-key1,openai-key2,api2d-key3"`. Quando precisar alterar temporariamente o `API_KEY`, basta digitar o `API_KEY` temporário na área de entrada e pressionar Enter para que ele entre em vigor.
 
 <div align="center">
+
+Funcionalidade | Descrição
 --- | ---
 Um clique de polimento | Suporte a um clique polimento, um clique encontrar erros de gramática no artigo
 Tradução chinês-inglês de um clique | Tradução chinês-inglês de um clique
@@ -216,7 +218,9 @@ Para mais detalhes, consulte o [Guia do plug-in de função.](https://github.com
 
 ---
 # Última atualização
 ## Novas funções dinâmicas.
+
+1. Função de salvamento de diálogo. Ao chamar o plug-in de função "Salvar diálogo atual", é possível salvar o diálogo atual em um arquivo html legível e reversível. Além disso, ao chamar o plug-in de função "Carregar arquivo de histórico de diálogo" no menu suspenso da área de plug-in, é possível restaurar uma conversa anterior. Dica: clicar em "Carregar arquivo de histórico de diálogo" sem especificar um arquivo permite visualizar o cache do arquivo html de histórico. Clicar em "Excluir todo o registro de histórico de diálogo local" permite excluir todo o cache de arquivo html.
 <div align="center">
 <img src="https://user-images.githubusercontent.com/96192199/235222390-24a9acc0-680f-49f5-bc81-2f3161f1e049.png" width="500" >
 </div>
@@ -317,4 +321,4 @@ https://github.com/kaixindelele/ChatPaper
 # Mais:
 https://github.com/gradio-app/gradio
 https://github.com/fghrsh/live2d_demo
-```
+```
docs/translate_english.json
CHANGED
@@ -58,6 +58,8 @@
     "连接网络回答问题": "ConnectToNetworkToAnswerQuestions",
     "联网的ChatGPT": "ChatGPTConnectedToNetwork",
     "解析任意code项目": "ParseAnyCodeProject",
+    "读取知识库作答": "ReadKnowledgeArchiveAnswerQuestions",
+    "知识库问答": "UpdateKnowledgeArchive",
     "同时问询_指定模型": "InquireSimultaneously_SpecifiedModel",
     "图片生成": "ImageGeneration",
     "test_解析ipynb文件": "Test_ParseIpynbFile",
docs/use_azure.md
ADDED
@@ -0,0 +1,152 @@
+# Requesting the OpenAI API through Microsoft's Azure cloud service
+
+Thanks to the relationship between OpenAI and Microsoft, the OpenAI API can now be accessed directly through Microsoft's Azure cloud computing service, which avoids the registration and network-access problems.
+
+The official quick-start documentation is here: [Quickstart - Get started using ChatGPT and GPT-4 with Azure OpenAI Service | Microsoft Learn](https://learn.microsoft.com/zh-cn/azure/cognitive-services/openai/chatgpt-quickstart?pivots=programming-language-python)
+
+# Applying for the API
+
+According to the "Prerequisites" section of that documentation, besides a programming environment you need three things:
+
+1. An Azure account with a subscription
+
+2. The Azure OpenAI service added to the subscription
+
+3. A deployed model
+
+## Azure account and subscription
+
+### Azure account
+
+When creating an Azure account, it helps to already have a Microsoft account, which seems to make it easier to get the free credit (200 USD for the first month; in practice, logging in to Azure with a freshly registered Microsoft account did not grant that one-month free credit).
+
+Azure accounts can be created at: [Create your Azure free account today | Microsoft Azure](https://azure.microsoft.com/zh-cn/free/)
+
+![](https://wdcdn.qpic.cn/MTY4ODg1Mjk4NzI5NTU1NQ_944786_iH6AECuZ_tY0EaBd_1685327219?w=1327\&h=695\&type=image/png)
+
+On that page, clicking "Start free" jumps to the login or registration page. If you have a Microsoft account, just log in; if not, register one on Microsoft's website first.
+
+Note that Azure's pages and policies change from time to time; whatever is actually displayed takes precedence.
+
+### Creating a subscription
+
+After registering with Azure you land on the home page:
+
+![](https://wdcdn.qpic.cn/MTY4ODg1Mjk4NzI5NTU1NQ_444847_tk-9S-pxOYuaLs_K_1685327675?w=1865\&h=969\&type=image/png)
+
+First you need to add a subscription; clicking "Subscriptions" opens the subscription page:
+
+![](https://wdcdn.qpic.cn/MTY4ODg1Mjk4NzI5NTU1NQ_612820_z_1AlaEgnJR-rUl0_1685327892?w=1865\&h=969\&type=image/png)
+
+It should be empty the first time; click "Add" to create a new subscription (either a "Free" or a "Pay-As-You-Go" one). The subscription ID is needed later when applying for Azure OpenAI.
+
+## Adding the Azure OpenAI service to the subscription
+
+Then go back to the home page and click "Azure OpenAI" to enter the OpenAI service page (if it is not shown, search for "openai" in the search bar at the top of the home page).
+
+![](https://wdcdn.qpic.cn/MTY4ODg1Mjk4NzI5NTU1NQ_269759_nExkGcPC0EuAR5cp_1685328130?w=1865\&h=969\&type=image/png)
+
+The service cannot be used yet, though. Before using it, you must apply at this address:
+
+[Request Access to Azure OpenAI Service (microsoft.com)](https://customervoice.microsoft.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR7en2Ais5pxKtso_Pz4b1_xUOFA5Qk1UWDRBMjg0WFhPMkIzTzhKQ1dWNyQlQCN0PWcu)
+
+The form has about twenty questions; fill them in according to the requirements and your actual situation.
+
+Two things to watch out for:
+
+1. Make absolutely sure the "subscription ID" is filled in correctly
+
+2. You need to provide a company email address (it does not have to be the registration email) and a company website
+
+Afterwards, back on the page above, click "Create" to enter the creation page:
+
+![](https://wdcdn.qpic.cn/MTY4ODg1Mjk4NzI5NTU1NQ_72708_9d9JYhylPVz3dFWL_1685328372?w=824\&h=590\&type=image/png)
+
+Fill in the "resource group" and "name" as needed.
+
+Once that is done, the newly created "resource" shows up under "Resources" on the home page; click it to perform the final deployment.
+
+![](https://wdcdn.qpic.cn/MTY4ODg1Mjk4NzI5NTU1NQ_871541_CGCnbgtV9Uk1Jccy_1685329861?w=1217\&h=628\&type=image/png)
+
+## Deploying a model
+
+On the resource page, before deploying a model, you can click "Develop" and write down the key and the endpoint.
+
+![](https://wdcdn.qpic.cn/MTY4ODg1Mjk4NzI5NTU1NQ_852567_dxCZOrkMlWDSLH0d_1685330736?w=856\&h=568\&type=image/png)
+
+After that you can deploy the model: click "Deploy", which jumps to Azure OpenAI Studio for the following steps:
+
+![](https://wdcdn.qpic.cn/MTY4ODg1Mjk4NzI5NTU1NQ_169225_uWs1gMhpNbnwW4h2_1685329901?w=1865\&h=969\&type=image/png)
+
+In Azure OpenAI Studio, click "Create new deployment" and the following dialog pops up:
+
+![](https://wdcdn.qpic.cn/MTY4ODg1Mjk4NzI5NTU1NQ_391255_iXUSZAzoud5qlxjJ_1685330224?w=656\&h=641\&type=image/png)
+
+Select gpt-35-turbo (or whichever model you need) and fill in a "deployment name" to finish deploying the model.
+
+![](https://wdcdn.qpic.cn/MTY4ODg1Mjk4NzI5NTU1NQ_724099_vBaHcUilsm1EtPgK_1685330396?w=1869\&h=482\&type=image/png)
+
+Write this deployment name down.
+
+The application process is now complete. You should have noted down:
+
+● the key (either key 1 or key 2)
+
+● the endpoint
+
+● the deployment name (not the model name)
+
+# Editing config.py
+
+```
+AZURE_ENDPOINT = "the endpoint goes here"
+AZURE_API_KEY = "the azure openai api key goes here"
+AZURE_API_VERSION = "2023-05-15" # the default 2023-05-15 version; no need to change
+AZURE_ENGINE = "the deployment name goes here"
+
+```
+# Using the API
+
+Next comes the actual use of the API; the official documentation is again the reference: [Quickstart - Get started using ChatGPT and GPT-4 with Azure OpenAI Service | Microsoft Learn](https://learn.microsoft.com/zh-cn/azure/cognitive-services/openai/chatgpt-quickstart?pivots=programming-language-python)
+
+It is similar to calling openai's own api — the openai library still has to be installed — but the call itself differs:
+
+```
+import openai
+openai.api_type = "azure" # fixed value; no need to change
+openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT") # the "endpoint" goes here
+openai.api_version = "2023-05-15" # fixed value; no need to change
+openai.api_key = os.getenv("AZURE_OPENAI_KEY") # "key 1" or "key 2" goes here
+
+response = openai.ChatCompletion.create(
+    engine="gpt-35-turbo", # this is the deployment name, not the model name
+    messages=[
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "Does Azure OpenAI support customer managed keys?"},
+        {"role": "assistant", "content": "Yes, customer managed keys are supported by Azure OpenAI."},
+        {"role": "user", "content": "Do other Azure Cognitive Services support this too?"}
+    ]
+)
+
+print(response)
+print(response['choices'][0]['message']['content'])
+
+```
+
+Note that:
+
+1. The value passed as `engine` is the deployment name, not the model name
+
+2. Unlike a `response` obtained from a URL via the requests library, the `response` returned by the openai library needs no decoding — it is already parsed JSON and can be read directly by key.
+
+For more detailed usage, see the official API documentation.
+
+# About cost
+
+The Azure OpenAI API is not free (a free subscription is only valid for one month); the pricing is:
+
+![image.png](https://note.youdao.com/yws/res/18095/WEBRESOURCEeba0ab6d3127b79e143ef2d5627c0e44)
+
+Details here: [Azure OpenAI Service - Pricing | Microsoft Azure](https://azure.microsoft.com/zh-cn/pricing/details/cognitive-services/openai-service/?cdn=disable)
+
+It is not the "free for a whole year" deal sometimes claimed online, but registration and network access are somewhat simpler than using openai's api directly.
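The three values the guide says to write down (endpoint, key, deployment name) map directly onto the REST route the openai library calls under the hood. A rough illustrative sketch, assuming the standard Azure OpenAI REST route shape (`/openai/deployments/{deployment}/chat/completions?api-version=...`); the resource name below is a placeholder:

```python
def azure_chat_url(endpoint: str, deployment: str, api_version: str = "2023-05-15") -> str:
    """Compose the Azure OpenAI chat-completions URL.

    `deployment` is the deployment name chosen in Azure OpenAI Studio,
    not the underlying model name.
    """
    return (f"{endpoint.rstrip('/')}/openai/deployments/{deployment}"
            f"/chat/completions?api-version={api_version}")

url = azure_chat_url("https://my-resource.openai.azure.com/", "gpt-35-turbo")
# the key travels in the 'api-key' request header, not in the URL
headers = {"api-key": "<AZURE_API_KEY>", "Content-Type": "application/json"}
```

This also makes note 1 above concrete: the model name never appears in the request at all — only the deployment name does.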
request_llm/bridge_all.py
CHANGED
@@ -16,6 +16,9 @@ from toolbox import get_conf, trimmed_format_exc
 from .bridge_chatgpt import predict_no_ui_long_connection as chatgpt_noui
 from .bridge_chatgpt import predict as chatgpt_ui
 
+from .bridge_azure_test import predict_no_ui_long_connection as azure_noui
+from .bridge_azure_test import predict as azure_ui
+
 from .bridge_chatglm import predict_no_ui_long_connection as chatglm_noui
 from .bridge_chatglm import predict as chatglm_ui
 
@@ -83,6 +86,33 @@ model_info = {
         "tokenizer": tokenizer_gpt35,
         "token_cnt": get_token_num_gpt35,
     },
+
+    "gpt-3.5-turbo-16k": {
+        "fn_with_ui": chatgpt_ui,
+        "fn_without_ui": chatgpt_noui,
+        "endpoint": openai_endpoint,
+        "max_token": 1024*16,
+        "tokenizer": tokenizer_gpt35,
+        "token_cnt": get_token_num_gpt35,
+    },
+
+    "gpt-3.5-turbo-0613": {
+        "fn_with_ui": chatgpt_ui,
+        "fn_without_ui": chatgpt_noui,
+        "endpoint": openai_endpoint,
+        "max_token": 4096,
+        "tokenizer": tokenizer_gpt35,
+        "token_cnt": get_token_num_gpt35,
+    },
+
+    "gpt-3.5-turbo-16k-0613": {
+        "fn_with_ui": chatgpt_ui,
+        "fn_without_ui": chatgpt_noui,
+        "endpoint": openai_endpoint,
+        "max_token": 1024 * 16,
+        "tokenizer": tokenizer_gpt35,
+        "token_cnt": get_token_num_gpt35,
+    },
 
     "gpt-4": {
         "fn_with_ui": chatgpt_ui,
@@ -93,6 +123,16 @@ model_info = {
         "token_cnt": get_token_num_gpt4,
     },
 
+    # azure openai
+    "azure-gpt35": {
+        "fn_with_ui": azure_ui,
+        "fn_without_ui": azure_noui,
+        "endpoint": get_conf("AZURE_ENDPOINT"),
+        "max_token": 4096,
+        "tokenizer": tokenizer_gpt35,
+        "token_cnt": get_token_num_gpt35,
+    },
+
     # api_2d
     "api2d-gpt-3.5-turbo": {
         "fn_with_ui": chatgpt_ui,
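The `model_info` entries added above all share one shape, which is what lets the bridge route requests by plain dictionary lookup. A stripped-down sketch of that dispatch (the handler functions are stand-ins for the real `predict` implementations, not project code):

```python
# minimal stand-ins for the real no-UI predict functions
def chatgpt_noui(inputs, **kw):
    return f"chatgpt:{inputs}"

def azure_noui(inputs, **kw):
    return f"azure:{inputs}"

# registry keyed by model name, mirroring the shape of model_info
model_info = {
    "gpt-3.5-turbo-16k": {"fn_without_ui": chatgpt_noui, "max_token": 1024 * 16},
    "azure-gpt35":       {"fn_without_ui": azure_noui,   "max_token": 4096},
}

def predict_no_ui(model: str, inputs: str) -> str:
    """Route a request to the handler registered for `model`."""
    info = model_info[model]          # KeyError here means an unknown model name
    return info["fn_without_ui"](inputs)

out = predict_no_ui("azure-gpt35", "hello")
```

Adding a new backend then only requires registering one more entry; no call site changes.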
request_llm/bridge_azure_test.py
ADDED
@@ -0,0 +1,241 @@
+"""
+This file mainly contains three functions.
+
+Functions without multi-threading capability:
+1. predict: used for normal conversations; full interactive features; not thread-safe
+
+Functions with multi-threading capability:
+2. predict_no_ui: called by advanced experimental feature modules; no real-time display in the UI; simple parameters; can run in parallel threads, which makes complex feature logic easy to implement
+3. predict_no_ui_long_connection: experiments showed that the connection to openai tends to drop when predict_no_ui handles long documents; this function solves that problem with streaming, and also supports multi-threading
+"""
+
+import logging
+import traceback
+import importlib
+import openai
+import time
+
+
+# read the AZURE OPENAI API settings from config.py
+from toolbox import get_conf, update_ui, clip_history, trimmed_format_exc
+TIMEOUT_SECONDS, MAX_RETRY, AZURE_ENGINE, AZURE_ENDPOINT, AZURE_API_VERSION, AZURE_API_KEY = \
+    get_conf('TIMEOUT_SECONDS', 'MAX_RETRY', "AZURE_ENGINE", "AZURE_ENDPOINT", "AZURE_API_VERSION", "AZURE_API_KEY")
+
+
+def get_full_error(chunk, stream_response):
+    """
+    Get the complete error message returned by Openai.
+    """
+    while True:
+        try:
+            chunk += next(stream_response)
+        except:
+            break
+    return chunk
+
+def predict(inputs, llm_kwargs, plugin_kwargs, chatbot, history=[], system_prompt='', stream = True, additional_fn=None):
+    """
+    Send a request to the azure openai api and fetch the output as a stream.
+    Used for the basic conversation feature.
+    inputs              the input of the current query
+    top_p, temperature  internal tuning parameters of chatGPT
+    history             the list of previous messages (note that overly long inputs or history will trigger a token-overflow error)
+    chatbot             the conversation list shown in the WebUI; modify it and yield to update the UI directly
+    additional_fn       which button was clicked; see functional.py
+    """
+    print(llm_kwargs["llm_model"])
+
+    if additional_fn is not None:
+        import core_functional
+        importlib.reload(core_functional)    # hot-reload the prompt
+        core_functional = core_functional.get_core_functions()
+        if "PreProcess" in core_functional[additional_fn]: inputs = core_functional[additional_fn]["PreProcess"](inputs)  # apply the pre-processing function (if any)
+        inputs = core_functional[additional_fn]["Prefix"] + inputs + core_functional[additional_fn]["Suffix"]
+
+    raw_input = inputs
+    logging.info(f'[raw_input] {raw_input}')
+    chatbot.append((inputs, ""))
+    yield from update_ui(chatbot=chatbot, history=history, msg="waiting for response")  # refresh the UI
+
+
+    payload = generate_azure_payload(inputs, llm_kwargs, history, system_prompt, stream)
+
+    history.append(inputs); history.append("")
+
+    retry = 0
+    while True:
+        try:
+
+            openai.api_type = "azure"
+            openai.api_version = AZURE_API_VERSION
+            openai.api_base = AZURE_ENDPOINT
+            openai.api_key = AZURE_API_KEY
+            response = openai.ChatCompletion.create(timeout=TIMEOUT_SECONDS, **payload); break
+
+        except:
+            retry += 1
+            chatbot[-1] = ((chatbot[-1][0], "failed to get a response, retrying ..."))
+            retry_msg = f", retrying ({retry}/{MAX_RETRY}) ..." if MAX_RETRY > 0 else ""
+            yield from update_ui(chatbot=chatbot, history=history, msg="request timed out" + retry_msg)  # refresh the UI
+            if retry > MAX_RETRY: raise TimeoutError
+
+    gpt_replying_buffer = ""
+    is_head_of_the_stream = True
+    if stream:
+
+        stream_response = response
+
+        while True:
+            try:
+                chunk = next(stream_response)
+
+            except StopIteration:
from toolbox import regular_txt_to_markdown; tb_str = '```\n' + trimmed_format_exc() + '```'
|
94 |
+
chatbot[-1] = (chatbot[-1][0], f"[Local Message] 远程返回错误: \n\n{tb_str} \n\n{regular_txt_to_markdown(chunk)}")
|
95 |
+
yield from update_ui(chatbot=chatbot, history=history, msg="远程返回错误:" + chunk) # 刷新界面
|
96 |
+
return
|
97 |
+
|
98 |
+
if is_head_of_the_stream and (r'"object":"error"' not in chunk):
|
99 |
+
# 数据流的第一帧不携带content
|
100 |
+
is_head_of_the_stream = False; continue
|
101 |
+
|
102 |
+
if chunk:
|
103 |
+
#print(chunk)
|
104 |
+
try:
|
105 |
+
if "delta" in chunk["choices"][0]:
|
106 |
+
if chunk["choices"][0]["finish_reason"] == "stop":
|
107 |
+
logging.info(f'[response] {gpt_replying_buffer}')
|
108 |
+
break
|
109 |
+
status_text = f"finish_reason: {chunk['choices'][0]['finish_reason']}"
|
110 |
+
gpt_replying_buffer = gpt_replying_buffer + chunk["choices"][0]["delta"]["content"]
|
111 |
+
|
112 |
+
history[-1] = gpt_replying_buffer
|
113 |
+
chatbot[-1] = (history[-2], history[-1])
|
114 |
+
yield from update_ui(chatbot=chatbot, history=history, msg=status_text) # 刷新界面
|
115 |
+
|
116 |
+
except Exception as e:
|
117 |
+
traceback.print_exc()
|
118 |
+
yield from update_ui(chatbot=chatbot, history=history, msg="Json解析不合常规") # 刷新界面
|
119 |
+
chunk = get_full_error(chunk, stream_response)
|
120 |
+
|
121 |
+
error_msg = chunk
|
122 |
+
yield from update_ui(chatbot=chatbot, history=history, msg="Json异常" + error_msg) # 刷新界面
|
123 |
+
return
|
124 |
+
|
125 |
+
|
126 |
+
def predict_no_ui_long_connection(inputs, llm_kwargs, history=[], sys_prompt="", observe_window=None, console_slience=False):
|
127 |
+
"""
|
128 |
+
发送至AZURE OPENAI API,等待回复,一次性完成,不显示中间过程。但内部用stream的方法避免中途网线被掐。
|
129 |
+
inputs:
|
130 |
+
是本次问询的输入
|
131 |
+
sys_prompt:
|
132 |
+
系统静默prompt
|
133 |
+
llm_kwargs:
|
134 |
+
chatGPT的内部调优参数
|
135 |
+
history:
|
136 |
+
是之前的对话列表
|
137 |
+
observe_window = None:
|
138 |
+
用于负责跨越线程传递已经输出的部分,大部分时候仅仅为了fancy的视觉效果,留空即可。observe_window[0]:观测窗。observe_window[1]:看门狗
|
139 |
+
"""
|
140 |
+
watch_dog_patience = 5 # 看门狗的耐心, 设置5秒即可
|
141 |
+
payload = generate_azure_payload(inputs, llm_kwargs, history, system_prompt=sys_prompt, stream=True)
|
142 |
+
retry = 0
|
143 |
+
while True:
|
144 |
+
|
145 |
+
try:
|
146 |
+
openai.api_type = "azure"
|
147 |
+
openai.api_version = AZURE_API_VERSION
|
148 |
+
openai.api_base = AZURE_ENDPOINT
|
149 |
+
openai.api_key = AZURE_API_KEY
|
150 |
+
response = openai.ChatCompletion.create(timeout=TIMEOUT_SECONDS, **payload);break
|
151 |
+
|
152 |
+
except:
|
153 |
+
retry += 1
|
154 |
+
traceback.print_exc()
|
155 |
+
if retry > MAX_RETRY: raise TimeoutError
|
156 |
+
if MAX_RETRY!=0: print(f'请求超时,正在重试 ({retry}/{MAX_RETRY}) ……')
|
157 |
+
|
158 |
+
|
159 |
+
stream_response = response
|
160 |
+
result = ''
|
161 |
+
while True:
|
162 |
+
try: chunk = next(stream_response)
|
163 |
+
except StopIteration:
|
164 |
+
break
|
165 |
+
except:
|
166 |
+
chunk = next(stream_response) # 失败了,重试一次?再失败就没办法了。
|
167 |
+
|
168 |
+
if len(chunk)==0: continue
|
169 |
+
if not chunk.startswith('data:'):
|
170 |
+
error_msg = get_full_error(chunk, stream_response)
|
171 |
+
if "reduce the length" in error_msg:
|
172 |
+
raise ConnectionAbortedError("AZURE OPENAI API拒绝了请求:" + error_msg)
|
173 |
+
else:
|
174 |
+
raise RuntimeError("AZURE OPENAI API拒绝了请求:" + error_msg)
|
175 |
+
if ('data: [DONE]' in chunk): break
|
176 |
+
|
177 |
+
delta = chunk["delta"]
|
178 |
+
if len(delta) == 0: break
|
179 |
+
if "role" in delta: continue
|
180 |
+
if "content" in delta:
|
181 |
+
result += delta["content"]
|
182 |
+
if not console_slience: print(delta["content"], end='')
|
183 |
+
if observe_window is not None:
|
184 |
+
# 观测窗,把已经获取的数据显示出去
|
185 |
+
if len(observe_window) >= 1: observe_window[0] += delta["content"]
|
186 |
+
# 看门狗,如果超过期限没有喂狗,则终止
|
187 |
+
if len(observe_window) >= 2:
|
188 |
+
if (time.time()-observe_window[1]) > watch_dog_patience:
|
189 |
+
raise RuntimeError("用户取消了程序。")
|
190 |
+
else: raise RuntimeError("意外Json结构:"+delta)
|
191 |
+
if chunk['finish_reason'] == 'length':
|
192 |
+
raise ConnectionAbortedError("正常结束,但显示Token不足,导致输出不完整,请削减单次输入的文本量。")
|
193 |
+
return result
|
194 |
+
|
195 |
+
|
196 |
+
def generate_azure_payload(inputs, llm_kwargs, history, system_prompt, stream):
|
197 |
+
"""
|
198 |
+
整合所有信息,选择LLM模型,生成 azure openai api请求,为发送请求做准备
|
199 |
+
"""
|
200 |
+
|
201 |
+
conversation_cnt = len(history) // 2
|
202 |
+
|
203 |
+
messages = [{"role": "system", "content": system_prompt}]
|
204 |
+
if conversation_cnt:
|
205 |
+
for index in range(0, 2*conversation_cnt, 2):
|
206 |
+
what_i_have_asked = {}
|
207 |
+
what_i_have_asked["role"] = "user"
|
208 |
+
what_i_have_asked["content"] = history[index]
|
209 |
+
what_gpt_answer = {}
|
210 |
+
what_gpt_answer["role"] = "assistant"
|
211 |
+
what_gpt_answer["content"] = history[index+1]
|
212 |
+
if what_i_have_asked["content"] != "":
|
213 |
+
if what_gpt_answer["content"] == "": continue
|
214 |
+
messages.append(what_i_have_asked)
|
215 |
+
messages.append(what_gpt_answer)
|
216 |
+
else:
|
217 |
+
messages[-1]['content'] = what_gpt_answer['content']
|
218 |
+
|
219 |
+
what_i_ask_now = {}
|
220 |
+
what_i_ask_now["role"] = "user"
|
221 |
+
what_i_ask_now["content"] = inputs
|
222 |
+
messages.append(what_i_ask_now)
|
223 |
+
|
224 |
+
payload = {
|
225 |
+
"model": llm_kwargs['llm_model'],
|
226 |
+
"messages": messages,
|
227 |
+
"temperature": llm_kwargs['temperature'], # 1.0,
|
228 |
+
"top_p": llm_kwargs['top_p'], # 1.0,
|
229 |
+
"n": 1,
|
230 |
+
"stream": stream,
|
231 |
+
"presence_penalty": 0,
|
232 |
+
"frequency_penalty": 0,
|
233 |
+
"engine": AZURE_ENGINE
|
234 |
+
}
|
235 |
+
try:
|
236 |
+
print(f" {llm_kwargs['llm_model']} : {conversation_cnt} : {inputs[:100]} ..........")
|
237 |
+
except:
|
238 |
+
print('输入中可能存在乱码。')
|
239 |
+
return payload
|
240 |
+
|
241 |
+
|
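The history-packing step inside `generate_azure_payload` above can be isolated into a small, testable helper. This is an illustrative sketch (the name `pack_history` is not part of the repo): it folds a flat `[user, assistant, user, assistant, ...]` history into an OpenAI-style `messages` list, dropping question/answer pairs where the answer came back empty, exactly as the payload builder does.

```python
def pack_history(inputs, history, system_prompt):
    """Fold a flat [user, assistant, user, assistant, ...] history into
    a chat-completion messages list, mirroring generate_azure_payload."""
    messages = [{"role": "system", "content": system_prompt}]
    for i in range(0, len(history) // 2 * 2, 2):
        asked = {"role": "user", "content": history[i]}
        answer = {"role": "assistant", "content": history[i + 1]}
        if asked["content"] != "":
            if answer["content"] == "":
                continue  # drop turns where the model never answered
            messages.append(asked)
            messages.append(answer)
        else:
            # empty question: fold the answer into the previous message
            messages[-1]["content"] = answer["content"]
    messages.append({"role": "user", "content": inputs})
    return messages

msgs = pack_history("q3", ["q1", "a1", "q2", ""], "sys")
```

Note the asymmetry: an unanswered question is silently skipped, so a retry of the same question does not duplicate history entries in the payload.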
toolbox.py
CHANGED
@@ -1,11 +1,12 @@
|
|
1 |
import markdown
|
2 |
import importlib
|
3 |
-
import
|
4 |
import inspect
|
5 |
import re
|
6 |
import os
|
7 |
from latex2mathml.converter import convert as tex2mathml
|
8 |
from functools import wraps, lru_cache
|
|
|
9 |
|
10 |
"""
|
11 |
========================================================================
|
@@ -70,6 +71,17 @@ def update_ui(chatbot, history, msg='正常', **kwargs): # 刷新界面
|
|
70 |
assert isinstance(chatbot, ChatBotWithCookies), "在传递chatbot的过程中不要将其丢弃。必要时,可用clear将其清空,然后用for+append循环重新赋值。"
|
71 |
yield chatbot.get_cookies(), chatbot, history, msg
|
72 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
73 |
def trimmed_format_exc():
|
74 |
import os, traceback
|
75 |
str = traceback.format_exc()
|
@@ -83,7 +95,7 @@ def CatchException(f):
|
|
83 |
"""
|
84 |
|
85 |
@wraps(f)
|
86 |
-
def decorated(txt, top_p, temperature, chatbot, history, systemPromptTxt, WEB_PORT):
|
87 |
try:
|
88 |
yield from f(txt, top_p, temperature, chatbot, history, systemPromptTxt, WEB_PORT)
|
89 |
except Exception as e:
|
@@ -210,16 +222,21 @@ def text_divide_paragraph(text):
|
|
210 |
"""
|
211 |
将文本按照段落分隔符分割开,生成带有段落标签的HTML代码。
|
212 |
"""
|
|
|
|
|
|
|
|
|
|
|
213 |
if '```' in text:
|
214 |
# careful input
|
215 |
-
return text
|
216 |
else:
|
217 |
# wtf input
|
218 |
lines = text.split("\n")
|
219 |
for i, line in enumerate(lines):
|
220 |
lines[i] = lines[i].replace(" ", " ")
|
221 |
text = "</br>".join(lines)
|
222 |
-
return text
|
223 |
|
224 |
@lru_cache(maxsize=128) # 使用 lru缓存 加快转换速度
|
225 |
def markdown_convertion(txt):
|
@@ -331,8 +348,11 @@ def format_io(self, y):
|
|
331 |
if y is None or y == []:
|
332 |
return []
|
333 |
i_ask, gpt_reply = y[-1]
|
334 |
-
|
335 |
-
|
|
|
|
|
|
|
336 |
y[-1] = (
|
337 |
None if i_ask is None else markdown.markdown(i_ask, extensions=['fenced_code', 'tables']),
|
338 |
None if gpt_reply is None else markdown_convertion(gpt_reply)
|
@@ -380,7 +400,7 @@ def extract_archive(file_path, dest_dir):
|
|
380 |
print("Successfully extracted rar archive to {}".format(dest_dir))
|
381 |
except:
|
382 |
print("Rar format requires additional dependencies to install")
|
383 |
-
return '\n\n需要安装pip install rarfile来解压rar文件'
|
384 |
|
385 |
# 第三方库,需要预先pip install py7zr
|
386 |
elif file_extension == '.7z':
|
@@ -391,7 +411,7 @@ def extract_archive(file_path, dest_dir):
|
|
391 |
print("Successfully extracted 7z archive to {}".format(dest_dir))
|
392 |
except:
|
393 |
print("7z format requires additional dependencies to install")
|
394 |
-
return '\n\n需要安装pip install py7zr来解压7z文件'
|
395 |
else:
|
396 |
return ''
|
397 |
return ''
|
@@ -420,6 +440,17 @@ def find_recent_files(directory):
|
|
420 |
|
421 |
return recent_files
|
422 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
423 |
|
424 |
def on_file_uploaded(files, chatbot, txt, txt2, checkboxes):
|
425 |
"""
|
@@ -459,14 +490,20 @@ def on_file_uploaded(files, chatbot, txt, txt2, checkboxes):
|
|
459 |
return chatbot, txt, txt2
|
460 |
|
461 |
|
462 |
-
def on_report_generated(files, chatbot):
|
463 |
from toolbox import find_recent_files
|
464 |
-
|
|
|
|
|
|
|
|
|
465 |
if len(report_files) == 0:
|
466 |
return None, chatbot
|
467 |
# files.extend(report_files)
|
468 |
-
|
469 |
-
|
|
|
|
|
470 |
|
471 |
def is_openai_api_key(key):
|
472 |
API_MATCH_ORIGINAL = re.match(r"sk-[a-zA-Z0-9]{48}$", key)
|
@@ -728,6 +765,8 @@ def clip_history(inputs, history, tokenizer, max_token_limit):
|
|
728 |
其他小工具:
|
729 |
- zip_folder: 把某个路径下所有文件压缩,然后转移到指定的另一个路径中(gpt写的)
|
730 |
- gen_time_str: 生成时间戳
|
|
|
|
|
731 |
========================================================================
|
732 |
"""
|
733 |
|
@@ -762,11 +801,16 @@ def zip_folder(source_folder, dest_folder, zip_name):
|
|
762 |
|
763 |
print(f"Zip file created at {zip_file}")
|
764 |
|
|
|
|
|
|
|
|
|
|
|
|
|
765 |
def gen_time_str():
|
766 |
import time
|
767 |
return time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime())
|
768 |
|
769 |
-
|
770 |
class ProxyNetworkActivate():
|
771 |
"""
|
772 |
这段代码定义了一个名为TempProxy的空上下文管理器, 用于给一小段代码上代理
|
@@ -775,12 +819,27 @@ class ProxyNetworkActivate():
|
|
775 |
from toolbox import get_conf
|
776 |
proxies, = get_conf('proxies')
|
777 |
if 'no_proxy' in os.environ: os.environ.pop('no_proxy')
|
778 |
-
|
779 |
-
|
|
|
780 |
return self
|
781 |
|
782 |
def __exit__(self, exc_type, exc_value, traceback):
|
783 |
os.environ['no_proxy'] = '*'
|
784 |
if 'HTTP_PROXY' in os.environ: os.environ.pop('HTTP_PROXY')
|
785 |
if 'HTTPS_PROXY' in os.environ: os.environ.pop('HTTPS_PROXY')
|
786 |
-
return
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
import markdown
|
2 |
import importlib
|
3 |
+
import time
|
4 |
import inspect
|
5 |
import re
|
6 |
import os
|
7 |
from latex2mathml.converter import convert as tex2mathml
|
8 |
from functools import wraps, lru_cache
|
9 |
+
pj = os.path.join
|
10 |
|
11 |
"""
|
12 |
========================================================================
|
|
|
71 |
assert isinstance(chatbot, ChatBotWithCookies), "在传递chatbot的过程中不要将其丢弃。必要时,可用clear将其清空,然后用for+append循环重新赋值。"
|
72 |
yield chatbot.get_cookies(), chatbot, history, msg
|
73 |
|
74 |
+
def update_ui_lastest_msg(lastmsg, chatbot, history, delay=1): # 刷新界面
|
75 |
+
"""
|
76 |
+
刷新用户界面
|
77 |
+
"""
|
78 |
+
if len(chatbot) == 0: chatbot.append(["update_ui_last_msg", lastmsg])
|
79 |
+
chatbot[-1] = list(chatbot[-1])
|
80 |
+
chatbot[-1][-1] = lastmsg
|
81 |
+
yield from update_ui(chatbot=chatbot, history=history)
|
82 |
+
time.sleep(delay)
|
83 |
+
|
84 |
+
|
85 |
def trimmed_format_exc():
|
86 |
import os, traceback
|
87 |
str = traceback.format_exc()
|
|
|
95 |
"""
|
96 |
|
97 |
@wraps(f)
|
98 |
+
def decorated(txt, top_p, temperature, chatbot, history, systemPromptTxt, WEB_PORT=-1):
|
99 |
try:
|
100 |
yield from f(txt, top_p, temperature, chatbot, history, systemPromptTxt, WEB_PORT)
|
101 |
except Exception as e:
|
|
|
222 |
"""
|
223 |
将文本按照段落分隔符分割开,生成带有段落标签的HTML代码。
|
224 |
"""
|
225 |
+
pre = '<div class="markdown-body">'
|
226 |
+
suf = '</div>'
|
227 |
+
if text.startswith(pre) and text.endswith(suf):
|
228 |
+
return text
|
229 |
+
|
230 |
if '```' in text:
|
231 |
# careful input
|
232 |
+
return pre + text + suf
|
233 |
else:
|
234 |
# wtf input
|
235 |
lines = text.split("\n")
|
236 |
for i, line in enumerate(lines):
|
237 |
lines[i] = lines[i].replace(" ", " ")
|
238 |
text = "</br>".join(lines)
|
239 |
+
return pre + text + suf
|
240 |
|
241 |
@lru_cache(maxsize=128) # 使用 lru缓存 加快转换速度
|
242 |
def markdown_convertion(txt):
|
|
|
348 |
if y is None or y == []:
|
349 |
return []
|
350 |
i_ask, gpt_reply = y[-1]
|
351 |
+
# 输入部分太自由,预处理一波
|
352 |
+
if i_ask is not None: i_ask = text_divide_paragraph(i_ask)
|
353 |
+
# 当代码输出半截的时候,试着补上后个```
|
354 |
+
if gpt_reply is not None: gpt_reply = close_up_code_segment_during_stream(gpt_reply)
|
355 |
+
# process
|
356 |
y[-1] = (
|
357 |
None if i_ask is None else markdown.markdown(i_ask, extensions=['fenced_code', 'tables']),
|
358 |
None if gpt_reply is None else markdown_convertion(gpt_reply)
|
|
|
400 |
print("Successfully extracted rar archive to {}".format(dest_dir))
|
401 |
except:
|
402 |
print("Rar format requires additional dependencies to install")
|
403 |
+
return '\n\n解压失败! 需要安装pip install rarfile来解压rar文件'
|
404 |
|
405 |
# 第三方库,需要预先pip install py7zr
|
406 |
elif file_extension == '.7z':
|
|
|
411 |
print("Successfully extracted 7z archive to {}".format(dest_dir))
|
412 |
except:
|
413 |
print("7z format requires additional dependencies to install")
|
414 |
+
return '\n\n解压失败! 需要安装pip install py7zr来解压7z文件'
|
415 |
else:
|
416 |
return ''
|
417 |
return ''
|
|
|
440 |
|
441 |
return recent_files
|
442 |
|
443 |
+
def promote_file_to_downloadzone(file, rename_file=None, chatbot=None):
|
444 |
+
# 将文件复制一份到下载区
|
445 |
+
import shutil
|
446 |
+
if rename_file is None: rename_file = f'{gen_time_str()}-{os.path.basename(file)}'
|
447 |
+
new_path = os.path.join(f'./gpt_log/', rename_file)
|
448 |
+
if os.path.exists(new_path) and not os.path.samefile(new_path, file): os.remove(new_path)
|
449 |
+
if not os.path.exists(new_path): shutil.copyfile(file, new_path)
|
450 |
+
if chatbot:
|
451 |
+
if 'file_to_promote' in chatbot._cookies: current = chatbot._cookies['file_to_promote']
|
452 |
+
else: current = []
|
453 |
+
chatbot._cookies.update({'file_to_promote': [new_path] + current})
|
454 |
|
455 |
def on_file_uploaded(files, chatbot, txt, txt2, checkboxes):
|
456 |
"""
|
|
|
490 |
return chatbot, txt, txt2
|
491 |
|
492 |
|
493 |
+
def on_report_generated(cookies, files, chatbot):
|
494 |
from toolbox import find_recent_files
|
495 |
+
if 'file_to_promote' in cookies:
|
496 |
+
report_files = cookies['file_to_promote']
|
497 |
+
cookies.pop('file_to_promote')
|
498 |
+
else:
|
499 |
+
report_files = find_recent_files('gpt_log')
|
500 |
if len(report_files) == 0:
|
501 |
return None, chatbot
|
502 |
# files.extend(report_files)
|
503 |
+
file_links = ''
|
504 |
+
for f in report_files: file_links += f'<br/><a href="file={os.path.abspath(f)}" target="_blank">{f}</a>'
|
505 |
+
chatbot.append(['报告如何远程获取?', f'报告已经添加到右侧“文件上传区”(可能处于折叠状态),请查收。{file_links}'])
|
506 |
+
return cookies, report_files, chatbot
|
507 |
|
508 |
def is_openai_api_key(key):
|
509 |
API_MATCH_ORIGINAL = re.match(r"sk-[a-zA-Z0-9]{48}$", key)
|
|
|
765 |
其他小工具:
|
766 |
- zip_folder: 把某个路径下所有文件压缩,然后转移到指定的另一个路径中(gpt写的)
|
767 |
- gen_time_str: 生成时间戳
|
768 |
+
- ProxyNetworkActivate: 临时地启动代理网络(如果有)
|
769 |
+
- objdump/objload: 快捷的调试函数
|
770 |
========================================================================
|
771 |
"""
|
772 |
|
|
|
801 |
|
802 |
print(f"Zip file created at {zip_file}")
|
803 |
|
804 |
+
def zip_result(folder):
|
805 |
+
import time
|
806 |
+
t = time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime())
|
807 |
+
zip_folder(folder, './gpt_log/', f'{t}-result.zip')
|
808 |
+
return pj('./gpt_log/', f'{t}-result.zip')
|
809 |
+
|
810 |
def gen_time_str():
|
811 |
import time
|
812 |
return time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime())
|
813 |
|
|
|
814 |
class ProxyNetworkActivate():
|
815 |
"""
|
816 |
这段代码定义了一个名为TempProxy的空上下文管理器, 用于给一小段代码上代理
|
|
|
819 |
from toolbox import get_conf
|
820 |
proxies, = get_conf('proxies')
|
821 |
if 'no_proxy' in os.environ: os.environ.pop('no_proxy')
|
822 |
+
if proxies is not None:
|
823 |
+
if 'http' in proxies: os.environ['HTTP_PROXY'] = proxies['http']
|
824 |
+
if 'https' in proxies: os.environ['HTTPS_PROXY'] = proxies['https']
|
825 |
return self
|
826 |
|
827 |
def __exit__(self, exc_type, exc_value, traceback):
|
828 |
os.environ['no_proxy'] = '*'
|
829 |
if 'HTTP_PROXY' in os.environ: os.environ.pop('HTTP_PROXY')
|
830 |
if 'HTTPS_PROXY' in os.environ: os.environ.pop('HTTPS_PROXY')
|
831 |
+
return
|
832 |
+
|
833 |
+
def objdump(obj, file='objdump.tmp'):
|
834 |
+
import pickle
|
835 |
+
with open(file, 'wb+') as f:
|
836 |
+
pickle.dump(obj, f)
|
837 |
+
return
|
838 |
+
|
839 |
+
def objload(file='objdump.tmp'):
|
840 |
+
import pickle, os
|
841 |
+
if not os.path.exists(file):
|
842 |
+
return
|
843 |
+
with open(file, 'rb') as f:
|
844 |
+
return pickle.load(f)
|
845 |
+
|
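The `ProxyNetworkActivate` change above makes the proxy environment variables conditional on a proxy actually being configured. The pattern can be sketched independently of the repo's `get_conf` lookup (the class name `ProxyScope` and the `proxies` constructor argument are illustrative stand-ins for the config read):

```python
import os

class ProxyScope:
    """Temporarily export HTTP(S)_PROXY for a block of code,
    mirroring toolbox.ProxyNetworkActivate (illustrative sketch)."""
    def __init__(self, proxies):
        self.proxies = proxies  # e.g. {'http': '...', 'https': '...'} or None

    def __enter__(self):
        os.environ.pop('no_proxy', None)
        if self.proxies is not None:
            if 'http' in self.proxies:
                os.environ['HTTP_PROXY'] = self.proxies['http']
            if 'https' in self.proxies:
                os.environ['HTTPS_PROXY'] = self.proxies['https']
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # disable the proxy again and drop the exported variables
        os.environ['no_proxy'] = '*'
        os.environ.pop('HTTP_PROXY', None)
        os.environ.pop('HTTPS_PROXY', None)

with ProxyScope({'http': 'http://127.0.0.1:7890', 'https': 'http://127.0.0.1:7890'}):
    inside = os.environ.get('HTTP_PROXY')
after = os.environ.get('HTTP_PROXY')
```

Because `__exit__` always runs, the proxy variables are scrubbed even if the wrapped network call raises.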
version
CHANGED
@@ -1,5 +1,5 @@
|
|
1 |
{
|
2 |
-
"version": 3.
|
3 |
"show_feature": true,
|
4 |
-
"new_feature": "修复gradio复制按钮BUG <-> 修复PDF翻译的BUG, 新增HTML中英双栏对照 <-> 添加了OpenAI图片生成插件 <-> 添加了OpenAI音频转文本总结插件 <-> 通过Slack添加对Claude的支持
|
5 |
}
|
|
|
1 |
{
|
2 |
+
"version": 3.42,
|
3 |
"show_feature": true,
|
4 |
+
"new_feature": "完善本地Latex矫错和翻译功能 <-> 增加gpt-3.5-16k的支持 <-> 新增最强Arxiv论文翻译插件 <-> 修复gradio复制按钮BUG <-> 修复PDF翻译的BUG, 新增HTML中英双栏对照 <-> 添加了OpenAI图片生成插件 <-> 添加了OpenAI音频转文本总结插件 <-> 通过Slack添加对Claude的支持"
|
5 |
}
|
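The `objdump`/`objload` debugging helpers added to toolbox.py earlier are a thin pickle wrapper. A self-contained round-trip, writing to a temporary directory instead of the default `objdump.tmp` in the working directory:

```python
import os
import pickle
import tempfile

def objdump(obj, file='objdump.tmp'):
    # serialize any picklable object to disk for later inspection
    with open(file, 'wb+') as f:
        pickle.dump(obj, f)

def objload(file='objdump.tmp'):
    # return None if the dump file is missing
    if not os.path.exists(file):
        return None
    with open(file, 'rb') as f:
        return pickle.load(f)

# round-trip a small object through a temporary file
path = os.path.join(tempfile.mkdtemp(), 'objdump.tmp')
objdump({'retry': 3, 'ok': True}, file=path)
restored = objload(file=path)
```

Dump-to-file helpers like these are handy for replaying a failing plugin call without re-running the whole LLM request.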