Commit b994311 (parent 0a1500d): Upload 12 files

Files changed:
- README.md +4 -5
- app.py +218 -0
- demo.py +6 -0
- model/text2vec_ernie/config.json +20 -0
- model/text2vec_ernie/vocab.txt +0 -0
- requirements.txt +4 -0
- utils/API.py +244 -0
- utils/__pycache__/API.cpython-310.pyc +0 -0
- utils/__pycache__/tools.cpython-310.pyc +0 -0
- utils/__pycache__/tools.cpython-37.pyc +0 -0
- utils/tools.py +119 -0
README.md
CHANGED

```diff
@@ -1,13 +1,12 @@
 ---
-title: SparkDebate
-emoji:
-colorFrom:
+title: SparkDebate
+emoji: 🌖
+colorFrom: pink
 colorTo: yellow
 sdk: gradio
-sdk_version: 3.
+sdk_version: 3.39.0
 app_file: app.py
 pinned: false
-license: apache-2.0
 ---
 
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
```
app.py
ADDED

```python
from langchain.memory import ConversationSummaryBufferMemory
from langchain.chains import ConversationChain
from langchain.chains import RetrievalQA
from utils.API import Spark_forlangchain
import gradio as gr
from langchain.prompts import ChatPromptTemplate
from langchain.document_loaders import TextLoader
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
import sentence_transformers


def init_knowledge_vector_store(filepath):
    EMBEDDING_MODEL = "model/text2vec_ernie/"
    embeddings = HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL)
    embeddings.client = sentence_transformers.SentenceTransformer(
        embeddings.model_name, device='cuda')
    loader = TextLoader(filepath)
    docs = loader.load()
    vector_store = FAISS.from_documents(docs, embeddings)
    return vector_store


template_1 = """
你是一个资深辩手,你的辩论风格是{style},你确定辩论战略需要考虑以下10个方面:
1. 分析辩题性质
判断辩题是判断型还是比较型,明确需要论证的核心观点。回答中必须包含题是判断型还是比较型。
2. 判断正反方定位
大致判断哪一方更容易证成立,存在明显优劣势。回答中必须需给出谁更主流,更容易成立。
3. 设想核心争议点
思考双方可能存在分歧和交锋的主要争议点。回答中需要明确给出至少三个争议点。
4. 论证框架
设计初步的论证框架,包括定义、标准、论点等。回答中需要明确按以下格式给出论证框架:正方:标准是XXX,论点1是XXX,论点2是XXX。反方:标准是XXX,论点1是XXX,论点2是XXX。(论点至少要两个)
5. 优势论域
确定自己方更容易取得优势的论域的诠释点。回答中必须详细给出双方的优势论域并给出理由。
6. 数据准备
提前准备论证所需的证据数据。回答中必须给出对论证起作用的数据,如相关国家合法化情况与对社会影响的数据
7. 情境假设
设想场景和例子以备交锋时使用。回答中必须至少给出正反双方情境,各三个。
8. 语境处理
考虑如何处理语境环境,为自己创造有利条件。回答中必须举出正反方的语境,各三个。
9. 质询角度
提前想好可能的质询角度,对应对方的论点。回答中需要给出详细的分析并试着举出例子,各三个。
10. 重点突破
找到对方可能论证的薄弱点,准备重点突破。回答中需要举出正反双方薄弱点分别在哪里,应该如何突破。
通过上述分析,可以确定一个明确有针对性的辩论战略.
接下来我会给你一个具体的辩题,你需要基于以上10个原则依次回答。
///辩题内容如下:{text}///
"""
template_2 = """
你是一个资深辩手,你的辩论风格是{style},你立论会遵循以下的立论原则,总共5个原则:
1.定义明确
对关键词进行明确且合理的定义,这是展开论证的基础。
2.标准清晰
设置公正合理的判断标准,标准要具体明确,为论点比较提供依据。你的回答中必须包含标准。
3.论点匹配
论点要能有效支撑并印证标准,与标准和立场高度契合。你的回答中必须包含支撑印证标准的论点。
4.论据具体
提供具体可信的论据支撑每个论点,使之更有说服力。你的论点必须要论据支撑。
5.情境适用
引入情境和例子,使复杂观点容易被听众接受。你的回答可以适当包含情境
接下来会给你一个题目和持方。
///题目与持方如下:{text}///
你需要遵循以上五个立论原则立论,并且立论稿有以下要求:
1.以一个专业辩手的口吻做开场白。
2.总字数为1200字。
3.第一段需要包含以下三个部分 给出持方,对名词做出简单解释,给出标准,标准只能有一个。
4.第二段是第一个论点,论点需要围绕标准,阐述完论点后需要提出论据,最好是数据论据和学理论据,提出论据后需要做出解释来进行论证。参照以下流程:论点1+数据论据+数据论据的论证+学理论据+学理论据的论证。本段需要非常详细。
5.第三段是第二个论点,论点需要围绕标准,本段第一句话就要阐明论点是什么,阐述完论点后需要提出论据,最好是数据论据和学理论据,提出论据后需要做出解释来进行论证。参照以下流程:论点2+数据论据+数据论据的论证+学理论据+学理论据的论证。本段需要非常详细。
6.最后一段只需要用一句话再重复一遍己方的立场:"综上我方坚定认为XXX"。XXX为立场。
7.立论稿中需要把上述内容衔接流畅。
"""
template_3 = """
你是一个资深的逻辑性很强的顶级辩手,你的辩论风格是{style},请对我的陈述进行反驳,越详细越好,反驳需要逐条反驳观点和论据,并且要给出详细的理由,质疑数据论据要用上常用的方法和句式,从数据合理性,样本代表性,统计方法,数据解读等多个角度进行考虑。质疑学理论据要从权威性,解读方式,是否有对抗学理等多个角度进行考虑。
///如下是我们的话题以及我的观点:{text}///
"""
template_4 = """
你是一个资深辩手,你的辩论风格是{style},你需要根据我给出的话题提出观点并且要有数据论据和学理论据作为论证且总字数不少于400字,你的发言格式为:我们的话题是什么,我持什么观点,我的理由是XXX,因为某某数据,又因为某某学理。参照如下范例:||
我们的话题是人工智能对人类工作的影响。我持的观点是,人工智能将导致大量的就业机会减少。我的理由是,根据国际数据公司(IDC)的报告,到2025年,全球约有3.75亿个工作岗位将被自动化技术取代。同时,人工智能的发展也将带来新的就业机会,如AI工程师、数据科学家等。
首先,让我们从数据角度来看。根据美国劳工统计局(BLS)的数据,自20世纪90年代以来,美国的工作岗位流失率一直在上升。其中,自动化和计算机化在一定程度上对就业市场产生了负面影响。此外,根据麦肯锡全球研究院的预测,到2030年,人工智能可能会在全球范围内导致8000万至1.6亿个工作岗位的消失。
其次,从学理角度来看,人工智能的发展是基于算法和大数据的。然而,这些算法和数据往往受到人为因素的影响,可能导致错误的决策和预测。例如,2016年在美国总统选举期间,一家名为"剑桥分析"的公司利用大数据分析和选民心理研究,为特朗普竞选团队提供了策略支持。这一事件表明,人工智能在某些情况下可能会被用于不道德的目的。||
///我们本次讨论的话题是{text}///
"""
template_5 = """
你是一个资深的逻辑性很强的顶级辩手,你的辩论风格是{style},可以与我进行辩论训练,你很擅长质询总是一针见血,而且也很擅长使用类比来归谬我的观点,你熟练的掌握各种数据质询的技巧。现在你要与我进行对辩
我的陈述如下:///{text}///
请对我的陈述进行反驳,越详细越好,反驳需要逐条反驳观点和论据,并且要给出详细的理由,质疑数据论据要用上常用的方法和句式,从数据合理性,样本代表性,统计方法,数据解读等多个角度进行考虑。质疑学理论据要从权威性,解读方式,是否有对抗学理等多个角度进行考虑。
"""
end_prompt = """
请你对我们的对辩过程进行总结,总结需要包括以下部分:1.对辩主要针对什么进行讨论。2.评价我的对辩能力,需要根据评级原则给出评级,并且给出具体理由。评级原则如下:等级一,缺乏论证的反驳;等级二,自说自话的反驳;等级三,针锋相对的反驳;等级四,正中要害的反驳。3.根据我的对辩能力提出一定的建议。
示例如下:
好的,我来对我们的对辩过程进行总结。
在我们的对辩过程中,我们主要讨论了动物园是否应该被禁止。我认为动物园对动物的福利和权利造成了负面影响,而您则提出了一些质疑,认为动物园中的动物可以享受比野外更安全的生活条件。
我认为您的对辩能力属于等级三,即针锋相对的反驳。您能够对我的观点提出一些质疑和反驳,并且能够给出一些合理的理由。但是,在某些情况下,您可能会使用一些不太恰当的类比来归谬我的观点,这可能会影响到对辩的质量和效果。
鉴于您的对辩能力,我认为您可以进一步提高自己的辩论技巧。您可以通过更多的阅读和学习,提高自己的知识水平和思维能力,从而更好地进行论证和反驳。此外,在使用类比和比喻时,需要更加谨慎,确保它们能够恰当地表达您的观点,而不会歪曲或归谬对方的观点。
"""
prompt_1 = ChatPromptTemplate.from_template(template_1)
prompt_2 = ChatPromptTemplate.from_template(template_2)
prompt_3 = ChatPromptTemplate.from_template(template_3)
prompt_4 = ChatPromptTemplate.from_template(template_4)
prompt_5 = ChatPromptTemplate.from_template(template_5)


def init_(app_id, api_key, api_secret):
    global llm
    llm = Spark_forlangchain(n=10, app_id=app_id, api_key=api_key,
                             api_secret=api_secret)
    memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=4096)
    global conversation_1
    global conversation_2
    conversation_1 = ConversationChain(llm=llm)
    conversation_2 = ConversationChain(llm=llm, memory=memory)
    print("初始化成功!")


def shortDebate_(type, style, prompt, help):
    if type == "破题":
        msg = prompt_1.format_prompt(text=prompt, style=style).to_string()
    elif type == "立论":
        msg = prompt_2.format_prompt(text=prompt, style=style).to_string()
    elif type == "对辩先发":
        msg = prompt_3.format_prompt(text=prompt, style=style).to_string()
    elif type == "对辩后发":
        msg = prompt_4.format_prompt(text=prompt, style=style).to_string()
    else:
        msg = prompt
    print(msg)
    response = conversation_1.run(msg)
    print(response)
    help.append((prompt, response))
    return help, help


def longDebate_(style, prompt, help):
    msg = prompt_5.format_prompt(text=prompt, style=style).to_string()
    response = conversation_2.run(msg)
    help.append((prompt, response))
    return help, help


def end_talk(style, prompt, help):
    msg = end_prompt
    response = conversation_2.run(msg)
    help.append((prompt, response))
    return help, help


def Debatebytext_(prompt, help):
    msg = prompt
    response = QA_chain.run(msg)
    help.append((prompt, response))
    return help, help


def upload_file(files):
    vector_store = init_knowledge_vector_store(files.name)
    memory_text = ConversationSummaryBufferMemory(
        llm=llm, max_token_limit=4096)
    global QA_chain
    QA_chain = RetrievalQA.from_llm(llm=llm, retriever=vector_store.as_retriever(
        search_kwargs={"k": 2}), memory=memory_text)
    # gr.UploadButton passes a single file object here, not a list
    return [files.name]


with gr.Blocks(css="#chatbot{height:300px} .overflow-y-auto{height:500px}") as init:
    with gr.Row():
        app_id = gr.Textbox(
            lines=1, placeholder="app_id Here...", label="app_id")
        api_key = gr.Textbox(
            lines=1, placeholder="api_key Here...", label="api_key")
        api_secret = gr.Textbox(
            lines=1, placeholder="api_secret Here...", label="api_secret")
    temperature = gr.Slider(minimum=0, maximum=1,
                            step=0.1, value=0.3, interactive=True)
    btn = gr.Button(value="初始化")
    btn.click(init_, inputs=[app_id, api_key, api_secret])

with gr.Blocks(css="#chatbot{height:300px} .overflow-y-auto{height:500px}") as shortDebate:
    chatbot = gr.Chatbot(elem_id="chatbot")
    state = gr.State([])
    drop1 = gr.Radio(["破题", "立论", "对辩先发", "对辩后发"],
                     label="功能选择", info="选择你想要的功能")  # single-choice radio
    with gr.Row():
        txt = gr.Textbox(label="在这里开始聊天吧", placeholder="请输入你的问题")
        send = gr.Button("🚀 发送")
    style = gr.Textbox(lines=1, placeholder="style Here... ",
                       label="辩论风格", value="犀利", interactive=True)
    send.click(shortDebate_, [drop1, style, txt, state], [chatbot, state])

with gr.Blocks(css="#chatbot{height:300px} .overflow-y-auto{height:500px}") as longDebate:
    chatbot = gr.Chatbot(elem_id="chatbot")
    state = gr.State([])
    with gr.Row():
        txt = gr.Textbox(label="在这里开始长辩论吧", placeholder="请输入你的问题")
        send = gr.Button("🚀 发送")
        end = gr.Button("🤠 总结")
    style = gr.Textbox(lines=1, placeholder="style Here... ",
                       label="辩论风格", value="犀利", interactive=True)
    send.click(longDebate_, [style, txt, state], [chatbot, state])
    end.click(end_talk, [style, txt, state], [chatbot, state])

with gr.Blocks(css="#chatbot{height:300px} .overflow-y-auto{height:500px}") as Debatebytext:
    chatbot = gr.Chatbot(elem_id="chatbot")
    state = gr.State([])
    file_output = gr.File(label='请上传文件, 目前支持txt、docx、md格式',
                          file_types=['.txt', '.md', '.docx'])
    with gr.Row():
        txt = gr.Textbox(label="在这里从你给出的资料里学习吧", placeholder="请输入你的问题")
        send = gr.Button("🚀 发送")
    upload_button = gr.UploadButton("Click to Upload a File", scale=1, file_types=[
        "text"])
    upload_button.upload(upload_file, upload_button, file_output)
    send.click(Debatebytext_, [txt, state], [chatbot, state])

demo = gr.TabbedInterface([init, shortDebate, longDebate, Debatebytext], [
    "初始化", "辅助辩论", "对辩练习", "辩论技巧学习"])
demo.launch()
```
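Every callback in app.py follows the same Gradio chat-state pattern: the shared `state` list accumulates `(prompt, response)` pairs and is returned twice, once to render in the `Chatbot` and once to persist in `gr.State`. A minimal sketch of that pattern, with `fake_run` as a stand-in for the real `conversation_1.run` call (illustrative only, no Spark credentials needed):

```python
def fake_run(msg):
    # stand-in for conversation_1.run(); not the real Spark API call
    return "echo: " + msg

def chat_step(prompt, history):
    response = fake_run(prompt)
    history.append((prompt, response))
    # first return value feeds gr.Chatbot, second feeds gr.State
    return history, history

state = []
shown, state = chat_step("你好", state)
print(shown)  # [('你好', 'echo: 你好')]
```

Because the same list is returned for both outputs, the chatbot display and the persisted state never drift apart between turns.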
demo.py
ADDED

```python
from utils.API import SparkAPI

app_id = input("app_id here :")
api_key = input("api_key here :")
api_secret = input("api_secret here :")
bot = SparkAPI(app_id=app_id, api_key=api_key, api_secret=api_secret)
bot.chat_stream()
```
model/text2vec_ernie/config.json
ADDED

```json
{
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "intermediate_size": 4096,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "max_position_embeddings": 2048,
  "num_attention_heads": 16,
  "num_hidden_layers": 20,
  "task_type_vocab_size": 16,
  "type_vocab_size": 4,
  "use_task_id": true,
  "vocab_size": 40000,
  "layer_norm_eps": 1e-05,
  "model_type": "ernie",
  "architectures": [
    "ErnieForMaskedLM"
  ]
}
```
model/text2vec_ernie/vocab.txt
ADDED
The diff for this file is too large to render.
requirements.txt
ADDED

```text
gradio==3.39.0
langchain==0.0.252
sentence_transformers==2.2.2
websocket_client==1.6.1
```
utils/API.py
ADDED

```python
import base64
import hmac
import json
from datetime import datetime, timezone
from typing import Any, List, Mapping, Optional
from urllib.parse import urlencode, urlparse

from langchain.llms.base import LLM
from websocket import create_connection, WebSocketConnectionClosedException

from utils.tools import get_prompt, process_response, init_script, create_script


class SparkAPI:
    __api_url = 'wss://spark-api.xf-yun.com/v2.1/chat'
    __max_token = 4096

    def __init__(self, app_id, api_key, api_secret):
        self.__app_id = app_id
        self.__api_key = api_key
        self.__api_secret = api_secret

    def __set_max_tokens(self, token):
        if not isinstance(token, int) or token < 0:
            print("set_max_tokens() error: tokens should be a positive integer!")
            return
        self.__max_token = token

    def __get_authorization_url(self):
        authorize_url = urlparse(self.__api_url)
        # RFC 1123 date; it is part of the signed string
        date = datetime.now(timezone.utc).strftime('%a, %d %b %Y %H:%M:%S %Z')
        """
        Generation rule of the Authorization parameter:
        1) Obtain the APIKey and APISecret from the console.
        2) Concatenate host, date and the request line into the string to
           sign, using the host and path of the actual request URL.
        """
        signature_origin = "host: {}\ndate: {}\nGET {} HTTP/1.1".format(
            authorize_url.netloc, date, authorize_url.path
        )
        signature = base64.b64encode(
            hmac.new(
                self.__api_secret.encode(),
                signature_origin.encode(),
                digestmod='sha256'
            ).digest()
        ).decode()
        authorization_origin = \
            'api_key="{}",algorithm="{}",headers="{}",signature="{}"'.format(
                self.__api_key, "hmac-sha256", "host date request-line", signature
            )
        authorization = base64.b64encode(
            authorization_origin.encode()).decode()
        params = {
            "authorization": authorization,
            "date": date,
            "host": authorize_url.netloc
        }
        ws_url = self.__api_url + "?" + urlencode(params)
        return ws_url

    def __build_inputs(
        self,
        message: dict,
        user_id: str = "001",
        domain: str = "general",
        temperature: float = 0.5,
        max_tokens: int = 4096
    ):
        input_dict = {
            "header": {
                "app_id": self.__app_id,
                "uid": user_id,
            },
            "parameter": {
                "chat": {
                    "domain": domain,
                    "temperature": temperature,
                    "max_tokens": max_tokens,
                }
            },
            "payload": {
                "message": message
            }
        }
        return json.dumps(input_dict)

    def chat(
        self,
        query: str,
        history: list = None,  # stores the conversation history
        user_id: str = "001",
        domain: str = "general",
        max_tokens: int = 4096,
        temperature: float = 0.5,
    ):
        if history is None:
            history = []

        # the maximum of max_tokens is 4096
        max_tokens = min(max_tokens, 4096)
        url = self.__get_authorization_url()
        ws = create_connection(url)
        message = get_prompt(query, history)
        input_str = self.__build_inputs(
            message=message,
            user_id=user_id,
            domain=domain,
            temperature=temperature,
            max_tokens=max_tokens,
        )
        ws.send(input_str)
        response_str = ws.recv()
        try:
            while True:
                response, history, status = process_response(
                    response_str, history)
                """
                status == 2 marks the final frame, i.e. a complete answer.
                doc url: https://www.xfyun.cn/doc/spark/Web.html#_1-%E6%8E%A5%E5%8F%A3%E8%AF%B4%E6%98%8E
                """
                if len(response) == 0 or status == 2:
                    break
                response_str = ws.recv()
            return response
        except WebSocketConnectionClosedException:
            print("Connection closed")
        finally:
            ws.close()

    # Streaming output, used for terminal chat.
    def streaming_output(
        self,
        query: str,
        history: list = None,  # stores the conversation history
        user_id: str = "001",
        domain: str = "general",
        max_tokens: int = 4096,
        temperature: float = 0.5,
    ):
        if history is None:
            history = []
        # the maximum of max_tokens is 4096
        max_tokens = min(max_tokens, 4096)
        url = self.__get_authorization_url()
        ws = create_connection(url)

        message = get_prompt(query, history)
        input_str = self.__build_inputs(
            message=message,
            user_id=user_id,
            domain=domain,
            temperature=temperature,
            max_tokens=max_tokens,
        )
        # send the question or prompt, then receive the answer frames
        ws.send(input_str)
        response_str = ws.recv()

        # continuous conversation
        try:
            while True:
                response, history, status = process_response(
                    response_str, history)
                yield response, history
                if len(response) == 0 or status == 2:
                    break
                response_str = ws.recv()
        except WebSocketConnectionClosedException:
            print("Connection closed")
        finally:
            ws.close()

    def chat_stream(self):
        history = []
        try:
            print("输入init来初始化剧本,输入create来创作剧本,输入exit或stop来终止对话\n")
            while True:
                query = input("Ask: ")
                if query == 'init':
                    jsonfile = input("请输入剧本文件路径:")
                    script_data = init_script(history, jsonfile)
                    print(
                        f"正在导入剧本{script_data['name']},角色信息:{script_data['characters']},剧情介绍:{script_data['summary']}")
                    query = f"我希望你能够扮演这个剧本杀游戏的主持人,我希望你能够逐步引导玩家到达最终结局,同时希望你在游戏中设定一些随机事件,需要玩家依靠自身的能力解决,当玩家做出偏离主线的行为或者与剧本无关的行为时,你需要委婉地将玩家引导至正常游玩路线中,对于玩家需要决策的事件,你需要提供一些行动推荐,下面是剧本介绍:{script_data}"
                if query == 'create':
                    name = input('请输入剧本名称:')
                    characters = input('请输入角色信息:')
                    summary = input('请输入剧情介绍:')
                    details = input('请输入剧本细节')
                    create_script(name, characters, summary, details)
                    print('剧本创建成功!')
                    continue
                if query == "exit" or query == "stop":
                    break
                for response, _ in self.streaming_output(query, history):
                    print("\r" + response, end="")
                print("\n")
        finally:
            print("\nThank you for using the SparkDesk AI. Welcome to use it again!")


class Spark_forlangchain(LLM):
    # fields validated by pydantic (LLM is a pydantic model)
    n: int
    app_id: str
    api_key: str
    api_secret: str

    # identifies the type of this LLM subclass
    @property
    def _llm_type(self) -> str:
        return "Spark"

    # override the base-class method: answer the prompt and return a string
    def _call(
        self,
        query: str,
        history: list = None,  # stores the conversation history
        user_id: str = "001",
        domain: str = "general",
        max_tokens: int = 4096,
        temperature: float = 0.7,
        stop: Optional[List[str]] = None,
    ) -> str:
        if stop is not None:
            raise ValueError("stop kwargs are not permitted.")
        bot = SparkAPI(app_id=self.app_id, api_key=self.api_key,
                       api_secret=self.api_secret)
        response = bot.chat(query, history, user_id,
                            domain, max_tokens, temperature)
        return response

    # return a dict with the identifying parameters of this LLM
    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        """Get the identifying parameters."""
        return {"n": self.n}
```
utils/__pycache__/API.cpython-310.pyc
ADDED
Binary file (6.6 kB).

utils/__pycache__/tools.cpython-310.pyc
ADDED
Binary file (3.7 kB).

utils/__pycache__/tools.cpython-37.pyc
ADDED
Binary file (1.93 kB).
utils/tools.py
ADDED

```python
import json
import os
import shutil
from glob import glob


def read_json_file(file_path):
    file_path = "./script/" + file_path
    with open(file_path, 'r', encoding='utf-8') as file:
        data = json.load(file)
    return data


def get_prompt(query: str, history: list):
    use_message = {"role": "user", "content": query}
    if history is None:
        history = []
    history.append(use_message)
    message = {"text": history}
    return message


def process_response(response_str: str, history: list):
    res_dict: dict = json.loads(response_str)
    code = res_dict.get("header", {}).get("code")
    status = res_dict.get("header", {}).get("status", 2)

    if code == 0:
        res_dict = res_dict.get("payload", {}).get(
            "choices", {}).get("text", [{}])[0]
        res_content = res_dict.get("content", "")

        if len(res_dict) > 0 and len(res_content) > 0:
            # ignore the unnecessary data
            if "index" in res_dict:
                del res_dict["index"]
            response = res_content

            if status == 0:
                history.append(res_dict)
            else:
                history[-1]["content"] += response
                response = history[-1]["content"]

            return response, history, status
        else:
            return "", history, status
    else:
        print("error code ", code)
        print("you can see this website to know code detail")
        print("https://www.xfyun.cn/doc/spark/%E6%8E%A5%E5%8F%A3%E8%AF%B4%E6%98%8E.html")
        return "", history, status


def init_script(history: list, jsonfile):
    script_data = read_json_file(jsonfile)
    return script_data


def create_script(name, characters, summary, details):
    if not os.path.exists("script"):
        os.mkdir("script")
    data = {
        "name": name,
        "characters": characters,
        "summary": summary,
        "details": details
    }
    json_data = json.dumps(data, ensure_ascii=False)
    print(json_data)
    with open(f"./script/{name}.json", "w", encoding='utf-8') as file:
        file.write(json_data)


def txt2vec(name: str, file_path: str):
    from langchain.document_loaders import TextLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    loader = TextLoader(file_path)
    data = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=256, chunk_overlap=128)
    split_docs = text_splitter.split_documents(data)
    from langchain.embeddings.huggingface import HuggingFaceEmbeddings
    import sentence_transformers
    EMBEDDING_MODEL = "model/text2vec_ernie/"
    embeddings = HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL)
    embeddings.client = sentence_transformers.SentenceTransformer(
        embeddings.model_name, device='cuda')
    from langchain.vectorstores import FAISS
    db = FAISS.from_documents(split_docs, embeddings)
    db.save_local(f"data/faiss/{name}/")


def pdf2vec(name: str, file_path: str):
    from langchain.document_loaders import PyPDFLoader
    loader = PyPDFLoader(file_path)
    split_docs = loader.load_and_split()
    from langchain.embeddings.huggingface import HuggingFaceEmbeddings
    import sentence_transformers
    EMBEDDING_MODEL = "model/text2vec_ernie/"
    embeddings = HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL)
    embeddings.client = sentence_transformers.SentenceTransformer(
        embeddings.model_name, device='cuda')
    from langchain.vectorstores import FAISS
    db = FAISS.from_documents(split_docs, embeddings)
    db.save_local(f"data/faiss/{name}/")


def mycopyfile(srcfile, dstpath):  # copy helper
    if not os.path.isfile(srcfile):
        print("%s not exist!" % (srcfile))
    else:
        fpath, fname = os.path.split(srcfile)  # split path and filename
        print(fpath)
        print(fname)
        if not os.path.exists(dstpath):
            os.makedirs(dstpath)  # create the destination directory
        shutil.copy(srcfile, os.path.join(dstpath, fname))  # copy the file
        print("copy %s -> %s" % (srcfile, os.path.join(dstpath, fname)))
```
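`process_response` stitches a streamed answer together: a frame with `status` 0 starts a new history entry, later frames append to `history[-1]["content"]`, and `status` 2 marks the final frame. A small self-contained simulation of that accumulation logic with hand-built frames; `process_frame` copies the relevant branch of `process_response`, and the frame layout mirrors the Spark v2.1 response schema used above (the text content is made up):

```python
import json

def process_frame(response_str, history):
    # same accumulation logic as process_response above (success path only)
    res = json.loads(response_str)
    status = res["header"]["status"]
    piece = res["payload"]["choices"]["text"][0]
    if status == 0:
        history.append(dict(piece))                  # first chunk: new entry
    else:
        history[-1]["content"] += piece["content"]   # later chunks: append
    return history[-1]["content"], history, status

def make_frame(status, content):
    # minimal Spark-style response frame, just enough for the simulation
    return json.dumps({
        "header": {"code": 0, "status": status},
        "payload": {"choices": {"text": [{"role": "assistant",
                                          "content": content}]}},
    })

history = []
text = ""
for status, chunk in [(0, "Hello"), (1, ", wor"), (2, "ld!")]:
    text, history, st = process_frame(make_frame(status, chunk), history)

print(text)          # Hello, world!
print(len(history))  # 1  (the whole streamed answer is one history entry)
```

This is why `SparkAPI.chat` can simply return the last `response` it saw: by the `status == 2` frame, the accumulated string already holds the full answer.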