{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"provenance":[],"authorship_tag":"ABX9TyNSr8cV3L7H+3n6PZoR4qfE"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["# **TipTip Data Team Hands-On Workshop: Retrieval Augmented Generation**"],"metadata":{"id":"0TKIQy0-FUGR"}},{"cell_type":"markdown","source":["*(Kindly copy this notebook before editing by choosing File > \"Save a copy in Drive\")*"],"metadata":{"id":"Kam4lu_aFcwo"}},{"cell_type":"markdown","source":["# **Part 1: Simple Steps to Create TnC Chatbot**"],"metadata":{"id":"vdasFnyNYY21"}},{"cell_type":"markdown","source":["Installing the required libraries:"],"metadata":{"id":"VQfFB3soF1g6"}},{"cell_type":"code","execution_count":1,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"xqAe2B_QDxXQ","executionInfo":{"status":"ok","timestamp":1716532692789,"user_tz":-420,"elapsed":69305,"user":{"displayName":"Matthew Farant Andreson","userId":"05413980529618950326"}},"outputId":"b9249e28-bc0b-4a09-c250-80502efa7463"},"outputs":[{"output_type":"stream","name":"stdout","text":["\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n","spacy 3.7.4 requires typer<0.10.0,>=0.3.0, but you have typer 0.12.3 which is incompatible.\n","weasel 0.3.4 requires typer<0.10.0,>=0.3.0, but you have typer 0.12.3 which is incompatible.\u001b[0m\u001b[31m\n","\u001b[0m"]}],"source":["!pip install -q langchain\n","!pip install -q langchain_community\n","!pip install -q langchain_openai\n","!pip install -q faiss-cpu\n","!pip install -q gradio"]},{"cell_type":"markdown","source":["We will use **WebBaseLoader** as the document loader since we want to crawl information from a website."],"metadata":{"id":"YWdVXDxkYiR4"}},{"cell_type":"code","source":["from langchain_community.document_loaders import WebBaseLoader\n","\n","url = \"https://help.tiptip.id/support/solutions/articles/72000528312-syarat-dan-ketentuan\"\n","\n","loader = WebBaseLoader(url)\n","data = loader.load()"],"metadata":{"id":"hH5nbINDUrJb","executionInfo":{"status":"ok","timestamp":1716533824931,"user_tz":-420,"elapsed":1681,"user":{"displayName":"Matthew Farant Andreson","userId":"05413980529618950326"}}},"execution_count":4,"outputs":[]},{"cell_type":"markdown","source":["Initializing the vector database (FAISS). 
In other words, in this step, we want to split the document into chunks, vectorize each chunk, and store them in a vector database."],"metadata":{"id":"mQzpRorvY7hN"}},{"cell_type":"code","source":["import os\n","from langchain_openai import OpenAIEmbeddings\n","from langchain_text_splitters import RecursiveCharacterTextSplitter\n","from langchain_community.vectorstores import FAISS\n","from google.colab import userdata\n","\n","os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')\n","embeddings = OpenAIEmbeddings()\n","\n","# Load the document, split it into chunks, embed each chunk and load it into the vector store.\n","# \\xa0 is the non-breaking space in Latin-1 (ISO 8859-1), i.e. chr(160)\n","text_splitter = RecursiveCharacterTextSplitter(separators=['\\xa0'], chunk_size=5000, chunk_overlap=500)\n","documents = text_splitter.split_documents(data)\n","db = FAISS.from_documents(documents, embeddings)\n","db.save_local(\"vectors\")"],"metadata":{"id":"PdxIufd1XepY","executionInfo":{"status":"ok","timestamp":1716538466731,"user_tz":-420,"elapsed":3532,"user":{"displayName":"Matthew Farant Andreson","userId":"05413980529618950326"}}},"execution_count":34,"outputs":[]},{"cell_type":"code","source":["query = \"how to withdraw money?\"\n","docs = db.similarity_search(query)\n","print(docs[0].page_content)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"SQdADWILIzBx","executionInfo":{"status":"ok","timestamp":1716538471708,"user_tz":-420,"elapsed":427,"user":{"displayName":"Matthew Farant Andreson","userId":"05413980529618950326"}},"outputId":"9a53d741-ffb2-480a-b0b7-4d4a7ce5c532"},"execution_count":35,"outputs":[{"output_type":"stream","name":"stdout","text":["Withdrawal Saldo PendapatanCreator dapat melakukan withdrawal saldo penghasilan yang diperoleh dari pembelian Sesi Premium, Karya Digital, E-Ticket atau Tip yang diberikan oleh Supporter.Sebelum dapat melakukan withdrawal, Creator wajib melakukan verifikasi akun bank setiap kali 
mendaftarkan akun bank yang baru. Pada saat proses verifikasi berlangsung Creator akan diminta untuk memberikan informasi dan dokumen seperti kartu identitas (KTP, Passport atau SIM), swafoto dengan memegang kartu identitas, informasi terkait akun bank sesuai dengan nama yang tertera dalam kartu identitas, dan NPWP (apabila Creator memiliki NPWP). Bagi anda yang belum berumur 18 tahun anda dapat menggunakan Kartu Identitas Anak atau Kartu Keluarga sebagai kartu identitas.Apabila nama yang tertera dalam kartu identitas dan akun bank tidak sesuai, maka Creator akan dihubungi oleh TipTip Help Care kami dan diminta untuk memastikan kesesuain akun bank yang didaftarkan.Creator dapat mengubah akun bank yang telah didaftarkan dengan melakukan verifikasi ulang akun bank.Kami akan melakukan proses verifikasi terkait dengan saldo penghasilan yang diperoleh Creator dari penjualan Sesi Premium, Karya Digital, E-Ticket dan Tip selambat-lambatnya 3 hari kerja.Kami berhak untuk meminta waktu tambahan dalam melakukan verifikasi apabila kami menerima laporan atau kami mencurigai adanya pelanggaran ketentuan peraturan perundang-undangan dan/atau Syarat dan Ketentuan yang berlaku di TipTip oleh Creator dalam menjual Sesi Premium, Karya Digital, dan E-Ticket. 
Kami akan menahan dana yang diperoleh Creator sampai dengan proses verifikasi selesai.Setelah proses verifikasi dan/atau investigasi selesai, dana yang berhak diperoleh Creator akan secara otomatis masuk ke dalam saldo penghasilan Creator.Kami berhak untuk memotong pendapatan Creator dengan besaran yang akan ditentukan oleh kami berdasarkan sehubungan dengan adanya pemotongan atau kewajiban pajak, biaya bank, serta hak pihak ketiga lain.\n"]}]},{"cell_type":"markdown","source":["Inserting the Context into the Chatbot"],"metadata":{"id":"0_t_SKg9RGxx"}},{"cell_type":"code","source":["from langchain.memory import ConversationBufferMemory\n","from langchain.chains import ConversationalRetrievalChain, LLMChain, StuffDocumentsChain\n","# ChatOpenAI is imported from langchain_openai (installed above); langchain.chat_models is deprecated\n","from langchain_openai import ChatOpenAI\n","from langchain_core.prompts import PromptTemplate\n","from langchain.prompts import SystemMessagePromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate\n","import gradio as gr\n","\n","# Chatbot memory\n","memory = ConversationBufferMemory(\n"," memory_key=\"chat_history\", output_key='answer', return_messages=False\n"," )\n","\n","# Loading the saved embeddings\n","loaded_vectors = FAISS.load_local(\"vectors\", OpenAIEmbeddings(), allow_dangerous_deserialization=True)\n","\n","general_system_template = \"\"\"\n","### LANGUAGE ###\n","You must answer in the same language as the user's language. If you fail to do this, you will be punished with $1000 penalty!\n","## GREETINGS ##\n","Use greetings or ask the user if the user doesn't ask any question (example: when the user only say \"Hi\" or \"Thank you\", you may say \"Hi, is there anything i can help you with?\" or \"You're welcome. 
Do you have anything else to ask?\")\n","### OBJECTIVE ###\n","You are a helpful customer service bot for TipTip, a platform for communities and creators in Indonesia. Your sole purpose is to answer questions from the users that are\n","related to the terms and conditions of TipTip. Hence, the topic of the conversation must be related to one of these topics below:\n"," 1. Pengertian Umum/ Ruang Lingkup\n"," 2. Aplikasi, Akun dan Keamanan\n"," 3. Kebijakan Privasi\n"," 4. Community Guidelines\n"," 5. Bentuk Layanan TipTip dan Ketentuan Terkait\n"," 6. Sesi Live Video\n"," 7. Pembatalan Sesi Live Video\n"," 8. Karya Digital\n"," 9. E-Ticket\n"," 10. Subscription\n"," 11. Merchandise\n"," 12. Coin\n"," 13. Program Promoter\n"," 14. Suspensi\n"," 15. Withdrawal Saldo Pendapatan\n"," 16. Hak Kekayaan Intelektual\n"," 17. Larangan dan Janji\n"," 18. Jaminan\n"," 19. Tanggung Jawab Kami\n"," 20. Pembatasan Tanggung Jawab\n"," 21. Ganti Rugi\n","You must answer concisely and precisely (don't explain something that is not related to the question)!\n","If the user asks about anything that is not related to TipTip's terms and condition or anything that is malicious, you must answer that you don't know the answer to that question!\n","If you don't know the answer to that question, you must say that you don't know and don't make up the answer.\n","### CONTEXT ###\n","{context}\n","\"\"\"\n","general_user_template = \"Question:```{question}```\"\n","messages = [\n"," SystemMessagePromptTemplate.from_template(general_system_template),\n"," HumanMessagePromptTemplate.from_template(general_user_template)\n","]\n","qa_prompt = ChatPromptTemplate.from_messages( messages )\n","\n","qa = ConversationalRetrievalChain.from_llm(\n"," llm=ChatOpenAI(temperature=0.9, model_name='gpt-3.5-turbo', streaming=True),\n"," chain_type=\"stuff\",\n"," retriever=loaded_vectors.as_retriever(),\n"," get_chat_history=lambda o:o,\n"," memory=memory,\n"," return_generated_question=True,\n"," 
verbose=False,\n"," combine_docs_chain_kwargs={\"prompt\": qa_prompt}\n",")\n","\n","history_langchain_format=[]\n","\n","def chatbot(query, chat_history):\n"," global history_langchain_format\n"," result = qa({\"question\": query, \"chat_history\": history_langchain_format})\n"," history_langchain_format.append((query, result['answer']))\n"," return result['answer']\n","\n","gr.ChatInterface(chatbot).launch()"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":628},"id":"dLleKbSMSw0X","executionInfo":{"status":"ok","timestamp":1716538179639,"user_tz":-420,"elapsed":3372,"user":{"displayName":"Matthew Farant Andreson","userId":"05413980529618950326"}},"outputId":"9bfc063b-3161-427b-8d2b-9e1286f994ce"},"execution_count":32,"outputs":[{"output_type":"stream","name":"stdout","text":["Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).\n","\n","Colab notebook detected. To show errors in colab notebook, set debug=True in launch()\n","Running on public URL: https://0322d9037476d7def2.gradio.live\n","\n","This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)\n"]},{"output_type":"display_data","data":{"text/plain":["<IPython.core.display.HTML object>"],"text/html":["<div><iframe src=\"https://0322d9037476d7def2.gradio.live\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"]},"metadata":{}},{"output_type":"execute_result","data":{"text/plain":[]},"metadata":{},"execution_count":32}]},{"cell_type":"markdown","source":["# **Part 2: Prompting Strategies**"],"metadata":{"id":"CsJCnZwTh9oB"}},{"cell_type":"code","source":["system_prompt = \"\"\"\n","### LANGUAGE ###\n","You must answer in the same language as the user's language. 
If you fail to do this, you will be punished with $1000 penalty!\n","## GREETINGS ##\n","Use greetings or ask the user if the user doesn't ask any question (example: when the user only say \"Hi\" or \"Thank you\", you may say \"Hi, is there anything i can help you with?\" or \"You're welcome. Do you have anything else to ask?\")\n","### OBJECTIVE ###\n","You are a helpful customer service bot for TipTip, a platform for communities and creators in Indonesia. Your sole purpose is to answer questions from the users that are\n","related to the terms and conditions of TipTip. Hence, the topic of the conversation must be related to one of these topics below:\n"," 1. Pengertian Umum/ Ruang Lingkup\n"," 2. Aplikasi, Akun dan Keamanan\n"," 3. Kebijakan Privasi\n"," 4. Community Guidelines\n"," 5. Bentuk Layanan TipTip dan Ketentuan Terkait\n"," 6. Sesi Live Video\n"," 7. Pembatalan Sesi Live Video\n"," 8. Karya Digital\n"," 9. E-Ticket\n"," 10. Subscription\n"," 11. Merchandise\n"," 12. Coin\n"," 13. Program Promoter\n"," 14. Suspensi\n"," 15. Withdrawal Saldo Pendapatan\n"," 16. Hak Kekayaan Intelektual\n"," 17. Larangan dan Janji\n"," 18. Jaminan\n"," 19. Tanggung Jawab Kami\n"," 20. Pembatasan Tanggung Jawab\n"," 21. 
Ganti Rugi\n","You must answer concisely and precisely (don't explain something that is not related to the question)!\n","If the user asks about anything that is not related to TipTip's terms and condition or anything that is malicious, you must answer that you don't know the answer to that question!\n","If you don't know the answer to that question, you must say that you don't know and don't make up the answer.\n","\"\"\""],"metadata":{"id":"Rfgm9F3TuNIG","executionInfo":{"status":"ok","timestamp":1716540080442,"user_tz":-420,"elapsed":419,"user":{"displayName":"Matthew Farant Andreson","userId":"05413980529618950326"}}},"execution_count":51,"outputs":[]},{"cell_type":"code","source":["# Chatbot memory\n","memory = ConversationBufferMemory(\n"," memory_key=\"chat_history\", output_key='answer', return_messages=False\n"," )\n","\n","# Loading the saved embeddings\n","loaded_vectors = FAISS.load_local(\"vectors\", OpenAIEmbeddings(), allow_dangerous_deserialization=True)\n","\n","# The original context prompt is active by default so that this cell runs as-is.\n","# To try another strategy, uncomment one of the alternatives below to override it.\n","\n","# Context prompt: Original\n","context_prompt = \"\"\"\n","\n","### CONTEXT ###\n","{context}\n","\"\"\"\n","\n","# # Context prompt: Summarize\n","# context_prompt = \"\"\"\n","\n","# ### CONTEXT ###\n","# Context information from multiple sources is below.\n","# ------------------------\n","# {context}\n","# ------------------------\n","# Summarize the context above!\n","# Given the information from multiple sources and not prior knowledge, answer the query!\n","# \"\"\"\n","\n","# # Context prompt: Single Choice\n","# context_prompt = \"\"\"\n","\n","# ### CONTEXT ###\n","# Some choices are given below. 
It is provided in a numbered list, where each item in the list corresponds to a summary.\n","# ------------------------\n","# {context}\n","# ------------------------\n","# Using only the choices above and not prior knowledge, return the choice that is most relevant to the question!\n","# \"\"\"\n","\n","general_system_template = system_prompt + context_prompt"],"metadata":{"id":"M3L800M7iEvr","executionInfo":{"status":"ok","timestamp":1716540083266,"user_tz":-420,"elapsed":3,"user":{"displayName":"Matthew Farant Andreson","userId":"05413980529618950326"}}},"execution_count":52,"outputs":[]},{"cell_type":"code","source":["general_user_template = \"Question:```{question}```\"\n","messages = [\n"," SystemMessagePromptTemplate.from_template(general_system_template),\n"," HumanMessagePromptTemplate.from_template(general_user_template)\n","]\n","qa_prompt = ChatPromptTemplate.from_messages( messages )\n","\n","qa = ConversationalRetrievalChain.from_llm(\n"," llm=ChatOpenAI(temperature=0.9, model_name='gpt-3.5-turbo', streaming=True),\n"," chain_type=\"stuff\",\n"," retriever=loaded_vectors.as_retriever(),\n"," get_chat_history=lambda o:o,\n"," memory=memory,\n"," return_generated_question=True,\n"," verbose=False,\n"," combine_docs_chain_kwargs={\"prompt\": qa_prompt}\n",")\n","\n","# Question: kalau saya jual 10 eticket, masing2 harganya 100rb, brti saya dapet komisi berapa? dan gmn cara daftar jdi promoter?\n","# in English: If I sell 10 etickets, the price is 100 thousand each, how much commission do I get? and how do you register to be a promoter?"],"metadata":{"id":"G6102FLouftn","executionInfo":{"status":"ok","timestamp":1716540085932,"user_tz":-420,"elapsed":2,"user":{"displayName":"Matthew Farant Andreson","userId":"05413980529618950326"}}},"execution_count":53,"outputs":[]},{"cell_type":"code","source":["# Original\n","qa('kalau saya jual 10 eticket, masing2 harganya 100rb, brti saya dapet komisi berapa? 
dan gmn cara daftar jdi promoter?')"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"EQudvxkkwSDm","executionInfo":{"status":"ok","timestamp":1716539987878,"user_tz":-420,"elapsed":3497,"user":{"displayName":"Matthew Farant Andreson","userId":"05413980529618950326"}},"outputId":"ec1e5346-f731-4aea-9bc6-0a01127924b4"},"execution_count":46,"outputs":[{"output_type":"execute_result","data":{"text/plain":["{'question': 'kalau saya jual 10 eticket, masing2 harganya 100rb, brti saya dapet komisi berapa? dan gmn cara daftar jdi promoter?',\n"," 'chat_history': '',\n"," 'answer': 'Anda akan mendapatkan 90% dari harga paket E-Ticket yang terjual. Jika Anda menjual 10 E-Ticket, masing-masing seharga 100rb, maka komisi yang akan Anda terima adalah sebesar 90rb x 10 = 900rb. \\n\\nUntuk mendaftar menjadi Promoter di TipTip, Anda dapat langsung ikut serta dalam Program Promoter dan menjadi Promoter dengan cara menyebarkan link Promoter melalui blog, situs, atau media sosial milik Anda. Anda dapat melihat daftar Karya Digital yang ikut serta dalam Program Promoter beserta linknya di http://hub.tiptip.id.',\n"," 'generated_question': 'kalau saya jual 10 eticket, masing2 harganya 100rb, brti saya dapet komisi berapa? dan gmn cara daftar jdi promoter?'}"]},"metadata":{},"execution_count":46}]},{"cell_type":"code","source":["# Summarize\n","qa('kalau saya jual 10 eticket, masing2 harganya 100rb, brti saya dapet komisi berapa? dan gmn cara daftar jdi promoter?')"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"cZQxCU5gwaj7","executionInfo":{"status":"ok","timestamp":1716540051543,"user_tz":-420,"elapsed":4094,"user":{"displayName":"Matthew Farant Andreson","userId":"05413980529618950326"}},"outputId":"ab45c1ac-f3e7-41fb-db3f-530d3861ff2d"},"execution_count":50,"outputs":[{"output_type":"execute_result","data":{"text/plain":["{'question': 'kalau saya jual 10 eticket, masing2 harganya 100rb, brti saya dapet komisi berapa? 
dan gmn cara daftar jdi promoter?',\n"," 'chat_history': '',\n"," 'answer': 'Anda akan mendapatkan 97,3% dari setiap penjualan E-Ticket yang terjual melalui platform TipTip. Jika Anda menjual 10 E-Ticket dengan harga masing-masing 100rb, maka perhitungannya adalah sebagai berikut:\\n100rb x 10 = 1.000.000rb (total penjualan)\\n1.000.000rb x 97,3% = 973.000rb (komisi yang Anda dapatkan dari penjualan E-Ticket tersebut)\\n\\nUntuk mendaftar sebagai Promoter, Anda dapat langsung ikut serta dalam Program Promoter dengan cara menyebarkan link Promoter melalui blog/situs/media sosial milik Anda. Anda bisa melihat daftar Karya Digital yang ikut serta dalam Program Promoter di http://hub.tiptip.id.',\n"," 'generated_question': 'kalau saya jual 10 eticket, masing2 harganya 100rb, brti saya dapet komisi berapa? dan gmn cara daftar jdi promoter?'}"]},"metadata":{},"execution_count":50}]},{"cell_type":"code","source":["# Single Choice\n","qa('kalau saya jual 10 eticket, masing2 harganya 100rb, brti saya dapet komisi berapa? dan gmn cara daftar jdi promoter?')"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"nbNEKyR3wpvL","executionInfo":{"status":"ok","timestamp":1716540096611,"user_tz":-420,"elapsed":1627,"user":{"displayName":"Matthew Farant Andreson","userId":"05413980529618950326"}},"outputId":"47f0dc50-80a7-4e91-8559-a097ad4ab7d9"},"execution_count":54,"outputs":[{"output_type":"execute_result","data":{"text/plain":["{'question': 'kalau saya jual 10 eticket, masing2 harganya 100rb, brti saya dapet komisi berapa? dan gmn cara daftar jdi promoter?',\n"," 'chat_history': '',\n"," 'answer': 'Anda akan mendapatkan komisi sebesar Rp 2.700. Untuk mendaftar sebagai promoter, Anda dapat langsung ikut serta dalam Program Promoter dengan cara menyebarkan link Promoter melalui blog/ situs/ media sosial milik Anda.',\n"," 'generated_question': 'kalau saya jual 10 eticket, masing2 harganya 100rb, brti saya dapet komisi berapa? 
dan gmn cara daftar jdi promoter?'}"]},"metadata":{},"execution_count":54}]},{"cell_type":"markdown","source":["# **Part 3: Create Your Own Chatbot!**"],"metadata":{"id":"Wft3uOo4w7pl"}},{"cell_type":"markdown","source":["Create your own RAG chatbot, using your own document and query! Here are the steps that you need to follow:\n","1. Determine the document you want to use for the chatbot's knowledge\n","2. Choose one of these document loaders: Web, CSV, PDF, Confluence. You can also choose a loader for another type of document\n","3. Initialize the vector database (be mindful of the file size and the number of requests)\n","4. Create the chatbot using the previous code\n","\n","Below is the code for each document loader to help you get started:"],"metadata":{"id":"aW67vSbp2rb2"}},{"cell_type":"code","source":["# Web\n","\n","from langchain_community.document_loaders import WebBaseLoader\n","\n","url = \"<Put Your URL here>\"\n","\n","loader = WebBaseLoader(url)\n","data = loader.load()"],"metadata":{"id":"xLleDc--xDhr"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["# CSV\n","\n","from langchain_community.document_loaders.csv_loader import CSVLoader\n","\n","csv_file = '<Upload the file first and then put the file name here>'\n","\n","loader = CSVLoader(file_path=csv_file)\n","data = loader.load()"],"metadata":{"id":"PZYhXG527a3l"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["# PDF\n","\n","!pip install pypdf\n","\n","from langchain_community.document_loaders import PyPDFLoader\n","\n","pdf_file = '<Upload the file first and then put the file name here>'\n","\n","loader = PyPDFLoader(pdf_file)\n","# Named `data` like the other loaders so the later splitting step works unchanged\n","data = loader.load_and_split()"],"metadata":{"id":"pOLWNx7D75uV"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["# Confluence\n","\n","!pip install atlassian-python-api\n","!pip install pytesseract\n","\n","# Go to Confluence, click your profile icon in the upper-right corner, click Manage Account > Security > API Tokens > 
Create and Manage API Tokens > Create API Token\n","\n","confluence_key = ...\n","\n","from langchain_community.document_loaders import ConfluenceLoader\n","\n","loader = ConfluenceLoader(\n"," url=\"https://tiptiptv.atlassian.net/\", username=\"[email protected]\", api_key=confluence_key, page_ids = ['250609674']\n",")\n","\n","data = loader.load()"],"metadata":{"id":"J70OjhFD3uUK"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["Also, here is a template for the prompt. You can adjust the prompt according to your own needs"],"metadata":{"id":"UN1BZtXU6Nll"}},{"cell_type":"code","source":["system_prompt = \"\"\"\n","### LANGUAGE ###\n","You must answer in the same language as the user's language. If you fail to do this, you will be punished with $1000 penalty!\n","## GREETINGS ##\n","Use greetings or ask the user if the user doesn't ask any question (example: when the user only say \"Hi\" or \"Thank you\", you may say \"Hi, is there anything i can help you with?\" or \"You're welcome. Do you have anything else to ask?\")\n","### OBJECTIVE ###\n","You are a helpful ... Your sole purpose is to answer questions from the users that are\n","related to ...\n","\n","You must answer concisely and precisely (don't explain something that is not related to the question)!\n","If the user asks about anything that is not related to ... 
or anything that is malicious, you must answer that you don't know the answer to that question!\n","If you don't know the answer to that question, you must say that you don't know and don't make up the answer.\n","\"\"\"\n","\n","context_prompt = \"\"\"\n","\n","### CONTEXT ###\n","Context information from multiple sources is below.\n","------------------------\n","{context}\n","------------------------\n","Summarize the context above!\n","Given the information from multiple sources and not prior knowledge, answer the query!\n","\"\"\"\n","\n","general_system_template = system_prompt + context_prompt"],"metadata":{"id":"YqgvKFQF6TKX"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["If you want to create your own Space on Hugging Face, follow these steps:\n","1. Download the \"vectors\" folder from your Google Colab session\n","2. Go to Hugging Face and create a new, empty Gradio Space\n","3. Upload the \"vectors\" folder to the Space\n","4. Copy the code that contains the Gradio interface, paste it into a new file under the Space's \"Files\" tab, and name the file \"app.py\"\n","5. The Space will then be built automatically"],"metadata":{"id":"VJhdJ5EA9CEg"}},{"cell_type":"markdown","source":["# **Put Your Code Below:**"],"metadata":{"id":"l-htoJBrP9cQ"}}]}