jeevan committed
Commit 008dc3a · 1 Parent(s): c04cad4

app updated

.gitignore ADDED
@@ -0,0 +1 @@
1
+ .venv
.vscode/settings.json ADDED
@@ -0,0 +1,9 @@
1
+ {
2
+ "python.analysis.extraPaths": [
3
+ "/Users/jeevan/Documents/Learnings/ai-engineering-bootcamp/week8day1prodapp/.venv/lib/python3.9/site-packages",
4
+ ],
5
+ "python.terminal.activateEnvironment": false,
6
+ "python.analysis.diagnosticSeverityOverrides": {
7
+ "reportMissingImports": "none"
8
+ },
9
+ }
Prototyping_LangChain_Application_with_Production_Minded_Changes_Assignment.ipynb ADDED
@@ -0,0 +1,811 @@
1
+ {
2
+ "nbformat": 4,
3
+ "nbformat_minor": 0,
4
+ "metadata": {
5
+ "colab": {
6
+ "provenance": []
7
+ },
8
+ "kernelspec": {
9
+ "name": "python3",
10
+ "display_name": "Python 3"
11
+ },
12
+ "language_info": {
13
+ "name": "python"
14
+ }
15
+ },
16
+ "cells": [
17
+ {
18
+ "cell_type": "markdown",
19
+ "source": [
20
+ "# Prototyping LangChain Application with Production Minded Changes\n",
21
+ "\n",
22
+ "For our first breakout room we'll be exploring how to set up a LangChain LCEL chain in a way that takes advantage of the production-ready features it offers out of the box.\n",
23
+ "\n",
24
+ "We'll also explore `Caching` and what makes it an invaluable tool when transitioning to production environments.\n"
25
+ ],
26
+ "metadata": {
27
+ "id": "8ZsP-j7w3zcL"
28
+ }
29
+ },
30
+ {
31
+ "cell_type": "markdown",
32
+ "source": [
33
+ "## Task 1: Dependencies and Set-Up\n",
34
+ "\n",
35
+ "Let's get everything we need - we're going to use very specific versioning today to try to mitigate potential env. issues!\n",
36
+ "\n",
37
+ "> NOTE: Dependency issues are a large portion of what you're going to be tackling as you integrate new technology into your work - please keep in mind that one of the things you should be passively learning throughout this course is ways to mitigate dependency issues."
38
+ ],
39
+ "metadata": {
40
+ "id": "PpeN9ND0HKa0"
41
+ }
42
+ },
43
+ {
44
+ "cell_type": "code",
45
+ "execution_count": 24,
46
+ "metadata": {
47
+ "id": "0P4IJUQF27jW"
48
+ },
49
+ "outputs": [],
50
+ "source": [
51
+ "!pip install -qU langchain_openai==0.2.0 langchain_community==0.3.0 langchain==0.3.0 pymupdf==1.24.10 qdrant-client==1.11.2 langchain_qdrant==0.1.4 langsmith==0.1.121"
52
+ ]
53
+ },
54
+ {
55
+ "cell_type": "markdown",
56
+ "source": [
57
+ "We'll need an OpenAI API Key:"
58
+ ],
59
+ "metadata": {
60
+ "id": "qYcWLzrmHgDb"
61
+ }
62
+ },
63
+ {
64
+ "cell_type": "code",
65
+ "source": [
66
+ "import os\n",
67
+ "import getpass\n",
68
+ "\n",
69
+ "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
70
+ ],
71
+ "metadata": {
72
+ "colab": {
73
+ "base_uri": "https://localhost:8080/"
74
+ },
75
+ "id": "GZ8qfrFh_6ed",
76
+ "outputId": "4fb1a16f-1f71-4d0a-aad4-dd0d0917abc5"
77
+ },
78
+ "execution_count": 1,
79
+ "outputs": [
80
+ {
81
+ "name": "stdout",
82
+ "output_type": "stream",
83
+ "text": [
84
+ "OpenAI API Key:··········\n"
85
+ ]
86
+ }
87
+ ]
88
+ },
89
+ {
90
+ "cell_type": "markdown",
91
+ "source": [
92
+ "And the LangSmith set-up:"
93
+ ],
94
+ "metadata": {
95
+ "id": "piz2DUDuHiSO"
96
+ }
97
+ },
98
+ {
99
+ "cell_type": "code",
100
+ "source": [
101
+ "import uuid\n",
102
+ "\n",
103
+ "os.environ[\"LANGCHAIN_PROJECT\"] = f\"AIM Week 8 Assignment 1 - {uuid.uuid4().hex[0:8]}\"\n",
104
+ "os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
105
+ "os.environ[\"LANGCHAIN_ENDPOINT\"] = \"https://api.smith.langchain.com\"\n",
106
+ "os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass(\"LangChain API Key:\")"
107
+ ],
108
+ "metadata": {
109
+ "colab": {
110
+ "base_uri": "https://localhost:8080/"
111
+ },
112
+ "id": "wLZX5zowCh-q",
113
+ "outputId": "565c588a-a865-4b86-d5ca-986f35153000"
114
+ },
115
+ "execution_count": 16,
116
+ "outputs": [
117
+ {
118
+ "name": "stdout",
119
+ "output_type": "stream",
120
+ "text": [
121
+ "LangChain API Key:··········\n"
122
+ ]
123
+ }
124
+ ]
125
+ },
126
+ {
127
+ "cell_type": "markdown",
128
+ "source": [
129
+ "Let's verify our project so we can leverage it in LangSmith later."
130
+ ],
131
+ "metadata": {
132
+ "id": "WmwNTziKHrQm"
133
+ }
134
+ },
135
+ {
136
+ "cell_type": "code",
137
+ "source": [
138
+ "print(os.environ[\"LANGCHAIN_PROJECT\"])"
139
+ ],
140
+ "metadata": {
141
+ "colab": {
142
+ "base_uri": "https://localhost:8080/"
143
+ },
144
+ "id": "T6GZmkVkFcHq",
145
+ "outputId": "f4c0fdb3-24ea-429a-fa8c-23556cb7c3ed"
146
+ },
147
+ "execution_count": 17,
148
+ "outputs": [
149
+ {
150
+ "output_type": "stream",
151
+ "name": "stdout",
152
+ "text": [
153
+ "AIM Week 8 Assignment 1 - 215a8497\n"
154
+ ]
155
+ }
156
+ ]
157
+ },
158
+ {
159
+ "cell_type": "markdown",
160
+ "source": [
161
+ "## Task 2: Setting up RAG With Production in Mind\n",
162
+ "\n",
163
+ "This is the most crucial step in the process - in order to take advantage of:\n",
164
+ "\n",
165
+ "- Asynchronous requests\n",
166
+ "- Parallel Execution in Chains\n",
167
+ "- And more...\n",
168
+ "\n",
169
+ "You must...use LCEL. These benefits are provided out of the box and largely optimized behind the scenes."
170
+ ],
171
+ "metadata": {
172
+ "id": "un_ppfaAHv1J"
173
+ }
174
+ },
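+ {
+ "cell_type": "markdown",
+ "source": [
+ "Every LCEL `Runnable` exposes the same sync and async surface (`invoke`/`ainvoke`, `batch`/`abatch`, `stream`/`astream`), which is where these benefits come from. Here's a minimal sketch with a toy `RunnableLambda` chain - purely illustrative, not part of the assignment chain we build below:"
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "from langchain_core.runnables import RunnableLambda\n",
+ "\n",
+ "# A toy two-step chain - any LCEL chain exposes the same methods.\n",
+ "toy_chain = RunnableLambda(lambda x: x * 2) | RunnableLambda(lambda x: x + 1)\n",
+ "\n",
+ "print(toy_chain.invoke(10))          # sync, single input\n",
+ "print(toy_chain.batch([1, 2, 3]))    # sync, fans out over the inputs\n",
+ "print(await toy_chain.ainvoke(10))   # async - notebooks support top-level await"
+ ],
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },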
175
+ {
176
+ "cell_type": "markdown",
177
+ "source": [
178
+ "### Building our RAG Components: Retriever\n",
179
+ "\n",
180
+ "We'll start by building some familiar components - and showcase how they automatically scale to production features."
181
+ ],
182
+ "metadata": {
183
+ "id": "vGi-db23JMAL"
184
+ }
185
+ },
186
+ {
187
+ "cell_type": "markdown",
188
+ "source": [
189
+ "Please upload a PDF file to use in this example!"
190
+ ],
191
+ "metadata": {
192
+ "id": "zvbT3HSDJemE"
193
+ }
194
+ },
195
+ {
196
+ "cell_type": "code",
197
+ "source": [
198
+ "from google.colab import files\n",
199
+ "uploaded = files.upload()"
200
+ ],
201
+ "metadata": {
202
+ "colab": {
203
+ "base_uri": "https://localhost:8080/",
204
+ "height": 73
205
+ },
206
+ "id": "dvYczNeY91Hn",
207
+ "outputId": "c711c29b-e388-4d32-a763-f4504244eef2"
208
+ },
209
+ "execution_count": 7,
210
+ "outputs": [
211
+ {
212
+ "output_type": "display_data",
213
+ "data": {
214
+ "text/plain": [
215
+ "<IPython.core.display.HTML object>"
216
+ ],
217
+ "text/html": [
218
+ "\n",
219
+ " <input type=\"file\" id=\"files-f26e85ad-7ad3-48c1-b905-c2692a72d40e\" name=\"files[]\" multiple disabled\n",
220
+ " style=\"border:none\" />\n",
221
+ " <output id=\"result-f26e85ad-7ad3-48c1-b905-c2692a72d40e\">\n",
222
+ " Upload widget is only available when the cell has been executed in the\n",
223
+ " current browser session. Please rerun this cell to enable.\n",
224
+ " </output>\n",
225
+ " <script>// Copyright 2017 Google LLC\n",
226
+ "//\n",
227
+ "// Licensed under the Apache License, Version 2.0 (the \"License\");\n",
228
+ "// you may not use this file except in compliance with the License.\n",
229
+ "// You may obtain a copy of the License at\n",
230
+ "//\n",
231
+ "// http://www.apache.org/licenses/LICENSE-2.0\n",
232
+ "//\n",
233
+ "// Unless required by applicable law or agreed to in writing, software\n",
234
+ "// distributed under the License is distributed on an \"AS IS\" BASIS,\n",
235
+ "// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
236
+ "// See the License for the specific language governing permissions and\n",
237
+ "// limitations under the License.\n",
238
+ "\n",
239
+ "/**\n",
240
+ " * @fileoverview Helpers for google.colab Python module.\n",
241
+ " */\n",
242
+ "(function(scope) {\n",
243
+ "function span(text, styleAttributes = {}) {\n",
244
+ " const element = document.createElement('span');\n",
245
+ " element.textContent = text;\n",
246
+ " for (const key of Object.keys(styleAttributes)) {\n",
247
+ " element.style[key] = styleAttributes[key];\n",
248
+ " }\n",
249
+ " return element;\n",
250
+ "}\n",
251
+ "\n",
252
+ "// Max number of bytes which will be uploaded at a time.\n",
253
+ "const MAX_PAYLOAD_SIZE = 100 * 1024;\n",
254
+ "\n",
255
+ "function _uploadFiles(inputId, outputId) {\n",
256
+ " const steps = uploadFilesStep(inputId, outputId);\n",
257
+ " const outputElement = document.getElementById(outputId);\n",
258
+ " // Cache steps on the outputElement to make it available for the next call\n",
259
+ " // to uploadFilesContinue from Python.\n",
260
+ " outputElement.steps = steps;\n",
261
+ "\n",
262
+ " return _uploadFilesContinue(outputId);\n",
263
+ "}\n",
264
+ "\n",
265
+ "// This is roughly an async generator (not supported in the browser yet),\n",
266
+ "// where there are multiple asynchronous steps and the Python side is going\n",
267
+ "// to poll for completion of each step.\n",
268
+ "// This uses a Promise to block the python side on completion of each step,\n",
269
+ "// then passes the result of the previous step as the input to the next step.\n",
270
+ "function _uploadFilesContinue(outputId) {\n",
271
+ " const outputElement = document.getElementById(outputId);\n",
272
+ " const steps = outputElement.steps;\n",
273
+ "\n",
274
+ " const next = steps.next(outputElement.lastPromiseValue);\n",
275
+ " return Promise.resolve(next.value.promise).then((value) => {\n",
276
+ " // Cache the last promise value to make it available to the next\n",
277
+ " // step of the generator.\n",
278
+ " outputElement.lastPromiseValue = value;\n",
279
+ " return next.value.response;\n",
280
+ " });\n",
281
+ "}\n",
282
+ "\n",
283
+ "/**\n",
284
+ " * Generator function which is called between each async step of the upload\n",
285
+ " * process.\n",
286
+ " * @param {string} inputId Element ID of the input file picker element.\n",
287
+ " * @param {string} outputId Element ID of the output display.\n",
288
+ " * @return {!Iterable<!Object>} Iterable of next steps.\n",
289
+ " */\n",
290
+ "function* uploadFilesStep(inputId, outputId) {\n",
291
+ " const inputElement = document.getElementById(inputId);\n",
292
+ " inputElement.disabled = false;\n",
293
+ "\n",
294
+ " const outputElement = document.getElementById(outputId);\n",
295
+ " outputElement.innerHTML = '';\n",
296
+ "\n",
297
+ " const pickedPromise = new Promise((resolve) => {\n",
298
+ " inputElement.addEventListener('change', (e) => {\n",
299
+ " resolve(e.target.files);\n",
300
+ " });\n",
301
+ " });\n",
302
+ "\n",
303
+ " const cancel = document.createElement('button');\n",
304
+ " inputElement.parentElement.appendChild(cancel);\n",
305
+ " cancel.textContent = 'Cancel upload';\n",
306
+ " const cancelPromise = new Promise((resolve) => {\n",
307
+ " cancel.onclick = () => {\n",
308
+ " resolve(null);\n",
309
+ " };\n",
310
+ " });\n",
311
+ "\n",
312
+ " // Wait for the user to pick the files.\n",
313
+ " const files = yield {\n",
314
+ " promise: Promise.race([pickedPromise, cancelPromise]),\n",
315
+ " response: {\n",
316
+ " action: 'starting',\n",
317
+ " }\n",
318
+ " };\n",
319
+ "\n",
320
+ " cancel.remove();\n",
321
+ "\n",
322
+ " // Disable the input element since further picks are not allowed.\n",
323
+ " inputElement.disabled = true;\n",
324
+ "\n",
325
+ " if (!files) {\n",
326
+ " return {\n",
327
+ " response: {\n",
328
+ " action: 'complete',\n",
329
+ " }\n",
330
+ " };\n",
331
+ " }\n",
332
+ "\n",
333
+ " for (const file of files) {\n",
334
+ " const li = document.createElement('li');\n",
335
+ " li.append(span(file.name, {fontWeight: 'bold'}));\n",
336
+ " li.append(span(\n",
337
+ " `(${file.type || 'n/a'}) - ${file.size} bytes, ` +\n",
338
+ " `last modified: ${\n",
339
+ " file.lastModifiedDate ? file.lastModifiedDate.toLocaleDateString() :\n",
340
+ " 'n/a'} - `));\n",
341
+ " const percent = span('0% done');\n",
342
+ " li.appendChild(percent);\n",
343
+ "\n",
344
+ " outputElement.appendChild(li);\n",
345
+ "\n",
346
+ " const fileDataPromise = new Promise((resolve) => {\n",
347
+ " const reader = new FileReader();\n",
348
+ " reader.onload = (e) => {\n",
349
+ " resolve(e.target.result);\n",
350
+ " };\n",
351
+ " reader.readAsArrayBuffer(file);\n",
352
+ " });\n",
353
+ " // Wait for the data to be ready.\n",
354
+ " let fileData = yield {\n",
355
+ " promise: fileDataPromise,\n",
356
+ " response: {\n",
357
+ " action: 'continue',\n",
358
+ " }\n",
359
+ " };\n",
360
+ "\n",
361
+ " // Use a chunked sending to avoid message size limits. See b/62115660.\n",
362
+ " let position = 0;\n",
363
+ " do {\n",
364
+ " const length = Math.min(fileData.byteLength - position, MAX_PAYLOAD_SIZE);\n",
365
+ " const chunk = new Uint8Array(fileData, position, length);\n",
366
+ " position += length;\n",
367
+ "\n",
368
+ " const base64 = btoa(String.fromCharCode.apply(null, chunk));\n",
369
+ " yield {\n",
370
+ " response: {\n",
371
+ " action: 'append',\n",
372
+ " file: file.name,\n",
373
+ " data: base64,\n",
374
+ " },\n",
375
+ " };\n",
376
+ "\n",
377
+ " let percentDone = fileData.byteLength === 0 ?\n",
378
+ " 100 :\n",
379
+ " Math.round((position / fileData.byteLength) * 100);\n",
380
+ " percent.textContent = `${percentDone}% done`;\n",
381
+ "\n",
382
+ " } while (position < fileData.byteLength);\n",
383
+ " }\n",
384
+ "\n",
385
+ " // All done.\n",
386
+ " yield {\n",
387
+ " response: {\n",
388
+ " action: 'complete',\n",
389
+ " }\n",
390
+ " };\n",
391
+ "}\n",
392
+ "\n",
393
+ "scope.google = scope.google || {};\n",
394
+ "scope.google.colab = scope.google.colab || {};\n",
395
+ "scope.google.colab._files = {\n",
396
+ " _uploadFiles,\n",
397
+ " _uploadFilesContinue,\n",
398
+ "};\n",
399
+ "})(self);\n",
400
+ "</script> "
401
+ ]
402
+ },
403
+ "metadata": {}
404
+ },
405
+ {
406
+ "output_type": "stream",
407
+ "name": "stdout",
408
+ "text": [
409
+ "Saving eu_ai_act.html to eu_ai_act (1).html\n"
410
+ ]
411
+ }
412
+ ]
413
+ },
414
+ {
415
+ "cell_type": "code",
416
+ "source": [
417
+ "file_path = list(uploaded.keys())[0]\n",
418
+ "file_path"
419
+ ],
420
+ "metadata": {
421
+ "colab": {
422
+ "base_uri": "https://localhost:8080/",
423
+ "height": 35
424
+ },
425
+ "id": "NtwoVUbaJlbW",
426
+ "outputId": "5aa08bae-97c5-4f49-cb23-e9dbf194ecf7"
427
+ },
428
+ "execution_count": 25,
429
+ "outputs": [
430
+ {
431
+ "output_type": "execute_result",
432
+ "data": {
433
+ "text/plain": [
434
+ "'eu_ai_act (1).html'"
435
+ ],
436
+ "application/vnd.google.colaboratory.intrinsic+json": {
437
+ "type": "string"
438
+ }
439
+ },
440
+ "metadata": {},
441
+ "execution_count": 25
442
+ }
443
+ ]
444
+ },
445
+ {
446
+ "cell_type": "markdown",
447
+ "source": [
448
+ "We'll define our chunking strategy."
449
+ ],
450
+ "metadata": {
451
+ "id": "kucGy3f0Jhdi"
452
+ }
453
+ },
454
+ {
455
+ "cell_type": "code",
456
+ "source": [
457
+ "from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
458
+ "\n",
459
+ "text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)"
460
+ ],
461
+ "metadata": {
462
+ "id": "G-DNvNFd8je5"
463
+ },
464
+ "execution_count": 18,
465
+ "outputs": []
466
+ },
467
+ {
468
+ "cell_type": "markdown",
469
+ "source": [
470
+ "We'll chunk our uploaded PDF file."
471
+ ],
472
+ "metadata": {
473
+ "id": "3_zRRNcLKCZh"
474
+ }
475
+ },
476
+ {
477
+ "cell_type": "code",
478
+ "source": [
479
+ "from langchain_community.document_loaders import PyMuPDFLoader\n",
480
+ "\n",
481
+ "Loader = PyMuPDFLoader\n",
482
+ "loader = Loader(file_path)\n",
483
+ "documents = loader.load()\n",
484
+ "docs = text_splitter.split_documents(documents)\n",
485
+ "for i, doc in enumerate(docs):\n",
486
+ " doc.metadata[\"source\"] = f\"source_{i}\""
487
+ ],
488
+ "metadata": {
489
+ "id": "KOh6w9ud-ff6"
490
+ },
491
+ "execution_count": 26,
492
+ "outputs": []
493
+ },
494
+ {
495
+ "cell_type": "markdown",
496
+ "source": [
497
+ "#### QDrant Vector Database - Cache Backed Embeddings\n",
498
+ "\n",
499
+ "The process of embedding is typically a very time-consuming one - we must, for every single vector in our VDB as well as every query:\n",
500
+ "\n",
501
+ "1. Send the text to an API endpoint (self-hosted, OpenAI, etc)\n",
502
+ "2. Wait for processing\n",
503
+ "3. Receive response\n",
504
+ "\n",
505
+ "This process costs time, and money - and occurs *every single time a document gets converted into a vector representation*.\n",
506
+ "\n",
507
+ "Instead, what if we:\n",
508
+ "\n",
509
+ "1. Set up a cache that can hold our vectors and embeddings (similar to, or in some cases literally a vector database)\n",
510
+ "2. Send the text to an API endpoint (self-hosted, OpenAI, etc)\n",
511
+ "3. Check the cache to see if we've already converted this text before.\n",
512
+ " - If we have: Return the vector representation\n",
513
+ " - Else: Wait for processing and proceed\n",
514
+ "4. Store the text that was converted alongside its vector representation in a cache of some kind.\n",
515
+ "5. Return the vector representation\n",
516
+ "\n",
517
+ "Notice that we can shortcut some instances of \"Wait for processing and proceed\".\n",
518
+ "\n",
519
+ "Let's see how this is implemented in the code."
520
+ ],
521
+ "metadata": {
522
+ "id": "U4XLeqJMKGdQ"
523
+ }
524
+ },
525
+ {
526
+ "cell_type": "code",
527
+ "source": [
528
+ "from qdrant_client import QdrantClient\n",
529
+ "from qdrant_client.http.models import Distance, VectorParams\n",
530
+ "from langchain_openai.embeddings import OpenAIEmbeddings\n",
531
+ "from langchain.storage import LocalFileStore\n",
532
+ "from langchain_qdrant import QdrantVectorStore\n",
533
+ "from langchain.embeddings import CacheBackedEmbeddings\n",
534
+ "\n",
535
+ "# Typical Embedding Model\n",
536
+ "core_embeddings = OpenAIEmbeddings(model=\"text-embedding-3-small\")\n",
537
+ "\n",
538
+ "# Typical QDrant Client Set-up\n",
539
+ "collection_name = f\"pdf_to_parse_{uuid.uuid4()}\"\n",
540
+ "client = QdrantClient(\":memory:\")\n",
541
+ "client.create_collection(\n",
542
+ " collection_name=collection_name,\n",
543
+ " vectors_config=VectorParams(size=1536, distance=Distance.COSINE),\n",
544
+ ")\n",
545
+ "\n",
546
+ "# Adding cache!\n",
547
+ "store = LocalFileStore(\"./cache/\")\n",
548
+ "cached_embedder = CacheBackedEmbeddings.from_bytes_store(\n",
549
+ " core_embeddings, store, namespace=core_embeddings.model\n",
550
+ ")\n",
551
+ "\n",
552
+ "# Typical QDrant Vector Store Set-up\n",
553
+ "vectorstore = QdrantVectorStore(\n",
554
+ " client=client,\n",
555
+ " collection_name=collection_name,\n",
556
+ " embedding=cached_embedder)\n",
557
+ "vectorstore.add_documents(docs)\n",
558
+ "retriever = vectorstore.as_retriever(search_type=\"mmr\", search_kwargs={\"k\": 3})"
559
+ ],
560
+ "metadata": {
561
+ "id": "dzPUTCua98b2"
562
+ },
563
+ "execution_count": 9,
564
+ "outputs": []
565
+ },
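+ {
+ "cell_type": "markdown",
+ "source": [
+ "To see the cache at work, here's a minimal sketch that times two identical `embed_documents` calls on a few made-up strings (the document chunks above were already cached by `add_documents`, so we use fresh text to force one real API round-trip). The second call should come back from `./cache/` almost instantly:"
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import time\n",
+ "\n",
+ "# Fresh strings so the first call is a genuine cache miss.\n",
+ "sample_texts = [f\"cache-backed embeddings demo sentence {i}\" for i in range(5)]\n",
+ "\n",
+ "start = time.perf_counter()\n",
+ "cached_embedder.embed_documents(sample_texts)\n",
+ "first_call = time.perf_counter() - start\n",
+ "\n",
+ "start = time.perf_counter()\n",
+ "cached_embedder.embed_documents(sample_texts)\n",
+ "second_call = time.perf_counter() - start\n",
+ "\n",
+ "print(f\"First call (cache miss): {first_call:.3f}s\")\n",
+ "print(f\"Second call (cache hit): {second_call:.3f}s\")"
+ ],
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },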
566
+ {
567
+ "cell_type": "markdown",
568
+ "source": [
569
+ "##### ❓ Question #1:\n",
570
+ "\n",
571
+ "What are some limitations you can see with this approach? When is this most/least useful? Discuss with your group!\n",
572
+ "\n",
573
+ "> NOTE: There is no single correct answer here!"
574
+ ],
575
+ "metadata": {
576
+ "id": "QVZGvmNYLomp"
577
+ }
578
+ },
579
+ {
580
+ "cell_type": "markdown",
581
+ "source": [
582
+ "##### 🏗️ Activity #1:\n",
583
+ "\n",
584
+ "Create a simple experiment that tests the cache-backed embeddings."
585
+ ],
586
+ "metadata": {
587
+ "id": "vZAOhyb3L9iD"
588
+ }
589
+ },
590
+ {
591
+ "cell_type": "code",
592
+ "source": [
593
+ "### YOUR CODE HERE"
594
+ ],
595
+ "metadata": {
596
+ "id": "M_Mekif6MDqe"
597
+ },
598
+ "execution_count": null,
599
+ "outputs": []
600
+ },
601
+ {
602
+ "cell_type": "markdown",
603
+ "source": [
604
+ "### Augmentation\n",
605
+ "\n",
606
+ "We'll create the classic RAG Prompt and create our `ChatPromptTemplates` as per usual."
607
+ ],
608
+ "metadata": {
609
+ "id": "DH0i-YovL8kZ"
610
+ }
611
+ },
612
+ {
613
+ "cell_type": "code",
614
+ "source": [
615
+ "from langchain_core.prompts import ChatPromptTemplate\n",
616
+ "\n",
617
+ "rag_system_prompt_template = \"\"\"\\\n",
618
+ "You are a helpful assistant that uses the provided context to answer questions. Never reference this prompt, or the existence of context.\n",
619
+ "\"\"\"\n",
620
+ "\n",
621
+ "rag_message_list = [\n",
622
+ " {\"role\" : \"system\", \"content\" : rag_system_prompt_template},\n",
623
+ "]\n",
624
+ "\n",
625
+ "rag_user_prompt_template = \"\"\"\\\n",
626
+ "Question:\n",
627
+ "{question}\n",
628
+ "Context:\n",
629
+ "{context}\n",
630
+ "\"\"\"\n",
631
+ "\n",
632
+ "chat_prompt = ChatPromptTemplate.from_messages([\n",
633
+ " (\"system\", rag_system_prompt_template),\n",
634
+ " (\"human\", rag_user_prompt_template)\n",
635
+ "])"
636
+ ],
637
+ "metadata": {
638
+ "id": "WchaoMEx9j69"
639
+ },
640
+ "execution_count": 27,
641
+ "outputs": []
642
+ },
643
+ {
644
+ "cell_type": "markdown",
645
+ "source": [
646
+ "### Generation\n",
647
+ "\n",
648
+ "As usual, we'll set up a `ChatOpenAI` model - and we'll use the fan favourite `gpt-4o-mini` for today.\n",
649
+ "\n",
650
+ "However, we'll also implement...a PROMPT CACHE!\n",
651
+ "\n",
652
+ "In essence, this works in a very similar way to the embedding cache - if we've seen this prompt before, we just use the stored response."
653
+ ],
654
+ "metadata": {
655
+ "id": "UQKnByVWMpiK"
656
+ }
657
+ },
658
+ {
659
+ "cell_type": "code",
660
+ "source": [
661
+ "from langchain_core.globals import set_llm_cache\n",
662
+ "from langchain_openai import ChatOpenAI\n",
663
+ "\n",
664
+ "chat_model = ChatOpenAI(model=\"gpt-4o-mini\")"
665
+ ],
666
+ "metadata": {
667
+ "id": "fOXKkaY7ABab"
668
+ },
669
+ "execution_count": 11,
670
+ "outputs": []
671
+ },
672
+ {
673
+ "cell_type": "markdown",
674
+ "source": [
675
+ "Setting up the cache can be done as follows:"
676
+ ],
677
+ "metadata": {
678
+ "id": "mhv8IqZoM9cY"
679
+ }
680
+ },
681
+ {
682
+ "cell_type": "code",
683
+ "source": [
684
+ "from langchain_core.caches import InMemoryCache\n",
685
+ "\n",
686
+ "set_llm_cache(InMemoryCache())"
687
+ ],
688
+ "metadata": {
689
+ "id": "thqam26gAyzN"
690
+ },
691
+ "execution_count": 12,
692
+ "outputs": []
693
+ },
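+ {
+ "cell_type": "markdown",
+ "source": [
+ "Here's a minimal sketch that exercises the prompt cache: send the exact same prompt twice and time both calls. With `InMemoryCache` set, the second call should be served from the cache rather than the API (the cache only hits on an identical prompt and model configuration):"
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import time\n",
+ "\n",
+ "start = time.perf_counter()\n",
+ "chat_model.invoke(\"Say hello and nothing else.\")\n",
+ "first_call = time.perf_counter() - start\n",
+ "\n",
+ "start = time.perf_counter()\n",
+ "chat_model.invoke(\"Say hello and nothing else.\")\n",
+ "second_call = time.perf_counter() - start\n",
+ "\n",
+ "print(f\"First call (API): {first_call:.3f}s\")\n",
+ "print(f\"Second call (cached): {second_call:.3f}s\")"
+ ],
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },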
694
+ {
695
+ "cell_type": "markdown",
696
+ "source": [
697
+ "##### ❓ Question #2:\n",
698
+ "\n",
699
+ "What are some limitations you can see with this approach? When is this most/least useful? Discuss with your group!\n",
700
+ "\n",
701
+ "> NOTE: There is no single correct answer here!"
702
+ ],
703
+ "metadata": {
704
+ "id": "CvxEovcEM_oA"
705
+ }
706
+ },
707
+ {
708
+ "cell_type": "markdown",
709
+ "source": [
710
+ "##### 🏗️ Activity #2:\n",
711
+ "\n",
712
+ "Create a simple experiment that tests the prompt (LLM response) cache."
713
+ ],
714
+ "metadata": {
715
+ "id": "3iCMjVYKNEeV"
716
+ }
717
+ },
718
+ {
719
+ "cell_type": "code",
720
+ "source": [
721
+ "### YOUR CODE HERE"
722
+ ],
723
+ "metadata": {
724
+ "id": "QT5GfmsHNFqP"
725
+ },
726
+ "execution_count": null,
727
+ "outputs": []
728
+ },
729
+ {
730
+ "cell_type": "markdown",
731
+ "source": [
732
+ "## Task 3: RAG LCEL Chain\n",
733
+ "\n",
734
+ "We'll also set up our typical RAG chain using LCEL.\n",
735
+ "\n",
736
+ "However, this time: We'll specifically call out that the `context` and `question` halves of the first \"link\" in the chain are executed *in parallel* by default!\n",
737
+ "\n",
738
+ "Thanks, LCEL!"
739
+ ],
740
+ "metadata": {
741
+ "id": "zyPnNWb9NH7W"
742
+ }
743
+ },
744
+ {
745
+ "cell_type": "code",
746
+ "source": [
747
+ "from operator import itemgetter\n",
748
+ "from langchain_core.runnables.passthrough import RunnablePassthrough\n",
749
+ "\n",
750
+ "retrieval_augmented_qa_chain = (\n",
751
+ " {\"context\": itemgetter(\"question\") | retriever, \"question\": itemgetter(\"question\")}\n",
752
+ " | RunnablePassthrough.assign(context=itemgetter(\"context\"))\n",
753
+ " | chat_prompt | chat_model\n",
754
+ " )"
755
+ ],
756
+ "metadata": {
757
+ "id": "3JNvSsx_CEtI"
758
+ },
759
+ "execution_count": 13,
760
+ "outputs": []
761
+ },
762
+ {
763
+ "cell_type": "markdown",
764
+ "source": [
765
+ "Let's test it out!"
766
+ ],
767
+ "metadata": {
768
+ "id": "Sx--wVctNdGa"
769
+ }
770
+ },
771
+ {
772
+ "cell_type": "code",
773
+ "source": [
774
+ "retrieval_augmented_qa_chain.invoke({\"question\" : \"Write 50 things about this document!\"})"
775
+ ],
776
+ "metadata": {
777
+ "colab": {
778
+ "base_uri": "https://localhost:8080/"
779
+ },
780
+ "id": "43uQegbnDQKP",
781
+ "outputId": "a9ff032b-4eb2-4f5f-f456-1fc6aa24aaec"
782
+ },
783
+ "execution_count": 14,
784
+ "outputs": [
785
+ {
786
+ "output_type": "execute_result",
787
+ "data": {
788
+ "text/plain": [
789
+ "AIMessage(content='1. The document is a PDF with a total of 19 pages.\\n2. It was created using LaTeX with hyperref.\\n3. The document has a metadata source labeled as \\'source_15\\'.\\n4. The file path of the document is \\'2406.11200v2.pdf\\'.\\n5. The document was created on June 19, 2024.\\n6. It is formatted as PDF 1.5.\\n7. The metadata includes an ID: \\'8d631703366f48b6b910115cce8ce3c5\\'.\\n8. There are no specific titles or authors listed in the metadata.\\n9. The document includes actions to parse queries into multiple aspects.\\n10. It mentions a subquery structure with attributes like brand, type, material, and features.\\n11. The brand specified in the subquery is \"TUSA\".\\n12. The type mentioned is \"swim fins\".\\n13. It discusses using an embedding tool to filter entities.\\n14. The document includes a step for computing similarity scores based on types.\\n15. There is an action to retrieve the top-20 entities with the highest similarity scores.\\n16. It lists an action to get brands of the top-20 entities.\\n17. There is a token match scoring action for brand matching.\\n18. The document utilizes an LLM reasoning API for functionality validation.\\n19. It synthesizes final scores using optimized parameters.\\n20. There is a section for acknowledgments thanking various labs and funding agencies.\\n21. The acknowledgments include support from DARPA and NSF.\\n22. It appreciates the contributions from Stanford Data Applications Initiative.\\n23. The Chan Zuckerberg Initiative is also acknowledged.\\n24. The document references multiple organizations, including Amazon and Genentech.\\n25. It emphasizes that the content does not necessarily represent the views of funding entities.\\n26. A reference is included for a paper titled \"Graph of Thoughts: Solving Elaborate Problems with Large Language Models.\"\\n27. The reference lists multiple authors and an arXiv identifier.\\n28. The document outlines a function named \\'ParseAttributeFromQuery\\'.\\n29. It details the inputs for the \\'ParseAttributeFromQuery\\' function.\\n30. The output of the function is described as a dictionary based on attributes.\\n31. Another function, \\'GetBagOfPhrases\\', is mentioned.\\n32. The \\'GetBagOfPhrases\\' function takes image IDs as input.\\n33. This function returns a list of phrases for each image.\\n34. The document describes the \\'GetEntityDocuments\\' function.\\n35. The \\'GetEntityDocuments\\' function also uses image IDs as input.\\n36. It returns text information related to each image.\\n37. The document includes a function named \\'GetClipTextEmbedding\\'.\\n38. This function is designed to embed strings into embeddings.\\n39. The \\'GetPatchIdToPhraseDict\\' function is outlined as well.\\n40. It returns a dictionary mapping patch IDs to phrases.\\n41. The document describes a function named \\'GetImages\\'.\\n42. \\'GetImages\\' returns a list of images corresponding to input IDs.\\n43. The \\'GetClipImageEmbedding\\' function is also mentioned.\\n44. It embeds a list of images into corresponding embeddings.\\n45. The document is structured to provide a clear methodology for processing queries.\\n46. It emphasizes the use of various entity and image-related functions.\\n47. The functions are designed to handle different types of data inputs.\\n48. The document appears to focus on data processing and machine learning applications.\\n49. It presents a systematic approach to extracting and analyzing query-related data.\\n50. 
Overall, the document serves as a technical resource outlining specific functions and methodologies related to data processing and entity recognition.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 749, 'prompt_tokens': 1389, 'total_tokens': 2138, 'completion_tokens_details': {'audio_tokens': None, 'reasoning_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': None, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-9445763c-0461-43ae-9245-1490fddbac8f-0', usage_metadata={'input_tokens': 1389, 'output_tokens': 749, 'total_tokens': 2138})"
790
+ ]
791
+ },
792
+ "metadata": {},
793
+ "execution_count": 14
794
+ }
795
+ ]
796
+ },
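+ {
+ "cell_type": "markdown",
+ "source": [
+ "Because this is plain LCEL, the async and batched variants come for free. Here's a minimal sketch (the questions are placeholders) that awaits a single call and then fans a small batch out in parallel:"
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Single async call - retrieval and question passthrough run concurrently.\n",
+ "response = await retrieval_augmented_qa_chain.ainvoke({\"question\" : \"What is this document about?\"})\n",
+ "print(response.content)\n",
+ "\n",
+ "# Batch over several questions - LCEL parallelizes across the inputs.\n",
+ "questions = [\"What is this document about?\", \"Who does it apply to?\"]\n",
+ "responses = await retrieval_augmented_qa_chain.abatch([{\"question\" : q} for q in questions])\n",
+ "print([r.content for r in responses])"
+ ],
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },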
797
+ {
798
+ "cell_type": "markdown",
799
+ "source": [
800
+ "##### 🏗️ Activity #3:\n",
801
+ "\n",
802
+ "Show, through LangSmith, the different between a trace that is leveraging cache-backed embeddings and LLM calls - and one that isn't.\n",
803
+ "\n",
804
+ "Post screenshots in the notebook!"
805
+ ],
806
+ "metadata": {
807
+ "id": "0tYAvHrJNecy"
808
+ }
809
+ }
810
+ ]
811
+ }
app.py CHANGED
@@ -2,25 +2,163 @@
2
  """
3
  IMPORTS HERE
4
  """
5
 
6
  ### Global Section ###
7
  """
8
  GLOBAL CODE HERE
9
  """
10
 
11
  ### On Chat Start (Session Start) Section ###
12
  @cl.on_chat_start
13
  async def on_chat_start():
14
  """ SESSION SPECIFIC CODE HERE """
15
 
16
  ### Rename Chains ###
17
  @cl.author_rename
18
  def rename(orig_author: str):
19
  """ RENAME CODE HERE """
20
 
21
  ### On Message Section ###
22
  @cl.on_message
23
  async def main(message: cl.Message):
24
  """
25
  MESSAGE CODE HERE
26
- """
2
  """
3
  IMPORTS HERE
4
  """
5
+ import os
6
+ import uuid
7
+ from dotenv import load_dotenv
8
+ from langchain_text_splitters import RecursiveCharacterTextSplitter
9
+ from langchain_community.document_loaders import PyMuPDFLoader
10
+ from qdrant_client import QdrantClient
11
+ from qdrant_client.http.models import Distance, VectorParams
12
+ from langchain_openai.embeddings import OpenAIEmbeddings
13
+ from langchain.storage import LocalFileStore
14
+ from langchain_qdrant import QdrantVectorStore
15
+ from langchain.embeddings import CacheBackedEmbeddings
16
+ from langchain_core.prompts import ChatPromptTemplate
17
+ from chainlit.types import AskFileResponse
18
+ from langchain_core.globals import set_llm_cache
19
+ from langchain_openai import ChatOpenAI
20
+ from langchain_core.caches import InMemoryCache
21
+ from operator import itemgetter
22
+ from langchain_core.runnables.passthrough import RunnablePassthrough
23
+ import chainlit as cl
24
+ from langchain_core.runnables.config import RunnableConfig
25
+
26
+ load_dotenv()
27
 
28
  ### Global Section ###
29
  """
30
  GLOBAL CODE HERE
31
  """
32
+ os.environ["LANGCHAIN_PROJECT"] = f"AIM Week 8 Assignment 1 - {uuid.uuid4().hex[0:8]}"
33
+ os.environ["LANGCHAIN_TRACING_V2"] = "true"
34
+ os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
35
+
36
+ text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
37
+
38
+ rag_system_prompt_template = """\
39
+ You are a helpful assistant that uses the provided context to answer questions.
40
+ Never reference this prompt, or the existence of context.
41
+ """
42
+
43
+ rag_message_list = [
44
+ {"role" : "system", "content" : rag_system_prompt_template},
45
+ ]
46
+
47
+ rag_user_prompt_template = """\
48
+ Question:
49
+ {question}
50
+ Context:
51
+ {context}
52
+ """
53
+
54
+ chat_prompt = ChatPromptTemplate.from_messages([
55
+ ("system", rag_system_prompt_template),
56
+ ("human", rag_user_prompt_template)
57
+ ])
58
+
59
+ chat_model = ChatOpenAI(model="gpt-4o-mini")
60
+ # Typical Embedding Model
61
+ core_embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
62
+
63
+ def process_file(file: AskFileResponse):
64
+ import tempfile
65
+
66
+ with tempfile.NamedTemporaryFile(mode="w", delete=False) as tempfile:
67
+ with open(tempfile.name, "wb") as f:
68
+ f.write(file.content)
69
+
70
+ Loader = PyMuPDFLoader
71
+
72
+ loader = Loader(tempfile.name)
73
+ documents = loader.load()
74
+ docs = text_splitter.split_documents(documents)
75
+ for i, doc in enumerate(docs):
76
+ doc.metadata["source"] = f"source_{i}"
77
+ return docs
78
+
79
 
80
  ### On Chat Start (Session Start) Section ###
81
  @cl.on_chat_start
82
  async def on_chat_start():
83
  """ SESSION SPECIFIC CODE HERE """
84
+ files = None
85
+
86
+ while files is None:
87
+ # Async method: This allows the function to pause execution while waiting for the user to upload a file,
88
+ # without blocking the entire application. It improves responsiveness and scalability.
89
+ files = await cl.AskFileMessage(
90
+ content="Please upload a PDF file to begin!",
91
+ accept=["application/pdf"],
92
+ max_size_mb=20,
93
+ timeout=180,
94
+ max_files=1
95
+ ).send()
96
+
97
+ file = files[0]
98
+ msg = cl.Message(
99
+ content=f"Processing `{file.name}`...",
100
+ )
101
+ await msg.send()
102
+ docs = process_file(file)
103
+
104
+ # Typical QDrant Client Set-up
105
+ collection_name = f"pdf_to_parse_{uuid.uuid4()}"
106
+ client = QdrantClient(":memory:")
107
+ client.create_collection(
108
+ collection_name=collection_name,
109
+ vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
110
+ )
111
+
112
+ # Adding cache!
113
+ store = LocalFileStore("./cache/")
114
+ cached_embedder = CacheBackedEmbeddings.from_bytes_store(
115
+ core_embeddings, store, namespace=core_embeddings.model
116
+ )
117
+
118
+ # Typical QDrant Vector Store Set-up
119
+ vectorstore = QdrantVectorStore(
120
+ client=client,
121
+ collection_name=collection_name,
122
+ embedding=cached_embedder)
123
+ vectorstore.add_documents(docs)
124
+ retriever = vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 3})
125
+
126
+ retrieval_augmented_qa_chain = (
127
+ {"context": itemgetter("question") | retriever, "question": itemgetter("question")}
128
+ | RunnablePassthrough.assign(context=itemgetter("context"))
129
+ | chat_prompt | chat_model
130
+ )
131
+
132
+ # Let the user know that the system is ready
133
+ msg.content = f"Processing `{file.name}` done. You can now ask questions!"
134
+ await msg.update()
135
+
136
+ cl.user_session.set("chain", retrieval_augmented_qa_chain)
137
+
138
 
139
  ### Rename Chains ###
140
  @cl.author_rename
141
  def rename(orig_author: str):
142
  """ RENAME CODE HERE """
143
+ rename_dict = {"ChatOpenAI": "the Generator...", "VectorStoreRetriever": "the Retriever..."}
144
+ return rename_dict.get(orig_author, orig_author)
145
 
146
  ### On Message Section ###
147
  @cl.on_message
148
  async def main(message: cl.Message):
149
  """
150
  MESSAGE CODE HERE
151
+ """
152
+ runnable = cl.user_session.get("chain")
153
+
154
+ msg = cl.Message(content="")
155
+
156
+ # Async method: Using astream allows for asynchronous streaming of the response,
157
+ # improving responsiveness and user experience by showing partial results as they become available.
158
+ async for chunk in runnable.astream(
159
+ {"question": message.content},
160
+ config=RunnableConfig(callbacks=[cl.LangchainCallbackHandler()]),
161
+ ):
162
+ await msg.stream_token(chunk.content)
163
+
164
+ await msg.send()