{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 9. Improving Prompts\n",
    "\n",
    "With our LLM prompt showing such strong results, you might be content to leave it as it is. But there are always ways to improve, and you may run into cases where the model's performance falls short.\n",
    "\n",
    "Earlier in the lesson, we showed how you can feed the LLM examples of inputs and outputs before your request as part of a \"few-shot\" prompt. An added benefit of coding a supervised sample for testing is that you can also use the training slice of that sample to prime the LLM with this technique. If you've already done the work of labeling your data, you might as well use it to improve your results.\n",
    "\n",
    "Converting the training set you held aside into a few-shot prompt is a simple matter of formatting it as the alternating user and assistant messages your LLM expects. Here's how you might do it in our case."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import time\n",
    "import os\n",
    "from retry import retry\n",
    "from rich.progress import track\n",
    "from huggingface_hub import InferenceClient\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.metrics import confusion_matrix, classification_report\n",
    "import pandas as pd\n",
    "\n",
    "api_key = os.getenv(\"HF_TOKEN\")\n",
    "client = InferenceClient(\n",
    "    token=api_key,\n",
    ")\n",
    "\n",
    "sample_df = pd.read_csv(\"https://huggingface.co/spaces/JournalistsonHF/first-llm-classifier/resolve/main/notebooks/gradio-app/sample.csv\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
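    "We'll need our `get_batch_list` function from earlier again, along with a training/test split of our labeled sample. The fixed `random_state` makes the split reproducible across runs:"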
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_batch_list(li, n=10):\n",
    "    \"\"\"Split the provided list into batches of size `n`.\"\"\"\n",
    "    batch_list = []\n",
    "    for i in range(0, len(li), n):\n",
    "        batch_list.append(li[i : i + n])\n",
    "    return batch_list\n",
    "\n",
    "training_input, test_input, training_output, test_output = train_test_split(\n",
    "    sample_df[['payee']],\n",
    "    sample_df['category'],\n",
    "    test_size=0.33,\n",
    "    random_state=42\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_fewshots(training_input, training_output, batch_size=10):\n",
    "    \"\"\"Convert the training input and output from sklearn's train_test_split into a few-shot prompt\"\"\"\n",
    "    # Batch up the training input into groups of `batch_size`\n",
    "    input_batches = get_batch_list(list(training_input.payee), n=batch_size)\n",
    "\n",
    "    # Do the same for the output\n",
    "    output_batches = get_batch_list(list(training_output), n=batch_size)\n",
    "\n",
    "    # Create a list to hold the formatted few-shot examples\n",
    "    fewshot_list = []\n",
    "\n",
    "    # Loop through the batches\n",
    "    for i, input_list in enumerate(input_batches):\n",
    "        fewshot_list.extend([\n",
    "            # Create a \"user\" message for the LLM, formatted the same way as our prompt, with one name per line\n",
    "            {\n",
    "                \"role\": \"user\",\n",
    "                \"content\": \"\\n\".join(input_list),\n",
    "            },\n",
    "            # Create the expected \"assistant\" response as the JSON formatted output we expect\n",
    "            {\n",
    "                \"role\": \"assistant\",\n",
    "                \"content\": json.dumps(output_batches[i])\n",
    "            }\n",
    "        ])\n",
    "\n",
    "    # Return the list of few-shot messages: one user/assistant pair per batch\n",
    "    return fewshot_list"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pass in your training data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "fewshot_list = get_fewshots(training_input, training_output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Take a peek at the first pair to see if it's what we expect."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'role': 'user',\n",
       "  'content': 'UFW OF AMERICA - AFL-CIO\\nRE-ELECT FIONA MA\\nELLA DINNING ROOM\\nMICHAEL EMERY PHOTOGRAPHY\\nLAKELAND  VILLAGE\\nTHE IVY RESTAURANT\\nMOORLACH FOR SENATE 2016\\nBROWN PALACE HOTEL\\nAPPLE STORE FARMERS MARKET\\nCABLETIME TV'},\n",
       " {'role': 'assistant',\n",
       "  'content': '[\"Other\", \"Other\", \"Other\", \"Other\", \"Other\", \"Restaurant\", \"Other\", \"Hotel\", \"Other\", \"Other\"]'}]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fewshot_list[:2]"
   ]
  },
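  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before wiring these examples into a prompt, a quick sanity check can catch misaligned pairs. The cell below is a minimal sketch: `check_fewshots` is a hypothetical helper, not part of any library, and it assumes the messages come in user/assistant pairs as `get_fewshots` builds them. It runs on a made-up pair here; calling `check_fewshots(fewshot_list)` instead should return an empty list if every batch lines up."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "def check_fewshots(fewshot_list):\n",
    "    \"\"\"Hypothetical helper: list any problems found in the few-shot messages.\"\"\"\n",
    "    problems = []\n",
    "    # Walk the list two messages at a time: a user message, then its assistant answer\n",
    "    for i in range(0, len(fewshot_list), 2):\n",
    "        user_msg, assistant_msg = fewshot_list[i], fewshot_list[i + 1]\n",
    "        if user_msg['role'] != 'user' or assistant_msg['role'] != 'assistant':\n",
    "            problems.append(f'pair {i // 2}: unexpected roles')\n",
    "            continue\n",
    "        # One name per line in the user message; the answer should hold one label per name\n",
    "        n_names = len(user_msg['content'].splitlines())\n",
    "        n_labels = len(json.loads(assistant_msg['content']))\n",
    "        if n_names != n_labels:\n",
    "            problems.append(f'pair {i // 2}: {n_names} names but {n_labels} labels')\n",
    "    return problems\n",
    "\n",
    "# Made-up pair whose answer is one label short, for illustration only\n",
    "demo_pairs = [\n",
    "    {'role': 'user', 'content': 'PIZZA PALACE\\nGRAND HOTEL'},\n",
    "    {'role': 'assistant', 'content': '[\"Restaurant\"]'},\n",
    "]\n",
    "print(check_fewshots(demo_pairs))"
   ]
  },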
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can add those examples to the `messages` we send the LLM, between the system prompt and the new batch of names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "@retry(ValueError, tries=2, delay=2)\n",
    "def classify_payees(name_list):\n",
    "    prompt = \"\"\"You are an AI model trained to categorize businesses based on their names.\n",
    "\n",
    "You will be given a list of business names, each separated by a new line.\n",
    "\n",
    "Your task is to analyze each name and classify it into one of the following categories: Restaurant, Bar, Hotel, or Other.\n",
    "\n",
    "It is extremely critical that there is a corresponding category output for each business name provided as an input.\n",
    "\n",
    "If a business does not clearly fall into Restaurant, Bar, or Hotel categories, you should classify it as \"Other\".\n",
    "\n",
    "Even if the type of business is not immediately clear from the name, it is essential that you provide your best guess based on the information available to you. If you can't make a good guess, classify it as Other.\n",
    "\n",
    "For example, if given the following input:\n",
    "\n",
    "\"Intercontinental Hotel\\nPizza Hut\\nCheers\\nWelsh's Family Restaurant\\nKTLA\\nDirect Mailing\"\n",
    "\n",
    "Your output should be a JSON list in the following format:\n",
    "\n",
    "[\"Hotel\", \"Restaurant\", \"Bar\", \"Restaurant\", \"Other\", \"Other\"]\n",
    "\n",
    "This means that you have classified \"Intercontinental Hotel\" as a Hotel, \"Pizza Hut\" as a Restaurant, \"Cheers\" as a Bar, \"Welsh's Family Restaurant\" as a Restaurant, and both \"KTLA\" and \"Direct Mailing\" as Other.\n",
    "\n",
    "Ensure that the number of classifications in your output matches the number of business names in the input. It is very important that the length of JSON list you return is exactly the same as the number of business names you receive.\n",
    "\"\"\"\n",
    "    response = client.chat.completions.create(\n",
    "        messages=[\n",
    "            ### <-- NEW \n",
    "            {\n",
    "                \"role\": \"system\",\n",
    "                \"content\": prompt,\n",
    "            },\n",
    "            *fewshot_list,\n",
    "            {\n",
    "                \"role\": \"user\",\n",
    "                \"content\": \"\\n\".join(name_list),\n",
    "            }\n",
    "            ### -->\n",
    "        ],\n",
    "        model=\"meta-llama/Llama-3.3-70B-Instruct\",\n",
    "        temperature=0,\n",
    "    )\n",
    "\n",
    "    answer_str = response.choices[0].message.content\n",
    "    answer_list = json.loads(answer_str)\n",
    "\n",
    "    acceptable_answers = [\n",
    "        \"Restaurant\",\n",
    "        \"Bar\",\n",
    "        \"Hotel\",\n",
    "        \"Other\",\n",
    "    ]\n",
    "    for answer in answer_list:\n",
    "        if answer not in acceptable_answers:\n",
    "            raise ValueError(f\"{answer} not in list of acceptable answers\")\n",
    "\n",
    "    if len(name_list) != len(answer_list):\n",
    "        raise ValueError(f\"Number of outputs ({len(answer_list)}) does not equal the number of inputs ({len(name_list)})\")\n",
    "\n",
    "    return dict(zip(name_list, answer_list))"
   ]
  },
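  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `@retry(ValueError, tries=2, delay=2)` decorator only steps in when `classify_payees` raises a `ValueError`. Conveniently, `json.loads` raises `json.JSONDecodeError`, a subclass of `ValueError`, so a reply that isn't valid JSON also triggers a retry. The sketch below copies the same checks into a hypothetical `parse_answer` helper so you can try them on made-up replies without calling the API. One consequence of `dict(zip(...))` worth knowing: if the same name appears twice in a batch, only one entry survives in the result."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "# The same categories the prompt allows\n",
    "ACCEPTABLE_ANSWERS = {'Restaurant', 'Bar', 'Hotel', 'Other'}\n",
    "\n",
    "def parse_answer(name_list, answer_str):\n",
    "    \"\"\"Hypothetical helper: validate a raw model reply and pair it with the input names.\"\"\"\n",
    "    # Malformed JSON raises json.JSONDecodeError, which subclasses ValueError\n",
    "    answer_list = json.loads(answer_str)\n",
    "    for answer in answer_list:\n",
    "        if answer not in ACCEPTABLE_ANSWERS:\n",
    "            raise ValueError(f'{answer} not in list of acceptable answers')\n",
    "    if len(answer_list) != len(name_list):\n",
    "        raise ValueError(f'Number of outputs ({len(answer_list)}) does not equal the number of inputs ({len(name_list)})')\n",
    "    return dict(zip(name_list, answer_list))\n",
    "\n",
    "# A well-formed, made-up reply\n",
    "print(parse_answer(['PIZZA HUT', 'KTLA'], '[\"Restaurant\", \"Other\"]'))\n",
    "\n",
    "# A reply that isn't JSON at all is still caught as a ValueError\n",
    "try:\n",
    "    parse_answer(['PIZZA HUT'], 'Restaurant')\n",
    "except ValueError as e:\n",
    "    print('would retry after', type(e).__name__)"
   ]
  },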
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Calling our previous `classify_batches` function again:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "def classify_batches(name_list, batch_size=10, wait=2):\n",
    "    # Store the results\n",
    "    all_results = {}\n",
    "\n",
    "    # Batch up the list\n",
    "    batch_list = get_batch_list(name_list, n=batch_size)\n",
    "\n",
    "    # Loop through the list in batches\n",
    "    for batch in track(batch_list):\n",
    "        # Classify it\n",
    "        batch_results = classify_payees(batch)\n",
    "\n",
    "        # Add it to the results\n",
    "        all_results.update(batch_results)\n",
    "\n",
    "        # Tap the brakes\n",
    "        time.sleep(wait)\n",
    "\n",
    "    # Return the results\n",
    "    return pd.DataFrame(\n",
    "        all_results.items(),\n",
    "        columns=[\"payee\", \"category\"]\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And all you need to do is run it again."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "39e9e883ab8042049e00c2ae87a089c1",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Output()"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "llm_df = classify_batches(list(test_input.payee))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And see whether your results are any better."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "              precision    recall  f1-score   support\n",
      "\n",
      "         Bar       1.00      1.00      1.00         2\n",
      "       Hotel       1.00      1.00      1.00         9\n",
      "       Other       1.00      0.98      0.99        57\n",
      "  Restaurant       0.94      1.00      0.97        15\n",
      "\n",
      "    accuracy                           0.99        83\n",
      "   macro avg       0.98      1.00      0.99        83\n",
      "weighted avg       0.99      0.99      0.99        83\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(classification_report(\n",
    "    test_output,\n",
    "    llm_df.category,\n",
    "))"
   ]
  },
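  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The report above summarizes each category, but it doesn't show which labels get mistaken for which. The `confusion_matrix` function, already imported at the top of this notebook, can fill that gap. Here's a minimal sketch on a few made-up labels; in this notebook you could pass `test_output` and `llm_df.category` instead, the same pair given to `classification_report`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import confusion_matrix\n",
    "import pandas as pd\n",
    "\n",
    "labels = ['Bar', 'Hotel', 'Other', 'Restaurant']\n",
    "\n",
    "# Made-up human and LLM labels, for illustration only\n",
    "y_true = ['Bar', 'Hotel', 'Other', 'Other', 'Restaurant']\n",
    "y_pred = ['Bar', 'Hotel', 'Restaurant', 'Other', 'Restaurant']\n",
    "\n",
    "# Passing labels= fixes the row and column order\n",
    "matrix = confusion_matrix(y_true, y_pred, labels=labels)\n",
    "\n",
    "# Rows are the human labels, columns are the LLM's predictions\n",
    "matrix_df = pd.DataFrame(matrix, index=labels, columns=labels)\n",
    "print(matrix_df)"
   ]
  },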
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another common tactic is to examine the misclassifications and tweak your prompt to address any patterns they reveal.\n",
    "\n",
    "One simple way to do this is to merge the LLM's predictions with the human-labeled data and filter for discrepancies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "comparison_df = llm_df.merge(\n",
    "    sample_df,\n",
    "    on=\"payee\",\n",
    "    how=\"inner\",\n",
    "    suffixes=[\"_llm\", \"_human\"]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And filter to cases where the LLM and human labels don't match."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>payee</th>\n",
       "      <th>category_llm</th>\n",
       "      <th>category_human</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>SOTTOVOCE MADERO</td>\n",
       "      <td>Restaurant</td>\n",
       "      <td>Other</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               payee category_llm category_human\n",
       "16  SOTTOVOCE MADERO   Restaurant          Other"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "comparison_df[comparison_df.category_llm != comparison_df.category_human]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Looking at the misclassifications, you might notice that the LLM is struggling with a particular type of business name. You can then adjust your prompt to address that specific issue."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>payee</th>\n",
       "      <th>category_llm</th>\n",
       "      <th>category_human</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>MIDTOWN FRAMING</td>\n",
       "      <td>Other</td>\n",
       "      <td>Other</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>ALBERGO HILTON ROME AIRPO FIUMICINO</td>\n",
       "      <td>Hotel</td>\n",
       "      <td>Hotel</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>ISTOCK PHOTOS</td>\n",
       "      <td>Other</td>\n",
       "      <td>Other</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>DORIAN B. GARCIA</td>\n",
       "      <td>Other</td>\n",
       "      <td>Other</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>KEELER ADVERTISING</td>\n",
       "      <td>Other</td>\n",
       "      <td>Other</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                 payee category_llm category_human\n",
       "0                      MIDTOWN FRAMING        Other          Other\n",
       "1  ALBERGO HILTON ROME AIRPO FIUMICINO        Hotel          Hotel\n",
       "2                        ISTOCK PHOTOS        Other          Other\n",
       "3                     DORIAN B. GARCIA        Other          Other\n",
       "4                   KEELER ADVERTISING        Other          Other"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "comparison_df.head()"
   ]
  },
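  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With only one disagreement, eyeballing the table works fine. On a larger test set, counting mismatches by label pair can make a pattern easier to spot. A minimal sketch: `tally_mismatches` is a hypothetical helper, and `toy_df` is a made-up stand-in with the same columns as `comparison_df`, which you could pass in instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Made-up stand-in for comparison_df, for illustration only\n",
    "toy_df = pd.DataFrame({\n",
    "    'payee': ['CORNER CAFE', 'HARBOR BAR AND GRILL', 'SEASIDE INN', 'QUICK PRINTING'],\n",
    "    'category_llm': ['Restaurant', 'Bar', 'Hotel', 'Other'],\n",
    "    'category_human': ['Other', 'Restaurant', 'Hotel', 'Other'],\n",
    "})\n",
    "\n",
    "def tally_mismatches(df):\n",
    "    \"\"\"Hypothetical helper: count each (LLM label, human label) pair among disagreements.\"\"\"\n",
    "    mismatches = df[df.category_llm != df.category_human]\n",
    "    return (\n",
    "        mismatches\n",
    "        .groupby(['category_llm', 'category_human'])\n",
    "        .size()\n",
    "        .sort_values(ascending=False)\n",
    "    )\n",
    "\n",
    "print(tally_mismatches(toy_df))"
   ]
  },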
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this case, I observed that the LLM was struggling with businesses that had both the word \"bar\" and the word \"restaurant\" in their names. A simple fix would be to add a line to your prompt that tells the LLM what to do in that case:\n",
    "\n",
    "`If a business name contains both the word \"Restaurant\" and the word \"Bar\", you should classify it as a Restaurant.`"
   ]
  },
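  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before adding a rule like that, you might check how many names in your labeled data it would actually affect. Here's a minimal sketch using pandas string matching; `has_both_words` is a hypothetical helper and the names are made up, but you could pass `sample_df.payee` instead. Matching on whole words is a design choice, so that \"bar\" inside a word such as \"BARBER\" doesn't count."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "def has_both_words(names):\n",
    "    \"\"\"Hypothetical helper: return names containing both 'bar' and 'restaurant' as whole words.\"\"\"\n",
    "    # \\b word boundaries keep 'bar' from matching inside words like 'BARBER'\n",
    "    has_bar = names.str.contains(r'\\bbar\\b', case=False)\n",
    "    has_restaurant = names.str.contains(r'\\brestaurant\\b', case=False)\n",
    "    return names[has_bar & has_restaurant]\n",
    "\n",
    "# Made-up payee names, for illustration only\n",
    "names = pd.Series([\n",
    "    'HARBOR BAR & RESTAURANT',\n",
    "    'HOTEL LOBBY BAR',\n",
    "    'BARBER SHOP RESTAURANT SUPPLY',\n",
    "    'QUICK PRINTING',\n",
    "])\n",
    "print(has_both_words(names).tolist())"
   ]
  },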
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "prompt = \"\"\"You are an AI model trained to categorize businesses based on their names.\n",
    "\n",
    "You will be given a list of business names, each separated by a new line.\n",
    "\n",
    "Your task is to analyze each name and classify it into one of the following categories: Restaurant, Bar, Hotel, or Other.\n",
    "\n",
    "It is extremely critical that there is a corresponding category output for each business name provided as an input.\n",
    "\n",
    "If a business does not clearly fall into Restaurant, Bar, or Hotel categories, you should classify it as \"Other\".\n",
    "\n",
    "Even if the type of business is not immediately clear from the name, it is essential that you provide your best guess based on the information available to you. If you can't make a good guess, classify it as Other.\n",
    "\n",
    "For example, if given the following input:\n",
    "\n",
    "\"Intercontinental Hotel\\nPizza Hut\\nCheers\\nWelsh's Family Restaurant\\nKTLA\\nDirect Mailing\"\n",
    "\n",
    "Your output should be a JSON list in the following format:\n",
    "\n",
    "[\"Hotel\", \"Restaurant\", \"Bar\", \"Restaurant\", \"Other\", \"Other\"]\n",
    "\n",
    "This means that you have classified \"Intercontinental Hotel\" as a Hotel, \"Pizza Hut\" as a Restaurant, \"Cheers\" as a Bar, \"Welsh's Family Restaurant\" as a Restaurant, and both \"KTLA\" and \"Direct Mailing\" as Other.\n",
    "\n",
    "If a business name contains both the word \"Restaurant\" and the word \"Bar\", you should classify it as a Restaurant.\n",
    "\n",
    "Ensure that the number of classifications in your output matches the number of business names in the input. It is very important that the length of JSON list you return is exactly the same as the number of business names you receive.\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
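    "Note that `classify_payees` defines its own `prompt` inside the function, so redefining `prompt` in a separate cell has no effect on its own. To test the revision, paste the new text into `classify_payees`, rerun that cell, and then rerun `classify_batches` and `classification_report` on the test set.\n",
    "\n",
    "Repeating this disciplined, scientific process of prompt refinement, testing and review can, after a few careful cycles, gradually improve your prompt to return even better results."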
   ]
  },
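  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To judge whether a revision actually helped, keep each run's predictions and score them against the same test labels. A minimal sketch using scikit-learn's `accuracy_score`, assuming you saved the predictions from each run in a list; `compare_runs` is a hypothetical helper and the labels are made up. A single accuracy number can hide changes in small categories such as Bar, so it's still worth rerunning `classification_report`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import accuracy_score\n",
    "\n",
    "def compare_runs(y_true, runs):\n",
    "    \"\"\"Hypothetical helper: accuracy of each named set of predictions against the same labels.\"\"\"\n",
    "    return {name: accuracy_score(y_true, y_pred) for name, y_pred in runs.items()}\n",
    "\n",
    "# Made-up human labels and two runs of LLM predictions, for illustration only\n",
    "y_true = ['Bar', 'Other', 'Restaurant', 'Hotel']\n",
    "runs = {\n",
    "    'original prompt': ['Bar', 'Restaurant', 'Restaurant', 'Hotel'],\n",
    "    'revised prompt': ['Bar', 'Other', 'Restaurant', 'Hotel'],\n",
    "}\n",
    "print(compare_runs(y_true, runs))"
   ]
  },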
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%pip install gradio jupyter-server-proxy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><iframe src=\"http://localhost:7873/\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": []
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import gradio as gr\n",
    "\n",
    "# -- Gradio interface function --\n",
    "def classify_business_names(input_text):\n",
    "    # One business name per non-empty line of the textbox\n",
    "    name_list = [line.strip() for line in input_text.splitlines() if line.strip()]\n",
    "    if not name_list:\n",
    "        return {}\n",
    "    try:\n",
    "        # The \"json\" output component can render the {name: category} dict directly\n",
    "        return classify_payees(name_list)\n",
    "    except Exception as e:\n",
    "        # Return errors as JSON too; a plain \"Error: ...\" string would not parse as JSON\n",
    "        return {\"error\": str(e)}\n",
    "\n",
    "# -- Launch the demo --\n",
    "demo = gr.Interface(\n",
    "    fn=classify_business_names,\n",
    "    inputs=gr.Textbox(lines=10, placeholder=\"Enter business names, one per line\"),\n",
    "    outputs=\"json\",\n",
    "    title=\"Business Category Classifier\",\n",
    "    description=\"Enter business names and get a classification: Restaurant, Bar, Hotel, or Other.\"\n",
    ")\n",
    "\n",
    "demo.launch(server_name=\"0.0.0.0\", server_port=7873, root_path=\"/proxy/7873/\", quiet=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**[10. Sharing your classifier →](ch10-sharing-with-gradio.ipynb)**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}