fdaudens (HF staff) committed on
Commit b73cea4 · verified · 1 Parent(s): 14f7191

Upload 11 files
notebooks/ch0-intro.ipynb ADDED
@@ -0,0 +1,84 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "id": "bf2cde26",
6
+ "metadata": {},
7
+ "source": [
8
+ "# First LLM Classifier\n",
9
+ "\n",
10
+ "Learn how journalists use large-language models to organize and analyze massive datasets\n",
11
+ "\n",
12
+ "## What you will learn\n",
13
+ "\n",
14
+ "This class will give you hands-on experience creating a machine-learning model that can read and categorize the text recorded in newsworthy datasets.\n",
15
+ "\n",
16
+ "It will teach you how to:\n",
17
+ "\n",
18
+ "- Submit large-language model prompts with the Python programming language\n",
19
+ "- Write structured prompts that can classify text into predefined categories\n",
20
+ "- Submit dozens of prompts at once as part of an automated routine\n",
21
+ "- Evaluate results using a rigorous, scientific approach\n",
22
+ "- Improve results by training the model with rules and examples\n",
23
+ "\n",
24
+ "By the end, you will understand how LLM classifiers can outperform traditional machine-learning methods with significantly less code. And you will be ready to write a classifier on your own.\n",
25
+ "\n",
26
+ "## Who can take it\n",
27
+ "\n",
28
+ "This course is free. Anyone who has dabbled with code and AI is qualified to work through the materials. A curious mind and good attitude are all that’s required, but a familiarity with Python will certainly come in handy.\n",
29
+ "\n",
30
+ "💬 Need help or want to connect with others? Join the **Journalists on Hugging Face** community by signing up for our Slack group [here](https://forms.gle/JMCULh3jEdgFEsJu5).\n",
31
+ "\n",
32
+ "## Table of contents\n",
33
+ "\n",
34
+ "- [1. What we’ll do](ch1-what-we-will-do.ipynb) \n",
35
+ "- [2. The LLM advantage](ch2-the-LLM-advantage.ipynb) \n",
36
+ "- [3. Getting started with Hugging Face](ch3-getting-started-with-hf.ipynb) \n",
37
+ "- [4. Installing JupyterLab (optional)](ch4-installing-jupyterlab.ipynb) \n",
38
+ "- [5. Prompting with Python](ch5-prompting-with-python.ipynb) \n",
39
+ "- [6. Structured responses](ch6-structured-responses.ipynb) \n",
40
+ "- [7. Bulk prompts](ch7-bulk-prompts.ipynb) \n",
41
+ "- [8. Evaluating prompts](ch8-evaluating-prompts.ipynb) \n",
42
+ "- [9. Improving prompts](ch9-improving-prompts.ipynb) \n",
43
+ "- [10. Sharing your app with Gradio](ch10-sharing-with-gradio.ipynb)\n",
44
+ "\n",
45
+ "## About this class\n",
46
+ "[Ben Welsh](https://palewi.re/who-is-ben-welsh/) and [Derek Willis](https://thescoop.org/about/) prepared this guide for [a training session](https://schedules.ire.org/nicar-2025/index.html#2045) at the National Institute for Computer-Assisted Reporting’s 2025 conference in Minneapolis. \n",
47
+ "The project was adapted to run on Hugging Face by [Florent Daudens](https://www.linkedin.com/in/fdaudens/). \n",
48
+ "\n",
49
+ "Some of the copy was written with the assistance of GitHub’s Copilot, an AI-powered text generator. The materials are available as free and open source.\n",
50
+ "\n",
51
+ "**[1. What we’ll do →](ch1-what-we-will-do.ipynb)**"
52
+ ]
53
+ },
54
+ {
55
+ "cell_type": "code",
56
+ "execution_count": null,
57
+ "id": "02477b14-edff-4380-ad41-9954b6c80863",
58
+ "metadata": {},
59
+ "outputs": [],
60
+ "source": []
61
+ }
62
+ ],
63
+ "metadata": {
64
+ "kernelspec": {
65
+ "display_name": "Python 3 (ipykernel)",
66
+ "language": "python",
67
+ "name": "python3"
68
+ },
69
+ "language_info": {
70
+ "codemirror_mode": {
71
+ "name": "ipython",
72
+ "version": 3
73
+ },
74
+ "file_extension": ".py",
75
+ "mimetype": "text/x-python",
76
+ "name": "python",
77
+ "nbconvert_exporter": "python",
78
+ "pygments_lexer": "ipython3",
79
+ "version": "3.9.5"
80
+ }
81
+ },
82
+ "nbformat": 4,
83
+ "nbformat_minor": 5
84
+ }
notebooks/ch1-what-we-will-do.ipynb ADDED
@@ -0,0 +1,73 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "id": "9d45b5fc",
6
+ "metadata": {},
7
+ "source": [
8
+ "# First LLM Classifier\n",
9
+ "\n",
10
+ "## 1. What we’ll do\n",
11
+ "\n",
12
+ "Journalists frequently encounter the mountains of messy data generated by our periphrastic society. This vast and verbose corpus boasts everything from long-hand entries in police reports to the legalese of legislative bills.\n",
13
+ "\n",
14
+ "Understanding and analyzing this data is critical to the job but can be time-consuming and inefficient. Computers can help by automating sorting through blocks of text, extracting key details and flagging unusual patterns.\n",
15
+ "\n",
16
+ "A common goal in this work is to classify text into categories. For example, you might want to sort a collection of emails as “spam” and “not spam” or identify corporate filings that suggest a company is about to go bankrupt.\n",
17
+ "\n",
18
+ "Traditional techniques for classifying text, like keyword searches or regular expressions, can be brittle and error-prone. Machine-learning models can be more flexible, but they require large amounts of human-labeled training data and a high level of programming expertise, and they often yield unimpressive results.\n",
19
+ "\n",
20
+ "Large-language models offer a better deal. We will demonstrate how you can use them to get superior results with less hassle.\n",
21
+ "\n",
22
+ "### 1.1. Our example case\n",
23
+ "\n",
24
+ "To show the power of this approach, we’ll focus on a specific data set: campaign expenditures.\n",
25
+ "\n",
26
+ "Candidates for office must disclose the money they spend on everything from pizza to private jets. Tracking their spending can reveal patterns and lead to important stories.\n",
27
+ "\n",
28
+ "But it’s no easy task. Each election cycle, thousands of candidates log transactions into the public databases where spending is disclosed. That’s so much data that no one can examine it all. To make matters worse, campaigns often use vague or misleading descriptions of their spending, making it difficult to parse and understand.\n",
29
+ "\n",
30
+ "It wasn’t until after his 2022 election to Congress that [journalists discovered](https://www.nytimes.com/2022/12/29/nyregion/george-santos-campaign-finance.html) that Rep. George Santos of New York had spent thousands of campaign dollars on questionable and potentially illegal expenses. While much of his shady spending was publicly disclosed, it was largely overlooked in the run-up to election day.\n",
31
+ "\n",
32
+ "[![NYTimes Article](images/santos.png)](https://www.nytimes.com/2022/12/29/nyregion/george-santos-campaign-finance.html)\n",
33
+ "\n",
34
+ "Inspired by this scoop, we will create a classifier that can scan the expenditures logged in campaign finance reports and identify those that may be newsworthy.\n",
35
+ "\n",
36
+ "[![California Civic Data Coalition](images/ccdc.png)](https://californiacivicdata.org/)\n",
37
+ "\n",
38
+ "We will draw data from The Golden State, where the California Civic Data Coalition developed a clean, structured version of the statehouse’s disclosure data.\n",
39
+ "\n",
40
+ "**[2. The LLM advantage →](ch2-the-LLM-advantage.ipynb)**"
41
+ ]
42
+ },
43
+ {
44
+ "cell_type": "code",
45
+ "execution_count": null,
46
+ "id": "934bc606-e7a5-4dff-9154-dc5c42bf3fc7",
47
+ "metadata": {},
48
+ "outputs": [],
49
+ "source": []
50
+ }
51
+ ],
52
+ "metadata": {
53
+ "kernelspec": {
54
+ "display_name": "Python 3 (ipykernel)",
55
+ "language": "python",
56
+ "name": "python3"
57
+ },
58
+ "language_info": {
59
+ "codemirror_mode": {
60
+ "name": "ipython",
61
+ "version": 3
62
+ },
63
+ "file_extension": ".py",
64
+ "mimetype": "text/x-python",
65
+ "name": "python",
66
+ "nbconvert_exporter": "python",
67
+ "pygments_lexer": "ipython3",
68
+ "version": "3.9.5"
69
+ }
70
+ },
71
+ "nbformat": 4,
72
+ "nbformat_minor": 5
73
+ }
notebooks/ch10-sharing-with-gradio.ipynb ADDED
@@ -0,0 +1,168 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "## 10. Building a Demo with Gradio and Hugging Face Spaces\n",
8
+ "\n",
9
+ "Now that we've built a powerful LLM-based classifier, let's showcase it to the world (or your colleagues) by creating an interactive demo. In this chapter, we'll learn how to:\n",
10
+ "\n",
11
+ "1. Create a user-friendly web interface using Gradio\n",
12
+ "2. Package our demo for deployment\n",
13
+ "3. Deploy it on Hugging Face Spaces for free\n",
14
+ "4. Use the Hugging Face Inference API for model access"
15
+ ]
16
+ },
17
+ {
18
+ "cell_type": "markdown",
19
+ "metadata": {},
20
+ "source": [
21
+ "### What we will do\n",
22
+ "\n",
23
+ "We will start from [the working notebook](ch9-improving-prompts.ipynb) we created in Chapter 9 and add an interactive component to it.\n",
24
+ "\n",
25
+ "1. **Add Gradio**\n",
26
+ "\n",
27
+ "Gradio is a Python library that allows you to easily create web-based interfaces where users can interact with your model. We will install **Gradio** to set up the interface for our model (it will be included in the requirements file — more on that below).\n",
28
+ "\n",
29
+ " ```python\n",
30
+ " import gradio as gr\n",
31
+ " ```\n",
32
+ "\n",
33
+ "2. **Add an interface function that will call what we already coded**\n",
34
+ "\n",
35
+ "Here we will define the interface function that connects Gradio to the model we built earlier. This function will take input from the user, process it with the classifier, and return the result.\n",
36
+ "\n",
37
+ "```python\n",
38
+ " # -- Gradio interface function --\n",
39
+ " def classify_business_names(input_text):\n",
40
+ " # Parse input text into list of names\n",
41
+ " name_list = [line.strip() for line in input_text.splitlines() if line.strip()]\n",
42
+ " \n",
43
+ " if not name_list:\n",
44
+ " return json.dumps({\"error\": \"No business names provided. Please enter at least one business name.\"})\n",
45
+ " \n",
46
+ " try:\n",
47
+ " result = classify_payees(name_list)\n",
48
+ " return json.dumps(result, indent=2)\n",
49
+ " except Exception as e:\n",
50
+ " return json.dumps({\"error\": f\"Classification failed: {str(e)}\"})\n",
51
+ "```\n",
52
+ "\n",
53
+ "3. **Launch the Gradio interface**\n",
54
+ " \n",
55
+ " ```python\n",
56
+ " # -- Launch the demo --\n",
57
+ " demo = gr.Interface(\n",
58
+ " fn=classify_business_names,\n",
59
+ " inputs=gr.Textbox(lines=10, placeholder=\"Enter business names, one per line\"),\n",
60
+ " outputs=\"json\",\n",
61
+ " title=\"Business Category Classifier\",\n",
62
+ " description=\"Enter business names and get a classification: Restaurant, Bar, Hotel, or Other.\"\n",
63
+ " )\n",
64
+ "\n",
65
+ " demo.launch(share=True)\n",
66
+ "```"
67
+ ]
68
+ },
69
+ {
70
+ "cell_type": "markdown",
71
+ "metadata": {},
72
+ "source": [
73
+ "## 🌍 Publish your demo to Hugging Face Spaces\n",
74
+ "\n",
75
+ "To share your Gradio app with the world, you can deploy it to [Hugging Face Spaces](https://huggingface.co/spaces) in just a few steps.\n",
76
+ "\n",
77
+ "### 1. Prepare your files\n",
78
+ "\n",
79
+ "Make sure your project has:\n",
80
+ "- An `app.py` file containing your Gradio app (e.g. `gr.Interface(...)`)\n",
81
+ "- The `sample.csv` file used for few-shot classification\n",
82
+ "- A `requirements.txt` file listing any Python dependencies:\n",
83
+ "```\n",
84
+ "gradio\n",
85
+ "huggingface_hub\n",
86
+ "pandas\n",
87
+ "scikit-learn\n",
88
+ "retry\n",
89
+ "rich\n",
90
+ "```\n",
91
+ "\n",
92
+ "> **Example files are ready to use in the [gradio-app](gradio-app) folder!**\n",
93
+ "\n",
94
+ "### 2. Create a new Space\n",
95
+ "\n",
96
+ "1. Go to [huggingface.co/spaces](https://huggingface.co/spaces)\n",
97
+ "2. Click **\"Create new Space\"**\n",
98
+ "3. Choose:\n",
99
+ " - **SDK**: Gradio\n",
100
+ " - **License**: (choose one, e.g. MIT)\n",
101
+ " - **Visibility**: Public or Private\n",
102
+ "4. Name your Space and click **Create Space**\n",
103
+ "\n",
104
+ "### 3. Upload your files\n",
105
+ "\n",
106
+ "You can:\n",
107
+ "- Use the web interface to upload `app.py`, `sample.csv` and `requirements.txt`, or\n",
108
+ "- Clone the Space repo with Git and push your files:\n",
109
+ "```bash\n",
110
+ "git lfs install\n",
111
+ "git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME\n",
112
+ "cd YOUR_SPACE_NAME\n",
113
+ "# Add your files here\n",
114
+ "git add .\n",
115
+ "git commit -m \"Initial commit\"\n",
116
+ "git push\n",
117
+ "```\n",
118
+ "### 4. Add your Hugging Face token to Secrets\n",
119
+ "\n",
120
+ "For your Gradio app to interact with Hugging Face’s Inference API (or any other Hugging Face service), you need to securely store your Hugging Face token.\n",
121
+ "\n",
122
+ "1. In your Hugging Face Space:\n",
123
+ " - Navigate to the **Settings** of your Space.\n",
124
+ " - Go to the **Secrets** tab.\n",
125
+ " - Add your token as a new secret with the key `HF_TOKEN`.\n",
126
+ " - **Key**: `HF_TOKEN`\n",
127
+ " - **Value**: Your Hugging Face token, which you can get from [here](https://huggingface.co/settings/tokens).\n",
128
+ "\n",
129
+ "Once added, the token will be accessible in your Space, and you can securely reference it in your code with:\n",
130
+ "\n",
131
+ "```python\n",
132
+ "api_key = os.getenv(\"HF_TOKEN\")\n",
133
+ "client = InferenceClient(token=api_key)\n",
134
+ "```\n",
135
+ "\n",
136
+ "### 5. Done 🎉\n",
137
+ "\n",
138
+ "Your app will build and be available at:\n",
139
+ "```\n",
140
+ "https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME\n",
141
+ "```\n",
142
+ "\n",
143
+ "Need inspiration? Check out [awesome Spaces](https://huggingface.co/spaces?sort=trending)!\n"
144
+ ]
145
+ }
146
+ ],
147
+ "metadata": {
148
+ "kernelspec": {
149
+ "display_name": "Python 3 (ipykernel)",
150
+ "language": "python",
151
+ "name": "python3"
152
+ },
153
+ "language_info": {
154
+ "codemirror_mode": {
155
+ "name": "ipython",
156
+ "version": 3
157
+ },
158
+ "file_extension": ".py",
159
+ "mimetype": "text/x-python",
160
+ "name": "python",
161
+ "nbconvert_exporter": "python",
162
+ "pygments_lexer": "ipython3",
163
+ "version": "3.9.5"
164
+ }
165
+ },
166
+ "nbformat": 4,
167
+ "nbformat_minor": 4
168
+ }
notebooks/ch2-the-LLM-advantage.ipynb ADDED
@@ -0,0 +1,45 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "id": "48d73cd4",
6
+ "metadata": {},
7
+ "source": [
8
+ "## 2. The LLM advantage\n",
9
+ "\n",
10
+ "A [large-language model](https://en.wikipedia.org/wiki/Large_language_model) is an artificial intelligence system capable of understanding and generating human language due to its extensive training on vast amounts of text. These systems are commonly referred to by the acronym LLM. The most prominent examples include OpenAI’s ChatGPT, Google’s Gemini and Anthropic’s Claude, but there are many others, including several open-source options.\n",
11
+ "\n",
12
+ "While they are most famous for their ability to converse with humans as chatbots, LLMs can perform a wide range of language processing tasks, including text classification, summarization and translation.\n",
13
+ "\n",
14
+ "Unlike traditional machine-learning models, LLMs do not require users to provide pre-prepared training data to perform a specific task. Instead, LLMs can be prompted with a broad description of their goals and a few examples of rules they should follow. The LLMs will then generate responses informed by the massive amount of information they contain. That deep knowledge can be especially beneficial when dealing with large and diverse datasets that are difficult for humans to process on their own. This advancement is recognized as a landmark achievement in the development of artificial intelligence.\n",
15
+ "\n",
16
+ "[![Wired Article](images/llm.png)](https://www.wired.com/story/eight-google-employees-invented-modern-ai-transformers-paper/)\n",
17
+ "\n",
18
+ "LLMs also do not require the user to understand machine-learning concepts, like vectorization or Bayesian statistics, or to write complex code to train and evaluate the model. Instead, users can submit prompts in plain language, which the model will use to generate responses. This makes it easier for journalists to experiment with different approaches and quickly iterate on their work.\n",
19
+ "\n",
20
+ "**[3. Getting started with Hugging Face →](ch3-getting-started-with-hf.ipynb)**"
21
+ ]
22
+ }
23
+ ],
24
+ "metadata": {
25
+ "kernelspec": {
26
+ "display_name": "Python 3 (ipykernel)",
27
+ "language": "python",
28
+ "name": "python3"
29
+ },
30
+ "language_info": {
31
+ "codemirror_mode": {
32
+ "name": "ipython",
33
+ "version": 3
34
+ },
35
+ "file_extension": ".py",
36
+ "mimetype": "text/x-python",
37
+ "name": "python",
38
+ "nbconvert_exporter": "python",
39
+ "pygments_lexer": "ipython3",
40
+ "version": "3.9.5"
41
+ }
42
+ },
43
+ "nbformat": 4,
44
+ "nbformat_minor": 5
45
+ }
notebooks/ch3-getting-started-with-hf.ipynb ADDED
@@ -0,0 +1,63 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "id": "c2c4bff1",
6
+ "metadata": {},
7
+ "source": [
8
+ "## 3. Getting started with Hugging Face\n",
9
+ "\n",
10
+ "In addition to the commercial chatbots that draw the most media attention, there are many other ways to access large-language models — including free and open-source options that you can run directly in the cloud using Hugging Face.\n",
11
+ "\n",
12
+ "For this demonstration, we will use [Hugging Face Serverless Inference API](https://huggingface.co/docs/api-inference/en/index), which offers free access to a wide range of powerful language models. It’s fast, beginner-friendly, and widely supported in the AI community. The skills you learn here will transfer easily to other platforms as well.\n",
13
+ "\n",
14
+ "To get started, go to [huggingface.co](https://huggingface.co/). Click on **Sign Up** to create an account or **Log In** at the top right.\n",
15
+ "\n",
16
+ "[![Hugging Face](images/hf.png)](https://huggingface.co/)\n",
17
+ "\n",
18
+ "Once you’re logged in, navigate to your profile dropdown and select **Settings**, then [**Access Tokens**](https://huggingface.co/settings/tokens). Click on **New token**, give it a name (we recommend `first-llm-classifier`), set the token type to **Fine-grained**, select the following options and hit **Generate**.\n",
19
+ "\n",
20
+ "[![Tokens](images/tokens.png)](https://huggingface.co/)\n",
21
+ "\n",
22
+ "Copy the token that appears — you'll only see it once — and store it somewhere safe. You’ll use it to authenticate your Python scripts when making requests to Hugging Face's APIs.\n",
23
+ "\n",
24
+ "You can now access any public model using the Hugging Face Inference API — no deployment required. For example, visit the [Llama 3.3 70B Instruct model page](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct), click **Deploy**, then go to the **Inference Providers** tab, and select **HF Inference API**. This gives you instant access to the model via a hosted endpoint maintained by Hugging Face.\n",
25
+ "\n",
26
+ "[![Llama 3.3](images/llama.png)](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct)\n",
27
+ "\n",
28
+ "This approach is ideal if you want to quickly try out models without spinning up your own infrastructure. Many models are available with generous free-tier access.\n",
29
+ "\n",
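+ "To preview what’s ahead — a minimal sketch, assuming the `huggingface_hub` package is installed and your access token is stored in the `HF_TOKEN` environment variable — calling the hosted model from Python looks like this:\n",
+ "\n",
+ "```python\n",
+ "import os\n",
+ "\n",
+ "from huggingface_hub import InferenceClient\n",
+ "\n",
+ "# Authenticate with the access token created above\n",
+ "client = InferenceClient(\n",
+ "    model=\"meta-llama/Llama-3.3-70B-Instruct\",\n",
+ "    token=os.getenv(\"HF_TOKEN\"),\n",
+ ")\n",
+ "\n",
+ "response = client.chat.completions.create(\n",
+ "    messages=[{\"role\": \"user\", \"content\": \"Say hello\"}],\n",
+ ")\n",
+ "print(response.choices[0].message.content)\n",
+ "```\n",
+ "\n",
+ "We will walk through this code step by step in Chapter 5.\n",
+ "\n",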
30
+ "**[4. Installing JupyterLab →](ch4-installing-jupyterlab.ipynb)**"
31
+ ]
32
+ },
33
+ {
34
+ "cell_type": "code",
35
+ "execution_count": null,
36
+ "id": "6f8b3428-2d43-4691-82b4-085341c8a1d2",
37
+ "metadata": {},
38
+ "outputs": [],
39
+ "source": []
40
+ }
41
+ ],
42
+ "metadata": {
43
+ "kernelspec": {
44
+ "display_name": "Python 3 (ipykernel)",
45
+ "language": "python",
46
+ "name": "python3"
47
+ },
48
+ "language_info": {
49
+ "codemirror_mode": {
50
+ "name": "ipython",
51
+ "version": 3
52
+ },
53
+ "file_extension": ".py",
54
+ "mimetype": "text/x-python",
55
+ "name": "python",
56
+ "nbconvert_exporter": "python",
57
+ "pygments_lexer": "ipython3",
58
+ "version": "3.9.5"
59
+ }
60
+ },
61
+ "nbformat": 4,
62
+ "nbformat_minor": 5
63
+ }
notebooks/ch4-installing-jupyterlab.ipynb ADDED
@@ -0,0 +1,142 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "id": "26b5a248",
6
+ "metadata": {},
7
+ "source": [
8
+ "## 4. Installing JupyterLab\n",
9
+ "\n",
10
+ "> ⚠️ Note: This step is optional. We’ll be running all code directly in JupyterLab on Hugging Face. Follow this step only if you prefer to run the code on your local machine—otherwise, you can skip to the next step.\n",
11
+ "\n",
12
+ "This class will show you how to interact with the Hugging Face API using the Python computer programming language.\n",
13
+ "\n",
14
+ "If you want to run it on your computer, you can write Python code in your terminal, in a text file and any number of other places. If you’re a skilled programmer who already has a preferred venue for coding, feel free to use it as you work through this class.\n",
15
+ "\n",
16
+ "If you’re not, the tool we recommend for beginners is [Project Jupyter](http://jupyter.org/), a browser-based interface where you can write, run, remix, and republish code.\n",
17
+ "\n",
18
+ "It is free software that anyone can install and run. It is used by [scientists](http://nbviewer.jupyter.org/github/robertodealmeida/notebooks/blob/master/earth_day_data_challenge/Analyzing%20whale%20tracks.ipynb), [scholars](http://nbviewer.jupyter.org/github/nealcaren/workshop_2014/blob/master/notebooks/5_Times_API.ipynb), [investors](https://github.com/rsvp/fecon235/blob/master/nb/fred-debt-pop.ipynb), and corporations to create and share their research. It is also used by journalists to develop stories and show their work.\n",
19
+ "\n",
20
+ "The easiest way to use it is by installing [JupyterLab Desktop](https://github.com/jupyterlab/jupyterlab-desktop), a self-contained application that provides a ready-to-use Python environment with several popular libraries bundled in. \n",
21
+ "It can be installed on any operating system with a simple point-and-click interface."
22
+ ]
23
+ },
24
+ {
25
+ "cell_type": "code",
26
+ "execution_count": 2,
27
+ "id": "c97c32a7-0497-4231-a628-647afdaac68b",
28
+ "metadata": {},
29
+ "outputs": [
30
+ {
31
+ "data": {
32
+ "text/html": [
33
+ "\n",
34
+ "<div style=\"position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden; max-width: 100%;\">\n",
35
+ " <iframe \n",
36
+ " src=\"https://www.youtube.com/embed/578B63wZ7rI\" \n",
37
+ " style=\"position: absolute; top: 0; left: 0; width: 100%; height: 100%;\" \n",
38
+ " frameborder=\"0\" \n",
39
+ " allowfullscreen>\n",
40
+ " </iframe>\n",
41
+ "</div>\n"
42
+ ],
43
+ "text/plain": [
44
+ "<IPython.core.display.HTML object>"
45
+ ]
46
+ },
47
+ "execution_count": 2,
48
+ "metadata": {},
49
+ "output_type": "execute_result"
50
+ }
51
+ ],
52
+ "source": [
53
+ "from IPython.display import HTML\n",
54
+ "\n",
55
+ "HTML(\"\"\"\n",
56
+ "<div style=\"position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden; max-width: 100%;\">\n",
57
+ " <iframe \n",
58
+ " src=\"https://www.youtube.com/embed/578B63wZ7rI\" \n",
59
+ " style=\"position: absolute; top: 0; left: 0; width: 100%; height: 100%;\" \n",
60
+ " frameborder=\"0\" \n",
61
+ " allowfullscreen>\n",
62
+ " </iframe>\n",
63
+ "</div>\n",
64
+ "\"\"\")"
65
+ ]
66
+ },
67
+ {
68
+ "cell_type": "markdown",
69
+ "id": "d5decd99-55a2-4a56-b88f-1930a6245203",
70
+ "metadata": {},
71
+ "source": [
72
+ "The first step is to visit [JupyterLab Desktop’s homepage on GitHub](https://github.com/jupyterlab/jupyterlab-desktop) in your web browser. \n",
73
+ "\n",
74
+ "![JupyterLab Desktop homepage](images/jupyter-desktop-repo.png)\n",
75
+ "\n",
76
+ "Scroll down to the documentation below the code until you reach the [Installation](https://github.com/jupyterlab/jupyterlab-desktop) section. \n",
77
+ "\n",
78
+ "![JupyterLab Desktop download](images/jupyter-desktop-install.png)\n",
79
+ "\n",
80
+ "Then pick the link appropriate for your operating system. The installation file is large, so the download might take a while.\n",
81
+ "\n",
82
+ "Find the file in your downloads directory and double-click it to begin the installation process. \n",
83
+ "\n",
84
+ "Follow the instructions presented by the pop-up windows, sticking to the default options.\n",
85
+ "\n",
86
+ "> ⚠️ **Warning** \n",
87
+ "> Your computer’s operating system might flag the JupyterLab Desktop installer as an unverified or insecure application. Don’t worry. The tool has been vetted by Project Jupyter’s core developers and it’s safe to use. \n",
88
+ "> If your system is blocking you from installing the tool, you’ll likely need to work around its barriers. For instance, on macOS, this might require [visiting your system’s security settings](https://www.wikihow.com/Install-Software-from-Unsigned-Developers-on-a-Mac) to allow the installation.\n",
89
+ "\n",
90
+ "Once JupyterLab Desktop is installed, you can accept the installation wizard’s offer to immediately open the program, or you can search for “Jupyter Lab” in your operating system’s application finder.\n",
91
+ "\n",
92
+ "That will open up a new window that looks something like this:\n",
93
+ "\n",
94
+ "![JupyterLab Desktop splash screen](images/jupyter-desktop-splash.png)\n",
95
+ "\n",
96
+ "> ⚠️ **Warning** \n",
97
+ "> If you see a warning bar at the bottom of the screen that says you need to install Python, click the link provided to make that happen.\n",
98
+ "\n",
99
+ "Click the “New notebook…” button to open the Python interface.\n",
100
+ "\n",
101
+ "![JupyterLab new notebook](images/jupyter-desktop-blank.png)\n",
102
+ "\n",
103
+ "Welcome to your first Jupyter notebook. Now you’re ready to move on to writing code.\n",
104
+ "\n",
105
+ "> 💡 **Note** \n",
106
+ "> If you’re struggling to make Jupyter work and need help with the basics, \n",
107
+ "> we recommend you check out [“First Python Notebook”](https://palewi.re/docs/first-python-notebook/), where you can get up to speed.\n",
108
+ "\n",
109
+ "**[5. Prompting with Python →](ch5-prompting-with-python.ipynb)**"
110
+ ]
111
+ },
112
+ {
113
+ "cell_type": "code",
114
+ "execution_count": null,
115
+ "id": "cca91757-438f-4a5f-b3d5-2bdd774278de",
116
+ "metadata": {},
117
+ "outputs": [],
118
+ "source": []
119
+ }
120
+ ],
121
+ "metadata": {
122
+ "kernelspec": {
123
+ "display_name": "Python 3 (ipykernel)",
124
+ "language": "python",
125
+ "name": "python3"
126
+ },
127
+ "language_info": {
128
+ "codemirror_mode": {
129
+ "name": "ipython",
130
+ "version": 3
131
+ },
132
+ "file_extension": ".py",
133
+ "mimetype": "text/x-python",
134
+ "name": "python",
135
+ "nbconvert_exporter": "python",
136
+ "pygments_lexer": "ipython3",
137
+ "version": "3.9.5"
138
+ }
139
+ },
140
+ "nbformat": 4,
141
+ "nbformat_minor": 5
142
+ }
notebooks/ch5-prompting-with-python.ipynb ADDED
@@ -0,0 +1,536 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "id": "84dd193e",
6
+ "metadata": {},
7
+ "source": [
8
+ "## 5. Prompting with Python"
9
+ ]
10
+ },
11
+ {
12
+ "cell_type": "markdown",
13
+ "id": "862b8218",
14
+ "metadata": {},
15
+ "source": [
16
+ "First, we’ll install the libraries we need. The `huggingface_hub` package is the official client for Hugging Face’s API. The `rich` and `ipywidgets` packages are helper libraries that will improve how your outputs look in Jupyter notebooks."
17
+ ]
18
+ },
19
+ {
20
+ "cell_type": "markdown",
21
+ "id": "c09f0b15",
22
+ "metadata": {},
23
+ "source": [
24
+ "A common way to install packages from inside your JupyterLab Desktop notebook is to use the `%pip` command. Hit the play button in the top toolbar after selecting the cell below."
25
+ ]
26
+ },
27
+ {
28
+ "cell_type": "code",
29
+ "execution_count": null,
30
+ "id": "e728c5fe",
31
+ "metadata": {
32
+ "scrolled": true
33
+ },
34
+ "outputs": [],
35
+ "source": [
36
+ "%pip install rich ipywidgets huggingface_hub"
37
+ ]
38
+ },
39
+ {
40
+ "cell_type": "markdown",
41
+ "id": "79f96b1a",
42
+ "metadata": {},
43
+ "source": [
44
+ "If the `%pip` command doesn’t work on your computer, try substituting the `!pip` command instead. Or you can install the packages from your computer’s command line and restart your notebook."
45
+ ]
46
+ },
47
+ {
48
+ "cell_type": "markdown",
49
+ "id": "75f1366a",
50
+ "metadata": {},
51
+ "source": [
52
+ "Now let's import them in the cell that appears below the installation output. Hit play again."
53
+ ]
54
+ },
55
+ {
56
+ "cell_type": "code",
57
+ "execution_count": 2,
58
+ "id": "8013a72c-670e-48ab-8619-99a337fd5392",
59
+ "metadata": {},
60
+ "outputs": [],
61
+ "source": [
62
+ "import os\n",
63
+ "from rich import print\n",
64
+ "from huggingface_hub import InferenceClient"
65
+ ]
66
+ },
67
+ {
68
+ "cell_type": "markdown",
69
+ "id": "fdd7b199",
70
+ "metadata": {},
71
+ "source": [
72
+ "With `api_key = os.getenv(\"HF_TOKEN\")`, we're calling the free Hugging Face Inference API, using an authentication token stored in the \"Secrets\" of this Space. If you'd like to duplicate this Space, you'll need to create a token with your account [here](https://huggingface.co/settings/tokens).\n",
73
+ "\n",
74
+ "You should continue adding new cells as you need throughout the rest of the class."
75
+ ]
76
+ },
77
+ {
78
+ "cell_type": "code",
79
+ "execution_count": 3,
80
+ "id": "a5ec5ea4-5bd1-4ba7-b4cb-3f0a98505f29",
81
+ "metadata": {},
82
+ "outputs": [],
83
+ "source": [
84
+ "api_key = os.getenv(\"HF_TOKEN\")"
85
+ ]
86
+ },
87
+ {
88
+ "cell_type": "markdown",
89
+ "id": "b9e81187",
90
+ "metadata": {},
91
+ "source": [
92
+ "Let’s make our first prompt. To do that, we call the client’s `chat.completions.create` method with a `messages` argument: a list of dictionaries, each representing one message in the conversation. When the role is \"user\", it is roughly the same as asking a chatbot a question."
93
+ ]
94
+ },
95
+ {
96
+ "cell_type": "markdown",
97
+ "id": "e83c5390",
98
+ "metadata": {},
99
+ "source": [
100
+ "We also need to pick a model from among the choices Hugging Face gives us. We’re picking Llama 3.3, the latest from Meta."
101
+ ]
102
+ },
103
+ {
104
+ "cell_type": "code",
105
+ "execution_count": 4,
106
+ "id": "54a5befd-0c64-4039-9b26-14733c9f007e",
107
+ "metadata": {},
108
+ "outputs": [],
109
+ "source": [
110
+ "client = InferenceClient(\n",
111
+ " model=\"meta-llama/Llama-3.3-70B-Instruct\",\n",
112
+ " token=api_key,\n",
113
+ ")"
114
+ ]
115
+ },
116
+ {
117
+ "cell_type": "code",
118
+ "execution_count": 5,
119
+ "id": "38abe6e0",
120
+ "metadata": {},
121
+ "outputs": [],
122
+ "source": [
123
+ "response = client.chat.completions.create(\n",
124
+ " messages=[\n",
125
+ " {\n",
126
+ " \"role\": \"user\",\n",
127
+ " \"content\": \"Explain the importance of data journalism in a concise sentence\"\n",
128
+ " }\n",
129
+ " ],\n",
130
+ ")"
131
+ ]
132
+ },
133
+ {
134
+ "cell_type": "markdown",
135
+ "id": "3156058c",
136
+ "metadata": {},
137
+ "source": [
138
+ "We saved the client’s response as a variable. Print that Python object to see what it contains."
139
+ ]
140
+ },
141
+ {
142
+ "cell_type": "code",
143
+ "execution_count": 6,
144
+ "id": "49bf29f5",
145
+ "metadata": {},
146
+ "outputs": [
147
+ {
148
+ "data": {
149
+ "text/html": [
150
+ "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">ChatCompletionOutput</span><span style=\"font-weight: bold\">(</span><span style=\"color: #808000; text-decoration-color: #808000\">choices</span>=<span style=\"font-weight: bold\">[</span><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">ChatCompletionOutputComplete</span><span style=\"font-weight: bold\">(</span><span style=\"color: #808000; text-decoration-color: #808000\">finish_reason</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'stop'</span>, <span style=\"color: #808000; text-decoration-color: #808000\">index</span>=<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0</span>, \n",
151
+ "<span style=\"color: #808000; text-decoration-color: #808000\">message</span>=<span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">ChatCompletionOutputMessage</span><span style=\"font-weight: bold\">(</span><span style=\"color: #808000; text-decoration-color: #808000\">role</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'assistant'</span>, <span style=\"color: #808000; text-decoration-color: #808000\">content</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'Data journalism plays a crucial role in holding </span>\n",
152
+ "<span style=\"color: #008000; text-decoration-color: #008000\">institutions accountable and informing the public by analyzing and interpreting complex data to uncover trends, </span>\n",
153
+ "<span style=\"color: #008000; text-decoration-color: #008000\">patterns, and insights that can lead to more informed decision-making and a deeper understanding of social </span>\n",
154
+ "<span style=\"color: #008000; text-decoration-color: #008000\">issues.'</span>, <span style=\"color: #808000; text-decoration-color: #808000\">tool_calls</span>=<span style=\"color: #800080; text-decoration-color: #800080; font-style: italic\">None</span><span style=\"font-weight: bold\">)</span>, <span style=\"color: #808000; text-decoration-color: #808000\">logprobs</span>=<span style=\"color: #800080; text-decoration-color: #800080; font-style: italic\">None</span><span style=\"font-weight: bold\">)]</span>, <span style=\"color: #808000; text-decoration-color: #808000\">created</span>=<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1742869712</span>, <span style=\"color: #808000; text-decoration-color: #808000\">id</span>=<span style=\"color: #008000; text-decoration-color: #008000\">''</span>, <span style=\"color: #808000; text-decoration-color: #808000\">model</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'meta-llama/Llama-3.3-70B-Instruct'</span>, \n",
155
+ "<span style=\"color: #808000; text-decoration-color: #808000\">system_fingerprint</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'3.0.1-sha-bb9095a'</span>, <span style=\"color: #808000; text-decoration-color: #808000\">usage</span>=<span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">ChatCompletionOutputUsage</span><span style=\"font-weight: bold\">(</span><span style=\"color: #808000; text-decoration-color: #808000\">completion_tokens</span>=<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">45</span>, <span style=\"color: #808000; text-decoration-color: #808000\">prompt_tokens</span>=<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">46</span>, \n",
156
+ "<span style=\"color: #808000; text-decoration-color: #808000\">total_tokens</span>=<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">91</span><span style=\"font-weight: bold\">)</span>, <span style=\"color: #808000; text-decoration-color: #808000\">object</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'chat.completion'</span><span style=\"font-weight: bold\">)</span>\n",
157
+ "</pre>\n"
158
+ ],
159
+ "text/plain": [
160
+ "\u001b[1;35mChatCompletionOutput\u001b[0m\u001b[1m(\u001b[0m\u001b[33mchoices\u001b[0m=\u001b[1m[\u001b[0m\u001b[1;35mChatCompletionOutputComplete\u001b[0m\u001b[1m(\u001b[0m\u001b[33mfinish_reason\u001b[0m=\u001b[32m'stop'\u001b[0m, \u001b[33mindex\u001b[0m=\u001b[1;36m0\u001b[0m, \n",
161
+ "\u001b[33mmessage\u001b[0m=\u001b[1;35mChatCompletionOutputMessage\u001b[0m\u001b[1m(\u001b[0m\u001b[33mrole\u001b[0m=\u001b[32m'assistant'\u001b[0m, \u001b[33mcontent\u001b[0m=\u001b[32m'Data journalism plays a crucial role in holding \u001b[0m\n",
162
+ "\u001b[32minstitutions accountable and informing the public by analyzing and interpreting complex data to uncover trends, \u001b[0m\n",
163
+ "\u001b[32mpatterns, and insights that can lead to more informed decision-making and a deeper understanding of social \u001b[0m\n",
164
+ "\u001b[32missues.'\u001b[0m, \u001b[33mtool_calls\u001b[0m=\u001b[3;35mNone\u001b[0m\u001b[1m)\u001b[0m, \u001b[33mlogprobs\u001b[0m=\u001b[3;35mNone\u001b[0m\u001b[1m)\u001b[0m\u001b[1m]\u001b[0m, \u001b[33mcreated\u001b[0m=\u001b[1;36m1742869712\u001b[0m, \u001b[33mid\u001b[0m=\u001b[32m''\u001b[0m, \u001b[33mmodel\u001b[0m=\u001b[32m'meta-llama/Llama-3.3-70B-Instruct'\u001b[0m, \n",
165
+ "\u001b[33msystem_fingerprint\u001b[0m=\u001b[32m'3.0.1-sha-bb9095a'\u001b[0m, \u001b[33musage\u001b[0m=\u001b[1;35mChatCompletionOutputUsage\u001b[0m\u001b[1m(\u001b[0m\u001b[33mcompletion_tokens\u001b[0m=\u001b[1;36m45\u001b[0m, \u001b[33mprompt_tokens\u001b[0m=\u001b[1;36m46\u001b[0m, \n",
166
+ "\u001b[33mtotal_tokens\u001b[0m=\u001b[1;36m91\u001b[0m\u001b[1m)\u001b[0m, \u001b[33mobject\u001b[0m=\u001b[32m'chat.completion'\u001b[0m\u001b[1m)\u001b[0m\n"
167
+ ]
168
+ },
169
+ "metadata": {},
170
+ "output_type": "display_data"
171
+ }
172
+ ],
173
+ "source": [
174
+ "print(response)"
175
+ ]
176
+ },
177
+ {
178
+ "cell_type": "markdown",
179
+ "id": "d9f86d8e",
180
+ "metadata": {},
181
+ "source": [
182
+ "You should see something like:"
183
+ ]
184
+ },
185
+ {
186
+ "cell_type": "markdown",
187
+ "id": "cdf08e93-a6cc-4245-9fa3-2cc456208bcf",
188
+ "metadata": {},
189
+ "source": [
190
+ "```\n",
191
+ "ChatCompletionOutput(\n",
192
+ " choices=[\n",
193
+ " ChatCompletionOutputComplete(\n",
194
+ " finish_reason='stop',\n",
195
+ " index=0,\n",
196
+ " message=ChatCompletionOutputMessage(\n",
197
+ " role='assistant',\n",
198
+ " content='Data journalism plays a crucial role in holding those in power accountable by using data analysis and visualization to uncover insights, trends, and patterns that inform and engage the public on important issues.',\n",
199
+ " tool_calls=None\n",
200
+ " ),\n",
201
+ " logprobs=None\n",
202
+ " )\n",
203
+ " ],\n",
204
+ " created=1742592105,\n",
205
+ " id='',\n",
206
+ " model='meta-llama/Llama-3.3-70B-Instruct',\n",
207
+ " system_fingerprint='3.2.1-native',\n",
208
+ " usage=ChatCompletionOutputUsage(\n",
209
+ " completion_tokens=37,\n",
210
+ " prompt_tokens=46,\n",
211
+ " total_tokens=83\n",
212
+ " ),\n",
213
+ " object='chat.completion'\n",
214
+ ")\n",
215
+ "```"
216
+ ]
217
+ },
218
+ {
219
+ "cell_type": "markdown",
220
+ "id": "ff414bab",
221
+ "metadata": {},
222
+ "source": [
223
+ "There’s a lot here, but the `message` holds the actual response from the LLM. Let’s print just the content from that message. Note that your response will probably vary from this guide: LLMs are probabilistic prediction machines, so every response can be a little different."
224
+ ]
225
+ },
226
+ {
227
+ "cell_type": "code",
228
+ "execution_count": 7,
229
+ "id": "0f291693",
230
+ "metadata": {},
231
+ "outputs": [
232
+ {
233
+ "data": {
234
+ "text/html": [
235
+ "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">Data journalism plays a crucial role in holding institutions accountable and informing the public by analyzing and \n",
236
+ "interpreting complex data to uncover trends, patterns, and insights that can lead to more informed decision-making \n",
237
+ "and a deeper understanding of social issues.\n",
238
+ "</pre>\n"
239
+ ],
240
+ "text/plain": [
241
+ "Data journalism plays a crucial role in holding institutions accountable and informing the public by analyzing and \n",
242
+ "interpreting complex data to uncover trends, patterns, and insights that can lead to more informed decision-making \n",
243
+ "and a deeper understanding of social issues.\n"
244
+ ]
245
+ },
246
+ "metadata": {},
247
+ "output_type": "display_data"
248
+ }
249
+ ],
250
+ "source": [
251
+ "print(response.choices[0].message.content)"
252
+ ]
253
+ },
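Since we’ll be digging the text out of every response the same way, a tiny helper can keep later cells tidy — a sketch (the function name is ours, not part of the Hugging Face library):

```python
def content_of(response):
    # Pull the assistant's text out of a chat completion response object.
    return response.choices[0].message.content
```

You’d then write `print(content_of(response))` instead of repeating the attribute chain.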
254
+ {
255
+ "cell_type": "markdown",
256
+ "id": "0c1e292a",
257
+ "metadata": {},
258
+ "source": [
259
+ "Let’s pick a different model from among the choices that Hugging Face offers. One we could try is Gemma 2, an open model from Google. Rather than add a new cell, let’s revise the code we already have and rerun it."
260
+ ]
261
+ },
262
+ {
263
+ "cell_type": "code",
264
+ "execution_count": 8,
265
+ "id": "9fb3f9b8",
266
+ "metadata": {},
267
+ "outputs": [],
268
+ "source": [
269
+ "response = client.chat.completions.create(\n",
270
+ " messages=[\n",
271
+ " {\n",
272
+ " \"role\": \"user\",\n",
273
+ " \"content\": \"Explain the importance of data journalism in a concise sentence\",\n",
274
+ " }\n",
275
+ " ],\n",
276
+ " model=\"google/gemma-2-9b-it\", # NEW\n",
277
+ ")"
278
+ ]
279
+ },
280
+ {
281
+ "cell_type": "markdown",
282
+ "id": "aa6662ed",
283
+ "metadata": {},
284
+ "source": [
285
+ "Again, your response might vary from what’s here. Let’s find out."
286
+ ]
287
+ },
288
+ {
289
+ "cell_type": "code",
290
+ "execution_count": 9,
291
+ "id": "0f036c66",
292
+ "metadata": {},
293
+ "outputs": [
294
+ {
295
+ "data": {
296
+ "text/html": [
297
+ "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">Data journalism plays a crucial role in holding those in power accountable by uncovering hidden trends, patterns, \n",
298
+ "and insights through the analysis and visualization of data, enabling informed decision-making and promoting \n",
299
+ "transparency.\n",
300
+ "</pre>\n"
301
+ ],
302
+ "text/plain": [
303
+ "Data journalism plays a crucial role in holding those in power accountable by uncovering hidden trends, patterns, \n",
304
+ "and insights through the analysis and visualization of data, enabling informed decision-making and promoting \n",
305
+ "transparency.\n"
306
+ ]
307
+ },
308
+ "metadata": {},
309
+ "output_type": "display_data"
310
+ }
311
+ ],
312
+ "source": [
313
+ "print(response.choices[0].message.content)"
314
+ ]
315
+ },
316
+ {
317
+ "cell_type": "markdown",
318
+ "id": "12802cac",
319
+ "metadata": {},
320
+ "source": [
321
+ "---\n",
322
+ "### Sidenote:\n",
323
+ "Hugging Face’s Python library is very similar to the ones offered by OpenAI, Anthropic and other LLM providers. If you prefer to use those tools, the techniques you learn here should be easily transferable."
324
+ ]
325
+ },
326
+ {
327
+ "cell_type": "markdown",
328
+ "id": "e7668bd3",
329
+ "metadata": {},
330
+ "source": [
331
+ "For instance, here’s how you’d make this same call with Anthropic’s Python library:"
332
+ ]
333
+ },
334
+ {
335
+ "cell_type": "code",
336
+ "execution_count": null,
337
+ "id": "11e567e3",
338
+ "metadata": {},
339
+ "outputs": [],
340
+ "source": [
341
+ "from anthropic import Anthropic\n",
342
+ "\n",
343
+ "client = Anthropic(api_key=os.getenv(\"ANTHROPIC_API_KEY\"))  # requires an Anthropic key, not your Hugging Face token\n",
344
+ "\n",
345
+ "response = client.messages.create(\n",
346
+ " messages=[\n",
347
+ " {\"role\": \"user\", \"content\": \"Explain the importance of data journalism in a concise sentence\"},\n",
348
+ " ],\n",
349
+ " model=\"claude-3-5-sonnet-20240620\",\n",
350
+ ")\n",
351
+ "\n",
352
+ "print(response.content[0].text)"
353
+ ]
354
+ },
355
+ {
356
+ "cell_type": "markdown",
357
+ "id": "182e8e6b",
358
+ "metadata": {},
359
+ "source": [
360
+ "---\n",
361
+ "A well-structured prompt helps the LLM provide more accurate and useful responses."
362
+ ]
363
+ },
364
+ {
365
+ "cell_type": "markdown",
366
+ "id": "da211bea",
367
+ "metadata": {},
368
+ "source": [
369
+ "One common technique for improving results is to open with a “system” prompt that establishes the model’s tone and role. Let’s switch back to Llama 3.3 and add a `system` message that gives the LLM a specific motivation for its responses."
370
+ ]
371
+ },
372
+ {
373
+ "cell_type": "code",
374
+ "execution_count": 11,
375
+ "id": "a5660ed5",
376
+ "metadata": {},
377
+ "outputs": [],
378
+ "source": [
379
+ "response = client.chat.completions.create(\n",
380
+ " messages=[\n",
381
+ " ### <-- NEW\n",
382
+ " {\n",
383
+ " \"role\": \"system\",\n",
384
+ " \"content\": \"you are an enthusiastic nerd who believes data journalism is the future.\"\n",
385
+ " },\n",
386
+ " ### -->\n",
387
+ " {\n",
388
+ " \"role\": \"user\",\n",
389
+ " \"content\": \"Explain the importance of data journalism in a concise sentence\",\n",
390
+ " }\n",
391
+ " ],\n",
392
+ " model=\"meta-llama/Llama-3.3-70B-Instruct\", # NEW\n",
393
+ ")"
394
+ ]
395
+ },
396
+ {
397
+ "cell_type": "markdown",
398
+ "id": "598dd139",
399
+ "metadata": {},
400
+ "source": [
401
+ "Check out the results."
402
+ ]
403
+ },
404
+ {
405
+ "cell_type": "code",
406
+ "execution_count": 12,
407
+ "id": "a5c74d22",
408
+ "metadata": {},
409
+ "outputs": [
410
+ {
411
+ "data": {
412
+ "text/html": [
413
+ "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">Data journalism is revolutionizing the way we consume news by using statistical analysis and visualizations to \n",
414
+ "uncover hidden truths, provide fact-based evidence, and hold those in power accountable, thereby fostering a more \n",
415
+ "informed and engaged citizenry!\n",
416
+ "</pre>\n"
417
+ ],
418
+ "text/plain": [
419
+ "Data journalism is revolutionizing the way we consume news by using statistical analysis and visualizations to \n",
420
+ "uncover hidden truths, provide fact-based evidence, and hold those in power accountable, thereby fostering a more \n",
421
+ "informed and engaged citizenry!\n"
422
+ ]
423
+ },
424
+ "metadata": {},
425
+ "output_type": "display_data"
426
+ }
427
+ ],
428
+ "source": [
429
+ "print(response.choices[0].message.content)"
430
+ ]
431
+ },
432
+ {
433
+ "cell_type": "markdown",
434
+ "id": "304bbabc",
435
+ "metadata": {},
436
+ "source": [
437
+ "Want to see how tone affects the response? Change the system prompt to something old-school."
438
+ ]
439
+ },
440
+ {
441
+ "cell_type": "code",
442
+ "execution_count": 13,
443
+ "id": "3123cd0b",
444
+ "metadata": {},
445
+ "outputs": [],
446
+ "source": [
447
+ "response = client.chat.completions.create(\n",
448
+ " messages=[\n",
449
+ " {\n",
450
+ " \"role\": \"system\",\n",
451
+ " \"content\": \"you are a crusty, ill-tempered editor who hates math and thinks data journalism is a waste of time and resources.\" # NEW\n",
452
+ " },\n",
453
+ " {\n",
454
+ " \"role\": \"user\",\n",
455
+ " \"content\": \"Explain the importance of data journalism in a concise sentence\",\n",
456
+ " }\n",
457
+ " ],\n",
458
+ " model=\"meta-llama/Llama-3.3-70B-Instruct\",\n",
459
+ ")"
460
+ ]
461
+ },
462
+ {
463
+ "cell_type": "markdown",
464
+ "id": "b90dd487",
465
+ "metadata": {},
466
+ "source": [
467
+ "Then re-run the code and summon J. Jonah Jameson."
468
+ ]
469
+ },
470
+ {
471
+ "cell_type": "code",
472
+ "execution_count": 14,
473
+ "id": "3defdc9c",
474
+ "metadata": {},
475
+ "outputs": [
476
+ {
477
+ "data": {
478
+ "text/html": [
479
+ "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">If I must, data journalism is supposedly important because it allows reporters to use numbers and statistics to \n",
480
+ "fact-check claims, identify trends, and hold those in power accountable, but frankly, I've seen better storytelling\n",
481
+ "in a spreadsheet.\n",
482
+ "</pre>\n"
483
+ ],
484
+ "text/plain": [
485
+ "If I must, data journalism is supposedly important because it allows reporters to use numbers and statistics to \n",
486
+ "fact-check claims, identify trends, and hold those in power accountable, but frankly, I've seen better storytelling\n",
487
+ "in a spreadsheet.\n"
488
+ ]
489
+ },
490
+ "metadata": {},
491
+ "output_type": "display_data"
492
+ }
493
+ ],
494
+ "source": [
495
+ "print(response.choices[0].message.content)"
496
+ ]
497
+ },
498
+ {
499
+ "cell_type": "markdown",
500
+ "id": "957517fa-e69b-42bb-88b5-3a71b680972f",
501
+ "metadata": {},
502
+ "source": [
503
+ "**[6. Structured responses →](ch6-structured-responses.ipynb)**"
504
+ ]
505
+ },
506
+ {
507
+ "cell_type": "code",
508
+ "execution_count": null,
509
+ "id": "0e45220f-a77c-4463-8211-96dd79b09840",
510
+ "metadata": {},
511
+ "outputs": [],
512
+ "source": []
513
+ }
514
+ ],
515
+ "metadata": {
516
+ "kernelspec": {
517
+ "display_name": "Python 3 (ipykernel)",
518
+ "language": "python",
519
+ "name": "python3"
520
+ },
521
+ "language_info": {
522
+ "codemirror_mode": {
523
+ "name": "ipython",
524
+ "version": 3
525
+ },
526
+ "file_extension": ".py",
527
+ "mimetype": "text/x-python",
528
+ "name": "python",
529
+ "nbconvert_exporter": "python",
530
+ "pygments_lexer": "ipython3",
531
+ "version": "3.9.5"
532
+ }
533
+ },
534
+ "nbformat": 4,
535
+ "nbformat_minor": 5
536
+ }
notebooks/ch6-structured-responses.ipynb ADDED
@@ -0,0 +1,719 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "## 6. Structured Responses\n",
8
+ "\n",
9
+ "Here's a public service announcement. There's no law that says you have to ask LLMs for essays, poems or relationship advice.\n",
10
+ "\n",
11
+ "Yes, they're great at drumming up long blocks of text. An LLM can spit out a long answer to almost any question. It's how they've been tuned and marketed by companies selling chatbots and more conversational forms of search.\n",
12
+ "\n",
13
+ "But they're also great at answering simple questions, a skill that has been overlooked in much of the hoopla that followed the introduction of ChatGPT.\n",
14
+ "\n",
15
+ "Here's an example that simply prompts the LLM to answer a straightforward question."
16
+ ]
17
+ },
18
+ {
19
+ "cell_type": "code",
20
+ "execution_count": 38,
21
+ "metadata": {},
22
+ "outputs": [],
23
+ "source": [
24
+ "import os\n",
25
+ "from rich import print\n",
26
+ "\n",
27
+ "# Reuse the Hugging Face client setup from the previous chapter\n",
28
+ "from huggingface_hub import InferenceClient\n",
29
+ "api_key = os.getenv(\"HF_TOKEN\")\n",
30
+ "client = InferenceClient(\n",
31
+ " token=api_key,\n",
32
+ ")"
33
+ ]
34
+ },
35
+ {
36
+ "cell_type": "code",
37
+ "execution_count": 39,
38
+ "metadata": {},
39
+ "outputs": [],
40
+ "source": [
41
+ "prompt = \"\"\"\n",
42
+ "You are an AI model trained to classify text.\n",
43
+ "\n",
44
+ "I will provide the name of a professional sports team.\n",
45
+ "\n",
46
+ "You will reply with the sports league in which they compete.\n",
47
+ "\"\"\""
48
+ ]
49
+ },
50
+ {
51
+ "cell_type": "markdown",
52
+ "metadata": {},
53
+ "source": [
54
+ "Lace that into our request."
55
+ ]
56
+ },
57
+ {
58
+ "cell_type": "code",
59
+ "execution_count": 40,
60
+ "metadata": {},
61
+ "outputs": [],
62
+ "source": [
63
+ "response = client.chat.completions.create(\n",
64
+ " messages=[\n",
65
+ " {\n",
66
+ " \"role\": \"system\",\n",
67
+ " \"content\": prompt # NEW\n",
68
+ " },\n",
69
+ " ],\n",
70
+ " model=\"meta-llama/Llama-3.3-70B-Instruct\",\n",
71
+ ")"
72
+ ]
73
+ },
74
+ {
75
+ "cell_type": "markdown",
76
+ "metadata": {},
77
+ "source": [
78
+ "And now add a user message that provides the name of a professional sports team."
79
+ ]
80
+ },
81
+ {
82
+ "cell_type": "code",
83
+ "execution_count": 41,
84
+ "metadata": {},
85
+ "outputs": [],
86
+ "source": [
87
+ "response = client.chat.completions.create(\n",
88
+ " messages=[\n",
89
+ " {\n",
90
+ " \"role\": \"system\",\n",
91
+ " \"content\": prompt\n",
92
+ " },\n",
93
+ " ### <-- NEW \n",
94
+ " {\n",
95
+ " \"role\": \"user\",\n",
96
+ " \"content\": \"Minnesota Twins\",\n",
97
+ " }\n",
98
+ " ### -->\n",
99
+ " ],\n",
100
+ " model=\"meta-llama/Llama-3.3-70B-Instruct\",\n",
101
+ ")"
102
+ ]
103
+ },
104
+ {
105
+ "cell_type": "markdown",
106
+ "metadata": {},
107
+ "source": [
108
+ "Check the response."
109
+ ]
110
+ },
111
+ {
112
+ "cell_type": "code",
113
+ "execution_count": 42,
114
+ "metadata": {},
115
+ "outputs": [
116
+ {
117
+ "data": {
118
+ "text/html": [
119
+ "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">Major League Baseball <span style=\"font-weight: bold\">(</span>MLB<span style=\"font-weight: bold\">)</span>\n",
120
+ "</pre>\n"
121
+ ],
122
+ "text/plain": [
123
+ "Major League Baseball \u001b[1m(\u001b[0mMLB\u001b[1m)\u001b[0m\n"
124
+ ]
125
+ },
126
+ "metadata": {},
127
+ "output_type": "display_data"
128
+ }
129
+ ],
130
+ "source": [
131
+ "print(response.choices[0].message.content)"
132
+ ]
133
+ },
134
+ {
135
+ "cell_type": "markdown",
136
+ "metadata": {},
137
+ "source": [
138
+ "And we'll bet you get the right answer.\n",
139
+ "\n",
140
+ "```\n",
141
+ "Major League Baseball (MLB)\n",
142
+ "```\n",
143
+ "\n",
144
+ "Try another one."
145
+ ]
146
+ },
147
+ {
148
+ "cell_type": "code",
149
+ "execution_count": 43,
150
+ "metadata": {},
151
+ "outputs": [],
152
+ "source": [
153
+ "response = client.chat.completions.create(\n",
154
+ " messages=[\n",
155
+ " {\n",
156
+ " \"role\": \"system\",\n",
157
+ " \"content\": prompt\n",
158
+ " },\n",
159
+ " {\n",
160
+ " \"role\": \"user\",\n",
161
+ " \"content\": \"Minnesota Vikings\", # NEW\n",
162
+ " }\n",
163
+ " ],\n",
164
+ " model=\"meta-llama/Llama-3.3-70B-Instruct\",\n",
165
+ ")"
166
+ ]
167
+ },
168
+ {
169
+ "cell_type": "code",
170
+ "execution_count": 44,
171
+ "metadata": {},
172
+ "outputs": [
173
+ {
174
+ "data": {
175
+ "text/html": [
176
+ "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">National Football League <span style=\"font-weight: bold\">(</span>NFL<span style=\"font-weight: bold\">)</span>\n",
177
+ "</pre>\n"
178
+ ],
179
+ "text/plain": [
180
+ "National Football League \u001b[1m(\u001b[0mNFL\u001b[1m)\u001b[0m\n"
181
+ ]
182
+ },
183
+ "metadata": {},
184
+ "output_type": "display_data"
185
+ }
186
+ ],
187
+ "source": [
188
+ "print(response.choices[0].message.content)"
189
+ ]
190
+ },
191
+ {
192
+ "cell_type": "markdown",
193
+ "metadata": {},
194
+ "source": [
195
+ "See what we mean?\n",
196
+ "\n",
197
+ "```\n",
198
+ "National Football League (NFL)\n",
199
+ "```\n",
200
+ "\n",
201
+ "This approach can be used to classify large datasets, adding a new column of data that categorizes text in a way that makes it easier to analyze.\n",
202
+ "\n",
203
+ "Let's try it by making a function that will classify whatever team you provide."
204
+ ]
205
+ },
206
+ {
207
+ "cell_type": "code",
208
+ "execution_count": 45,
209
+ "metadata": {},
210
+ "outputs": [],
211
+ "source": [
212
+ "def classify_team(name):\n",
213
+ " prompt = \"\"\"\n",
214
+ "You are an AI model trained to classify text.\n",
215
+ "\n",
216
+ "I will provide the name of a professional sports team.\n",
217
+ "\n",
218
+ "You will reply with the sports league in which they compete.\n",
219
+ "\"\"\"\n",
220
+ "\n",
221
+ " response = client.chat.completions.create(\n",
222
+ " messages=[\n",
223
+ " {\n",
224
+ " \"role\": \"system\",\n",
225
+ " \"content\": prompt,\n",
226
+ " },\n",
227
+ " {\n",
228
+ " \"role\": \"user\",\n",
229
+ " \"content\": name,\n",
230
+ " }\n",
231
+ " ],\n",
232
+ " model=\"meta-llama/Llama-3.3-70B-Instruct\",\n",
233
+ " )\n",
234
+ "\n",
235
+ " return response.choices[0].message.content"
236
+ ]
237
+ },
238
+ {
239
+ "cell_type": "markdown",
240
+ "metadata": {},
241
+ "source": [
242
+ "A list of teams."
243
+ ]
244
+ },
245
+ {
246
+ "cell_type": "code",
247
+ "execution_count": 46,
248
+ "metadata": {},
249
+ "outputs": [],
250
+ "source": [
251
+ "team_list = [\"Minnesota Twins\", \"Minnesota Vikings\", \"Minnesota Timberwolves\"]"
252
+ ]
253
+ },
254
+ {
255
+ "cell_type": "markdown",
256
+ "metadata": {},
257
+ "source": [
258
+ "Now, loop through the list and ask the LLM to code them one by one."
259
+ ]
260
+ },
261
+ {
262
+ "cell_type": "code",
263
+ "execution_count": 47,
264
+ "metadata": {},
265
+ "outputs": [
266
+ {
267
+ "data": {
268
+ "text/html": [
269
+ "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold\">[</span><span style=\"color: #008000; text-decoration-color: #008000\">'Minnesota Twins'</span>, <span style=\"color: #008000; text-decoration-color: #008000\">'Major League Baseball (MLB)'</span><span style=\"font-weight: bold\">]</span>\n",
270
+ "</pre>\n"
271
+ ],
272
+ "text/plain": [
273
+ "\u001b[1m[\u001b[0m\u001b[32m'Minnesota Twins'\u001b[0m, \u001b[32m'Major League Baseball \u001b[0m\u001b[32m(\u001b[0m\u001b[32mMLB\u001b[0m\u001b[32m)\u001b[0m\u001b[32m'\u001b[0m\u001b[1m]\u001b[0m\n"
274
+ ]
275
+ },
276
+ "metadata": {},
277
+ "output_type": "display_data"
278
+ },
279
+ {
280
+ "data": {
281
+ "text/html": [
282
+ "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold\">[</span><span style=\"color: #008000; text-decoration-color: #008000\">'Minnesota Vikings'</span>, <span style=\"color: #008000; text-decoration-color: #008000\">'National Football League (NFL)'</span><span style=\"font-weight: bold\">]</span>\n",
283
+ "</pre>\n"
284
+ ],
285
+ "text/plain": [
286
+ "\u001b[1m[\u001b[0m\u001b[32m'Minnesota Vikings'\u001b[0m, \u001b[32m'National Football League \u001b[0m\u001b[32m(\u001b[0m\u001b[32mNFL\u001b[0m\u001b[32m)\u001b[0m\u001b[32m'\u001b[0m\u001b[1m]\u001b[0m\n"
287
+ ]
288
+ },
289
+ "metadata": {},
290
+ "output_type": "display_data"
291
+ },
292
+ {
293
+ "data": {
294
+ "text/html": [
295
+ "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold\">[</span><span style=\"color: #008000; text-decoration-color: #008000\">'Minnesota Timberwolves'</span>, <span style=\"color: #008000; text-decoration-color: #008000\">'National Basketball Association (NBA)'</span><span style=\"font-weight: bold\">]</span>\n",
296
+ "</pre>\n"
297
+ ],
298
+ "text/plain": [
299
+ "\u001b[1m[\u001b[0m\u001b[32m'Minnesota Timberwolves'\u001b[0m, \u001b[32m'National Basketball Association \u001b[0m\u001b[32m(\u001b[0m\u001b[32mNBA\u001b[0m\u001b[32m)\u001b[0m\u001b[32m'\u001b[0m\u001b[1m]\u001b[0m\n"
300
+ ]
301
+ },
302
+ "metadata": {},
303
+ "output_type": "display_data"
304
+ }
305
+ ],
306
+ "source": [
307
+ "for team in team_list:\n",
308
+ " league = classify_team(team)\n",
309
+ " print([team, league])"
310
+ ]
311
+ },
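The same loop scales to a tabular dataset: run the classifier over one column and store the results as a new one. Here’s a sketch of that pattern with the LLM call stubbed out by a lookup table, so it runs without an API token — in practice you’d swap in `classify_team` from above:

```python
# Stand-in for the real classify_team(): any text -> category function works here.
def classify_stub(name):
    leagues = {
        "Minnesota Twins": "Major League Baseball (MLB)",
        "Minnesota Vikings": "National Football League (NFL)",
        "Minnesota Timberwolves": "National Basketball Association (NBA)",
    }
    return leagues.get(name, "Unknown")

# A tiny "dataset": one dict per row.
rows = [{"team": t} for t in ["Minnesota Twins", "Minnesota Vikings"]]
for row in rows:
    row["league"] = classify_stub(row["team"])  # the new classification column
```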
312
+ {
313
+ "cell_type": "markdown",
314
+ "metadata": {},
315
+ "source": [
316
+ "Due to its probabilistic nature, the LLM can sometimes return slight variations on the same answer. You can guard against this by adding a validation step that only accepts responses from a predefined list."
317
+ ]
318
+ },
319
+ {
320
+ "cell_type": "code",
321
+ "execution_count": 57,
322
+ "metadata": {},
323
+ "outputs": [],
324
+ "source": [
325
+ "def classify_team(name):\n",
326
+ " prompt = \"\"\"\n",
327
+ "You are an AI model trained to classify text.\n",
328
+ "\n",
329
+ "I will provide the name of a professional sports team.\n",
330
+ "\n",
331
+ "You will reply with the sports league in which they compete.\n",
332
+ "\n",
333
+ "Your responses must come from the following list:\n",
334
+ "- Major League Baseball (MLB)\n",
335
+ "- National Football League (NFL)\n",
336
+ "- National Basketball Association (NBA)\n",
337
+ "\"\"\"\n",
338
+ "\n",
339
+ " response = client.chat.completions.create(\n",
340
+ " messages=[\n",
341
+ " {\n",
342
+ " \"role\": \"system\",\n",
343
+ " \"content\": prompt,\n",
344
+ " },\n",
345
+ " {\n",
346
+ " \"role\": \"user\",\n",
347
+ " \"content\": name,\n",
348
+ " }\n",
349
+ " ],\n",
350
+ " model=\"meta-llama/Llama-3.3-70B-Instruct\",\n",
351
+ " )\n",
352
+ "\n",
353
+ " answer = response.choices[0].message.content\n",
354
+ " ### <-- NEW\n",
355
+ " acceptable_answers = [\n",
356
+ " \"Major League Baseball (MLB)\",\n",
357
+ " \"National Football League (NFL)\",\n",
358
+ " \"National Basketball Association (NBA)\",\n",
359
+ " ]\n",
360
+ " if answer not in acceptable_answers:\n",
361
+ " raise ValueError(f\"{answer} not in list of acceptable answers\")\n",
362
+ " ### -->\n",
363
+ " return answer"
364
+ ]
365
+ },
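The membership check at the heart of this validation can be exercised on its own, without an API call. Here's a minimal sketch of the same logic (the `validate` helper name is ours, not part of the course code):

```python
acceptable_answers = [
    "Major League Baseball (MLB)",
    "National Football League (NFL)",
    "National Basketball Association (NBA)",
]

def validate(answer):
    # Reject anything that is not an exact match for one of the predefined labels
    if answer not in acceptable_answers:
        raise ValueError(f"{answer} not in list of acceptable answers")
    return answer

print(validate("National Football League (NFL)"))
```

Because the check requires an exact string match, even a stray space or extra sentence from the LLM will trigger the error.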
366
+ {
367
+ "cell_type": "markdown",
368
+ "metadata": {},
369
+ "source": [
370
+ "Now, ask it for a team that's not in one of those leagues. You should get an error."
371
+ ]
372
+ },
373
+ {
374
+ "cell_type": "code",
375
+ "execution_count": 51,
376
+ "metadata": {},
377
+ "outputs": [
378
+ {
379
+ "ename": "ValueError",
380
+ "evalue": "National Hockey League (NHL) \n\nNote: The provided team doesn't fit into the specified leagues (MLB, NFL, NBA), as the Minnesota Wild is a part of the National Hockey League. not in list of acceptable answers",
381
+ "output_type": "error",
382
+ "traceback": [
383
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
384
+ "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)",
385
+ "Cell \u001b[0;32mIn[51], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mclassify_team\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mMinnesota Wild\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\n",
386
+ "Cell \u001b[0;32mIn[50], line 37\u001b[0m, in \u001b[0;36mclassify_team\u001b[0;34m(name)\u001b[0m\n\u001b[1;32m 31\u001b[0m acceptable_answers \u001b[38;5;241m=\u001b[39m [\n\u001b[1;32m 32\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mMajor League Baseball (MLB)\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[1;32m 33\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mNational Football League (NFL)\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[1;32m 34\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mNational Basketball Association (NBA)\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[1;32m 35\u001b[0m ]\n\u001b[1;32m 36\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m answer \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;129;01min\u001b[39;00m acceptable_answers:\n\u001b[0;32m---> 37\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;132;01m{\u001b[39;00manswer\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m not in list of acceptable answers\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 39\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m answer\n",
387
+ "\u001b[0;31mValueError\u001b[0m: National Hockey League (NHL) \n\nNote: The provided team doesn't fit into the specified leagues (MLB, NFL, NBA), as the Minnesota Wild is a part of the National Hockey League. not in list of acceptable answers"
388
+ ]
389
+ }
390
+ ],
391
+ "source": [
392
+ "classify_team(\"Minnesota Wild\")"
393
+ ]
394
+ },
395
+ {
396
+ "cell_type": "code",
397
+ "execution_count": 52,
398
+ "metadata": {},
399
+ "outputs": [],
400
+ "source": [
401
+ "def classify_team(name):\n",
402
+ " # Last sentence is the prompt is new\n",
403
+ " prompt = \"\"\"\n",
404
+ "You are an AI model trained to classify text.\n",
405
+ "\n",
406
+ "I will provide the name of a professional sports team.\n",
407
+ "\n",
408
+ "You will reply with the sports league in which they compete.\n",
409
+ "\n",
410
+ "Your responses must come from the following list:\n",
411
+ "- Major League Baseball (MLB)\n",
412
+ "- National Football League (NFL)\n",
413
+ "- National Basketball Association (NBA)\n",
414
+ "\n",
415
+ "If the team's league is not on the list, you should label them as \"Other\".\n",
416
+ "\"\"\"\n",
417
+ " response = client.chat.completions.create(\n",
418
+ " messages=[\n",
419
+ " {\n",
420
+ " \"role\": \"system\",\n",
421
+ " \"content\": prompt,\n",
422
+ " },\n",
423
+ " {\n",
424
+ " \"role\": \"user\",\n",
425
+ " \"content\": name,\n",
426
+ " }\n",
427
+ " ],\n",
428
+ " model=\"meta-llama/Llama-3.3-70B-Instruct\",\n",
429
+ " )\n",
430
+ "\n",
431
+ " answer = response.choices[0].message.content\n",
432
+ "\n",
433
+ " acceptable_answers = [\n",
434
+ " \"Major League Baseball (MLB)\",\n",
435
+ " \"National Football League (NFL)\",\n",
436
+ " \"National Basketball Association (NBA)\",\n",
437
+ " \"Other\", # NEW\n",
438
+ " ]\n",
439
+ " if answer not in acceptable_answers:\n",
440
+ " raise ValueError(f\"{answer} not in list of acceptable answers\")\n",
441
+ "\n",
442
+ " return answer"
443
+ ]
444
+ },
445
+ {
446
+ "cell_type": "markdown",
447
+ "metadata": {},
448
+ "source": [
449
+ "Now try the Minnesota Wild again."
450
+ ]
451
+ },
452
+ {
453
+ "cell_type": "code",
454
+ "execution_count": 53,
455
+ "metadata": {},
456
+ "outputs": [
457
+ {
458
+ "data": {
459
+ "text/plain": [
460
+ "'Other'"
461
+ ]
462
+ },
463
+ "execution_count": 53,
464
+ "metadata": {},
465
+ "output_type": "execute_result"
466
+ }
467
+ ],
468
+ "source": [
469
+ "classify_team(\"Minnesota Wild\")"
470
+ ]
471
+ },
472
+ {
473
+ "cell_type": "markdown",
474
+ "metadata": {},
475
+ "source": [
476
+ "And you'll get the answer you expect.\n",
477
+ "\n",
478
+ "```\n",
479
+ "'Other'\n",
480
+ "```\n",
481
+ "\n",
482
+ "Most LLMs are pre-programmed to be creative and generate a range of responses to same prompt. For structured responses like this, we don't want that. We want consistency. So it's a good idea to ask the LLM to be more straightforward by reducing a creativity setting known as `temperature` to zero."
483
+ ]
484
+ },
485
+ {
486
+ "cell_type": "code",
487
+ "execution_count": 54,
488
+ "metadata": {},
489
+ "outputs": [],
490
+ "source": [
491
+ "def classify_team(name):\n",
492
+ " prompt = \"\"\"\n",
493
+ "You are an AI model trained to classify text.\n",
494
+ "\n",
495
+ "I will provide the name of a professional sports team.\n",
496
+ "\n",
497
+ "You will reply with the sports league in which they compete.\n",
498
+ "\n",
499
+ "Your responses must come from the following list:\n",
500
+ "- Major League Baseball (MLB)\n",
501
+ "- National Football League (NFL)\n",
502
+ "- National Basketball Association (NBA)\n",
503
+ "\n",
504
+ "If the team's league is not on the list, you should label them as \"Other\".\n",
505
+ "\"\"\"\n",
506
+ "\n",
507
+ " response = client.chat.completions.create(\n",
508
+ " messages=[\n",
509
+ " {\n",
510
+ " \"role\": \"system\",\n",
511
+ " \"content\": prompt,\n",
512
+ " },\n",
513
+ " {\n",
514
+ " \"role\": \"user\",\n",
515
+ " \"content\": name,\n",
516
+ " }\n",
517
+ " ],\n",
518
+ " model=\"meta-llama/Llama-3.3-70B-Instruct\",\n",
519
+ " temperature=0, # NEW\n",
520
+ " )\n",
521
+ "\n",
522
+ " answer = response.choices[0].message.content\n",
523
+ "\n",
524
+ " acceptable_answers = [\n",
525
+ " \"Major League Baseball (MLB)\",\n",
526
+ " \"National Football League (NFL)\",\n",
527
+ " \"National Basketball Association (NBA)\",\n",
528
+ " \"Other\",\n",
529
+ " ]\n",
530
+ " if answer not in acceptable_answers:\n",
531
+ " raise ValueError(f\"{answer} not in list of acceptable answers\")\n",
532
+ "\n",
533
+ " return answer"
534
+ ]
535
+ },
536
+ {
537
+ "cell_type": "markdown",
538
+ "metadata": {},
539
+ "source": [
540
+ "You can also increase reliability by priming the LLM with examples of the type of response you want. This technique is called [\"few shot prompting\"](https://www.ibm.com/think/topics/few-shot-prompting). In this style of prompting, which can feel like a strange form of roleplaying, you provide both the \"user\" input as well as the \"assistant\" response you want the LLM to mimic.\n",
541
+ "\n",
542
+ "Here's how it's done:"
543
+ ]
544
+ },
545
+ {
546
+ "cell_type": "code",
547
+ "execution_count": 56,
548
+ "metadata": {},
549
+ "outputs": [],
550
+ "source": [
551
+ "def classify_team(name):\n",
552
+ " prompt = \"\"\"\n",
553
+ "You are an AI model trained to classify text.\n",
554
+ "\n",
555
+ "I will provide the name of a professional sports team.\n",
556
+ "\n",
557
+ "You will reply with the sports league in which they compete.\n",
558
+ "\n",
559
+ "Your responses must come from the following list:\n",
560
+ "- Major League Baseball (MLB)\n",
561
+ "- National Football League (NFL)\n",
562
+ "- National Basketball Association (NBA)\n",
563
+ "\n",
564
+ "If the team's league is not on the list, you should label them as \"Other\".\n",
565
+ "\"\"\"\n",
566
+ "\n",
567
+ " response = client.chat.completions.create(\n",
568
+ " messages=[\n",
569
+ " {\n",
570
+ " \"role\": \"system\",\n",
571
+ " \"content\": prompt,\n",
572
+ " },\n",
573
+ " ### <-- NEW \n",
574
+ " {\n",
575
+ " \"role\": \"user\",\n",
576
+ " \"content\": \"Los Angeles Rams\",\n",
577
+ " },\n",
578
+ " {\n",
579
+ " \"role\": \"assistant\",\n",
580
+ " \"content\": \"National Football League (NFL)\",\n",
581
+ " },\n",
582
+ " {\n",
583
+ " \"role\": \"user\",\n",
584
+ " \"content\": \"Los Angeles Dodgers\",\n",
585
+ " },\n",
586
+ " {\n",
587
+ " \"role\": \"assistant\",\n",
588
+ " \"content\": \" Major League Baseball (MLB)\",\n",
589
+ " },\n",
590
+ " {\n",
591
+ " \"role\": \"user\",\n",
592
+ " \"content\": \"Los Angeles Lakers\",\n",
593
+ " },\n",
594
+ " {\n",
595
+ " \"role\": \"assistant\",\n",
596
+ " \"content\": \"National Basketball Association (NBA)\",\n",
597
+ " },\n",
598
+ " {\n",
599
+ " \"role\": \"user\",\n",
600
+ " \"content\": \"Los Angeles Kings\",\n",
601
+ " },\n",
602
+ " {\n",
603
+ " \"role\": \"assistant\",\n",
604
+ " \"content\": \"Other\",\n",
605
+ " },\n",
606
+ " ### -->\n",
607
+ " {\n",
608
+ " \"role\": \"user\",\n",
609
+ " \"content\": name,\n",
610
+ " }\n",
611
+ " ],\n",
612
+ " model=\"meta-llama/Llama-3.3-70B-Instruct\",\n",
613
+ " temperature=0,\n",
614
+ " )\n",
615
+ "\n",
616
+ " answer = response.choices[0].message.content\n",
617
+ "\n",
618
+ " acceptable_answers = [\n",
619
+ " \"Major League Baseball (MLB)\",\n",
620
+ " \"National Football League (NFL)\",\n",
621
+ " \"National Basketball Association (NBA)\",\n",
622
+ " \"Other\",\n",
623
+ " ]\n",
624
+ " if answer not in acceptable_answers:\n",
625
+ " raise ValueError(f\"{answer} not in list of acceptable answers\")\n",
626
+ "\n",
627
+ " return answer"
628
+ ]
629
+ },
630
+ {
631
+ "cell_type": "markdown",
632
+ "metadata": {},
633
+ "source": [
634
+ "You can also ask the function to automatically retry if it doesn't get a valid response. This will give the LLM a second chance to get it right in cases where it gets too creative.\n",
635
+ "\n",
636
+ "To do that, we'll return installation step and in the `retry` package."
637
+ ]
638
+ },
639
+ {
640
+ "cell_type": "code",
641
+ "execution_count": null,
642
+ "metadata": {
643
+ "scrolled": true
644
+ },
645
+ "outputs": [],
646
+ "source": [
647
+ "%pip install rich ipywidgets retry"
648
+ ]
649
+ },
650
+ {
651
+ "cell_type": "markdown",
652
+ "metadata": {},
653
+ "source": [
654
+ "Now import the `retry` package."
655
+ ]
656
+ },
657
+ {
658
+ "cell_type": "code",
659
+ "execution_count": 31,
660
+ "metadata": {},
661
+ "outputs": [],
662
+ "source": [
663
+ "from rich import print\n",
664
+ "import requests\n",
665
+ "from huggingface_hub import InferenceClient\n",
666
+ "from retry import retry # NEW"
667
+ ]
668
+ },
669
+ {
670
+ "cell_type": "markdown",
671
+ "metadata": {},
672
+ "source": [
673
+ "And add the `retry` decorator to the function that will catch the `ValueError` exception and try again, as many times as you specify."
674
+ ]
675
+ },
676
+ {
677
+ "cell_type": "code",
678
+ "execution_count": 59,
679
+ "metadata": {},
680
+ "outputs": [],
681
+ "source": [
682
+ "@retry(ValueError, tries=2, delay=2) # NEW\n",
683
+ "def classify_team(name):\n",
684
+ " prompt = \"\"\"\n",
685
+ "You are an AI model trained to classify text.\n",
686
+ "...\n",
687
+ "\"\"\""
688
+ ]
689
+ },
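Under the hood, a retry decorator simply wraps the function in a loop. A minimal stdlib-only sketch of the idea (`simple_retry` is our own illustrative name, not the actual implementation of the `retry` package):

```python
import functools
import time

def simple_retry(exception, tries=2, delay=0):
    """Re-run the wrapped function when `exception` is raised, up to `tries` times."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exception:
                    if attempt == tries:
                        raise  # out of attempts: re-raise the last error
                    time.sleep(delay)
        return wrapper
    return decorator

calls = []

@simple_retry(ValueError, tries=2, delay=0)
def flaky_classifier():
    # Fails on the first call, succeeds on the second
    calls.append(1)
    if len(calls) < 2:
        raise ValueError("gibberish answer")
    return "Other"

result = flaky_classifier()
```

The decorated function fails once, gets retried, and returns a valid answer on the second attempt.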
690
+ {
691
+ "cell_type": "markdown",
692
+ "metadata": {},
693
+ "source": [
694
+ "**[7. Bulk prompts →](ch7-bulk-prompts.ipynb)**"
695
+ ]
696
+ }
697
+ ],
698
+ "metadata": {
699
+ "kernelspec": {
700
+ "display_name": "Python 3 (ipykernel)",
701
+ "language": "python",
702
+ "name": "python3"
703
+ },
704
+ "language_info": {
705
+ "codemirror_mode": {
706
+ "name": "ipython",
707
+ "version": 3
708
+ },
709
+ "file_extension": ".py",
710
+ "mimetype": "text/x-python",
711
+ "name": "python",
712
+ "nbconvert_exporter": "python",
713
+ "pygments_lexer": "ipython3",
714
+ "version": "3.9.5"
715
+ }
716
+ },
717
+ "nbformat": 4,
718
+ "nbformat_minor": 4
719
+ }
notebooks/ch7-bulk-prompts.ipynb ADDED
@@ -0,0 +1,954 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "id": "3eb9d2c1",
6
+ "metadata": {},
7
+ "source": [
8
+ "## 7. Bulk prompts"
9
+ ]
10
+ },
11
+ {
12
+ "cell_type": "markdown",
13
+ "id": "960b1cbf",
14
+ "metadata": {},
15
+ "source": [
16
+ "Our reusable prompting function is pretty cool. But requesting answers one by one across a big dataset could take forever. And with the Hugging Face free API, we’re likely to hit rate limits or timeouts if we send too many requests too quickly.\n",
17
+ "\n",
18
+ "One solution is to submit your requests in batches and then ask the LLM to return its responses in bulk.\n",
19
+ "\n",
20
+ "A common way to do that is to prompt the LLM to return its responses in JSON, a JavaScript data format that is easy to work with in Python.\n",
21
+ "\n",
22
+ "To try that, we start by adding the built-in json library to our imports."
23
+ ]
24
+ },
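The `json.loads` function is what turns the model's JSON-formatted text reply into a Python list we can loop over. For example:

```python
import json

# The LLM returns its answers as a single JSON-formatted string
answer_str = '["National Football League (NFL)", "Major League Baseball (MLB)", "Other"]'
answer_list = json.loads(answer_str)
print(answer_list[0])
```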
25
+ {
26
+ "cell_type": "code",
27
+ "execution_count": 49,
28
+ "id": "ec94fe49",
29
+ "metadata": {},
30
+ "outputs": [],
31
+ "source": [
32
+ "import json # NEW\n",
33
+ "from rich import print\n",
34
+ "import requests\n",
35
+ "from retry import retry\n",
36
+ "import os\n",
37
+ "\n",
38
+ "api_key = os.getenv(\"HF_TOKEN\")\n",
39
+ "client = InferenceClient(\n",
40
+ " token=api_key,\n",
41
+ ")"
42
+ ]
43
+ },
44
+ {
45
+ "cell_type": "markdown",
46
+ "id": "eac8a34e",
47
+ "metadata": {},
48
+ "source": [
49
+ "Next, we make a series of changes to our function to adapt it to work with a batch of inputs. Get ready. It’s a lot.\n",
50
+ "- We tweak the name of the function.\n",
51
+ "- We change our input argument to a list.\n",
52
+ "- We expand our prompt to explain that we will provide a list of team names.\n",
53
+ "- We ask the LLM to classify them individually, returning its answers in a JSON list.\n",
54
+ "- We insist on getting one answer for each input.\n",
55
+ "- We tweak our few-shot training to reflect this new approach.\n",
56
+ "- We submit our input as a single string with new lines separating each team name.\n",
57
+ "- We convert the LLM’s response from a string to a list using the `json.loads` function.\n",
58
+ "- We check that the LLM’s answers are in our list of acceptable answers with a loop through the list.\n",
59
+ "- We merge the team names and the LLM’s answers into a dictionary returned by the function."
60
+ ]
61
+ },
62
+ {
63
+ "cell_type": "code",
64
+ "execution_count": 26,
65
+ "id": "70477229",
66
+ "metadata": {},
67
+ "outputs": [],
68
+ "source": [
69
+ "@retry(ValueError, tries=2, delay=2)\n",
70
+ "def classify_teams(name_list): # NEW\n",
71
+ " prompt = \"\"\"\n",
72
+ "You are an AI model trained to classify text.\n",
73
+ "\n",
74
+ "I will provide list of professional sports team names separated by new lines\n",
75
+ "\n",
76
+ "You will reply with the sports league in which they compete.\n",
77
+ "\n",
78
+ "Your responses must come from the following list:\n",
79
+ "- Major League Baseball (MLB)\n",
80
+ "- National Football League (NFL)\n",
81
+ "- National Basketball Association (NBA)\n",
82
+ "\n",
83
+ "If the team's league is not on the list, you should label them as \"Other\".\n",
84
+ "\n",
85
+ "Your answers should be returned as a flat JSON list.\n",
86
+ "\n",
87
+ "It is very important that the length of JSON list you return is exactly the same as the number of names you receive.\n",
88
+ "\n",
89
+ "If I were to submit:\n",
90
+ "\n",
91
+ "\"Los Angeles Rams\\nLos Angeles Dodgers\\nLos Angeles Lakers\\nLos Angeles Kings\"\n",
92
+ "\n",
93
+ "You should return the following:\n",
94
+ "\n",
95
+ "[\"National Football League (NFL)\", \"Major League Baseball (MLB)\", \"National Basketball Association (NBA)\", \"Other\"]\n",
96
+ "\"\"\"\n",
97
+ "\n",
98
+ " response = client.chat.completions.create(\n",
99
+ " messages=[\n",
100
+ " {\n",
101
+ " \"role\": \"system\",\n",
102
+ " \"content\": prompt,\n",
103
+ " },\n",
104
+ " ### <-- NEW \n",
105
+ " {\n",
106
+ " \"role\": \"user\",\n",
107
+ " \"content\": \"Chicago Bears\\nChicago Cubs\\nChicago Bulls\\nChicago Blackhawks\",\n",
108
+ " },\n",
109
+ " {\n",
110
+ " \"role\": \"assistant\",\n",
111
+ " \"content\": '[\"National Football League (NFL)\", \"Major League Baseball (MLB)\", \"National Basketball Association (NBA)\", \"Other\"]',\n",
112
+ " },\n",
113
+ " {\n",
114
+ " \"role\": \"user\",\n",
115
+ " \"content\": \"\\n\".join(name_list),\n",
116
+ " }\n",
117
+ " ### --> \n",
118
+ " ],\n",
119
+ " model=\"meta-llama/Llama-3.3-70B-Instruct\",\n",
120
+ " temperature=0,\n",
121
+ " )\n",
122
+ "\n",
123
+ " answer_str = response.choices[0].message.content # NEW\n",
124
+ " answer_list = json.loads(answer_str) # NEW\n",
125
+ "\n",
126
+ " acceptable_answers = [\n",
127
+ " \"Major League Baseball (MLB)\",\n",
128
+ " \"National Football League (NFL)\",\n",
129
+ " \"National Basketball Association (NBA)\",\n",
130
+ " \"Other\",\n",
131
+ " ]\n",
132
+ " ### <-- NEW\n",
133
+ " for answer in answer_list:\n",
134
+ " if answer not in acceptable_answers:\n",
135
+ " raise ValueError(f\"{answer} not in list of acceptable answers\")\n",
136
+ " return dict(zip(name_list, answer_list))\n",
137
+ " ### -->"
138
+ ]
139
+ },
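The function's final line uses `dict(zip(...))` to merge the two lists, pairing each input name with the answer in the same position:

```python
name_list = ["Minnesota Twins", "Minnesota Vikings", "Minnesota Timberwolves"]
answer_list = [
    "Major League Baseball (MLB)",
    "National Football League (NFL)",
    "National Basketball Association (NBA)",
]
# zip pairs the lists element by element; dict turns the pairs into a lookup table
merged = dict(zip(name_list, answer_list))
print(merged["Minnesota Twins"])
```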
140
+ {
141
+ "cell_type": "markdown",
142
+ "id": "4a0909ed",
143
+ "metadata": {},
144
+ "source": [
145
+ "Try that with our team list. And you’ll see that it works with only a single API call. The same technique will work for a batch of any size."
146
+ ]
147
+ },
148
+ {
149
+ "cell_type": "code",
150
+ "execution_count": 28,
151
+ "id": "2bb71639",
152
+ "metadata": {},
153
+ "outputs": [
154
+ {
155
+ "data": {
156
+ "text/plain": [
157
+ "{'Minnesota Twins': 'Major League Baseball (MLB)',\n",
158
+ " 'Minnesota Vikings': 'National Football League (NFL)',\n",
159
+ " 'Minnesota Timberwolves': 'National Basketball Association (NBA)'}"
160
+ ]
161
+ },
162
+ "execution_count": 28,
163
+ "metadata": {},
164
+ "output_type": "execute_result"
165
+ }
166
+ ],
167
+ "source": [
168
+ "team_list = [\"Minnesota Twins\", \"Minnesota Vikings\", \"Minnesota Timberwolves\"]\n",
169
+ "classify_teams(team_list)"
170
+ ]
171
+ },
172
+ {
173
+ "cell_type": "markdown",
174
+ "id": "b6815a5c",
175
+ "metadata": {},
176
+ "source": [
177
+ "Though, as you batches get bigger, one common problem is that the number of outputs from the LLM can fail to match the number of inputs you provide. This problem may lessen as LLMs improve, but for now it’s a good idea to limit to batches to a few dozen inputs and to verify that you’re getting the right number back."
178
+ ]
179
+ },
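One way to keep batches small is a chunking helper that slices a long list into fixed-size pieces before each call to the classifier. A quick sketch (the `chunk` name is our own, not part of the course code):

```python
def chunk(items, size=25):
    """Yield successive batches of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

payees = [f"Vendor {i}" for i in range(60)]
batch_sizes = [len(batch) for batch in chunk(payees)]
print(batch_sizes)  # three batches: 25, 25 and 10
```

You could then call the classifier once per batch and combine the returned dictionaries.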
180
+ {
181
+ "cell_type": "code",
182
+ "execution_count": 29,
183
+ "id": "8295afc9",
184
+ "metadata": {},
185
+ "outputs": [],
186
+ "source": [
187
+ "@retry(ValueError, tries=2, delay=2)\n",
188
+ "def classify_teams(name_list):\n",
189
+ " prompt = \"\"\"\n",
190
+ "You are an AI model trained to classify text.\n",
191
+ "\n",
192
+ "I will provide list of professional sports team names separated by new lines\n",
193
+ "\n",
194
+ "You will reply with the sports league in which they compete.\n",
195
+ "\n",
196
+ "Your responses must come from the following list:\n",
197
+ "- Major League Baseball (MLB)\n",
198
+ "- National Football League (NFL)\n",
199
+ "- National Basketball Association (NBA)\n",
200
+ "\n",
201
+ "If the team's league is not on the list, you should label them as \"Other\".\n",
202
+ "\n",
203
+ "Your answers should be returned as a flat JSON list.\n",
204
+ "\n",
205
+ "It is very important that the length of JSON list you return is exactly the same as the number of names you receive.\n",
206
+ "\n",
207
+ "If I were to submit:\n",
208
+ "\n",
209
+ "\"Los Angeles Rams\\nLos Angeles Dodgers\\nLos Angeles Lakers\\nLos Angeles Kings\"\n",
210
+ "\n",
211
+ "You should return the following:\n",
212
+ "\n",
213
+ "[\"National Football League (NFL)\", \"Major League Baseball (MLB)\", \"National Basketball Association (NBA)\", \"Other\"]\n",
214
+ "\"\"\"\n",
215
+ "\n",
216
+ " response = client.chat.completions.create(\n",
217
+ " messages=[\n",
218
+ " {\n",
219
+ " \"role\": \"system\",\n",
220
+ " \"content\": prompt,\n",
221
+ " },\n",
222
+ " {\n",
223
+ " \"role\": \"user\",\n",
224
+ " \"content\": \"Chicago Bears,Chicago Cubs,Chicago Bulls,Chicago Blackhawks\",\n",
225
+ " },\n",
226
+ " {\n",
227
+ " \"role\": \"assistant\",\n",
228
+ " \"content\": '[\"National Football League (NFL)\", \"Major League Baseball (MLB)\", \"National Basketball Association (NBA)\", \"Other\"]',\n",
229
+ " },\n",
230
+ " {\n",
231
+ " \"role\": \"user\",\n",
232
+ " \"content\": \"\\n\".join(name_list),\n",
233
+ " }\n",
234
+ " ],\n",
235
+ " model=\"meta-llama/Llama-3.3-70B-Instruct\",\n",
236
+ " temperature=0,\n",
237
+ " )\n",
238
+ "\n",
239
+ " answer_str = response.choices[0].message.content\n",
240
+ " answer_list = json.loads(answer_str)\n",
241
+ "\n",
242
+ " acceptable_answers = [\n",
243
+ " \"Major League Baseball (MLB)\",\n",
244
+ " \"National Football League (NFL)\",\n",
245
+ " \"National Basketball Association (NBA)\",\n",
246
+ " \"Other\",\n",
247
+ " ]\n",
248
+ " for answer in answer_list:\n",
249
+ " if answer not in acceptable_answers:\n",
250
+ " raise ValueError(f\"{answer} not in list of acceptable answers\")\n",
251
+ "\n",
252
+ " ### <-- NEW\n",
253
+ " try:\n",
254
+ " assert len(name_list) == len(answer_list)\n",
255
+ " except AssertionError:\n",
256
+ " raise ValueError(f\"Number of outputs ({len(name_list)}) does not equal the number of inputs ({len(answer_list)})\")\n",
257
+ " ### -->\n",
258
+ " return dict(zip(name_list, answer_list))"
259
+ ]
260
+ },
261
+ {
262
+ "cell_type": "markdown",
263
+ "id": "587604c0",
264
+ "metadata": {},
265
+ "source": [
266
+ "Okay. Naming sports teams is a cute trick, but what about something hard? And whatever happened to that George Santos idea?\n",
267
+ "\n",
268
+ "We’ll tackle that by pulling in our example dataset using `pandas`, a popular data manipulation library in Python.\n",
269
+ "\n",
270
+ "First, we need to install it. Back to our installation cell."
271
+ ]
272
+ },
273
+ {
274
+ "cell_type": "code",
275
+ "execution_count": null,
276
+ "id": "9ff5e26c",
277
+ "metadata": {
278
+ "scrolled": true
279
+ },
280
+ "outputs": [],
281
+ "source": [
282
+ "%pip install huggingface_hub rich ipywidgets retry pandas"
283
+ ]
284
+ },
285
+ {
286
+ "cell_type": "markdown",
287
+ "id": "295346d4",
288
+ "metadata": {},
289
+ "source": [
290
+ "Then import it."
291
+ ]
292
+ },
293
+ {
294
+ "cell_type": "code",
295
+ "execution_count": 32,
296
+ "id": "c6d289d7",
297
+ "metadata": {},
298
+ "outputs": [],
299
+ "source": [
300
+ "import json\n",
301
+ "from rich import print\n",
302
+ "import requests\n",
303
+ "from retry import retry\n",
304
+ "import pandas as pd # NEW"
305
+ ]
306
+ },
307
+ {
308
+ "cell_type": "markdown",
309
+ "id": "1ee5cf9f",
310
+ "metadata": {},
311
+ "source": [
312
+ "Now we’re ready to load the California expenditures data prepared for the class. It contains the distinct list of all vendors listed as payees in itemized receipts attached to disclosure filings."
313
+ ]
314
+ },
315
+ {
316
+ "cell_type": "code",
317
+ "execution_count": 33,
318
+ "id": "3ae2b2fc",
319
+ "metadata": {},
320
+ "outputs": [],
321
+ "source": [
322
+ "df = pd.read_csv(\"https://raw.githubusercontent.com/palewire/first-llm-classifier/refs/heads/main/_notebooks/Form460ScheduleESubItem.csv\")"
323
+ ]
324
+ },
325
+ {
326
+ "cell_type": "markdown",
327
+ "id": "bca7002b",
328
+ "metadata": {},
329
+ "source": [
330
+ "Have a look at a random sample to get a taste of what’s in there."
331
+ ]
332
+ },
333
+ {
334
+ "cell_type": "code",
335
+ "execution_count": 34,
336
+ "id": "0aa44f42",
337
+ "metadata": {},
338
+ "outputs": [
339
+ {
340
+ "data": {
341
+ "text/html": [
342
+ "<div>\n",
343
+ "<style scoped>\n",
344
+ " .dataframe tbody tr th:only-of-type {\n",
345
+ " vertical-align: middle;\n",
346
+ " }\n",
347
+ "\n",
348
+ " .dataframe tbody tr th {\n",
349
+ " vertical-align: top;\n",
350
+ " }\n",
351
+ "\n",
352
+ " .dataframe thead th {\n",
353
+ " text-align: right;\n",
354
+ " }\n",
355
+ "</style>\n",
356
+ "<table border=\"1\" class=\"dataframe\">\n",
357
+ " <thead>\n",
358
+ " <tr style=\"text-align: right;\">\n",
359
+ " <th></th>\n",
360
+ " <th>payee</th>\n",
361
+ " </tr>\n",
362
+ " </thead>\n",
363
+ " <tbody>\n",
364
+ " <tr>\n",
365
+ " <th>5822</th>\n",
366
+ " <td>GRAND HYATT SAN FRANCISCO</td>\n",
367
+ " </tr>\n",
368
+ " <tr>\n",
369
+ " <th>8765</th>\n",
370
+ " <td>LIZ FIGUEROA FOR LT. GOVERNOR</td>\n",
371
+ " </tr>\n",
372
+ " <tr>\n",
373
+ " <th>2027</th>\n",
374
+ " <td>CA STATE UNIVERSITY NORTHRIDGE YOUNG DEMOCRATS</td>\n",
375
+ " </tr>\n",
376
+ " <tr>\n",
377
+ " <th>1371</th>\n",
378
+ " <td>BEN FRANKLIN CRAFTS</td>\n",
379
+ " </tr>\n",
380
+ " <tr>\n",
381
+ " <th>9033</th>\n",
382
+ " <td>LYNWOOD FOR BETTER HEALTHCARE, SPONSORED BY SE...</td>\n",
383
+ " </tr>\n",
384
+ " <tr>\n",
385
+ " <th>11720</th>\n",
386
+ " <td>QUINTESSA</td>\n",
387
+ " </tr>\n",
388
+ " <tr>\n",
389
+ " <th>7983</th>\n",
390
+ " <td>KOST FM</td>\n",
391
+ " </tr>\n",
392
+ " <tr>\n",
393
+ " <th>4324</th>\n",
394
+ " <td>DUNKIN DONUTS</td>\n",
395
+ " </tr>\n",
396
+ " <tr>\n",
397
+ " <th>2434</th>\n",
398
+ " <td>CARDENAS MARKETS, INC.</td>\n",
399
+ " </tr>\n",
400
+ " <tr>\n",
401
+ " <th>5738</th>\n",
402
+ " <td>GOLDEN PALACE</td>\n",
403
+ " </tr>\n",
404
+ " </tbody>\n",
405
+ "</table>\n",
406
+ "</div>"
407
+ ],
408
+ "text/plain": [
409
+ " payee\n",
410
+ "5822 GRAND HYATT SAN FRANCISCO\n",
411
+ "8765 LIZ FIGUEROA FOR LT. GOVERNOR\n",
412
+ "2027 CA STATE UNIVERSITY NORTHRIDGE YOUNG DEMOCRATS\n",
413
+ "1371 BEN FRANKLIN CRAFTS\n",
414
+ "9033 LYNWOOD FOR BETTER HEALTHCARE, SPONSORED BY SE...\n",
415
+ "11720 QUINTESSA\n",
416
+ "7983 KOST FM\n",
417
+ "4324 DUNKIN DONUTS\n",
418
+ "2434 CARDENAS MARKETS, INC.\n",
419
+ "5738 GOLDEN PALACE"
420
+ ]
421
+ },
422
+ "execution_count": 34,
423
+ "metadata": {},
424
+ "output_type": "execute_result"
425
+ }
426
+ ],
427
+ "source": [
428
+ "df.sample(10)"
429
+ ]
430
+ },
431
+ {
432
+ "cell_type": "markdown",
433
+ "id": "9fe1e695",
434
+ "metadata": {},
435
+ "source": [
436
+ "Now let’s adapt what we have to fit. Instead of asking for a sports league back, we will ask the LLM to classify each payee as a restaurant, bar, hotel or other establishment."
437
+ ]
438
+ },
439
+ {
440
+ "cell_type": "code",
441
+ "execution_count": 35,
442
+ "id": "970c5161",
443
+ "metadata": {},
444
+ "outputs": [],
445
+ "source": [
446
+ "@retry(ValueError, tries=2, delay=2)\n",
447
+ "### <-- NEW\n",
448
+ "def classify_payees(name_list):\n",
449
+ " prompt = \"\"\"You are an AI model trained to categorize businesses based on their names.\n",
450
+ "\n",
451
+ "You will be given a list of business names, each separated by a new line.\n",
452
+ "\n",
453
+ "Your task is to analyze each name and classify it into one of the following categories: Restaurant, Bar, Hotel, or Other.\n",
454
+ "\n",
455
+ "It is extremely critical that there is a corresponding category output for each business name provided as an input.\n",
456
+ "\n",
457
+ "If a business does not clearly fall into Restaurant, Bar, or Hotel categories, you should classify it as \"Other\".\n",
458
+ "\n",
459
+ "Even if the type of business is not immediately clear from the name, it is essential that you provide your best guess based on the information available to you. If you can't make a good guess, classify it as Other.\n",
460
+ "\n",
461
+ "For example, if given the following input:\n",
462
+ "\n",
463
+ "\"Intercontinental Hotel\\nPizza Hut\\nCheers\\nWelsh's Family Restaurant\\nKTLA\\nDirect Mailing\"\n",
464
+ "\n",
465
+ "Your output should be a JSON list in the following format:\n",
466
+ "\n",
467
+ "[\"Hotel\", \"Restaurant\", \"Bar\", \"Restaurant\", \"Other\", \"Other\"]\n",
468
+ "\n",
469
+ "This means that you have classified \"Intercontinental Hotel\" as a Hotel, \"Pizza Hut\" as a Restaurant, \"Cheers\" as a Bar, \"Welsh's Family Restaurant\" as a Restaurant, and both \"KTLA\" and \"Direct Mailing\" as Other.\n",
470
+ "\n",
471
+ "Ensure that the number of classifications in your output matches the number of business names in the input. It is very important that the length of JSON list you return is exactly the same as the number of business names youyou receive.\n",
472
+ "\"\"\"\n",
473
+ "### -->\n",
474
+ " response = client.chat.completions.create(\n",
475
+ " messages=[\n",
476
+ " {\n",
477
+ " \"role\": \"system\",\n",
478
+ " \"content\": prompt,\n",
479
+ " },\n",
480
+ " ### <-- NEW\n",
481
+ " {\n",
482
+ " \"role\": \"user\",\n",
483
+ " \"content\": \"Intercontinental Hotel\\nPizza Hut\\nCheers\\nWelsh's Family Restaurant\\nKTLA\\nDirect Mailing\",\n",
484
+ " },\n",
485
+ " {\n",
486
+ " \"role\": \"assistant\",\n",
487
+ " \"content\": '[\"Hotel\", \"Restaurant\", \"Bar\", \"Restaurant\", \"Other\", \"Other\"]',\n",
488
+ " },\n",
489
+ " {\n",
490
+ " \"role\": \"user\",\n",
491
+ " \"content\": \"Subway Sandwiches\\nRuth Chris Steakhouse\\nPolitical Consulting Co\\nThe Lamb's Club\",\n",
492
+ " },\n",
493
+ " {\n",
494
+ " \"role\": \"assistant\",\n",
495
+ " \"content\": '[\"Restaurant\", \"Restaurant\", \"Other\", \"Bar\"]',\n",
496
+ " },\n",
497
+ " ### -->\n",
498
+ " {\n",
499
+ " \"role\": \"user\",\n",
500
+ " \"content\": \"\\n\".join(name_list),\n",
501
+ " }\n",
502
+ " ],\n",
503
+ " model=\"meta-llama/Llama-3.3-70B-Instruct\",\n",
504
+ " temperature=0,\n",
505
+ " )\n",
506
+ "\n",
507
+ " answer_str = response.choices[0].message.content\n",
508
+ " answer_list = json.loads(answer_str)\n",
509
+ "\n",
510
+ " ### <-- NEW \n",
511
+ " acceptable_answers = [\n",
512
+ " \"Restaurant\",\n",
513
+ " \"Bar\",\n",
514
+ " \"Hotel\",\n",
515
+ " \"Other\",\n",
516
+ " ] \n",
517
+ " ### -->\n",
518
+ " \n",
519
+ " for answer in answer_list:\n",
520
+ " if answer not in acceptable_answers:\n",
521
+ " raise ValueError(f\"{answer} not in list of acceptable answers\")\n",
522
+ "\n",
523
+ " try:\n",
524
+ " assert len(name_list) == len(answer_list)\n",
525
+ " except AssertionError:\n",
526
+ " raise ValueError(f\"Number of outputs ({len(name_list)}) does not equal the number of inputs ({len(answer_list)})\")\n",
527
+ "\n",
528
+ " return dict(zip(name_list, answer_list))"
529
+ ]
530
+ },
531
+ {
532
+ "cell_type": "markdown",
533
+ "id": "bf9b69a0",
534
+ "metadata": {},
535
+ "source": [
536
+ "Now pull out a random sample of payees as a list."
537
+ ]
538
+ },
539
+ {
540
+ "cell_type": "code",
541
+ "execution_count": 36,
542
+ "id": "fe74a8ed",
543
+ "metadata": {},
544
+ "outputs": [],
545
+ "source": [
546
+ "sample_list = list(df.sample(10).payee)"
547
+ ]
548
+ },
549
+ {
550
+ "cell_type": "markdown",
551
+ "id": "688192ae",
552
+ "metadata": {},
553
+ "source": [
554
+ "And see how it does."
555
+ ]
556
+ },
557
+ {
558
+ "cell_type": "code",
559
+ "execution_count": 38,
560
+ "id": "a84d364a",
561
+ "metadata": {},
562
+ "outputs": [
563
+ {
564
+ "data": {
565
+ "text/plain": [
566
+ "{'CALIFORNIA NOW ORGANIZATION': 'Other',\n",
567
+ " 'ALOHA SIGNS': 'Other',\n",
568
+ " \"SABELLA'S ITALIAN MARKET\": 'Restaurant',\n",
569
+ " 'ELIZABETH ESPARZA': 'Other',\n",
570
+ " 'DATA-SCRIBE': 'Other',\n",
571
+ " \"LISA HEMENWAY'S BISTRO\": 'Restaurant',\n",
572
+ " 'NEW EDGE MULTIMEDIA': 'Other',\n",
573
+ " 'FUSILLI': 'Restaurant',\n",
574
+ " 'FRIENDS OF DR IRENE PINKARD FOR CITY COUNCIL': 'Other',\n",
575
+ " 'ZEN SUSHI SACRAMENTO': 'Restaurant'}"
576
+ ]
577
+ },
578
+ "execution_count": 38,
579
+ "metadata": {},
580
+ "output_type": "execute_result"
581
+ }
582
+ ],
583
+ "source": [
584
+ "classify_payees(sample_list)"
585
+ ]
586
+ },
587
+ {
588
+ "cell_type": "markdown",
589
+ "id": "197856d9",
590
+ "metadata": {},
591
+ "source": [
592
+ "That’s nice for a sample. But how do you loop through the entire dataset and code them.\n",
593
+ "\n",
594
+ "One way to start is to write a function that will split up a list into batches of a certain size."
595
+ ]
596
+ },
597
+ {
598
+ "cell_type": "code",
599
+ "execution_count": 39,
600
+ "id": "b784940f",
601
+ "metadata": {},
602
+ "outputs": [],
603
+ "source": [
604
+ "def get_batch_list(li, n=10):\n",
605
+ " \"\"\"Split the provided list into batches of size `n`.\"\"\"\n",
606
+ " batch_list = []\n",
607
+ " for i in range(0, len(li), n):\n",
608
+ " batch_list.append(li[i : i + n])\n",
609
+ " return batch_list"
610
+ ]
611
+ },
612
+ {
613
+ "cell_type": "markdown",
614
+ "id": "9e35948e",
615
+ "metadata": {},
616
+ "source": [
617
+ "Before we loop through our payees, let’s add a couple libraries that will let us avoid hammering HF and keep tabs on our progress."
618
+ ]
619
+ },
620
+ {
621
+ "cell_type": "code",
622
+ "execution_count": 40,
623
+ "id": "f1593a7d",
624
+ "metadata": {},
625
+ "outputs": [],
626
+ "source": [
627
+ "import time # NEW\n",
628
+ "import json\n",
629
+ "from rich import print\n",
630
+ "from rich.progress import track # NEW\n",
631
+ "import requests\n",
632
+ "from retry import retry\n",
633
+ "import pandas as pd"
634
+ ]
635
+ },
636
+ {
637
+ "cell_type": "markdown",
638
+ "id": "da68c37f",
639
+ "metadata": {},
640
+ "source": [
641
+ "That batching trick can then be fit into a new function that will accept a big list of payees and classify them batch by batch."
642
+ ]
643
+ },
644
+ {
645
+ "cell_type": "code",
646
+ "execution_count": 41,
647
+ "id": "6e6965f9",
648
+ "metadata": {},
649
+ "outputs": [],
650
+ "source": [
651
+ "def classify_batches(name_list, batch_size=10, wait=2):\n",
652
+ " \"\"\"Split the provided list of names into batches and classify with our LLM them one by one.\"\"\"\n",
653
+ " # Create a place to store the results\n",
654
+ " all_results = {}\n",
655
+ "\n",
656
+ " # Batch up the list\n",
657
+ " batch_list = get_batch_list(name_list, n=batch_size)\n",
658
+ "\n",
659
+ " # Loop through the list in batches\n",
660
+ " for batch in track(batch_list):\n",
661
+ " # Classify it with the LLM\n",
662
+ " batch_results = classify_payees(batch)\n",
663
+ "\n",
664
+ " # Add what we get back to the results\n",
665
+ " all_results.update(batch_results)\n",
666
+ "\n",
667
+ " # Tap the brakes to avoid overloading groq's API\n",
668
+ " time.sleep(wait)\n",
669
+ "\n",
670
+ " # Return the results\n",
671
+ " return all_results"
672
+ ]
673
+ },
674
+ {
675
+ "cell_type": "markdown",
676
+ "id": "222a3846",
677
+ "metadata": {},
678
+ "source": [
679
+ "Now, let’s take out a bigger sample."
680
+ ]
681
+ },
682
+ {
683
+ "cell_type": "code",
684
+ "execution_count": 42,
685
+ "id": "39778766",
686
+ "metadata": {},
687
+ "outputs": [],
688
+ "source": [
689
+ "bigger_sample = list(df.sample(100).payee)"
690
+ ]
691
+ },
692
+ {
693
+ "cell_type": "markdown",
694
+ "id": "722de39a",
695
+ "metadata": {},
696
+ "source": [
697
+ "And let it rip."
698
+ ]
699
+ },
700
+ {
701
+ "cell_type": "code",
702
+ "execution_count": null,
703
+ "id": "7f676e52",
704
+ "metadata": {
705
+ "scrolled": true
706
+ },
707
+ "outputs": [],
708
+ "source": [
709
+ "classify_batches(bigger_sample)"
710
+ ]
711
+ },
712
+ {
713
+ "cell_type": "markdown",
714
+ "id": "fc2c1f94",
715
+ "metadata": {},
716
+ "source": [
717
+ "Printing out to the console is interesting, but eventually you’ll want to be able to work with the results in a more structured way. So let’s convert the results into a `pandas` DataFrame by modifying our function."
718
+ ]
719
+ },
720
+ {
721
+ "cell_type": "code",
722
+ "execution_count": 44,
723
+ "id": "c41b736f",
724
+ "metadata": {},
725
+ "outputs": [],
726
+ "source": [
727
+ "def classify_batches(name_list, batch_size=10, wait=2):\n",
728
+ " # Store the results\n",
729
+ " all_results = {}\n",
730
+ "\n",
731
+ " # Batch up the list\n",
732
+ " batch_list = get_batch_list(name_list, n=batch_size)\n",
733
+ "\n",
734
+ " # Loop through the list in batches\n",
735
+ " for batch in track(batch_list):\n",
736
+ " # Classify it\n",
737
+ " batch_results = classify_payees(batch)\n",
738
+ "\n",
739
+ " # Add it to the results\n",
740
+ " all_results.update(batch_results)\n",
741
+ "\n",
742
+ " # Tap the brakes\n",
743
+ " time.sleep(wait)\n",
744
+ "\n",
745
+ " # Return the results (NEW)\n",
746
+ " return pd.DataFrame(\n",
747
+ " all_results.items(),\n",
748
+ " columns=[\"payee\", \"category\"]\n",
749
+ " )"
750
+ ]
751
+ },
752
+ {
753
+ "cell_type": "markdown",
754
+ "id": "353adf11",
755
+ "metadata": {},
756
+ "source": [
757
+ "Results can now be stored as a DataFrame."
758
+ ]
759
+ },
760
+ {
761
+ "cell_type": "code",
762
+ "execution_count": 45,
763
+ "id": "51aa0550",
764
+ "metadata": {},
765
+ "outputs": [
766
+ {
767
+ "data": {
768
+ "application/vnd.jupyter.widget-view+json": {
769
+ "model_id": "a4b561bb64554c70a7bfd309c43b60da",
770
+ "version_major": 2,
771
+ "version_minor": 0
772
+ },
773
+ "text/plain": [
774
+ "Output()"
775
+ ]
776
+ },
777
+ "metadata": {},
778
+ "output_type": "display_data"
779
+ },
780
+ {
781
+ "data": {
782
+ "text/html": [
783
+ "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
784
+ ],
785
+ "text/plain": []
786
+ },
787
+ "metadata": {},
788
+ "output_type": "display_data"
789
+ }
790
+ ],
791
+ "source": [
792
+ "results_df = classify_batches(bigger_sample)"
793
+ ]
794
+ },
795
+ {
796
+ "cell_type": "markdown",
797
+ "id": "96bd75bc",
798
+ "metadata": {},
799
+ "source": [
800
+ "And inspected using the standard `pandas` tools. Let's take a peek at the first records:"
801
+ ]
802
+ },
803
+ {
804
+ "cell_type": "code",
805
+ "execution_count": 46,
806
+ "id": "514971f9",
807
+ "metadata": {},
808
+ "outputs": [
809
+ {
810
+ "data": {
811
+ "text/html": [
812
+ "<div>\n",
813
+ "<style scoped>\n",
814
+ " .dataframe tbody tr th:only-of-type {\n",
815
+ " vertical-align: middle;\n",
816
+ " }\n",
817
+ "\n",
818
+ " .dataframe tbody tr th {\n",
819
+ " vertical-align: top;\n",
820
+ " }\n",
821
+ "\n",
822
+ " .dataframe thead th {\n",
823
+ " text-align: right;\n",
824
+ " }\n",
825
+ "</style>\n",
826
+ "<table border=\"1\" class=\"dataframe\">\n",
827
+ " <thead>\n",
828
+ " <tr style=\"text-align: right;\">\n",
829
+ " <th></th>\n",
830
+ " <th>payee</th>\n",
831
+ " <th>category</th>\n",
832
+ " </tr>\n",
833
+ " </thead>\n",
834
+ " <tbody>\n",
835
+ " <tr>\n",
836
+ " <th>0</th>\n",
837
+ " <td>TAXIPASS</td>\n",
838
+ " <td>Other</td>\n",
839
+ " </tr>\n",
840
+ " <tr>\n",
841
+ " <th>1</th>\n",
842
+ " <td>THE JEFFERSON HOTEL</td>\n",
843
+ " <td>Hotel</td>\n",
844
+ " </tr>\n",
845
+ " <tr>\n",
846
+ " <th>2</th>\n",
847
+ " <td>NORMS RESTAURANT</td>\n",
848
+ " <td>Restaurant</td>\n",
849
+ " </tr>\n",
850
+ " <tr>\n",
851
+ " <th>3</th>\n",
852
+ " <td>JENNY OROPEZA FOR STATE SENATE</td>\n",
853
+ " <td>Other</td>\n",
854
+ " </tr>\n",
855
+ " <tr>\n",
856
+ " <th>4</th>\n",
857
+ " <td>BIG MAMA'S &amp; PAPA'S PIZZERIA</td>\n",
858
+ " <td>Restaurant</td>\n",
859
+ " </tr>\n",
860
+ " </tbody>\n",
861
+ "</table>\n",
862
+ "</div>"
863
+ ],
864
+ "text/plain": [
865
+ " payee category\n",
866
+ "0 TAXIPASS Other\n",
867
+ "1 THE JEFFERSON HOTEL Hotel\n",
868
+ "2 NORMS RESTAURANT Restaurant\n",
869
+ "3 JENNY OROPEZA FOR STATE SENATE Other\n",
870
+ "4 BIG MAMA'S & PAPA'S PIZZERIA Restaurant"
871
+ ]
872
+ },
873
+ "execution_count": 46,
874
+ "metadata": {},
875
+ "output_type": "execute_result"
876
+ }
877
+ ],
878
+ "source": [
879
+ "results_df.head()"
880
+ ]
881
+ },
882
+ {
883
+ "cell_type": "markdown",
884
+ "id": "fef7be4e",
885
+ "metadata": {},
886
+ "source": [
887
+ "Or a sum of all the categories."
888
+ ]
889
+ },
890
+ {
891
+ "cell_type": "code",
892
+ "execution_count": 47,
893
+ "id": "6911dc37",
894
+ "metadata": {},
895
+ "outputs": [
896
+ {
897
+ "data": {
898
+ "text/plain": [
899
+ "category\n",
900
+ "Other 67\n",
901
+ "Restaurant 20\n",
902
+ "Hotel 12\n",
903
+ "Bar 1\n",
904
+ "Name: count, dtype: int64"
905
+ ]
906
+ },
907
+ "execution_count": 47,
908
+ "metadata": {},
909
+ "output_type": "execute_result"
910
+ }
911
+ ],
912
+ "source": [
913
+ "results_df.category.value_counts()"
914
+ ]
915
+ },
916
+ {
917
+ "cell_type": "markdown",
918
+ "id": "a8705a48-49e6-4ec8-8f3f-126bcf011f0f",
919
+ "metadata": {},
920
+ "source": [
921
+ "**[8. Evaluating prompts →](ch8-evaluating-prompts.ipynb)**"
922
+ ]
923
+ },
924
+ {
925
+ "cell_type": "code",
926
+ "execution_count": null,
927
+ "id": "d96f1b46-7e66-4e5a-8d17-84a75b70404e",
928
+ "metadata": {},
929
+ "outputs": [],
930
+ "source": []
931
+ }
932
+ ],
933
+ "metadata": {
934
+ "kernelspec": {
935
+ "display_name": "Python 3 (ipykernel)",
936
+ "language": "python",
937
+ "name": "python3"
938
+ },
939
+ "language_info": {
940
+ "codemirror_mode": {
941
+ "name": "ipython",
942
+ "version": 3
943
+ },
944
+ "file_extension": ".py",
945
+ "mimetype": "text/x-python",
946
+ "name": "python",
947
+ "nbconvert_exporter": "python",
948
+ "pygments_lexer": "ipython3",
949
+ "version": "3.9.5"
950
+ }
951
+ },
952
+ "nbformat": 4,
953
+ "nbformat_minor": 5
954
+ }
notebooks/ch8-evaluating-prompts.ipynb ADDED
@@ -0,0 +1,655 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "## 8. Evaluating Prompts\n",
8
+ "\n",
9
+ "Before the advent of large-language models, machine-learning systems were trained using a technique called [supervised learning](https://en.wikipedia.org/wiki/Supervised_learning). This approach required users to provide carefully prepared training data that showed the computer what was expected.\n",
10
+ "\n",
11
+ "For instance, if you were developing a model to distinguish spam emails from legitimate ones, you would need to provide the model with a set of spam emails and another set of legitimate emails. The model would then use that data to learn the relationships between the inputs and outputs, which it could then apply to new emails.\n",
12
+ "\n",
13
+ "In addition to training the model, the curated input would be used to evaluate the model's performance. This process typically involved splitting the supervised data into two sets: one for training and one for testing. The model could then be evaluated using a separate set of supervised data to ensure it could generalize beyond the examples it had been fed during training.\n",
14
+ "\n",
15
+ "Large-language models operate differently. They are trained on vast amounts of text and can generate responses based on the relationships they derive from various machine-learning approaches. The result is that they can be used to perform a wide range of tasks without requiring supervised data to be prepared beforehand.\n",
16
+ "\n",
17
+ "This is a significant advantage. However, it also raises questions about evaluating an LLM prompt. If we don't have a supervised sample to test its results, how do we know if it's doing a good job? How can we improve its performance if we can't see where it gets things wrong?\n",
18
+ "\n",
19
+ "In the final chapters, we will show how traditional supervision can still play a vital role in evaluating and improving an LLM prompt."
20
+ ]
21
+ },
22
+ {
23
+ "cell_type": "code",
24
+ "execution_count": 84,
25
+ "metadata": {},
26
+ "outputs": [],
27
+ "source": [
28
+ "import time # NEW\n",
29
+ "import json\n",
30
+ "from rich import print\n",
31
+ "from rich.progress import track # NEW\n",
32
+ "import requests\n",
33
+ "from retry import retry\n",
34
+ "import pandas as pd\n",
35
+ "from huggingface_hub import InferenceClient\n",
36
+ "\n",
37
+ "api_key = os.getenv(\"HF_TOKEN\")\n",
38
+ "client = InferenceClient(\n",
39
+ " token=api_key,\n",
40
+ ")\n",
41
+ "df = pd.read_csv(\"https://raw.githubusercontent.com/palewire/first-llm-classifier/refs/heads/main/_notebooks/Form460ScheduleESubItem.csv\")"
42
+ ]
43
+ },
44
+ {
45
+ "cell_type": "markdown",
46
+ "metadata": {},
47
+ "source": [
48
+ "Start by outputting a random sample from the dataset to a file of comma-separated values. It will serve as our supervised sample. In general, the larger the sample the better the evaluation. But at a certain point the returns diminish. For this exercise, we will use a sample of 250 records."
49
+ ]
50
+ },
51
+ {
52
+ "cell_type": "code",
53
+ "execution_count": 85,
54
+ "metadata": {},
55
+ "outputs": [],
56
+ "source": [
57
+ "df.sample(250).to_csv(\"./sample.csv\", index=False)"
58
+ ]
59
+ },
60
+ {
61
+ "cell_type": "markdown",
62
+ "metadata": {},
63
+ "source": [
64
+ "You can open the file in a spreadsheet program like Excel or Google Sheets. For each payee in the sample, you would provide the correct category in a companion column. This gradually becomes the supervised sample.\n",
65
+ "\n",
66
+ "![Sample](https://palewi.re/docs/first-llm-classifier/_images/sample.png)\n",
67
+ "\n",
68
+ "To speed the class along, we've already prepared a sample for you in [the class repository](https://github.com/palewire/first-llm-classifier). Our next step is to read it back into a DataFrame."
69
+ ]
70
+ },
71
+ {
72
+ "cell_type": "code",
73
+ "execution_count": 86,
74
+ "metadata": {},
75
+ "outputs": [],
76
+ "source": [
77
+ "sample_df = pd.read_csv(\"sample_classified.csv\")"
78
+ ]
79
+ },
80
+ {
81
+ "cell_type": "markdown",
82
+ "metadata": {},
83
+ "source": [
84
+ "We'll install the Python packages `scikit-learn`, `matplotlib`, and `seaborn`. Prior to LLMs, these libraries were the go-to tools for training and evaluating machine-learning models. We'll primarily be using them for testing.\n",
85
+ "\n",
86
+ "Return to the Jupyter notebook and install the packages alongside our other dependencies."
87
+ ]
88
+ },
89
+ {
90
+ "cell_type": "code",
91
+ "execution_count": null,
92
+ "metadata": {
93
+ "scrolled": true
94
+ },
95
+ "outputs": [],
96
+ "source": [
97
+ "%pip install huggingface_hub rich ipywidgets retry pandas scikit-learn matplotlib seaborn"
98
+ ]
99
+ },
100
+ {
101
+ "cell_type": "markdown",
102
+ "metadata": {},
103
+ "source": [
104
+ "Add the `test_train_split` function from `scikit-learn` to the import statement."
105
+ ]
106
+ },
107
+ {
108
+ "cell_type": "code",
109
+ "execution_count": 88,
110
+ "metadata": {},
111
+ "outputs": [],
112
+ "source": [
113
+ "import json\n",
114
+ "from rich import print\n",
115
+ "import requests\n",
116
+ "from retry import retry\n",
117
+ "import pandas as pd\n",
118
+ "from sklearn.model_selection import train_test_split #NEW "
119
+ ]
120
+ },
121
+ {
122
+ "cell_type": "markdown",
123
+ "metadata": {},
124
+ "source": [
125
+ "This tool is used to split a supervised sample into separate sets for training and testing.\n",
126
+ "\n",
127
+ "The first input is the DataFrame column containing our supervised payees. The second input is the DataFrame column containing the correct categories.\n",
128
+ "\n",
129
+ "The `test_size` parameter determines the proportion of the sample that will be used for testing. The `random_state` parameter ensures that the split is reproducible by setting a seed for the random number generator that draws the samples."
130
+ ]
131
+ },
132
+ {
133
+ "cell_type": "code",
134
+ "execution_count": 89,
135
+ "metadata": {
136
+ "scrolled": true
137
+ },
138
+ "outputs": [],
139
+ "source": [
140
+ "training_input, test_input, training_output, test_output = train_test_split(\n",
141
+ " sample_df[['payee']],\n",
142
+ " sample_df['category'],\n",
143
+ " test_size=0.33,\n",
144
+ " random_state=42, # Remember Jackie Robinson. Remember Douglas Adams.\n",
145
+ ")"
146
+ ]
147
+ },
148
+ {
149
+ "cell_type": "markdown",
150
+ "metadata": {},
151
+ "source": [
152
+ "In a traditional training setup, the next step would be to train a machine-learning model in `sklearn` using the `training_input` and `training_output` sets. The model would then be evaluated using the `test_input` and `test_output` sets.\n",
153
+ "\n",
154
+ "With the LLM we skip ahead to the testing phase. We pass the `test_input` set to our LLM prompt and compare the results to the right answers found in `test_output` set.\n",
155
+ "\n",
156
+ "All that requires is that we pass the `payee` column from our `test_input` DataFrame to the function we created in the previous chapters."
157
+ ]
158
+ },
159
+ {
160
+ "cell_type": "code",
161
+ "execution_count": 90,
162
+ "metadata": {},
163
+ "outputs": [
164
+ {
165
+ "data": {
166
+ "application/vnd.jupyter.widget-view+json": {
167
+ "model_id": "2549c6db6c4a428a959aa78c686afce1",
168
+ "version_major": 2,
169
+ "version_minor": 0
170
+ },
171
+ "text/plain": [
172
+ "Output()"
173
+ ]
174
+ },
175
+ "metadata": {},
176
+ "output_type": "display_data"
177
+ },
178
+ {
179
+ "data": {
180
+ "text/html": [
181
+ "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
182
+ ],
183
+ "text/plain": []
184
+ },
185
+ "metadata": {},
186
+ "output_type": "display_data"
187
+ }
188
+ ],
189
+ "source": [
190
+ "###REPEAT FROM PREVIOUS NOTEBOOK\n",
191
+ "@retry(ValueError, tries=2, delay=2)\n",
192
+ "def classify_payees(name_list):\n",
193
+ " prompt = \"\"\"You are an AI model trained to categorize businesses based on their names.\n",
194
+ "\n",
195
+ "You will be given a list of business names, each separated by a new line.\n",
196
+ "\n",
197
+ "Your task is to analyze each name and classify it into one of the following categories: Restaurant, Bar, Hotel, or Other.\n",
198
+ "\n",
199
+ "It is extremely critical that there is a corresponding category output for each business name provided as an input.\n",
200
+ "\n",
201
+ "If a business does not clearly fall into Restaurant, Bar, or Hotel categories, you should classify it as \"Other\".\n",
202
+ "\n",
203
+ "Even if the type of business is not immediately clear from the name, it is essential that you provide your best guess based on the information available to you. If you can't make a good guess, classify it as Other.\n",
204
+ "\n",
205
+ "For example, if given the following input:\n",
206
+ "\n",
207
+ "\"Intercontinental Hotel\\nPizza Hut\\nCheers\\nWelsh's Family Restaurant\\nKTLA\\nDirect Mailing\"\n",
208
+ "\n",
209
+ "Your output should be a JSON list in the following format:\n",
210
+ "\n",
211
+ "[\"Hotel\", \"Restaurant\", \"Bar\", \"Restaurant\", \"Other\", \"Other\"]\n",
212
+ "\n",
213
+ "This means that you have classified \"Intercontinental Hotel\" as a Hotel, \"Pizza Hut\" as a Restaurant, \"Cheers\" as a Bar, \"Welsh's Family Restaurant\" as a Restaurant, and both \"KTLA\" and \"Direct Mailing\" as Other.\n",
214
+ "\n",
215
+ "Ensure that the number of classifications in your output matches the number of business names in the input. It is very important that the length of JSON list you return is exactly the same as the number of business names you receive.\n",
216
+ "\"\"\"\n",
217
+ " response = client.chat.completions.create(\n",
218
+ " messages=[\n",
219
+ " {\n",
220
+ " \"role\": \"system\",\n",
221
+ " \"content\": prompt,\n",
222
+ " },\n",
223
+ " {\n",
224
+ " \"role\": \"user\",\n",
225
+ " \"content\": \"Intercontinental Hotel\\nPizza Hut\\nCheers\\nWelsh's Family Restaurant\\nKTLA\\nDirect Mailing\",\n",
226
+ " },\n",
227
+ " {\n",
228
+ " \"role\": \"assistant\",\n",
229
+ " \"content\": '[\"Hotel\", \"Restaurant\", \"Bar\", \"Restaurant\", \"Other\", \"Other\"]',\n",
230
+ " },\n",
231
+ " {\n",
232
+ " \"role\": \"user\",\n",
233
+ " \"content\": \"Subway Sandwiches\\nRuth Chris Steakhouse\\nPolitical Consulting Co\\nThe Lamb's Club\",\n",
234
+ " },\n",
235
+ " {\n",
236
+ " \"role\": \"assistant\",\n",
237
+ " \"content\": '[\"Restaurant\", \"Restaurant\", \"Other\", \"Bar\"]',\n",
238
+ " },\n",
239
+ " {\n",
240
+ " \"role\": \"user\",\n",
241
+ " \"content\": \"\\n\".join(name_list),\n",
242
+ " }\n",
243
+ " ],\n",
244
+ " model=\"meta-llama/Llama-3.3-70B-Instruct\",\n",
245
+ " temperature=0,\n",
246
+ " )\n",
247
+ "\n",
248
+ " answer_str = response.choices[0].message.content\n",
249
+ " answer_list = json.loads(answer_str)\n",
250
+ "\n",
251
+ " acceptable_answers = [\n",
252
+ " \"Restaurant\",\n",
253
+ " \"Bar\",\n",
254
+ " \"Hotel\",\n",
255
+ " \"Other\",\n",
256
+ " ]\n",
257
+ " for answer in answer_list:\n",
258
+ " if answer not in acceptable_answers:\n",
259
+ " raise ValueError(f\"{answer} not in list of acceptable answers\")\n",
260
+ "\n",
261
+ " try:\n",
262
+ " assert len(name_list) == len(answer_list)\n",
263
+ " except AssertionError:\n",
264
+ " raise ValueError(f\"Number of outputs ({len(name_list)}) does not equal the number of inputs ({len(answer_list)})\")\n",
265
+ "\n",
266
+ " return dict(zip(name_list, answer_list))\n",
267
+ "\n",
268
+ "def get_batch_list(li, n=10):\n",
269
+ " \"\"\"Split the provided list into batches of size `n`.\"\"\"\n",
270
+ " batch_list = []\n",
271
+ " for i in range(0, len(li), n):\n",
272
+ " batch_list.append(li[i : i + n])\n",
273
+ " return batch_list\n",
274
+ " \n",
275
+ "def classify_batches(name_list, batch_size=11, wait=2):\n",
276
+ " \"\"\"Split the provided list of names into batches and classify with our LLM them one by one.\"\"\"\n",
277
+ " # Create a place to store the results\n",
278
+ " all_results = {}\n",
279
+ "\n",
280
+ " # Batch up the list\n",
281
+ " batch_list = get_batch_list(name_list, n=batch_size)\n",
282
+ "\n",
283
+ " # Loop through the list in batches\n",
284
+ " for batch in track(batch_list):\n",
285
+ " # Classify it with the LLM\n",
286
+ " batch_results = classify_payees(batch)\n",
287
+ "\n",
288
+ " # Add what we get back to the results\n",
289
+ " all_results.update(batch_results)\n",
290
+ "\n",
291
+ " # Tap the brakes to avoid overloading HF's API\n",
292
+ " time.sleep(wait)\n",
293
+ "\n",
294
+ " # Return the results\n",
295
+ " return all_results\n",
296
+ " \n",
297
+ "llm_dict = classify_batches(list(test_input.payee))"
298
+ ]
299
+ },
300
+ {
301
+ "cell_type": "markdown",
302
+ "metadata": {},
303
+ "source": [
304
+ "Next, we import the `classification_report` and `confusion_matrix` functions from `sklearn`, which are used to evaluate a model's performance. We'll also pull in `seaborn` and `matplotlib` to visualize the results."
305
+ ]
306
+ },
307
+ {
308
+ "cell_type": "code",
309
+ "execution_count": 91,
310
+ "metadata": {},
311
+ "outputs": [],
312
+ "source": [
313
+ "import json\n",
314
+ "from rich import print\n",
315
+ "import requests\n",
316
+ "from retry import retry\n",
317
+ "import pandas as pd\n",
318
+ "import seaborn as sns # NEW\n",
319
+ "import matplotlib.pyplot as plt # NEW \n",
320
+ "from sklearn.model_selection import train_test_split\n",
321
+ "from sklearn.metrics import confusion_matrix, classification_report # NEW"
322
+ ]
323
+ },
324
+ {
325
+ "cell_type": "markdown",
326
+ "metadata": {},
327
+ "source": [
328
+ "The `classification_report` function generats a report card on a model's performance. You provide it with the correct answers in the `test_output` set and the model's predictions in your prompt's DataFrame. In this case, our LLM's predictions are stored in the `llm_df` DataFrame's `category` column."
329
+ ]
330
+ },
331
+ {
332
+ "cell_type": "code",
333
+ "execution_count": 92,
334
+ "metadata": {},
335
+ "outputs": [],
336
+ "source": [
337
+ "llm_df = pd.DataFrame.from_dict(llm_dict, orient=\"index\", columns=[\"category\"])"
338
+ ]
339
+ },
340
+ {
341
+ "cell_type": "code",
342
+ "execution_count": 93,
343
+ "metadata": {},
344
+ "outputs": [
345
+ {
346
+ "data": {
347
+ "text/html": [
348
+ "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"> precision recall f1-score support\n",
349
+ "\n",
350
+ " Bar <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.00</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.00</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.00</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2</span>\n",
351
+ " Hotel <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.00</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.00</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.00</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">9</span>\n",
352
+ " Other <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.00</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.98</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.99</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">57</span>\n",
353
+ " Restaurant <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.94</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.00</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.97</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">15</span>\n",
354
+ "\n",
355
+ " accuracy <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.99</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">83</span>\n",
356
+ " macro avg <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.98</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.00</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.99</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">83</span>\n",
357
+ "weighted avg <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.99</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.99</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.99</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">83</span>\n",
358
+ "\n",
359
+ "</pre>\n"
360
+ ],
361
+ "text/plain": [
362
+ " precision recall f1-score support\n",
363
+ "\n",
364
+ " Bar \u001b[1;36m1.00\u001b[0m \u001b[1;36m1.00\u001b[0m \u001b[1;36m1.00\u001b[0m \u001b[1;36m2\u001b[0m\n",
365
+ " Hotel \u001b[1;36m1.00\u001b[0m \u001b[1;36m1.00\u001b[0m \u001b[1;36m1.00\u001b[0m \u001b[1;36m9\u001b[0m\n",
366
+ " Other \u001b[1;36m1.00\u001b[0m \u001b[1;36m0.98\u001b[0m \u001b[1;36m0.99\u001b[0m \u001b[1;36m57\u001b[0m\n",
367
+ " Restaurant \u001b[1;36m0.94\u001b[0m \u001b[1;36m1.00\u001b[0m \u001b[1;36m0.97\u001b[0m \u001b[1;36m15\u001b[0m\n",
368
+ "\n",
369
+ " accuracy \u001b[1;36m0.99\u001b[0m \u001b[1;36m83\u001b[0m\n",
370
+ " macro avg \u001b[1;36m0.98\u001b[0m \u001b[1;36m1.00\u001b[0m \u001b[1;36m0.99\u001b[0m \u001b[1;36m83\u001b[0m\n",
371
+ "weighted avg \u001b[1;36m0.99\u001b[0m \u001b[1;36m0.99\u001b[0m \u001b[1;36m0.99\u001b[0m \u001b[1;36m83\u001b[0m\n",
372
+ "\n"
373
+ ]
374
+ },
375
+ "metadata": {},
376
+ "output_type": "display_data"
377
+ }
378
+ ],
379
+ "source": [
380
+ "print(classification_report(test_output, llm_df.category))"
381
+ ]
382
+ },
383
+ {
384
+ "cell_type": "markdown",
385
+ "metadata": {},
386
+ "source": [
387
+ "That will output a report that looks something like this:\n",
388
+ "\n",
389
+ "```\n",
390
+ " precision recall f1-score support\n",
391
+ "\n",
392
+ " Bar 1.00 1.00 1.00 2\n",
393
+ " Hotel 0.89 0.80 0.84 10\n",
394
+ " Other 0.96 0.96 0.96 57\n",
395
+ " Restaurant 0.87 0.93 0.90 14\n",
396
+ "\n",
397
+ " accuracy 0.94 83\n",
398
+ " macro avg 0.93 0.92 0.93 83\n",
399
+ "weighted avg 0.94 0.94 0.94 83\n",
400
+ "```\n",
401
+ "\n",
402
+ "At first, the report can be a bit overwhelming. What are all these technical terms?\n",
403
+ "\n",
404
+ "Precision measures what statistics nerds call \"positive predictive value.\" It's how often the model made the correct decision when it applied a category. For instance, in the \"Bar\" category, the LLM correctly predicted both of the bars in our supervised sample. That's a precision of 1.00. An analogy here is a baseball player's contact rate. Precision is a measure of how often the model connects with the ball when it swings its bat.\n",
405
+ "\n",
406
+ "Recall measures how many of the supervised instances were identified by the model. In this case, it shows that the LLM correctly spotted 80% of the hotels in our manual sample.\n",
407
+ "\n",
408
+ "The f1-score is a combination of precision and recall. It's a way to measure a model's overall performance by balancing the two.\n",
409
+ "\n",
410
+ "The support column shows how many instances of each category were in the supervised sample.\n",
411
+ "\n",
412
+ "The averages at the bottom combine the results for all categories. The macro row is a simple average of all the scores in that column. The weighted row is a weighted average based on the number of instances in each category.\n",
413
+ "\n",
414
+ "In the example result provided above, we can see that the LLM was guessing correctly more than 90% of the time no matter how you slice it."
415
+ ]
416
+ },
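The arithmetic behind those columns is simple enough to check by hand. Here's a sketch, using the Hotel row and the averages from the example report above (the counts are inferred from the report, so treat them as illustrative):

```python
# Precision, recall and f1 by hand, using the Hotel row of the example report:
# 10 actual hotels, 8 caught by the model, plus 1 false alarm.
true_positives = 8    # hotels the model labeled "Hotel"
false_positives = 1   # non-hotels the model labeled "Hotel"
false_negatives = 2   # hotels the model labeled something else

precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
f1 = 2 * precision * recall / (precision + recall)
print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.89 0.8 0.84

# The macro average is a plain mean of the per-category scores, while the
# weighted average weights each score by its support.
precisions = [1.00, 0.89, 0.96, 0.87]  # Bar, Hotel, Other, Restaurant
supports = [2, 10, 57, 14]
macro = sum(precisions) / len(precisions)
weighted = sum(p * s for p, s in zip(precisions, supports)) / sum(supports)
print(round(macro, 2), round(weighted, 2))  # 0.93 0.94
```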
417
+ {
418
+ "cell_type": "markdown",
419
+ "metadata": {},
420
+ "source": [
421
+ "Another technique for evaluating classifiers is to visualize the results using a chart known as a confusion matrix. This chart shows how often the model correctly predicted each category and where it got things wrong.\n",
422
+ "\n",
423
+ "Drawing one up requires the `confusion_matrix` function from `sklearn` and an embarrassing tangle of code from the `seaborn` and `matplotlib` libraries. Most of it is boilerplate, but you need to punch in your test variables, as well as the proper labels for the categories, in a few picky places."
424
+ ]
425
+ },
426
+ {
427
+ "cell_type": "code",
428
+ "execution_count": null,
429
+ "metadata": {},
430
+ "outputs": [],
431
+ "source": [
432
+ "conf_mat = confusion_matrix(\n",
433
+ "    test_output, # the actual labels\n",
434
+ "    llm_df.category, # the LLM's predictions\n",
435
+ "    labels=llm_df.category.unique() # the order of the category labels\n",
436
+ ")\n",
437
+ "fig, ax = plt.subplots(figsize=(5,5))\n",
438
+ "sns.heatmap(\n",
439
+ " conf_mat,\n",
440
+ " annot=True,\n",
441
+ " fmt='d',\n",
442
+ " xticklabels=llm_df.category.unique(), # labels\n",
443
+ " yticklabels=llm_df.category.unique() # labels\n",
444
+ ")\n",
445
+ "plt.ylabel('Actual')\n",
446
+ "plt.xlabel('Predicted')"
447
+ ]
448
+ },
449
+ {
450
+ "cell_type": "markdown",
451
+ "metadata": {},
452
+ "source": [
453
+ "![confusion matrix](https://palewi.re/docs/first-llm-classifier/_images/matrix-llm.png)\n",
454
+ "\n",
455
+ "The diagonal line of cells running from the upper left to the lower right shows where the model correctly predicted the category. The off-diagonal cells show where it got things wrong. The color of the cells indicates how often the model made that prediction. For instance, we can see that one miscategorized hotel in the sample was predicted to be a restaurant and the second was predicted to be \"Other.\"\n",
456
+ "\n",
457
+ "Due to the inherent randomness in the LLM's predictions, it's a good idea to test your sample and run these reports multiple times to get a sense of the model's performance."
458
+ ]
459
+ },
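To make the chart concrete, here's a toy, hand-built version of what `confusion_matrix` is counting, with made-up labels rather than the course data:

```python
# A hand-rolled confusion matrix for a tiny, made-up sample.
actual    = ["Hotel", "Hotel", "Other", "Other", "Other"]
predicted = ["Hotel", "Other", "Other", "Other", "Other"]

labels = ["Hotel", "Other"]
matrix = [[0] * len(labels) for _ in labels]
for a, p in zip(actual, predicted):
    matrix[labels.index(a)][labels.index(p)] += 1

# Rows are the actual category, columns the predicted one, so the
# diagonal holds correct calls and everything off it is a mistake.
print(matrix)  # [[1, 1], [0, 3]]
```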
460
+ {
461
+ "cell_type": "markdown",
462
+ "metadata": {},
463
+ "source": [
464
+ "Before we look at how you might improve the LLM's performance, let's take a moment to compare the results of this evaluation against the old-school approach, where the supervised sample is used to train a machine-learning model that doesn't have access to the ocean of knowledge poured into an LLM.\n",
465
+ "\n",
466
+ "This will require importing a mess of `sklearn` functions and classes. We'll use `TfidfVectorizer` to convert the payee text into a numerical representation that can be used by a `LinearSVC` classifier. We'll then use a `Pipeline` to chain the two together. If you have no idea what any of that means, don't worry. Now that we have LLMs in this world, you might never need to know."
467
+ ]
468
+ },
469
+ {
470
+ "cell_type": "code",
471
+ "execution_count": null,
472
+ "metadata": {},
473
+ "outputs": [],
474
+ "source": [
475
+ "import json\n",
476
+ "from rich import print\n",
477
+ "import requests\n",
478
+ "from retry import retry\n",
479
+ "import pandas as pd\n",
480
+ "import seaborn as sns\n",
481
+ "import matplotlib.pyplot as plt\n",
482
+ "from sklearn.model_selection import train_test_split\n",
483
+ "from sklearn.metrics import confusion_matrix, classification_report\n",
484
+ "from sklearn.svm import LinearSVC # NEW\n",
485
+ "from sklearn.pipeline import Pipeline # NEW\n",
486
+ "from sklearn.compose import ColumnTransformer # NEW\n",
487
+ "from sklearn.feature_extraction.text import TfidfVectorizer # NEW"
488
+ ]
489
+ },
490
+ {
491
+ "cell_type": "markdown",
492
+ "metadata": {},
493
+ "source": [
494
+ "Here's a simple example of how you might train and evaluate a traditional machine-learning model using the supervised sample.\n",
495
+ "\n",
496
+ "First you set up all the machinery."
497
+ ]
498
+ },
499
+ {
500
+ "cell_type": "code",
501
+ "execution_count": null,
502
+ "metadata": {},
503
+ "outputs": [],
504
+ "source": [
505
+ "vectorizer = TfidfVectorizer(\n",
506
+ " sublinear_tf=True,\n",
507
+ " min_df=5,\n",
508
+ " norm='l2',\n",
509
+ " encoding='latin-1',\n",
510
+ " ngram_range=(1, 3),\n",
511
+ ")\n",
512
+ "preprocessor = ColumnTransformer(\n",
513
+ " transformers=[\n",
514
+ " ('payee', vectorizer, 'payee')\n",
515
+ " ],\n",
516
+ " sparse_threshold=0,\n",
517
+ " remainder='drop'\n",
518
+ ")\n",
519
+ "pipeline = Pipeline([\n",
520
+ " ('preprocessor', preprocessor),\n",
521
+ " ('classifier', LinearSVC(dual=\"auto\"))\n",
522
+ "])"
523
+ ]
524
+ },
525
+ {
526
+ "cell_type": "markdown",
527
+ "metadata": {},
528
+ "source": [
529
+ "Then you train the model using those training sets we split out at the start."
530
+ ]
531
+ },
532
+ {
533
+ "cell_type": "code",
534
+ "execution_count": null,
535
+ "metadata": {},
536
+ "outputs": [],
537
+ "source": [
538
+ "model = pipeline.fit(training_input, training_output)"
539
+ ]
540
+ },
541
+ {
542
+ "cell_type": "markdown",
543
+ "metadata": {},
544
+ "source": [
545
+ "And you ask the model to use its training to predict the right answers for the test set."
546
+ ]
547
+ },
548
+ {
549
+ "cell_type": "code",
550
+ "execution_count": null,
551
+ "metadata": {},
552
+ "outputs": [],
553
+ "source": [
554
+ "predictions = model.predict(test_input)"
555
+ ]
556
+ },
557
+ {
558
+ "cell_type": "markdown",
559
+ "metadata": {},
560
+ "source": [
561
+ "Now, you can run the same evaluation code as before to see how the traditional model performed."
562
+ ]
563
+ },
564
+ {
565
+ "cell_type": "code",
566
+ "execution_count": null,
567
+ "metadata": {},
568
+ "outputs": [],
569
+ "source": [
570
+ "print(classification_report(test_output, predictions))"
571
+ ]
572
+ },
573
+ {
574
+ "cell_type": "markdown",
575
+ "metadata": {},
576
+ "source": [
577
+ "```\n",
578
+ " precision recall f1-score support\n",
579
+ "\n",
580
+ " Bar 0.00 0.00 0.00 2\n",
581
+ " Hotel 1.00 0.27 0.43 10\n",
582
+ " Other 0.75 1.00 0.85 57\n",
583
+ " Restaurant 0.80 0.29 0.42 14\n",
584
+ "\n",
585
+ " accuracy 0.76 83\n",
586
+ " macro avg 0.64 0.39 0.43 83\n",
587
+ "weighted avg 0.77 0.76 0.70 83\n",
588
+ "```"
589
+ ]
590
+ },
591
+ {
592
+ "cell_type": "code",
593
+ "execution_count": null,
594
+ "metadata": {},
595
+ "outputs": [],
596
+ "source": [
597
+ "conf_mat = confusion_matrix(test_output, predictions, labels=llm_df.category.unique())\n",
598
+ "fig, ax = plt.subplots(figsize=(5,5))\n",
599
+ "sns.heatmap(\n",
600
+ " conf_mat,\n",
601
+ " annot=True,\n",
602
+ " fmt='d',\n",
603
+ " xticklabels=llm_df.category.unique(),\n",
604
+ " yticklabels=llm_df.category.unique()\n",
605
+ ")\n",
606
+ "plt.ylabel('Actual')\n",
607
+ "plt.xlabel('Predicted')"
608
+ ]
609
+ },
610
+ {
611
+ "cell_type": "markdown",
612
+ "metadata": {},
613
+ "source": [
614
+ "![confusion matrix](https://palewi.re/docs/first-llm-classifier/_images/matrix-ml.png)\n",
615
+ "\n",
616
+ "Not great. The traditional model is guessing correctly about 75% of the time, but it's missing most cases of our \"Bar\", \"Hotel\" and \"Restaurant\" categories as almost everything is getting filed as \"Other.\" The LLM, on the other hand, is guessing correctly more than 90% of the time and flagging many of the rare categories that we're seeking to find in the haystack of data."
617
+ ]
618
+ },
619
+ {
620
+ "cell_type": "markdown",
621
+ "metadata": {},
622
+ "source": [
623
+ "**[9. Improving prompts →](ch9-improving-prompts.ipynb)**"
624
+ ]
625
+ },
626
+ {
627
+ "cell_type": "code",
628
+ "execution_count": null,
629
+ "metadata": {},
630
+ "outputs": [],
631
+ "source": []
632
+ }
633
+ ],
634
+ "metadata": {
635
+ "kernelspec": {
636
+ "display_name": "Python 3 (ipykernel)",
637
+ "language": "python",
638
+ "name": "python3"
639
+ },
640
+ "language_info": {
641
+ "codemirror_mode": {
642
+ "name": "ipython",
643
+ "version": 3
644
+ },
645
+ "file_extension": ".py",
646
+ "mimetype": "text/x-python",
647
+ "name": "python",
648
+ "nbconvert_exporter": "python",
649
+ "pygments_lexer": "ipython3",
650
+ "version": "3.9.5"
651
+ }
652
+ },
653
+ "nbformat": 4,
654
+ "nbformat_minor": 4
655
+ }
notebooks/ch9-improving-prompts.ipynb ADDED
@@ -0,0 +1,669 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "## 9. Improving Prompts\n",
8
+ "\n",
9
+ "With our LLM prompt showing such strong results, you might be content to leave it as it is. But there are always ways to improve, and you might come across a circumstance where the model's performance is less than ideal.\n",
10
+ "\n",
11
+ "Earlier in the lesson, we showed how you can feed the LLM examples of inputs and outputs prior to your request as part of a \"few-shot\" prompt. An added benefit of coding a supervised sample for testing is that you can also use the training slice of the set to prime the LLM with this technique. If you've already done the work of labeling your data, you might as well use it to improve your model too.\n",
12
+ "\n",
13
+ "Converting the training set you held to the side into a few-shot prompt is a simple matter of formatting it to fit your LLM's expected input. Here's how you might do it in our case."
14
+ ]
15
+ },
16
+ {
17
+ "cell_type": "code",
18
+ "execution_count": 1,
19
+ "metadata": {},
20
+ "outputs": [],
21
+ "source": [
22
+ "import json\n",
23
+ "import time\n",
24
+ "import os\n",
25
+ "from retry import retry\n",
26
+ "from rich.progress import track\n",
27
+ "from huggingface_hub import InferenceClient\n",
28
+ "from sklearn.model_selection import train_test_split\n",
29
+ "from sklearn.metrics import confusion_matrix, classification_report\n",
30
+ "import pandas as pd\n",
31
+ "\n",
32
+ "api_key = os.getenv(\"HF_TOKEN\")\n",
33
+ "client = InferenceClient(\n",
34
+ " token=api_key,\n",
35
+ ")\n",
36
+ "\n",
37
+ "sample_df = pd.read_csv(\"sample.csv\")"
38
+ ]
39
+ },
40
+ {
41
+ "cell_type": "markdown",
42
+ "metadata": {},
43
+ "source": [
44
+ "Calling our previous `get_batch_list` function again:"
45
+ ]
46
+ },
47
+ {
48
+ "cell_type": "code",
49
+ "execution_count": 2,
50
+ "metadata": {},
51
+ "outputs": [],
52
+ "source": [
53
+ "def get_batch_list(li, n=10):\n",
54
+ " \"\"\"Split the provided list into batches of size `n`.\"\"\"\n",
55
+ " batch_list = []\n",
56
+ " for i in range(0, len(li), n):\n",
57
+ " batch_list.append(li[i : i + n])\n",
58
+ " return batch_list\n",
59
+ "\n",
60
+ "training_input, test_input, training_output, test_output = train_test_split(\n",
61
+ " sample_df[['payee']],\n",
62
+ " sample_df['category'],\n",
63
+ " test_size=0.33,\n",
64
+ " random_state=42\n",
65
+ ")"
66
+ ]
67
+ },
68
+ {
69
+ "cell_type": "code",
70
+ "execution_count": 3,
71
+ "metadata": {},
72
+ "outputs": [],
73
+ "source": [
74
+ "def get_fewshots(training_input, training_output, batch_size=10):\n",
75
+ " \"\"\"Convert the training input and output from sklearn's train_test_split into a few-shot prompt\"\"\"\n",
76
+ " # Batch up the training input into groups of `batch_size`\n",
77
+ " input_batches = get_batch_list(list(training_input.payee), n=batch_size)\n",
78
+ "\n",
79
+ " # Do the same for the output\n",
80
+ " output_batches = get_batch_list(list(training_output), n=batch_size)\n",
81
+ "\n",
82
+ " # Create a list to hold the formatted few-shot examples\n",
83
+ " fewshot_list = []\n",
84
+ "\n",
85
+ " # Loop through the batches\n",
86
+ " for i, input_list in enumerate(input_batches):\n",
87
+ " fewshot_list.extend([\n",
88
+ "        # Create a \"user\" message for the LLM formatted the same way as our prompt, with newlines\n",
89
+ " {\n",
90
+ " \"role\": \"user\",\n",
91
+ " \"content\": \"\\n\".join(input_list),\n",
92
+ " },\n",
93
+ " # Create the expected \"assistant\" response as the JSON formatted output we expect\n",
94
+ " {\n",
95
+ " \"role\": \"assistant\",\n",
96
+ " \"content\": json.dumps(output_batches[i])\n",
97
+ " }\n",
98
+ " ])\n",
99
+ "\n",
100
+ " # Return the list of few-shot examples, one for each batch\n",
101
+ " return fewshot_list"
102
+ ]
103
+ },
104
+ {
105
+ "cell_type": "markdown",
106
+ "metadata": {},
107
+ "source": [
108
+ "Pass in your training data."
109
+ ]
110
+ },
111
+ {
112
+ "cell_type": "code",
113
+ "execution_count": 4,
114
+ "metadata": {},
115
+ "outputs": [],
116
+ "source": [
117
+ "fewshot_list = get_fewshots(training_input, training_output)"
118
+ ]
119
+ },
120
+ {
121
+ "cell_type": "markdown",
122
+ "metadata": {},
123
+ "source": [
124
+ "Take a peek at the first pair to see if it's what we expect."
125
+ ]
126
+ },
127
+ {
128
+ "cell_type": "code",
129
+ "execution_count": 5,
130
+ "metadata": {},
131
+ "outputs": [
132
+ {
133
+ "data": {
134
+ "text/plain": [
135
+ "[{'role': 'user',\n",
136
+ " 'content': 'UFW OF AMERICA - AFL-CIO\\nRE-ELECT FIONA MA\\nELLA DINNING ROOM\\nMICHAEL EMERY PHOTOGRAPHY\\nLAKELAND VILLAGE\\nTHE IVY RESTAURANT\\nMOORLACH FOR SENATE 2016\\nBROWN PALACE HOTEL\\nAPPLE STORE FARMERS MARKET\\nCABLETIME TV'},\n",
137
+ " {'role': 'assistant',\n",
138
+ " 'content': '[\"Other\", \"Other\", \"Other\", \"Other\", \"Other\", \"Restaurant\", \"Other\", \"Hotel\", \"Other\", \"Other\"]'}]"
139
+ ]
140
+ },
141
+ "execution_count": 5,
142
+ "metadata": {},
143
+ "output_type": "execute_result"
144
+ }
145
+ ],
146
+ "source": [
147
+ "fewshot_list[:2]"
148
+ ]
149
+ },
150
+ {
151
+ "cell_type": "markdown",
152
+ "metadata": {},
153
+ "source": [
154
+ "Now, we can add those examples to our prompt's `messages`."
155
+ ]
156
+ },
157
+ {
158
+ "cell_type": "code",
159
+ "execution_count": 6,
160
+ "metadata": {},
161
+ "outputs": [],
162
+ "source": [
163
+ "@retry(ValueError, tries=2, delay=2)\n",
164
+ "def classify_payees(name_list):\n",
165
+ " prompt = \"\"\"You are an AI model trained to categorize businesses based on their names.\n",
166
+ "\n",
167
+ "You will be given a list of business names, each separated by a new line.\n",
168
+ "\n",
169
+ "Your task is to analyze each name and classify it into one of the following categories: Restaurant, Bar, Hotel, or Other.\n",
170
+ "\n",
171
+ "It is extremely critical that there is a corresponding category output for each business name provided as an input.\n",
172
+ "\n",
173
+ "If a business does not clearly fall into Restaurant, Bar, or Hotel categories, you should classify it as \"Other\".\n",
174
+ "\n",
175
+ "Even if the type of business is not immediately clear from the name, it is essential that you provide your best guess based on the information available to you. If you can't make a good guess, classify it as Other.\n",
176
+ "\n",
177
+ "For example, if given the following input:\n",
178
+ "\n",
179
+ "\"Intercontinental Hotel\\nPizza Hut\\nCheers\\nWelsh's Family Restaurant\\nKTLA\\nDirect Mailing\"\n",
180
+ "\n",
181
+ "Your output should be a JSON list in the following format:\n",
182
+ "\n",
183
+ "[\"Hotel\", \"Restaurant\", \"Bar\", \"Restaurant\", \"Other\", \"Other\"]\n",
184
+ "\n",
185
+ "This means that you have classified \"Intercontinental Hotel\" as a Hotel, \"Pizza Hut\" as a Restaurant, \"Cheers\" as a Bar, \"Welsh's Family Restaurant\" as a Restaurant, and both \"KTLA\" and \"Direct Mailing\" as Other.\n",
186
+ "\n",
187
+ "Ensure that the number of classifications in your output matches the number of business names in the input. It is very important that the length of JSON list you return is exactly the same as the number of business names you receive.\n",
188
+ "\"\"\"\n",
189
+ " response = client.chat.completions.create(\n",
190
+ " messages=[\n",
191
+ " ### <-- NEW \n",
192
+ " {\n",
193
+ " \"role\": \"system\",\n",
194
+ " \"content\": prompt,\n",
195
+ " },\n",
196
+ " *fewshot_list,\n",
197
+ " {\n",
198
+ " \"role\": \"user\",\n",
199
+ " \"content\": \"\\n\".join(name_list),\n",
200
+ " }\n",
201
+ " ### -->\n",
202
+ " ],\n",
203
+ " model=\"meta-llama/Llama-3.3-70B-Instruct\",\n",
204
+ " temperature=0,\n",
205
+ " )\n",
206
+ "\n",
207
+ " answer_str = response.choices[0].message.content\n",
208
+ " answer_list = json.loads(answer_str)\n",
209
+ "\n",
210
+ " acceptable_answers = [\n",
211
+ " \"Restaurant\",\n",
212
+ " \"Bar\",\n",
213
+ " \"Hotel\",\n",
214
+ " \"Other\",\n",
215
+ " ]\n",
216
+ " for answer in answer_list:\n",
217
+ " if answer not in acceptable_answers:\n",
218
+ " raise ValueError(f\"{answer} not in list of acceptable answers\")\n",
219
+ "\n",
220
+ " try:\n",
221
+ " assert len(name_list) == len(answer_list)\n",
222
+ " except:\n",
223
+ " raise ValueError(f\"Number of outputs ({len(name_list)}) does not equal the number of inputs ({len(answer_list)})\")\n",
224
+ "\n",
225
+ " return dict(zip(name_list, answer_list))"
226
+ ]
227
+ },
228
+ {
229
+ "cell_type": "markdown",
230
+ "metadata": {},
231
+ "source": [
232
+ "Calling our previous `classify_batches` function again:"
233
+ ]
234
+ },
235
+ {
236
+ "cell_type": "code",
237
+ "execution_count": 7,
238
+ "metadata": {},
239
+ "outputs": [],
240
+ "source": [
241
+ "def classify_batches(name_list, batch_size=10, wait=2):\n",
242
+ " # Store the results\n",
243
+ " all_results = {}\n",
244
+ "\n",
245
+ " # Batch up the list\n",
246
+ " batch_list = get_batch_list(name_list, n=batch_size)\n",
247
+ "\n",
248
+ " # Loop through the list in batches\n",
249
+ " for batch in track(batch_list):\n",
250
+ " # Classify it\n",
251
+ " batch_results = classify_payees(batch)\n",
252
+ "\n",
253
+ " # Add it to the results\n",
254
+ " all_results.update(batch_results)\n",
255
+ "\n",
256
+ " # Tap the brakes\n",
257
+ " time.sleep(wait)\n",
258
+ "\n",
259
+ " # Return the results\n",
260
+ " return pd.DataFrame(\n",
261
+ " all_results.items(),\n",
262
+ " columns=[\"payee\", \"category\"]\n",
263
+ " )"
264
+ ]
265
+ },
266
+ {
267
+ "cell_type": "markdown",
268
+ "metadata": {},
269
+ "source": [
270
+ "And all you need to do is run it again."
271
+ ]
272
+ },
273
+ {
274
+ "cell_type": "code",
275
+ "execution_count": 8,
276
+ "metadata": {},
277
+ "outputs": [
278
+ {
279
+ "data": {
280
+ "application/vnd.jupyter.widget-view+json": {
281
+ "model_id": "39e9e883ab8042049e00c2ae87a089c1",
282
+ "version_major": 2,
283
+ "version_minor": 0
284
+ },
285
+ "text/plain": [
286
+ "Output()"
287
+ ]
288
+ },
289
+ "metadata": {},
290
+ "output_type": "display_data"
291
+ },
292
+ {
293
+ "data": {
294
+ "text/html": [
295
+ "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
296
+ ],
297
+ "text/plain": []
298
+ },
299
+ "metadata": {},
300
+ "output_type": "display_data"
301
+ }
302
+ ],
303
+ "source": [
304
+ "llm_df = classify_batches(list(test_input.payee))"
305
+ ]
306
+ },
307
+ {
308
+ "cell_type": "markdown",
309
+ "metadata": {},
310
+ "source": [
311
+ "And see if your results are any better."
312
+ ]
313
+ },
314
+ {
315
+ "cell_type": "code",
316
+ "execution_count": 9,
317
+ "metadata": {},
318
+ "outputs": [
319
+ {
320
+ "name": "stdout",
321
+ "output_type": "stream",
322
+ "text": [
323
+ " precision recall f1-score support\n",
324
+ "\n",
325
+ " Bar 1.00 1.00 1.00 2\n",
326
+ " Hotel 1.00 1.00 1.00 9\n",
327
+ " Other 1.00 0.98 0.99 57\n",
328
+ " Restaurant 0.94 1.00 0.97 15\n",
329
+ "\n",
330
+ " accuracy 0.99 83\n",
331
+ " macro avg 0.98 1.00 0.99 83\n",
332
+ "weighted avg 0.99 0.99 0.99 83\n",
333
+ "\n"
334
+ ]
335
+ }
336
+ ],
337
+ "source": [
338
+ "print(classification_report(\n",
339
+ " test_output,\n",
340
+ " llm_df.category,\n",
341
+ "))"
342
+ ]
343
+ },
344
+ {
345
+ "cell_type": "markdown",
346
+ "metadata": {},
347
+ "source": [
348
+ "Another common tactic is to examine the misclassifications and tweak your prompt to address any patterns they reveal.\n",
349
+ "\n",
350
+ "One simple way to do this is to merge the LLM's predictions with the human-labeled data and filter for discrepancies."
351
+ ]
352
+ },
353
+ {
354
+ "cell_type": "code",
355
+ "execution_count": 10,
356
+ "metadata": {},
357
+ "outputs": [],
358
+ "source": [
359
+ "comparison_df = llm_df.merge(\n",
360
+ " sample_df,\n",
361
+ " on=\"payee\",\n",
362
+ " how=\"inner\",\n",
363
+ " suffixes=[\"_llm\", \"_human\"]\n",
364
+ ")"
365
+ ]
366
+ },
367
+ {
368
+ "cell_type": "markdown",
369
+ "metadata": {},
370
+ "source": [
371
+ "And filter to cases where the LLM and human labels don't match."
372
+ ]
373
+ },
374
+ {
375
+ "cell_type": "code",
376
+ "execution_count": 11,
377
+ "metadata": {},
378
+ "outputs": [
379
+ {
380
+ "data": {
381
+ "text/html": [
382
+ "<div>\n",
383
+ "<style scoped>\n",
384
+ " .dataframe tbody tr th:only-of-type {\n",
385
+ " vertical-align: middle;\n",
386
+ " }\n",
387
+ "\n",
388
+ " .dataframe tbody tr th {\n",
389
+ " vertical-align: top;\n",
390
+ " }\n",
391
+ "\n",
392
+ " .dataframe thead th {\n",
393
+ " text-align: right;\n",
394
+ " }\n",
395
+ "</style>\n",
396
+ "<table border=\"1\" class=\"dataframe\">\n",
397
+ " <thead>\n",
398
+ " <tr style=\"text-align: right;\">\n",
399
+ " <th></th>\n",
400
+ " <th>payee</th>\n",
401
+ " <th>category_llm</th>\n",
402
+ " <th>category_human</th>\n",
403
+ " </tr>\n",
404
+ " </thead>\n",
405
+ " <tbody>\n",
406
+ " <tr>\n",
407
+ " <th>16</th>\n",
408
+ " <td>SOTTOVOCE MADERO</td>\n",
409
+ " <td>Restaurant</td>\n",
410
+ " <td>Other</td>\n",
411
+ " </tr>\n",
412
+ " </tbody>\n",
413
+ "</table>\n",
414
+ "</div>"
415
+ ],
416
+ "text/plain": [
417
+ " payee category_llm category_human\n",
418
+ "16 SOTTOVOCE MADERO Restaurant Other"
419
+ ]
420
+ },
421
+ "execution_count": 11,
422
+ "metadata": {},
423
+ "output_type": "execute_result"
424
+ }
425
+ ],
426
+ "source": [
427
+ "comparison_df[comparison_df.category_llm != comparison_df.category_human]"
428
+ ]
429
+ },
430
+ {
431
+ "cell_type": "markdown",
432
+ "metadata": {},
433
+ "source": [
434
+ "Looking at the misclassifications, you might notice that the LLM is struggling with a particular type of business name. You can then adjust your prompt to address that specific issue."
435
+ ]
436
+ },
437
+ {
438
+ "cell_type": "code",
439
+ "execution_count": 12,
440
+ "metadata": {},
441
+ "outputs": [
442
+ {
443
+ "data": {
444
+ "text/html": [
445
+ "<div>\n",
446
+ "<style scoped>\n",
447
+ " .dataframe tbody tr th:only-of-type {\n",
448
+ " vertical-align: middle;\n",
449
+ " }\n",
450
+ "\n",
451
+ " .dataframe tbody tr th {\n",
452
+ " vertical-align: top;\n",
453
+ " }\n",
454
+ "\n",
455
+ " .dataframe thead th {\n",
456
+ " text-align: right;\n",
457
+ " }\n",
458
+ "</style>\n",
459
+ "<table border=\"1\" class=\"dataframe\">\n",
460
+ " <thead>\n",
461
+ " <tr style=\"text-align: right;\">\n",
462
+ " <th></th>\n",
463
+ " <th>payee</th>\n",
464
+ " <th>category_llm</th>\n",
465
+ " <th>category_human</th>\n",
466
+ " </tr>\n",
467
+ " </thead>\n",
468
+ " <tbody>\n",
469
+ " <tr>\n",
470
+ " <th>0</th>\n",
471
+ " <td>MIDTOWN FRAMING</td>\n",
472
+ " <td>Other</td>\n",
473
+ " <td>Other</td>\n",
474
+ " </tr>\n",
475
+ " <tr>\n",
476
+ " <th>1</th>\n",
477
+ " <td>ALBERGO HILTON ROME AIRPO FIUMICINO</td>\n",
478
+ " <td>Hotel</td>\n",
479
+ " <td>Hotel</td>\n",
480
+ " </tr>\n",
481
+ " <tr>\n",
482
+ " <th>2</th>\n",
483
+ " <td>ISTOCK PHOTOS</td>\n",
484
+ " <td>Other</td>\n",
485
+ " <td>Other</td>\n",
486
+ " </tr>\n",
487
+ " <tr>\n",
488
+ " <th>3</th>\n",
489
+ " <td>DORIAN B. GARCIA</td>\n",
490
+ " <td>Other</td>\n",
491
+ " <td>Other</td>\n",
492
+ " </tr>\n",
493
+ " <tr>\n",
494
+ " <th>4</th>\n",
495
+ " <td>KEELER ADVERTISING</td>\n",
496
+ " <td>Other</td>\n",
497
+ " <td>Other</td>\n",
498
+ " </tr>\n",
499
+ " </tbody>\n",
500
+ "</table>\n",
501
+ "</div>"
502
+ ],
503
+ "text/plain": [
504
+ " payee category_llm category_human\n",
505
+ "0 MIDTOWN FRAMING Other Other\n",
506
+ "1 ALBERGO HILTON ROME AIRPO FIUMICINO Hotel Hotel\n",
507
+ "2 ISTOCK PHOTOS Other Other\n",
508
+ "3 DORIAN B. GARCIA Other Other\n",
509
+ "4 KEELER ADVERTISING Other Other"
510
+ ]
511
+ },
512
+ "execution_count": 12,
513
+ "metadata": {},
514
+ "output_type": "execute_result"
515
+ }
516
+ ],
517
+ "source": [
518
+ "comparison_df.head()"
519
+ ]
520
+ },
521
+ {
522
+ "cell_type": "markdown",
523
+ "metadata": {},
524
+ "source": [
525
+ "In this case, I observed that the LLM was struggling with businesses that had both the word \"bar\" and the word \"restaurant\" in their name. A simple fix would be to add a new line to your prompt that instructs the LLM what to do in that case:\n",
526
+ "\n",
527
+ "`If a business name contains both the word \"Restaurant\" and the word \"Bar\", you should classify it as a Restaurant.`"
528
+ ]
529
+ },
530
+ {
531
+ "cell_type": "code",
532
+ "execution_count": 13,
533
+ "metadata": {},
534
+ "outputs": [],
535
+ "source": [
536
+ "prompt = \"\"\"You are an AI model trained to categorize businesses based on their names.\n",
537
+ "\n",
538
+ "You will be given a list of business names, each separated by a new line.\n",
539
+ "\n",
540
+ "Your task is to analyze each name and classify it into one of the following categories: Restaurant, Bar, Hotel, or Other.\n",
541
+ "\n",
542
+ "It is extremely critical that there is a corresponding category output for each business name provided as an input.\n",
543
+ "\n",
544
+ "If a business does not clearly fall into Restaurant, Bar, or Hotel categories, you should classify it as \"Other\".\n",
545
+ "\n",
546
+ "Even if the type of business is not immediately clear from the name, it is essential that you provide your best guess based on the information available to you. If you can't make a good guess, classify it as Other.\n",
547
+ "\n",
548
+ "For example, if given the following input:\n",
549
+ "\n",
550
+ "\"Intercontinental Hotel\\nPizza Hut\\nCheers\\nWelsh's Family Restaurant\\nKTLA\\nDirect Mailing\"\n",
551
+ "\n",
552
+ "Your output should be a JSON list in the following format:\n",
553
+ "\n",
554
+ "[\"Hotel\", \"Restaurant\", \"Bar\", \"Restaurant\", \"Other\", \"Other\"]\n",
555
+ "\n",
556
+ "This means that you have classified \"Intercontinental Hotel\" as a Hotel, \"Pizza Hut\" as a Restaurant, \"Cheers\" as a Bar, \"Welsh's Family Restaurant\" as a Restaurant, and both \"KTLA\" and \"Direct Mailing\" as Other.\n",
557
+ "\n",
558
+ "If a business name contains both the word \"Restaurant\" and the word \"Bar\", you should classify it as a Restaurant.\n",
559
+ "\n",
560
+ "Ensure that the number of classifications in your output matches the number of business names in the input. It is very important that the length of JSON list you return is exactly the same as the number of business names you receive.\n",
561
+ "\"\"\""
562
+ ]
563
+ },
564
+ {
565
+ "cell_type": "markdown",
566
+ "metadata": {},
567
+ "source": [
568
+ "Repeating this disciplined, scientific process of prompt refinement, testing, and review can, over a few careful cycles, gradually improve your prompt to return even better results."
569
+ ]
570
+ },
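A minimal sketch of that cycle, with a hypothetical `score` helper and hard-coded predictions standing in for real `classify_batches` runs:

```python
# Hypothetical harness for comparing prompt versions. In practice each
# entry in `runs` would come from rerunning classify_batches on test_input.
def score(predictions, truth):
    """Return the share of predictions that match the human labels."""
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)

truth = ["Restaurant", "Other", "Hotel"]
runs = {
    "v1: base prompt": ["Restaurant", "Restaurant", "Hotel"],
    "v2: adds the bar/restaurant rule": ["Restaurant", "Other", "Hotel"],
}
results = {version: round(score(preds, truth), 2) for version, preds in runs.items()}
print(results)
```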
571
+ {
572
+ "cell_type": "code",
573
+ "execution_count": null,
574
+ "metadata": {
575
+ "scrolled": true
576
+ },
577
+ "outputs": [],
578
+ "source": [
579
+ "%pip install gradio jupyter-server-proxy"
580
+ ]
581
+ },
582
+ {
583
+ "cell_type": "code",
584
+ "execution_count": 20,
585
+ "metadata": {},
586
+ "outputs": [
587
+ {
588
+ "data": {
589
+ "text/html": [
590
+ "<div><iframe src=\"http://localhost:7873/\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
591
+ ],
592
+ "text/plain": [
593
+ "<IPython.core.display.HTML object>"
594
+ ]
595
+ },
596
+ "metadata": {},
597
+ "output_type": "display_data"
598
+ },
599
+ {
600
+ "data": {
601
+ "text/plain": []
602
+ },
603
+ "execution_count": 20,
604
+ "metadata": {},
605
+ "output_type": "execute_result"
606
+ }
607
+ ],
608
+ "source": [
609
+ "import gradio as gr\n",
610
+ "import json\n",
611
+ "\n",
612
+ "# -- Gradio interface function --\n",
613
+ "def classify_business_names(input_text):\n",
614
+ " name_list = [line.strip() for line in input_text.splitlines() if line.strip()]\n",
615
+ " try:\n",
616
+ " result = classify_payees(name_list)\n",
617
+ " return json.dumps(result, indent=2)\n",
618
+ " except Exception as e:\n",
619
+ " return f\"Error: {e}\"\n",
620
+ "\n",
621
+ "# -- Launch the demo --\n",
622
+ "demo = gr.Interface(\n",
623
+ " fn=classify_business_names,\n",
624
+ " inputs=gr.Textbox(lines=10, placeholder=\"Enter business names, one per line\"),\n",
625
+ " outputs=\"json\",\n",
626
+ " title=\"Business Category Classifier\",\n",
627
+ " description=\"Enter business names and get a classification: Restaurant, Bar, Hotel, or Other.\"\n",
628
+ ")\n",
629
+ "\n",
630
+ "demo.launch(server_name=\"0.0.0.0\", server_port=7873, root_path=\"/proxy/7873/\", quiet=True)"
631
+ ]
632
+ },
633
+ {
634
+ "cell_type": "markdown",
635
+ "metadata": {},
636
+ "source": [
637
+ "**[10. Sharing your classifier →](ch10-sharing-with-gradio.ipynb)**"
638
+ ]
639
+ },
640
+ {
641
+ "cell_type": "code",
642
+ "execution_count": null,
643
+ "metadata": {},
644
+ "outputs": [],
645
+ "source": []
646
+ }
647
+ ],
648
+ "metadata": {
649
+ "kernelspec": {
650
+ "display_name": "Python 3 (ipykernel)",
651
+ "language": "python",
652
+ "name": "python3"
653
+ },
654
+ "language_info": {
655
+ "codemirror_mode": {
656
+ "name": "ipython",
657
+ "version": 3
658
+ },
659
+ "file_extension": ".py",
660
+ "mimetype": "text/x-python",
661
+ "name": "python",
662
+ "nbconvert_exporter": "python",
663
+ "pygments_lexer": "ipython3",
664
+ "version": "3.9.5"
665
+ }
666
+ },
667
+ "nbformat": 4,
668
+ "nbformat_minor": 4
669
+ }