{ "cells": [ { "cell_type": "markdown", "id": "84dd193e", "metadata": {}, "source": [ "## 5. Prompting with Python" ] }, { "cell_type": "markdown", "id": "862b8218", "metadata": {}, "source": [ "First, we’ll install the libraries we need. The `huggingface_hub` package is the official client for Hugging Face’s API. The `rich` and `ipywidgets` packages are helper libraries that will improve how your outputs look in Jupyter notebooks." ] }, { "cell_type": "markdown", "id": "c09f0b15", "metadata": {}, "source": [ "A common way to install packages from inside your JupyterLab Desktop notebook is to use the `%pip command`. Hit the play button in the top toolbar after selecting the cell below." ] }, { "cell_type": "code", "execution_count": null, "id": "e728c5fe", "metadata": { "scrolled": true }, "outputs": [], "source": [ "%pip install rich ipywidgets huggingface_hub" ] }, { "cell_type": "markdown", "id": "79f96b1a", "metadata": {}, "source": [ "If the `%pip command` doesn’t work on your computer, try substituting the `!pip command` instead. Or you can install the packages from the command line on your computer and restart your notebook." ] }, { "cell_type": "markdown", "id": "75f1366a", "metadata": {}, "source": [ "Now let's import them in the cell that appears below the installation output. Hit play again." ] }, { "cell_type": "code", "execution_count": 2, "id": "8013a72c-670e-48ab-8619-99a337fd5392", "metadata": {}, "outputs": [], "source": [ "import os\n", "from rich import print\n", "from huggingface_hub import InferenceClient" ] }, { "cell_type": "markdown", "id": "fdd7b199", "metadata": {}, "source": [ "With `api_key = os.getenv(\"HF_TOKEN\")`, we're calling the free Hugging Face Inference API, using an authentication token stored in the \"Secrets\" of this Space. If you'd like to duplicate this Space, you'll need to create a token with your account [here](https://huggingface.co/settings/tokens).\n", "\n", "You should continue adding new cells as you need throughout the rest of the class." ] }, { "cell_type": "code", "execution_count": 3, "id": "a5ec5ea4-5bd1-4ba7-b4cb-3f0a98505f29", "metadata": {}, "outputs": [], "source": [ "api_key = os.getenv(\"HF_TOKEN\")" ] }, { "cell_type": "markdown", "id": "b9e81187", "metadata": {}, "source": [ "Let’s make our first prompt. To do that, we submit a dictionary to Hugging Face’s `chat.completions.create` method. The dictionary has a `messages` key that contains a list of dictionaries. Each dictionary in the list represents a message in the conversation. When the role is \"user\" it is roughly the same as asking a question to a chatbot." ] }, { "cell_type": "markdown", "id": "e83c5390", "metadata": {}, "source": [ "We also need to pick a model from among the choices Hugging Face gives us. We’re picking Llama 3.3, the latest from Meta." ] }, { "cell_type": "code", "execution_count": 4, "id": "54a5befd-0c64-4039-9b26-14733c9f007e", "metadata": {}, "outputs": [], "source": [ "client = InferenceClient(\n", " model=\"meta-llama/Llama-3.3-70B-Instruct\",\n", " token=api_key,\n", ")" ] }, { "cell_type": "code", "execution_count": 5, "id": "38abe6e0", "metadata": {}, "outputs": [], "source": [ "response = client.chat.completions.create(\n", " messages=[\n", " {\n", " \"role\": \"user\",\n", " \"content\": \"Explain the importance of data journalism in a concise sentence\"\n", " }\n", " ],\n", ")" ] }, { "cell_type": "markdown", "id": "3156058c", "metadata": {}, "source": [ "Our client saves the response as a variable. Print that Python object to see what it contains." ] }, { "cell_type": "code", "execution_count": 6, "id": "49bf29f5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
ChatCompletionOutput(choices=[ChatCompletionOutputComplete(finish_reason='stop', index=0, \n",
       "message=ChatCompletionOutputMessage(role='assistant', content='Data journalism plays a crucial role in holding \n",
       "institutions accountable and informing the public by analyzing and interpreting complex data to uncover trends, \n",
       "patterns, and insights that can lead to more informed decision-making and a deeper understanding of social \n",
       "issues.', tool_calls=None), logprobs=None)], created=1742869712, id='', model='meta-llama/Llama-3.3-70B-Instruct', \n",
       "system_fingerprint='3.0.1-sha-bb9095a', usage=ChatCompletionOutputUsage(completion_tokens=45, prompt_tokens=46, \n",
       "total_tokens=91), object='chat.completion')\n",
       "
\n" ], "text/plain": [ "\u001b[1;35mChatCompletionOutput\u001b[0m\u001b[1m(\u001b[0m\u001b[33mchoices\u001b[0m=\u001b[1m[\u001b[0m\u001b[1;35mChatCompletionOutputComplete\u001b[0m\u001b[1m(\u001b[0m\u001b[33mfinish_reason\u001b[0m=\u001b[32m'stop'\u001b[0m, \u001b[33mindex\u001b[0m=\u001b[1;36m0\u001b[0m, \n", "\u001b[33mmessage\u001b[0m=\u001b[1;35mChatCompletionOutputMessage\u001b[0m\u001b[1m(\u001b[0m\u001b[33mrole\u001b[0m=\u001b[32m'assistant'\u001b[0m, \u001b[33mcontent\u001b[0m=\u001b[32m'Data journalism plays a crucial role in holding \u001b[0m\n", "\u001b[32minstitutions accountable and informing the public by analyzing and interpreting complex data to uncover trends, \u001b[0m\n", "\u001b[32mpatterns, and insights that can lead to more informed decision-making and a deeper understanding of social \u001b[0m\n", "\u001b[32missues.'\u001b[0m, \u001b[33mtool_calls\u001b[0m=\u001b[3;35mNone\u001b[0m\u001b[1m)\u001b[0m, \u001b[33mlogprobs\u001b[0m=\u001b[3;35mNone\u001b[0m\u001b[1m)\u001b[0m\u001b[1m]\u001b[0m, \u001b[33mcreated\u001b[0m=\u001b[1;36m1742869712\u001b[0m, \u001b[33mid\u001b[0m=\u001b[32m''\u001b[0m, \u001b[33mmodel\u001b[0m=\u001b[32m'meta-llama/Llama-3.3-70B-Instruct'\u001b[0m, \n", "\u001b[33msystem_fingerprint\u001b[0m=\u001b[32m'3.0.1-sha-bb9095a'\u001b[0m, \u001b[33musage\u001b[0m=\u001b[1;35mChatCompletionOutputUsage\u001b[0m\u001b[1m(\u001b[0m\u001b[33mcompletion_tokens\u001b[0m=\u001b[1;36m45\u001b[0m, \u001b[33mprompt_tokens\u001b[0m=\u001b[1;36m46\u001b[0m, \n", "\u001b[33mtotal_tokens\u001b[0m=\u001b[1;36m91\u001b[0m\u001b[1m)\u001b[0m, \u001b[33mobject\u001b[0m=\u001b[32m'chat.completion'\u001b[0m\u001b[1m)\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "print(response)" ] }, { "cell_type": "markdown", "id": "d9f86d8e", "metadata": {}, "source": [ "You should see something like:" ] }, { "cell_type": "markdown", "id": "cdf08e93-a6cc-4245-9fa3-2cc456208bcf", "metadata": {}, "source": [ "```\n", "ChatCompletionOutput(\n", " choices=[\n", " ChatCompletionOutputComplete(\n", " finish_reason='stop',\n", " index=0,\n", " message=ChatCompletionOutputMessage(\n", " role='assistant',\n", " content='Data journalism plays a crucial role in holding those in power accountable by using data analysis and visualization to uncover insights, trends, and patterns that inform and engage the public on important issues.',\n", " tool_calls=None\n", " ),\n", " logprobs=None\n", " )\n", " ],\n", " created=1742592105,\n", " id='',\n", " model='meta-llama/Llama-3.3-70B-Instruct',\n", " system_fingerprint='3.2.1-native',\n", " usage=ChatCompletionOutputUsage(\n", " completion_tokens=37,\n", " prompt_tokens=46,\n", " total_tokens=83\n", " ),\n", " object='chat.completion'\n", ")\n", "```" ] }, { "cell_type": "markdown", "id": "ff414bab", "metadata": {}, "source": [ "There’s a lot here, but the `message` has the actual response from the LLM. Let’s just print the content from that message. Note that your response probably varies from this guide. That’s because LLMs mostly are probablistic prediction machines. Every response can be a little different." ] }, { "cell_type": "code", "execution_count": 7, "id": "0f291693", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Data journalism plays a crucial role in holding institutions accountable and informing the public by analyzing and \n",
       "interpreting complex data to uncover trends, patterns, and insights that can lead to more informed decision-making \n",
       "and a deeper understanding of social issues.\n",
       "
\n" ], "text/plain": [ "Data journalism plays a crucial role in holding institutions accountable and informing the public by analyzing and \n", "interpreting complex data to uncover trends, patterns, and insights that can lead to more informed decision-making \n", "and a deeper understanding of social issues.\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "print(response.choices[0].message.content)" ] }, { "cell_type": "markdown", "id": "0c1e292a", "metadata": {}, "source": [ "Let’s pick a different model from amongthe choices that Hugging Face offers. One we could try is Gemma2, an open model from Google. Rather than add a new cell, lets revise the code we already have and rerun it." ] }, { "cell_type": "code", "execution_count": 8, "id": "9fb3f9b8", "metadata": {}, "outputs": [], "source": [ "response = client.chat.completions.create(\n", " messages=[\n", " {\n", " \"role\": \"user\",\n", " \"content\": \"Explain the importance of data journalism in a concise sentence\",\n", " }\n", " ],\n", " model=\"google/gemma-2-9b-it\", # NEW\n", ")" ] }, { "cell_type": "markdown", "id": "aa6662ed", "metadata": {}, "source": [ "Again, your response might vary from what’s here. Let’s find out." ] }, { "cell_type": "code", "execution_count": 9, "id": "0f036c66", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Data journalism plays a crucial role in holding those in power accountable by uncovering hidden trends, patterns, \n",
       "and insights through the analysis and visualization of data, enabling informed decision-making and promoting \n",
       "transparency.\n",
       "
\n" ], "text/plain": [ "Data journalism plays a crucial role in holding those in power accountable by uncovering hidden trends, patterns, \n", "and insights through the analysis and visualization of data, enabling informed decision-making and promoting \n", "transparency.\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "print(response.choices[0].message.content)" ] }, { "cell_type": "markdown", "id": "12802cac", "metadata": {}, "source": [ "---\n", "### Sidenote:\n", "Hugging Face’s Python library is very similar to the ones offered by OpenAI, Anthropic and other LLM providers. If you prefer to use those tools, the techniques you learn here should be easily transferable." ] }, { "cell_type": "markdown", "id": "e7668bd3", "metadata": {}, "source": [ "For instance, here’s how you’d make this same call with Anthropic’s Python library:" ] }, { "cell_type": "code", "execution_count": null, "id": "11e567e3", "metadata": {}, "outputs": [], "source": [ "from anthropic import Anthropic\n", "\n", "client = Anthropic(api_key=api_key)\n", "\n", "response = client.messages.create(\n", " messages=[\n", " {\"role\": \"user\", \"content\": \"Explain the importance of data journalism in a concise sentence\"},\n", " ],\n", " model=\"claude-3-5-sonnet-20240620\",\n", ")\n", "\n", "print(response.content[0].text)" ] }, { "cell_type": "markdown", "id": "182e8e6b", "metadata": {}, "source": [ "---\n", "A well-structured prompt helps the LLM provide more accurate and useful responses." ] }, { "cell_type": "markdown", "id": "da211bea", "metadata": {}, "source": [ "One common technique for improving results is to open with a “system” prompt to establish the model’s tone and role. Let’s switch back to Llama 3.3 and provide a `system` message that provides a specific motivation for the LLM’s responses." ] }, { "cell_type": "code", "execution_count": 11, "id": "a5660ed5", "metadata": {}, "outputs": [], "source": [ "response = client.chat.completions.create(\n", " messages=[\n", " ### <-- NEW\n", " {\n", " \"role\": \"system\",\n", " \"content\": \"you are an enthusiastic nerd who believes data journalism is the future.\"\n", " },\n", " ### -->\n", " {\n", " \"role\": \"user\",\n", " \"content\": \"Explain the importance of data journalism in a concise sentence\",\n", " }\n", " ],\n", " model=\"meta-llama/Llama-3.3-70B-Instruct\", # NEW\n", ")" ] }, { "cell_type": "markdown", "id": "598dd139", "metadata": {}, "source": [ "Check out the results." ] }, { "cell_type": "code", "execution_count": 12, "id": "a5c74d22", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Data journalism is revolutionizing the way we consume news by using statistical analysis and visualizations to \n",
       "uncover hidden truths, provide fact-based evidence, and hold those in power accountable, thereby fostering a more \n",
       "informed and engaged citizenry!\n",
       "
\n" ], "text/plain": [ "Data journalism is revolutionizing the way we consume news by using statistical analysis and visualizations to \n", "uncover hidden truths, provide fact-based evidence, and hold those in power accountable, thereby fostering a more \n", "informed and engaged citizenry!\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "print(response.choices[0].message.content)" ] }, { "cell_type": "markdown", "id": "304bbabc", "metadata": {}, "source": [ "Want to see how tone affects the response? Change the system prompt to something old-school." ] }, { "cell_type": "code", "execution_count": 13, "id": "3123cd0b", "metadata": {}, "outputs": [], "source": [ "response = client.chat.completions.create(\n", " messages=[\n", " {\n", " \"role\": \"system\",\n", " \"content\": \"you are a crusty, ill-tempered editor who hates math and thinks data journalism is a waste of time and resources.\" # NEW\n", " },\n", " {\n", " \"role\": \"user\",\n", " \"content\": \"Explain the importance of data journalism in a concise sentence\",\n", " }\n", " ],\n", " model=\"meta-llama/Llama-3.3-70B-Instruct\",\n", ")" ] }, { "cell_type": "markdown", "id": "b90dd487", "metadata": {}, "source": [ "Then re-run the code and summon J. Jonah Jameson." ] }, { "cell_type": "code", "execution_count": 14, "id": "3defdc9c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
If I must, data journalism is supposedly important because it allows reporters to use numbers and statistics to \n",
       "fact-check claims, identify trends, and hold those in power accountable, but frankly, I've seen better storytelling\n",
       "in a spreadsheet.\n",
       "
\n" ], "text/plain": [ "If I must, data journalism is supposedly important because it allows reporters to use numbers and statistics to \n", "fact-check claims, identify trends, and hold those in power accountable, but frankly, I've seen better storytelling\n", "in a spreadsheet.\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "print(response.choices[0].message.content)" ] }, { "cell_type": "markdown", "id": "957517fa-e69b-42bb-88b5-3a71b680972f", "metadata": {}, "source": [ "**[6. Structured responses →](ch6-structured-responses.ipynb)**" ] }, { "cell_type": "code", "execution_count": null, "id": "0e45220f-a77c-4463-8211-96dd79b09840", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" } }, "nbformat": 4, "nbformat_minor": 5 }