{ "cells": [ { "cell_type": "markdown", "id": "bf2cde26", "metadata": {}, "source": [ "# First LLM Classifier\n", "\n", "Learn how journalists use large language models to organize and analyze massive datasets\n", "\n", "## What you will learn\n", "\n", "This class will give you hands-on experience creating a machine-learning model that can read and categorize the text recorded in newsworthy datasets.\n", "\n", "It will teach you how to:\n", "\n", "- Submit large language model prompts with the Python programming language\n", "- Write structured prompts that can classify text into predefined categories\n", "- Submit dozens of prompts at once as part of an automated routine\n", "- Evaluate results using a rigorous, scientific approach\n", "- Improve results by training the model with rules and examples\n", "\n", "By the end, you will understand how LLM classifiers can outperform traditional machine-learning methods with significantly less code. And you will be ready to write a classifier on your own.\n", "\n", "## Who can take it\n", "\n", "This course is free. Anyone who has dabbled with code and AI is qualified to work through the materials. A curious mind and a good attitude are all that’s required, but familiarity with Python will certainly come in handy.\n", "\n", "💬 Need help or want to connect with others? Join the **Journalists on Hugging Face** community by signing up for our Slack group [here](https://forms.gle/JMCULh3jEdgFEsJu5).\n", "\n", "## Table of contents\n", "\n", "- [1. What we’ll do](ch1-what-we-will-do.ipynb) \n", "- [2. The LLM advantage](ch2-the-LLM-advantage.ipynb) \n", "- [3. Getting started with Hugging Face](ch3-getting-started-with-hf.ipynb) \n", "- [4. Installing JupyterLab (optional)](ch4-installing-jupyterlab.ipynb) \n", "- [5. Prompting with Python](ch5-prompting-with-python.ipynb) \n", "- [6. Structured responses](ch6-structured-responses.ipynb) \n", "- [7. Bulk prompts](ch7-bulk-prompts.ipynb) \n", "- [8. Evaluating prompts](ch8-evaluating-prompts.ipynb) \n", "- [9. Improving prompts](ch9-improving-prompts.ipynb) \n", "- [10. Sharing your app with Gradio](ch10-sharing-with-gradio.ipynb)\n", "\n", "## About this class\n", "[Ben Welsh](https://palewi.re/who-is-ben-welsh/) and [Derek Willis](https://thescoop.org/about/) prepared this guide for [a training session](https://schedules.ire.org/nicar-2025/index.html#2045) at the National Institute for Computer-Assisted Reporting’s 2025 conference in Minneapolis. \n", "The project was adapted to run on Hugging Face by [Florent Daudens](https://www.linkedin.com/in/fdaudens/). \n", "\n", "Some of the copy was written with the assistance of GitHub’s Copilot, an AI-powered text generator. The materials are free and open source.\n", "\n", "**[1. What we’ll do →](ch1-what-we-will-do.ipynb)**" ] }, { "cell_type": "code", "execution_count": null, "id": "02477b14-edff-4380-ad41-9954b6c80863", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" } }, "nbformat": 4, "nbformat_minor": 5 }