Querying Hugging Face Datasets with the DuckDB UI
Hugging Face hosts a whopping 384k+ datasets, ranging from a few thousand rows to hundreds of millions. While the browser-based Data Studio (powered by DuckDB WASM) is powerful, exploring very large datasets or running complex queries can be limited by browser constraints.
This is where the new DuckDB Local UI comes into play! MotherDuck and DuckDB Labs collaborated to bring a local UI to the DuckDB CLI, available starting in DuckDB v1.2.1.
This is particularly powerful because it leverages your local machine's resources (CPU, RAM), bypassing browser limitations for significantly faster and more complex queries on any Hugging Face Dataset.
Why use the DuckDB UI?
- Leverages your machine's full CPU cores and available RAM
- Significantly faster for multi-million row datasets
- Fully featured UI:
  - Column Explorer
  - Schema Viewers
  - Table Summaries
  - Notebook-like cells
My favorite feature is the Column Explorer.

Image from DuckDB.org
Getting Started
To launch the UI, simply open your terminal and run:
duckdb --ui
If you haven't installed DuckDB before, it's as easy as:
curl https://install.duckdb.org | sh
Or via Homebrew:
brew install duckdb
Running duckdb --ui without a database file starts DuckDB with an in-memory database and opens the UI.
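Since the local UI only ships with DuckDB v1.2.1 and later, it can help to check the installed version before launching. The following is a minimal sketch, not an official check: version_at_least is a hypothetical helper, and the parsing assumes that duckdb --version prints a leading token such as v1.2.1 and that each version part is a plain integer.

```shell
#!/bin/sh
# version_at_least HAVE NEED: succeeds if dotted version HAVE >= NEED.
# Hypothetical helper; assumes purely numeric major.minor.patch parts.
version_at_least() {
  have="$1"; need="$2"
  for i in 1 2 3; do
    h=$(printf '%s' "$have" | cut -d. -f"$i")
    n=$(printf '%s' "$need" | cut -d. -f"$i")
    h=${h:-0}; n=${n:-0}          # treat a missing part as 0
    [ "$h" -gt "$n" ] && return 0 # strictly newer at this position
    [ "$h" -lt "$n" ] && return 1 # strictly older at this position
  done
  return 0                        # all three parts are equal
}

if command -v duckdb >/dev/null 2>&1; then
  # Assumption: the first token of the output looks like "v1.2.1".
  installed=$(duckdb --version | awk '{print $1}' | sed 's/^v//')
  if version_at_least "$installed" 1.2.1; then
    echo "DuckDB $installed supports the local UI; run: duckdb --ui"
  else
    echo "DuckDB $installed is older than 1.2.1; upgrade first"
  fi
else
  echo "duckdb not found on PATH; install it first"
fi
```

If the check reports an older version, re-running the install script or brew upgrade duckdb should bring it up to date.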
Connecting to 384k+ Hugging Face Datasets
DuckDB provides a seamless integration with Hugging Face datasets. Here are the two primary ways to connect:
Method 1: Using the hf:// protocol
DuckDB's httpfs extension understands the hf:// protocol, allowing you to query datasets directly. For optimal performance, use the @~parquet suffix in the path. This tells DuckDB to access the efficient Parquet file conversions of the dataset hosted by Hugging Face. Here is a helpful guide to understand how the hf:// protocol works.

As an example, here's what the SQL would look like if we wanted to query the glaiveai/reasoning-v1-20m dataset.
select * from 'hf://datasets/glaiveai/reasoning-v1-20m@~parquet/default/train/*.parquet' limit 500
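The path in the query above follows a regular pattern: the dataset name, the @~parquet suffix, then a config and a split. As a rough sketch, the small shell helper below assembles that path and prints a query you can paste into the UI. The name hf_parquet_path is hypothetical (it is not part of DuckDB or any Hugging Face tool), and the default and train values are assumptions that match the examples in this post; other datasets may use different config or split names, so check the dataset page.

```shell
#!/bin/sh
# hf_parquet_path is a hypothetical helper for illustration only.
# Usage: hf_parquet_path <owner/dataset> [config] [split]
hf_parquet_path() {
  repo="$1"
  config="${2:-default}"  # assumed config name
  split="${3:-train}"     # assumed split name
  printf 'hf://datasets/%s@~parquet/%s/%s/*.parquet\n' "$repo" "$config" "$split"
}

# Print a query for the dataset used in this post.
printf "SELECT * FROM '%s' LIMIT 500;\n" "$(hf_parquet_path glaiveai/reasoning-v1-20m)"
```

Keeping the LIMIT in place while exploring avoids pulling far more rows than you need from a very large dataset.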
Method 2: Using the "Copy for DuckDB CLI" button in Data Studio (faster)
The Hugging Face Data Studio provides a handy shortcut:
- Navigate to a dataset on the Hugging Face Hub (e.g., facebook/natural_reasoning)
- Open the Data Studio
- Run any initial query (or just the default LIMIT 10 query)
- Click the "Copy for DuckDB CLI" button
This will copy SQL code to your clipboard that first creates a convenient view for the dataset split and includes your query.
The SQL looks like this:
CREATE VIEW train AS (SELECT * FROM read_parquet('hf://datasets/facebook/natural_reasoning@~parquet/default/train/*.parquet'));
-- The SQL console is powered by DuckDB WASM and runs entirely in the browser.
-- Get started by typing a query or selecting a view from the options below.
SELECT * FROM train LIMIT 10
If we copy this into the DuckDB UI and run it, we get something that looks like this!
In summary, the DuckDB Local UI provides a fast, powerful, and feature-rich way to explore Hugging Face datasets directly on your machine. Give it a try!
Have you used DuckDB with HF Datasets? Let us know your experience!