HFdb
Community Article
Published June 17, 2025
HFdb is a lightweight Python module that turns a private Hugging Face Dataset repo into a version-controlled CSV database. Ideal for small, structured data (e.g. API keys, user credits, feature flags) where you want cloud persistence, seamless versioning, and zero-ops hosting.
Features
- Create private dataset repos programmatically
- CRUD helpers (add_row, delete_row, replace_element, replace_row)
- Pandas-powered interface (get_df returns a DataFrame)
- Auto-sync: the local CSV and the Hub copy are synced on every change
- Works anywhere Python runs: scripts, Spaces, notebooks, Lambdas
Download
Option 1) Download ZIP at HFdb.
Option 2) Clone with Git:
git clone https://github.com/TonyAssi/HFdb.git
cd HFdb
Installation
pip install -r requirements.txt
Quick-Start
import HFdb
TOKEN = "hf_your_access_token"
# 1) Bootstrap a fresh, private dataset repo with a blank CSV
db = HFdb.create(
    "username/my_db",             # repo_id
    ["key", "email", "credits"],  # CSV columns
    TOKEN
)
# 2) Later... reopen the same DB (auto-downloads latest CSV)
db = HFdb.db("username/my_db", TOKEN)
# -- CRUD in one line --
db.add_row({"key": 1, "email": "[email protected]", "credits": 5})
row = db.get_row("key", 1)
exists = db.row_exists("key", 1)
columns = db.get_columns()
df = db.get_df() # full DataFrame
db.replace_element("key", 1, "credits", 0)
db.replace_row("key", 1,
               {"key": 1, "email": "[email protected]", "credits": 0})
db.delete_row("key", 1)
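As an illustration of combining these calls, here is a minimal sketch of a credit-spending helper. The name spend_credit is hypothetical and not part of HFdb; the function assumes only the get_row and replace_element methods shown above, a column named "key" as in the Quick-Start, and a numeric "credits" column.

```python
def spend_credit(db, key, cost=1):
    """Deduct `cost` credits from the row whose "key" column equals `key`.

    Hypothetical helper, not part of HFdb. `db` is any object exposing
    get_row(column, value) and replace_element(find_col, find_val,
    update_col, new_val) as described in this article.
    Returns the remaining balance, or None if the row is missing or the
    balance is too low.
    """
    row = db.get_row("key", key)
    if row is None:
        return None  # unknown key
    remaining = int(row["credits"]) - cost
    if remaining < 0:
        return None  # not enough credits; leave the row untouched
    db.replace_element("key", key, "credits", remaining)
    return remaining
```

Each call does one read followed by one write, so it is subject to the single-writer caveat in the Concurrency note later in this article.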
API Reference
Method | Purpose |
---|---|
HFdb.create(repo_id, columns, token) | Create a private dataset repo (if absent) and upload an empty db.csv. Returns an HFdbClient. |
HFdb.db(repo_id, token) | Connect to an existing repo and pull the latest CSV. |
HFdbClient methods | |
add_row(dict) | Append a dict of values (keys must match the CSV columns). |
get_row(column, value) | Return the first matching row as a dict, or None. |
get_df() | Load the entire CSV into a pandas.DataFrame. |
delete_row(column, value) | Remove all rows where column == value. |
replace_element(find_col, find_val, update_col, new_val) | In-place cell update. |
replace_row(find_col, find_val, new_row) | Delete the old row and insert the replacement dict. |
row_exists(column, value) | True if at least one row matches. |
get_columns() | List of column names. |
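For unit tests that should not touch the Hub, a small in-memory stand-in that follows the semantics in the table can be handy. The sketch below is a toy model, not HFdb's implementation: the class name InMemoryTable is hypothetical, raising ValueError on mismatched keys is an assumption, and so is updating every matching row in replace_element (the table only says "in-place cell update").

```python
class InMemoryTable:
    """Toy stand-in that mimics the documented HFdbClient semantics.

    Hypothetical class for local experiments and tests; it keeps rows in a
    list of dicts and never talks to the Hub.
    """

    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = []

    def add_row(self, row):
        # Assumption: a key mismatch is an error; HFdb may behave differently.
        if set(row) != set(self.columns):
            raise ValueError(f"row keys must match columns {self.columns}")
        self.rows.append(dict(row))

    def get_row(self, column, value):
        # First matching row as a dict, or None.
        return next((dict(r) for r in self.rows if r[column] == value), None)

    def row_exists(self, column, value):
        return self.get_row(column, value) is not None

    def delete_row(self, column, value):
        # Removes every row where column == value.
        self.rows = [r for r in self.rows if r[column] != value]

    def replace_element(self, find_col, find_val, update_col, new_val):
        # Assumption: all matching rows are updated.
        for r in self.rows:
            if r[find_col] == find_val:
                r[update_col] = new_val

    def replace_row(self, find_col, find_val, new_row):
        # Delete the old row(s), then insert the replacement.
        self.delete_row(find_col, find_val)
        self.add_row(new_row)

    def get_columns(self):
        return list(self.columns)
```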
HFdb uses two huggingface_hub helpers under the hood:
- hf_hub_download: grabs the current db.csv
- upload_file: pushes your updated CSV (creating a new commit)
How It Works
- CSV First: Your data lives in db.csv at the repo root.
- Atomic Writes: Each mutating call loads the CSV, edits it with Pandas, and pushes the whole file as a single commit.
- History for Free: Every push is a git-backed commit, viewable in the Hub UI.
- Auth: Scoped user access tokens keep things private.
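The download, edit, upload cycle described above can be sketched roughly as follows. This is not HFdb's actual source: apply_edit and sync_edit are hypothetical names, and the sketch uses the standard-library csv module instead of Pandas so the editing step runs without extra dependencies. The hf_hub_download and upload_file calls are the real huggingface_hub helpers; repo_type="dataset" is assumed because HFdb stores the CSV in a dataset repo.

```python
import csv
import io
from pathlib import Path


def apply_edit(csv_text, edit):
    """Parse CSV text, apply `edit` to the list of row dicts, re-serialize."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    rows = edit(list(reader))  # note: every value is read as a string
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def sync_edit(repo_id, token, edit, filename="db.csv"):
    """Download the CSV, edit it locally, and upload it as a new commit."""
    # Imported lazily so apply_edit works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download, upload_file

    local_path = hf_hub_download(
        repo_id=repo_id, filename=filename, repo_type="dataset", token=token
    )
    new_text = apply_edit(Path(local_path).read_text(encoding="utf-8"), edit)
    upload_file(
        path_or_fileobj=new_text.encode("utf-8"),  # raw bytes are accepted
        path_in_repo=filename,
        repo_id=repo_id,
        repo_type="dataset",
        token=token,
        commit_message=f"Update {filename}",
    )
```

Because the whole file is replaced on every call, the cost of a write grows with the size of the CSV, which is one reason the approach fits small tables best.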
Concurrency: HFdb suits low-traffic, single-writer workloads. Two clients saving simultaneously can race, and because each save uploads the whole CSV, one client's changes can overwrite the other's. Consider queueing writes or adding file locks if that matters.
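One way to queue writes, sketched under the assumption that all writers live in the same Python process: funnel every mutating call through a single worker thread. WriteQueue is a hypothetical helper, not part of HFdb, and it does nothing for writers running in separate processes or on separate machines.

```python
import queue
import threading


class WriteQueue:
    """Run mutating calls one at a time on a single worker thread.

    Hypothetical helper: it serializes writers inside one Python process
    only. Separate processes or machines can still race.
    """

    def __init__(self):
        self._jobs = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            func, args, kwargs = self._jobs.get()
            try:
                func(*args, **kwargs)
            except Exception as exc:  # keep the worker alive after a failed write
                print(f"write failed: {exc!r}")
            finally:
                self._jobs.task_done()

    def submit(self, func, *args, **kwargs):
        """Enqueue a write such as db.add_row; returns immediately."""
        self._jobs.put((func, args, kwargs))

    def wait(self):
        """Block until every queued write has finished."""
        self._jobs.join()
```

With a client from the Quick-Start, usage would look like `writes = WriteQueue()`, then `writes.submit(db.add_row, row)` from any thread, and `writes.wait()` before shutdown so pending writes are not lost.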
Limits & Best Practices
Good For | Not Great For |
---|---|
Feature flags, user credits, API keys | CSVs larger than ~50 MB or ~10k rows |
Scheduled jobs / cron scripts | Real-time, high-QPS APIs |
Single-user or single-writer applications | Multi-writer concurrency at scale |
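If you want an early warning before a table outgrows this approach, a small guard can check a local copy of the CSV against the rough thresholds in the table. This is only a sketch: within_limits is a hypothetical helper, not part of HFdb, and the two constants simply restate the approximate figures above rather than hard limits enforced by the Hub.

```python
import csv
from pathlib import Path

MAX_BYTES = 50 * 1024 * 1024  # ~50 MB, the rough figure from the table
MAX_ROWS = 10_000             # ~10k rows, the rough figure from the table


def within_limits(csv_path, max_bytes=MAX_BYTES, max_rows=MAX_ROWS):
    """Return True if the CSV is under both rough size thresholds."""
    path = Path(csv_path)
    if path.stat().st_size > max_bytes:
        return False
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        row_count = sum(1 for _ in reader)
    return row_count <= max_rows
```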
Roadmap
- Optional file locking
- Native Parquet & Arrow support
- Differential (row-level) commits instead of full CSV uploads
- CLI wrapper (hfdb add-row …)
Got feature requests? Open an issue!
Contributing
git clone https://github.com/tonyassi/HFdb.git
poetry install # or: pip install -r dev-requirements.txt
pre-commit install
- Create a feature branch: git checkout -b feat/my-idea
- Write tests in tests/ (pytest powered)
- Run black, ruff, mypy
- Open a PR and please describe why the change helps
About Me
Hello, my name is Tony Assi. I'm a designer and maker based in Los Angeles. I have a background in software, fashion, and marketing. I currently work for an e-commerce fashion brand. Check out my Hugging Face profile for more apps, models and datasets.
Feel free to send me an email at [email protected] with any questions, comments, business inquiries or job offers.