HFdb

Community Article Published June 17, 2025

HFdb is a lightweight Python module that turns a private Hugging Face Dataset repo into a version-controlled CSV database. It is ideal for small, structured data (e.g. API keys, user credits, feature flags) where you want cloud persistence, seamless versioning, and zero-ops hosting.



📦 Features

  • ✅ Create private dataset repos programmatically
  • ✅ CRUD helpers (add_row, delete_row, replace_element, replace_row)
  • ✅ Pandas-powered interface (get_df returns a DataFrame)
  • ✅ Auto-sync: local CSV → Hub on every change
  • ✅ Works anywhere Python runs: scripts, Spaces, notebooks, Lambdas

📥 Download

Option 1) Download ZIP at HFdb.

Option 2) Clone with Git:

git clone https://github.com/TonyAssi/HFdb.git
cd HFdb

🚀 Installation

pip install -r requirements.txt

⚡️ Quick-Start

import HFdb

TOKEN = "hf_your_access_token"

# 1) Bootstrap a fresh, private dataset repo with blank CSV
db = HFdb.create(
    "username/my_db",               # repo_id
    ["key", "email", "credits"],    # CSV columns
    TOKEN
)

# 2) Later… reopen the same DB (auto-downloads latest CSV)
db = HFdb.db("username/my_db", TOKEN)

# ── CRUD in one line ────────────────────────────────────────────
db.add_row({"key": 1, "email": "[email protected]", "credits": 5})

row      = db.get_row("key", 1)
exists   = db.row_exists("key", 1)
columns  = db.get_columns()
df       = db.get_df()                 # full DataFrame

db.replace_element("key", 1, "credits", 0)
db.replace_row("key", 1,
               {"key": 1, "email": "[email protected]", "credits": 0})
db.delete_row("key", 1)

🧩 API Reference

| Method | Purpose |
|---|---|
| HFdb.create(repo_id, columns, token) | Create private dataset repo (if absent) and upload an empty db.csv. Returns HFdbClient. |
| HFdb.db(repo_id, token) | Connect to an existing repo and pull the latest CSV. |

HFdbClient methods

| Method | Purpose |
|---|---|
| add_row(dict) | Append a dict of values (keys must match CSV columns). |
| get_row(column, value) | Return first matching row as a dict or None. |
| get_df() | Load the entire CSV into a pandas.DataFrame. |
| delete_row(column, value) | Remove all rows where column == value. |
| replace_element(find_col, find_val, update_col, new_val) | In-place cell update. |
| replace_row(find_col, find_val, new_row) | Delete old row, insert replacement dict. |
| row_exists(column, value) | True if at least one matching row. |
| get_columns() | List of column names. |
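
The methods in the API reference above compose into small workflows. As a minimal sketch, assuming `db` is any object that exposes `get_row` and `replace_element` as described, a credit-spending helper might look like the following. The function name `spend_credit` and the in-memory `FakeDb` stand-in are hypothetical, introduced only so the example runs without a Hub connection; they are not part of HFdb.

```python
class FakeDb:
    """Hypothetical in-memory stand-in that mimics two HFdbClient methods."""

    def __init__(self, rows):
        self.rows = [dict(r) for r in rows]

    def get_row(self, column, value):
        # First matching row as a dict, or None if nothing matches.
        return next((dict(r) for r in self.rows if r[column] == value), None)

    def replace_element(self, find_col, find_val, update_col, new_val):
        # In this stand-in, update the cell in every matching row.
        for r in self.rows:
            if r[find_col] == find_val:
                r[update_col] = new_val


def spend_credit(db, key, cost=1):
    """Deduct `cost` credits from the row whose "key" column equals `key`.

    Returns the remaining balance, or None if the key is unknown or the
    balance is too low. Column names follow the Quick-Start example.
    """
    row = db.get_row("key", key)
    if row is None:
        return None
    remaining = int(row["credits"]) - cost
    if remaining < 0:
        return None
    db.replace_element("key", key, "credits", remaining)
    return remaining


db = FakeDb([{"key": 1, "email": "a@example.com", "credits": 5}])
print(spend_credit(db, 1))      # 4
print(spend_credit(db, 1, 10))  # None (not enough credits)
print(spend_credit(db, 2))      # None (unknown key)
```

Because the helper reads a row and then writes it back, it is only safe when a single writer is active at a time.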

HFdb uses two huggingface_hub helpers under the hood:

  • hf_hub_download → grabs the current db.csv
  • upload_file → pushes your updated CSV (creating a new commit)
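
A mutating call built on these two helpers amounts to a download, edit, upload round trip. The sketch below illustrates that shape; it is not HFdb's actual source. It swaps in the standard-library `csv` module for Pandas so it runs without extra dependencies, and the names `hub_transport` and `append_row` are hypothetical. The `huggingface_hub` import is deferred so the editing logic can be exercised offline with stand-in callables.

```python
import csv
import os
import tempfile


def hub_transport(repo_id, token, filename="db.csv"):
    """Build (download, upload) callables on top of huggingface_hub."""
    # Lazy import so the rest of this sketch runs without the package installed.
    from huggingface_hub import hf_hub_download, upload_file

    def download():
        # Returns the local path of the cached copy of the file.
        return hf_hub_download(
            repo_id=repo_id, filename=filename,
            repo_type="dataset", token=token,
        )

    def upload(local_path):
        # Each upload creates a new commit in the dataset repo.
        upload_file(
            path_or_fileobj=local_path, path_in_repo=filename,
            repo_id=repo_id, repo_type="dataset", token=token,
            commit_message=f"Update {filename}",
        )

    return download, upload


def append_row(row, download, upload):
    """Download the CSV, append one row, and push the whole file back."""
    with open(download(), newline="") as f:
        reader = csv.DictReader(f)
        columns = reader.fieldnames or []
        rows = list(reader)
    if set(row) != set(columns):
        raise ValueError(f"row keys must match columns {columns}")
    rows.append(row)

    # Write to a scratch file rather than editing the cached download.
    fd, tmp_path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
    try:
        upload(tmp_path)
    finally:
        os.remove(tmp_path)
```

Against a real repo, `download, upload = hub_transport("username/my_db", TOKEN)` followed by `append_row({...}, download, upload)` would produce one new commit per call.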

⚙️ How It Works

  1. CSV First – Your data lives in db.csv at the repo root.
  2. Atomic Writes – Each mutating call loads the CSV → edits with Pandas → pushes a fresh file.
  3. History for Free – Every push is a git-backed commit, viewable in the Hub UI.
  4. Auth – Scoped user access tokens keep things private.

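⚠️ Concurrency: HFdb suits low-traffic, single-writer workloads. Two clients saving simultaneously can race, and because each save uploads the whole file, the later upload can overwrite the earlier one's changes. Consider queueing writes or adding file locks if that matters.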
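
One way to queue writes inside a single Python process is to funnel every mutating call through a lock. The sketch below is an assumption-laden illustration, not an HFdb feature: `SerializedWriter` and `SlowFakeDb` are hypothetical names, and the fake database only simulates the load, edit, push cycle. A `threading.Lock` does nothing for writers in other processes or on other machines; those would need an external lock or a single writer service.

```python
import threading
import time


class SerializedWriter:
    """Funnel every mutating call on `db` through one lock."""

    def __init__(self, db):
        self._db = db
        self._lock = threading.Lock()

    def call(self, method_name, *args, **kwargs):
        # Only one thread at a time can run a load -> edit -> push cycle.
        with self._lock:
            return getattr(self._db, method_name)(*args, **kwargs)


class SlowFakeDb:
    """Hypothetical stand-in whose add_row is a non-atomic read-modify-write."""

    def __init__(self):
        self.rows = []

    def add_row(self, row):
        snapshot = list(self.rows)   # "download"
        time.sleep(0.001)            # simulated network latency
        snapshot.append(row)         # "edit"
        self.rows = snapshot         # "upload"


db = SlowFakeDb()
writer = SerializedWriter(db)
threads = [
    threading.Thread(target=writer.call, args=("add_row", {"key": i}))
    for i in range(20)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(db.rows))  # 20: no update was lost
```

Calling `db.add_row` directly from the same threads, without the wrapper, can drop rows because overlapping snapshots overwrite each other.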


📏 Limits & Best Practices

| Good For | Not Great For |
|---|---|
| Feature flags, user credits, API keys | CSVs larger than ~50 MB or 10k rows |
| Scheduled jobs / cron scripts | Real-time, high-QPS APIs |
| Single-user or single-writer applications | Multi-writer concurrency at scale |
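
The size thresholds above can be checked before a push. The following is a minimal sketch using only the standard library; `check_limits` is a hypothetical helper, not part of HFdb, and the defaults simply restate the rough figures from the table.

```python
import csv
import os

MAX_BYTES = 50 * 1024 * 1024  # ~50 MB, the rough figure quoted above
MAX_ROWS = 10_000             # 10k rows, the rough figure quoted above


def check_limits(csv_path, max_bytes=MAX_BYTES, max_rows=MAX_ROWS):
    """Return a list of warnings; an empty list means the file looks fine."""
    warnings = []
    size = os.path.getsize(csv_path)
    if size > max_bytes:
        warnings.append(f"file is {size} bytes (limit {max_bytes})")
    with open(csv_path, newline="") as f:
        # Count records and subtract the header; csv.reader copes with
        # quoted fields that contain newlines.
        n_rows = max(sum(1 for _ in csv.reader(f)) - 1, 0)
    if n_rows > max_rows:
        warnings.append(f"file has {n_rows} rows (limit {max_rows})")
    return warnings
```

A non-empty result is a hint that the data may have outgrown a CSV-in-a-repo design.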

🗺️ Roadmap

  • 🔒 Optional file locking
  • 🍰 Native Parquet & Arrow support
  • 💾 Differential (row-level) commits instead of full CSV uploads
  • 🛠️ CLI wrapper (hfdb add-row …)
  • 💡 Got feature requests? Open an issue!

🀝 Contributing

git clone https://github.com/tonyassi/HFdb.git
poetry install      # or: pip install -r dev-requirements.txt
pre-commit install
  1. Create a feature branch: git checkout -b feat/my-idea
  2. Write tests in tests/ (pytest powered)
  3. Run black, ruff, mypy
  4. Open a PR – please describe why the change helps 🫶

About Me

Hello, my name is Tony Assi. I'm a designer and maker based in Los Angeles. I have a background in software, fashion, and marketing. I currently work for an e-commerce fashion brand. Check out my 🤗 profile for more apps, models and datasets.

Feel free to send me an email at [email protected] with any questions, comments, business inquiries or job offers.
