Datasets Topics

community

AI & ML interests

None defined yet.

Recent Activity

asoriaΒ  updated a Space about 1 month ago
datasets-topics/neuralwork-arxiver
asoriaΒ  updated a Space about 1 month ago
datasets-topics/nvidia-HelpSteer2
View all activity

datasets-topics's activity

asoriaΒ 
updated a Space about 2 months ago
asoriaΒ 
posted an update about 2 months ago
view post
Post
1802
πŸš€ Exploring Topic Modeling with BERTopic πŸ€–

When you come across an interesting dataset, you often wonder:
Which topics frequently appear in these documents? πŸ€”
What is this data really about? πŸ“Š

Topic modeling helps answer these questions by identifying recurring themes within a collection of documents. This process enables quick and efficient exploratory data analysis.

I’ve been working on an app that leverages BERTopic, a flexible framework designed for topic modeling. Its modularity makes BERTopic powerful, allowing you to switch components with your preferred algorithms. It also supports handling large datasets efficiently by merging models using the BERTopic.merge_models approach. πŸ”—

πŸ” How do we make this work?
Here’s the stack we’re using:

πŸ“‚ Data Source ➑️ Hugging Face datasets with DuckDB for retrieval
🧠 Text Embeddings ➑️ Sentence Transformers (all-MiniLM-L6-v2)
⚑ Dimensionality Reduction ➑️ RAPIDS cuML UMAP for GPU-accelerated performance
πŸ” Clustering ➑️ RAPIDS cuML HDBSCAN for fast clustering
βœ‚οΈ Tokenization ➑️ CountVectorizer
πŸ”§ Representation Tuning ➑️ KeyBERTInspired + Hugging Face Inference Client with Meta-Llama-3-8B-Instruct
🌍 Visualization ➑️ Datamapplot library
Check out the space and see how you can quickly generate topics from your dataset: datasets-topics/topics-generator

Powered by @MaartenGr - BERTopic
asoriaΒ 
posted an update 3 months ago
view post
Post
2459
πŸ“ I wrote a tutorial on how to get started with the fine-tuning process using Hugging Face tools, providing an end-to-end workflow.

The tutorial covers creating a new dataset using the new SQL Console πŸ›’ and fine-tuning a model with SFT, guided by the Notebook Creator App πŸ“™.

πŸ‘‰ You can read the full article here:
https://huggingface.co/blog/asoria/easy-fine-tuning-with-hf
asoria/auto-notebook-creator
asoriaΒ 
posted an update 3 months ago
view post
Post
961
πŸš€ Excited to share the latest update to the Notebook Creator Tool!

Now with basic fine-tuning support using Supervised Fine-Tuning! 🎯

How it works:
1️⃣ Choose your Hugging Face dataset and notebook type (SFT)
2️⃣ Automatically generate your training notebook
3️⃣ Start fine-tuning with your data!

Link to the app πŸ‘‰ https://lnkd.in/e_3nmWrB
πŸ’‘ Want to contribute with new notebooks? πŸ‘‰https://lnkd.in/eWcZ92dS
asoriaΒ 
posted an update 4 months ago
view post
Post
820
I've been working on a Space to make it super easy to create notebooks and help users quickly understand and manipulate their data!
With just a few clicks automatically generate notebooks for:

πŸ“Š Exploratory Data Analysis
🧠 Text Embeddings
πŸ€– Retrieval-Augmented Generation (RAG)

✨ Automatic training is coming soon!
Check it out here asoria/auto-notebook-creator
Appreciate any feedback to improve this tool πŸ€—