argilla-internal-testing

company

https://argilla.io/

argilla_io

https://github.com/argilla-io


AI & ML interests

Data Quality

Recent Activity

plaguss authored a paper 19 days ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

burtenshaw authored a paper 19 days ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

davidberenstein1957 updated a dataset 19 days ago

argilla-internal-testing/test_import_dataset_from_hub_with_classlabel_844b2262-af53-492e-a893-eba895f32d4f


argilla-internal-testing's activity

davidberenstein1957

updated a dataset about 7 hours ago

argilla-internal-testing/test_import_dataset_from_hub_with_classlabel_ea6a2f0a-bedd-4e7e-8d0f-b0edebbcbe1e

Viewer • Updated about 7 hours ago • 3

davidberenstein1957

published a dataset about 7 hours ago

argilla-internal-testing/test_import_dataset_from_hub_with_classlabel_ea6a2f0a-bedd-4e7e-8d0f-b0edebbcbe1e

Viewer • Updated about 7 hours ago • 3

burtenshaw

posted an update 6 days ago


AGENTS + FINETUNING! This week Hugging Face Learn has a whole pathway on fine-tuning for agentic applications. Follow these two courses to level up your agent game beyond prompts:

1️⃣ New Supervised Fine-tuning unit in the NLP Course https://huggingface.co/learn/nlp-course/en/chapter11/1
2️⃣ New Fine-tuning for Agents bonus module in the Agents Course https://huggingface.co/learn/agents-course/bonus-unit1/introduction

Fine-tuning will squeeze more out of your model for your specific use case than any prompt can.


burtenshaw

posted an update 8 days ago


NEW COURSE! We’re cooking hard on Hugging Face courses, and it’s not just agents. The NLP course is getting the same treatment with a new chapter on Supervised Fine-Tuning!

👉 Follow to get more updates https://huggingface.co/nlp-course

The new SFT chapter will guide you through these topics:

1️⃣ Chat Templates: Master the art of structuring AI conversations for consistent and helpful responses.

2️⃣ Supervised Fine-Tuning (SFT): Learn the core techniques to adapt pre-trained models to your specific outputs.

3️⃣ Low-Rank Adaptation (LoRA): Discover efficient fine-tuning methods that save memory and resources.

4️⃣ Evaluation: Measure your model's performance and ensure top-notch results.

This is the first update in a series, so follow along if you’re upskilling in AI.


burtenshaw

posted an update 11 days ago


Hey, I’m Ben and I work at Hugging Face.

Right now, I’m focusing on educational stuff and getting loads of new people to build open AI models using free and open source tools.

I’ve made a collection of some of the tools I’m building and using for teaching. Stuff like quizzes, code challenges, and certificates.

burtenshaw/tools-for-learning-ai-6797453caae193052d3638e2


davidberenstein1957

posted an update 13 days ago


🚀 Find banger tools for your smolagents!

I created the Tools gallery, which makes tools specifically developed by/for smolagents searchable and visible. This will help with:
- inspiration
- best practices
- finding cool tools

Space: davidberenstein1957/smolagents-and-tools


burtenshaw

posted an update 14 days ago


The Hugging Face agents course is finally out!

👉 https://huggingface.co/agents-course

This first unit of the course sets you up with all the fundamentals to become a pro in agents.

- What's an AI Agent?
- What are LLMs?
- Messages and Special Tokens
- Understanding AI Agents through the Thought-Action-Observation Cycle
- Thought, Internal Reasoning and the ReAct Approach
- Actions, Enabling the Agent to Engage with Its Environment
- Observe, Integrating Feedback to Reflect and Adapt

davidberenstein1957

posted an update 14 days ago


Fine-tune DeepSeek-R1 with a Synthetic Reasoning Dataset

Blog: https://huggingface.co/blog/sdiazlor/fine-tune-deepseek-with-a-synthetic-reasoning-data

burtenshaw

posted an update 18 days ago


SmolLM2 paper is out! 😊

😍 Why do I love it? Because it facilitates teaching and learning!

Over the past few months I've engaged with (no joke) thousands of students based on SmolLM.

- People have run inference with, fine-tuned, aligned, and evaluated this smol model.
- People have used their own machines and free tools like Colab, Kaggle, and Spaces.
- People tackled use cases in their job, for fun, in their own language, and with their friends.

Upvote the paper SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model (2502.02737)


plaguss

authored a paper 19 days ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published 20 days ago • 191

burtenshaw

authored a paper 19 days ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published 20 days ago • 191

davidberenstein1957

posted an update 19 days ago


Agentic RAG: Applied, visual, and step-by-step! 🐾

Get familiar with agents and tools, not the bells and whistles!

Retrieve - Augment and now GENERATE.

part 3: https://huggingface.co/blog/davidberenstein1957/ai-blueprint-agentic-rag-part-3-generate

davidberenstein1957

posted an update 20 days ago


Anyone can create free hosted tools for their AI agents! 🔥

Agentic RAG stack part 2 - augment
Augmenting retrieval results with reranking improves the retrieved content without adding much latency.

part 2: https://huggingface.co/blog/davidberenstein1957/ai-blueprint-agentic-rag-part-2-augment
code: https://github.com/huggingface/ai-blueprint
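Below is a minimal plain-Python sketch of the retrieve-then-rerank idea. Both scorers are hypothetical stand-ins, not what the ai-blueprint code uses:

- cheap_score (query-word overlap) plays the fast first-stage retriever.
- expensive_score (overlap plus an exact-phrase bonus) plays the slower reranker, which in practice is often a cross-encoder model.

Only the top first_k candidates are rescored, which is why reranking adds little time.

```python
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokenization (deliberately naive)."""
    return set(text.lower().split())


def cheap_score(query: str, doc: str) -> float:
    """Hypothetical first-stage score: fraction of query words found in the doc."""
    q = tokenize(query)
    return len(q & tokenize(doc)) / len(q) if q else 0.0


def expensive_score(query: str, doc: str) -> float:
    """Hypothetical reranker score: overlap plus a bonus for the exact query phrase."""
    bonus = 1.0 if query.lower() in doc.lower() else 0.0
    return cheap_score(query, doc) + bonus


def retrieve_then_rerank(
    query: str, docs: List[str], first_k: int = 3, final_k: int = 2
) -> List[str]:
    # Stage 1: rank every doc with the cheap scorer and keep only first_k candidates.
    candidates = sorted(docs, key=lambda d: cheap_score(query, d), reverse=True)[:first_k]
    # Stage 2: rescore just those candidates with the costlier scorer.
    reranked = sorted(candidates, key=lambda d: expensive_score(query, d), reverse=True)
    return reranked[:final_k]


docs = [
    "rag with reranking improves answers",
    "why reranking improves rag pipelines",
    "duckdb reads parquet files",
    "parquet files are columnar",
]
print(retrieve_then_rerank("reranking improves rag", docs))
```

Here the first two docs tie on the first-stage score. The reranker then moves the one that contains the exact query phrase to the top.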

davidberenstein1957

posted an update 21 days ago


Creating an agentic RAG stack on the Hugging Face Hub - part 1 - retrieval (1/5).

🚀 Web apps and microservices included!

Chunk, embed and index documents at a huge scale without overhead.

Blog: https://huggingface.co/blog/davidberenstein1957/ai-blueprint-agentic-rag-part-1-retrieve
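The chunk step can be illustrated with fixed-size word windows that overlap, so that text cut at a boundary still appears whole in one chunk. This is a generic sketch. The chunk_words helper, window size and overlap are assumptions for illustration, not the settings from the blog, and the embed and index steps are left out.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 5, overlap: int = 2) -> List[str]:
    """Split text into windows of chunk_size words; consecutive windows share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far each new window moves forward
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


print(chunk_words("one two three four five six seven eight"))
```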

davidberenstein1957

posted an update 26 days ago


TL;DR: Parquet is awesome, and so is DuckDB!

Datasets on the Hugging Face Hub rely on Parquet files. We can interact with these files using DuckDB, a fast in-memory database system. One of DuckDB’s features is vector similarity search, which can be used with or without an index.

blog:
https://huggingface.co/learn/cookbook/vector_search_with_hub_as_backend
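Vector similarity search without an index boils down to brute force: score every stored embedding against the query and keep the top k. Here is a plain-Python sketch of that idea.

This is not DuckDB code. In DuckDB the same ranking would be written in SQL; recent versions ship array functions such as array_cosine_similarity, and the vss extension adds an HNSW index for the indexed case. The toy embeddings and row ids are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_search(
    query: Sequence[float], rows: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Score every row against the query (no index) and return the k best matches."""
    scored = [(row_id, cosine_similarity(query, emb)) for row_id, emb in rows.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings standing in for a vector column read from a Parquet file.
rows = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}
print(brute_force_search([1.0, 0.0, 0.0], rows, k=2))
```

Brute force costs one similarity computation per row for every query. An index trades some accuracy for far fewer comparisons on large tables.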

davidberenstein1957

posted an update 29 days ago


Let's uncover the post-training dataset from DeepSeek-R1 with Magpie!

Pass only the pre-query tokens "<｜begin▁of▁sentence｜>User:" and let the model generate the rest.

We can get realistic examples!

Gist: https://gist.github.com/davidberenstein1957/3f20046ce57395a6aba13f8b4e956b59


burtenshaw

posted an update 29 days ago


A manic few days in open-source AI, with game-changing developments all over the place. Here's a roundup of the resources:

- The science team at @huggingface is building a fully open reproduction of DeepSeek-R1. https://github.com/huggingface/open-r1
- @qwen released a series of models with a 1-million-token context! https://qwenlm.github.io/blog/qwen2.5-1m/
- SmolVLM got even smaller with completely new 256M and 500M variants. https://huggingface.co/blog/smolervlm

There's so much you could do with these developments, especially combining them into agentic applications or fine-tuning them on your use case.


burtenshaw

posted an update about 1 month ago


Hey 👋

I'm helping out on some community research to learn about the AI community. If you want to join in the conversation, head over here where I started a community discussion on the most influential model since BERT.

OSAIResearchCommunity/README#2

burtenshaw

posted an update about 1 month ago


📣 Teachers and Students! Here's a handy quiz app if you're preparing your own study material.

TL;DR: it's a quiz that uses a dataset to make questions and save answers.

Here's how it works:

- make a dataset of multiple choice questions
- duplicate the Space and set the dataset repo
- log in and do the quiz
- submit the questions to create a new dataset

I made this to get ready for the agents course, but I hope it's useful for your projects too!

Quiz app: burtenshaw/dataset_quiz

Dataset with questions: burtenshaw/exam_questions

Agents course we're working on: https://huggingface.co/agents-course

burtenshaw

posted an update about 1 month ago


AI was built on side projects!


Team members 10
