AI & ML interests

Dataset Viber is your chill repo for data collection, annotation and vibe checks.

dataset-viber's activity

davidberenstein1957
posted an update 20 days ago
🥊 Epic Agent Framework Showdown! Available today!

🔵 In the blue corner, the versatile challenger with a proven track record of knowledge retrieval: LlamaIndex!

🛑 In the red corner, the defender, weighing in with lightweight efficiency: Hugging Face smolagents!

🔗 URL: https://huggingface.co/agents-course

We just published the LlamaIndex unit for the agents course. It offers a great contrast with the smolagents unit by looking at:

- What makes llama-index stand out
- How the LlamaHub is used for integrations
- Creating QueryEngine components
- Using agents and tools
- Agentic and multi-agent workflows

The team has been working flat out on this for a few weeks, supported by Logan Markewich and Laurie Voss over at LlamaIndex.

Who won? You decide!
davidberenstein1957
posted an update 21 days ago
🫸 New release to push vector search to the Hub with vicinity and work with any serialisable objects.

🧑‍🏫 KNN, HNSW, USEARCH, ANNOY, PYNNDESCENT, FAISS, and VOYAGER.

🔗 Example Repo: minishlab/my-vicinity-repo
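
To make the idea concrete, here is a minimal sketch of what "vector search over any serialisable objects" can look like with the simplest exact KNN approach. This is not the vicinity API: `TinyStore`, `dumps`, and `loads` are hypothetical names used for illustration only, and JSON stands in for whatever serialisation format a real library uses.

```python
import json
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyStore:
    """Hypothetical toy store (NOT the vicinity API): vectors plus JSON-serialisable items."""

    def __init__(self, vectors, items):
        if len(vectors) != len(items):
            raise ValueError("vectors and items must have the same length")
        self.vectors = [[float(x) for x in v] for v in vectors]
        self.items = list(items)

    def query(self, vector, k=1):
        # Exact (brute-force) KNN: score every stored vector, keep the top k.
        scored = [(item, cosine(vector, vec)) for item, vec in zip(self.items, self.vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]

    def dumps(self):
        # Serialise vectors and items together so the store can be shared as one payload.
        return json.dumps({"vectors": self.vectors, "items": self.items})

    @classmethod
    def loads(cls, payload):
        data = json.loads(payload)
        return cls(data["vectors"], data["items"])


store = TinyStore(
    vectors=[[1, 0], [0, 1], [0.7, 0.7]],
    items=[{"title": "a"}, {"title": "b"}, {"title": "c"}],
)
restored = TinyStore.loads(store.dumps())  # round trip through JSON
nearest = restored.query([1.0, 0.1], k=2)
```

The approximate backends listed above trade some of this exactness for speed on large collections; the round trip through `dumps` and `loads` is the part that makes a store shareable.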
davidberenstein1957
posted an update about 1 month ago
🚀 Find banger tools for your smolagents!

I created the Tools gallery, which makes tools specifically developed by/for smolagents searchable and visible. This will help with:
- inspiration
- best practices
- finding cool tools

Space: davidberenstein1957/smolagents-and-tools
davidberenstein1957
posted an update about 2 months ago
TL;DR: Parquet is awesome, DuckDB too!

Datasets on the Hugging Face Hub rely on Parquet files. We can interact with these files using DuckDB as a fast in-memory database system. One of DuckDB’s features is vector similarity search, which can be used with or without an index.

Blog: https://huggingface.co/learn/cookbook/vector_search_with_hub_as_backend
davidberenstein1957
posted an update 3 months ago
Introducing the Synthetic Data Generator, a user-friendly application that takes a no-code approach to creating custom datasets with Large Language Models (LLMs). The best part is the simple step-by-step process, which lets anyone create datasets and models in minutes without writing any code.

Blog: https://huggingface.co/blog/synthetic-data-generator
Space: argilla/synthetic-data-generator
davidberenstein1957
posted an update 4 months ago
Open Preference Dataset for Text-to-Image Generation by the 🤗 Community

Open Image Preferences is an Apache 2.0-licensed dataset for text-to-image generation. It contains 10K text-to-image preference pairs across common image generation categories, using different model families and varying prompt complexities.

https://huggingface.co/blog/image-preferences
davidberenstein1957
posted an update 4 months ago
This is amazing for cheap model fine-tunes without the hassle of actual deployment! TIL: LoRA fine-tunes for models on the Hub can be used directly for inference!
davidberenstein1957
posted an update 4 months ago
The Data Is Better Together community is set to release the first Apache 2 licensed image preference dataset!

Great work and let's give this a final push :)

@aashish1904 congrats on your month of HF pro. There is more to win during this sprint!

@aashish1904 @AnyaDesdein @davidberenstein1957 @Malalatiana @beta3 @fffiloni @munish0838 @Reza2kn @bbunzeck @Creazycreator @andrei-saceleanu @jafhaponiuk @rca-etl @kf120 @burtenshaw @mmhamdy @grib0ed0v @Doopus @AnyaDes @ttkap @Xceron @Lewox @davanstrien @Azazelle @adirik @Ashish08 @AntonVic @kenantang @sdiazlor @g-ronimo @dennis-rall @prithivMLmods @girtss3 @flozi00 @WaveCut @Taylor658 @Wildminder @Sara9999 @phaelishall @sararob @dvilasuero @pgabrys @plaguss @CDS899 @timajwilliams @rudzinskimaciej @pavel-ai @aggr8 @ignacioct @MouseAI @Leeps @MaksKul @NicolasDmln @Muinez @kusht55 @caiolang @Jakub-Brand24 @loamy @Demijan @eliab96 @Viewegger @JosephCatrambone @p1atdev @mrshu @o639 @Targezed @Aviv-anthonnyolime @thliang01 @Ahmed-Amine @glards @pranaykoppula @nataliaElv @MaPirlet @alvarobartt @gabrielmbmb @zlicastro @Jaydip @Chouettecheveche @lilcheaty @ruyrdiaz @robintema @fdaudens @ggcristian @a-r-r-o-w @pates @joheras @stopsatgreen @bezo97 @chachi902 @iamyann @liamcripwell @dmb23 @korbih @anonymous7743 @akbdx18 @OVAWARE @severo @akontra @lichorosario @lhoestq @SebastianBodza @Vishnou @ameerazam08 @appoose @Mukei @mearco @joaquincabezas @Fizzarolli @thomastraum @igortopolski @OxxoCodes @patrickfleith @asoria @bn22 @sitammeur @Krodolf @bergr7f @Sbxxn @wietsevenema @sugatoray @Iamladi @MikeTrizna @feveromo @mokady @Bolero @prath @Dowwie @kfahn @decodingchris @alili2050 @RahulRaman @yzimmermann @Ameeeee @ecyht2 @MattMC001 @hemanthkumarak @Thegorgibus @akos2 @LawRun @ramithuh @SuperMuel @sjans @peterizsak @mosama @Eyel @mtr3 @cfahlgren1 @legentil @clem @Citaman @Aurelien-Morgan @AntoineBourgois @TotoB12 @Stanmey @osanseviero @multimodalart @maxiw @ariG23498 @ngk89 @femboysLover @dvs @tacohiddink @blanchon @DavidJimenez