BigScience Data

non-profit
https://bigscience.huggingface.co

Recent Activity

loubnabnl  authored a paper 14 days ago
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text
stellaathena  authored a paper 14 days ago
Emergent and Predictable Memorization in Large Language Models
stellaathena  authored a paper 14 days ago
KMMLU: Measuring Massive Multitask Language Understanding in Korean

Members: Albert Villanova del Moral, Leandro von Werra, Mario Šaško, Jörg Frohberg, Quentin Lhoest, Christopher Akiki, Violette, Iz Beltagy, Yacine Jernite, Hugo Laurençon, Manuel Romero, Lucile Saulnier, Thomas Wang, Teven Le Scao, Sasha Luccioni, Huu Nguyen, Kyle Lo, Roman Castagné, Stella Biderman, Sourab Mangrulkar, Loubna Ben Allal, Francesco De Toni, gerard dupont, Angie McMillan-Major, Masoud, Hendrik Strobelt, Margaret Mitchell, Younes B, David McClure, Yozh, Giada Pistilli, Niklas Muennighoff, Aleksandra Piktus, Sheng Shen, Ben Schmidt, Ron Au, Marielle Lange, Anna Rogers, Colin Raffel, Tristan Thrush, Carlos Muñoz Ferrandis, Nazneen Rajani, Britney Muller, helen, Mostofa Patwary, Pedro Ortiz Suarez, Douwe Kiela, María Grandury, Nikhil Kandpal, Xinyu ZHANG, Odunayo Ogundepo, Jimmy Lin, Pete, Unso Eun Seo Jo, Chris Emezue, Zaid Alyafeai, Andrea Soria, Paulo Villegas, Manan Dey, M Saiful Bari, Thomas Wolf, Jonathan Li, Changran Hu, Thakker, Terra Blevins, Murray Kang, Na, Zeerak, Richard Diehl Martinez, Pierre-Carl Langlais, Demetris, Guilherme Penedo

bigscience-data's Spaces (12)

  • Document Sizes 📚 (Sleeping, 2 likes, Jul 2, 2024): Display document size plots
  • Process Pipeline Visualizer 👁 (Runtime error, 6 likes, Jul 2, 2024)
  • Corpus Map 📈 (Running, 15 likes, Jul 2, 2024): Display a treemap of languages and datasets
  • Filter Values Distributions 🐠 (Runtime error, 2 likes, Jul 2, 2024)
  • Bigscience Corpus 🌍 (Runtime error, 6 likes, Sep 8, 2023)
  • Roots Search Tool 🌸 (Runtime error, 48 likes, Apr 3, 2023): Search through ROOTS corpus using queries
  • Roots Search Tool - dev tier 🌖 (Paused, 3 likes, Feb 22, 2023)
  • Pyserini 🦫 (Runtime error, 3 likes, Oct 14, 2022)
  • Bloom Tokenizer Multilinguality 📉 (Running, 2 likes, Aug 22, 2022): Display Bokeh plot
  • Token Explorer 🧑 (Build error, Aug 19, 2022)
  • Bloom Tokens 🌍 (Running, 2 likes, Aug 16, 2022): Display a Bokeh plot
  • Bigscience Tokenizer 🔣 (Runtime error, 5 likes, Jul 26, 2022)