Hugging Face
BigScience Data (non-profit)
https://bigscience.huggingface.co
Followers: 140
AI & ML interests: None defined yet.
Recent Activity
loubnabnl authored a paper 14 days ago: "The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text"
stellaathena authored a paper 14 days ago: "Emergent and Predictable Memorization in Large Language Models"
stellaathena authored a paper 14 days ago: "KMMLU: Measuring Massive Multitask Language Understanding in Korean"
Team members: 72
bigscience-data's models (8)
bigscience-data/sgpt-bloom-1b7-nli • Sentence Similarity • Updated Jan 27 • 25 • 11
bigscience-data/tokenizer_alpha_NFKC_250k • Updated Feb 17, 2022
bigscience-data/tokenizer_equal_NFKC_250k • Updated Feb 16, 2022
bigscience-data/tokenizer_alpha_nfkc_24M • Updated Feb 16, 2022
bigscience-data/tokenizer_equal_nfkc_24M • Updated Feb 15, 2022
bigscience-data/tokenizer_equal_weight_NFKC_v1 • Updated Feb 14, 2022
bigscience-data/tokenizer_alpha_weight_NFKC • Updated Feb 14, 2022
bigscience-data/tokenizer_v0 • Updated Feb 8, 2022