---
datasets:
- GregSamek/TinyNews
language:
- en
---

# Tiny News

For a detailed overview of this project from start to finish, see [GregSamek.github.io/TinyNews](https://GregSamek.github.io/TinyNews).

TinyNews is a collection of one million synthetically generated news bulletins and several language models trained from scratch on this data. Evaluations suggest that the TinyNews models retain ~80% of the quality of the training data while using ~1/1000th as many parameters as the models used to generate it.

To run these models, clone [the repository](https://github.com/gregsamek/TinyNews).

Trained models and training data are available in this [🤗 Hugging Face Collection](https://huggingface.co/collections/GregSamek/tinynews-668aff540bf195d6e5e0e40f)
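
As a rough illustration of getting at the training data, the sketch below streams a few records with the Hugging Face `datasets` library. The dataset ID comes from this card's metadata; the `train` split name and the helper `preview_bulletins` are assumptions made for illustration and are not confirmed by the project, so check the dataset page for the actual splits and fields.

```python
from itertools import islice

# Dataset ID as listed in this card's metadata.
DATASET_ID = "GregSamek/TinyNews"


def preview_bulletins(n=3, split="train"):
    """Return the first `n` records of the dataset as a list of dicts.

    Hypothetical helper: the default split name "train" is an assumption.
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    # streaming=True iterates over records without downloading the full dataset.
    stream = load_dataset(DATASET_ID, split=split, streaming=True)
    return list(islice(stream, n))
```

Calling `preview_bulletins(2)` should return two records as dictionaries, provided `datasets` is installed (`pip install datasets`) and the assumed split exists.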

This project is essentially a modified reimplementation of the Microsoft Research [TinyStories](https://arxiv.org/abs/2305.07759) project.