---
title: README
emoji: π
colorFrom: green
colorTo: gray
sdk: static
pinned: false
---
<a href="https://huggingface.co/gair-prox" target="_blank">
<img src="https://cdn-uploads.huggingface.co/production/uploads/628f6e5ab90dde28ef57d293/gfqBTSEIa140Hu-mfo9Qe.png" alt="Clickable Image" />
</a>
GAIR-ProX, a subsidiary of [GAIR](https://huggingface.co/GAIR), spearheads the ProX Project. This initiative aims to enhance pre-training efficiency by refining corpus documents with language models at scale. Through meticulous operations (e.g., document-level filtering and chunk-level cleaning), implemented as scalable, executable programs, ProX seeks to improve pre-training data quality, ultimately developing more robust and efficient language models.
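To make the idea concrete, here is a minimal sketch of what "refining documents via executable programs" can look like. All names (`drop_doc`, `remove_lines`, `apply_program`) are illustrative assumptions for this sketch, not the actual ProX API: a language model would emit a small program of editing operations, which is then executed against the raw document.

```python
# Hypothetical sketch of ProX-style refining (function names are illustrative,
# not the real ProX interface). A model emits a program of operations; we run it.

def drop_doc(doc):
    """Document-level filtering: discard the whole document."""
    return None

def remove_lines(doc, start, end):
    """Chunk-level cleaning: delete lines [start, end) by 0-based index."""
    lines = doc.splitlines()
    return "\n".join(lines[:start] + lines[end:])

def apply_program(doc, program):
    """Execute a list of (operation, args) pairs on a document."""
    for op, args in program:
        if doc is None:  # document was filtered out entirely
            break
        doc = op(doc, *args)
    return doc

raw = "useful text\nNAV MENU | LOGIN\nmore useful text"
# Suppose the refining model decides line 1 is boilerplate:
cleaned = apply_program(raw, [(remove_lines, (1, 2))])
print(cleaned)  # useful text
                # more useful text
```

The key design point this sketch mirrors is that the model's output is a program rather than a rewritten document, so the edits stay cheap to execute and easy to audit at corpus scale.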
<i>Read our [technical report](https://huggingface.co/papers/2409.17115)!</i>