thomwolf HF Staff committed on
Commit
d8919a0
·
1 Parent(s): 3310305
Files changed (2)
  1. dist/index.html +1 -4
  2. src/index.html +1 -4
dist/index.html CHANGED
@@ -84,12 +84,9 @@
84
  download it <a href="https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu">here</a>.</p>
85
  <p>Both datasets are released under the permissive <a href="https://opendatacommons.org/licenses/by/1-0/">ODC-By 1.0 license</a></p>
86
 
87
- <p>As 🍷 FineWeb has gathered a lot of interest from the
88
- community, we decided to explain in full detail the steps involved in creating it as well as our processing decisions and
89
- many lessons learned along the way. Hence, the present (lengthy) technical report. Read on for all the juicy details on large text dataset creation!</p>
90
- <aside>For the best possible reading experience, we recommend not using a mobile phone.</aside>
91
  <p><strong>TLDR:</strong> This blog covers a discussion on processing and evaluating data quality at scale, the 🍷 FineWeb
92
  recipe (listing and explaining all of our design choices), and the process followed to create its 📚 FineWeb-Edu subset.</p>
 
93
 
94
  <h2>What's web data</h2>
95
  <h3>Finding the data</h3>
 
84
  download it <a href="https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu">here</a>.</p>
85
  <p>Both datasets are released under the permissive <a href="https://opendatacommons.org/licenses/by/1-0/">ODC-By 1.0 license</a></p>
86
 
 
 
 
 
87
  <p><strong>TLDR:</strong> This blog covers a discussion on processing and evaluating data quality at scale, the 🍷 FineWeb
88
  recipe (listing and explaining all of our design choices), and the process followed to create its 📚 FineWeb-Edu subset.</p>
89
+ <aside>For the best possible reading experience, we recommend not using a mobile phone.</aside>
90
 
91
  <h2>What's web data</h2>
92
  <h3>Finding the data</h3>
src/index.html CHANGED
@@ -84,12 +84,9 @@
84
  download it <a href="https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu">here</a>.</p>
85
  <p>Both datasets are released under the permissive <a href="https://opendatacommons.org/licenses/by/1-0/">ODC-By 1.0 license</a></p>
86
 
87
- <p>As 🍷 FineWeb has gathered a lot of interest from the
88
- community, we decided to explain in full detail the steps involved in creating it as well as our processing decisions and
89
- many lessons learned along the way. Hence, the present (lengthy) technical report. Read on for all the juicy details on large text dataset creation!</p>
90
- <aside>For the best possible reading experience, we recommend not using a mobile phone.</aside>
91
  <p><strong>TLDR:</strong> This blog covers a discussion on processing and evaluating data quality at scale, the 🍷 FineWeb
92
  recipe (listing and explaining all of our design choices), and the process followed to create its 📚 FineWeb-Edu subset.</p>
 
93
 
94
  <h2>What's web data</h2>
95
  <h3>Finding the data</h3>
 
84
  download it <a href="https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu">here</a>.</p>
85
  <p>Both datasets are released under the permissive <a href="https://opendatacommons.org/licenses/by/1-0/">ODC-By 1.0 license</a></p>
86
 
 
 
 
 
87
  <p><strong>TLDR:</strong> This blog covers a discussion on processing and evaluating data quality at scale, the 🍷 FineWeb
88
  recipe (listing and explaining all of our design choices), and the process followed to create its 📚 FineWeb-Edu subset.</p>
89
+ <aside>For the best possible reading experience, we recommend not using a mobile phone.</aside>
90
 
91
  <h2>What's web data</h2>
92
  <h3>Finding the data</h3>