Commit c9594ca by khulnasoft · verified · 1 Parent(s): 295f278

Update README.md

Files changed (1)
  1. README.md +0 -41
README.md CHANGED
@@ -14,47 +14,6 @@ configs:
  path: data/*/*
  - config_name: sample-10BT
  ---
- # 🍷 Spidder
- <center>
- <img src="https://huggingface.co/datasets/cvedb/admin/resolve/main/spidder-logo.png" alt="Spidder: The finest collection of data the web has to offer">
- </center>
-
- > 15 trillion tokens of the finest data the 🌐 web has to offer
-
- # Table of Contents
- - [🍷 Spidder](#-spidder)
- * [What is it?](#what-is-it)
- * [What is being released?](#what-is-being-released)
- * [Changelog](#changelog)
- * [How to download and use 🍷 Spidder](#how-to-download-and-use-🍷-spidder)
- + [Using 🏭 `datatrove`](#using-datatrove)
- + [Using `huggingface_hub`](#using-huggingface_hub)
- + [Using `datasets`](#using-datasets)
- * [Breakdown by dump/crawl](#breakdown-by-dumpcrawl)
- * [Dataset performance evaluation and ablations](#dataset-performance-evaluation-and-ablations)
- + [Hyper-parameters for ablation models](#hyper-parameters-for-ablation-models)
- + [Ablation evaluation benchmarks](#ablation-evaluation-benchmarks)
- + [Comparison with other datasets](#comparison-with-other-datasets)
- - [Dataset card for 🍷 Spidder](#dataset-card-for-🍷-spidder)
- * [Dataset Summary](#dataset-summary)
- * [Dataset Structure](#dataset-structure)
- + [Data Instances](#data-instances)
- + [Data Fields](#data-fields)
- + [Data Splits](#data-splits)
- * [Dataset Creation](#dataset-creation)
- + [Curation Rationale](#curation-rationale)
- + [Source Data](#source-data)
- + [Data processing steps](#data-processing-steps)
- + [Annotations](#annotations)
- + [Personal and Sensitive Information](#personal-and-sensitive-information)
- * [Considerations for Using the Data](#considerations-for-using-the-data)
- + [Social Impact of Dataset](#social-impact-of-dataset)
- + [Discussion of Biases](#discussion-of-biases)
- + [Other Known Limitations](#other-known-limitations)
- * [Additional Information](#additional-information)
- + [Licensing Information](#licensing-information)
- + [Future work](#future-work)
- + [Citation Information](#citation-information)
 
  ## What is it?
 
 