sagorsarker committed on
Commit
a9e2a24
1 Parent(s): c28c1ea

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -29,9 +29,9 @@ Datasets comprise Bangla, English, and Codes data. We mixed Bangla data with Eng
 
  Token-wise distribution will be added soon below.
 
- | Data chunk | Language | Token count |
+ | Data chunk | Language | Token count(Billion) |
  |----------------|----------|-------------|
- | Redpajama Arxiv | English | 00 |
+ | Redpajama Arxiv | English | 2.12 |
  | Redpajama Book | English | 00 |
  | Redpajama Wikipedia | English | 00 |
  | Redpajama Github Code | English | 00 |