Commit a9e2a24 by sagorsarker
1 Parent(s): c28c1ea
Update README.md
README.md
CHANGED
@@ -29,9 +29,9 @@ Datasets comprise Bangla, English, and Codes data. We mixed Bangla data with Eng
 
 Token-wise distribution will be added soon below.
 
-| Data chunk | Language | Token count |
+| Data chunk | Language | Token count(Billion) |
 |----------------|----------|-------------|
-| Redpajama Arxiv | English |
+| Redpajama Arxiv | English | 2.12 |
 | Redpajama Book | English | 00 |
 | Redpajama Wikipedia | English | 00 |
 | Redpajama Github Code | English | 00 |