Lingo-IITGN
committed on
Commit b60f15a
1 Parent(s): edb46e9
Update README.md
README.md CHANGED
@@ -42,9 +42,9 @@ The base model **``Ganga-1b``** trained on a monolingual **Hindi** language data
 
 ### Model Description 📚
 
-**Project Unity** is an initiative
-To achieve this, we train models on the monolingual regional languages of India. Our first release is the *Ganga-1B* model, *which has been trained on a large dataset of public domain web-crawled
-
+**Project Unity** is an initiative to address **India's linguistic diversity** and richness by creating a comprehensive resource covering the country's major languages. We strive to achieve state-of-the-art performance in understanding and generating text in **Indian languages**.
+To achieve this, we train models on the monolingual regional languages of India. Our first release is the *Ganga-1B* model, *which has been trained on a large dataset of public domain web-crawled Hindi language data, including news articles, web documents, books, government publications, educational materials, and social media conversations (filtered for quality)*. Additionally, the dataset has been further curated by native Indian speakers to ensure high quality.
+Significantly, the **Ganga-1B** model outperforms existing open-source models that support **Indian languages**, even at sizes of up to **7 billion parameters**.
 
 
 