Lingo-IITGN committed 46be83d (parent: d02e75f): Update README.md

README.md (changed):
# Model Card for Ganga-1b! 🌊
The base model **``Ganga-1b``** was trained on a monolingual **Hindi** language dataset as part of ***Project Unity***. <br> *(The first pre-trained Hindi model by any academic research lab in India 🇮🇳!)*
![image/png](https://cdn-uploads.huggingface.co/production/uploads/667b8f8ba271fc5a8e6929de/jG3tZnGPvH6vcGrvxO-YC.png)
## Model Details
### Model Description
Project Unity is an initiative aimed at addressing India's linguistic diversity and richness by creating a comprehensive resource that covers the country's major languages. Our goal is to achieve state-of-the-art performance in understanding and generating text in Indian languages. To achieve this, we train models on monolingual data in India's regional languages. Our first release is the Ganga-1B model, which was trained on a large dataset of public-domain, web-crawled Hindi text, including news articles, web documents, books, government publications, educational materials, and social media conversations (filtered for quality). The dataset was further curated by native Hindi speakers to ensure high quality.

Notably, the Ganga-1B model outperforms existing open-source models that support Indian languages, even those with up to 7 billion parameters. Designed to be compact and efficient, it can easily run on edge devices, making it well suited to applications that require generating human-like text. Its modest size also enables easy integration into resource-constrained environments, such as personal devices or cloud infrastructure, allowing for wider adoption and innovation in AI-driven technologies.
- **Developed by:** Lingo Research Group, IIT Gandhinagar
- **Model type:** Autoregressive Language Model
- **Language(s) (NLP):** Bilingual (Primary: Hindi [hi], Secondary: English [en])
- **License:** Apache 2.0
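Below is a minimal usage sketch with 🤗 Transformers. The repository id `LingoIITGN/ganga-1b`, the Hindi prompt, and the generation settings are illustrative assumptions; check the model page for the exact identifier and any recommended parameters.

```python
# Minimal sketch (not an official example): load the model with Hugging Face
# Transformers and generate Hindi text. The repo id below is an assumption;
# replace it with the identifier shown on the model page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LingoIITGN/ganga-1b"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # a ~1B-parameter model fits on modest GPUs; use float32 on CPU
    device_map="auto",
)

prompt = "भारत एक विशाल देश है"  # "India is a vast country"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

On a machine without a GPU, the same sketch runs on CPU by dropping `device_map="auto"` and loading the weights in `float32`.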