Lingo-IITGN
committed on
Commit 4a18946 • 1 Parent(s): be0f64c
Update README.md
README.md CHANGED
@@ -29,7 +29,7 @@ The base model **``Ganga-1b``** trained on a monolingual **Hindi** language data
 
 
 
-### Model Description
+### Model Description 📚
 
 Project Unity is an initiative aimed at addressing India's linguistic diversity and richness by creating a comprehensive resource that covers the country's major languages. Our goal is to achieve state-of-the-art performance in understanding and generating text in Indian languages.
 To achieve this, we train models on monolingual regional languages of India. Our first release is the Ganga-1B model, which has been trained on a large dataset of public domain web-crawled hindi language data, including news articles, web documents, books, government publications, educational materials, and social media conversations (filtered for quality). Additionally, the dataset has been further curated by native Indian speakers to ensure high-quality.
@@ -44,7 +44,7 @@ Importantly, the Ganga-1B model outperforms existing open-source models that sup
 
 
 
-## How to Get Started with the Model
+## How to Get Started with the Model 👨🏻‍💻
 
 Use the code below to get started with the model.
 
@@ -68,7 +68,7 @@ print(result)
 
 ```
 
-## Technical Specifications
+## Technical Specifications 🤖
 
 ### Model Architecture and Objective
 
@@ -87,8 +87,7 @@ Ganga-1b is a decoder-only transformer model, featuring the following specificat
 ## Evaluation
 
 
-
-### Results
+### Results 🏆
 
 <details open>
 <summary>Tokenizers Results</summary>
@@ -124,21 +123,21 @@ Ganga-1b is a decoder-only transformer model, featuring the following specificat
 </details>
 
 
-
+## Summary
 
 
 
-## Bias, Risks, and Limitations
+## Bias, Risks, and Limitations 🚨
 
 
-### Recommendations
+### Recommendations ‼️
 
 <span style="color:red">This model described is a research preview and is under ongoing iterative updations, and as such, it only provides limited safety measures. Additionally, it may generate offensive content. It is strictly prohibited to use this service for any illegal, harmful, violent, racist, or sexual purposes.</span>.
 
 
 
 
-## Model Card Contact
+## Model Card Contact ✉️
 
 [Lingo Research Labs at IIT Gandhinagar, India](https://labs.iitgn.ac.in/lingo/) </br>
 Mail at: [[email protected]]([email protected])
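The hunk headers in this diff (for example `@@ -87,8 +87,7 @@`) encode where each change starts and how many lines it spans on the old and new side of README.md. As a minimal sketch, assuming standard unified-diff header syntax, the snippet below parses those headers and reports the net line change per hunk. The names `Hunk` and `parse_hunk_header` are hypothetical helpers introduced for illustration only; they are not part of this commit or of any library.

```python
import re
from typing import NamedTuple


class Hunk(NamedTuple):
    """Line ranges taken from one unified-diff hunk header (hypothetical helper)."""
    old_start: int
    old_count: int
    new_start: int
    new_count: int


# Matches "@@ -a,b +c,d @@"; in unified diffs an omitted count means 1.
_HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str) -> Hunk:
    """Parse a unified-diff hunk header into its four numbers."""
    match = _HUNK_RE.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = match.groups()
    return Hunk(
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


if __name__ == "__main__":
    # The five hunk headers that appear in this commit.
    headers = [
        "@@ -29,7 +29,7 @@",
        "@@ -44,7 +44,7 @@",
        "@@ -68,7 +68,7 @@",
        "@@ -87,8 +87,7 @@",
        "@@ -124,21 +123,21 @@",
    ]
    for header in headers:
        hunk = parse_hunk_header(header)
        # A negative value means the new file is shorter within this hunk.
        print(header, "net line change:", hunk.new_count - hunk.old_count)
```

Read this way, the fourth hunk drops one line (8 old lines become 7), which is why the fifth hunk starts at line 124 in the old file but line 123 in the new one.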