universitytehran/PersianMind-v1.0

@@ -26,9 +26,12 @@ co2_eq_emissions:
 <img src="PersianMind.jpg" alt="PersianMind logo" width=200/>
-# PersianMind
-PersianMind is a a cross-lingual Persian-English large language model.
 ### Model Description
@@ -39,7 +42,8 @@ PersianMind is a a cross-lingual Persian-English large language model.
 ## How to Get Started with the Model
-Use the code below to get started with the model. Note that you need to install `sentencepiece` and `accelerate` libraries to run this code.
 ```python
 from transformers import LlamaTokenizer, LlamaForCausalLM
@@ -73,11 +77,11 @@ model_output = model_output.replace(model_input, "")
 print(model_output)
 ```
-## How to Get Started with the Quantized Model
 Quantized models can be run on resource-constrained devices.
-To use quantized models, you should install the `bitsandbytes` library.
-To get started with 8-bit quantized model, use the code below to define the model.
 ```python
 model = LlamaForCausalLM.from_pretrained(
@@ -88,7 +92,7 @@ model = LlamaForCausalLM.from_pretrained(
 )
 ```
-To get started with 4-bit quantized model, use the code below to define the model.
 ```python
 from transformers import BitsAndBytesConfig
@@ -105,24 +109,25 @@ model = LlamaForCausalLM.from_pretrained(
 )
 ```
-## Evaluating Quantized Models
-| Model              | Belebele (Persian) | Translation Fa2En | Translation En2Fa | Model Size | Words/sec |
-| :----------------- | :----------------: | :---------------: | :---------------: | :--------: | :-------: |
-| PersianMind        |        73.9        |       83.61       |       79.44       |   13.66G   |   25.35   |
-| PersianMind-8bit   |        73.7        |       82.32       |       78.61       |    7.2G    |   11.36   |
-| PersianMind-4bit   |        70.2        |       82.07       |       80.36       |    3.9G    |   24.36   |
 We evaluated quantized models in various tasks against the original model.
 Specifically, we evaluated all models using the reading comprehension multiple-choice
-question-answering benchmark of Belebele (Persian subset) and reported the accuracy of each model.
 Additionally, we evaluated our models for Persian-to-English and English-to-Persian translation tasks.
-For this, we utilized the Persian-English subset of the Flores-200 dataset and reported our results using the Comet metric.
-Furthermore, we calculated the average number of words generated by each model per second during running the translation tasks.
-To understand resource efficiency, we measured the memory usage of each model by employing the `get_memory_footprint` function.
 ## License
-PersianMind is subject to Meta's [LLaMa2 Community License](https://raw.githubusercontent.com/facebookresearch/llama/main/LICENSE).
 It is further licensed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/), which allows non-commercial use of the model.
 Commercial use of this model requires a written agreement, which must be obtained from the copyright holders, who are listed as developers on this page.
 If you suspect any violations, please reach out to us.

 <img src="PersianMind.jpg" alt="PersianMind logo" width=200/>
+# <span style="font-variant:small-caps;">PersianMind</span>
+<span style="font-variant:small-caps;">PersianMind</span> is a cross-lingual Persian-English large language model.
+The model achieves state-of-the-art results on the Persian subset of the [Belebele](https://github.com/facebookresearch/belebele) benchmark
+and the [ParsiNLU multiple-choice QA](https://github.com/persiannlp/parsinlu) task.
+It also attains performance comparable to GPT-3.5-turbo in a Persian reading comprehension task.
 ### Model Description
 ## How to Get Started with the Model
+Use the code below to get started with the model.
+Note that you need to install the <code><b>sentencepiece</b></code> and <code><b>accelerate</b></code> libraries along with <code><b>PyTorch</b></code> and <code><b>🤗Transformers</b></code> to run this code.
 ```python
 from transformers import LlamaTokenizer, LlamaForCausalLM
 print(model_output)
 ```
+### How to Quantize the Model
 Quantized models can be run on resource-constrained devices.
+To quantize the model, you should install the <code><b>bitsandbytes</b></code> library.
+In order to quantize the model in 8-bit (`INT8`), use the code below.
 ```python
 model = LlamaForCausalLM.from_pretrained(
 )
 ```
+Alternatively, you can quantize the model in 4-bit (`INT4`) with the following code.
 ```python
 from transformers import BitsAndBytesConfig
 )
 ```
+### Evaluating Quantized Models
+| Model                                                              | Belebele (Persian) | Fa→En Translation | En→Fa Translation | Model Size | Tokens/sec |
+| :----------------------------------------------------------------- | :----------------: | :---------------: | :---------------: | :--------: | :--------: |
+| <span style="font-variant:small-caps;">PersianMind</span> (`bf16`) |        73.9        |       83.61       |       79.44       |   13.7G    |   25.35    |
+| <span style="font-variant:small-caps;">PersianMind</span> (`INT8`) |        73.7        |       82.32       |       78.61       |    7.2G    |   11.36    |
+| <span style="font-variant:small-caps;">PersianMind</span> (`INT4`) |        70.2        |       82.07       |       80.36       |    3.9G    |   24.36    |
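As a quick reading of the table above, the size and throughput figures can be expressed relative to the `bf16` model. The snippet below is only arithmetic on the reported numbers; the dictionary layout and the helper name `relative_to_bf16` are illustrative and not part of the model card.

```python
# Figures copied from the table above: (model size in G, tokens/sec).
reported = {
    "bf16": (13.7, 25.35),
    "INT8": (7.2, 11.36),
    "INT4": (3.9, 24.36),
}

base_size, base_speed = reported["bf16"]


def relative_to_bf16(size, tokens_per_sec):
    """Return (size reduction factor, fraction of bf16 throughput retained)."""
    return base_size / size, tokens_per_sec / base_speed


for name, (size, speed) in reported.items():
    shrink, speed_ratio = relative_to_bf16(size, speed)
    print(f"{name}: {shrink:.2f}x smaller, {speed_ratio:.0%} of bf16 throughput")
```

On these numbers, `INT4` is roughly 3.5 times smaller than `bf16` while retaining about 96% of its throughput, whereas `INT8` is about 1.9 times smaller at about 45% of the throughput.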
 We evaluated quantized models in various tasks against the original model.
 Specifically, we evaluated all models using the reading comprehension multiple-choice
+question-answering benchmark of [Belebele](https://github.com/facebookresearch/belebele) (Persian subset) and reported the accuracy of each model.
 Additionally, we evaluated our models for Persian-to-English and English-to-Persian translation tasks.
+For this, we utilized the Persian-English subset of the [Flores-200](https://github.com/facebookresearch/flores/tree/main/flores200) dataset and
+reported our results using the <span style="font-variant:small-caps;">Comet</span> metric.
+Furthermore, we calculated the average number of tokens each model generated per second while running the translation tasks.
+To understand resource efficiency, we measured the memory usage of each model by employing the `get_memory_footprint()` function.
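The two efficiency measurements described above could be sketched as follows. This is a minimal illustration, not the authors' evaluation script: `tokens_per_second`, `footprint_g`, and the `generate_fn` callable are hypothetical names, and dividing by `1e9` to obtain the "G" unit is an assumption about how the table was computed. Only `get_memory_footprint()`, which returns a size in bytes, comes from the text.

```python
import time


def tokens_per_second(generate_fn, prompts):
    """Average generated tokens per second over a list of prompts.

    generate_fn is a hypothetical callable that runs generation for one
    prompt and returns the number of newly generated tokens.
    """
    total_tokens = 0
    start = time.perf_counter()
    for prompt in prompts:
        total_tokens += generate_fn(prompt)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very coarse clocks.
    return total_tokens / elapsed if elapsed > 0 else float("inf")


def footprint_g(model):
    """Memory footprint from get_memory_footprint() (bytes), scaled by 1e9.

    The 1e9 divisor is an assumption; the model card does not state the unit.
    """
    return model.get_memory_footprint() / 1e9
```

With a loaded model, `generate_fn` could wrap `model.generate` and return the output length minus the prompt length in tokens, so that only newly generated tokens are counted.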
 ## License
+<span style="font-variant:small-caps;">PersianMind</span> is subject to Meta's [LLaMa2 Community License](https://raw.githubusercontent.com/facebookresearch/llama/main/LICENSE).
 It is further licensed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/), which allows non-commercial use of the model.
 Commercial use of this model requires a written agreement, which must be obtained from the copyright holders, who are listed as developers on this page.
 If you suspect any violations, please reach out to us.