Mantra-14B

Mantra-14B is a 14.7B-parameter, instruction-tuned, bilingual large language model for Hindi and English, trained on a mixed-language dataset.

~0.7% better performance on English tasks than the original model (average benchmark scores)
~2.8% better performance on Hindi tasks than the original model (average benchmark scores)
~4.4% better performance on tougher English benchmarks (open-llm-leaderboard evals)
~8.5% lower emissions than the original model (as reported on benchmark evaluations such as open-llm-leaderboard)
Less bias from the ordering of choices when answering MCQs

Model Details:

Developed by: Traversaal.ai, 1-800-LLMs
Language(s) (NLP): Optimized for Hindi and English
License: Apache 2.0
Paper: TBA April 15

Prompt Formats

Task	Input Format
Natural Language Inference	`Text1 ### Text2 ### NLI ###`
Multiple Choice Questions	`Question ### A) a, B) b,... ### MCQ ###`
Numeric Questions	`Question ### NUMERIC ###`
Boolean Questions	`Question ### BOOLEAN ###`
Questions seeking long responses	`Question ### LONG RESPONSE ###`
Short responses (few words)	`Input ### DIRECT RESPONSE ###`
Coding	`Input ### CODE ###`
Text Summarization	`Input ### SUMMARIZE ###`
Paraphrasing/Rephrasing	`Input ### PARAPHRASE ###`
Translation to specified language	`Input ### TRANSLATION [lang] ###`
Text Simplification/ELI5	`Input ### SIMPLIFY ###`

The prompt formats above were used during training and are better suited for usage; however, the model works well even without such formatting.
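
As an illustration of the formats in the table, the following is a minimal sketch of a prompt builder. The names `TASK_TAGS`, `build_prompt`, and `format_choices` are hypothetical helpers introduced here and are not part of any released API. The card does not say whether the square brackets around the target language in the translation format are literal; this sketch assumes they are and keeps them.

```python
# Hypothetical helpers: only the tag strings and the " ### " separator come
# from the prompt-format table; every name below is illustrative.
TASK_TAGS = {
    "nli": "NLI",
    "mcq": "MCQ",
    "numeric": "NUMERIC",
    "boolean": "BOOLEAN",
    "long": "LONG RESPONSE",
    "direct": "DIRECT RESPONSE",
    "code": "CODE",
    "summarize": "SUMMARIZE",
    "paraphrase": "PARAPHRASE",
    "simplify": "SIMPLIFY",
}


def format_choices(choices):
    """Render choices as 'A) a, B) b, ...' for the MCQ format."""
    return ", ".join(f"{chr(ord('A') + i)}) {c}" for i, c in enumerate(choices))


def build_prompt(task, *parts, lang=None):
    """Join the input parts and the task tag with ' ### ' separators."""
    if task == "translation":
        if not lang:
            raise ValueError("translation needs a target language")
        # Assumption: the brackets around the language are kept literally.
        tag = f"TRANSLATION [{lang}]"
    else:
        tag = TASK_TAGS[task]
    return " ### ".join([*parts, tag]) + " ###"


# Example prompts for three of the task types.
print(build_prompt("boolean", "Is Delhi the capital of India?"))
print(build_prompt("mcq", "2 + 2 = ?", format_choices(["3", "4", "5"])))
print(build_prompt("translation", "Good morning", lang="Hindi"))
```

The resulting string would then be passed as the user input to whichever inference stack is in use.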

Evaluation:

We evaluated our models on multiple well-known benchmarks to measure their effectiveness against other leading models, and the results are as follows:

Model	ARC-C	ARC-E	BoolQ	CMCQ	MMLU	Average*	MMLU-Pro	GPQA	MuSR	BBH	MATH-Hard
AryaBhatta-GemmaUltra-8.5B	22.70	25.04	22.95	62.23	23.70	31.32	22.66	25.34	42.72	41.12	2.95
Airavata-7B	25.09	30.47	25.31	62.17	33.20	35.25	16.35	27.43	37.57	36.00	13.60
sarvam-1-2B	30.03	33.25	62.17	42.80	27.90	39.23	-	-	-	-	-
Nemotron-4-Mini-Hindi-4B-Instruct	55.80	71.63	62.11	68.10	43.20	60.17	25.95	30.87	41.53	40.11	2.04
Llama-3-Nanda-10B-Chat	65.36	80.64	82.29	67.60	50.61	69.30	31.57	30.12	43.52	49.38	5.59
Krutrim-2-12b-instruct	67.32	81.10	84.74	76.30	56.10	73.11	-	-	-	-	-
aya-expanse-8b	74.06	87.08	86.45	83.30	56.89	77.56	30.04	30.29	37.17	49.42	7.02
aya-expanse-32B	85.41	95.08	90.43	89.80	69.71	86.08	41.30	32.55	38.62	56.29	13.37
Mantra-14B	97.39	92.24	87.65	87.40	75.59	88.05	52.39	39.77	49.07	66.97	23.11

Table 1: Metrics (rounded to two decimal places) of our models and other LLMs over several English benchmarks
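
The card does not define the Average* column. The reported values are consistent with an unweighted mean over the first five columns (ARC-C, ARC-E, BoolQ, CMCQ, MMLU), and the sketch below checks that assumption for the Mantra-14B row. Some rows differ by 0.01 when recomputed this way (for example, aya-expanse-32B), possibly because the reported averages were taken over unrounded scores.

```python
# Scores copied from the Mantra-14B row of Table 1 (English benchmarks).
# Assumption: Average* is the unweighted mean of these five columns.
scores = {
    "ARC-C": 97.39,
    "ARC-E": 92.24,
    "BoolQ": 87.65,
    "CMCQ": 87.40,
    "MMLU": 75.59,
}

average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 88.05, matching the table
```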

Model	ARC-C	ARC-E	BoolQ	CMCQ	MMLU	Average
AryaBhatta-GemmaUltra-8.5B	22.70	25.08	22.95	62.17	23.80	31.34
Airavata-7B	22.87	25.13	23.28	62.17	33.20	33.33
sarvam-1-2B	32.76	35.06	62.16	47.10	24.22	40.26
Llama-3-Nanda-10B-Chat	45.99	60.56	71.96	54.70	36.35	53.91
Nemotron-4-Mini-Hindi-4B-Instruct	50.68	63.72	68.74	51.30	37.18	54.32
Krutrim-2-12b-instruct	56.83	70.66	78.86	64.10	46.51	63.39
aya-expanse-8b	57.42	72.90	80.42	69.00	43.39	64.63
aya-expanse-32B	73.29	85.48	87.73	79.70	56.96	76.63
Mantra-14B	81.74	89.06	86.02	78.70	56.39	78.38

Table 2: Metrics (rounded to two decimal places) of our models and other LLMs over several Hindi benchmarks
