---
license: apache-2.0
datasets:
- uonlp/CulturaX
- yahma/alpaca-cleaned
- Open-Orca/OpenOrca
language:
- gu
- en
pipeline_tag: text-generation
---
# Gujju-Llama 7B Base v0.1

Gujju-Llama 7B Base v0.1 is the first release of the Gujju-Llama base model, a foundational resource for researchers and developers working on Gujarati NLP. It is a causal language model built on LLaMA-2 7B and enriched with additional Gujarati vocabulary. It can be used directly for text-completion inference and as a starting point for fine-tuning instruction-tuned models.

## Model Details

### Model Description

- **Model type:** 7B-parameter Llama-2 model pretrained on the Gujarati subset of CulturaX.
- **Language(s) (NLP):** Gujarati, English
- **Source Model:** meta-llama/Llama-2-7b-hf
- **Training Precision:** float16
- **License:** GNU General Public License v3.0
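
As a rough illustration of how a base model like this might be loaded for text completion, here is a minimal sketch using the Hugging Face `transformers` library. The repository id `your-org/gujju-llama-7b-base`, the helper names `build_generation_kwargs` and `complete`, and all sampling values are assumptions made for the example and do not come from this model card; substitute the actual Hub path of the model.

```python
# Hypothetical repo id: replace with the actual Hub path of this model.
MODEL_ID = "your-org/gujju-llama-7b-base"  # assumption, not from the model card


def build_generation_kwargs(max_new_tokens: int = 64, temperature: float = 0.7) -> dict:
    """Return sampling settings for generate(); the values are illustrative only."""
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "temperature": temperature,
        "top_p": 0.9,
        "repetition_penalty": 1.1,
    }


def complete(prompt: str, model_id: str = MODEL_ID) -> str:
    """Load the model and continue `prompt`. Downloads weights on first call."""
    # Imported lazily so build_generation_kwargs works without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # matches the training precision listed above
        device_map="auto",          # requires the `accelerate` package
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, **build_generation_kwargs())
    # The decoded text contains the prompt followed by the continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (not executed here because it downloads the model weights):
# print(complete("ગુજરાત ભારતનું એક રાજ્ય છે"))
```

Because this is a base model rather than an instruction-tuned one, it continues the prompt text instead of answering a request, so prompts work best when phrased as the beginning of the desired output.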

## Usage Note

These models are linguistically capable, but they have not been specifically optimized to avoid potentially harmful or offensive content. To mitigate this risk, we advise users to:

- **Exercise discretion**: Carefully consider potential implications before utilizing outputs.
- **Supervise closely**: Monitor outputs, especially in public or sensitive settings.
- **Be aware of limitations**: Remember these models are under development and may not generate perfect results in all situations.

## Meet the researchers

- [**Khyat Anjaria**](https://www.linkedin.com/in/khyat-anjaria-939693148/)
- [**Dhruv Bhatnagar**](https://www.linkedin.com/in/dhruv-bhatnagar-405684b2/)
- [**Dixit Trivedi**](https://www.linkedin.com/in/dixit-trivedi/)

This model is your gateway to unlocking the potential of the Gujarati language. Let's join forces to push the boundaries of comprehension and expression!