---
library_name: peft
license: apache-2.0
language:
- mn
- en
tags:
- Mongolian
- QLoRA
- Llama3
- Instructed-model
---
|
|
|
### Model Description

Mongolian-Llama3 implemented in a chat UI:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1LC0xx4i9xqFmwn9l8T6vw25RIr-BP0Tq?usp=sharing)

Mongolian-Llama3 is the first open-source instruction-tuned language model for Mongolian and English users. Built upon the quantized Meta-Llama-3-8B model, it supports abilities such as role-playing and tool use.
|
|
|
- **Developed by:** Dorjzodovsuren
- **License:** Llama-3 License
- **Base model:** llama-3-8b-bnb-4bit
- **Model size:** 4.65B parameters
- **Context length:** 8K tokens
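As a quick sanity check of the size figures above, you can count the parameters of the loaded 4-bit model. A minimal sketch, assuming the repositories named in this card (loading the 4-bit base requires `bitsandbytes`):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Repositories as named in this card
base = AutoModelForCausalLM.from_pretrained("unsloth/llama-3-8b-bnb-4bit")
model = PeftModel.from_pretrained(base, "Dorjzodovsuren/Mongolian_llama3")

# 4-bit layers store two weights per byte, so the count is expected to land
# near the 4.65B figure quoted above rather than the full 8B of the fp16 model.
total = sum(p.numel() for p in model.parameters())
print(f"{total / 1e9:.2f}B parameters")
```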
|
|
|
## Bias, Risks, and Limitations

To combat fake news, current strategies rely heavily on synthetic and translated data. However, these approaches have inherent biases, risks, and limitations:

1. **Synthetic Data Bias**: Algorithms may inadvertently perpetuate biases present in the training data.

2. **Translation Inaccuracy**: Translations can distort meaning or lose context, leading to misinformation.

3. **Cultural Nuances**: Synthetic and translated data may miss cultural intricacies, risking the amplification of stereotypes.

4. **Algorithmic Limits**: Effectiveness is constrained by algorithm capabilities and training data quality.

5. **Dependency on Data**: Accuracy hinges on the quality and representativeness of the training data.

6. **Adversarial Attacks**: Malicious actors can exploit vulnerabilities to manipulate content.

7. **Language-Dependent Answers**: Responses to the same question may differ depending on the language in which it is asked.
|
|
|
### Recommendations
|
|
|
|
|
|
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.

Due to hallucinations and the characteristics of the pretraining datasets, some information may be misleading, and answers may differ depending on the language of the prompt.

Please ask in **Mongolian** if possible.
|
|
|
## How to Get Started with the Model

Use the code below to get started with the model.
|
|
|
```python
import torch
import gradio as gr
from threading import Thread
from peft import PeftModel
from unsloth import FastLanguageModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
    TextIteratorStreamer,
)

# Load the 4-bit base model and attach the Mongolian LoRA adapter
model = AutoModelForCausalLM.from_pretrained("unsloth/llama-3-8b-bnb-4bit", torch_dtype=torch.float16)
model = PeftModel.from_pretrained(model, "Dorjzodovsuren/Mongolian_llama3")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Dorjzodovsuren/Mn_llama3")

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

# Enable native 2x faster inference
FastLanguageModel.for_inference(model)

# Move the model to GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

class StopOnTokens(StoppingCriteria):
    """Stop generation as soon as the last generated token is one of the stop ids."""

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        stop_ids = [29, 0]
        for stop_id in stop_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False

# NOTE: this implementation is stateless; `history` is ignored, so earlier turns
# do not influence the reply. Experimenting with the generation hyperparameters
# to compare output quality is highly recommended.
def predict(message, history):
    stop = StopOnTokens()
    messages = alpaca_prompt.format(message, "", "")

    model_inputs = tokenizer([messages], return_tensors="pt").to(device)

    # Stream tokens from the background generation thread into the UI
    streamer = TextIteratorStreamer(tokenizer, timeout=10.0, skip_prompt=True, skip_special_tokens=True)
    generate_kwargs = dict(
        model_inputs,
        streamer=streamer,
        max_new_tokens=1024,
        do_sample=True,  # sampling must be enabled for top_p/temperature to take effect
        top_p=0.95,
        temperature=0.001,
        repetition_penalty=1.1,
        stopping_criteria=StoppingCriteriaList([stop]),
    )
    t = Thread(target=model.generate, kwargs=generate_kwargs)
    t.start()

    partial_message = ""
    for new_token in streamer:
        if new_token != "<":  # drop stray fragments of special tokens
            partial_message += new_token
            yield partial_message

gr.ChatInterface(predict).launch(debug=True, share=True, show_api=True)
```
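For a quick single-turn test without the Gradio UI, the same model and prompt template can be driven directly. Below is a minimal sketch that reuses the `model`, `tokenizer`, `alpaca_prompt`, and `device` objects defined above; the question string is only an illustration:

```python
from transformers import TextStreamer

# Format one question with the Alpaca template (empty input and response slots)
prompt = alpaca_prompt.format("Монгол улсын нийслэл хаана вэ?", "", "")
inputs = tokenizer([prompt], return_tensors="pt").to(device)

# Stream the completion straight to stdout instead of a chat widget
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=256, repetition_penalty=1.1)
```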
|
|
|
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1LC0xx4i9xqFmwn9l8T6vw25RIr-BP0Tq?usp=sharing)