|
---
language:
- pt
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- mistral
- trl
base_model: unsloth/mistral-7b-bnb-4bit
datasets:
- cnmoro/GPT4-500k-Augmented-PTBR-Clean
widget:
- text: Me conte a história do Boto
model-index:
- name: boto-7B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: ENEM Challenge (No Images)
      type: eduagarcia/enem_challenge
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 59.97
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=lucianosb/boto-7B
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BLUEX (No Images)
      type: eduagarcia-temp/BLUEX_without_images
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 48.82
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=lucianosb/boto-7B
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: OAB Exams
      type: eduagarcia/oab_exams
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 43.37
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=lucianosb/boto-7B
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Assin2 RTE
      type: assin2
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: f1_macro
      value: 89.58
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=lucianosb/boto-7B
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Assin2 STS
      type: eduagarcia/portuguese_benchmark
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: pearson
      value: 69.87
      name: pearson
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=lucianosb/boto-7B
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: FaQuAD NLI
      type: ruanchaves/faquad-nli
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: f1_macro
      value: 55.57
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=lucianosb/boto-7B
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HateBR Binary
      type: ruanchaves/hatebr
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 77.04
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=lucianosb/boto-7B
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: PT Hate Speech Binary
      type: hate_speech_portuguese
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 58.84
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=lucianosb/boto-7B
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: tweetSentBR
      type: eduagarcia-temp/tweetsentbr
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 57.2
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=lucianosb/boto-7B
      name: Open Portuguese LLM Leaderboard
---
|
|
|
# Boto 7B |
|
|
|
<img src="https://i.imgur.com/xscFgqH.png" alt="logo do boto cor-de-rosa" width="300px" /> |
|
|
|
Boto is a fine-tune of Mistral 7B for the Portuguese language. Boto is quite "talkative": by default its responses tend to be long and not always to the point.

Try the [online demo](https://huggingface.co/spaces/lucianosb/boto-7B). And sing along:

[![Foi Boto Sinhá](https://markdown-videos-api.jorgenkh.no/url?url=https%3A%2F%2Fyoutu.be%2FxSyuWFvI9_8%3Fsi%3DSzIMawwQ6sF_xhZK)](https://youtu.be/xSyuWFvI9_8?si=SzIMawwQ6sF_xhZK)

Boto is a Portuguese name given to several kinds of dolphins and river dolphins native to the Amazon and the tributaries of the Orinoco River. A few botos live exclusively in fresh water, and these are often considered primitive dolphins.

According to local folklore, the boto of the Amazon River regions of northern Brazil takes the form of a human or merman, is known as the Boto cor-de-rosa ("pink boto" in Portuguese), and has the habit of seducing human women and impregnating them.

Evaluation results are reported in the Metrics and Open Portuguese LLM Leaderboard sections below.
|
|
|
## How to Run on Colab T4 |
|
|
|
```python |
|
from transformers import AutoTokenizer, pipeline
import torch

model_id = "lucianosb/boto-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the model in float16 on the GPU through the text-generation pipeline.
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.float16,
    device_map="cuda:0"
)

# Alpaca-style prompt template with the instruction written in Portuguese.
def make_prompt(question):
    return f"""Abaixo está uma instrução que descreve uma tarefa, combinada com uma entrada que fornece contexto adicional.
Escreva uma resposta que complete adequadamente a solicitação.

### Instruction:
{question}

### Response:
"""

question = "Conte a história do boto"
prompt = make_prompt(question)

# Sampling settings: moderate temperature/top_p plus a repetition penalty.
sequences = pipe(
    prompt,
    do_sample=True,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=2048,
    temperature=0.9,
    top_p=0.6,
    repetition_penalty=1.15
)

print(sequences[0]["generated_text"])
|
``` |
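
If the float16 weights are a tight fit in a T4's 16 GB of memory, the model can also be loaded with 4-bit quantization through bitsandbytes. This is a minimal sketch, not part of the original recipe, and assumes the `bitsandbytes` and `accelerate` packages are installed; output quality may differ slightly from the float16 setup above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

model_id = "lucianosb/boto-7B"

# NF4 4-bit quantization with float16 compute: roughly 4 GB of weights instead of ~14 GB.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# The same prompt and sampling settings as above can be reused with this pipeline.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
```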
|
|
|
## Metrics
|
|
|
| Tasks                         | Version | Filter | n-shot | Metric   |  Value |   | Stderr |
|-------------------------------|--------:|--------|-------:|----------|-------:|---|-------:|
| bluex                         |     1.1 | all    |      3 | acc      | 0.0083 | ± | 0.0020 |
| enem                          |     1.1 | all    |      3 | acc      | 0.0014 | ± | 0.0006 |
| oab_exams                     |     1.5 | all    |      3 | acc      | 0.0096 | ± | 0.0012 |
| assin2_rte                    |     1.1 | all    |     15 | f1_macro | 0.9032 | ± | 0.0042 |
|                               |         | all    |     15 | acc      | 0.9032 | ± | 0.0042 |
| assin2_sts                    |     1.1 | all    |     15 | pearson  | 0.4912 | ± | 0.0141 |
|                               |         | all    |     15 | mse      | 1.3185 | ± | N/A    |
| faquad_nli                    |     1.1 | all    |     15 | f1_macro | 0.6104 | ± | 0.0137 |
|                               |         | all    |     15 | acc      | 0.6292 | ± | 0.0134 |
| hatebr_offensive_binary       |       1 | all    |     25 | f1_macro | 0.7888 | ± | 0.0078 |
|                               |         | all    |     25 | acc      | 0.7936 | ± | 0.0077 |
| portuguese_hate_speech_binary |       1 | all    |     25 | f1_macro | 0.5503 | ± | 0.0121 |
|                               |         | all    |     25 | acc      | 0.5523 | ± | 0.0121 |
|
|
|
|
|
# Uploaded model |
|
|
|
- **Developed by:** lucianosb |
|
- **License:** apache-2.0 |
|
- **Finetuned from model:** unsloth/mistral-7b-bnb-4bit
|
|
|
This Mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
|
|
|
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth) |
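
For anyone who wants to reproduce a similar fine-tune, the sketch below follows the usual Unsloth + TRL supervised fine-tuning pattern, starting from the same 4-bit base model. It is an illustration only: the LoRA settings, training hyperparameters, the dataset split and `text` field name, and the classic `SFTTrainer` keyword arguments (as in the older `trl` releases used by Unsloth's notebooks) are assumptions, not the exact recipe behind boto-7B.

```python
import torch
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

max_seq_length = 2048

# Load the 4-bit base model that boto-7B was fine-tuned from.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",
    max_seq_length=max_seq_length,
    dtype=None,          # let Unsloth pick float16 or bfloat16
    load_in_4bit=True,
)

# Attach LoRA adapters; rank and target modules here are illustrative defaults.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing=True,
)

# Instruction data listed on this card; the split and the formatted "text"
# column are assumptions -- adapt to how the prompts are actually templated.
dataset = load_dataset("cnmoro/GPT4-500k-Augmented-PTBR-Clean", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        num_train_epochs=1,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=10,
        output_dir="outputs",
    ),
)
trainer.train()
```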
|
# [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard) |
|
Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/lucianosb/boto-7B).
|
|
|
| Metric                     |   Value   |
|----------------------------|-----------|
| Average                    | **62.25** |
| ENEM Challenge (No Images) |   59.97   |
| BLUEX (No Images)          |   48.82   |
| OAB Exams                  |   43.37   |
| Assin2 RTE                 |   89.58   |
| Assin2 STS                 |   69.87   |
| FaQuAD NLI                 |   55.57   |
| HateBR Binary              |   77.04   |
| PT Hate Speech Binary      |   58.84   |
| tweetSentBR                |   57.20   |
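
For reference, the headline Average is consistent with the simple arithmetic mean of the nine task scores above:

```python
# Sanity check: the leaderboard "Average" matches the plain mean of the nine task scores.
scores = [59.97, 48.82, 43.37, 89.58, 69.87, 55.57, 77.04, 58.84, 57.20]
print(round(sum(scores) / len(scores), 2))  # 62.25
```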
|
|
|
|