---
language:
- pt
datasets:
- adalbertojunior/dolphin_pt_test
model-index:
- name: Llama-3-8B-Instruct-Portuguese-v0.2-fft
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: ENEM Challenge (No Images)
      type: eduagarcia/enem_challenge
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 59.69
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Llama-3-8B-Instruct-Portuguese-v0.2-fft
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BLUEX (No Images)
      type: eduagarcia-temp/BLUEX_without_images
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 44.37
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Llama-3-8B-Instruct-Portuguese-v0.2-fft
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: OAB Exams
      type: eduagarcia/oab_exams
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 39.09
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Llama-3-8B-Instruct-Portuguese-v0.2-fft
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Assin2 RTE
      type: assin2
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: f1_macro
      value: 91.54
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Llama-3-8B-Instruct-Portuguese-v0.2-fft
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Assin2 STS
      type: eduagarcia/portuguese_benchmark
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: pearson
      value: 77.89
      name: pearson
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Llama-3-8B-Instruct-Portuguese-v0.2-fft
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: FaQuAD NLI
      type: ruanchaves/faquad-nli
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: f1_macro
      value: 68.51
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Llama-3-8B-Instruct-Portuguese-v0.2-fft
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HateBR Binary
      type: ruanchaves/hatebr
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 82.27
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Llama-3-8B-Instruct-Portuguese-v0.2-fft
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: PT Hate Speech Binary
      type: hate_speech_portuguese
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 63.01
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Llama-3-8B-Instruct-Portuguese-v0.2-fft
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: tweetSentBR
      type: eduagarcia/tweetsentbr_fewshot
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 67.48
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Llama-3-8B-Instruct-Portuguese-v0.2-fft
      name: Open Portuguese LLM Leaderboard
---
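## How to Use
```
import torch
import transformers

model_id = "adalbertojunior/Llama-3-8B-Instruct-Portuguese-v0.2-fft"

# Load the model in bfloat16 and let Accelerate place it on the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# The prompts stay in Portuguese because the model is tuned for Portuguese.
# System: "You are a pirate robot who always answers the way a pirate should!"
# User: "Who are you?"
messages = [
    {"role": "system", "content": "Você é um robô pirata que sempre responde como um pirata deveria!"},
    {"role": "user", "content": "Quem é você?"},
]

# Render the conversation with the tokenizer's chat template,
# ending with the header that opens the assistant turn.
prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Stop at the regular EOS token or at the end-of-turn marker.
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|im_end|>"),
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# The pipeline returns prompt + completion; print only the completion.
print(outputs[0]["generated_text"][len(prompt):])
```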
### Prompt format
```
<|im_start|>system
Você é um assistente útil com respostas curtas.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
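
If the tokenizer's chat template is not at hand, the layout above can be approximated by hand. The following is a minimal sketch, not the model's official template: the function name `build_chatml_prompt` is hypothetical, and the newline placement (including the trailing newline after `assistant`) is an assumption read off the template shown here.

```python
# Hypothetical helper: renders one system turn and one user turn in the
# <|im_start|> / <|im_end|> layout shown in the prompt format above.
DEFAULT_SYSTEM = "Você é um assistente útil com respostas curtas."


def build_chatml_prompt(user_prompt: str, system_prompt: str = DEFAULT_SYSTEM) -> str:
    """Return a prompt string that ends with an open assistant turn."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        # Assumption: the assistant header is followed by a newline.
        "<|im_start|>assistant\n"
    )


prompt = build_chatml_prompt("Qual é a capital do Brasil?")
print(prompt)
```

When the tokenizer is available, `apply_chat_template` is likely the safer route, since it uses whatever template ships with the model rather than a hand-written copy.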


# Open Portuguese LLM Leaderboard Evaluation Results

Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/adalbertojunior/Llama-3-8B-Instruct-Portuguese-v0.2-fft) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard).

| Metric                     | Value |
|----------------------------|-------|
| Average                    | 65.98 |
| ENEM Challenge (No Images) | 59.69 |
| BLUEX (No Images)          | 44.37 |
| OAB Exams                  | 39.09 |
| Assin2 RTE                 | 91.54 |
| Assin2 STS                 | 77.89 |
| FaQuAD NLI                 | 68.51 |
| HateBR Binary              | 82.27 |
| PT Hate Speech Binary      | 63.01 |
| tweetSentBR                | 67.48 |
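
The Average row can be checked against the nine task scores. The sketch below assumes the leaderboard average is the unweighted arithmetic mean of the per-task values, which is consistent with the numbers in the table but is not stated in this card.

```python
# Per-task scores copied from the results table.
scores = {
    "ENEM Challenge (No Images)": 59.69,
    "BLUEX (No Images)": 44.37,
    "OAB Exams": 39.09,
    "Assin2 RTE": 91.54,
    "Assin2 STS": 77.89,
    "FaQuAD NLI": 68.51,
    "HateBR Binary": 82.27,
    "PT Hate Speech Binary": 63.01,
    "tweetSentBR": 67.48,
}

# Assumption: every task carries equal weight in the average.
average = sum(scores.values()) / len(scores)
print(f"Average: {average:.2f}")
```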