---
language:
- zh
- en
license: apache-2.0
datasets:
- Azure99/blossom-chat-v3
- Azure99/blossom-math-v4
- Azure99/blossom-wizard-v3
- Azure99/blossom-orca-v3
model-index:
- name: blossom-v5.1-34b
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: ENEM Challenge (No Images)
      type: eduagarcia/enem_challenge
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 72.64
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Azure99/blossom-v5.1-34b
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BLUEX (No Images)
      type: eduagarcia-temp/BLUEX_without_images
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 67.18
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Azure99/blossom-v5.1-34b
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: OAB Exams
      type: eduagarcia/oab_exams
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 54.44
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Azure99/blossom-v5.1-34b
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Assin2 RTE
      type: assin2
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: f1_macro
      value: 90.88
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Azure99/blossom-v5.1-34b
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Assin2 STS
      type: eduagarcia/portuguese_benchmark
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: pearson
      value: 82.94
      name: pearson
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Azure99/blossom-v5.1-34b
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: FaQuAD NLI
      type: ruanchaves/faquad-nli
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: f1_macro
      value: 81.88
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Azure99/blossom-v5.1-34b
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HateBR Binary
      type: ruanchaves/hatebr
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 85.2
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Azure99/blossom-v5.1-34b
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: PT Hate Speech Binary
      type: hate_speech_portuguese
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 72.09
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Azure99/blossom-v5.1-34b
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: tweetSentBR
      type: eduagarcia/tweetsentbr_fewshot
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 72.13
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Azure99/blossom-v5.1-34b
      name: Open Portuguese LLM Leaderboard
---

# **BLOSSOM-v5.1-34b**

[💻Github](https://github.com/Azure99/BlossomLM) • [🚀Blossom Chat Demo](https://blossom-chat.com/)

### Introduction

Blossom is a conversational large language model, fine-tuned on the Blossom
Orca/Wizard/Chat/Math mixed dataset, based on the Yi-1.5-34B pre-trained model. Blossom possesses robust general capabilities and context comprehension. Additionally, the high-quality Chinese and English datasets used for training have been open-sourced.

Training was conducted in two stages. The first stage used 40K Wizard, 40K Orca, and 10K Math single-turn instruction samples, trained for 1 epoch; the second stage used 10K Blossom Chat multi-turn dialogue samples plus 10% of the data randomly sampled from the first stage, trained for 3 epochs.

### Inference

Inference is performed in the form of dialogue continuation.

Single-turn dialogue:

```
A chat between a human and an artificial intelligence bot. The bot gives helpful, detailed, and polite answers to the human's questions.
|Human|: hello
|Bot|: 
```

Multi-turn dialogue:

```
A chat between a human and an artificial intelligence bot. The bot gives helpful, detailed, and polite answers to the human's questions.
|Human|: hello
|Bot|: Hello! How can I assist you today?<|endoftext|>
|Human|: Generate a random number using python
|Bot|: 
```

Note: Append `<|endoftext|>` to the end of each Bot turn in the conversation history. A minimal Python sketch of this prompt construction is included at the end of this card.

# Open Portuguese LLM Leaderboard Evaluation Results

Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/Azure99/blossom-v5.1-34b) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)

| Metric                    | Value   |
|---------------------------|---------|
| Average                   |**75.49**|
| ENEM Challenge (No Images)| 72.64   |
| BLUEX (No Images)         | 67.18   |
| OAB Exams                 | 54.44   |
| Assin2 RTE                | 90.88   |
| Assin2 STS                | 82.94   |
| FaQuAD NLI                | 81.88   |
| HateBR Binary             | 85.20   |
| PT Hate Speech Binary     | 72.09   |
| tweetSentBR               | 72.13   |
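
### Example: building the prompt in Python

The dialogue-continuation format described in the Inference section can be assembled programmatically. The following is a minimal sketch, assuming the model is loaded with Hugging Face `transformers` under the repo id `Azure99/blossom-v5.1-34b`; the `build_prompt` helper and the generation settings are illustrative assumptions, not part of the released code.

```python
# Minimal sketch: assemble the Blossom dialogue-continuation prompt and generate a reply.
# Assumes the `transformers` library; `build_prompt` is an illustrative helper, not official code.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Azure99/blossom-v5.1-34b"
SYSTEM = ("A chat between a human and an artificial intelligence bot. "
          "The bot gives helpful, detailed, and polite answers to the human's questions.")

def build_prompt(history, user_message):
    """history: list of (human, bot) turns; each past Bot turn is closed with <|endoftext|>."""
    prompt = SYSTEM + "\n"
    for human, bot in history:
        prompt += f"|Human|: {human}\n|Bot|: {bot}<|endoftext|>\n"
    # The current turn ends with the open "|Bot|: " tag for the model to continue.
    prompt += f"|Human|: {user_message}\n|Bot|: "
    return prompt

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto", torch_dtype="auto")

history = [("hello", "Hello! How can I assist you today?")]
prompt = build_prompt(history, "Generate a random number using python")

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens (the Bot's reply).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

In practice you would likely also stop generation on `<|endoftext|>` (for example via `eos_token_id`); whether that token is already the tokenizer's default EOS for this model is an assumption worth verifying.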