leaderboard-pr-bot's picture
Adding Evaluation Results
65922ec verified
|
raw
history blame
6.2 kB
metadata
language:
  - es
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
inference: false
model-index:
  - name: Llama-2-ft-instruct-es
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 22.7
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=clibrain/Llama-2-ft-instruct-es
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 25.04
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=clibrain/Llama-2-ft-instruct-es
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 23.12
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=clibrain/Llama-2-ft-instruct-es
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 0
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=clibrain/Llama-2-ft-instruct-es
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 49.57
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=clibrain/Llama-2-ft-instruct-es
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 0
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=clibrain/Llama-2-ft-instruct-es
          name: Open LLM Leaderboard

Llama-2-ft-instruct-es

鈿狅笍 Please go to clibrain/Llama-2-7b-ft-instruct-es for the fixed and updated version.

Llama 2 (7B) fine-tuned on Clibrain's Spanish instructions dataset.

Model Details

Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 7B pretrained model. Links to other models can be found in the index at the bottom.

Example of Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoTokenizer, GenerationConfig

model_id = "clibrain/Llama-2-ft-instruct-es"

model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)

def create_instruction(instruction, input_data=None, context=None):
    sections = {
        "Instrucci贸n": instruction,
        "Entrada": input_data,
        "Contexto": context,
    }

    system_prompt = "A continuaci贸n hay una instrucci贸n que describe una tarea, junto con una entrada que proporciona m谩s contexto. Escriba una respuesta que complete adecuadamente la solicitud.\n\n"
    prompt = system_prompt

    for title, content in sections.items():
        if content is not None:
            prompt += f"### {title}:\n{content}\n\n"

    prompt += "### Respuesta:\n"

    return prompt


def generate(
        instruction,
        input=None,
        context=None,
        max_new_tokens=128,
        temperature=0.1,
        top_p=0.75,
        top_k=40,
        num_beams=4,
        **kwargs
):
    
    prompt = create_instruction(instruction, input, context)
    print(prompt.replace("### Respuesta:\n", ""))
    inputs = tokenizer(prompt, return_tensors="pt")
    input_ids = inputs["input_ids"].to("cuda")
    attention_mask = inputs["attention_mask"].to("cuda")
    generation_config = GenerationConfig(
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        num_beams=num_beams,
        **kwargs,
    )
    with torch.no_grad():
        generation_output = model.generate(
            input_ids=input_ids,
            attention_mask=attention_mask,
            generation_config=generation_config,
            return_dict_in_generate=True,
            output_scores=True,
            max_new_tokens=max_new_tokens,
            early_stopping=True
        )
    s = generation_output.sequences[0]
    output = tokenizer.decode(s)
    return output.split("### Respuesta:")[1].lstrip("\n")

instruction = "Dame una lista de lugares a visitar en Espa帽a."
print(generate(instruction))

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric Value
Avg. 20.07
AI2 Reasoning Challenge (25-Shot) 22.70
HellaSwag (10-Shot) 25.04
MMLU (5-Shot) 23.12
TruthfulQA (0-shot) 0.00
Winogrande (5-shot) 49.57
GSM8k (5-shot) 0.00