Model Description

Overview

This model is a fine-tuned version of Llama 3.1, tailored for question-answering tasks. It was trained with the unsloth library on a custom dataset formatted in the Alpaca prompt style and is designed to generate accurate answers, along with explanations, in response to user queries.

Architecture

  • Base Model: unsloth/Meta-Llama-3.1-8B
  • Model Size: 8 Billion parameters
  • Architecture Type: Transformer-based Language Model
  • Modifications: Fine-tuned on a custom dataset using unsloth with 4-bit quantization for efficient training.
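
The original training script is not included in this card, but unsloth 4-bit fine-tuning is typically done by attaching LoRA adapters (QLoRA-style) to the quantized base model. The sketch below illustrates that pattern only; the adapter rank, alpha, and target modules are illustrative assumptions, not values recorded for this model.

from unsloth import FastLanguageModel

# Sketch of a QLoRA-style unsloth setup (assumed, not the original training script).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=512,      # training max sequence length (see Hyperparameters)
    dtype=None,              # let unsloth choose float16 or bfloat16 for the GPU
    load_in_4bit=True,       # 4-bit quantization as described above
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                    # assumed LoRA rank
    lora_alpha=16,           # assumed scaling factor
    lora_dropout=0,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    random_state=3407,       # seed listed under Hyperparameters
)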

Hyperparameters

  • Maximum Sequence Length: 512 tokens
  • Batch Size: 4 (per device)
  • Gradient Accumulation Steps: 4
  • Learning Rate: 2e-4
  • Optimizer: adamw_8bit
  • Weight Decay: 0.01
  • Learning Rate Scheduler: Linear
  • Number of Epochs: 1
  • Warmup Steps: 5
  • Max Training Steps: 60
  • Seed: 3407
  • Mixed Precision Training:
    • FP16: Enabled if BF16 is not supported
    • BF16: Enabled if supported by the hardware
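
These settings map directly onto TRL's SFTTrainer. The sketch below shows one way to wire them up, assuming the older SFTTrainer signature used in unsloth examples (newer TRL releases move some arguments into SFTConfig); `model`, `tokenizer`, and `train_dataset` are placeholders, and the `dataset_text_field` and `output_dir` names are assumptions.

import torch
from trl import SFTTrainer
from transformers import TrainingArguments

# Sketch only: maps the hyperparameters above onto SFTTrainer.
# `model`, `tokenizer`, and `train_dataset` are assumed to exist already.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",            # assumed name of the formatted prompt column
    max_seq_length=512,
    args=TrainingArguments(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        num_train_epochs=1,
        warmup_steps=5,
        max_steps=60,                     # overrides num_train_epochs when set
        seed=3407,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        output_dir="outputs",             # placeholder output directory
    ),
)
trainer.train()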

Intended Use

Primary Use Cases

  • Question Answering: The model answers user queries and provides explanations based on the context provided in the dataset.
  • Educational Tools: Can be used in applications that require answering questions with additional explanations.

Users

  • Developers: Integrating the model into applications requiring question-answering capabilities.
  • Researchers: Studying fine-tuning techniques on large language models.

Out-of-Scope Uses

  • Undefined Domains: The model may not perform well on queries outside the scope of the training data.
  • Sensitive Content: Should not be used for generating content that includes disallowed or harmful information.

Ethical Considerations

Potential Risks

  • Misinformation: The model might generate incorrect or misleading answers if the input is ambiguous or out-of-scope.
  • Bias: Without a bias analysis, there is a risk of the model exhibiting unintended biases present in the training data.

Mitigation Strategies

  • User Review: Outputs should be reviewed by a human for critical applications.
  • Further Evaluation: Bias and fairness assessments are recommended before deployment.

Training and Evaluation Environment

  • Hardware Used: A single NVIDIA Tesla T4 GPU
  • Software and Libraries:
    • Python Version: Python 3.8
    • Transformers Library: Transformers 4.8
    • Unsloth Library: Version as used in the accompanying code (not pinned)
    • TRL (Transformer Reinforcement Learning): Used for SFTTrainer
    • Pandas: For data handling
  • Training Time: Approximately 4 hours (4:00:00)

Usage Instructions

Installation

  1. Clone the Repository: [If applicable]
  2. Install Dependencies:
    pip install unsloth transformers trl pandas torch
      

Loading the Model

from unsloth import FastLanguageModel
import torch

max_seq_length = 2048    # context window used for inference
dtype = None             # None lets unsloth pick float16 or bfloat16 for the GPU
load_in_4bit = True      # load the weights in 4-bit to reduce memory use

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    device_map="auto",
)
    

Input Format

  • Expected Input: A user query formatted according to the Alpaca prompt template.
  • Example:
    Below is an instruction that describes a task, paired with an appropriate response.

    ### Instruction:
    User Query: How do I block my credit card?

    ### Input:
    None

    ### Response:
    Answer:
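
For convenience, the template can also be filled programmatically. The helper below is a sketch rather than part of the original code: the template string mirrors the example above, and the function name, `{query}`/`{context}` placeholders, and the default "None" input are assumptions.

# Hypothetical helper that fills the Alpaca-style template shown above.
ALPACA_TEMPLATE = """Below is an instruction that describes a task, paired with an appropriate response.

### Instruction:
User Query: {query}

### Input:
{context}

### Response:
Answer:"""

def build_prompt(query: str, context: str = "None") -> str:
    return ALPACA_TEMPLATE.format(query=query, context=context)

prompt = build_prompt("How do I block my credit card?")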
      

Output Format

  • The model generates the answer and an explanation following the prompt.
  • Example Output:
    Answer: Paris

    Explanation: Paris is the capital city of France.
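
Applications that need the two fields separately can split the generated text on these labels. This is a sketch that assumes the model reliably emits the "Answer:" and "Explanation:" markers shown above; outputs should still be validated.

# Sketch: split a completion into answer and explanation, assuming the
# "Answer: ... Explanation: ..." pattern shown in the example output.
def split_answer(generated: str):
    answer, _, explanation = generated.partition("Explanation:")
    return answer.replace("Answer:", "", 1).strip(), explanation.strip()

answer, explanation = split_answer(
    "Answer: Paris\n\nExplanation: Paris is the capital city of France."
)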
      

Inference Example

input_text = '''Below is an instruction that describes a task, paired with an appropriate response.

### Instruction:
User Query: How do I block my credit card?

### Input:
None

### Response:
Answer:'''

# Tokenize the prompt and move it to the same device as the model.
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate up to 50 new tokens for the answer and explanation.
outputs = model.generate(**inputs, max_new_tokens=50)

# Decode and print the full sequence (prompt plus generated text).
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
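
Two optional refinements, assuming a recent unsloth release: FastLanguageModel.for_inference switches the model into unsloth's faster generation mode (call it before model.generate), and decoding only the tokens that follow the prompt avoids echoing the prompt in the printed output.

# Optional (assumes a recent unsloth release): enable the faster inference path
# before calling model.generate().
FastLanguageModel.for_inference(model)

# Print only the newly generated text, without the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))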
    

Contact Information

  • Support Email: [email protected]
  • GitHub Repository: To be updated
  • Feedback: Users are encouraged to report issues or provide feedback.

Acknowledgments

  • Base Model: This model is built upon unsloth/Meta-Llama-3.1-8B.
  • Libraries Used: Thanks to the developers of Unsloth, Transformers, TRL, and the other libraries that made this work possible.

Changelog

  • Version 1.0: Initial release, fine-tuned on a custom question-answering dataset.