NDMO Chroma DB - Fine-Tuned LLM

Model Description

This model has been fine-tuned on the NDMO Chroma DB Dataset, a collection of key documents related to data governance, privacy, and artificial intelligence (AI) regulations. The fine-tuning process enhances the model's ability to understand and generate responses related to these domains.

Developed by:

  • Jyad Aljohani
  • Abdulrahman Aljohani
  • Ryan Alshehri
  • Saud Altuwaijri
  • Ziyad Alharthi

Model Type:

Causal Language Model (CAUSAL_LM)

Language(s):

English

License:

[Specify License]

Finetuned from model:

meta-llama/Llama-2-7b-chat-hf


Model Sources

Repository:

[More Information Needed]

Paper [optional]:

[More Information Needed]

Demo [optional]:

[More Information Needed]


Uses

Direct Use

This model is designed for:

  • Answering questions on data governance, AI regulations, and privacy policies.
  • Assisting compliance professionals with regulatory inquiries.
  • Supporting AI policy research and development.

Downstream Use [optional]

  • Chatbots and virtual assistants focused on AI and data privacy compliance.
  • Automated document summarization for legal and regulatory documents.
  • Integration into AI governance frameworks.

Out-of-Scope Use

  • The model is not designed for providing legally binding advice.
  • Not suitable for tasks requiring real-time regulatory updates.

Bias, Risks, and Limitations

  • The model may reflect biases present in the training data.
  • It may not generalize well to regulations not covered in the dataset.
  • Users should verify outputs against official regulatory sources.

Recommendations

  • Users should cross-check information with official legal sources.
  • Outputs should be reviewed by regulatory professionals for critical applications.

How to Get Started with the Model

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

model_name = "IJyad/llama-2-7b-NDMO-agent"
base_model = "meta-llama/Llama-2-7b-chat-hf"

# Load base model in 4-bit quantized mode
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)

# Load LoRA adapters
model = PeftModel.from_pretrained(model, model_name)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
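
Once the adapters are loaded, a minimal inference example looks like the following (the prompt and generation settings are illustrative; Llama-2's [INST] chat format is assumed):

# Example query in the Llama-2 chat format (illustrative prompt and settings)
prompt = "[INST] What does the Data Classification Policy require before data is shared? [/INST]"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))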

Training Details

Training Data

The model was fine-tuned using the NDMO Chroma DB Dataset, which consists of key regulatory documents, including:

  • AI Principles
  • Data Classification Policy
  • Data Sharing Policy
  • Implementing Regulations
  • Personal Data Protection Guidelines
  • Generative AI Public Guidelines

NDMO Chroma DB Dataset

Training Procedure

Preprocessing

  • Data was extracted, cleaned, and formatted into question-answer pairs (see the formatting sketch after this list).
  • Documents were structured to maximize context retention.
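
As a rough sketch of that formatting step (the helper function, example text, and prompt template below are assumptions, since the exact preprocessing script is not published), each question-answer pair can be rendered into the Llama-2 chat template before tokenization:

# Hypothetical formatting of one QA pair into the Llama-2 chat template
def format_example(question: str, answer: str) -> str:
    return f"<s>[INST] {question} [/INST] {answer} </s>"

sample = format_example(
    "What is the purpose of the Data Sharing Policy?",
    "It defines the conditions under which data may be shared between entities.",
)
print(sample)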

Training Hyperparameters

  • Epochs: 1
  • Batch Size: 10
  • Gradient Accumulation Steps: 1
  • Learning Rate: 2e-4
  • Optimizer: paged_adamw_8bit
  • Scheduler: Linear decay with warmup steps
  • Evaluation Strategy: Steps-based (see the configuration sketch after this list)
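
These settings map roughly onto a Hugging Face TrainingArguments configuration, sketched below; the warmup and evaluation step counts are placeholders, since the exact values are not reported:

from transformers import TrainingArguments

# Approximate reconstruction of the reported hyperparameters (illustrative)
training_args = TrainingArguments(
    output_dir="./llama-2-7b-NDMO-agent",
    num_train_epochs=1,
    per_device_train_batch_size=10,
    gradient_accumulation_steps=1,
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    lr_scheduler_type="linear",
    warmup_steps=50,                 # placeholder: warmup step count not reported
    evaluation_strategy="steps",
    eval_steps=100,                  # placeholder: evaluation interval not reported
    fp16=True,
)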

Evaluation

Testing Data, Factors & Metrics

Testing Data

  • Held-out subset of the NDMO Chroma DB Dataset

Factors Considered

  • Accuracy in responding to regulatory and AI policy-related queries.
  • Coherence and relevance of generated text.

Metrics Used

  • Perplexity: Measures how well the model predicts held-out text (lower is better).
  • BLEU Score: Measures n-gram overlap between generated answers and reference answers.
  • Human Evaluation: Subject-matter experts assessed the correctness of model outputs.
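
As an illustration of how the two automatic metrics above could be computed (a sketch using the evaluate library and placeholder values; the actual evaluation script is not published):

import math
import evaluate

# Perplexity is exp of the average cross-entropy loss on the held-out set
# (1.85 is a placeholder; the real value would come from trainer.evaluate()["eval_loss"])
eval_loss = 1.85
perplexity = math.exp(eval_loss)

# BLEU compares generated answers against reference answers (toy examples)
bleu = evaluate.load("bleu")
result = bleu.compute(
    predictions=["Data must be classified before it is shared."],
    references=[["Data must be classified prior to any sharing."]],
)

print(f"perplexity: {perplexity:.2f}, BLEU: {result['bleu']:.3f}")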

Results

  • Perplexity Score: [More Information Needed]
  • BLEU Score: [More Information Needed]
  • Human Evaluation Accuracy: [More Information Needed]

Environmental Impact

The model was fine-tuned on cloud-based infrastructure.

Technical Specifications

Model Architecture and Objective

  • Architecture: Transformer-based causal language model.
  • Fine-Tuned Objective: Text generation and AI policy understanding.

Compute Infrastructure

  • Software: Transformers, BitsAndBytes, PEFT, Hugging Face Trainer.
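
Given that PEFT and BitsAndBytes are listed above, the adapter setup was presumably along these lines (a hedged sketch; the LoRA rank, alpha, dropout, and target modules are assumptions, not reported values):

from peft import LoraConfig, get_peft_model

# Illustrative LoRA configuration; "model" is the 4-bit base model loaded as in the
# getting-started snippet above, and every numeric value here is assumed
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()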

Model Card Authors

  • Jyad Aljohani

Contact

For further inquiries, feel free to reach out!
