NDMO Chroma DB - Fine-Tuned LLM
Model Description
This model has been fine-tuned on the NDMO Chroma DB Dataset, a collection of key documents related to data governance, privacy, and artificial intelligence (AI) regulations. The fine-tuning process enhances the model's ability to understand and generate responses related to these domains.
Developed by:
- Jyad Aljohani
- Abdulrahman Aljohani
- Ryan Alshehri
- Saud Altuwaijri
- Ziyad Alharthi
Model Type:
Causal Language Model (CAUSAL_LM)
Language(s):
English
License:
[Specify License]
Finetuned from model:
[meta-llama/Llama-2-7b-chat-hf]
Model Sources
Repository:
[More Information Needed]
Paper [optional]:
[More Information Needed]
Demo [optional]:
[More Information Needed]
Uses
Direct Use
This model is designed for:
- Answering questions on data governance, AI regulations, and privacy policies.
- Assisting compliance professionals with regulatory inquiries.
- Supporting AI policy research and development.
Downstream Use [optional]
- Chatbots and virtual assistants focused on AI and data privacy compliance.
- Automated document summarization for legal and regulatory documents.
- Integration into AI governance frameworks.
Out-of-Scope Use
- The model is not designed for providing legally binding advice.
- Not suitable for tasks requiring real-time regulatory updates.
Bias, Risks, and Limitations
- The model may reflect biases present in the training data.
- It may not generalize well to regulations not covered in the dataset.
- Users should verify outputs against official regulatory sources.
Recommendations
- Users should cross-check information with official legal sources.
- Outputs should be reviewed by regulatory professionals for critical applications.
How to Get Started with the Model
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

model_name = "IJyad/llama-2-7b-NDMO-agent"
base_model = "meta-llama/Llama-2-7b-chat-hf"

# Load the base model in 4-bit quantized mode
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)

# Load the fine-tuned LoRA adapters on top of the base model
model = PeftModel.from_pretrained(model, model_name)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
```
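Once loaded, the model expects prompts in the Llama-2 chat format. The helper below is an illustrative sketch (the exact system prompt and template used during fine-tuning are assumptions, not confirmed details of this model):

```python
def build_prompt(
    user_question: str,
    system_prompt: str = "You answer questions about NDMO data governance policies.",
) -> str:
    """Wrap a question in the standard Llama-2 chat template (hypothetical helper)."""
    return f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_question} [/INST]"

prompt = build_prompt("What does the Data Classification Policy cover?")
```

The resulting `prompt` string can then be tokenized with `tokenizer(prompt, return_tensors="pt")` and passed to `model.generate(...)`.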
Training Details
Training Data
The model was fine-tuned using the NDMO Chroma DB Dataset, which consists of key regulatory documents, including:
- AI Principles
- Data Classification Policy
- Data Sharing Policy
- Implementing Regulations
- Personal Data Protection Guidelines
- Generative AI Public Guidelines
Training Procedure
Preprocessing
- Data was extracted, cleaned, and formatted into question-answer pairs.
- Documents were structured to maximize context retention.
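A minimal sketch of the question-answer formatting step described above, assuming a simple JSON Lines layout (the field names and example pair here are illustrative, not the dataset's actual schema):

```python
import json

def make_record(question: str, answer: str) -> dict:
    """Build one cleaned QA record; field names are illustrative."""
    return {"question": question.strip(), "answer": answer.strip()}

pairs = [
    ("What is the purpose of the Data Sharing Policy?",
     "It sets the conditions under which data may be shared between entities."),
]
# Serialize as JSON Lines, one record per line
jsonl = "\n".join(json.dumps(make_record(q, a), ensure_ascii=False) for q, a in pairs)
```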
Training Hyperparameters
- Epochs: 1
- Batch Size: 10
- Gradient Accumulation Steps: 1
- Learning Rate: 2e-4
- Optimizer: paged_adamw_8bit
- Scheduler: Linear decay with warmup steps
- Evaluation Strategy: Steps-based
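The linear-decay-with-warmup schedule above can be sketched as a pure function of the step count; the peak learning rate matches the hyperparameters listed, while the warmup length and total steps are placeholder assumptions:

```python
def linear_warmup_decay_lr(step: int, total_steps: int, warmup_steps: int,
                           peak_lr: float = 2e-4) -> float:
    """Learning rate: linear ramp up to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)
```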
Evaluation
Testing Data, Factors & Metrics
Testing Data
- Held-out subset of the NDMO Chroma DB Dataset
Factors Considered
- Accuracy in responding to regulatory and AI policy-related queries.
- Coherence and relevance of generated text.
Metrics Used
- Perplexity: Measures how well the model predicts held-out text (lower is better).
- BLEU Score: Measures n-gram overlap between generated and reference answers.
- Human Evaluation: Subject matter experts assessed output correctness.
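For reference, perplexity is the exponential of the mean per-token negative log-likelihood; a minimal stdlib sketch (the sample values are illustrative, not scores from this model):

```python
import math

def perplexity(token_nlls: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model that assigns every token probability 1/4 has perplexity 4
nlls = [math.log(4)] * 5
ppl = perplexity(nlls)  # → 4.0
```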
Results
- Perplexity Score: [More Information Needed]
- BLEU Score: [More Information Needed]
- Human Evaluation Accuracy: [More Information Needed]
Environmental Impact
The model was fine-tuned on cloud-based infrastructure.
Technical Specifications
Model Architecture and Objective
- Architecture: Transformer-based causal language model.
- Fine-Tuning Objective: Causal language modeling (next-token prediction) on regulatory and AI-policy text.
Compute Infrastructure
- Software: Transformers, BitsAndBytes, PEFT, Hugging Face Trainer.
Model Card Authors
- Jyad Aljohani
Contact
- Email: [email protected]
- Hugging Face Profile: Ijyad
For further inquiries, feel free to reach out!