NDMO Chroma DB - Fine-Tuned LLM
Model Description
This model has been fine-tuned on the NDMO Chroma DB Dataset, a collection of key documents related to data governance, privacy, and artificial intelligence (AI) regulations. The fine-tuning process enhances the model's ability to understand and generate responses related to these domains.
Developed by:
- Jyad Aljohani
- Abdulrahman Aljohani
- Ryan Alshehri
- Saud Altuwaijri
- Ziyad Alharthi
Model Type:
Causal Language Model (CAUSAL_LM)
Language(s):
English
License:
[Specify License]
Finetuned from model:
[meta-llama/Llama-2-7b-chat-hf]
Model Sources
Repository:
[More Information Needed]
Paper [optional]:
[More Information Needed]
Demo [optional]:
[More Information Needed]
Uses
Direct Use
This model is designed for:
- Answering questions on data governance, AI regulations, and privacy policies.
- Assisting compliance professionals with regulatory inquiries.
- Supporting AI policy research and development.
Downstream Use [optional]
- Chatbots and virtual assistants focused on AI and data privacy compliance.
- Automated document summarization for legal and regulatory documents.
- Integration into AI governance frameworks.
Out-of-Scope Use
- The model is not designed for providing legally binding advice.
- Not suitable for tasks requiring real-time regulatory updates.
Bias, Risks, and Limitations
- The model may reflect biases present in the training data.
- It may not generalize well to regulations not covered in the dataset.
- Users should verify outputs against official regulatory sources.
Recommendations
- Users should cross-check information with official legal sources.
- Outputs should be reviewed by regulatory professionals for critical applications.
How to Get Started with the Model
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

model_name = "IJyad/llama-2-7b-NDMO-agent"
base_model = "meta-llama/Llama-2-7b-chat-hf"

# Load the base model in 4-bit quantized mode
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)

# Load the fine-tuned LoRA adapters on top of the base model
model = PeftModel.from_pretrained(model, model_name)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
```
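Once loaded, the model expects prompts in the Llama-2 chat format. The helper below is an illustrative sketch (the exact system prompt and template used during fine-tuning are assumptions, not confirmed details of this model):

```python
def build_prompt(
    user_question: str,
    system_prompt: str = "You answer questions about NDMO data governance policies.",
) -> str:
    """Wrap a question in the standard Llama-2 chat template (hypothetical helper)."""
    return f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_question} [/INST]"

prompt = build_prompt("What does the Data Classification Policy cover?")
```

The resulting `prompt` string can then be tokenized with `tokenizer(prompt, return_tensors="pt")` and passed to `model.generate(...)`.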
Training Details
Training Data
The model was fine-tuned using the NDMO Chroma DB Dataset, which consists of key regulatory documents, including:
- AI Principles
- Data Classification Policy
- Data Sharing Policy
- Implementing Regulations
- Personal Data Protection Guidelines
- Generative AI Public Guidelines
Training Procedure
Preprocessing
- Data was extracted, cleaned, and formatted into question-answer pairs.
- Documents were structured to maximize context retention.
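A minimal sketch of the question-answer formatting step described above, assuming a simple JSON Lines layout (the field names and example pair here are illustrative, not the dataset's actual schema):

```python
import json

def make_record(question: str, answer: str) -> dict:
    """Build one cleaned QA record; field names are illustrative."""
    return {"question": question.strip(), "answer": answer.strip()}

pairs = [
    ("What is the purpose of the Data Sharing Policy?",
     "It sets the conditions under which data may be shared between entities."),
]
# Serialize as JSON Lines, one record per line
jsonl = "\n".join(json.dumps(make_record(q, a), ensure_ascii=False) for q, a in pairs)
```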
Training Hyperparameters
- Epochs: 1
- Batch Size: 10
- Gradient Accumulation Steps: 1
- Learning Rate: 2e-4
- Optimizer: paged_adamw_8bit
- Scheduler: Linear decay with warmup steps
- Evaluation Strategy: Steps-based
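The linear-decay-with-warmup schedule above can be sketched as a pure function of the step count; the peak learning rate matches the hyperparameters listed, while the warmup length and total steps are placeholder assumptions:

```python
def linear_warmup_decay_lr(step: int, total_steps: int, warmup_steps: int,
                           peak_lr: float = 2e-4) -> float:
    """Learning rate: linear ramp up to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)
```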
Evaluation
Testing Data, Factors & Metrics
Testing Data
- Held-out subset of the NDMO Chroma DB Dataset
Factors Considered
- Accuracy in responding to regulatory and AI policy-related queries.
- Coherence and relevance of generated text.
Metrics Used
- Perplexity: Measures how well the model predicts held-out text (lower is better).
- BLEU Score: Measures n-gram overlap between generated and reference answers.
- Human Evaluation: Subject matter experts assessed output correctness.
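For reference, perplexity is the exponential of the mean per-token negative log-likelihood; a minimal stdlib sketch (the sample values are illustrative, not scores from this model):

```python
import math

def perplexity(token_nlls: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model that assigns every token probability 1/4 has perplexity 4
nlls = [math.log(4)] * 5
ppl = perplexity(nlls)  # → 4.0
```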
Results
- Perplexity Score: [More Information Needed]
- BLEU Score: [More Information Needed]
- Human Evaluation Accuracy: [More Information Needed]
Environmental Impact
The model was fine-tuned on cloud-based infrastructure.
Technical Specifications
Model Architecture and Objective
- Architecture: Transformer-based causal language model.
- Fine-Tuning Objective: Causal language modeling (next-token prediction) on regulatory and AI-policy text.
Compute Infrastructure
- Software: Transformers, BitsAndBytes, PEFT, Hugging Face Trainer.
Model Card Authors
- Jyad Aljohani
Contact
- Email: [email protected]
- Hugging Face Profile: Ijyad
For further inquiries, feel free to reach out!