AI Image Classification Model

This repository contains two trained classifiers, XGBoost and CatBoost, for AI image classification. These models are trained to distinguish between AI-generated and real human faces using embeddings extracted from the AuraFace model.

Model Overview

AuraFace: Extracts face embeddings from input images.
CatBoost & XGBoost: Trained classifiers that predict whether a face image is AI-generated or real.
Dataset: Both classifiers were trained on the Real vs AI Generated Faces Dataset.
Preferred Model: Both classifiers yield similar results; CatBoost is the preferred model.

Pipeline

An image is passed to AuraFace to extract a 512-dimensional face embedding.
The embedding is converted into a single-row pandas DataFrame with columns feature_0 through feature_511.
The trained classifier (CatBoost or XGBoost) predicts whether the face is AI-generated (label 1) or real.
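
The DataFrame step can be sketched on its own, with a shape check that fails early if something other than a 512-dimensional vector is passed in. The helper name `embedding_to_frame` and the constant `EMBEDDING_DIM` are illustrative and not part of this repository; the `feature_{i}` column naming follows the usage code in this README. The zero vector is only a stand-in for a real AuraFace embedding.

```python
import numpy as np
import pandas as pd

EMBEDDING_DIM = 512  # AuraFace embedding size, as stated in the pipeline


def embedding_to_frame(embedding):
    """Wrap one face embedding in a single-row DataFrame (hypothetical helper)."""
    vector = np.asarray(embedding, dtype=np.float32).ravel()
    if vector.shape[0] != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {vector.shape[0]}")
    columns = [f"feature_{i}" for i in range(EMBEDDING_DIM)]
    return pd.DataFrame([vector], columns=columns)


# Stand-in for a real AuraFace embedding
frame = embedding_to_frame(np.zeros(EMBEDDING_DIM))
print(frame.shape)
```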

Model Usage

Dependencies

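pip install opencv-python catboost xgboost pandas numpy pillow huggingface_hub insightface onnxruntime
# For GPU inference with CUDAExecutionProvider, install onnxruntime-gpu instead of onnxruntime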

Loading AuraFace

from huggingface_hub import snapshot_download
from insightface.app import FaceAnalysis
import numpy as np
import cv2

# Download AuraFace model
snapshot_download(
    "fal/AuraFace-v1",
    local_dir="models/auraface",
)

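# Initialize AuraFace. insightface looks for the model files in <root>/models/<name>,
# which matches the local_dir used for the download.
face_app = FaceAnalysis(
    name="auraface",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],  # tried in order
    root="."
)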
face_app.prepare(ctx_id=0, det_size=(640, 640))

Loading CatBoost Model

from catboost import CatBoostClassifier

# Load trained CatBoost model
ai_image_classifier = CatBoostClassifier()
ai_image_classifier.load_model('models/ai_image_classifier/cat_classifier.cbm')

Classifying an Image

import pandas as pd
from PIL import Image

def classify_image(image_path):
    # Load the image as RGB
    img = Image.open(image_path).convert("RGB")
    # insightface expects OpenCV-style BGR arrays, so reverse the channel order
    img_array = np.array(img)[:, :, ::-1]
    
    # Detect faces and extract embedding
    faces = face_app.get(img_array)
    if not faces:
        return "No face detected."
    
    # Use the first detected face; normed_embedding is the L2-normalized embedding
    embedding = faces[0].normed_embedding
    
    # Convert embedding to DataFrame
    feature_columns = [f'feature_{i}' for i in range(512)]
    embedding_df = pd.DataFrame([embedding], columns=feature_columns)
    
    # Predict class
    prediction = ai_image_classifier.predict(embedding_df)[0]
    return "AI-generated" if prediction == 1 else "Real Face"

# Example Usage
image_path = "path/to/image.jpg"
result = classify_image(image_path)
print(f"Classification: {result}")
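
classify_image uses faces[0], that is, whichever face the detector returns first. For images with several faces, one option is to classify the largest face instead. The sketch below is a hypothetical helper, not part of this repository; it assumes each detection exposes a bbox attribute in (x1, y1, x2, y2) order, as insightface detections do, and uses stand-in objects so it runs without the model.

```python
from types import SimpleNamespace


def pick_largest_face(faces):
    """Return the face with the largest bounding-box area, or None if empty."""
    if not faces:
        return None

    def area(face):
        x1, y1, x2, y2 = face.bbox[:4]
        return max(0.0, x2 - x1) * max(0.0, y2 - y1)

    return max(faces, key=area)


# Stand-in objects mimicking the .bbox attribute of insightface detections
demo_faces = [
    SimpleNamespace(bbox=[0, 0, 10, 10]),
    SimpleNamespace(bbox=[5, 5, 105, 85]),
]
print(pick_largest_face(demo_faces).bbox)
```

Inside classify_image, faces[0] could then be replaced with pick_largest_face(faces).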

Using XGBoost

XGBoost follows the same process. To use XGBoost instead, replace the CatBoostClassifier loading step with:

from xgboost import XGBClassifier

# Load trained XGBoost model
ai_image_classifier = XGBClassifier()
ai_image_classifier.load_model('models/ai_image_classifier/xgb_classifier.json')

Acknowledgments

AuraFace-v1 for face embeddings.
Real vs AI Generated Faces Dataset for training data.