---
license: apache-2.0
tags:
- ai-image-classification
- catboost
- xgboost
- auraface
- cnn
---
# AI Image Classification Model

This repository contains two trained classifiers, **XGBoost** and **CatBoost**, for detecting AI-generated face images. Both models distinguish AI-generated faces from real human faces using embeddings extracted by the **AuraFace** model.

## Model Overview

- **AuraFace**: Used for extracting face embeddings from input images.
- **CatBoost & XGBoost**: Gradient-boosted classifiers trained on AuraFace embeddings to predict whether a face image is AI-generated or real.
- **Dataset**: Trained using the [Real vs AI Generated Faces Dataset](https://www.kaggle.com/datasets/philosopher0808/real-vs-ai-generated-faces-dataset).
- **Preferred Model**: Both classifiers yield similar results; **CatBoost** is the preferred model.

## Pipeline

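1. An image is passed to **AuraFace**, which detects the face and extracts a 512-dimensional face embedding.
2. The embedding is converted into a single-row pandas DataFrame with one column per dimension (`feature_0` to `feature_511`).
3. The trained classifier (CatBoost or XGBoost) predicts whether the face is AI-generated (label `1`) or real.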
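Steps 2 and 3 depend on the embedding being laid out exactly as the classifier expects. The sketch below isolates step 2 as a small helper; `embedding_to_frame` and the shape check are hypothetical additions, while the `feature_{i}` column names come from the classification code further down. A random unit vector stands in for a real AuraFace embedding.

```python
import numpy as np
import pandas as pd

EMBEDDING_DIM = 512  # AuraFace embedding size stated in the pipeline


def embedding_to_frame(embedding):
    """Wrap one face embedding in a single-row DataFrame.

    Hypothetical helper: columns follow the `feature_{i}` naming used in
    `classify_image`; the shape check is an added safeguard.
    """
    vector = np.asarray(embedding, dtype=np.float32).ravel()
    if vector.shape != (EMBEDDING_DIM,):
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {vector.shape[0]}")
    columns = [f"feature_{i}" for i in range(EMBEDDING_DIM)]
    return pd.DataFrame([vector], columns=columns)


# Stand-in for a real embedding: a random vector scaled to unit L2 norm,
# mimicking AuraFace's `normed_embedding`
rng = np.random.default_rng(0)
fake = rng.normal(size=EMBEDDING_DIM)
fake /= np.linalg.norm(fake)

df = embedding_to_frame(fake)
print(df.shape)  # (1, 512)
```

Failing loudly on a wrong-sized input is safer than letting a mis-shaped row reach the classifier.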

## Model Usage

### Dependencies

```bash
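pip install opencv-python catboost xgboost pandas numpy pillow huggingface_hub insightface onnxruntime
# For GPU inference with CUDAExecutionProvider, install onnxruntime-gpu instead of onnxruntime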
```

### Loading AuraFace

```python
from huggingface_hub import snapshot_download
from insightface.app import FaceAnalysis
import numpy as np

# Download AuraFace model into ./models/auraface
snapshot_download(
    "fal/AuraFace-v1",
    local_dir="models/auraface",
)

# Initialize AuraFace; insightface looks for the model under <root>/models/<name>
face_app = FaceAnalysis(
    name="auraface",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    root=".",
)
# ctx_id=0 uses the first GPU; pass ctx_id=-1 to run on CPU
face_app.prepare(ctx_id=0, det_size=(640, 640))
```

### Loading CatBoost Model

```python
from catboost import CatBoostClassifier

# Load trained CatBoost model
ai_image_classifier = CatBoostClassifier()
ai_image_classifier.load_model('models/ai_image_classifier/cat_classifier.cbm')
```

### Classifying an Image

```python
def classify_image(image_path):
    # Load image
    img = Image.open(image_path).convert("RGB")
    img_array = np.array(img)[:, :, ::-1]  # Convert to BGR for processing
    
    # Detect faces and extract embedding
    faces = face_app.get(img_array)
    if not faces:
        return "No face detected."
    
    embedding = faces[0].normed_embedding
    
    # Convert embedding to DataFrame
    feature_columns = [f'feature_{i}' for i in range(512)]
    embedding_df = pd.DataFrame([embedding], columns=feature_columns)
    
    # Predict class
    prediction = ai_image_classifier.predict(embedding_df)[0]
    return "AI-generated" if prediction == 1 else "Real Face"

# Example Usage
image_path = "path/to/image.jpg"
result = classify_image(image_path)
print(f"Classification: {result}")
```
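`classify_image` uses `faces[0]`, which is whichever face the detector returns first. If your images may contain several faces, one option (an assumption, not part of the original pipeline) is to classify the face with the largest bounding box. insightface `Face` objects expose `bbox` as `(x1, y1, x2, y2)`; the hypothetical helper below only relies on that attribute, so it is demonstrated with simple stand-in objects.

```python
from types import SimpleNamespace


def pick_largest_face(faces):
    """Return the face whose bounding box has the largest area, or None.

    Hypothetical helper; expects each face to have a `bbox` of (x1, y1, x2, y2),
    as insightface `Face` objects do.
    """
    if not faces:
        return None

    def area(face):
        x1, y1, x2, y2 = face.bbox
        return max(0.0, x2 - x1) * max(0.0, y2 - y1)

    return max(faces, key=area)


# Stand-in faces for illustration only
small = SimpleNamespace(bbox=(10, 10, 50, 50), name="small")
large = SimpleNamespace(bbox=(0, 0, 200, 150), name="large")
print(pick_largest_face([small, large]).name)  # large
```

Inside `classify_image`, `faces[0].normed_embedding` would then become `pick_largest_face(faces).normed_embedding`.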

### Using XGBoost

XGBoost follows the same process. To use XGBoost instead, replace the `CatBoostClassifier` loading step with:

```python
from xgboost import XGBClassifier

# Load trained XGBoost model
ai_image_classifier = XGBClassifier()
ai_image_classifier.load_model('models/ai_image_classifier/xgb_classifier.json')
```
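Because both models are called with the same DataFrame, the choice of classifier can be turned into a parameter. The sketch below is a hypothetical convenience wrapper (`load_classifier` and `MODEL_PATHS` are not part of this repository) that reuses the file paths from the loading steps above and imports each library lazily, so only the chosen one has to be importable.

```python
from pathlib import Path

# Paths from the loading steps above; adjust if the files live elsewhere.
MODEL_PATHS = {
    "catboost": Path("models/ai_image_classifier/cat_classifier.cbm"),
    "xgboost": Path("models/ai_image_classifier/xgb_classifier.json"),
}


def load_classifier(kind="catboost"):
    """Load one of the two trained classifiers by name (hypothetical helper)."""
    if kind not in MODEL_PATHS:
        raise ValueError(f"unknown classifier {kind!r}; choose from {sorted(MODEL_PATHS)}")
    path = MODEL_PATHS[kind]
    # Import lazily so only the selected library needs to be installed
    if kind == "catboost":
        from catboost import CatBoostClassifier
        model = CatBoostClassifier()
    else:
        from xgboost import XGBClassifier
        model = XGBClassifier()
    model.load_model(str(path))
    return model
```

For example, `ai_image_classifier = load_classifier("xgboost")` replaces the explicit XGBoost loading step, and `classify_image` works unchanged.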

## Acknowledgments

- **[AuraFace-v1](https://huggingface.co/fal/AuraFace-v1)** for face embeddings.
- **[Real vs AI Generated Faces Dataset](https://www.kaggle.com/datasets/philosopher0808/real-vs-ai-generated-faces-dataset)** for training data.