---
license: apache-2.0
---

# Model Card: Profile-Based Movie Recommendation Model

## Model Overview

This model is a **profile-based movie recommendation system** that recommends movies based on user demographics and genre preferences. It was trained on the [MovieLens 1M dataset](http://files.grouplens.org/datasets/movielens/ml-1m.zip). Users are grouped into profiles by clustering their demographic attributes and genre preferences, and the model learns an embedding for each profile and for each movie. The dot product of a profile embedding and a movie embedding gives a relevance score, so recommendations are tailored to the user's profile (cluster) rather than to each individual user.

## Model Architecture

The model is built with **TensorFlow** and **Keras** and uses an **embedding-based architecture**:

1. **User Profiles and Clustering**: User demographics and genre preferences are clustered into a specified number of profiles with **KMeans**. Each resulting profile ID groups users who are similar in age, occupation, gender, and preferred movie genres.

2. **Embedding Layers**:

   - **User profile IDs** are mapped to dense vectors by a trainable embedding layer.

   - **Movie IDs** are mapped by a separate trainable embedding layer of the same dimension, so profile and movie vectors live in a shared space and can be compared directly.

3. **Dot Product for Recommendation**: The model computes the dot product of the profile embedding and the movie embedding to produce a score. The higher the score, the more relevant the movie is predicted to be for that profile.
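
The scoring rule in step 3 can be illustrated with plain NumPy. This is a minimal sketch, not the trained model: the embedding tables are random stand-ins, and the sizes are assumptions (10 profiles and 64 dimensions as stated in this card, plus 3,953 movie rows so that raw MovieLens 1M movie IDs, which run from 1 to 3952, could index the table directly). `score` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical embedding tables: random stand-ins for the learned weights.
rng = np.random.default_rng(0)
num_profiles, num_movies, embedding_dim = 10, 3953, 64  # 3953 rows is an assumption
profile_table = rng.normal(size=(num_profiles, embedding_dim))
movie_table = rng.normal(size=(num_movies, embedding_dim))

def score(profile_id, movie_ids):
    """Dot product between one profile vector and each requested movie vector."""
    profile_vec = profile_table[profile_id]          # shape (embedding_dim,)
    movie_vecs = movie_table[np.asarray(movie_ids)]  # shape (n, embedding_dim)
    return movie_vecs @ profile_vec                  # shape (n,), one score per movie

scores = score(3, [10, 50, 100])
```

Every user assigned to profile 3 gets exactly these scores, since the user enters the computation only through the profile ID.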

## Training Dataset

The model was trained on the [MovieLens 1M dataset](http://files.grouplens.org/datasets/movielens/ml-1m.zip) published by GroupLens Research. The dataset contains **1,000,209 ratings** (whole stars on a 1 to 5 scale) from **6,040 users** on approximately **3,900 movies**. It consists of three tables:

- **Users**: Demographic information such as age, gender, and occupation.

- **Ratings**: The rating each user gave to each movie they rated.

- **Movies**: Movie titles and genres (e.g., Action, Comedy, Romance).
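
In the ML-1M archive these tables are the files `users.dat`, `ratings.dat`, and `movies.dat`, with fields separated by `::`. The following minimal pandas loader is a sketch: the column names follow the dataset's README, `read_dat` and `load_movielens_1m` are hypothetical helper names, and `encoding="latin-1"` is used because some movie titles contain non-ASCII characters.

```python
import os
import pandas as pd

USER_COLUMNS = ["UserID", "Gender", "Age", "Occupation", "Zip-code"]
RATING_COLUMNS = ["UserID", "MovieID", "Rating", "Timestamp"]
MOVIE_COLUMNS = ["MovieID", "Title", "Genres"]

def read_dat(source, columns):
    """Read one '::'-separated ML-1M file (a path or an open text buffer)."""
    # A multi-character separator requires pandas' Python parsing engine.
    return pd.read_csv(source, sep="::", engine="python", names=columns, encoding="latin-1")

def load_movielens_1m(data_dir):
    """Return (users, ratings, movies) DataFrames from an extracted ml-1m directory."""
    users = read_dat(os.path.join(data_dir, "users.dat"), USER_COLUMNS)
    ratings = read_dat(os.path.join(data_dir, "ratings.dat"), RATING_COLUMNS)
    movies = read_dat(os.path.join(data_dir, "movies.dat"), MOVIE_COLUMNS)
    return users, ratings, movies
```

Assuming the archive was extracted to `ml-1m/`, the call would be `users, ratings, movies = load_movielens_1m("ml-1m")`.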

### Dataset Preparation

- **Preprocessing**:

  - Age, occupation, and gender were one-hot encoded; in MovieLens 1M all three are categorical (age is bucketed into ranges and occupation is an integer code).

  - Each user's genre preferences were derived from their top-rated genres. Because a movie's `Genres` field lists several genres separated by `|` (e.g., `Action|Comedy`), the field was split and exploded so that each genre is assigned individually.

- **Clustering**: Users were clustered into 10 profiles with KMeans on the combined demographic and genre features.

- **Embedding Preparation**: Profile IDs and movie IDs were prepared as integer inputs for the embedding layers.
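
The preparation steps above can be sketched with pandas and scikit-learn. This is an illustration under stated assumptions, not the card's actual preprocessing code. The card does not say how "top-rated genres" were chosen, so here each user's preferred genres are assumed to be the `top_k=3` genres with the highest mean rating, encoded as 0/1 columns next to the one-hot demographics. `build_profiles`, `top_k`, and `seed` are hypothetical names, and the inputs are assumed to use the ML-1M column names.

```python
import pandas as pd
from sklearn.cluster import KMeans

def build_profiles(users, ratings, movies, n_profiles=10, top_k=3, seed=42):
    """Hypothetical sketch: cluster users on one-hot demographics plus top-genre flags."""
    # Split "Action|Comedy" into a list, then explode to one row per (movie, genre).
    movie_genres = movies.assign(Genre=movies["Genres"].str.split("|")).explode("Genre")
    rated = ratings.merge(movie_genres[["MovieID", "Genre"]], on="MovieID")

    # Assumption: a user's preferred genres are the top_k genres by mean rating.
    genre_means = rated.groupby(["UserID", "Genre"], as_index=False)["Rating"].mean()
    top = (genre_means.sort_values(["UserID", "Rating"], ascending=[True, False])
           .groupby("UserID").head(top_k))
    genre_flags = pd.crosstab(top["UserID"], top["Genre"]).clip(upper=1)

    # One-hot encode the categorical demographic codes (columns such as Age_25, Gender_F).
    demo = users.set_index("UserID")[["Age", "Occupation", "Gender"]].astype(str)
    demo = pd.get_dummies(demo).astype(int)

    # Users without ratings would get all-zero genre flags.
    features = demo.join(genre_flags, how="left").fillna(0)

    kmeans = KMeans(n_clusters=n_profiles, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(features.to_numpy(dtype=float))
    profile_ids = pd.Series(labels, index=features.index, name="ProfileID")
    return profile_ids, kmeans, list(features.columns)
```

Because every feature here is a 0/1 indicator, no scaling is applied before KMeans; with mixed feature types, standardizing first would usually matter.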

## Training Configuration

- **Optimizer**: Adam

- **Loss Function**: Mean Squared Error (MSE)

- **Metric**: Mean Absolute Error (MAE)

- **Epochs**: 10

- **Batch Size**: 256

- **Embedding Dimension**: 64
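
A Keras model matching the architecture and training configuration above might be built like this. It is a sketch under assumptions: `build_model`, the input names, and the `(1,)` input shape are illustrative and may not match the saved `profile_based_recommendation_model.keras`. It also assumes the training target is the raw 1 to 5 rating, which fits the MSE loss and MAE metric.

```python
from tensorflow import keras

def build_model(num_profiles, num_movies, embedding_dim=64):
    """Sketch of the profile/movie dot-product model, compiled with the card's settings."""
    profile_in = keras.Input(shape=(1,), dtype="int32", name="profile_id")
    movie_in = keras.Input(shape=(1,), dtype="int32", name="movie_id")

    # Separate trainable embedding tables of the same dimension; Flatten drops the length-1 axis.
    profile_vec = keras.layers.Flatten()(keras.layers.Embedding(num_profiles, embedding_dim)(profile_in))
    movie_vec = keras.layers.Flatten()(keras.layers.Embedding(num_movies, embedding_dim)(movie_in))

    # Dot product along the embedding axis gives one score per (profile, movie) pair.
    score = keras.layers.Dot(axes=1)([profile_vec, movie_vec])

    model = keras.Model(inputs=[profile_in, movie_in], outputs=score)
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model

# Hypothetical training call (equal-length integer ID arrays, float rating targets):
# model = build_model(num_profiles=10, num_movies=3953)
# model.fit([train_profile_ids, train_movie_ids], train_ratings, epochs=10, batch_size=256)
```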

## Intended Use

This model is intended to provide **movie recommendations** for user profile clusters. Because profiles and movies are embedded in a shared space, it recommends movies for a profile by scoring candidate movies and returning those with the highest predicted scores.
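
Ranking for one profile comes down to scoring candidate movies and keeping the highest scores. A minimal NumPy sketch follows, assuming the scores were already produced (for example by `model.predict`). `top_n` and its `exclude` parameter, meant for movies the user has already rated, are hypothetical additions and not part of the card's model.

```python
import numpy as np

def top_n(scores, movie_ids, n=10, exclude=()):
    """Return the n movie IDs with the highest scores, skipping IDs in `exclude`."""
    scores = np.asarray(scores, dtype=float).ravel()  # accepts (k,) or (k, 1) predictions
    movie_ids = np.asarray(movie_ids)
    keep = ~np.isin(movie_ids, list(exclude))        # drop excluded movies
    order = np.argsort(-scores[keep], kind="stable")  # descending by score
    return movie_ids[keep][order[:n]].tolist()

print(top_n([3.1, 4.5, 2.0, 4.9], [10, 50, 100, 200], n=2, exclude={200}))  # [50, 10]
```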

### Use Cases

- **Personalized Movie Recommendations**: For streaming platforms, this model can serve as a recommendation component that suggests movies based on a user's demographics and the genres they have rated highly.

- **User Segmentation**: The profile clusters group users by demographics and genre preferences, so they can also be used for audience analysis and targeted advertising.
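
For segmentation analysis, the profile assignments can be summarized per cluster. This is a minimal pandas sketch, assuming a `users` table with ML-1M column names and a hypothetical `profile_ids` Series mapping each `UserID` to its profile. `summarize_profiles` is an illustrative helper name.

```python
import pandas as pd

def summarize_profiles(users, profile_ids):
    """Per-profile user count plus the most common age code, gender, and occupation."""
    # Attach each user's ProfileID (profile_ids is indexed by UserID).
    df = users.set_index("UserID").join(profile_ids.rename("ProfileID"))

    def most_common(s):
        return s.mode().iloc[0]

    return df.groupby("ProfileID").agg(
        n_users=("Gender", "size"),
        top_age=("Age", most_common),
        top_gender=("Gender", most_common),
        top_occupation=("Occupation", most_common),
    )
```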

### Limitations

- **Cold Start Problem**: New users with few ratings have weak genre-preference features, so their profile assignment depends mostly on demographics. Movies with few ratings in the training data have poorly trained embeddings, and movies added after training have no learned embedding at all.

- **Demographic Constraints**: Scores depend only on the profile ID and the movie ID, so all users in the same profile receive identical scores. Recommendations are therefore heavily shaped by demographic data and may miss nuanced individual preferences.

- **Genre Limitation**: Genre preferences are derived from past ratings, which may not reflect a user's evolving interests.

## How to Use

To use this model, you need:

1. **Profile ID**: The user's profile (cluster) ID, obtained by assigning the user to one of the KMeans clusters using the same demographic and genre features as in training.

2. **Movie IDs**: The IDs of the movies you want to score for that profile.

```python
from tensorflow import keras
import numpy as np

# Load the trained model
model = keras.models.load_model("profile_based_recommendation_model.keras")

# Example: score movies 10, 50, and 100 for profile 3.
# Both inputs must have the same length, so repeat the profile ID once per movie.
# Movie IDs must use the same encoding as during training.
movie_ids = np.array([10, 50, 100])
profile_ids = np.full_like(movie_ids, 3)

# Predict scores; the dot-product output typically has shape (num_movies, 1), so flatten it
predictions = model.predict([profile_ids, movie_ids]).ravel()

# Display the predicted score for each movie
for movie_id, score in zip(movie_ids, predictions):
    print(f"Movie ID: {movie_id}, Predicted Score: {score:.3f}")
```
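
Step 1 above assumes you already have a profile ID. If the fitted KMeans model and the ordered list of feature columns from training are available (the card does not say they are shipped with the model), a new user could be assigned roughly as follows. `assign_profile` is a hypothetical helper, and the column naming (`Age_<code>`, `Occupation_<code>`, `Gender_<code>`, plus one column per genre) is an assumption that must match whatever encoding was actually used.

```python
import pandas as pd

def assign_profile(kmeans, feature_columns, age, occupation, gender, top_genres):
    """Hypothetical helper: encode one user like the training data and return the KMeans cluster."""
    row = pd.Series(0.0, index=pd.Index(feature_columns))
    # Assumed column naming: one-hot "Age_<code>", "Occupation_<code>", "Gender_<code>",
    # plus one 0/1 column per genre name. Categories not seen in training are ignored.
    active = [f"Age_{age}", f"Occupation_{occupation}", f"Gender_{gender}", *top_genres]
    row[[c for c in active if c in row.index]] = 1.0
    # KMeans.predict expects a 2-D array of shape (n_samples, n_features).
    return int(kmeans.predict(row.to_numpy().reshape(1, -1))[0])
```

A call might look like `assign_profile(kmeans, feature_columns, age=25, occupation=4, gender="M", top_genres=["Action", "Thriller"])`.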

## Dataset Citation

If you use this model or the dataset, please cite the MovieLens dataset as follows:

```bibtex
@article{harper2015movielens,
  title={The MovieLens datasets: History and context},
  author={Harper, F Maxwell and Konstan, Joseph A},
  journal={ACM Transactions on Interactive Intelligent Systems (TIIS)},
  volume={5},
  number={4},
  pages={1--19},
  year={2015},
  publisher={ACM New York, NY, USA}
}
```

## Acknowledgments

Thanks to **GroupLens Research** for providing the MovieLens dataset and the open-source tools that make it accessible for research purposes.