---
license: mit
language:
- en
pipeline_tag: text-classification
tags:
- finance
metrics:
- accuracy
library_name: sklearn
---
# MF3Classifier

## Model Overview
MF3Classifier is a scikit-learn pipeline that predicts mutual fund performance from one numerical feature and five categorical features. It combines preprocessing steps with a Random Forest classifier.

## Model Architecture
The model uses a two-branch preprocessing pipeline followed by a Random Forest classifier:

### Preprocessing Pipeline
1. **Numerical Features Branch**
   - Features: ['AUM']
   - Transformation: StandardScaler
   
2. **Categorical Features Branch**
   - Features: ['AMC', 'Fund Category', 'Sub-Sheme', 'Investment Type', 'Growth Option']
   - Transformations:
     - OneHotEncoder (non-sparse output, handles unknown categories)
     - Feature Selection (SelectKBest with mutual_info_classif, k=30)

### Classifier
- **Model**: RandomForestClassifier
- **Key Parameters**:
  - n_estimators: 30
  - max_depth: 20
  - min_samples_split: 10
  - min_samples_leaf: 5
  - n_jobs: -1 (parallel processing)
  - random_state: 42

## Use Cases
- Mutual fund performance prediction
- Investment strategy optimization
- Portfolio management
- Risk assessment

## Model Parameters

### Preprocessing Configuration
- **Numerical Features**:
  - StandardScaler with default parameters
  - Centers AUM to zero mean and scales it to unit variance
  
- **Categorical Features**:
  - OneHotEncoder:
    - handle_unknown: 'ignore'
    - sparse_output: False
    - dtype: numpy.float64
  - Feature Selection:
    - Method: SelectKBest with mutual_info_classif
    - Number of features: 30

### Random Forest Configuration
- **Tree Structure**:
  - Maximum depth: 20
  - Minimum samples for split: 10
  - Minimum samples per leaf: 5
  
- **Ensemble Settings**:
  - Number of trees: 30
  - Features considered per split (max_features): sqrt
  - Bootstrap: True
  - Criterion: gini

## Technical Details

### File Information
- **Model Type**: Scikit-learn Pipeline
- **Last Updated**: November 3, 2024

### Input Features
1. **Numerical Features**:
   - AUM (Assets Under Management)

2. **Categorical Features**:
   - AMC (Asset Management Company)
   - Fund Category
   - Sub-Scheme
   - Investment Type
   - Growth Option

## Limitations and Considerations
- The model uses mutual_info_classif for feature selection, which may not capture all relevant relationships
- Feature selection is limited to top 30 features
- Categories not seen during training are encoded as all-zero vectors because of the 'ignore' setting in OneHotEncoder, so predictions for such inputs may be less reliable

## Usage Notes
- The model supports parallel processing (n_jobs=-1)
- Handles unknown categories in categorical features gracefully
- Uses standard scaling for numerical features
- Designed for production use with joblib serialization

## Download Model

To download the pre-trained **MF3Classifier** model, use the link below:

[**Download MF3Classifier Model**](https://huggingface.co/alokpandey/MF3Classifier/resolve/main/fund_predictor_model_20241103_230654.joblib)
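
Once the file is downloaded, it can be loaded with joblib and used for prediction along the lines below. This is a sketch under assumptions: the helpers `load_model` and `predict_funds`, the local path, and the sample record are hypothetical, and the column names follow the feature list in this card (including the `Sub-Sheme` spelling), which may need adjusting to match the training data. Since joblib files are pickle-based and can execute code on load, only load files from a source you trust, ideally with the same scikit-learn version used for training.

```python
from pathlib import Path

import joblib
import pandas as pd

# Column names as listed in this model card; verify against the real data.
EXPECTED_COLUMNS = [
    "AUM", "AMC", "Fund Category", "Sub-Sheme", "Investment Type", "Growth Option",
]

# Assumed local path of the downloaded file.
MODEL_PATH = Path("fund_predictor_model_20241103_230654.joblib")


def load_model(path):
    """Load a joblib-serialized pipeline from a trusted file."""
    return joblib.load(path)


def predict_funds(model, records):
    """Predict for a list of dicts keyed by EXPECTED_COLUMNS."""
    frame = pd.DataFrame(records)
    missing = [c for c in EXPECTED_COLUMNS if c not in frame.columns]
    if missing:
        raise ValueError(f"missing input columns: {missing}")
    # Pass columns in a fixed order; extra columns are dropped.
    return model.predict(frame[EXPECTED_COLUMNS])


if MODEL_PATH.exists():
    model = load_model(MODEL_PATH)
    sample = [{  # hypothetical values, for illustration only
        "AUM": 1500.0,
        "AMC": "Example AMC",
        "Fund Category": "Equity",
        "Sub-Sheme": "Large Cap",
        "Investment Type": "Open Ended",
        "Growth Option": "Growth",
    }]
    print(predict_funds(model, sample))
else:
    print(f"Model file not found at {MODEL_PATH}; download it first.")
```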