{
"cells": [
{
"cell_type": "markdown",
"id": "c7938b37",
"metadata": {},
"source": [
"Now that I have the data, it is time to train a model to predict these tiers. As I am going through the fastai course (specifically the Tabular section), I will use the fastai library to train a neural network, and I will also try the course's recommended Random Forest approach to see which performs best. "
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "6e9599ba",
"metadata": {},
"outputs": [],
"source": [
"from fastai.tabular.all import *\n",
"from sklearn.tree import DecisionTreeRegressor, export_graphviz\n",
"from sklearn.ensemble import RandomForestRegressor\n",
"from scipy.cluster import hierarchy as hc\n",
"import graphviz\n",
"import os\n",
"from dtreeviz.trees import *\n",
"from treeinterpreter import treeinterpreter\n",
"from waterfall_chart import plot as waterfall\n",
"from sklearn.inspection import plot_partial_dependence\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"from sklearn.metrics import confusion_matrix"
]
},
{
"cell_type": "markdown",
"id": "eac4bcb3",
"metadata": {},
"source": [
"# Load CSV into fastai DataLoaders and Train/Valid Splits"
]
},
{
"cell_type": "markdown",
"id": "e804b4f8",
"metadata": {},
"source": [
"I will use fastai's DataLoaders to easily load, normalize, and split the data for training. \n",
"I also extract the train and valid sets to try with the Decision Tree/Random Forest approach."
]
},
{
"cell_type": "code",
"execution_count": 143,
"id": "a5c98335",
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv(\"rookie_year.csv\")\n",
"splits = RandomSplitter(valid_pct=0.2)(range_of(df))"
]
},
{
"cell_type": "markdown",
"id": "3b41b614",
"metadata": {},
"source": [
"I've decided to feed the model the following features: Completions, Attempts, Yards, Completion Percentage, Touchdowns, Interceptions, Yards/Game, and Sacks. "
]
},
{
"cell_type": "code",
"execution_count": 144,
"id": "b6df7e2a",
"metadata": {},
"outputs": [],
"source": [
"to = TabularPandas(df,\n",
" cont_names=[\"Cmp\", \"Att\", \"Yds\", \"Cmp%\", \"TD\", \"Int\", \"Y/G\", \"Sk\"],\n",
" y_names=\"Tier\",\n",
" procs=[FillMissing, Normalize],\n",
" splits=splits\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": 145,
"id": "2994daa6",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>Cmp</th><th>Att</th><th>Yds</th><th>Cmp%</th><th>TD</th><th>Int</th><th>Y/G</th><th>Sk</th><th>Tier</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>239</th><td>47.0</td><td>98.0</td><td>530.0</td><td>48.000000</td><td>3.0</td><td>2.0</td><td>132.500000</td><td>18.0</td><td>Below-Average Career QB</td></tr>\n",
"    <tr><th>293</th><td>255.0</td><td>476.0</td><td>2894.0</td><td>53.599998</td><td>11.0</td><td>22.0</td><td>192.899994</td><td>38.0</td><td>Below-Average Career QB</td></tr>\n",
"    <tr><th>267</th><td>52.0</td><td>126.0</td><td>716.0</td><td>41.299999</td><td>8.0</td><td>13.0</td><td>89.500000</td><td>20.0</td><td>Below-Average Career QB</td></tr>\n",
"    <tr><th>98</th><td>169.0</td><td>320.0</td><td>2074.0</td><td>52.799999</td><td>10.0</td><td>12.0</td><td>172.800003</td><td>28.0</td><td>Average Career QB</td></tr>\n",
"    <tr><th>322</th><td>82.0</td><td>161.0</td><td>864.0</td><td>50.900002</td><td>4.0</td><td>9.0</td><td>123.400002</td><td>15.0</td><td>Below-Average Career QB</td></tr>\n",
"    <tr><th>5</th><td>173.0</td><td>296.0</td><td>2210.0</td><td>58.400002</td><td>20.0</td><td>6.0</td><td>200.899994</td><td>10.0</td><td>Elite Career QB</td></tr>\n",
"    <tr><th>40</th><td>158.0</td><td>328.0</td><td>2158.0</td><td>48.200001</td><td>19.0</td><td>16.0</td><td>154.100006</td><td>36.0</td><td>Above Average Career QB</td></tr>\n",
"    <tr><th>11</th><td>87.0</td><td>194.0</td><td>1126.0</td><td>44.799999</td><td>6.0</td><td>13.0</td><td>112.599998</td><td>14.0</td><td>Elite Career QB</td></tr>\n",
"    <tr><th>41</th><td>166.0</td><td>346.0</td><td>2183.0</td><td>48.000000</td><td>18.0</td><td>21.0</td><td>155.899994</td><td>47.0</td><td>Above Average Career QB</td></tr>\n",
"    <tr><th>288</th><td>178.0</td><td>261.0</td><td>1694.0</td><td>68.199997</td><td>8.0</td><td>7.0</td><td>24.200001</td><td>26.0</td><td>Below-Average Career QB</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"to.show()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "b4855366",
"metadata": {},
"outputs": [],
"source": [
"dls = to.dataloaders(bs=64)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "1126b4b4",
"metadata": {},
"outputs": [],
"source": [
"# Train Valid Split\n",
"xs,y = to.train.xs,to.train.y \n",
"valid_xs,valid_y = to.valid.xs,to.valid.y"
]
},
{
"cell_type": "markdown",
"id": "d77e5d80",
"metadata": {},
"source": [
"# Decision Trees/Random Forests"
]
},
{
"cell_type": "markdown",
"id": "8bb4b750",
"metadata": {},
"source": [
"The fastai course recommends at least trying tree-based models first to get a performance baseline, since they are easier to work with, train, and understand than neural networks. "
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "065dc50c",
"metadata": {},
"outputs": [],
"source": [
"# Helper functions for Tree Visualization\n",
"os.environ[\"PATH\"] += os.pathsep + 'C:/Program Files (x86)/Graphviz/bin/'\n",
"def draw_tree(t, df, size=10, ratio=0.6, precision=0, **kwargs):\n",
" s=export_graphviz(t, out_file=None, feature_names=df.columns, filled=True, rounded=True,\n",
" special_characters=True, rotate=False, precision=precision, **kwargs)\n",
" return graphviz.Source(re.sub('Tree {', f'Tree {{ size={size}; ratio={ratio}', s))\n",
"\n",
"def cluster_columns(df, figsize=(10,6), font_size=12):\n",
" corr = np.round(scipy.stats.spearmanr(df).correlation, 4)\n",
" corr_condensed = hc.distance.squareform(1-corr)\n",
" z = hc.linkage(corr_condensed, method='average')\n",
" fig = plt.figure(figsize=figsize)\n",
" hc.dendrogram(z, labels=df.columns, orientation='left', leaf_font_size=font_size)\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"id": "0f172991",
"metadata": {},
"source": [
"### Simple Decision Tree with 4 leaf nodes"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "f0341db8",
"metadata": {},
"outputs": [],
"source": [
"m = DecisionTreeRegressor(max_leaf_nodes=4)\n",
"m.fit(xs, y);"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "cf217221",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"draw_tree(m, xs, size=10, leaves_parallel=True, precision=2)"
]
},
{
"cell_type": "markdown",
"id": "588f5cff",
"metadata": {},
"source": [
"A similar visualization, this time showing the data points at each split:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "e3434720",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Users\\matth\\Anaconda3\\envs\\qb_preds\\lib\\site-packages\\sklearn\\base.py:450: UserWarning: X does not have valid feature names, but DecisionTreeRegressor was fitted with feature names\n",
" warnings.warn(\n"
]
},
{
"data": {
"image/svg+xml": [
""
],
"text/plain": [
""
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"samp_idx = np.random.permutation(len(y))[:500]\n",
"dtreeviz(m, xs.iloc[samp_idx], y.iloc[samp_idx], xs.columns, \"Tier\",\n",
" fontname='DejaVu Sans', scale=2, label_fontsize=10,\n",
" orientation='LR')"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "28ce0ea8",
"metadata": {},
"outputs": [],
"source": [
"def r_mse(pred,y): return round(math.sqrt(((pred-y)**2).mean()), 6)\n",
"def m_rmse(m, xs, y): return r_mse(m.predict(xs), y)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "155d9325",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"325"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(xs)"
]
},
{
"cell_type": "markdown",
"id": "e4b30628",
"metadata": {},
"source": [
"### Use an arbitrarily large Decision Tree"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "cf682f74",
"metadata": {},
"outputs": [],
"source": [
"m = DecisionTreeRegressor()\n",
"m.fit(xs, y);"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "49ee9105",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(0.0, 1.300522)"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m_rmse(m, xs, y), m_rmse(m, valid_xs, valid_y)"
]
},
{
"cell_type": "markdown",
"id": "73d2d181",
"metadata": {},
"source": [
"The training error is 0, but the validation error is high, which is the textbook sign of overfitting. Here is why:"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "f6c3c51c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(124, 325)"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m.get_n_leaves(), len(xs)"
]
},
{
"cell_type": "markdown",
"id": "861edf2e",
"metadata": {},
"source": [
"The tree has 124 leaves for only 325 training rows, about one leaf for every 2.6 rows, so it has essentially memorized the training set. It needs fewer leaves in order to generalize. Time for random forests to take care of this!"
]
},
{
"cell_type": "markdown",
"id": "d816c4a7",
"metadata": {},
"source": [
"### Random Forests"
]
},
{
"cell_type": "markdown",
"id": "712c7fd5",
"metadata": {},
"source": [
"I will use a random forest with 40 \"estimators\" (decision trees, each trained on a random sample of the data). Their predictions are averaged, a technique called bagging. "
]
},
{
"cell_type": "code",
"execution_count": 88,
"id": "935e23b3",
"metadata": {},
"outputs": [],
"source": [
"def rf(xs, y, n_estimators=40, max_samples=len(xs),\n",
" max_features=0.5, min_samples_leaf=5, **kwargs):\n",
" m = RandomForestRegressor(n_jobs=-1, n_estimators=n_estimators,\n",
" max_samples=max_samples, max_features=max_features,\n",
" min_samples_leaf=min_samples_leaf, oob_score=True)\n",
" m.fit(xs, y)\n",
" return m"
]
},
{
"cell_type": "code",
"execution_count": 89,
"id": "8e50772a",
"metadata": {},
"outputs": [],
"source": [
"m = rf(xs, y)"
]
},
{
"cell_type": "code",
"execution_count": 90,
"id": "2762c7e7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(0.617003, 0.841323)"
]
},
"execution_count": 90,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m_rmse(m, xs, y), m_rmse(m, valid_xs, valid_y)"
]
},
{
"cell_type": "markdown",
"id": "82bd1adf",
"metadata": {},
"source": [
"The validation error is much better than the single tree's (0.84 vs. 1.30). Now I will check how the validation error changes as more estimators are averaged..."
]
},
{
"cell_type": "code",
"execution_count": 91,
"id": "0d6d3707",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Users\\matth\\Anaconda3\\envs\\qb_preds\\lib\\site-packages\\sklearn\\base.py:443: UserWarning: X has feature names, but DecisionTreeRegressor was fitted without feature names\n",
" warnings.warn(\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"preds = np.stack([t.predict(valid_xs) for t in m.estimators_])\n",
"plt.plot([r_mse(preds[:i+1].mean(0), valid_y) for i in range(40)]);"
]
},
{
"cell_type": "markdown",
"id": "141e6fb8",
"metadata": {},
"source": [
"It looks like using all 40 estimators gives some of the best performance. We can also look at the out-of-bag (OOB) error, which scores each training row using only the trees that did not see that row during training. See https://en.wikipedia.org/wiki/Out-of-bag_error"
]
},
{
"cell_type": "code",
"execution_count": 92,
"id": "7ab2abb1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.860368"
]
},
"execution_count": 92,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"r_mse(m.oob_prediction_, y)"
]
},
{
"cell_type": "markdown",
"id": "fa422767",
"metadata": {},
"source": [
"This is a bit higher than the validation error (0.860 vs. 0.841), which makes sense: each OOB prediction averages only the trees that did not train on that row, so it draws on fewer trees than the full forest. Next, I want to know how important each feature is. Luckily, the fitted model has an attribute (`feature_importances_`) that tells us just that."
]
},
{
"cell_type": "code",
"execution_count": 93,
"id": "22035062",
"metadata": {},
"outputs": [],
"source": [
"def rf_feat_importance(m, df):\n",
" return pd.DataFrame({'cols':df.columns, 'imp':m.feature_importances_}\n",
" ).sort_values('imp', ascending=False)"
]
},
{
"cell_type": "code",
"execution_count": 94,
"id": "0af127b7",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>cols</th><th>imp</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>3</th><td>Cmp%</td><td>0.147414</td></tr>\n",
"    <tr><th>4</th><td>TD</td><td>0.145543</td></tr>\n",
"    <tr><th>0</th><td>Cmp</td><td>0.144447</td></tr>\n",
"    <tr><th>2</th><td>Yds</td><td>0.143749</td></tr>\n",
"    <tr><th>6</th><td>Y/G</td><td>0.120133</td></tr>\n",
"    <tr><th>1</th><td>Att</td><td>0.114358</td></tr>\n",
"    <tr><th>7</th><td>Sk</td><td>0.107574</td></tr>\n",
"    <tr><th>5</th><td>Int</td><td>0.076782</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
" cols imp\n",
"3 Cmp% 0.147414\n",
"4 TD 0.145543\n",
"0 Cmp 0.144447\n",
"2 Yds 0.143749\n",
"6 Y/G 0.120133\n",
"1 Att 0.114358\n",
"7 Sk 0.107574\n",
"5 Int 0.076782"
]
},
"execution_count": 94,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fi = rf_feat_importance(m, xs)\n",
"fi"
]
},
{
"cell_type": "markdown",
"id": "532e13e4",
"metadata": {},
"source": [
"Completion percentage is the most important feature, followed closely by TD, although the top four features are nearly tied (0.144 to 0.147). Interestingly enough, interceptions are the weakest predictor. The next step is to see whether any features are redundant and can be removed. I'll use the cluster_columns function defined above, which shows how closely correlated each pair of features is (by Spearman rank correlation). "
]
},
{
"cell_type": "code",
"execution_count": 53,
"id": "73887e96",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"cluster_columns(xs)"
]
},
{
"cell_type": "markdown",
"id": "424be215",
"metadata": {},
"source": [
"It looks like Attempts and Completions are very closely related, which makes sense: a quarterback who throws more passes also completes more of them. "
]
},
{
"cell_type": "code",
"execution_count": 54,
"id": "fe8c12fb",
"metadata": {},
"outputs": [],
"source": [
"def get_oob(df):\n",
" m = RandomForestRegressor(n_estimators=40, min_samples_leaf=15,\n",
" max_samples=len(df), max_features=0.5, n_jobs=-1, oob_score=True)\n",
" m.fit(df, y)\n",
" return m.oob_score_"
]
},
{
"cell_type": "markdown",
"id": "7ec8a178",
"metadata": {},
"source": [
"I will compare the OOB score (R², where higher is better; note this is a score, not an error) of the full model against models with either Completions or Attempts removed"
]
},
{
"cell_type": "code",
"execution_count": 68,
"id": "ee94b69a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.006721497139362986"
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"get_oob(xs)"
]
},
{
"cell_type": "code",
"execution_count": 77,
"id": "4485995a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'Cmp': 0.003757497556912348, 'Att': -0.000486470984402132}"
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"{c:get_oob(xs.drop(c, axis=1)) for c in (\n",
" 'Cmp', 'Att')}"
]
},
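{
"cell_type": "markdown",
"id": "6188e231",
"metadata": {},
"source": [
"Dropping Completions hurts the OOB score less than dropping Attempts does (0.0038 vs. -0.0005, against a baseline of 0.0067), although all of these scores are close to zero and vary from run to run. I will drop the Completions column and try the random forest again"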
]
},
{
"cell_type": "code",
"execution_count": 79,
"id": "336da4b9",
"metadata": {},
"outputs": [],
"source": [
"xs_final = xs.drop(\"Cmp\", axis=1)\n",
"valid_xs_final = valid_xs.drop(\"Cmp\", axis=1)"
]
},
{
"cell_type": "markdown",
"id": "fc0a7c96",
"metadata": {},
"source": [
"But first, save the final dataset for future processing"
]
},
{
"cell_type": "code",
"execution_count": 107,
"id": "e5518547",
"metadata": {},
"outputs": [],
"source": [
"save_pickle('xs_final.pkl', xs_final)\n",
"save_pickle('valid_xs_final.pkl', valid_xs_final)"
]
},
{
"cell_type": "code",
"execution_count": 108,
"id": "30840512",
"metadata": {},
"outputs": [],
"source": [
"xs_final = load_pickle('xs_final.pkl')\n",
"valid_xs_final = load_pickle('valid_xs_final.pkl')"
]
},
{
"cell_type": "code",
"execution_count": 109,
"id": "ca07591b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(0.628506, 0.869091)"
]
},
"execution_count": 109,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m = rf(xs_final, y)\n",
"m_rmse(m, xs_final, y), m_rmse(m, valid_xs_final, valid_y)"
]
},
{
"cell_type": "markdown",
"id": "e0d794a9",
"metadata": {},
"source": [
"I will also plot partial dependence on some of the best features. This will tell me how the predicted score changes as a single feature varies while all the other features are held unchanged. "
]
},
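{
"cell_type": "markdown",
"id": "5d3a9e70",
"metadata": {},
"source": [
"To make the idea concrete, here is a minimal sketch of what a partial dependence curve computes, in plain Python. The `toy_model` function and the toy rows are assumptions made up for illustration; they are not the forest or the data used in this notebook. For each grid value, the chosen feature is overwritten in every row, the other feature is left as observed, and the predictions are averaged.\n",
"\n",
"```python\n",
"# Hypothetical stand-in for a fitted model (an assumption, not the real forest)\n",
"def toy_model(cmp_pct, td):\n",
"    return 0.05 * cmp_pct - 0.1 * td\n",
"\n",
"# Toy rows of (cmp_pct, td); these values are invented for illustration\n",
"rows = [(55.0, 5), (60.0, 10), (65.0, 20)]\n",
"\n",
"def partial_dependence(model, rows, grid):\n",
"    # For each grid value, overwrite cmp_pct in every row,\n",
"    # keep td as observed, and average the predictions\n",
"    curve = []\n",
"    for value in grid:\n",
"        preds = [model(value, td) for _, td in rows]\n",
"        curve.append(sum(preds) / len(preds))\n",
"    return curve\n",
"\n",
"curve_values = partial_dependence(toy_model, rows, grid=[50.0, 60.0, 70.0])\n",
"print(curve_values)\n",
"```\n",
"\n",
"Because the toy model is linear in `cmp_pct`, the resulting curve rises steadily. With a real forest the curve can be flat or bumpy, which is what the plot below helps reveal."
]
},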
{
"cell_type": "code",
"execution_count": 111,
"id": "df65d79f",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Users\\matth\\Anaconda3\\envs\\qb_preds\\lib\\site-packages\\sklearn\\utils\\deprecation.py:87: FutureWarning: Function plot_partial_dependence is deprecated; Function `plot_partial_dependence` is deprecated in 1.0 and will be removed in 1.2. Use PartialDependenceDisplay.from_estimator instead\n",
" warnings.warn(msg, category=FutureWarning)\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"
"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig,ax = plt.subplots(figsize=(12, 4))\n",
"plot_partial_dependence(m, valid_xs_final, ['Cmp%','TD'],\n",
" grid_resolution=20, ax=ax);"
]
},
{
"cell_type": "markdown",
"id": "d873a9ab",
"metadata": {},
"source": [
"Completion percentage looks good: the better the completion percentage, the better the overall score. However, TDs are a bit concerning. I will now use treeinterpreter to see how the prediction for an individual row is made. "
]
},
{
"cell_type": "code",
"execution_count": 112,
"id": "1ea2721d",
"metadata": {},
"outputs": [],
"source": [
"row = valid_xs_final.iloc[:5]"
]
},
{
"cell_type": "code",
"execution_count": 113,
"id": "be976f7a",
"metadata": {},
"outputs": [],
"source": [
"prediction,bias,contributions = treeinterpreter.predict(m, row.values)"
]
},
{
"cell_type": "code",
"execution_count": 114,
"id": "cc57a5be",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(array([1.62246257]), 1.4856153846153846, 0.13684718549424438)"
]
},
"execution_count": 114,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"prediction[0], bias[0], contributions[0].sum()"
]
},
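{
"cell_type": "markdown",
"id": "9c1e4b27",
"metadata": {},
"source": [
"The three numbers above illustrate the additive decomposition that treeinterpreter produces: the prediction for a row should equal the bias term plus the sum of that row's per-feature contributions. A quick sanity check in plain Python, with the values copied by hand from the output above (the variable names are only illustrative):\n",
"\n",
"```python\n",
"# Values copied from the output of the previous cell\n",
"prediction = 1.62246257\n",
"bias = 1.4856153846153846\n",
"contribution_sum = 0.13684718549424438\n",
"\n",
"# The prediction should equal bias plus the summed contributions,\n",
"# up to the rounding of the printed prediction\n",
"residual = abs(prediction - (bias + contribution_sum))\n",
"print(residual < 1e-6)\n",
"```\n",
"\n",
"This is why the contributions can be drawn as a waterfall: starting from the bias, each feature pushes the prediction up or down until it reaches the final value."
]
},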
{
"cell_type": "code",
"execution_count": 118,
"id": "f9ed2e3a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Att 0.301647\n",
"Yds 0.442272\n",
"Cmp% 1.244385\n",
"TD -0.550133\n",
"Int -1.090999\n",
"Y/G 0.323963\n",
"Sk -0.063729\n",
"Name: 81, dtype: float64"
]
},
"execution_count": 118,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"valid_xs_final.iloc[0]"
]
},
{
"cell_type": "markdown",
"id": "c0045a86",
"metadata": {},
"source": [
"Let's see which QB this is..."
]
},
{
"cell_type": "code",
"execution_count": 119,
"id": "76c9976d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Unnamed: 0 5\n",
"Name Jeff Hostetler\n",
"Year 1991\n",
"Age 30\n",
"Tm 0.0\n",
"Pos 0.0\n",
"No. 15.0\n",
"G 12\n",
"GS 12.0\n",
"QBrec 0.0\n",
"Cmp 179\n",
"Att 285\n",
"Cmp% 62.8\n",
"Yds 2032\n",
"TD 5\n",
"TD% 1.8\n",
"Int 4\n",
"Int% 1.4\n",
"1D 0.0\n",
"Lng 55\n",
"Y/A 7.1\n",
"AY/A 6.8\n",
"Y/C 11.4\n",
"Y/G 169.3\n",
"Rate 84.1\n",
"QBR 0.0\n",
"Sk 20\n",
"Yds.1 100\n",
"Sk% 6.6\n",
"NY/A 6.33\n",
"ANY/A 6.07\n",
"4QC 1.0\n",
"GWD 2.0\n",
"AV 10\n",
"Awards 0.0\n",
"Career_AV 65\n",
"Tier Above Average Career QB\n",
"Name: 81, dtype: object"
]
},
"execution_count": 119,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[valid_xs_final.iloc[0].name]"
]
},
{
"cell_type": "code",
"execution_count": 120,
"id": "60a8acf4",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Users\\matth\\Anaconda3\\envs\\qb_preds\\lib\\site-packages\\waterfall_chart.py:66: FutureWarning: Behavior when concatenating bool-dtype and numeric-dtype arrays is deprecated; in a future version these will cast to object dtype (instead of coercing bools to numeric values). To retain the old behavior, explicitly cast bool-dtype arrays to numeric dtype.\n",
" trans.loc[net_label]= total\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAAEYCAYAAAAJeGK1AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAAsRUlEQVR4nO3de3xU1bn/8c9jIIDcg1ERFLGiCFVRIygYg2IVUA5Ua5XiBW/UgmhbS8WfVouXovV4KigHb0CwWC/1Sm1QOWgUqyIRI4otEpGbgCKBCEJCAs/vj70Th5CQDJfMnvB9v155zZ619trzLEjmmbX2mr3N3REREYma/RIdgIiISFWUoEREJJKUoEREJJKUoEREJJKUoEREJJIaJDqA3XHAAQf44YcfnugwRERkN3z44Yffunt65fKkTlCHH344eXl5iQ5DRER2g5ktrapcU3wiIhJJSlAiIhJJSlAiIhJJSlAiIhJJSlAiIhJJSlAiIhJJSlAiIhJJSlAiIhJJSlAiIhJJSlAiIhJJSlAiIhJJcScoM+trZgvNrMDMRldR39nM3jOzEjP7XaW6JWb2iZnlm1leTHmamc00s0XhY+td646IyK7Jzs+m56Se9Jrci3mr5m1XV1xWzJAXhpA5JZMhLwyhuKwYgCXrl3Dm1DPpNbkXf5r9p4r9z3riLHpn9ybj0Qye+uSpOu0H7Lwv7y5/l2MnHkvjuxqz4rsVFeWjXh9FVnYW3R/rzqjXRwFQuLmQ3tm9K34a3tmQdZvX1V1H3L3WP0AK8AVwBJAKfAx0qbTPgcDJwN3A7yrVLQEOqOK4fwZGh9ujgXtrE89JJ53kIiK7q3BToZ/w8AleUlbiiwsXe69Jvbarnzh3ot+Re4e7u4/JHeMT5050d/eL/n6Rv73kbXd37zO1j/97zb/d3b2krMTd3YuKi/zwBw6vq264e819Wb95vW8o2eBZU7J8edHyivLymN3dT59yun/69afbtZuzYo6f89dz9krMQJ5X8R4f7wiqO1Dg7ovdfQvwNDCwUsL7xt3nAqVxHHcgMDXcngoMijMuEUmAPfVJHeDKl6+k7f1tuXr61XUWf7k5X80h87BMUlNS6di6Ixu3bKSkrKSiPndJLucddR4AA44awNtL3wYgf3U+mR0yATi307kV5akpqQB8v+V7uqZ3rcuu1NiXlo1b0iy12Q7tymMu3VpK04ZNOaT5IdvVT5s/jUuOu2TvBl9JvAmqHbA85vmKsKy2HHjdzD40s2Ex5Qe5+yqA8PHA6g5gZsPMLM/M8tasWRPHS4vInrRu8zrGzxlP7tBcpv10GtfPuH67+q7pXXnvqvc4pf0p25Xf3edu3hr6Fh9c8wEfrPyABd8sAODOM+7kqQvqfjoMgqms1k1+OLPQsnFLCjcXVlnfqnEr1m5eC8A231axT6vGrVi7KSjfum0rWdlZHDvxWAYevd1n+L2upr7szMickRwx/gjaNmtLy8YtK8rLtpXxyuevMKjzoD0d7k7Fm6CsijKPo30vdz8R6AeMMLPT43x93P1Rd89w94z09B3ubyUidWRPf1Jv1yKez7p7VlqTNNYXr694XlRcRFqTtCrri0p+qNvPfngLjS1P2S+Ft4a+xcLrFjL2nbEUFRft/U5UESvs2JedebD/g3x5w5d8u/lbXi14taL89S9e59RDT63y/3NvijdBrQAOjXneHlhZ28buvjJ8/AZ4kWDKEOBrM2sLED5+E2dcIlLH9sYn9UTp0a4H7yx7h9KtpSwrWkaz1GY0atCooj6rQxY5i3IAyFmUQ1aHLACOP/h43l3+LgAzCmZweofTKd1aWjGyapralMYNGtO4QePI9KU65Qs/GuzXgKYNm7J/w/0r6qbNn8Ylx9bt9B7En6DmAp3MrKOZpQIXA9Nr09DMmppZ8/Jt4Gzg07B6OnB5uH058HKccYlIHdsbn9QTpXWT1gw/eThZ2VkMfn4wD/R9gPzV+dz3r/sAGN
ptKJ988wmZUzL55JtPGNptKABj+4zlljduodfkXmQelskx6cfwzfff0Du7N2dMPYMzp57JH07/Q60SRF315fO1n3PWE2fx8dcfM/j5wUycOxGAIS8MoXd2b3pN7sWhLQ6l9+G9Adi4ZSPvrXiPn/zoJ3XWhwpVrZzY2Q/QH/icYDXfLWHZtcC14fbBBCOt74D14XYLgpV/H4c/C8rbhm3aALOAReFjWm1i0So+kcQp3FToJz1ykm8p2+JL1y/dYbVYucqrxTaXbq7YHvzcYH/zyzcrnr/55Zt+1ctX7bWYJZqoZhWfBXXJKSMjw/Py8mreUUT2iskfTebxeY9jZozrO44G+zVg5hczGdVrFJ+v/Zzh/xzOh6s+5McH/phf/PgX/OrkX3HBsxewdtNaSreVctqhp3HvT+4F4NY3bmVGwQxWb1zNMQccw8sXv0zT1KYJ7qHUBTP70N0zdihXghIRkUSqLkHpUkciIhJJDRIdgIhIInRZc2rCXvuz9Pf26PE6zd2jh4vLopP33rE1ghIRkUjSCEpEaq0+jTr29PFkz9MISkREIkkJSkREIkkJSkREIkkJSkREIkkJSkREIkkJSkREIkkJSkREIkkJSkREIkkJSkREIkkJSkREIkkJSkREIkkJSkREIkkJSkREIkkJSkREIkkJSkREIkkJSkREIinuBGVmfc1soZkVmNnoKuo7m9l7ZlZiZr+rTVszSzOzmWa2KHxsvWvdERGR+iKuBGVmKcAEoB/QBRhsZl0q7VYIXA/8dxxtRwOz3L0TMCt8LiIi+7B4R1DdgQJ3X+zuW4CngYGxO7j7N+4+FyiNo+1AYGq4PRUYFGdcIiJSz8SboNoBy2OerwjLdrftQe6+CiB8PLC6g5jZMDPLM7O8NWvW1DpwERFJLvEmKKuizOug7Q8N3B919wx3z0hPT4+3uYiIJIl4E9QK4NCY5+2BlXug7ddm1hYgfPwmzrhERKSeiTdBzQU6mVlHM0sFLgam74G204HLw+3LgZfjjEtEROqZBvHs7O5lZnYd8BqQAkx29wVmdm1Y/7CZHQzkAS2AbWb2a6CLu39XVdvw0PcAz5rZVcAy4MI90DcREUlicSUoAHfPAXIqlT0cs72aYPquVm3D8rVAn3hjERGR+ktXkhARkUhSghIRkUhSghIRkUhSghIRkUhSghIRkUhSghIRkUhSghIRkUhSghIRkUhSghIRkUhSghIRkUhSghIRkUhSghIRkUhSghIRkUhSghIRkUhSghIRkUhSghIRkUhSghIRkUhSghIRkUhSghIRkUhSghIRkUhSghIRkUiKO0GZWV8zW2hmBWY2uop6M7PxYf18Mzsxpm6JmX1iZvlmlhdTnmZmM81sUfjYete7JCIi9UFcCcrMUoAJQD+gCzDYzLpU2q0f0Cn8GQZMrFR/hrt3c/eMmLLRwCx37wTMCp+LiMg+LN4RVHegwN0Xu/sW4GlgYKV9BgJPeOB9oJWZta3huAOBqeH2VGBQnHGJiEg9E2+Cagcsj3m+Iiyr7T4OvG5mH5rZsJh9DnL3VQDh44HVBWBmw8wsz8zy1qxZE2f4IiKSLOJNUFZFmcexTy93P5FgGnCEmZ0e5+vj7o+6e4a7Z6Snp8fbXEREkkS8CWoFcGjM8/bAytru4+7lj98ALxJMGQJ8XT4NGD5+E2dcIiJSz8SboOYCncyso5mlAhcD0yvtMx24LFzNdwpQ5O6rzKypmTUHMLOmwNnApzFtLg+3Lwde3oW+iIhIPdIgnp3dvczMrgNeA1KAye6+wMyuDesfBnKA/kABsAm4Imx+EPCimZW/7t/c/dWw7h7gWTO7ClgGXLhbvRIRkaQXV4ICcPccgiQUW/ZwzLYDI6potxg4vppjrgX6xBuLiIjUX7qShIiIRJISlIiIRJISlIiIRJISlIiIRJISlIiIRJISlIiIRJISlIiIRJISlIiIRJISlIiIRJISlIiIRJISlIiIRJISlIiIRJISlEgdy8
7PpueknvSa3It5q+ZtV1dcVsyQF4aQOSWTIS8MobisGIBRr48iKzuL7o91Z9Tro7ZrU7i5kNb3tmba/Gl11geRuqAEJVKH1m1ex/g548kdmsu0n07j+hnXb1efnZ9N5zadmX3FbI5uczTZ+dkA3N3nbt4a+hYfXPMBH6z8gAXfLKhoM3b2WHod2qsuuyFSJ5SgROrQnK/mkHlYJqkpqXRs3ZGNWzZSUlZSUZ+7JJfzjjoPgAFHDeDtpW8DkJqSCkDp1lKaNmzKIc0PAWBZ0TJWbVxFxiEZddwTkb1PCUqkDhVuLqR1k9YVz1s2bknh5sIq61s1bsXazWsr6kbmjOSI8UfQtllbWjZuCcCY3DHcknlLHUUvUreUoETqUFqTNNYXr694XlRcRFqTtCrri0q2r3uw/4N8ecOXfLv5W14teJVPvv4EM+OY9GPqKnyROhX3HXVFZNf1aNeDW9+4ldKtpazauIpmqc1o1KBRRX1WhyxyFuXQ7eBu5CzKIatDFhAsnmjcoDEN9mtA04ZN2b/h/ny46kMWrl1I32l9KSgsoGlqU45qcxTd23VPVPdE9iglKJE61LpJa4afPJys7CzMjHF9x5G/Op+ZX8xkVK9RDO02lCunX0nmlEzat2jPlIFTABjywhDWblpL6bZSTjv0NHof3huAod2GAvDH3D9yZNqRSk5Sr5i7JzqGXZaRkeF5eXmJDkNkn9FlzakJe+3P0t9L2GtHXae5iXvtRSfv/jHM7EN332Glj85BiYhIJMWdoMysr5ktNLMCMxtdRb2Z2fiwfr6ZnVhTWzNLM7OZZrYofGxd+bgiIrJviStBmVkKMAHoB3QBBptZl0q79QM6hT/DgIm1aDsamOXunYBZ4XMREdmHxbtIojtQ4O6LAczsaWAg8FnMPgOBJzw4ufW+mbUys7bA4TtpOxDoHbafCuQCN+1Cf0QiJ1HnbfbGORudB5K6FO8UXztgeczzFWFZbfbZWduD3H0VQPh4YJxxiYhIPRPvCMqqKKu8DLC6fWrTtuYAzIYRTB2Snp7OsGHD6Nq1K5mZmTz88MO0bNmSUaNGceuttwJw//33c/PNN7NlyxZ+85vf8NJLL/Hll1/y85//nBUj7uSdotWc2uIgOjRqxtNrvuCwRs24ML0j96/4hBTbj/s6due3i98H4PYOJ/LgygUUlpZw1cFH88GGNXzyfSF9W7cH4NV1Kzi2aRrdm6czafVC0ho2YuQhXRmzNLgg6P8ccQqjvvyArb6Nm2a/wjPPPMOyZcsYPHgwS5cu5d133yUzM5O2bdvy7LPP0rFjRwYNGsRf/vIXUlNTGTt2LDfeeCMAd911F/fddx9FRUVce+21zJ49mwULFjBgwABKSkp4/fXXOeGEEzjhhBOYPHky6enp/OpXv+KOO+4A4MEHH2TkyJEAjB49mr/+9a989dVXXHLJJSxatIg5c+bQu3dv2rRpw/PPP8+RRx5J//79GT9+PE2aNGHMmDH8/ve/B2Ds2LHcfffdbNy4kREjRjBr1iz+85//MGjQIDZs2MCsWbPIyMiga9euTJ06lYMPPpirrrqKu+++e4dYbrnlFiZNmsTq1au5/PLLWbBgAXl5efTp04fmzZvz0ksv0blzZ/r06cOECRNo1qwZt9xyCzfffDMAf/7zn7n99tvZvHkz119/PTk5ORQUFHDBBRewdu1acnNz6dGjB506dWLatGm0a9eOSy+9lHvuuWeHWG677TYmTpzImjVruPLKK/noo4/46KOPOPvss2nUqBH/+Mc/avW7t+q7xaT9si0bZhRSuqyEFv/VhrKvt7BpzgaanNyMhu0b8d2La2nYLpUWg9qwdsIq2A8OGtOBr/+wFIADft+edY+uZuv6MlpdciCb522k5LNNND2zFQDfv7GeRl32p8mJzVg/7RtSWjVg1c2ruP322wF44IEHuPHGG9m6dSujRo3S7149+90bFMf73qpVq5g9ezY9e/akQ4cOPPXUUxx22GFcdN
FF3HfffaSkpHD//ffz61//GoAxY8Ywbtw4CgsLueaaa5gzZw7z58+nX79+AAx7bAbHHXccPXr04LHHHiMtLY0bbrghrt+9at/v41lmbmanAn9093PC5zcDuPvYmH0eAXLd/anw+UKC6bvDq2tbvo+7rwqnA3Pd/eia4tndZeZlmVftctvd1WD2pIS9ttSt+jTFJ7I37Kll5nOBTmbW0cxSgYuB6ZX2mQ5cFq7mOwUoCqftdtZ2OnB5uH058HKccYmISD0T1xSfu5eZ2XXAa0AKMNndF5jZtWH9w0AO0B8oADYBV+ysbXjoe4BnzewqYBlw4W73TEREklrclzpy9xyCJBRb9nDMtgMjats2LF8L9Ik3FhERqb90JQkREYkkJSgREYkkJShJCtn52fSc1JNek3sxb9W87eqKy4oZ8sIQMqdkMuSFIRSXFW9Xn5WdxdXTr654vnjdYgY8NYAzp57JZS9eVifxi0j8lKAk8tZtXsf4OePJHZrLtJ9O4/oZ129Xn52fTec2nZl9xWyObnM02fnZFXWvfP4KLRq12G7/63Ku49HzHuWNy9/giZ8+URddEJFdoAQlkTfnqzlkHpZJakoqHVt3ZOOWjZSUlVTU5y7J5byjzgNgwFEDeHvp2wBs821MmDuBESf/sGZn6fqlbCrdxA2v3kDv7N48/9nzddsZEak13bBQIq9wcyGtm/xwgfuWjVtSuLmQts3b7lDfqnEr1m5eC8DU/Kmc3/l8GjdoXNF25YaVfLT6Iz4b/hnNGzWn56SenNnxzO2OLyLRoBGURF5akzTWF6+veF5UXERak7Qq64tKgrrismKe/ORJrjjhih2OdeyBx9KuRTtaNGpBt4O7sahwUV10Q0TipAQlkdejXQ/eWfYOpVtLWVa0jGapzWjUoFFFfVaHLHIWBV+vy1mUQ1aHLL5c9yXri9dz3t/O4/czf89rX7zG4/Me58i0I9lUuokNJRso21bGZ2s+o0PLDonqmojshKb4JPJaN2nN8JOHk5WdhZkxru848lfnM/OLmYzqNYqh3YZy5fQryZySSfsW7ZkycAqNGzQmb1hwncbcJblMmz+Nq08MVvLde9a99HuyH6XbSrnmxGs4qNlBieyeiFQjrovFRo0uFivJQBeLFdm5PXWxWBERkTqhBCUiIpGkBCUiIpGkBCUiIpGkVXwSSfVpYYEWK4jsGo2gREQkkpSgREQkkpSgREQkkpSgREQkkpSgREQkkpSgREQkkuJKUBYYb2YFZjbfzE6sZr+OZjbHzBaZ2TNmlhqW9zazIjPLD39ui2nT18wWhscevXvdEhGRZBfvCKof0Cn8GQZMrGa/e4G/uHsnYB0Qe1XW2e7eLfy5A8DMUoAJ4fG7AIPNrEucsYmISD0Sb4IaCDzhgfeBVmbWNnYHMzPgTOC5sGgqMKiG43YHCtx9sbtvAZ4OX0tERPZR8SaodsDymOcrwrJYbYD17l5WzT6nmtnHZjbDzLrGcVwREdmHxHupI6uirPINpXa2zzygg7tvNLP+wEsE04W1OW5wcLNhBNOLHHbYYbUIWUREklGNIygzG1G+qAFYCRwaU90+LIv1LcHUX4PK+7j7d+6+MdzOARqa2QEEI6aajkvY7lF3z3D3jPT09JrCFxGRJFVjgnL3CeWLGghGPJeFq/lOAYrcfVWl/R14E/hZWHQ58DKAmR0cnqPCzLqHr78WmAt0Clf/pQIXA9P3QP9ERCRJxXsOKgdYDBQAjwHDyyvMLMfMDgmf3gT81swKCM5Jld/f/GfAp2b2MTAeuDhccFEGXAe8BvwbeNbdF+xin0REpB6I6xxUODoaUU1d/5jtxQQr8yrv8xDwUDXtcwgSoIiIiK4kUZ9l52fTc1JPek3uxbxV87arKy4rZsgLQ8icksmQF4ZQXFYMwIxFMzj5sZMrysu2BYsxH8l7hB6P9yBzSiazFs+q876IyL5HCaqeWrd5HePnjCd3aC7TfjqN62dcv119dn42ndt0Zv
YVszm6zdFk52cD8Ic3/8BzFz7H7Ctm03C/hsz8YibffP8Nj3z4CO9c8Q45v8jhpv+7ia3btiagVyKyL1GCqqfmfDWHzMMySU1JpWPrjmzcspGSspKK+twluZx31HkADDhqAG8vfRuArgd2ZX3xetydopIi0pums2T9Erqkd6FhSkOaN2pO09SmfLHui4T0S0T2Hbrlez1VuLmQ1k1aVzxv2bglhZsLadu87Q71rRq3Yu3mtQBcdtxl9H2yLy0ateD4g44n45AMCjcXkr86n+9KvmNDyQY+Xv0xhZsL675TIrJP0Qiqnkprksb64vUVz4uKi0hrklZlfVHJD3W/fOWXfHD1Byy8biFpTdL4+4K/k9YkjTG9xzDgqQH85rXfcPzBx3NI80MQEdmblKDqqR7tevDOsnco3VrKsqJlNEttRqMGjSrqszpkkbMoWDSZsyiHrA5ZAKTsl1IxskrfP71ipHRBlwt4a+hbjOs7jv0b7s9hLXUVDxHZuzTFV0+1btKa4ScPJys7CzNjXN9x5K/OZ+YXMxnVaxRDuw3lyulXkjklk/Yt2jNl4BQA7jrjLs6ceiaNGzSmVeNW3HTaTQBc9uJlLP9uOfs33J8H+z2YyK6JyD7Cgq82JaeMjAzPy8vb5fZlmVfVvNNe0mD2pJp32od1WXNqQl73s/T3EvK6IvsyM/vQ3TMql2uKT0REIkkJSkREIkkJSkREIkmLJOqR+nTeRueCREQjKBERiSQlKBERiSQlKBERiSQlKBERiSQlKBERiSQlKBERiSQlKBERiSQlKBERiSQlKBERiaS4EpQFxptZgZnNN7MTq9nvunAfN7MDatPezPqa2cKwbvSud0lEROqDeEdQ/YBO4c8wYGI1+/0LOAtYWpv2ZpYCTAjruwCDzaxLnLGJiEg9Em+CGgg84YH3gVZm1rbyTu7+kbsviaN9d6DA3Re7+xbg6XBfERHZR8WboNoBy2OerwjLdrf97h5XRETqmXgTlFVRFs8teatrX+vjmtkwM8szs7w1a9bE8dIiIpJMakxQZjbCzPLNLB9YCRwaU90+LKutFdW0r658B+7+qLtnuHtGenp6HC8tIiLJpMYE5e4T3L2bu3cDXgIuC1fjnQIUufuqOF5vejXt5wKdzKyjmaUCF4f7iojIPireKb4cYDFQADwGDC+vMLMcMzsk3L7ezFYQjITmm9njO2vv7mXAdcBrwL+BZ919wa52SkREkl9cd9R1dwdGVFPXP2Z7PDA+zvY5BAlMREREV5IQEZFoUoISEZFIUoISEZFIUoISEZFIUoISEZFIUoISEZFIUoISEZFIUoISEZFIUoISEZFIUoISEZFIUoISEZFIUoISEZFIUoISEZFIUoISEZFIUoISEZFIUoISEZFIUoISEZFIUoISEZFIUoISEZFIUoISEZFIUoISEZFIiitBWWC8mRWY2XwzO7Ga/a4L93EzOyCmvLeZFZlZfvhzW0xdXzNbGLYbvetdEhGR+qBBnPv3AzqFPz2AieFjZf8CXgFyq6ib7e7nxRaYWQowAfgJsAKYa2bT3f2zOOMTEZF6It4pvoHAEx54H2hlZm0r7+TuH7n7kjiO2x0ocPfF7r4FeDp8LRER2UfFm6DaActjnq8Iy+Jxqpl9bGYzzKzrHjyuiIjUI/FO8VkVZR5H+3lAB3ffaGb9gZcIpgtrfVwzGwYMAzjssMPieGkREUkmNY6gzGxE+aIGYCVwaEx1+7CsVtz9O3ffGG7nAA3DRRQrantcd3/U3TPcPSM9Pb22Ly0iIkmmxgTl7hPcvZu7dyMY8VwWruY7BShy91W1fTEzO9jMLNzuHr7+WmAu0MnMOppZKnAxMD3u3oiISL0R7zmoHGAxUAA8BgwvrzCzHDM7JNy+3sxWEIyE5pvZ4+FuPwM+NbOPgfHAxeGCizLgOuA14N/As+6+YDf6JSIiSS6uc1Du7sCIaur6x2yPJ0hAlfd5CHiomvY5BAlQREREV5IQEZFoUo
ISEZFIUoISEZFIUoISEZFIiveLuvVKg9mTEh3CHvVZ+nuJDkFEZI/RCEpERCJJCUpERCJJCUpERCJJCaqyefOgVy/o2ROys3es37ABTj0VWrWCadO2r7v3XujTB3r3hjfeCMo2bYKrr/6hfN26vRu/iEg9sU8vkqjSyJFB4mnXDk45BQYOhNatf6hv0gRefBEefnj7djNmQFERzJq1ffmYMfDzn8PZZ+/92EVE6hGNoGKVlMD330PHjpCaCpmZMHfu9vs0aAAHH7xj22efheLiYKR06aVBsoIgYb36ajB6uv32vd4FEZH6Qgkq1tq1wdRduVatgrLaWLkS9tsvSEg9esDYsUH5J5/AmWfCm2/CZ58FyUpERGqkBAXw0EPBCOe2234Y+UCwnZZWu2OkpUHfvsF2374wf/725WZwzjk/lIuIyE4pQQFcdx3k5sLjj8P++8OyZVBaCu+8A9271+4YvXtDXl6wnZcHRx6583IREdkpLZKobNw4GDwY3GH48B8WSAwZAk8+GWwPGAALFgTJ7J13ggUTQ4fCNdfAGWdAw4bwxBPBvvfcE5QXF0OnTjBoUCJ6JSKSdCy4xVNyysjI8Lzy0YmIiCQlM/vQ3TMql2uKT0REIkkJSkREIkkJSkREIkkJSkREIkkJSkREIimuBGWB8WZWYGbzzezEavZ70swWmtmnZjbZzBrW1N7M+oZtCsxs9O51S0REkl28I6h+QKfwZxgwsZr9ngQ6A8cCTYCrd9bezFKACWF9F2CwmXWJMzYREalH4k1QA4EnPPA+0MrM2lbeyd1zwn0c+ABoX0P77kCBuy929y3A0+G+IiKyj4o3QbUDlsc8XxGWVSmc2rsUKL9CanXt4zquiIjUf/EmKKuibGeXovhf4G13n11D+1of18yGmVmemeWtWbNmp8GKiEjyqjFBmdkIM8s3s3xgJXBoTHX7sKyqdrcD6cBvY4pXVNO+uvIduPuj7p7h7hnp6ek1hS8iIkmqxgTl7hPcvZu7dwNeAi4LV+OdAhS5+6rKbczsauAcYLC7b4upml5N+7lAJzPraGapwMXhviIiso+K62KxZmbAQ0BfYBNwhbvnhXU5wNXuvtLMyoClwIaw6QvufkcN7fsDDwApwGR3v7sW8awJXydRDgC+TeDr70nqSzSpL9GkvuxZHdx9hymxpL6aeaKZWV5VV+BNRupLNKkv0aS+1A1dSUJERCJJCUpERCJJCWr3PJroAPYg9SWa1JdoUl/qgM5BiYhIJGkEJSIikaQEJSIikaQEJfWWmen3O2LMrGWiY5DkoT/g3RB+8bjeS5Z+hlciOd7MeppZqrtvS5bYq1Mef7L3A8DM7gHGmlnnRMeyp8X+/5hZi0TGsiui+vulBLWLzMzC24lgZpeb2fmJjmlPM7M2AJ4EK2nM7FzgGeBP4c/HZna4u3tU//hqYmZpMf/2P05oMLvJzB4BOhKsGPs+weHsUZXeC64Ariq/SWsyCGPtGm73q+oWSonSINEBJKuYX8hRwCDglwkNaO/4i5lNdfdZiQ5kZ8zsHGAM8Ft3fzssexB4zczOcvflsW8iSaS/mZ0MLCK4iWc/YEOy9SP8GznE3QfElLUFegKz3H19omLbE2LeC7oDZwIj3L00sVHF5Wjg0ph78/VIcDwVNIKKk5kdWX6rejNrD5zt7r2AFWbW38xuSmyEe1QuUHGxXzNLidpoJExO04Eb3f1tM2sM4O4jgf8DXjGzhsn2pg7g7tOAM4C7gEvd/TuS82+2BcGFpgEws3RgNnAncJGZtU5QXHuEme1nZp2Ax4GWQNKMngDc/VOgBDgfmOjuRQkOqUIy/rInTDi3fA1wsZkdD6wD2pnZFGA8MAAYaWa3JDDM3WJm7cysg5mlAR8DV4TlVwD3AaPN7IxExlguTE53Ah8BfzKzBu5eXJ6kgBuA9QSfEJNCFR8AHiK4K/UYM2vi7lsTENYuMbOzwv+LYmD/sKwBcBTwG+C/gAsJp5eSSez/k7
tvc/dFwK+BNOC0ZJriC00luLnscWZ2Wfn0vpk1TWRQmuKrhfJfRnf/zsweBYYR3BJkLHABwR/Z8+6+wMxygW5mlpJMbyYAZnYE8BbwN4J7ef0vsCy8BUpX4HmCT8O9gDcTFSdUJKeHgAvcfb6ZPQt8YGYnhUmqIbAV2EjwBhl5lc5lnEsQ91R3f9TMXgKeAC40syFAsbs/n7hod87MmgC/A7YQjMRfNrM3wr+RfHf/PtyvCEi6EVTM/9O1QBeCuzM8THD+83eAm9mr7r4lcVHWzMxGEtyLrzVwE8H/10XApnBU2NbMfuvuZYmITyOo2mkfM0WUCRwDDCb49F7m7neEf3gjgVuBacmWnEIdgEfc/SaCT+1O8Mn350BjghFjK+AkM2uUqCDN7GyCN+uPCZIQ7v5zgnM188xsv/AcwCUEUy7rEhVrbVio0on2e4E/Ag+Y2fHuPghINbPpwB+A/yQq3pqY2RiCBRH3ArcACwimKV83s+MI76AdfqhY4+7/SFSsu8PMRgA/A/5K8L4wwt1zCD7YjQHOSmB4NTKzXwEDCRLrycBt7v5P4J/AKcB5wKREJSfQCKpGZnYgMNnMLgQyCIbxPYDTgT7AUDObBqwCzia4SeOCBIW7uzYBPc2sD8Gij7XALIJE9TTBlFkH4DF3L0lEgGFsDxHcqfkggpO7r7n7m+5+kZk9A7wVjnSHA9e4+9pExBqHlPI3gXDkdD5wLMHf5+3AFWY2yd0HmlkGsNLdq7zjdKKFo+12QJq7v2nBkvKz3f2BcLpvClBoZqVAobtfG7ZLxkUsbQimKa8GvgNuMbNG7v6cmW0mSMyRE/NvfSDBTNDlwFfATeEU7PPu/oyZNXP3jYmMVQmqZg0JprX2I/jk93X45jzTzL4DJhJMh90NnJ9kq3eAH35h3X1OuPrtUIJR4N/DYf4Yd/+FmX0M4O4bEviG8h0w1N3fNbOjCUZJZ4fh5IZJ6iVgMnC8u3+WgBhrzcwOAPLM7ER3LwROArKA7uH/xziCDwYjzexxd/8gkfHWxN23mNkSgje+d4CVwPlm9rK732Nmfyf4wNOk/INcOOLdVu1BEyyc4rfYGMOy9gQzDQvdvV9Yfq2ZbXL3JxITba10MrPFwBHAc8BqYKC7l5nZdUCZmT2S6OQEmuKrkbt/BbwH9AY+BNaFoyncfQ7BH+EG4PtkTE4QzKdb8OXWcwkScLa73x9WrwBWhZ+mNrj7hvI2CYp1bpic9nP3hQRTfaXAOWZ2erjPIODQqCcnAHf/FhgJvGtmLd39DoIR4mgzO87d1wAPEvw/LElcpDtnZgPN7LzwjfsTgtE47v4ywd1aHw6ff+Hui2OSk0U5OYUal8doZj8xs6zw9/8egkU488K6Kwg+TLyfqEBrEiagfxJMv35JMFLPDZPTUIJZh1lRGc3qauZVCN/oBhJ80psM3AgUuPtYMxtGsGCgEcHqscsIpvWWJSre3WVmWQSreF4kmFO/A/hbzInsMcB0d/8wcVFWLxzl/YJgyuUZd/9XgkOKm5n1J1gJehLBKPHmcPtudy8/rxbJN3ILvm5xL9Cc4Eu4BwDdgF+4+0wLLjl1J/DP8MNF0kznmdmPCPp2FdCf4BzzBoLFRC8SfDiaACwnGFFdFdUPRmb2XwTnle4lOB3RAuhM8OH7n8AJBFPikYlfCaoK4dTRj4GfEHxC+hnBJ9jrgH8DxxMskmhMsKjg08REuvvMrCvBp76/h28mvQjOe0x19ycTG13thec6fgo8Ho46kk6YpMYRnOv8juBNvSNwJbAlGd7ULVgJeijBebRvgTfd/Z1wtVtrdx+b0ADjZGYdCEa4HQneL88Pp2VvIlhl+STwOcF7QapH9EvHZtaOYCbo/9z9ynCR0wUE/1ctCH7vSjxC34ECJahaCVcenUuwgu358vMAFnzvJmErXHZHOBUDwUqxCwj+0B5w981mdh7B6qs+7r4pQSHGzYIv5CblNGu5MEndD5zq7uvNrE0SLPKoOI
8Z8/gjgpPvqcALBNOT0whWhBZFPdnGLhAwsxMIzgveDJzm7ovMrCPBdFhjYIq7z0tctLVjweXYHiK44srT4ch2KNAJuDeKyVWLJKpR6Q9ufrgqZwjBqrEUd38vGZNT7Aoed//azP5IsAz7GMJLzxCc2N6QuCh3TbInJwB3zwlXws0ys4xkSE7wwznJmMcvzOyvwAjgR+7+gZmdmwz/R+Ho4lIz+4rgPfIkgmsI/gi4y8xucvcvzexhgi+yL09ctLXn7i+YWQnBBXsJk1Q20LT83HLUaAQVh/owjQQQjpD+hyAZferuE8zs/xF8b2Ml4Zd0wxPckgBRWOK7J5jZQe7+dbidTOeeuhB8wXgL0NHdS8NR01CCK2HcGibhpJtFseCajo8Cv3H35xIdz84oQcUpWaeRYkaErQi+6DmT4I/vauAjd7/PzG4g+ILecx7hqxRI8kmG5BS7ECVMUHcDhwNj3f3ZsLw9wdTeIQR/O1uj3q+qmNlPgC/cfXGiY9kZTfHFKRmTE1QsJc8kuPhoS4JVSKUElwK6wcxuc/c7LLih3Llmtgp4Lxn/+CR6kuH3KCY5/ZLg8kVLCK4ScaeZNXX3KQQJ65/Av5Nt5BTL3WcmOoba0Peg6jEzSw1PhGJmpxAsY94fOIdg0cc2II/gxOmxFlxu/17gC2BxMrypiOxJZnYBcD3BFS8aEHyZ9Rng/5nZUwR/H0s9+FK17GWa4qunwqXyNwJPEVxK/88E36mZYWYXE1yV/X8JvssB0DxqS0xF6lp4LnaLu/93uFjlSoLvCk0k+GrJUx58QVzqgEZQ9VA4f/4cwbXA/kPw5cmtwIjwXMDTwCME3+W4wIPbBSg5icBnQKaZdXH3Le7+MMEXWL939z8qOdUtjaDqGQvuWfUi8KS7T44p/zHBhW43ATeE56QGA1+6e2QvzSJSl8JFRL8juO5mLtAEuA3ol8wrd5OVElQ9Y8F9kCYBI929KHbVoZkdQ/DF3G/dfUQCwxSJLDM7hODL6wMIFhGNcfePExvVvklTfPVPU4IpidMgWHVoZilh3VqCeyg1Dy9xJCKVuPtKd38QGARcquSUOFpmXs+El8d5ELjAzL5y9/yY6gyCC92O8ojeT0gkKpLpMl/1lUZQ9dOLBDdQvNbMzgS2hReBvR/4q5KTiCQDnYOqp8zsIIILcw4nuF/Nj4B73P2lRMYlIlJbSlD1XJiotgGN3H1FMlxyRkQElKBERCSidA5KREQiSQlKREQiSQlKREQiSQlKREQiSQlKREQiSQlKREQiSQlKREQi6f8DfQD47F4wsI8AAAAASUVORK5CYII=\n",
"text/plain": [
"
"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"waterfall(valid_xs_final.columns, contributions[0], threshold=0.08, \n",
" rotation_value=45,formatting='{:,.3f}');"
]
},
{
"cell_type": "markdown",
"id": "ed896bc5",
"metadata": {},
"source": [
"This plot is a nice visualization of how much each feature contributed to the predicted tier. Another thing to look at is the confusion matrix, which helps visualize the accuracy of the predictions."
]
},
{
"cell_type": "code",
"execution_count": 122,
"id": "841fcb62",
"metadata": {},
"outputs": [],
"source": [
"def create_confusion_matrix(m, xs, y):\n",
"    # Round the regressor's continuous output to the nearest tier before counting\n",
"    predictions = m.predict(xs).round()\n",
"    return confusion_matrix(y, predictions)\n",
"\n",
"\n",
"def plot_confusion(c):\n",
"    plt.figure(figsize=(15, 10))\n",
"    ax = sns.heatmap(c, annot=True, cmap='Blues')\n",
"\n",
"    ax.set_title('Confusion Matrix with labels\\n\\n')\n",
"    ax.set_xlabel('\\nPredicted QB Tier')\n",
"    ax.set_ylabel('Actual QB Tier')\n",
"\n",
"    # Tick labels - List must be in alphabetical order\n",
"    labels = ['Elite', 'Above-Average', 'Average', 'Below-Average', 'Poor']\n",
"    try:\n",
"        ax.xaxis.set_ticklabels(labels)\n",
"        ax.yaxis.set_ticklabels(labels)\n",
"    except ValueError:\n",
"        # Fall back to four labels when the label count does not match the ticks\n",
"        ax.xaxis.set_ticklabels(labels[:-1])\n",
"        ax.yaxis.set_ticklabels(labels[:-1])\n",
"\n",
"    # Display the visualization of the confusion matrix\n",
"    plt.show()"
]
},
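{
"cell_type": "markdown",
"id": "e7a2f4d8",
"metadata": {},
"source": [
"Since `m` is a regressor, `create_confusion_matrix` rounds each continuous prediction to the nearest tier before counting. A minimal sketch of that counting step in plain Python, using invented tier codes and predictions (assumptions for illustration, not this notebook's data):\n",
"\n",
"```python\n",
"# Invented example: true tier codes and raw regressor outputs\n",
"actual = [0, 1, 1, 2, 2]\n",
"raw_predictions = [0.2, 1.4, 1.6, 1.7, 2.3]\n",
"\n",
"# Round each continuous prediction to the nearest integer tier\n",
"predicted = [round(p) for p in raw_predictions]\n",
"\n",
"# Rows index the actual tier, columns the predicted tier\n",
"n_tiers = 3\n",
"matrix = [[0] * n_tiers for _ in range(n_tiers)]\n",
"for a, p in zip(actual, predicted):\n",
"    matrix[a][p] += 1\n",
"\n",
"print(matrix)\n",
"```\n",
"\n",
"Counts on the diagonal are rows where the rounded prediction matches the actual tier; everything off the diagonal is a miss, and the distance from the diagonal shows how many tiers off the prediction was."
]
},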
{
"cell_type": "code",
"execution_count": 123,
"id": "e4fb9c71",
"metadata": {},
"outputs": [],
"source": [
"cf1 = create_confusion_matrix(m, xs_final, y)\n",
"cf2 = create_confusion_matrix(m, valid_xs_final, valid_y)"
]
},
{
"cell_type": "code",
"execution_count": 124,
"id": "d8d9cc61",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"