avngrstark committed on
Commit 9f6845b · verified · 1 Parent(s): 42e6e72

Upload main.ipynb

Files changed (1)
  1. main.ipynb +1717 -0
main.ipynb ADDED
@@ -0,0 +1,1717 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": 1,
6
+ "id": "c04aac25-dad2-4b5b-b7e9-6102add4febb",
7
+ "metadata": {},
8
+ "outputs": [],
9
+ "source": [
10
+ "import pandas as pd\n",
11
+ "import numpy as np\n",
12
+ "import matplotlib.pyplot as plt\n",
13
+ "import seaborn as sns"
14
+ ]
15
+ },
16
+ {
17
+ "cell_type": "code",
18
+ "execution_count": 2,
19
+ "id": "6b397e38-4659-4205-9716-f72f20e5d865",
20
+ "metadata": {},
21
+ "outputs": [],
22
+ "source": [
23
+ "movies = pd.read_csv('data/tmdb_5000_movies.csv')\n",
24
+ "credits = pd.read_csv('data/tmdb_5000_credits.csv')"
25
+ ]
26
+ },
27
+ {
28
+ "cell_type": "code",
29
+ "execution_count": 3,
30
+ "id": "c3ad363c-de60-4f3f-829f-75dc4ec20b37",
31
+ "metadata": {},
32
+ "outputs": [
33
+ {
34
+ "data": {
35
+ "text/html": [
36
+ "<div>\n",
37
+ "<style scoped>\n",
38
+ " .dataframe tbody tr th:only-of-type {\n",
39
+ " vertical-align: middle;\n",
40
+ " }\n",
41
+ "\n",
42
+ " .dataframe tbody tr th {\n",
43
+ " vertical-align: top;\n",
44
+ " }\n",
45
+ "\n",
46
+ " .dataframe thead th {\n",
47
+ " text-align: right;\n",
48
+ " }\n",
49
+ "</style>\n",
50
+ "<table border=\"1\" class=\"dataframe\">\n",
51
+ " <thead>\n",
52
+ " <tr style=\"text-align: right;\">\n",
53
+ " <th></th>\n",
54
+ " <th>budget</th>\n",
55
+ " <th>genres</th>\n",
56
+ " <th>homepage</th>\n",
57
+ " <th>id</th>\n",
58
+ " <th>keywords</th>\n",
59
+ " <th>original_language</th>\n",
60
+ " <th>original_title</th>\n",
61
+ " <th>overview</th>\n",
62
+ " <th>popularity</th>\n",
63
+ " <th>production_companies</th>\n",
64
+ " <th>production_countries</th>\n",
65
+ " <th>release_date</th>\n",
66
+ " <th>revenue</th>\n",
67
+ " <th>runtime</th>\n",
68
+ " <th>spoken_languages</th>\n",
69
+ " <th>status</th>\n",
70
+ " <th>tagline</th>\n",
71
+ " <th>title</th>\n",
72
+ " <th>vote_average</th>\n",
73
+ " <th>vote_count</th>\n",
74
+ " </tr>\n",
75
+ " </thead>\n",
76
+ " <tbody>\n",
77
+ " <tr>\n",
78
+ " <th>0</th>\n",
79
+ " <td>237000000</td>\n",
80
+ " <td>[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...</td>\n",
81
+ " <td>http://www.avatarmovie.com/</td>\n",
82
+ " <td>19995</td>\n",
83
+ " <td>[{\"id\": 1463, \"name\": \"culture clash\"}, {\"id\":...</td>\n",
84
+ " <td>en</td>\n",
85
+ " <td>Avatar</td>\n",
86
+ " <td>In the 22nd century, a paraplegic Marine is di...</td>\n",
87
+ " <td>150.437577</td>\n",
88
+ " <td>[{\"name\": \"Ingenious Film Partners\", \"id\": 289...</td>\n",
89
+ " <td>[{\"iso_3166_1\": \"US\", \"name\": \"United States o...</td>\n",
90
+ " <td>2009-12-10</td>\n",
91
+ " <td>2787965087</td>\n",
92
+ " <td>162.0</td>\n",
93
+ " <td>[{\"iso_639_1\": \"en\", \"name\": \"English\"}, {\"iso...</td>\n",
94
+ " <td>Released</td>\n",
95
+ " <td>Enter the World of Pandora.</td>\n",
96
+ " <td>Avatar</td>\n",
97
+ " <td>7.2</td>\n",
98
+ " <td>11800</td>\n",
99
+ " </tr>\n",
100
+ " <tr>\n",
101
+ " <th>1</th>\n",
102
+ " <td>300000000</td>\n",
103
+ " <td>[{\"id\": 12, \"name\": \"Adventure\"}, {\"id\": 14, \"...</td>\n",
104
+ " <td>http://disney.go.com/disneypictures/pirates/</td>\n",
105
+ " <td>285</td>\n",
106
+ " <td>[{\"id\": 270, \"name\": \"ocean\"}, {\"id\": 726, \"na...</td>\n",
107
+ " <td>en</td>\n",
108
+ " <td>Pirates of the Caribbean: At World's End</td>\n",
109
+ " <td>Captain Barbossa, long believed to be dead, ha...</td>\n",
110
+ " <td>139.082615</td>\n",
111
+ " <td>[{\"name\": \"Walt Disney Pictures\", \"id\": 2}, {\"...</td>\n",
112
+ " <td>[{\"iso_3166_1\": \"US\", \"name\": \"United States o...</td>\n",
113
+ " <td>2007-05-19</td>\n",
114
+ " <td>961000000</td>\n",
115
+ " <td>169.0</td>\n",
116
+ " <td>[{\"iso_639_1\": \"en\", \"name\": \"English\"}]</td>\n",
117
+ " <td>Released</td>\n",
118
+ " <td>At the end of the world, the adventure begins.</td>\n",
119
+ " <td>Pirates of the Caribbean: At World's End</td>\n",
120
+ " <td>6.9</td>\n",
121
+ " <td>4500</td>\n",
122
+ " </tr>\n",
123
+ " <tr>\n",
124
+ " <th>2</th>\n",
125
+ " <td>245000000</td>\n",
126
+ " <td>[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...</td>\n",
127
+ " <td>http://www.sonypictures.com/movies/spectre/</td>\n",
128
+ " <td>206647</td>\n",
129
+ " <td>[{\"id\": 470, \"name\": \"spy\"}, {\"id\": 818, \"name...</td>\n",
130
+ " <td>en</td>\n",
131
+ " <td>Spectre</td>\n",
132
+ " <td>A cryptic message from Bond’s past sends him o...</td>\n",
133
+ " <td>107.376788</td>\n",
134
+ " <td>[{\"name\": \"Columbia Pictures\", \"id\": 5}, {\"nam...</td>\n",
135
+ " <td>[{\"iso_3166_1\": \"GB\", \"name\": \"United Kingdom\"...</td>\n",
136
+ " <td>2015-10-26</td>\n",
137
+ " <td>880674609</td>\n",
138
+ " <td>148.0</td>\n",
139
+ " <td>[{\"iso_639_1\": \"fr\", \"name\": \"Fran\\u00e7ais\"},...</td>\n",
140
+ " <td>Released</td>\n",
141
+ " <td>A Plan No One Escapes</td>\n",
142
+ " <td>Spectre</td>\n",
143
+ " <td>6.3</td>\n",
144
+ " <td>4466</td>\n",
145
+ " </tr>\n",
146
+ " <tr>\n",
147
+ " <th>3</th>\n",
148
+ " <td>250000000</td>\n",
149
+ " <td>[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 80, \"nam...</td>\n",
150
+ " <td>http://www.thedarkknightrises.com/</td>\n",
151
+ " <td>49026</td>\n",
152
+ " <td>[{\"id\": 849, \"name\": \"dc comics\"}, {\"id\": 853,...</td>\n",
153
+ " <td>en</td>\n",
154
+ " <td>The Dark Knight Rises</td>\n",
155
+ " <td>Following the death of District Attorney Harve...</td>\n",
156
+ " <td>112.312950</td>\n",
157
+ " <td>[{\"name\": \"Legendary Pictures\", \"id\": 923}, {\"...</td>\n",
158
+ " <td>[{\"iso_3166_1\": \"US\", \"name\": \"United States o...</td>\n",
159
+ " <td>2012-07-16</td>\n",
160
+ " <td>1084939099</td>\n",
161
+ " <td>165.0</td>\n",
162
+ " <td>[{\"iso_639_1\": \"en\", \"name\": \"English\"}]</td>\n",
163
+ " <td>Released</td>\n",
164
+ " <td>The Legend Ends</td>\n",
165
+ " <td>The Dark Knight Rises</td>\n",
166
+ " <td>7.6</td>\n",
167
+ " <td>9106</td>\n",
168
+ " </tr>\n",
169
+ " <tr>\n",
170
+ " <th>4</th>\n",
171
+ " <td>260000000</td>\n",
172
+ " <td>[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...</td>\n",
173
+ " <td>http://movies.disney.com/john-carter</td>\n",
174
+ " <td>49529</td>\n",
175
+ " <td>[{\"id\": 818, \"name\": \"based on novel\"}, {\"id\":...</td>\n",
176
+ " <td>en</td>\n",
177
+ " <td>John Carter</td>\n",
178
+ " <td>John Carter is a war-weary, former military ca...</td>\n",
179
+ " <td>43.926995</td>\n",
180
+ " <td>[{\"name\": \"Walt Disney Pictures\", \"id\": 2}]</td>\n",
181
+ " <td>[{\"iso_3166_1\": \"US\", \"name\": \"United States o...</td>\n",
182
+ " <td>2012-03-07</td>\n",
183
+ " <td>284139100</td>\n",
184
+ " <td>132.0</td>\n",
185
+ " <td>[{\"iso_639_1\": \"en\", \"name\": \"English\"}]</td>\n",
186
+ " <td>Released</td>\n",
187
+ " <td>Lost in our world, found in another.</td>\n",
188
+ " <td>John Carter</td>\n",
189
+ " <td>6.1</td>\n",
190
+ " <td>2124</td>\n",
191
+ " </tr>\n",
192
+ " </tbody>\n",
193
+ "</table>\n",
194
+ "</div>"
195
+ ],
196
+ "text/plain": [
197
+ " budget genres \\\n",
198
+ "0 237000000 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam... \n",
199
+ "1 300000000 [{\"id\": 12, \"name\": \"Adventure\"}, {\"id\": 14, \"... \n",
200
+ "2 245000000 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam... \n",
201
+ "3 250000000 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 80, \"nam... \n",
202
+ "4 260000000 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam... \n",
203
+ "\n",
204
+ " homepage id \\\n",
205
+ "0 http://www.avatarmovie.com/ 19995 \n",
206
+ "1 http://disney.go.com/disneypictures/pirates/ 285 \n",
207
+ "2 http://www.sonypictures.com/movies/spectre/ 206647 \n",
208
+ "3 http://www.thedarkknightrises.com/ 49026 \n",
209
+ "4 http://movies.disney.com/john-carter 49529 \n",
210
+ "\n",
211
+ " keywords original_language \\\n",
212
+ "0 [{\"id\": 1463, \"name\": \"culture clash\"}, {\"id\":... en \n",
213
+ "1 [{\"id\": 270, \"name\": \"ocean\"}, {\"id\": 726, \"na... en \n",
214
+ "2 [{\"id\": 470, \"name\": \"spy\"}, {\"id\": 818, \"name... en \n",
215
+ "3 [{\"id\": 849, \"name\": \"dc comics\"}, {\"id\": 853,... en \n",
216
+ "4 [{\"id\": 818, \"name\": \"based on novel\"}, {\"id\":... en \n",
217
+ "\n",
218
+ " original_title \\\n",
219
+ "0 Avatar \n",
220
+ "1 Pirates of the Caribbean: At World's End \n",
221
+ "2 Spectre \n",
222
+ "3 The Dark Knight Rises \n",
223
+ "4 John Carter \n",
224
+ "\n",
225
+ " overview popularity \\\n",
226
+ "0 In the 22nd century, a paraplegic Marine is di... 150.437577 \n",
227
+ "1 Captain Barbossa, long believed to be dead, ha... 139.082615 \n",
228
+ "2 A cryptic message from Bond’s past sends him o... 107.376788 \n",
229
+ "3 Following the death of District Attorney Harve... 112.312950 \n",
230
+ "4 John Carter is a war-weary, former military ca... 43.926995 \n",
231
+ "\n",
232
+ " production_companies \\\n",
233
+ "0 [{\"name\": \"Ingenious Film Partners\", \"id\": 289... \n",
234
+ "1 [{\"name\": \"Walt Disney Pictures\", \"id\": 2}, {\"... \n",
235
+ "2 [{\"name\": \"Columbia Pictures\", \"id\": 5}, {\"nam... \n",
236
+ "3 [{\"name\": \"Legendary Pictures\", \"id\": 923}, {\"... \n",
237
+ "4 [{\"name\": \"Walt Disney Pictures\", \"id\": 2}] \n",
238
+ "\n",
239
+ " production_countries release_date revenue \\\n",
240
+ "0 [{\"iso_3166_1\": \"US\", \"name\": \"United States o... 2009-12-10 2787965087 \n",
241
+ "1 [{\"iso_3166_1\": \"US\", \"name\": \"United States o... 2007-05-19 961000000 \n",
242
+ "2 [{\"iso_3166_1\": \"GB\", \"name\": \"United Kingdom\"... 2015-10-26 880674609 \n",
243
+ "3 [{\"iso_3166_1\": \"US\", \"name\": \"United States o... 2012-07-16 1084939099 \n",
244
+ "4 [{\"iso_3166_1\": \"US\", \"name\": \"United States o... 2012-03-07 284139100 \n",
245
+ "\n",
246
+ " runtime spoken_languages status \\\n",
247
+ "0 162.0 [{\"iso_639_1\": \"en\", \"name\": \"English\"}, {\"iso... Released \n",
248
+ "1 169.0 [{\"iso_639_1\": \"en\", \"name\": \"English\"}] Released \n",
249
+ "2 148.0 [{\"iso_639_1\": \"fr\", \"name\": \"Fran\\u00e7ais\"},... Released \n",
250
+ "3 165.0 [{\"iso_639_1\": \"en\", \"name\": \"English\"}] Released \n",
251
+ "4 132.0 [{\"iso_639_1\": \"en\", \"name\": \"English\"}] Released \n",
252
+ "\n",
253
+ " tagline \\\n",
254
+ "0 Enter the World of Pandora. \n",
255
+ "1 At the end of the world, the adventure begins. \n",
256
+ "2 A Plan No One Escapes \n",
257
+ "3 The Legend Ends \n",
258
+ "4 Lost in our world, found in another. \n",
259
+ "\n",
260
+ " title vote_average vote_count \n",
261
+ "0 Avatar 7.2 11800 \n",
262
+ "1 Pirates of the Caribbean: At World's End 6.9 4500 \n",
263
+ "2 Spectre 6.3 4466 \n",
264
+ "3 The Dark Knight Rises 7.6 9106 \n",
265
+ "4 John Carter 6.1 2124 "
266
+ ]
267
+ },
268
+ "execution_count": 3,
269
+ "metadata": {},
270
+ "output_type": "execute_result"
271
+ }
272
+ ],
273
+ "source": [
274
+ "movies.head()"
275
+ ]
276
+ },
277
+ {
278
+ "cell_type": "code",
279
+ "execution_count": 4,
280
+ "id": "2d4d19ed-104f-46c1-96c9-5ffd94cf8b15",
281
+ "metadata": {},
282
+ "outputs": [
283
+ {
284
+ "data": {
285
+ "text/html": [
286
+ "<div>\n",
287
+ "<style scoped>\n",
288
+ " .dataframe tbody tr th:only-of-type {\n",
289
+ " vertical-align: middle;\n",
290
+ " }\n",
291
+ "\n",
292
+ " .dataframe tbody tr th {\n",
293
+ " vertical-align: top;\n",
294
+ " }\n",
295
+ "\n",
296
+ " .dataframe thead th {\n",
297
+ " text-align: right;\n",
298
+ " }\n",
299
+ "</style>\n",
300
+ "<table border=\"1\" class=\"dataframe\">\n",
301
+ " <thead>\n",
302
+ " <tr style=\"text-align: right;\">\n",
303
+ " <th></th>\n",
304
+ " <th>movie_id</th>\n",
305
+ " <th>title</th>\n",
306
+ " <th>cast</th>\n",
307
+ " <th>crew</th>\n",
308
+ " </tr>\n",
309
+ " </thead>\n",
310
+ " <tbody>\n",
311
+ " <tr>\n",
312
+ " <th>0</th>\n",
313
+ " <td>19995</td>\n",
314
+ " <td>Avatar</td>\n",
315
+ " <td>[{\"cast_id\": 242, \"character\": \"Jake Sully\", \"...</td>\n",
316
+ " <td>[{\"credit_id\": \"52fe48009251416c750aca23\", \"de...</td>\n",
317
+ " </tr>\n",
318
+ " <tr>\n",
319
+ " <th>1</th>\n",
320
+ " <td>285</td>\n",
321
+ " <td>Pirates of the Caribbean: At World's End</td>\n",
322
+ " <td>[{\"cast_id\": 4, \"character\": \"Captain Jack Spa...</td>\n",
323
+ " <td>[{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de...</td>\n",
324
+ " </tr>\n",
325
+ " <tr>\n",
326
+ " <th>2</th>\n",
327
+ " <td>206647</td>\n",
328
+ " <td>Spectre</td>\n",
329
+ " <td>[{\"cast_id\": 1, \"character\": \"James Bond\", \"cr...</td>\n",
330
+ " <td>[{\"credit_id\": \"54805967c3a36829b5002c41\", \"de...</td>\n",
331
+ " </tr>\n",
332
+ " <tr>\n",
333
+ " <th>3</th>\n",
334
+ " <td>49026</td>\n",
335
+ " <td>The Dark Knight Rises</td>\n",
336
+ " <td>[{\"cast_id\": 2, \"character\": \"Bruce Wayne / Ba...</td>\n",
337
+ " <td>[{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de...</td>\n",
338
+ " </tr>\n",
339
+ " <tr>\n",
340
+ " <th>4</th>\n",
341
+ " <td>49529</td>\n",
342
+ " <td>John Carter</td>\n",
343
+ " <td>[{\"cast_id\": 5, \"character\": \"John Carter\", \"c...</td>\n",
344
+ " <td>[{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de...</td>\n",
345
+ " </tr>\n",
346
+ " </tbody>\n",
347
+ "</table>\n",
348
+ "</div>"
349
+ ],
350
+ "text/plain": [
351
+ " movie_id title \\\n",
352
+ "0 19995 Avatar \n",
353
+ "1 285 Pirates of the Caribbean: At World's End \n",
354
+ "2 206647 Spectre \n",
355
+ "3 49026 The Dark Knight Rises \n",
356
+ "4 49529 John Carter \n",
357
+ "\n",
358
+ " cast \\\n",
359
+ "0 [{\"cast_id\": 242, \"character\": \"Jake Sully\", \"... \n",
360
+ "1 [{\"cast_id\": 4, \"character\": \"Captain Jack Spa... \n",
361
+ "2 [{\"cast_id\": 1, \"character\": \"James Bond\", \"cr... \n",
362
+ "3 [{\"cast_id\": 2, \"character\": \"Bruce Wayne / Ba... \n",
363
+ "4 [{\"cast_id\": 5, \"character\": \"John Carter\", \"c... \n",
364
+ "\n",
365
+ " crew \n",
366
+ "0 [{\"credit_id\": \"52fe48009251416c750aca23\", \"de... \n",
367
+ "1 [{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de... \n",
368
+ "2 [{\"credit_id\": \"54805967c3a36829b5002c41\", \"de... \n",
369
+ "3 [{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de... \n",
370
+ "4 [{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de... "
371
+ ]
372
+ },
373
+ "execution_count": 4,
374
+ "metadata": {},
375
+ "output_type": "execute_result"
376
+ }
377
+ ],
378
+ "source": [
379
+ "credits.head()"
380
+ ]
381
+ },
382
+ {
383
+ "cell_type": "code",
384
+ "execution_count": 5,
385
+ "id": "1203ff17-269a-4641-8045-5a16e204460c",
386
+ "metadata": {},
387
+ "outputs": [],
388
+ "source": [
389
+ "movies = movies.merge(credits, on='title')"
390
+ ]
391
+ },
392
+ {
393
+ "cell_type": "code",
394
+ "execution_count": 6,
395
+ "id": "1764ac22-2898-453b-a19c-da36b619a2ce",
396
+ "metadata": {},
397
+ "outputs": [
398
+ {
399
+ "name": "stdout",
400
+ "output_type": "stream",
401
+ "text": [
402
+ "<class 'pandas.core.frame.DataFrame'>\n",
403
+ "RangeIndex: 4809 entries, 0 to 4808\n",
404
+ "Data columns (total 23 columns):\n",
405
+ " # Column Non-Null Count Dtype \n",
406
+ "--- ------ -------------- ----- \n",
407
+ " 0 budget 4809 non-null int64 \n",
408
+ " 1 genres 4809 non-null object \n",
409
+ " 2 homepage 1713 non-null object \n",
410
+ " 3 id 4809 non-null int64 \n",
411
+ " 4 keywords 4809 non-null object \n",
412
+ " 5 original_language 4809 non-null object \n",
413
+ " 6 original_title 4809 non-null object \n",
414
+ " 7 overview 4806 non-null object \n",
415
+ " 8 popularity 4809 non-null float64\n",
416
+ " 9 production_companies 4809 non-null object \n",
417
+ " 10 production_countries 4809 non-null object \n",
418
+ " 11 release_date 4808 non-null object \n",
419
+ " 12 revenue 4809 non-null int64 \n",
420
+ " 13 runtime 4807 non-null float64\n",
421
+ " 14 spoken_languages 4809 non-null object \n",
422
+ " 15 status 4809 non-null object \n",
423
+ " 16 tagline 3965 non-null object \n",
424
+ " 17 title 4809 non-null object \n",
425
+ " 18 vote_average 4809 non-null float64\n",
426
+ " 19 vote_count 4809 non-null int64 \n",
427
+ " 20 movie_id 4809 non-null int64 \n",
428
+ " 21 cast 4809 non-null object \n",
429
+ " 22 crew 4809 non-null object \n",
430
+ "dtypes: float64(3), int64(5), object(15)\n",
431
+ "memory usage: 864.2+ KB\n"
432
+ ]
433
+ }
434
+ ],
435
+ "source": [
436
+ "movies.info()"
437
+ ]
438
+ },
439
+ {
440
+ "cell_type": "code",
441
+ "execution_count": 7,
442
+ "id": "5f039dca-eeae-44cd-8cab-4ea1e28c2b62",
443
+ "metadata": {},
444
+ "outputs": [],
445
+ "source": [
446
+ "movies = movies[['movie_id', 'title', 'genres', 'overview', 'keywords', 'cast', 'crew']]"
447
+ ]
448
+ },
449
+ {
450
+ "cell_type": "code",
451
+ "execution_count": 8,
452
+ "id": "533b1e16-c9d3-42b0-82bc-80a4d4fdc029",
453
+ "metadata": {},
454
+ "outputs": [
455
+ {
456
+ "data": {
457
+ "text/html": [
458
+ "<div>\n",
459
+ "<style scoped>\n",
460
+ " .dataframe tbody tr th:only-of-type {\n",
461
+ " vertical-align: middle;\n",
462
+ " }\n",
463
+ "\n",
464
+ " .dataframe tbody tr th {\n",
465
+ " vertical-align: top;\n",
466
+ " }\n",
467
+ "\n",
468
+ " .dataframe thead th {\n",
469
+ " text-align: right;\n",
470
+ " }\n",
471
+ "</style>\n",
472
+ "<table border=\"1\" class=\"dataframe\">\n",
473
+ " <thead>\n",
474
+ " <tr style=\"text-align: right;\">\n",
475
+ " <th></th>\n",
476
+ " <th>movie_id</th>\n",
477
+ " <th>title</th>\n",
478
+ " <th>genres</th>\n",
479
+ " <th>overview</th>\n",
480
+ " <th>keywords</th>\n",
481
+ " <th>cast</th>\n",
482
+ " <th>crew</th>\n",
483
+ " </tr>\n",
484
+ " </thead>\n",
485
+ " <tbody>\n",
486
+ " <tr>\n",
487
+ " <th>0</th>\n",
488
+ " <td>19995</td>\n",
489
+ " <td>Avatar</td>\n",
490
+ " <td>[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...</td>\n",
491
+ " <td>In the 22nd century, a paraplegic Marine is di...</td>\n",
492
+ " <td>[{\"id\": 1463, \"name\": \"culture clash\"}, {\"id\":...</td>\n",
493
+ " <td>[{\"cast_id\": 242, \"character\": \"Jake Sully\", \"...</td>\n",
494
+ " <td>[{\"credit_id\": \"52fe48009251416c750aca23\", \"de...</td>\n",
495
+ " </tr>\n",
496
+ " <tr>\n",
497
+ " <th>1</th>\n",
498
+ " <td>285</td>\n",
499
+ " <td>Pirates of the Caribbean: At World's End</td>\n",
500
+ " <td>[{\"id\": 12, \"name\": \"Adventure\"}, {\"id\": 14, \"...</td>\n",
501
+ " <td>Captain Barbossa, long believed to be dead, ha...</td>\n",
502
+ " <td>[{\"id\": 270, \"name\": \"ocean\"}, {\"id\": 726, \"na...</td>\n",
503
+ " <td>[{\"cast_id\": 4, \"character\": \"Captain Jack Spa...</td>\n",
504
+ " <td>[{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de...</td>\n",
505
+ " </tr>\n",
506
+ " <tr>\n",
507
+ " <th>2</th>\n",
508
+ " <td>206647</td>\n",
509
+ " <td>Spectre</td>\n",
510
+ " <td>[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...</td>\n",
511
+ " <td>A cryptic message from Bond’s past sends him o...</td>\n",
512
+ " <td>[{\"id\": 470, \"name\": \"spy\"}, {\"id\": 818, \"name...</td>\n",
513
+ " <td>[{\"cast_id\": 1, \"character\": \"James Bond\", \"cr...</td>\n",
514
+ " <td>[{\"credit_id\": \"54805967c3a36829b5002c41\", \"de...</td>\n",
515
+ " </tr>\n",
516
+ " <tr>\n",
517
+ " <th>3</th>\n",
518
+ " <td>49026</td>\n",
519
+ " <td>The Dark Knight Rises</td>\n",
520
+ " <td>[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 80, \"nam...</td>\n",
521
+ " <td>Following the death of District Attorney Harve...</td>\n",
522
+ " <td>[{\"id\": 849, \"name\": \"dc comics\"}, {\"id\": 853,...</td>\n",
523
+ " <td>[{\"cast_id\": 2, \"character\": \"Bruce Wayne / Ba...</td>\n",
524
+ " <td>[{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de...</td>\n",
525
+ " </tr>\n",
526
+ " <tr>\n",
527
+ " <th>4</th>\n",
528
+ " <td>49529</td>\n",
529
+ " <td>John Carter</td>\n",
530
+ " <td>[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam...</td>\n",
531
+ " <td>John Carter is a war-weary, former military ca...</td>\n",
532
+ " <td>[{\"id\": 818, \"name\": \"based on novel\"}, {\"id\":...</td>\n",
533
+ " <td>[{\"cast_id\": 5, \"character\": \"John Carter\", \"c...</td>\n",
534
+ " <td>[{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de...</td>\n",
535
+ " </tr>\n",
536
+ " </tbody>\n",
537
+ "</table>\n",
538
+ "</div>"
539
+ ],
540
+ "text/plain": [
541
+ " movie_id title \\\n",
542
+ "0 19995 Avatar \n",
543
+ "1 285 Pirates of the Caribbean: At World's End \n",
544
+ "2 206647 Spectre \n",
545
+ "3 49026 The Dark Knight Rises \n",
546
+ "4 49529 John Carter \n",
547
+ "\n",
548
+ " genres \\\n",
549
+ "0 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam... \n",
550
+ "1 [{\"id\": 12, \"name\": \"Adventure\"}, {\"id\": 14, \"... \n",
551
+ "2 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam... \n",
552
+ "3 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 80, \"nam... \n",
553
+ "4 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam... \n",
554
+ "\n",
555
+ " overview \\\n",
556
+ "0 In the 22nd century, a paraplegic Marine is di... \n",
557
+ "1 Captain Barbossa, long believed to be dead, ha... \n",
558
+ "2 A cryptic message from Bond’s past sends him o... \n",
559
+ "3 Following the death of District Attorney Harve... \n",
560
+ "4 John Carter is a war-weary, former military ca... \n",
561
+ "\n",
562
+ " keywords \\\n",
563
+ "0 [{\"id\": 1463, \"name\": \"culture clash\"}, {\"id\":... \n",
564
+ "1 [{\"id\": 270, \"name\": \"ocean\"}, {\"id\": 726, \"na... \n",
565
+ "2 [{\"id\": 470, \"name\": \"spy\"}, {\"id\": 818, \"name... \n",
566
+ "3 [{\"id\": 849, \"name\": \"dc comics\"}, {\"id\": 853,... \n",
567
+ "4 [{\"id\": 818, \"name\": \"based on novel\"}, {\"id\":... \n",
568
+ "\n",
569
+ " cast \\\n",
570
+ "0 [{\"cast_id\": 242, \"character\": \"Jake Sully\", \"... \n",
571
+ "1 [{\"cast_id\": 4, \"character\": \"Captain Jack Spa... \n",
572
+ "2 [{\"cast_id\": 1, \"character\": \"James Bond\", \"cr... \n",
573
+ "3 [{\"cast_id\": 2, \"character\": \"Bruce Wayne / Ba... \n",
574
+ "4 [{\"cast_id\": 5, \"character\": \"John Carter\", \"c... \n",
575
+ "\n",
576
+ " crew \n",
577
+ "0 [{\"credit_id\": \"52fe48009251416c750aca23\", \"de... \n",
578
+ "1 [{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de... \n",
579
+ "2 [{\"credit_id\": \"54805967c3a36829b5002c41\", \"de... \n",
580
+ "3 [{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de... \n",
581
+ "4 [{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de... "
582
+ ]
583
+ },
584
+ "execution_count": 8,
585
+ "metadata": {},
586
+ "output_type": "execute_result"
587
+ }
588
+ ],
589
+ "source": [
590
+ "movies.head()"
591
+ ]
592
+ },
593
+ {
594
+ "cell_type": "code",
595
+ "execution_count": 9,
596
+ "id": "309f68b0-46b9-492e-bf0c-605fc91e1e57",
597
+ "metadata": {},
598
+ "outputs": [],
599
+ "source": [
600
+ "movies = movies.dropna()"
601
+ ]
602
+ },
603
+ {
604
+ "cell_type": "code",
605
+ "execution_count": 10,
606
+ "id": "8a22d175-b884-43f5-b013-5fd549100b19",
607
+ "metadata": {},
608
+ "outputs": [
609
+ {
610
+ "data": {
611
+ "text/plain": [
612
+ "movie_id 0\n",
613
+ "title 0\n",
614
+ "genres 0\n",
615
+ "overview 0\n",
616
+ "keywords 0\n",
617
+ "cast 0\n",
618
+ "crew 0\n",
619
+ "dtype: int64"
620
+ ]
621
+ },
622
+ "execution_count": 10,
623
+ "metadata": {},
624
+ "output_type": "execute_result"
625
+ }
626
+ ],
627
+ "source": [
628
+ "movies.isnull().sum()"
629
+ ]
630
+ },
631
+ {
632
+ "cell_type": "code",
633
+ "execution_count": 11,
634
+ "id": "d9f5ccca-3a68-43cd-841e-97e8fdd7f68f",
635
+ "metadata": {},
636
+ "outputs": [
637
+ {
638
+ "data": {
639
+ "text/plain": [
640
+ "0"
641
+ ]
642
+ },
643
+ "execution_count": 11,
644
+ "metadata": {},
645
+ "output_type": "execute_result"
646
+ }
647
+ ],
648
+ "source": [
649
+ "movies.duplicated().sum()"
650
+ ]
651
+ },
652
+ {
653
+ "cell_type": "code",
654
+ "execution_count": 12,
655
+ "id": "517bee9d-a24c-4763-943c-9d81ad7307af",
656
+ "metadata": {},
657
+ "outputs": [],
658
+ "source": [
659
+ "import ast\n",
660
+ "\n",
661
+ "def get_name(string):\n",
662
+ "    names = []\n",
663
+ "    for i in ast.literal_eval(string):\n",
664
+ "        names.append(i['name'])\n",
665
+ "    return names\n",
666
+ "\n",
667
+ "\n",
668
+ "def get_cast(string):\n",
669
+ "    names = []\n",
670
+ "    for i in ast.literal_eval(string):\n",
671
+ "        names.append(i['name'])\n",
672
+ "        if len(names) == 3:\n",
673
+ "            return names\n",
674
+ "    return names\n",
675
+ "\n",
676
+ "\n",
677
+ "def get_director_name(string):\n",
678
+ "    names = []\n",
679
+ "    for i in ast.literal_eval(string):\n",
680
+ "        if i['job'] == 'Director':\n",
681
+ "            names.append(i['name'])\n",
682
+ "            return names\n",
683
+ "    return names"
684
+ ]
685
+ },
686
+ {
687
+ "cell_type": "code",
688
+ "execution_count": 13,
689
+ "id": "0be843da-f3f6-43b9-8ef6-da85b57775a5",
690
+ "metadata": {},
691
+ "outputs": [],
692
+ "source": [
693
+ "movies['genres'] = movies['genres'].apply(get_name)\n",
694
+ "movies['keywords'] = movies['keywords'].apply(get_name)"
695
+ ]
696
+ },
697
+ {
698
+ "cell_type": "code",
699
+ "execution_count": 14,
700
+ "id": "a0503645-58d4-4d19-9a9f-5414f8130e24",
701
+ "metadata": {},
702
+ "outputs": [],
703
+ "source": [
704
+ "movies['cast'] = movies['cast'].apply(get_cast)"
705
+ ]
706
+ },
707
+ {
708
+ "cell_type": "code",
709
+ "execution_count": 15,
710
+ "id": "b9b60133-4d3f-4197-af98-b188fa9ccf4c",
711
+ "metadata": {},
712
+ "outputs": [],
713
+ "source": [
714
+ "movies['crew'] = movies['crew'].apply(get_director_name)"
715
+ ]
716
+ },
717
+ {
718
+ "cell_type": "code",
719
+ "execution_count": 16,
720
+ "id": "607f9f9b-f4aa-4e56-a74a-e8dcf351c38f",
721
+ "metadata": {},
722
+ "outputs": [],
723
+ "source": [
724
+ "movies['overview'] = movies['overview'].apply(lambda x: x.split())"
725
+ ]
726
+ },
727
+ {
728
+ "cell_type": "code",
729
+ "execution_count": 17,
730
+ "id": "f0e93004-ae79-4b40-9323-7cb9fd9908bf",
731
+ "metadata": {},
732
+ "outputs": [],
733
+ "source": [
734
+ "movies['genres'] = movies['genres'].apply(lambda x: [i.replace(' ', '') for i in x])\n",
735
+ "movies['overview'] = movies['overview'].apply(lambda x: [i.replace(' ', '') for i in x])\n",
736
+ "movies['keywords'] = movies['keywords'].apply(lambda x: [i.replace(' ', '') for i in x])\n",
737
+ "movies['cast'] = movies['cast'].apply(lambda x: [i.replace(' ', '') for i in x])\n",
738
+ "movies['crew'] = movies['crew'].apply(lambda x: [i.replace(' ', '') for i in x])"
739
+ ]
740
+ },
741
+ {
742
+ "cell_type": "code",
743
+ "execution_count": 18,
744
+ "id": "0019446c-147f-493c-a0c7-6679b13b5b5d",
745
+ "metadata": {},
746
+ "outputs": [
747
+ {
748
+ "data": {
749
+ "text/plain": [
750
+ "movie_id 0\n",
751
+ "title 0\n",
752
+ "genres 0\n",
753
+ "overview 0\n",
754
+ "keywords 0\n",
755
+ "cast 0\n",
756
+ "crew 0\n",
757
+ "dtype: int64"
758
+ ]
759
+ },
760
+ "execution_count": 18,
761
+ "metadata": {},
762
+ "output_type": "execute_result"
763
+ }
764
+ ],
765
+ "source": [
766
+ "movies.isnull().sum()"
767
+ ]
768
+ },
769
+ {
770
+ "cell_type": "code",
771
+ "execution_count": 19,
772
+ "id": "d6b495e5-1d34-46fc-9d15-9385c15e0840",
773
+ "metadata": {},
774
+ "outputs": [
775
+ {
776
+ "data": {
777
+ "text/html": [
778
+ "<div>\n",
779
+ "<style scoped>\n",
780
+ " .dataframe tbody tr th:only-of-type {\n",
781
+ " vertical-align: middle;\n",
782
+ " }\n",
783
+ "\n",
784
+ " .dataframe tbody tr th {\n",
785
+ " vertical-align: top;\n",
786
+ " }\n",
787
+ "\n",
788
+ " .dataframe thead th {\n",
789
+ " text-align: right;\n",
790
+ " }\n",
791
+ "</style>\n",
792
+ "<table border=\"1\" class=\"dataframe\">\n",
793
+ " <thead>\n",
794
+ " <tr style=\"text-align: right;\">\n",
795
+ " <th></th>\n",
796
+ " <th>movie_id</th>\n",
797
+ " <th>title</th>\n",
798
+ " <th>genres</th>\n",
799
+ " <th>overview</th>\n",
800
+ " <th>keywords</th>\n",
801
+ " <th>cast</th>\n",
802
+ " <th>crew</th>\n",
803
+ " </tr>\n",
804
+ " </thead>\n",
805
+ " <tbody>\n",
806
+ " <tr>\n",
807
+ " <th>0</th>\n",
808
+ " <td>19995</td>\n",
809
+ " <td>Avatar</td>\n",
810
+ " <td>[Action, Adventure, Fantasy, ScienceFiction]</td>\n",
811
+ " <td>[In, the, 22nd, century,, a, paraplegic, Marin...</td>\n",
812
+ " <td>[cultureclash, future, spacewar, spacecolony, ...</td>\n",
813
+ " <td>[SamWorthington, ZoeSaldana, SigourneyWeaver]</td>\n",
814
+ " <td>[JamesCameron]</td>\n",
815
+ " </tr>\n",
816
+ " <tr>\n",
817
+ " <th>1</th>\n",
818
+ " <td>285</td>\n",
819
+ " <td>Pirates of the Caribbean: At World's End</td>\n",
820
+ " <td>[Adventure, Fantasy, Action]</td>\n",
821
+ " <td>[Captain, Barbossa,, long, believed, to, be, d...</td>\n",
822
+ " <td>[ocean, drugabuse, exoticisland, eastindiatrad...</td>\n",
823
+ " <td>[JohnnyDepp, OrlandoBloom, KeiraKnightley]</td>\n",
824
+ " <td>[GoreVerbinski]</td>\n",
825
+ " </tr>\n",
826
+ " <tr>\n",
827
+ " <th>2</th>\n",
828
+ " <td>206647</td>\n",
829
+ " <td>Spectre</td>\n",
830
+ " <td>[Action, Adventure, Crime]</td>\n",
831
+ " <td>[A, cryptic, message, from, Bond’s, past, send...</td>\n",
832
+ " <td>[spy, basedonnovel, secretagent, sequel, mi6, ...</td>\n",
833
+ " <td>[DanielCraig, ChristophWaltz, LéaSeydoux]</td>\n",
834
+ " <td>[SamMendes]</td>\n",
835
+ " </tr>\n",
836
+ " <tr>\n",
837
+ " <th>3</th>\n",
838
+ " <td>49026</td>\n",
839
+ " <td>The Dark Knight Rises</td>\n",
840
+ " <td>[Action, Crime, Drama, Thriller]</td>\n",
841
+ " <td>[Following, the, death, of, District, Attorney...</td>\n",
842
+ " <td>[dccomics, crimefighter, terrorist, secretiden...</td>\n",
843
+ " <td>[ChristianBale, MichaelCaine, GaryOldman]</td>\n",
844
+ " <td>[ChristopherNolan]</td>\n",
845
+ " </tr>\n",
846
+ " <tr>\n",
847
+ " <th>4</th>\n",
848
+ " <td>49529</td>\n",
849
+ " <td>John Carter</td>\n",
850
+ " <td>[Action, Adventure, ScienceFiction]</td>\n",
851
+ " <td>[John, Carter, is, a, war-weary,, former, mili...</td>\n",
852
+ " <td>[basedonnovel, mars, medallion, spacetravel, p...</td>\n",
853
+ " <td>[TaylorKitsch, LynnCollins, SamanthaMorton]</td>\n",
854
+ " <td>[AndrewStanton]</td>\n",
855
+ " </tr>\n",
856
+ " <tr>\n",
857
+ " <th>...</th>\n",
858
+ " <td>...</td>\n",
859
+ " <td>...</td>\n",
860
+ " <td>...</td>\n",
861
+ " <td>...</td>\n",
862
+ " <td>...</td>\n",
863
+ " <td>...</td>\n",
864
+ " <td>...</td>\n",
865
+ " </tr>\n",
866
+ " <tr>\n",
867
+ " <th>4804</th>\n",
868
+ " <td>9367</td>\n",
869
+ " <td>El Mariachi</td>\n",
870
+ " <td>[Action, Crime, Thriller]</td>\n",
871
+ " <td>[El, Mariachi, just, wants, to, play, his, gui...</td>\n",
872
+ " <td>[unitedstates–mexicobarrier, legs, arms, paper...</td>\n",
873
+ " <td>[CarlosGallardo, JaimedeHoyos, PeterMarquardt]</td>\n",
874
+ " <td>[RobertRodriguez]</td>\n",
875
+ " </tr>\n",
876
+ " <tr>\n",
877
+ " <th>4805</th>\n",
878
+ " <td>72766</td>\n",
879
+ " <td>Newlyweds</td>\n",
880
+ " <td>[Comedy, Romance]</td>\n",
881
+ " <td>[A, newlywed, couple's, honeymoon, is, upended...</td>\n",
882
+ " <td>[]</td>\n",
883
+ " <td>[EdwardBurns, KerryBishé, MarshaDietlein]</td>\n",
884
+ " <td>[EdwardBurns]</td>\n",
885
+ " </tr>\n",
886
+ " <tr>\n",
887
+ " <th>4806</th>\n",
888
+ " <td>231617</td>\n",
889
+ " <td>Signed, Sealed, Delivered</td>\n",
890
+ " <td>[Comedy, Drama, Romance, TVMovie]</td>\n",
891
+ " <td>[\"Signed,, Sealed,, Delivered\", introduces, a,...</td>\n",
892
+ " <td>[date, loveatfirstsight, narration, investigat...</td>\n",
893
+ " <td>[EricMabius, KristinBooth, CrystalLowe]</td>\n",
894
+ " <td>[ScottSmith]</td>\n",
895
+ " </tr>\n",
896
+ " <tr>\n",
897
+ " <th>4807</th>\n",
898
+ " <td>126186</td>\n",
899
+ " <td>Shanghai Calling</td>\n",
900
+ " <td>[]</td>\n",
901
+ " <td>[When, ambitious, New, York, attorney, Sam, is...</td>\n",
902
+ " <td>[]</td>\n",
903
+ " <td>[DanielHenney, ElizaCoupe, BillPaxton]</td>\n",
904
+ " <td>[DanielHsia]</td>\n",
905
+ " </tr>\n",
906
+ " <tr>\n",
907
+ " <th>4808</th>\n",
908
+ " <td>25975</td>\n",
909
+ " <td>My Date with Drew</td>\n",
910
+ " <td>[Documentary]</td>\n",
911
+ " <td>[Ever, since, the, second, grade, when, he, fi...</td>\n",
912
+ " <td>[obsession, camcorder, crush, dreamgirl]</td>\n",
913
+ " <td>[DrewBarrymore, BrianHerzlinger, CoreyFeldman]</td>\n",
914
+ " <td>[BrianHerzlinger]</td>\n",
915
+ " </tr>\n",
916
+ " </tbody>\n",
917
+ "</table>\n",
918
+ "<p>4806 rows × 7 columns</p>\n",
919
+ "</div>"
920
+ ],
921
+ "text/plain": [
922
+ " movie_id title \\\n",
923
+ "0 19995 Avatar \n",
924
+ "1 285 Pirates of the Caribbean: At World's End \n",
925
+ "2 206647 Spectre \n",
926
+ "3 49026 The Dark Knight Rises \n",
927
+ "4 49529 John Carter \n",
928
+ "... ... ... \n",
929
+ "4804 9367 El Mariachi \n",
930
+ "4805 72766 Newlyweds \n",
931
+ "4806 231617 Signed, Sealed, Delivered \n",
932
+ "4807 126186 Shanghai Calling \n",
933
+ "4808 25975 My Date with Drew \n",
934
+ "\n",
935
+ " genres \\\n",
936
+ "0 [Action, Adventure, Fantasy, ScienceFiction] \n",
937
+ "1 [Adventure, Fantasy, Action] \n",
938
+ "2 [Action, Adventure, Crime] \n",
939
+ "3 [Action, Crime, Drama, Thriller] \n",
940
+ "4 [Action, Adventure, ScienceFiction] \n",
941
+ "... ... \n",
942
+ "4804 [Action, Crime, Thriller] \n",
943
+ "4805 [Comedy, Romance] \n",
944
+ "4806 [Comedy, Drama, Romance, TVMovie] \n",
945
+ "4807 [] \n",
946
+ "4808 [Documentary] \n",
947
+ "\n",
948
+ " overview \\\n",
949
+ "0 [In, the, 22nd, century,, a, paraplegic, Marin... \n",
950
+ "1 [Captain, Barbossa,, long, believed, to, be, d... \n",
951
+ "2 [A, cryptic, message, from, Bond’s, past, send... \n",
952
+ "3 [Following, the, death, of, District, Attorney... \n",
953
+ "4 [John, Carter, is, a, war-weary,, former, mili... \n",
954
+ "... ... \n",
955
+ "4804 [El, Mariachi, just, wants, to, play, his, gui... \n",
956
+ "4805 [A, newlywed, couple's, honeymoon, is, upended... \n",
957
+ "4806 [\"Signed,, Sealed,, Delivered\", introduces, a,... \n",
958
+ "4807 [When, ambitious, New, York, attorney, Sam, is... \n",
959
+ "4808 [Ever, since, the, second, grade, when, he, fi... \n",
960
+ "\n",
961
+ " keywords \\\n",
962
+ "0 [cultureclash, future, spacewar, spacecolony, ... \n",
963
+ "1 [ocean, drugabuse, exoticisland, eastindiatrad... \n",
964
+ "2 [spy, basedonnovel, secretagent, sequel, mi6, ... \n",
965
+ "3 [dccomics, crimefighter, terrorist, secretiden... \n",
966
+ "4 [basedonnovel, mars, medallion, spacetravel, p... \n",
967
+ "... ... \n",
968
+ "4804 [unitedstates–mexicobarrier, legs, arms, paper... \n",
969
+ "4805 [] \n",
970
+ "4806 [date, loveatfirstsight, narration, investigat... \n",
971
+ "4807 [] \n",
972
+ "4808 [obsession, camcorder, crush, dreamgirl] \n",
973
+ "\n",
974
+ " cast crew \n",
975
+ "0 [SamWorthington, ZoeSaldana, SigourneyWeaver] [JamesCameron] \n",
976
+ "1 [JohnnyDepp, OrlandoBloom, KeiraKnightley] [GoreVerbinski] \n",
977
+ "2 [DanielCraig, ChristophWaltz, LéaSeydoux] [SamMendes] \n",
978
+ "3 [ChristianBale, MichaelCaine, GaryOldman] [ChristopherNolan] \n",
979
+ "4 [TaylorKitsch, LynnCollins, SamanthaMorton] [AndrewStanton] \n",
980
+ "... ... ... \n",
981
+ "4804 [CarlosGallardo, JaimedeHoyos, PeterMarquardt] [RobertRodriguez] \n",
982
+ "4805 [EdwardBurns, KerryBishé, MarshaDietlein] [EdwardBurns] \n",
983
+ "4806 [EricMabius, KristinBooth, CrystalLowe] [ScottSmith] \n",
984
+ "4807 [DanielHenney, ElizaCoupe, BillPaxton] [DanielHsia] \n",
985
+ "4808 [DrewBarrymore, BrianHerzlinger, CoreyFeldman] [BrianHerzlinger] \n",
986
+ "\n",
987
+ "[4806 rows x 7 columns]"
988
+ ]
989
+ },
990
+ "execution_count": 19,
991
+ "metadata": {},
992
+ "output_type": "execute_result"
993
+ }
994
+ ],
995
+ "source": [
996
+ "movies"
997
+ ]
998
+ },
999
+ {
1000
+ "cell_type": "code",
1001
+ "execution_count": 20,
1002
+ "id": "63cb68c3-8bcb-413b-bca2-f2ab12f6bcb2",
1003
+ "metadata": {},
1004
+ "outputs": [],
1005
+ "source": [
1006
+ "movies['tags'] = movies['overview'] + movies['genres'] + movies['keywords'] + movies['cast'] + movies['crew']\n",
1007
+ "movies = movies[['movie_id', 'title', 'tags']].copy()  # explicit copy avoids SettingWithCopyWarning on later column assignments"
1008
+ ]
1009
+ },
1010
+ {
1011
+ "cell_type": "code",
1012
+ "execution_count": 21,
1013
+ "id": "3eccb9ec-d8af-4cb6-99f5-256f8722d643",
1014
+ "metadata": {},
1015
+ "outputs": [
1016
+ {
1017
+ "name": "stderr",
1018
+ "output_type": "stream",
1019
+ "text": [
1020
+ "C:\\Users\\thaku\\AppData\\Local\\Temp\\ipykernel_7988\\2153568569.py:1: SettingWithCopyWarning: \n",
1021
+ "A value is trying to be set on a copy of a slice from a DataFrame.\n",
1022
+ "Try using .loc[row_indexer,col_indexer] = value instead\n",
1023
+ "\n",
1024
+ "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
1025
+ " movies['tags'] = movies['tags'].apply(lambda x: \" \".join(x))\n"
1026
+ ]
1027
+ }
1028
+ ],
1029
+ "source": [
1030
+ "movies['tags'] = movies['tags'].apply(lambda x: \" \".join(x))"
1031
+ ]
1032
+ },
1033
+ {
1034
+ "cell_type": "code",
1035
+ "execution_count": 22,
1036
+ "id": "2d415c55-fdcf-4d26-9feb-2eb20659831c",
1037
+ "metadata": {},
1038
+ "outputs": [
1039
+ {
1040
+ "name": "stderr",
1041
+ "output_type": "stream",
1042
+ "text": [
1043
+ "C:\\Users\\thaku\\AppData\\Local\\Temp\\ipykernel_7988\\3982405354.py:1: SettingWithCopyWarning: \n",
1044
+ "A value is trying to be set on a copy of a slice from a DataFrame.\n",
1045
+ "Try using .loc[row_indexer,col_indexer] = value instead\n",
1046
+ "\n",
1047
+ "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
1048
+ " movies['tags'] = movies['tags'].apply(lambda x: x.lower())\n"
1049
+ ]
1050
+ }
1051
+ ],
1052
+ "source": [
1053
+ "movies['tags'] = movies['tags'].apply(lambda x: x.lower())"
1054
+ ]
1055
+ },
1056
+ {
1057
+ "cell_type": "code",
1058
+ "execution_count": 23,
1059
+ "id": "0b5ebd6d-6e36-4fbd-9fb3-3e933a27bfc1",
1060
+ "metadata": {},
1061
+ "outputs": [
1062
+ {
1063
+ "data": {
1064
+ "text/plain": [
1065
+ "'in the 22nd century, a paraplegic marine is dispatched to the moon pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization. action adventure fantasy sciencefiction cultureclash future spacewar spacecolony society spacetravel futuristic romance space alien tribe alienplanet cgi marine soldier battle loveaffair antiwar powerrelations mindandsoul 3d samworthington zoesaldana sigourneyweaver jamescameron'"
1066
+ ]
1067
+ },
1068
+ "execution_count": 23,
1069
+ "metadata": {},
1070
+ "output_type": "execute_result"
1071
+ }
1072
+ ],
1073
+ "source": [
1074
+ "movies.iloc[0]['tags']"
1075
+ ]
1076
+ },
1077
+ {
1078
+ "cell_type": "code",
1079
+ "execution_count": 24,
1080
+ "id": "bf04370b-e001-4271-8c96-f4280ab66bec",
1081
+ "metadata": {
1082
+ "scrolled": true
1083
+ },
1084
+ "outputs": [
1085
+ {
1086
+ "ename": "TypeError",
1087
+ "evalue": "PorterStemmer.stem() missing 1 required positional argument: 'word'",
1088
+ "output_type": "error",
1089
+ "traceback": [
1090
+ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
1091
+ "\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)",
1092
+ "Cell \u001b[1;32mIn[24], line 12\u001b[0m\n\u001b[0;32m 9\u001b[0m List\u001b[38;5;241m.\u001b[39mappend(ps\u001b[38;5;241m.\u001b[39mstem(i))\n\u001b[0;32m 10\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m \u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;241m.\u001b[39mjoin(List)\n\u001b[1;32m---> 12\u001b[0m movies[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mtags\u001b[39m\u001b[38;5;124m'\u001b[39m] \u001b[38;5;241m=\u001b[39m \u001b[43mmovies\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mtags\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m]\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mapply\u001b[49m\u001b[43m(\u001b[49m\u001b[43mstem\u001b[49m\u001b[43m)\u001b[49m\n",
1093
+ "File \u001b[1;32m~\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\pandas\\core\\series.py:4924\u001b[0m, in \u001b[0;36mSeries.apply\u001b[1;34m(self, func, convert_dtype, args, by_row, **kwargs)\u001b[0m\n\u001b[0;32m 4789\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mapply\u001b[39m(\n\u001b[0;32m 4790\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[0;32m 4791\u001b[0m func: AggFuncType,\n\u001b[1;32m (...)\u001b[0m\n\u001b[0;32m 4796\u001b[0m \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs,\n\u001b[0;32m 4797\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m DataFrame \u001b[38;5;241m|\u001b[39m Series:\n\u001b[0;32m 4798\u001b[0m \u001b[38;5;250m \u001b[39m\u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[0;32m 4799\u001b[0m \u001b[38;5;124;03m Invoke function on values of Series.\u001b[39;00m\n\u001b[0;32m 4800\u001b[0m \n\u001b[1;32m (...)\u001b[0m\n\u001b[0;32m 4915\u001b[0m \u001b[38;5;124;03m dtype: float64\u001b[39;00m\n\u001b[0;32m 4916\u001b[0m \u001b[38;5;124;03m \"\"\"\u001b[39;00m\n\u001b[0;32m 4917\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mSeriesApply\u001b[49m\u001b[43m(\u001b[49m\n\u001b[0;32m 4918\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[0;32m 4919\u001b[0m \u001b[43m \u001b[49m\u001b[43mfunc\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m 4920\u001b[0m \u001b[43m \u001b[49m\u001b[43mconvert_dtype\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mconvert_dtype\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m 4921\u001b[0m \u001b[43m \u001b[49m\u001b[43mby_row\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mby_row\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m 4922\u001b[0m \u001b[43m \u001b[49m\u001b[43margs\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m 4923\u001b[0m \u001b[43m 
\u001b[49m\u001b[43mkwargs\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m-> 4924\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mapply\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n",
1094
+ "File \u001b[1;32m~\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\pandas\\core\\apply.py:1427\u001b[0m, in \u001b[0;36mSeriesApply.apply\u001b[1;34m(self)\u001b[0m\n\u001b[0;32m 1424\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mapply_compat()\n\u001b[0;32m 1426\u001b[0m \u001b[38;5;66;03m# self.func is Callable\u001b[39;00m\n\u001b[1;32m-> 1427\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mapply_standard\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n",
1095
+ "File \u001b[1;32m~\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\pandas\\core\\apply.py:1507\u001b[0m, in \u001b[0;36mSeriesApply.apply_standard\u001b[1;34m(self)\u001b[0m\n\u001b[0;32m 1501\u001b[0m \u001b[38;5;66;03m# row-wise access\u001b[39;00m\n\u001b[0;32m 1502\u001b[0m \u001b[38;5;66;03m# apply doesn't have a `na_action` keyword and for backward compat reasons\u001b[39;00m\n\u001b[0;32m 1503\u001b[0m \u001b[38;5;66;03m# we need to give `na_action=\"ignore\"` for categorical data.\u001b[39;00m\n\u001b[0;32m 1504\u001b[0m \u001b[38;5;66;03m# TODO: remove the `na_action=\"ignore\"` when that default has been changed in\u001b[39;00m\n\u001b[0;32m 1505\u001b[0m \u001b[38;5;66;03m# Categorical (GH51645).\u001b[39;00m\n\u001b[0;32m 1506\u001b[0m action \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mignore\u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(obj\u001b[38;5;241m.\u001b[39mdtype, CategoricalDtype) \u001b[38;5;28;01melse\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[1;32m-> 1507\u001b[0m mapped \u001b[38;5;241m=\u001b[39m \u001b[43mobj\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_map_values\u001b[49m\u001b[43m(\u001b[49m\n\u001b[0;32m 1508\u001b[0m \u001b[43m \u001b[49m\u001b[43mmapper\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcurried\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mna_action\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43maction\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mconvert\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mconvert_dtype\u001b[49m\n\u001b[0;32m 1509\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 1511\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mlen\u001b[39m(mapped) \u001b[38;5;129;01mand\u001b[39;00m 
\u001b[38;5;28misinstance\u001b[39m(mapped[\u001b[38;5;241m0\u001b[39m], ABCSeries):\n\u001b[0;32m 1512\u001b[0m \u001b[38;5;66;03m# GH#43986 Need to do list(mapped) in order to get treated as nested\u001b[39;00m\n\u001b[0;32m 1513\u001b[0m \u001b[38;5;66;03m# See also GH#25959 regarding EA support\u001b[39;00m\n\u001b[0;32m 1514\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m obj\u001b[38;5;241m.\u001b[39m_constructor_expanddim(\u001b[38;5;28mlist\u001b[39m(mapped), index\u001b[38;5;241m=\u001b[39mobj\u001b[38;5;241m.\u001b[39mindex)\n",
1096
+ "File \u001b[1;32m~\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\pandas\\core\\base.py:921\u001b[0m, in \u001b[0;36mIndexOpsMixin._map_values\u001b[1;34m(self, mapper, na_action, convert)\u001b[0m\n\u001b[0;32m 918\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(arr, ExtensionArray):\n\u001b[0;32m 919\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m arr\u001b[38;5;241m.\u001b[39mmap(mapper, na_action\u001b[38;5;241m=\u001b[39mna_action)\n\u001b[1;32m--> 921\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43malgorithms\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mmap_array\u001b[49m\u001b[43m(\u001b[49m\u001b[43marr\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mmapper\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mna_action\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mna_action\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mconvert\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mconvert\u001b[49m\u001b[43m)\u001b[49m\n",
1097
+ "File \u001b[1;32m~\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\pandas\\core\\algorithms.py:1743\u001b[0m, in \u001b[0;36mmap_array\u001b[1;34m(arr, mapper, na_action, convert)\u001b[0m\n\u001b[0;32m 1741\u001b[0m values \u001b[38;5;241m=\u001b[39m arr\u001b[38;5;241m.\u001b[39mastype(\u001b[38;5;28mobject\u001b[39m, copy\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mFalse\u001b[39;00m)\n\u001b[0;32m 1742\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m na_action \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[1;32m-> 1743\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mlib\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mmap_infer\u001b[49m\u001b[43m(\u001b[49m\u001b[43mvalues\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mmapper\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mconvert\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mconvert\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 1744\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m 1745\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m lib\u001b[38;5;241m.\u001b[39mmap_infer_mask(\n\u001b[0;32m 1746\u001b[0m values, mapper, mask\u001b[38;5;241m=\u001b[39misna(values)\u001b[38;5;241m.\u001b[39mview(np\u001b[38;5;241m.\u001b[39muint8), convert\u001b[38;5;241m=\u001b[39mconvert\n\u001b[0;32m 1747\u001b[0m )\n",
1098
+ "File \u001b[1;32mlib.pyx:2972\u001b[0m, in \u001b[0;36mpandas._libs.lib.map_infer\u001b[1;34m()\u001b[0m\n",
1099
+ "Cell \u001b[1;32mIn[24], line 9\u001b[0m, in \u001b[0;36mstem\u001b[1;34m(text)\u001b[0m\n\u001b[0;32m 7\u001b[0m List \u001b[38;5;241m=\u001b[39m []\n\u001b[0;32m 8\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m i \u001b[38;5;129;01min\u001b[39;00m text\u001b[38;5;241m.\u001b[39msplit():\n\u001b[1;32m----> 9\u001b[0m List\u001b[38;5;241m.\u001b[39mappend(\u001b[43mps\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mstem\u001b[49m\u001b[43m(\u001b[49m\u001b[43mi\u001b[49m\u001b[43m)\u001b[49m)\n\u001b[0;32m 10\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m \u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;241m.\u001b[39mjoin(List)\n",
1100
+ "\u001b[1;31mTypeError\u001b[0m: PorterStemmer.stem() missing 1 required positional argument: 'word'"
1101
+ ]
1102
+ }
1103
+ ],
1104
+ "source": [
1105
+ "# import nltk\n",
1106
+ "# from nltk.stem.porter import PorterStemmer\n",
1107
+ "\n",
1108
+ "# ps = PorterStemmer()  # instantiate the class; calling stem() on the class itself raises the TypeError above\n",
1108
+ "\n",
1109
+ "# def stem(text):\n",
1110
+ "#     List = []\n",
1111
+ "#     for i in text.split():\n",
1112
+ "#         List.append(ps.stem(i))\n",
1113
+ "#     return \" \".join(List)\n",
1115
+ "\n",
1116
+ "# movies['tags'] = movies['tags'].apply(stem)"
1117
+ ]
1118
+ },
1119
+ {
1120
+ "cell_type": "code",
1121
+ "execution_count": 25,
1122
+ "id": "a0ee92b1-3376-432e-9d9c-2ee37a236d9d",
1123
+ "metadata": {},
1124
+ "outputs": [
1125
+ {
1126
+ "data": {
1127
+ "text/html": [
1128
+ "<div>\n",
1129
+ "<style scoped>\n",
1130
+ " .dataframe tbody tr th:only-of-type {\n",
1131
+ " vertical-align: middle;\n",
1132
+ " }\n",
1133
+ "\n",
1134
+ " .dataframe tbody tr th {\n",
1135
+ " vertical-align: top;\n",
1136
+ " }\n",
1137
+ "\n",
1138
+ " .dataframe thead th {\n",
1139
+ " text-align: right;\n",
1140
+ " }\n",
1141
+ "</style>\n",
1142
+ "<table border=\"1\" class=\"dataframe\">\n",
1143
+ " <thead>\n",
1144
+ " <tr style=\"text-align: right;\">\n",
1145
+ " <th></th>\n",
1146
+ " <th>movie_id</th>\n",
1147
+ " <th>title</th>\n",
1148
+ " <th>tags</th>\n",
1149
+ " </tr>\n",
1150
+ " </thead>\n",
1151
+ " <tbody>\n",
1152
+ " <tr>\n",
1153
+ " <th>0</th>\n",
1154
+ " <td>19995</td>\n",
1155
+ " <td>Avatar</td>\n",
1156
+ " <td>in the 22nd century, a paraplegic marine is di...</td>\n",
1157
+ " </tr>\n",
1158
+ " <tr>\n",
1159
+ " <th>1</th>\n",
1160
+ " <td>285</td>\n",
1161
+ " <td>Pirates of the Caribbean: At World's End</td>\n",
1162
+ " <td>captain barbossa, long believed to be dead, ha...</td>\n",
1163
+ " </tr>\n",
1164
+ " <tr>\n",
1165
+ " <th>2</th>\n",
1166
+ " <td>206647</td>\n",
1167
+ " <td>Spectre</td>\n",
1168
+ " <td>a cryptic message from bond’s past sends him o...</td>\n",
1169
+ " </tr>\n",
1170
+ " <tr>\n",
1171
+ " <th>3</th>\n",
1172
+ " <td>49026</td>\n",
1173
+ " <td>The Dark Knight Rises</td>\n",
1174
+ " <td>following the death of district attorney harve...</td>\n",
1175
+ " </tr>\n",
1176
+ " <tr>\n",
1177
+ " <th>4</th>\n",
1178
+ " <td>49529</td>\n",
1179
+ " <td>John Carter</td>\n",
1180
+ " <td>john carter is a war-weary, former military ca...</td>\n",
1181
+ " </tr>\n",
1182
+ " <tr>\n",
1183
+ " <th>...</th>\n",
1184
+ " <td>...</td>\n",
1185
+ " <td>...</td>\n",
1186
+ " <td>...</td>\n",
1187
+ " </tr>\n",
1188
+ " <tr>\n",
1189
+ " <th>4804</th>\n",
1190
+ " <td>9367</td>\n",
1191
+ " <td>El Mariachi</td>\n",
1192
+ " <td>el mariachi just wants to play his guitar and ...</td>\n",
1193
+ " </tr>\n",
1194
+ " <tr>\n",
1195
+ " <th>4805</th>\n",
1196
+ " <td>72766</td>\n",
1197
+ " <td>Newlyweds</td>\n",
1198
+ " <td>a newlywed couple's honeymoon is upended by th...</td>\n",
1199
+ " </tr>\n",
1200
+ " <tr>\n",
1201
+ " <th>4806</th>\n",
1202
+ " <td>231617</td>\n",
1203
+ " <td>Signed, Sealed, Delivered</td>\n",
1204
+ " <td>\"signed, sealed, delivered\" introduces a dedic...</td>\n",
1205
+ " </tr>\n",
1206
+ " <tr>\n",
1207
+ " <th>4807</th>\n",
1208
+ " <td>126186</td>\n",
1209
+ " <td>Shanghai Calling</td>\n",
1210
+ " <td>when ambitious new york attorney sam is sent t...</td>\n",
1211
+ " </tr>\n",
1212
+ " <tr>\n",
1213
+ " <th>4808</th>\n",
1214
+ " <td>25975</td>\n",
1215
+ " <td>My Date with Drew</td>\n",
1216
+ " <td>ever since the second grade when he first saw ...</td>\n",
1217
+ " </tr>\n",
1218
+ " </tbody>\n",
1219
+ "</table>\n",
1220
+ "<p>4806 rows × 3 columns</p>\n",
1221
+ "</div>"
1222
+ ],
1223
+ "text/plain": [
1224
+ " movie_id title \\\n",
1225
+ "0 19995 Avatar \n",
1226
+ "1 285 Pirates of the Caribbean: At World's End \n",
1227
+ "2 206647 Spectre \n",
1228
+ "3 49026 The Dark Knight Rises \n",
1229
+ "4 49529 John Carter \n",
1230
+ "... ... ... \n",
1231
+ "4804 9367 El Mariachi \n",
1232
+ "4805 72766 Newlyweds \n",
1233
+ "4806 231617 Signed, Sealed, Delivered \n",
1234
+ "4807 126186 Shanghai Calling \n",
1235
+ "4808 25975 My Date with Drew \n",
1236
+ "\n",
1237
+ " tags \n",
1238
+ "0 in the 22nd century, a paraplegic marine is di... \n",
1239
+ "1 captain barbossa, long believed to be dead, ha... \n",
1240
+ "2 a cryptic message from bond’s past sends him o... \n",
1241
+ "3 following the death of district attorney harve... \n",
1242
+ "4 john carter is a war-weary, former military ca... \n",
1243
+ "... ... \n",
1244
+ "4804 el mariachi just wants to play his guitar and ... \n",
1245
+ "4805 a newlywed couple's honeymoon is upended by th... \n",
1246
+ "4806 \"signed, sealed, delivered\" introduces a dedic... \n",
1247
+ "4807 when ambitious new york attorney sam is sent t... \n",
1248
+ "4808 ever since the second grade when he first saw ... \n",
1249
+ "\n",
1250
+ "[4806 rows x 3 columns]"
1251
+ ]
1252
+ },
1253
+ "execution_count": 25,
1254
+ "metadata": {},
1255
+ "output_type": "execute_result"
1256
+ }
1257
+ ],
1258
+ "source": [
1259
+ "movies"
1260
+ ]
1261
+ },
1262
+ {
1263
+ "cell_type": "code",
1264
+ "execution_count": 26,
1265
+ "id": "9f521536-b865-49f5-8996-dcbc7982e641",
1266
+ "metadata": {},
1267
+ "outputs": [],
1268
+ "source": [
1269
+ "from sklearn.feature_extraction.text import CountVectorizer\n",
1270
+ "\n",
1271
+ "cv = CountVectorizer(max_features=5000, stop_words='english')"
1272
+ ]
1273
+ },
1274
+ {
1275
+ "cell_type": "code",
1276
+ "execution_count": 27,
1277
+ "id": "2a802077-3472-476f-a649-c40158bc97cb",
1278
+ "metadata": {},
1279
+ "outputs": [],
1280
+ "source": [
1281
+ "vectors = cv.fit_transform(movies['tags']).toarray()"
1282
+ ]
1283
+ },
1284
+ {
1285
+ "cell_type": "code",
1286
+ "execution_count": 28,
1287
+ "id": "3aed858d-ed14-4840-ac00-ce11406db70b",
1288
+ "metadata": {},
1289
+ "outputs": [
1290
+ {
1291
+ "data": {
1292
+ "text/plain": [
1293
+ "array([[0, 0, 0, ..., 0, 0, 0],\n",
1294
+ " [0, 0, 0, ..., 0, 0, 0],\n",
1295
+ " [0, 0, 0, ..., 0, 0, 0],\n",
1296
+ " ...,\n",
1297
+ " [0, 0, 0, ..., 0, 0, 0],\n",
1298
+ " [0, 0, 0, ..., 0, 0, 0],\n",
1299
+ " [0, 0, 0, ..., 0, 0, 0]], dtype=int64)"
1300
+ ]
1301
+ },
1302
+ "execution_count": 28,
1303
+ "metadata": {},
1304
+ "output_type": "execute_result"
1305
+ }
1306
+ ],
1307
+ "source": [
1308
+ "vectors"
1309
+ ]
1310
+ },
1311
+ {
1312
+ "cell_type": "code",
1313
+ "execution_count": 30,
1314
+ "id": "174f33cc-34dd-4c8d-86c8-ed7e9a7a01c7",
1315
+ "metadata": {},
1316
+ "outputs": [
1317
+ {
1318
+ "name": "stdout",
1319
+ "output_type": "stream",
1320
+ "text": [
1321
+ "['000' '007' '10' ... 'zone' 'zoo' 'zooeydeschanel']\n"
1322
+ ]
1323
+ }
1324
+ ],
1325
+ "source": [
1326
+ "print(cv.get_feature_names_out())"
1327
+ ]
1328
+ },
1329
+ {
1330
+ "cell_type": "code",
1331
+ "execution_count": 31,
1332
+ "id": "10a5bb71-f03c-4b16-ab13-f7c70b8d7dc7",
1333
+ "metadata": {},
1334
+ "outputs": [],
1335
+ "source": [
1336
+ "from sklearn.metrics.pairwise import cosine_similarity"
1337
+ ]
1338
+ },
1339
+ {
1340
+ "cell_type": "code",
1341
+ "execution_count": 32,
1342
+ "id": "b693799b-7b86-41cf-8299-3e109dd2cc48",
1343
+ "metadata": {},
1344
+ "outputs": [
1345
+ {
1346
+ "data": {
1347
+ "text/plain": [
1348
+ "array([[1. , 0.08964215, 0.05976143, ..., 0.02519763, 0.02817181,\n",
1349
+ " 0. ],\n",
1350
+ " [0.08964215, 1. , 0.0625 , ..., 0.02635231, 0. ,\n",
1351
+ " 0. ],\n",
1352
+ " [0.05976143, 0.0625 , 1. , ..., 0.02635231, 0. ,\n",
1353
+ " 0. ],\n",
1354
+ " ...,\n",
1355
+ " [0.02519763, 0.02635231, 0.02635231, ..., 1. , 0.0745356 ,\n",
1356
+ " 0.04836508],\n",
1357
+ " [0.02817181, 0. , 0. , ..., 0.0745356 , 1. ,\n",
1358
+ " 0.05407381],\n",
1359
+ " [0. , 0. , 0. , ..., 0.04836508, 0.05407381,\n",
1360
+ " 1. ]])"
1361
+ ]
1362
+ },
1363
+ "execution_count": 32,
1364
+ "metadata": {},
1365
+ "output_type": "execute_result"
1366
+ }
1367
+ ],
1368
+ "source": [
1369
+ "movies_cos_sim = cosine_similarity(vectors)\n",
1370
+ "movies_cos_sim"
1371
+ ]
1372
+ },
1373
+ {
1374
+ "cell_type": "code",
1375
+ "execution_count": 33,
1376
+ "id": "f2b04fdc-7764-4fe7-a401-63833468fd85",
1377
+ "metadata": {},
1378
+ "outputs": [
1379
+ {
1380
+ "data": {
1381
+ "text/plain": [
1382
+ "(4806, 4806)"
1383
+ ]
1384
+ },
1385
+ "execution_count": 33,
1386
+ "metadata": {},
1387
+ "output_type": "execute_result"
1388
+ }
1389
+ ],
1390
+ "source": [
1391
+ "movies_cos_sim.shape"
1392
+ ]
1393
+ },
1394
+ {
1395
+ "cell_type": "code",
1396
+ "execution_count": 101,
1397
+ "id": "21306d5b-1bce-4508-a8ed-85e41be6af95",
1398
+ "metadata": {},
1399
+ "outputs": [],
1400
+ "source": [
1401
+ "def recommend(movie):\n",
1402
+ "    if movie in movies['title'].tolist():\n",
1403
+ "        index = movies['title'].tolist().index(movie)  # positional row; index labels have gaps (4806 rows, labels up to 4808) but movies_cos_sim is positional\n",
1404
+ "        ascending_indices = movies_cos_sim[index].argsort()\n",
1405
+ "        descending_indices = ascending_indices[::-1]\n",
1406
+ "        return movies.iloc[descending_indices[1:21]]['title'].tolist()\n",
1407
+ "    else:\n",
1408
+ "        return ['movie not found in dataset']  # a list, so callers iterate over whole messages rather than characters"
1409
+ ]
1410
+ },
1411
+ {
1412
+ "cell_type": "code",
1413
+ "execution_count": 133,
1414
+ "id": "be12e60c-1a10-4cb3-a9cd-b3fe1174f91b",
1415
+ "metadata": {},
1416
+ "outputs": [
1417
+ {
1418
+ "name": "stdin",
1419
+ "output_type": "stream",
1420
+ "text": [
1421
+ "movie : My Fault\n"
1422
+ ]
1423
+ },
1424
+ {
1425
+ "name": "stdout",
1426
+ "output_type": "stream",
1427
+ "text": [
1428
+ "m\n",
1429
+ "o\n",
1430
+ "v\n",
1431
+ "i\n",
1432
+ "e\n",
1433
+ " \n",
1434
+ "n\n",
1435
+ "o\n",
1436
+ "t\n",
1437
+ " \n",
1438
+ "f\n",
1439
+ "o\n",
1440
+ "u\n",
1441
+ "n\n",
1442
+ "d\n",
1443
+ " \n",
1444
+ "i\n",
1445
+ "n\n",
1446
+ " \n",
1447
+ "d\n",
1448
+ "a\n",
1449
+ "t\n",
1450
+ "a\n",
1451
+ "s\n",
1452
+ "e\n",
1453
+ "t\n",
1454
+ "----------------------------------------------------------------------------------------------\n"
1455
+ ]
1456
+ },
1457
+ {
1458
+ "name": "stdin",
1459
+ "output_type": "stream",
1460
+ "text": [
1461
+ "movie : God Father\n"
1462
+ ]
1463
+ },
1464
+ {
1465
+ "name": "stdout",
1466
+ "output_type": "stream",
1467
+ "text": [
1468
+ "m\n",
1469
+ "o\n",
1470
+ "v\n",
1471
+ "i\n",
1472
+ "e\n",
1473
+ " \n",
1474
+ "n\n",
1475
+ "o\n",
1476
+ "t\n",
1477
+ " \n",
1478
+ "f\n",
1479
+ "o\n",
1480
+ "u\n",
1481
+ "n\n",
1482
+ "d\n",
1483
+ " \n",
1484
+ "i\n",
1485
+ "n\n",
1486
+ " \n",
1487
+ "d\n",
1488
+ "a\n",
1489
+ "t\n",
1490
+ "a\n",
1491
+ "s\n",
1492
+ "e\n",
1493
+ "t\n",
1494
+ "----------------------------------------------------------------------------------------------\n"
1495
+ ]
1496
+ },
1497
+ {
1498
+ "name": "stdin",
1499
+ "output_type": "stream",
1500
+ "text": [
1501
+ "movie : Godfather\n"
1502
+ ]
1503
+ },
1504
+ {
1505
+ "name": "stdout",
1506
+ "output_type": "stream",
1507
+ "text": [
1508
+ "m\n",
1509
+ "o\n",
1510
+ "v\n",
1511
+ "i\n",
1512
+ "e\n",
1513
+ " \n",
1514
+ "n\n",
1515
+ "o\n",
1516
+ "t\n",
1517
+ " \n",
1518
+ "f\n",
1519
+ "o\n",
1520
+ "u\n",
1521
+ "n\n",
1522
+ "d\n",
1523
+ " \n",
1524
+ "i\n",
1525
+ "n\n",
1526
+ " \n",
1527
+ "d\n",
1528
+ "a\n",
1529
+ "t\n",
1530
+ "a\n",
1531
+ "s\n",
1532
+ "e\n",
1533
+ "t\n",
1534
+ "----------------------------------------------------------------------------------------------\n"
1535
+ ]
1536
+ },
1537
+ {
1538
+ "name": "stdin",
1539
+ "output_type": "stream",
1540
+ "text": [
1541
+ "movie : The Godfather\n"
1542
+ ]
1543
+ },
1544
+ {
1545
+ "name": "stdout",
1546
+ "output_type": "stream",
1547
+ "text": [
1548
+ "Desert Dancer\n",
1549
+ "Take the Lead\n",
1550
+ "Step Up 2: The Streets\n",
1551
+ "Center Stage\n",
1552
+ "Step Up\n",
1553
+ "Footloose\n",
1554
+ "ABCD (Any Body Can Dance)\n",
1555
+ "Step Up Revolution\n",
1556
+ "Tango\n",
1557
+ "Dancin' It's On\n",
1558
+ "Love Me Tender\n",
1559
+ "Sweet Charity\n",
1560
+ "Black Swan\n",
1561
+ "Sunday School Musical\n",
1562
+ "Peaceful Warrior\n",
1563
+ "Mao's Last Dancer\n",
1564
+ "Mr. Holland's Opus\n",
1565
+ "Yentl\n",
1566
+ "Honey\n",
1567
+ "Rize\n",
1568
+ "----------------------------------------------------------------------------------------------\n"
1569
+ ]
1570
+ },
1571
+ {
1572
+ "name": "stdin",
1573
+ "output_type": "stream",
1574
+ "text": [
1575
+ "movie : The Fault in Our Stars\n"
1576
+ ]
1577
+ },
1578
+ {
1579
+ "name": "stdout",
1580
+ "output_type": "stream",
1581
+ "text": [
1582
+ "Easy Money\n",
1583
+ "The Slaughter Rule\n",
1584
+ "Blood Ties\n",
1585
+ "Runner Runner\n",
1586
+ "The Gambler\n",
1587
+ "Hardball\n",
1588
+ "Gridiron Gang\n",
1589
+ "The Rainmaker\n",
1590
+ "Amidst the Devil's Wings\n",
1591
+ "Nine Queens\n",
1592
+ "Casino\n",
1593
+ "The Legend of Bagger Vance\n",
1594
+ "My Big Fat Greek Wedding\n",
1595
+ "Mi America\n",
1596
+ "Blue Like Jazz\n",
1597
+ "Ong Bak 2\n",
1598
+ "Auto Focus\n",
1599
+ "Stonewall\n",
1600
+ "Killer Joe\n",
1601
+ "Jesus' Son\n",
1602
+ "----------------------------------------------------------------------------------------------\n"
1603
+ ]
1604
+ },
1605
+ {
1606
+ "name": "stdin",
1607
+ "output_type": "stream",
1608
+ "text": [
1609
+ "movie : \n"
1610
+ ]
1611
+ },
1612
+ {
1613
+ "name": "stdout",
1614
+ "output_type": "stream",
1615
+ "text": [
1616
+ "function ended\n"
1617
+ ]
1618
+ }
1619
+ ],
1620
+ "source": [
1621
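+ "while True:\n",
1622
+ "    movie = input('movie : ')\n",
1623
+ "    if not movie:\n",
1624
+ "        print('function ended')\n",
1625
+ "        break\n",
1626
+ "    result = recommend(movie)\n",
1627
+ "    # recommend() yields a plain str message when the title is not in the dataset; wrap it so it prints as one line\n",
1628
+ "    print(*([result] if isinstance(result, str) else result), sep='\\n')\n",
1629
+ "    print('----------------------------------------------------------------------------------------------')"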
1630
+ ]
1631
+ },
1632
+ {
1633
+ "cell_type": "code",
1634
+ "execution_count": 136,
1635
+ "id": "afc6fb23-bdc3-47df-9c22-18d303d97d81",
1636
+ "metadata": {},
1637
+ "outputs": [],
1638
+ "source": [
1639
+ "import pickle\n",
1640
+ "with open('movies_dict.pkl', 'wb') as f:  # context manager closes the file once the dump finishes\n",
1641
+ "    pickle.dump(movies.to_dict(), f)"
1642
+ ]
1643
+ },
1644
+ {
1645
+ "cell_type": "code",
1646
+ "execution_count": 144,
1647
+ "id": "d5d7a4a7-df6e-4ed6-a04b-5a4b8ba08550",
1648
+ "metadata": {},
1649
+ "outputs": [],
1650
+ "source": [
1651
+ "pickle.dump(movies_cos_sim, open('movies_cos_sim.pkl', 'wb'))\n",
1652
+ "x = pickle.load(open('movies_cos_sim.pkl', 'rb'))\n",
1653
+ "x"
1654
+ ]
1655
+ },
1656
+ {
1657
+ "cell_type": "code",
1658
+ "execution_count": 145,
1659
+ "id": "738cd08d-2232-4008-84c8-9e90f9320698",
1660
+ "metadata": {},
1661
+ "outputs": [
1662
+ {
1663
+ "data": {
1664
+ "text/plain": [
1665
+ "array([[1.        , 0.08964215, 0.05976143, ..., 0.02519763, 0.02817181,\n",
1666
+ "        0.        ],\n",
1667
+ "       [0.08964215, 1.        , 0.0625    , ..., 0.02635231, 0.        ,\n",
1668
+ "        0.        ],\n",
1669
+ "       [0.05976143, 0.0625    , 1.        , ..., 0.02635231, 0.        ,\n",
1670
+ "        0.        ],\n",
1671
+ "       ...,\n",
1672
+ "       [0.02519763, 0.02635231, 0.02635231, ..., 1.        , 0.0745356 ,\n",
1673
+ "        0.04836508],\n",
1674
+ "       [0.02817181, 0.        , 0.        , ..., 0.0745356 , 1.        ,\n",
1675
+ "        0.05407381],\n",
1676
+ "       [0.        , 0.        , 0.        , ..., 0.04836508, 0.05407381,\n",
1677
+ "        1.        ]])"
1678
+ ]
1679
+ },
1680
+ "execution_count": 145,
1681
+ "metadata": {},
1682
+ "output_type": "execute_result"
1683
+ }
1684
+ ],
1685
+ "source": []
1686
+ },
1687
+ {
1688
+ "cell_type": "code",
1689
+ "execution_count": null,
1690
+ "id": "1c150443-6419-4b0a-ab1a-92a3952c1bab",
1691
+ "metadata": {},
1692
+ "outputs": [],
1693
+ "source": []
1694
+ }
1695
+ ],
1696
+ "metadata": {
1697
+ "kernelspec": {
1698
+ "display_name": "Python 3 (ipykernel)",
1699
+ "language": "python",
1700
+ "name": "python3"
1701
+ },
1702
+ "language_info": {
1703
+ "codemirror_mode": {
1704
+ "name": "ipython",
1705
+ "version": 3
1706
+ },
1707
+ "file_extension": ".py",
1708
+ "mimetype": "text/x-python",
1709
+ "name": "python",
1710
+ "nbconvert_exporter": "python",
1711
+ "pygments_lexer": "ipython3",
1712
+ "version": "3.12.4"
1713
+ }
1714
+ },
1715
+ "nbformat": 4,
1716
+ "nbformat_minor": 5
1717
+ }