{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Movie Recommendation Project\n", "In this machine learning project, we build a recommendation system from scratch that suggests movies to users based on their preferences.\n", "\n", "## Dataset\n", "We use the TMDB 5000 Movie Dataset from Kaggle, which is split into two files: `tmdb_5000_movies.csv` (movie metadata) and `tmdb_5000_credits.csv` (cast and crew).\n", "\n", "## What is a Recommendation System?\n", "A recommendation system suggests items to users based on criteria such as item popularity, item attributes, or the behavior of similar users.\n", "\n", "There are three common types of recommendation systems:\n", "\n", "1. Demographic Filtering: Recommendations are generalized rather than personalized; every user, or every user in the same demographic group, receives the same suggestions. Sections like “Top Trending” are built this way.\n", "2. Content-based Filtering: Recommendations are based on item metadata (for a movie: its genres, cast, director, keywords, and so on). The idea is that if a user likes an item, they will also like items similar to it.\n", "3. Collaborative Filtering: Recommendations are made by grouping users with similar interests, based on their ratings or interactions. Item metadata is not required.\n", "\n", "In this project, we build a **content-based** recommendation engine for movies. Each movie's keywords, top three cast members, director, and genres are combined into a single text \"soup\", converted into token counts with `CountVectorizer`, and compared using cosine similarity." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "from ast import literal_eval\n", "from sklearn.feature_extraction.text import CountVectorizer\n", "from sklearn.metrics.pairwise import cosine_similarity\n", "import pickle" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "credits_df = pd.read_csv(\"./data/tmdb_5000_credits.csv\")\n", "movies_df = pd.read_csv(\"./data/tmdb_5000_movies.csv\")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " budget genres \\\n", "0 237000000 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam... \n", "1 300000000 [{\"id\": 12, \"name\": \"Adventure\"}, {\"id\": 14, \"... \n", "2 245000000 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam... \n", "3 250000000 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 80, \"nam... \n", "4 260000000 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam... \n", "\n", " homepage id \\\n", "0 http://www.avatarmovie.com/ 19995 \n", "1 http://disney.go.com/disneypictures/pirates/ 285 \n", "2 http://www.sonypictures.com/movies/spectre/ 206647 \n", "3 http://www.thedarkknightrises.com/ 49026 \n", "4 http://movies.disney.com/john-carter 49529 \n", "\n", " keywords original_language \\\n", "0 [{\"id\": 1463, \"name\": \"culture clash\"}, {\"id\":... en \n", "1 [{\"id\": 270, \"name\": \"ocean\"}, {\"id\": 726, \"na... en \n", "2 [{\"id\": 470, \"name\": \"spy\"}, {\"id\": 818, \"name... en \n", "3 [{\"id\": 849, \"name\": \"dc comics\"}, {\"id\": 853,... en \n", "4 [{\"id\": 818, \"name\": \"based on novel\"}, {\"id\":... en \n", "\n", " original_title \\\n", "0 Avatar \n", "1 Pirates of the Caribbean: At World's End \n", "2 Spectre \n", "3 The Dark Knight Rises \n", "4 John Carter \n", "\n", " overview popularity \\\n", "0 In the 22nd century, a paraplegic Marine is di... 150.437577 \n", "1 Captain Barbossa, long believed to be dead, ha... 139.082615 \n", "2 A cryptic message from Bond’s past sends him o... 107.376788 \n", "3 Following the death of District Attorney Harve... 112.312950 \n", "4 John Carter is a war-weary, former military ca... 43.926995 \n", "\n", " production_companies \\\n", "0 [{\"name\": \"Ingenious Film Partners\", \"id\": 289... \n", "1 [{\"name\": \"Walt Disney Pictures\", \"id\": 2}, {\"... \n", "2 [{\"name\": \"Columbia Pictures\", \"id\": 5}, {\"nam... \n", "3 [{\"name\": \"Legendary Pictures\", \"id\": 923}, {\"... 
\n", "4 [{\"name\": \"Walt Disney Pictures\", \"id\": 2}] \n", "\n", " production_countries release_date revenue \\\n", "0 [{\"iso_3166_1\": \"US\", \"name\": \"United States o... 2009-12-10 2787965087 \n", "1 [{\"iso_3166_1\": \"US\", \"name\": \"United States o... 2007-05-19 961000000 \n", "2 [{\"iso_3166_1\": \"GB\", \"name\": \"United Kingdom\"... 2015-10-26 880674609 \n", "3 [{\"iso_3166_1\": \"US\", \"name\": \"United States o... 2012-07-16 1084939099 \n", "4 [{\"iso_3166_1\": \"US\", \"name\": \"United States o... 2012-03-07 284139100 \n", "\n", " runtime spoken_languages status \\\n", "0 162.0 [{\"iso_639_1\": \"en\", \"name\": \"English\"}, {\"iso... Released \n", "1 169.0 [{\"iso_639_1\": \"en\", \"name\": \"English\"}] Released \n", "2 148.0 [{\"iso_639_1\": \"fr\", \"name\": \"Fran\\u00e7ais\"},... Released \n", "3 165.0 [{\"iso_639_1\": \"en\", \"name\": \"English\"}] Released \n", "4 132.0 [{\"iso_639_1\": \"en\", \"name\": \"English\"}] Released \n", "\n", " tagline \\\n", "0 Enter the World of Pandora. \n", "1 At the end of the world, the adventure begins. \n", "2 A Plan No One Escapes \n", "3 The Legend Ends \n", "4 Lost in our world, found in another. \n", "\n", " title vote_average vote_count \n", "0 Avatar 7.2 11800 \n", "1 Pirates of the Caribbean: At World's End 6.9 4500 \n", "2 Spectre 6.3 4466 \n", "3 The Dark Knight Rises 7.6 9106 \n", "4 John Carter 6.1 2124 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies_df.head()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " movie_id title \\\n", "0 19995 Avatar \n", "1 285 Pirates of the Caribbean: At World's End \n", "2 206647 Spectre \n", "3 49026 The Dark Knight Rises \n", "4 49529 John Carter \n", "\n", " cast \\\n", "0 [{\"cast_id\": 242, \"character\": \"Jake Sully\", \"... \n", "1 [{\"cast_id\": 4, \"character\": \"Captain Jack Spa... \n", "2 [{\"cast_id\": 1, \"character\": \"James Bond\", \"cr... \n", "3 [{\"cast_id\": 2, \"character\": \"Bruce Wayne / Ba... \n", "4 [{\"cast_id\": 5, \"character\": \"John Carter\", \"c... \n", "\n", " crew \n", "0 [{\"credit_id\": \"52fe48009251416c750aca23\", \"de... \n", "1 [{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de... \n", "2 [{\"credit_id\": \"54805967c3a36829b5002c41\", \"de... \n", "3 [{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de... \n", "4 [{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de... " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "credits_df.head()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " budget genres \\\n", "0 237000000 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam... \n", "1 300000000 [{\"id\": 12, \"name\": \"Adventure\"}, {\"id\": 14, \"... \n", "2 245000000 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam... \n", "3 250000000 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 80, \"nam... \n", "4 260000000 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam... \n", "\n", " homepage id \\\n", "0 http://www.avatarmovie.com/ 19995 \n", "1 http://disney.go.com/disneypictures/pirates/ 285 \n", "2 http://www.sonypictures.com/movies/spectre/ 206647 \n", "3 http://www.thedarkknightrises.com/ 49026 \n", "4 http://movies.disney.com/john-carter 49529 \n", "\n", " keywords original_language \\\n", "0 [{\"id\": 1463, \"name\": \"culture clash\"}, {\"id\":... en \n", "1 [{\"id\": 270, \"name\": \"ocean\"}, {\"id\": 726, \"na... en \n", "2 [{\"id\": 470, \"name\": \"spy\"}, {\"id\": 818, \"name... en \n", "3 [{\"id\": 849, \"name\": \"dc comics\"}, {\"id\": 853,... en \n", "4 [{\"id\": 818, \"name\": \"based on novel\"}, {\"id\":... en \n", "\n", " original_title \\\n", "0 Avatar \n", "1 Pirates of the Caribbean: At World's End \n", "2 Spectre \n", "3 The Dark Knight Rises \n", "4 John Carter \n", "\n", " overview popularity \\\n", "0 In the 22nd century, a paraplegic Marine is di... 150.437577 \n", "1 Captain Barbossa, long believed to be dead, ha... 139.082615 \n", "2 A cryptic message from Bond’s past sends him o... 107.376788 \n", "3 Following the death of District Attorney Harve... 112.312950 \n", "4 John Carter is a war-weary, former military ca... 43.926995 \n", "\n", " production_companies ... revenue runtime \\\n", "0 [{\"name\": \"Ingenious Film Partners\", \"id\": 289... ... 2787965087 162.0 \n", "1 [{\"name\": \"Walt Disney Pictures\", \"id\": 2}, {\"... ... 961000000 169.0 \n", "2 [{\"name\": \"Columbia Pictures\", \"id\": 5}, {\"nam... ... 
880674609 148.0 \n", "3 [{\"name\": \"Legendary Pictures\", \"id\": 923}, {\"... ... 1084939099 165.0 \n", "4 [{\"name\": \"Walt Disney Pictures\", \"id\": 2}] ... 284139100 132.0 \n", "\n", " spoken_languages status \\\n", "0 [{\"iso_639_1\": \"en\", \"name\": \"English\"}, {\"iso... Released \n", "1 [{\"iso_639_1\": \"en\", \"name\": \"English\"}] Released \n", "2 [{\"iso_639_1\": \"fr\", \"name\": \"Fran\\u00e7ais\"},... Released \n", "3 [{\"iso_639_1\": \"en\", \"name\": \"English\"}] Released \n", "4 [{\"iso_639_1\": \"en\", \"name\": \"English\"}] Released \n", "\n", " tagline \\\n", "0 Enter the World of Pandora. \n", "1 At the end of the world, the adventure begins. \n", "2 A Plan No One Escapes \n", "3 The Legend Ends \n", "4 Lost in our world, found in another. \n", "\n", " title vote_average vote_count \\\n", "0 Avatar 7.2 11800 \n", "1 Pirates of the Caribbean: At World's End 6.9 4500 \n", "2 Spectre 6.3 4466 \n", "3 The Dark Knight Rises 7.6 9106 \n", "4 John Carter 6.1 2124 \n", "\n", " cast \\\n", "0 [{\"cast_id\": 242, \"character\": \"Jake Sully\", \"... \n", "1 [{\"cast_id\": 4, \"character\": \"Captain Jack Spa... \n", "2 [{\"cast_id\": 1, \"character\": \"James Bond\", \"cr... \n", "3 [{\"cast_id\": 2, \"character\": \"Bruce Wayne / Ba... \n", "4 [{\"cast_id\": 5, \"character\": \"John Carter\", \"c... \n", "\n", " crew \n", "0 [{\"credit_id\": \"52fe48009251416c750aca23\", \"de... \n", "1 [{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de... \n", "2 [{\"credit_id\": \"54805967c3a36829b5002c41\", \"de... \n", "3 [{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de... \n", "4 [{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de... 
\n", "\n", "[5 rows x 22 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Rename movie_id to id so both frames share the merge key\n", "credits_df.columns = ['id', 'title', 'cast', 'crew']\n", "movies_df = movies_df.merge(credits_df, on = \"id\")\n", "# Both frames have a title column; keep one copy and restore its name\n", "movies_df.drop('title_y', axis = 1, inplace = True)\n", "movies_df.rename(columns={'title_x':'title'}, inplace = True)\n", "movies_df.head()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " cast \\\n", "0 [{'cast_id': 242, 'character': 'Jake Sully', '... \n", "1 [{'cast_id': 4, 'character': 'Captain Jack Spa... \n", "2 [{'cast_id': 1, 'character': 'James Bond', 'cr... \n", "3 [{'cast_id': 2, 'character': 'Bruce Wayne / Ba... \n", "4 [{'cast_id': 5, 'character': 'John Carter', 'c... \n", "5 [{'cast_id': 30, 'character': 'Peter Parker / ... \n", "6 [{'cast_id': 34, 'character': 'Flynn Rider (vo... \n", "7 [{'cast_id': 76, 'character': 'Tony Stark / Ir... \n", "8 [{'cast_id': 3, 'character': 'Harry Potter', '... \n", "9 [{'cast_id': 18, 'character': 'Bruce Wayne / B... \n", "\n", " crew \\\n", "0 [{'credit_id': '52fe48009251416c750aca23', 'de... \n", "1 [{'credit_id': '52fe4232c3a36847f800b579', 'de... \n", "2 [{'credit_id': '54805967c3a36829b5002c41', 'de... \n", "3 [{'credit_id': '52fe4781c3a36847f81398c3', 'de... \n", "4 [{'credit_id': '52fe479ac3a36847f813eaa3', 'de... \n", "5 [{'credit_id': '52fe4252c3a36847f80151a5', 'de... \n", "6 [{'credit_id': '52fe46db9251416c91062101', 'de... \n", "7 [{'credit_id': '55d5f7d4c3a3683e7e0016eb', 'de... \n", "8 [{'credit_id': '52fe4273c3a36847f801fab1', 'de... \n", "9 [{'credit_id': '553bf23692514135c8002886', 'de... \n", "\n", " keywords \\\n", "0 [{'id': 1463, 'name': 'culture clash'}, {'id':... \n", "1 [{'id': 270, 'name': 'ocean'}, {'id': 726, 'na... \n", "2 [{'id': 470, 'name': 'spy'}, {'id': 818, 'name... \n", "3 [{'id': 849, 'name': 'dc comics'}, {'id': 853,... \n", "4 [{'id': 818, 'name': 'based on novel'}, {'id':... \n", "5 [{'id': 851, 'name': 'dual identity'}, {'id': ... \n", "6 [{'id': 1562, 'name': 'hostage'}, {'id': 2343,... \n", "7 [{'id': 8828, 'name': 'marvel comic'}, {'id': ... \n", "8 [{'id': 616, 'name': 'witch'}, {'id': 2343, 'n... \n", "9 [{'id': 849, 'name': 'dc comics'}, {'id': 7002... \n", "\n", " genres \n", "0 [{'id': 28, 'name': 'Action'}, {'id': 12, 'nam... \n", "1 [{'id': 12, 'name': 'Adventure'}, {'id': 14, '... 
\n", "2 [{'id': 28, 'name': 'Action'}, {'id': 12, 'nam... \n", "3 [{'id': 28, 'name': 'Action'}, {'id': 80, 'nam... \n", "4 [{'id': 28, 'name': 'Action'}, {'id': 12, 'nam... \n", "5 [{'id': 14, 'name': 'Fantasy'}, {'id': 28, 'na... \n", "6 [{'id': 16, 'name': 'Animation'}, {'id': 10751... \n", "7 [{'id': 28, 'name': 'Action'}, {'id': 12, 'nam... \n", "8 [{'id': 12, 'name': 'Adventure'}, {'id': 14, '... \n", "9 [{'id': 28, 'name': 'Action'}, {'id': 12, 'nam... " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# These columns store stringified lists of dicts; parse them into Python objects\n", "features = [\"cast\", \"crew\", \"keywords\", \"genres\"]\n", "for feature in features:\n", " movies_df[feature] = movies_df[feature].apply(literal_eval)\n", "movies_df[features].head(10)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "def get_director(crew):\n", "    # Return the name of the first crew member whose job is Director\n", "    for member in crew:\n", "        if member[\"job\"] == \"Director\":\n", "            return member[\"name\"]\n", "    # No director listed; clean_data() later turns NaN into an empty string\n", "    return np.nan\n", "\n", "def get_list(x):\n", "    # Keep at most the first three names (e.g. the top-billed cast members)\n", "    if isinstance(x, list):\n", "        return [i[\"name\"] for i in x][:3]\n", "    return []" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " title \\\n", "0 Avatar \n", "1 Pirates of the Caribbean: At World's End \n", "2 Spectre \n", "3 The Dark Knight Rises \n", "4 John Carter \n", "\n", " cast director \\\n", "0 [Sam Worthington, Zoe Saldana, Sigourney Weaver] James Cameron \n", "1 [Johnny Depp, Orlando Bloom, Keira Knightley] Gore Verbinski \n", "2 [Daniel Craig, Christoph Waltz, Léa Seydoux] Sam Mendes \n", "3 [Christian Bale, Michael Caine, Gary Oldman] Christopher Nolan \n", "4 [Taylor Kitsch, Lynn Collins, Samantha Morton] Andrew Stanton \n", "\n", " keywords genres \n", "0 [culture clash, future, space war] [Action, Adventure, Fantasy] \n", "1 [ocean, drug abuse, exotic island] [Adventure, Fantasy, Action] \n", "2 [spy, based on novel, secret agent] [Action, Adventure, Crime] \n", "3 [dc comics, crime fighter, terrorist] [Action, Crime, Drama] \n", "4 [based on novel, mars, medallion] [Action, Adventure, Science Fiction] " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies_df[\"director\"] = movies_df[\"crew\"].apply(get_director)\n", "features = [\"cast\", \"keywords\", \"genres\"]\n", "for feature in features:\n", " movies_df[feature] = movies_df[feature].apply(get_list)\n", "movies_df[['title', 'cast', 'director', 'keywords', 'genres']].head()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "def clean_data(row):\n", " if isinstance(row, list):\n", " return [str.lower(i.replace(\" \", \"\")) for i in row]\n", " else:\n", " if isinstance(row, str):\n", " return str.lower(row.replace(\" \", \"\"))\n", " else:\n", " return \"\"\n", "\n", "def create_soup(features):\n", " soup = ' '.join(features['keywords'])\n", " soup += ' ' + ' '.join(features['cast'])\n", " soup += ' ' + features['director']\n", " soup += ' ' + ' '.join(features['genres'])\n", " return soup" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 
cultureclash future spacewar samworthington zo...\n", "1 ocean drugabuse exoticisland johnnydepp orland...\n", "2 spy basedonnovel secretagent danielcraig chris...\n", "3 dccomics crimefighter terrorist christianbale ...\n", "4 basedonnovel mars medallion taylorkitsch lynnc...\n", "Name: soup, dtype: object" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "features = ['cast', 'keywords', 'director', 'genres']\n", "for feature in features:\n", " movies_df[feature] = movies_df[feature].apply(clean_data)\n", "\n", "movies_df[\"soup\"] = movies_df.apply(create_soup, axis=1)\n", "movies_df[\"soup\"].head()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(4803, 11520)\n", "(4803, 4803)\n" ] } ], "source": [ "# Token counts of each movie's soup (one row per movie, one column per token)\n", "count_vectorizer = CountVectorizer(stop_words = \"english\")\n", "count_matrix = count_vectorizer.fit_transform(movies_df[\"soup\"])\n", "print(count_matrix.shape)\n", "\n", "# Cosine similarity between every pair of movies: a dense 4803 x 4803 array\n", "cosine_sim = cosine_similarity(count_matrix, count_matrix)\n", "print(cosine_sim.shape)\n", "\n", "movies_df = movies_df.reset_index()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "title\n", "Avatar 0\n", "Pirates of the Caribbean: At World's End 1\n", "Spectre 2\n", "The Dark Knight Rises 3\n", "John Carter 4\n", "dtype: int64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Map each title to its row position in movies_df and cosine_sim.\n", "# Titles are not guaranteed to be unique, so keep only the first occurrence.\n", "# (Series.drop_duplicates() compares the values, i.e. the row positions, not the titles.)\n", "indices = pd.Series(movies_df.index, index = movies_df[\"title\"])\n", "indices = indices[~indices.index.duplicated(keep = \"first\")]\n", "indices.head()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# Save the data frame, similarity matrix and title index to one file.\n", "# Load them back in the same order with three pickle.load() calls.\n", "with open('movie_recommendation_model.pkl', 'wb') as model_file:\n", "    pickle.dump(movies_df, model_file)\n", "    pickle.dump(cosine_sim, model_file)\n", "    pickle.dump(indices, model_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `get_recommendations()` function takes a movie title and the cosine similarity matrix as input and builds recommendations as follows:\n", "\n", "- Look up the row index of the movie from its title.\n", "- Take that movie's row of the similarity matrix, i.e. its similarity scores against every movie.\n", "- Enumerate the scores into `(index, score)` tuples.\n", "- Sort the tuples in descending order of similarity score.\n", "- Skip the queried movie itself and keep the indices of the next 10 movies.\n", "- Map those indices to their titles and return them." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "def get_recommendations(title, cosine_sim = cosine_sim):\n", "    # Row position of the queried movie in movies_df and cosine_sim\n", "    idx = indices[title]\n", "    # (row position, similarity score) pairs for every movie\n", "    similarity_scores = list(enumerate(cosine_sim[idx]))\n", "    similarity_scores = sorted(similarity_scores, key = lambda x: x[1], reverse = True)\n", "    # Skip the queried movie explicitly instead of assuming it is ranked first\n", "    similarity_scores = [s for s in similarity_scores if s[0] != idx][:10]\n", "    movies_indices = [ind[0] for ind in similarity_scores]\n", "    return movies_df[\"title\"].iloc[movies_indices]" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "65 The Dark Knight\n", "119 Batman Begins\n", "4638 Amidst the Devil's Wings\n", "1196 The Prestige\n", "3073 Romeo Is Bleeding\n", "3326 Black November\n", "1503 Takers\n", "1986 Faster\n", "303 Catwoman\n", "747 Gangster Squad\n", "Name: title, dtype: object\n" ] } ], "source": [ "print(get_recommendations(\"The Dark Knight Rises\"))" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": 
[ "7 Avengers: Age of Ultron\n", "26 Captain America: Civil War\n", "79 Iron Man 2\n", "169 Captain America: The First Avenger\n", "174 The Incredible Hulk\n", "85 Captain America: The Winter Soldier\n", "31 Iron Man 3\n", "33 X-Men: The Last Stand\n", "68 Iron Man\n", "94 Guardians of the Galaxy\n", "Name: title, dtype: object\n" ] } ], "source": [ "print(get_recommendations(\"The Avengers\"))" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2577 Tuck Everlasting\n", "3072 Atlas Shrugged Part II\n", "4691 Yesterday Was a Lie\n", "266 I, Robot\n", "3155 Melancholia\n", "3642 Atlas Shrugged Part III: Who is John Galt?\n", "163 Watchmen\n", "220 Prometheus\n", "365 Contact\n", "461 Lost in Space\n", "Name: title, dtype: object\n" ] } ], "source": [ "print(get_recommendations(\"Dark City\"))" ] } ], "metadata": { "kernelspec": { "display_name": ".ptvenv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.3" } }, "nbformat": 4, "nbformat_minor": 2 }