Movie Recommendation System In Python

Posted on July 22, 2024

Data Science Projects

Docsallover - Movie Recommendation System In Python

Recommendation systems have become an integral part of our digital lives, influencing everything from the movies we watch to the products we buy. In this tutorial, we'll delve into the world of building a movie recommendation system using Python. We'll explore the core concepts, implement a content-based filtering approach, and discuss potential improvements and challenges.

Understanding the Code

To build a recommendation system, we need a dataset that contains information about movies and potentially user ratings. For this tutorial, we'll use the movielens dataset.

Importing Libraries:
- pandas (pd): Used for data manipulation and analysis (reading CSV, dataframes)
- numpy (np): Used for numerical computations (vectorization)
- CountVectorizer from sklearn.feature_extraction.text: Converts text data into numerical features
- cosine_similarity from sklearn.metrics.pairwise: Calculates similarity between vectors

Loading and Exploring the Movie Data:
- Reads the movie data from a CSV file into a pandas dataframe (df).
- Displays the first few rows (head()) and statistical summary (describe()) of the data to understand its content.

Feature Selection and Preprocessing:
- Defines a list of features (features) relevant for recommendations like genres, cast, and director.
- Checks for missing values (NaN) in the cast column.
- Creates a function combine_features to concatenate all chosen features into a single string for each movie, stored in a new column combined_features.
- Fills missing values in the chosen features with an empty string ("") to avoid errors during processing.

Text Vectorization and Similarity Calculation:
- Uses CountVectorizer to convert the combined features text into numerical vectors (count_matrix).
- Calculates the cosine similarity between each movie based on their vector representations using cosine_similarity. Cosine similarity measures how similar two vectors are.

The get_title_from_index Function:

This function takes an index as input and returns the corresponding movie title from the DataFrame df.
- df[df.index == index]: This line filters the DataFrame to rows where the index matches the input index.
- ["title"]: Selects only the title column from the filtered DataFrame.
- .values[0]: Extracts the title as a string from the resulting Series.

The get_index_from_title Function:

This function takes a movie title as input and returns its corresponding index in the DataFrame.
- df[df.title == title]: Filters the DataFrame to rows where the title matches the input title.
- ["index"]: Selects only the index column from the filtered DataFrame.
- .values[0]: Extracts the index as an integer from the resulting Series.

Getting User Input and Finding Recommendations

This code block handles user input and retrieves the movie index:
- Prompts the user to enter their favorite movie title.
- Uses get_index_from_title to obtain the index of the movie.
- Handles potential exceptions (e.g., movie not found) using a try-except block.

Finding Similar Movies & Printing Recommendations

Finding Similar Movies:
- It calculates the similarity scores between the user's favorite movie and all other movies using the cosine_sim matrix.
- Sorts the similarity scores in descending order to identify the most similar movies.
- Stores the indices of the top 10 similar movies in the sorted_similar_movies list.
Printing Recommendations:
- Iterates through the sorted_similar_movies list.
- For each movie index, retrieves the corresponding movie title using get_title_from_index.
- Prints the movie title.
- Breaks the loop after printing 10 recommendations.

This code completes the movie recommendation system, providing a list of movies similar to the user's favorite movie based on content-based filtering.

Output:

Note: This is a basic example and can be improved by incorporating more sophisticated techniques, such as user-based or item-based collaborative filtering, hybrid approaches, and advanced evaluation metrics.

Download