The primary goal is to develop a robust Movie Recommendation System that provides users with personalized movie recommendations based on their previous movie ratings and movie preferences. The system consists of a user-friendly frontend for interaction and a powerful backend for processing and generating recommendations.
Netflix Prize Dataset:
The Netflix Prize dataset was a famous dataset released by Netflix for a competition to improve the accuracy of their movie recommendation system.
TMDB 5000 Dataset:
This dataset comes from The Movie Database (TMDB), a popular, user-editable database for movies and TV shows. It includes a wide range of data such as titles, genres, release dates, budgets, revenues, production companies, countries, vote counts, and average vote scores.
We divided the project into four modules:
Note: Because this is a comprehensive project with a frontend, a backend, and a recommendation module, we created four branches, each containing the related code. As a result, there is no code in the main branch; please click on the links above to go to the corresponding branch and review the code.
Team members: Samuel Wang, Shengyi Liu, Rachel Huang, Zoey Zhang, Zitong Li, Guodong Sun
- Frontend: HTML/CSS/JavaScript, React, Tailwind CSS
- Backend: Python, Node.js with Express.js
- Database: MongoDB, Neo4j Graph Database, JSON Format
- Recommendation System: Neo4j Advanced Knowledge Graph, Collaborative Filtering, Cosine Similarity, Fuzzy Matching, Scikit-learn (old version)
👉🏽 For this project, we will be using:
Python, Node.js, MongoDB, Neo4j Graph Database
- Create a new DBMS and modify recommender_graph.py in backend-module with your own credentials
- Copy all the .csv files in backend-module and paste them into the 'import' folder of the DBMS
- Execute recommender_graph.py to load the data into the graph database
git clone https://github.com/samuelusc/CSCI596-Project.git
cd CSCI596-Project
git checkout backend-module
cd user
npm install
node app.js
git clone https://github.com/samuelusc/CSCI596-Project.git
cd CSCI596-Project
git checkout frontend-module
npm install
npm start
Note: There could be some exceptions for certain movies since the API we used might not provide all the movie info in our database.
*Evaluation Metrics
: Used in the previous version; the evaluation metrics will be integrated into the latest version in the future.
Table of Four modules
The Latest Version
: Neo4j Advanced Knowledge Graph
The Old Version
: Scikit-surprise
Our recommendation module primarily aims to solve two main problems:
- How to enable new users to quickly discover movies they'll love.
- How to effectively increase the engagement of our existing users.
Relevance
:
Offer movie recommendations as closely aligned as possible with user preferences and needs.
Novelty
:
Suggest films that users might not have encountered but are likely to find intriguing.
Serendipity
:
Ensure that our recommendations exceed user expectations, creating a sense of surprise and delight.
Diversity
:
Provide a diverse range of recommended genres to cater to the varied tastes and requirements of our users.
Personalized Recommendations
:
Utilizing machine learning algorithms, we provide individualized suggestions based on a user's search history, viewing history, and rating data.
New User Questionnaire
:
New users are asked to complete a brief interest survey or rate movies during registration, which will allow us to quickly understand their preferences.
Interactive Interface
:
An intuitive and user-friendly interface is designed to make it easier for users to discover and explore new movies.
Intelligent Sorting
:
Movies are sorted to prominently display those that are likely to align with a user's tastes.
Editor's Picks
:
We showcase a list of movies recommended by editors or based on popular trends.
Tagging System
:
Movies are categorized using tags such as genre, mood, director, or actors, enabling users to swiftly filter according to their interests.
User Reviews
:
Displaying other users' ratings and reviews helps new users discover popular movies.
Latest Version
:
Pandas
: For data handling and analysis.
Neo4j Database
: Using Neo4j, an advanced graph database, to store and manage data.
Dynamic Query Building
: Constructs Cypher queries based on user input, such as filtering movies by genre or calculating similarity.
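As a rough illustration of dynamic query building, the sketch below assembles a parameterized Cypher query from optional user filters. The labels `Movie` and `Genre`, the `IN_GENRE` relationship, the `avg_rating` property, and the function name are all hypothetical and are not taken from recommender_graph.py; only the general pattern is the point.

```python
def build_genre_query(genre=None, min_rating=None, limit=10):
    """Build a parameterized Cypher query and its parameter dict.

    The labels (Movie, Genre), the IN_GENRE relationship and the
    avg_rating property are hypothetical, not the project's schema.
    """
    clauses = ["MATCH (m:Movie)"]
    where = []
    params = {"limit": int(limit)}

    if genre:
        # Join to the genre node only when the user asked for a genre.
        clauses.append("MATCH (m)-[:IN_GENRE]->(g:Genre)")
        where.append("toLower(g.name) = toLower($genre)")
        params["genre"] = genre
    if min_rating is not None:
        where.append("m.avg_rating >= $min_rating")
        params["min_rating"] = float(min_rating)

    if where:
        clauses.append("WHERE " + " AND ".join(where))
    clauses.append("RETURN m.title AS title")
    clauses.append("ORDER BY m.avg_rating DESC")
    clauses.append("LIMIT $limit")
    return "\n".join(clauses), params


query, params = build_genre_query(genre="Action", min_rating=4)
print(query)
print(params)
```

User input travels in `params` rather than being concatenated into the query text, which keeps the query safe from injection and lets the database reuse the query plan.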
Cold Start Problem Handling
: The user interacts with the system through the command line, inputting data and receiving recommendations.
Fuzzy Matching for Movie Titles
: To handle partial or imprecise movie title inputs, the script employs a fuzzy matching technique.
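A minimal sketch of title matching is shown below. It uses `difflib` from the Python standard library as a stand-in for the fuzzy matching library, so the scores will differ from the project's; the function name, cutoff, and sample titles are assumptions made for illustration.

```python
import difflib


def match_title(user_input, titles, cutoff=0.6):
    """Return the catalog title closest to user_input, or None."""
    # Compare case-insensitively but return the original spelling.
    lowered = {t.lower(): t for t in titles}
    hits = difflib.get_close_matches(
        user_input.strip().lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[hits[0]] if hits else None


# Hypothetical catalog used only for this example.
titles = ["Iron Man", "Iron Man 2", "Iron Man 3", "The Avengers"]
print(match_title("iron mann", titles))
print(match_title("zzzz", titles))
```

Returning `None` when nothing clears the cutoff lets the caller ask the user to retype the title instead of recommending from a poor match.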
Cosine Similarity for User Similarity
: Using the Pearson correlation coefficient. The Pearson correlation coefficient is used to calculate the similarity between different movies. Each movie is represented as a vector of pre-collected user ratings. For each movie, the correlation coefficients between its rating vector and the vectors of all other movies are computed and sorted. The recommended movies are those with the largest correlation coefficients.
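The relationship between the two measures named here can be sketched in a few lines: the Pearson correlation is the cosine similarity of the mean-centered rating vectors. The example below is an in-memory illustration with made-up ratings, and it assumes every user rated every movie; it is not the project's Neo4j implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pearson(u, v):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return cosine([a - mu for a in u], [b - mv for b in v])


def most_correlated(target, ratings, k=2):
    """Rank the other movies by their correlation with `target`."""
    scores = [(pearson(ratings[target], vec), title)
              for title, vec in ratings.items() if title != target]
    return [title for _, title in sorted(scores, reverse=True)[:k]]


# Hypothetical ratings: one vector per movie, one entry per user.
ratings = {
    "Iron Man":     [5, 4, 1, 2],
    "Iron Man 2":   [4, 5, 2, 1],
    "The Avengers": [5, 5, 1, 1],
    "Titanic":      [1, 2, 5, 4],
}
print(most_correlated("Iron Man", ratings))
```

With real data most users rate only a few movies, so a practical version would compute the correlation over the users two movies have in common.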
Collaborative Filtering for Recommendations
: A collaborative filtering recommendation system based on user behavior, built specifically for movie recommendations. It analyzes user ratings, finds users with similar rating habits, and bases its suggestions on the preferences of those similar users.
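The user-based approach described above can be sketched as follows: find the users most similar to the target user, then score the movies the target has not rated by the similarity-weighted ratings of those neighbors. The user names, ratings, and function names are hypothetical, and the sketch runs in memory rather than against the graph database.

```python
import math


def cosine_on_shared(a, b):
    """Cosine similarity over the movies both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    na = math.sqrt(sum(a[m] ** 2 for m in shared))
    nb = math.sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (na * nb)


def recommend(user, ratings, k=2, top_n=3):
    """Score unseen movies by similarity-weighted ratings of the k nearest users."""
    sims = sorted(
        ((cosine_on_shared(ratings[user], r), other)
         for other, r in ratings.items() if other != user),
        reverse=True,
    )[:k]
    totals, weights = {}, {}
    for sim, other in sims:
        for movie, score in ratings[other].items():
            # Skip movies the user already rated and useless neighbors.
            if movie in ratings[user] or sim <= 0:
                continue
            totals[movie] = totals.get(movie, 0.0) + sim * score
            weights[movie] = weights.get(movie, 0.0) + sim
    predicted = {m: totals[m] / weights[m] for m in totals}
    return sorted(predicted, key=lambda m: (-predicted[m], m))[:top_n]


# Hypothetical ratings (1-5 stars), keyed by user and then movie title.
ratings = {
    "alice": {"Iron Man": 5, "Iron Man 2": 4, "Titanic": 1},
    "bob":   {"Iron Man": 5, "Iron Man 2": 5, "The Avengers": 5, "Titanic": 1},
    "carol": {"Iron Man": 1, "Titanic": 5, "The Notebook": 5},
}
print(recommend("alice", ratings, k=1))
```

With `k=1` only the single closest neighbor contributes, so the suggestion comes from the user whose ratings most resemble the target's.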
Old Version
:
Scikit-surprise or scikit-learn
:
A Python scikit (SciPy toolkit) we used to build and analyze recommender systems. It provides efficient collaborative filtering algorithms, including user-based collaborative filtering, item-based collaborative filtering, and matrix factorization algorithms.
SVD (Singular Value Decomposition / matrix factorization)
:
It’s a powerful matrix factorization technique used for collaborative filtering. This algorithm identifies latent features by decomposing the user-item rating matrix.
(Matrix image sourced from Buomsoo-kim)
Method
:
A movie matrix is assembled from the collected data. Each column of the matrix represents the ratings that all reviewers gave to a particular movie. For each column, the correlation coefficients with all other columns are calculated; the columns with the highest coefficients are recorded, and the movies they represent are taken as the recommended movies.
- User Matrix: X = (x1, x2, x3…, xn)
- Item matrix: Y = (y1, y2, y3…, ym)
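In the usual matrix factorization formulation, and using the X and Y notation above, the predicted rating of user u for item i is the inner product of the corresponding user and item vectors, and the vectors are learned by minimizing the regularized squared error over the observed ratings. The symbols K (the set of observed user-item pairs) and λ (the regularization strength) are assumptions introduced here for illustration. The SVD algorithm in scikit-surprise additionally adds a global mean and per-user and per-item bias terms to the prediction.

```latex
\hat{r}_{ui} = x_u^{\top} y_i,
\qquad
\min_{X,\,Y} \sum_{(u,i) \in K}
  \left[ \left( r_{ui} - x_u^{\top} y_i \right)^2
  + \lambda \left( \lVert x_u \rVert^2 + \lVert y_i \rVert^2 \right) \right]
```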
Evaluation Metrics
:
- Personalized Picks: Suggesting 5 movies tailored to individual user preferences.
- Related Discoveries: Presenting 4 related movies based on user input, using advanced filtering methods.
- Trending Now: Showcasing the top 3 trending movies to keep users engaged with popular content.
Part of Output Test
The interactive interface lets a user enter any movie title to get recommendations. The fuzzywuzzy module maps the user input to one of the movies in MovieMat, and the interface then shows the recommended movies. A sample input/output result is shown below.
Old Version
MSE
: The average squared difference between the predicted and actual values.
RMSE
: The square root of the mean squared error (MSE).
Precision
: True Positive / (True Positive + False Positive)
Recall
: True Positive / (True Positive + False Negative)
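The four metrics above can be computed directly from their definitions. The sketch below uses plain Python and made-up numbers; the function names are illustrative and are not taken from the project's evaluation code.

```python
import math


def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted ratings."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the MSE, in the same units as the ratings."""
    return math.sqrt(mse(actual, predicted))


def precision_recall(recommended, relevant):
    """Precision and recall of a recommendation list against the relevant set."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)  # true positives: recommended and relevant
    precision = tp / len(recommended) if recommended else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ratings and recommendation lists.
actual = [5, 3, 4, 1]
predicted = [4, 3, 5, 3]
print(mse(actual, predicted), rmse(actual, predicted))
print(precision_recall(["A", "B", "C", "D"], ["A", "C", "E"]))
```

MSE and RMSE measure how close predicted ratings are to the real ones, while precision and recall judge the recommendation list itself: how much of it was relevant, and how much of the relevant set it covered.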
Graph Representation For User 999111
Description: The frontend mainly includes the following pages:
- User Sign In/Sign Up/Forget Password
- Home page displaying top-rated movies and recommended movies, and providing search functionality
- Single movie page displaying the basic information of the movie, its review rating (0-5 stars), and related movies
Tech Stacks:
- React.js
- Tailwind CSS
Features:
- User sign up, user sign in, email verification, reset password
- Send movie information to the frontend (user ID, reviews, recommendation list, popular movies)
- Send search engine data to the frontend
Tech Stacks:
- Node.js
- Express.js
Testing with Postman
Create User:
Mailtrap—Email Verification:
Get User ID from MongoDB:
Email Verification:
Get List of Top Rated Movies:
Get List of Related Movies based on a Movie Title:
Get Movie Rating:
Get Search Engine Results:
Get List of Movie Recommendations for a User:
- MongoDB
- Neo4j Graph Database
- User information
- Movie reviews
- Pre-trained result for movie recommendation
Request movie details (movie title, movie overview, movie poster, etc.) from TMDB.
{
adult: false,
backdrop_path: '/bckxSN9ueOgm0gJpVJmPQrecWul.jpg',
genre_ids: [ 28, 12, 14 ],
id: 572802,
original_language: 'en',
original_title: 'Aquaman and the Lost Kingdom',
overview: "Black Manta, still driven by the need to avenge his father's death and wielding the power of the mythic Black Trident, will stop at nothing to take Aquaman down once and for all. To defeat him, Aquaman must turn to his imprisoned brother Orm, the former King of Atlantis, to forge an unlikely alliance in order to save the world from irreversible destruction.",
popularity: 253.712,
poster_path: '/8xV47NDrjdZDpkVcCFqkdHa3T0C.jpg',
release_date: '2023-12-20',
title: 'Aquaman and the Lost Kingdom',
video: false,
vote_average: 0,
vote_count: 0
}
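A response like the one above can be reduced to the fields the pages display. The sketch below is written in Python for illustration, although the backend itself runs on Node.js. It assumes the TMDB v3 search endpoint and the `w500` poster size; the helper names and the `YOUR_API_KEY` placeholder are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode

TMDB_SEARCH_URL = "https://api.themoviedb.org/3/search/movie"
TMDB_IMAGE_BASE = "https://image.tmdb.org/t/p/w500"  # w500 is one available poster size


def build_search_url(title, api_key):
    """URL for TMDB's movie search endpoint (api_key is a placeholder)."""
    return TMDB_SEARCH_URL + "?" + urlencode({"api_key": api_key, "query": title})


def extract_details(movie):
    """Keep only the fields the pages need; tolerate missing ones."""
    poster = movie.get("poster_path")
    return {
        "title": movie.get("title", "Unknown title"),
        "overview": movie.get("overview", ""),
        "poster_url": TMDB_IMAGE_BASE + poster if poster else None,
        "release_date": movie.get("release_date"),
    }


# A subset of the fields from the sample response.
sample = {
    "id": 572802,
    "title": "Aquaman and the Lost Kingdom",
    "poster_path": "/8xV47NDrjdZDpkVcCFqkdHa3T0C.jpg",
    "release_date": "2023-12-20",
}
print(build_search_url("iron man", "YOUR_API_KEY"))
print(extract_details(sample)["poster_url"])
```

Using `dict.get` with defaults means a movie with no poster or overview still yields a usable record instead of raising an error.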
Display the graph database interface
Present the relationship network by movie keywords
Show the relationship network by movie producers
The result after we input "iron man".
We can see the related movies provided by the recommendation system.
We rated Iron Man 3, Iron Man 2 and The Avengers 5 stars. The recommendation system gave us other related sci-fi movies.