*Before we begin – check out my YouTube tutorial on this very same topic 🙂*

‘Movies recommended for you’ – Netflix

‘Videos recommended for you’ – YouTube

‘Restaurants recommended for you’ – Some smart restaurant finder app

Notice a trend? Your favorite apps ‘know’ you (or at least they think they do). They gradually learn your preferences over time (or in a matter of hours) and suggest new products which they think you’ll love.

How is this done? I can’t speak for how Netflix *actually* makes movie recommendations, but the fundamentals are actually quite intuitive.

If you keep ‘five-starring’ Stoner Comedy movies like the whole ‘Harold and Kumar’ series on Netflix, it makes sense for Netflix to assume that you may also enjoy ‘Ted’, or any other Stoner Comedy film on Netflix.

To make recommendations in a real world application, let’s take our intuition and apply it to a machine learning algorithm called Collaborative Filtering.

The following guide will be done in Python, using the Math/Science computing packages Numpy and SciPy. I’ll walk you through every single step, so we can properly understand what is going on *under the hood* of collaborative filtering. Let’s get started.

**Step 1 – Initialize The Movie Ratings**

A simple but scalable scenario:

- 10 movies
- 5 users
- 3 features (we’ll discuss this in Step 3)

Here is an example diagram of movie ratings. Our rating system is from 1-10:

Let’s initialize a 10 X 5 matrix called ‘ratings’; this matrix holds all the ratings given by all users, for all movies. **Note: Not all users may have rated all movies, and this is okay.**

Note 2: I simply made up some data for ‘ratings’. The point of this step is to simply start off with a dataset that we can work with.

Here is how we declare the ratings matrix in Python/Numpy:

```python
from numpy import *

ratings = array([[8, 4, 0, 0, 4],
                 [0, 0, 8, 10, 4],
                 [8, 10, 0, 0, 6],
                 [10, 10, 8, 10, 10],
                 [0, 0, 0, 0, 0],
                 [2, 0, 4, 0, 6],
                 [8, 6, 4, 0, 0],
                 [0, 0, 6, 4, 0],
                 [0, 6, 0, 4, 10],
                 [0, 4, 6, 8, 8]])
```

Here’s what the ratings matrix looks like:

```
[[ 8  4  0  0  4]
 [ 0  0  8 10  4]
 [ 8 10  0  0  6]
 [10 10  8 10 10]
 [ 0  0  0  0  0]
 [ 2  0  4  0  6]
 [ 8  6  4  0  0]
 [ 0  0  6  4  0]
 [ 0  6  0  4 10]
 [ 0  4  6  8  8]]
```

**Learner’s check:**

- Each column represents all the movies rated by a single user
- Each row represents all the ratings (from different users) received by a single movie

Recall that our rating system is from 1-10. Notice how there are 0’s to denote that no rating has been given.

**Step 2 – Determine Whether a User Rated a Movie**

To make our life easier, let’s also declare a **binary** matrix (0’s and 1’s) to denote whether a user rated a movie.

1 = the user rated the movie.

0 = the user did not rate the movie.

Let’s call this matrix **‘did_rate’**. Note it has the same dimensions as ‘ratings’, 10 X 5:

```python
did_rate = (ratings != 0) * 1
```

The above command should give you the following binary matrix:

```
[[1 1 0 0 1]
 [0 0 1 1 1]
 [1 1 0 0 1]
 [1 1 1 1 1]
 [0 0 0 0 0]
 [1 0 1 0 1]
 [1 1 1 0 0]
 [0 0 1 1 0]
 [0 1 0 1 1]
 [0 1 1 1 1]]
```

**Learner’s check:**

- did_rate[1, 2] = 1: This means the 3rd user *did* rate the 2nd movie. (Note: Python arrays and matrices are 0-based)
- did_rate[5, 3] = 0: This means the 4th user *did not* rate the 6th movie
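You can sanity-check these entries directly in NumPy, reusing the ‘ratings’ data from Step 1:

```python
from numpy import array

# The same 10 X 5 ratings matrix from Step 1
ratings = array([[8, 4, 0, 0, 4], [0, 0, 8, 10, 4], [8, 10, 0, 0, 6],
                 [10, 10, 8, 10, 10], [0, 0, 0, 0, 0], [2, 0, 4, 0, 6],
                 [8, 6, 4, 0, 0], [0, 0, 6, 4, 0], [0, 6, 0, 4, 10],
                 [0, 4, 6, 8, 8]])
did_rate = (ratings != 0) * 1

# Row index = movie, column index = user (both 0-based)
print(did_rate[1, 2])  # 1: the 3rd user rated the 2nd movie
print(did_rate[5, 3])  # 0: the 4th user did not rate the 6th movie
```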

**Step 3 – User Preferences and Movie Features/Characteristics**

This is where it gets interesting. In order for us to build a robust recommendation engine, we need to know user preferences and movie features (characteristics). After all, a good recommendation is based off of knowing this key user and movie information.

For example, a user preference could be how much the user likes comedy movies, on a scale of 1-5. A movie characteristic could be to what degree is the movie considered a comedy, on a scale of 0-1.

**Example 1: User preferences (user_prefs) -> Sample preferences for a single user, Chelsea**

**Example 2: Movie features (movie_features) -> Sample features for a single movie, Bad Boys**

**Note**: The user preferences are the exact same as the movie features; in other words, we can **map** each user preference to a movie feature. This makes sense; if a user has a huge preference for comedy, we’d like to recommend a movie with a high degree of comedy. If we add a new preference for the user, say ‘romantic-comedy’, we should also add this as a new feature for a movie, so that our recommendation algorithm can fully use this feature/preference when making a prediction.

**Note 2**: We can use these numbers that I purposely came up with to ‘predict’ ratings for movies. For example, let’s predict what Chelsea would rate Bad Boys, below:

Chelsea’s (C) rating (R) of Bad Boys (BB):

```
R = C · BB
```

(the dot product of Chelsea’s preference vector and Bad Boys’ feature vector)
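To make this concrete, here is a small worked sketch. The preference and feature numbers below are made up for illustration (the original diagram’s values aren’t reproduced here); the point is that a predicted rating is just the dot product of a user’s preference vector and a movie’s feature vector:

```python
from numpy import array

# Hypothetical numbers: Chelsea's preferences for comedy, romance, action (scale 1-5)
chelsea_prefs = array([4.5, 1.0, 2.0])
# Hypothetical numbers: Bad Boys' degree of comedy, romance, action (scale 0-1)
bad_boys_features = array([0.6, 0.1, 0.9])

# Predicted rating = dot product of the two vectors
predicted_rating = chelsea_prefs.dot(bad_boys_features)
print(predicted_rating)  # 4.5*0.6 + 1.0*0.1 + 2.0*0.9 = 4.6
```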

**5 big problems:** This seems great, but:

- Who has time to sit down and come up with a list of features for users and movies?
- It would be very time consuming to come up with a value for each feature, for each and every user and movie.
- Why did I pick 1-5 as the range for user preferences and 0-1 as the range for movie features? It seems a bit forced.
- How does the product (multiplication) of user_prefs and movie_features magically give us a predicted rating?
- Why did I pick ‘comedy’, ‘romance’ and ‘action’ as the features? This seems manual and forced. There must be a better way to generate features.

**The solution:**

Before we dive deep into the collaborative filtering solution to answer our 5 big problems, let’s quickly introduce some key matrices that we’ll be needing.

The user features (preferences) can be represented by a matrix **‘user_prefs’**. In our example, we have 5 users and 3 features. So, ‘user_prefs’ is a 5 X 3 matrix.

Here is an *example* diagram to help visualize the data ‘user_prefs’ contains.

The movie features can also be represented by a matrix **‘movie_features’**. In our example, we have 10 movies and 3 features. So, ‘movie_features’ is a 10 X 3 matrix.

Here is an *example* diagram to help visualize the data ‘movie_features’ contains.

**Step 4: Let’s Rate Some Movies**

I have a list of 10 movies here, in a text file (movies.txt):

```
1 Harold and Kumar Escape From Guantanamo Bay (2008)
2 Ted (2012)
3 Straight Outta Compton (2015)
4 A Very Harold and Kumar Christmas (2011)
5 Notorious (2009)
6 Get Rich Or Die Tryin' (2005)
7 Frozen (2013)
8 Tangled (2010)
9 Cinderella (2015)
10 Toy Story 3 (2010)
```

Now, let’s rate some movies. Our ratings can be represented by a 10 X 1 column vector **nikhil_ratings** (my name is Nikhil). Let’s initialize it to 0’s and make some ratings:

```python
nikhil_ratings = zeros((10, 1))
nikhil_ratings[0] = 7
nikhil_ratings[4] = 8
nikhil_ratings[7] = 3
```

**Learner’s check:**

- I gave Harold and Kumar Escape From Guantanamo Bay a 7
- I gave Notorious an 8
- I gave Tangled a 3

Let’s update **ratings** and **did_rate** with the our ratings **nikhil_ratings**:

```python
ratings = append(nikhil_ratings, ratings, axis=1)
did_rate = append(((nikhil_ratings != 0) * 1), did_rate, axis=1)
```

Here’s what the updated ratings matrix looks like:

```
[[ 7.  8.  4.  0.  0.  4.]
 [ 0.  0.  0.  8. 10.  4.]
 [ 0.  8. 10.  0.  0.  6.]
 [ 0. 10. 10.  8. 10. 10.]
 [ 8.  0.  0.  0.  0.  0.]
 [ 0.  2.  0.  4.  0.  6.]
 [ 0.  8.  6.  4.  0.  0.]
 [ 3.  0.  0.  6.  4.  0.]
 [ 0.  0.  6.  0.  4. 10.]
 [ 0.  0.  4.  6.  8.  8.]]
```

And here’s what the updated did_rate matrix looks like:

```
[[1 1 1 0 0 1]
 [0 0 0 1 1 1]
 [0 1 1 0 0 1]
 [0 1 1 1 1 1]
 [1 0 0 0 0 0]
 [0 1 0 1 0 1]
 [0 1 1 1 0 0]
 [1 0 0 1 1 0]
 [0 0 1 0 1 1]
 [0 0 1 1 1 1]]
```

**Learner’s check**:

- ‘ratings’ is now a 10 X 6 matrix
- ‘did_rate’ is now a 10 X 6 matrix

**Step 5: Mean Normalize All The Ratings**

Once we get to Step 7 (Minimize The Cost Function), you’ll see why mean normalizing the **‘ratings’** matrix is necessary.

**What is mean normalization?**

It is much easier to understand the ‘what’ if we understand the **why**. Why normalize the ‘ratings’ matrix?

Consider the following scenario:

A user (Christie) rated 0 movies. The collaborative filtering algorithm we are about to build would then go on to predict that Christie will rate all movies as 0. You’ll see why in later steps, when we cover the cost function and gradient descent. Don’t worry about it for now.

This is no good, because then we won’t be able to suggest Christie anything. **After all, a recommendation is simply based on what movie(s) we predict the user to rate the highest.**

So how do we recommend a movie to a user who has never placed a rating?

We simply suggest the highest average rated movie. That’s the best we can do, since we know nothing about the user. This is made possible because of mean normalization.
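Here is a tiny numeric sketch of that idea (the add-back of the mean happens in Step 8): once a movie’s ratings are centered around 0, predicting 0 for a brand-new user and then adding the mean back simply yields the movie’s average rating.

```python
from numpy import array

movie_ratings = array([8.0, 10.0, 6.0])   # ratings one movie received
movie_mean = movie_ratings.mean()         # 8.0
normalized = movie_ratings - movie_mean   # [0., 2., -2.] -- centered around 0

# For a user with no ratings, the model predicts 0 on the normalized scale
christie_prediction_normalized = 0.0
# Adding the mean back gives the movie's average rating
christie_prediction = christie_prediction_normalized + movie_mean
print(christie_prediction)  # 8.0
```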

**What is mean normalization?**

Mean normalization, in our case, is the process of making the average rating received by each movie equal to 0.

Take a look at our *example* ‘ratings’ matrix from Step 1 again:

Each row represents **all** the ratings received by **one** movie. Here’s how to normalize a matrix:

- Find the average of the 1st row. In other words, find the average rating received by the first movie, ‘Harold and Kumar Escape From Guantanamo Bay’
- Subtract this average from each rating (entry) in the 1st row
- The first row has now been normalized. This row now has an average of 0.
- Repeat steps 1 & 2 for all rows.
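Worked on the first row of our updated 10 X 6 ‘ratings’ matrix, the steps above look like this (a quick sketch):

```python
from numpy import array, where

# First row of the updated ratings matrix: four users rated, two did not
first_row = array([7.0, 8.0, 4.0, 0.0, 0.0, 4.0])
rated = where(first_row != 0)[0]   # indexes of users who gave a rating

row_mean = first_row[rated].mean()   # (7 + 8 + 4 + 4) / 4 = 5.75
normalized_row = first_row.copy()
normalized_row[rated] -= row_mean    # [1.25, 2.25, -1.75, 0., 0., -1.75]
print(row_mean)
```

Note that the unrated (0) entries are left untouched; only the actual ratings are shifted.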

Here is my implementation for mean normalization in Python/Numpy:

```python
def normalize_ratings(ratings, did_rate):
    num_movies = ratings.shape[0]
    ratings_mean = zeros(shape=(num_movies, 1))
    ratings_norm = zeros(shape=ratings.shape)

    for i in range(num_movies):
        # Get all the indexes where there is a 1
        idx = where(did_rate[i] == 1)[0]
        # Calculate the mean rating of the ith movie, using only the users that gave a rating
        ratings_mean[i] = mean(ratings[i, idx])
        ratings_norm[i, idx] = ratings[i, idx] - ratings_mean[i]

    return ratings_norm, ratings_mean
```

**Note**: This function returns a tuple, containing the normalized ratings matrix, and a column vector storing the mean rating received by each movie.

We can call this function and fetch the results from the returned tuple:

```python
ratings_norm, ratings_mean = normalize_ratings(ratings, did_rate)
```

**Learner’s check:**

‘ratings_norm’ contains the normalized ‘ratings’ matrix. Of course, it’s still a 10 X 6 matrix. Here it is below:

```
[[ 1.25        2.25       -1.75        0.          0.         -1.75      ]
 [ 0.          0.          0.          0.66666667  2.66666667 -3.33333333]
 [ 0.          0.          2.          0.          0.         -2.        ]
 [ 0.          0.4         0.4        -1.6         0.4         0.4       ]
 [ 0.          0.          0.          0.          0.          0.        ]
 [ 0.         -2.          0.          0.          0.          2.        ]
 [ 0.          2.          0.         -2.          0.          0.        ]
 [-1.33333333  0.          0.          1.66666667 -0.33333333  0.        ]
 [ 0.          0.         -0.66666667  0.         -2.66666667  3.33333333]
 [ 0.          0.         -2.5        -0.5         1.5         1.5       ]]
```

‘ratings_mean’ is a 10 X 1 column vector whose *ith* row contains the average rating of the *ith* movie. Here it is below:

```
[[ 5.75      ]
 [ 7.33333333]
 [ 8.        ]
 [ 9.6       ]
 [ 8.        ]
 [ 4.        ]
 [ 6.        ]
 [ 4.33333333]
 [ 6.66666667]
 [ 6.5       ]]
```

**Step 6: Collaborative Filtering via Linear Regression**

If you are unfamiliar with how linear regression works, it’s worth reviewing a quick primer before continuing.

The *simplest* way to think about it is that we are simply fitting a line to a scatter plot, i.e. learning the line of best fit (in the case of a *uni-linear* regression):

In our case, we face a multi-linear regression problem. But don’t worry, we’ll briefly cover the intuition in a few seconds.

**Helpful intuition**: A user’s *big* preference for comedy movies (e.g. 4.5/5) paired with a movie’s *high* ‘level of comedy’ (e.g. 0.8/1) tends to be *positively* correlated with the user’s rating for that movie. For the most part, this correlation is *continuous*.

**Conversely**, a user’s *hate* for comedy (1/5), still paired with a movie’s *high* ‘level of comedy’ (e.g. 0.8/1), tends to be *negatively* correlated with the user’s rating for that movie. This is another reason for mean normalization. If you look at the ‘ratings_norm’ matrix above, you’ll notice some negative ratings. These ratings are negative because they were below the movie’s average.

**Collaborative filtering aside:** At the end of this tutorial, you will notice that those movies rated very highly by users tend to make their way into our personal predictions (and hence movie recommendations). Say I rate movie A a 10, you rate movie A a 9, and I rate movie B a 9. If we have similar preferences (represented by the user_prefs matrix, in this case), you might also like movie B.

If you are familiar with a linear regression, you may know that the goal of a linear regression is to **minimize the sum of squared errors** (absolute difference between our predicted values and observed values), in order to come up with the best learning algorithm for predicting new outputs, or in the case of a uni-linear regression, the *best* ‘line of best fit’.

**Note**: In our case, we face a *multi-linear regression* problem, since we have more than one feature.

A linear regression is associated with some cost function; our goal is to minimize this cost function (Step 7), and thus minimize the sum of squared errors.
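As a toy illustration of the quantity a linear regression minimizes, here is the sum of squared errors for three made-up predictions (not our actual cost function, just the core idea):

```python
from numpy import array

observed = array([8.0, 6.0, 9.0])    # actual ratings
predicted = array([7.5, 6.5, 8.0])   # hypothetical model predictions

# Sum of squared errors between predictions and observed ratings
sse = ((predicted - observed) ** 2).sum()
print(sse)  # 0.25 + 0.25 + 1.0 = 1.5
```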

A vectorized implementation of a linear regression is as follows (not Python, just pseudocode):

```
Y = X * θ
```

**Learner’s check:**

- θ is our parameter (user preferences, in our case) vector
- X is our vector of features (movie features, in our case)

To fit our example, we can rename the variables as such:

```python
all_predictions = movie_features.dot(user_prefs.T)
```

We want to *simultaneously* find optimal values of movie_features and user_prefs such that the sum of squared errors (cost function) is minimized. How can we do this?

**Step 7: Minimize The Cost Function**

We will allow our collaborative filtering algorithm to simultaneously come up with the appropriate values of ‘movie_features’ and ‘user_prefs’, by minimizing the sum of squared errors, through a process called gradient descent.

**Note:** If you are unfamiliar with gradient descent, worry not. All you need to understand is that gradient descent is an iterative algorithm that helps us minimize a continuous and convex function. In our specific case we refer to this convex function as the cost function, or the sum of squared errors.

After multiple iterations of gradient descent, we will have found the values of the matrices ‘user_prefs’ and ‘movie_features’ that **minimize** our cost function. Essentially, we will have *‘learned’* the appropriate values of ‘user_prefs’ and ‘movie_features’ to make accurate predictions on movie ratings for every user.
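To make “iteratively updating the parameters” concrete, here is a minimal, generic gradient-descent sketch on a simple convex function, f(w) = (w - 3)^2 (not our actual cost function):

```python
def gradient_descent(gradient, w, learning_rate=0.1, num_iterations=100):
    # Each iteration, step in the direction opposite the gradient (downhill)
    for _ in range(num_iterations):
        w = w - learning_rate * gradient(w)
    return w

# Minimize f(w) = (w - 3)**2, whose gradient is 2 * (w - 3)
w_optimal = gradient_descent(lambda w: 2 * (w - 3), w=0.0)
print(w_optimal)  # converges toward 3.0, the global minimum
```

Our real problem does exactly this, except `w` is the whole stack of X and theta values, and the gradient comes from the calculate_gradient function below.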

**What is our cost function?**

Here is my cost function in Python/Numpy, with regularization (to prevent overfitting, i.e. high variance):

```python
def calculate_cost(X_and_theta, ratings, did_rate, num_users, num_movies, num_features, reg_param):
    # Retrieve the X and theta matrices from X_and_theta, based on their dimensions
    # (num_features, num_movies, num_users)
    # --------------------------------------------------------------------------
    # Get the first 30 (10 * 3) entries in the 48 X 1 column vector
    first_30 = X_and_theta[:num_movies * num_features]
    # Reshape this column vector into a 10 X 3 matrix
    X = first_30.reshape((num_features, num_movies)).transpose()
    # Get the remaining 18 entries, after the first 30
    last_18 = X_and_theta[num_movies * num_features:]
    # Reshape this column vector into a 6 X 3 matrix
    theta = last_18.reshape(num_features, num_users).transpose()

    # We multiply by did_rate because we only want to consider observations for which a rating was given.
    # We calculate the sum of squared errors here: the squared difference between
    # our hypothesis (predictions) and the actual ratings
    cost = sum((X.dot(theta.T) * did_rate - ratings) ** 2) / 2
    # Regularization term: the sum of the square of every element of X and theta
    regularization = (reg_param / 2) * (sum(theta**2) + sum(X**2))
    return cost + regularization
```

Great, we have our cost function. In order for gradient descent to work, we need to calculate the **gradients** (i.e. derivative/slope) of our cost function.

First, let’s calculate the gradient of the cost with respect to X (i.e movie_features) and theta (i.e user_prefs):

```python
def calculate_gradient(X_and_theta, ratings, did_rate, num_users, num_movies, num_features, reg_param):
    # Retrieve the X and theta matrices from X_and_theta, based on their dimensions
    # (num_features, num_movies, num_users)
    # --------------------------------------------------------------------------
    # Get the first 30 (10 * 3) entries in the 48 X 1 column vector
    first_30 = X_and_theta[:num_movies * num_features]
    # Reshape this column vector into a 10 X 3 matrix
    X = first_30.reshape((num_features, num_movies)).transpose()
    # Get the remaining 18 entries, after the first 30
    last_18 = X_and_theta[num_movies * num_features:]
    # Reshape this column vector into a 6 X 3 matrix
    theta = last_18.reshape(num_features, num_users).transpose()

    # We multiply by did_rate because we only want to consider observations for which a rating was given
    difference = X.dot(theta.T) * did_rate - ratings
    # Calculate the gradients (derivatives) of the cost with respect to X and theta
    X_grad = difference.dot(theta) + reg_param * X
    theta_grad = difference.T.dot(X) + reg_param * theta
    # Wrap the gradients back into a single column vector
    return r_[X_grad.T.flatten(), theta_grad.T.flatten()]
```

**Learner’s check:**

- X_grad is the derivative of the calculate_cost function with respect to X (movie_features)
- theta_grad is the derivative of the calculate_cost function with respect to theta (user_prefs)

Before we perform gradient descent using our 2 functions above, we need to initialize our parameters user_prefs (theta) and movie_features (X) to *random small numbers*.

To do this in Python/Numpy, I have used the *numpy.random.randn* function. This function returns a matrix of random elements drawn from the standard normal distribution, with a mean of 0 and a variance of 1:

```python
num_movies, num_users = shape(ratings)
num_features = 3

# Initialize parameters theta (user_prefs), X (movie_features)
movie_features = random.randn(num_movies, num_features)
user_prefs = random.randn(num_users, num_features)
```

**Recall**

- movie_features is a 10 X 3 matrix
- user_prefs is a 6 X 3 matrix

Lastly, let’s roll movie_features and user_prefs into a 48 X 1 column vector:

```python
initial_X_and_theta = r_[movie_features.T.flatten(), user_prefs.T.flatten()]
```

**Gradient Descent: In Simple Terms**

In our case, our cost function is convex; you can picture it as a bowl-shaped surface. Since our cost function is a function of X and theta, the **goal** of gradient descent is to find the values of X and theta that minimize this cost function.

We need to reach the **global minimum**. To gradually get there, X and theta must be updated on every iteration of gradient descent. We will use an advanced optimization algorithm to do this, via the SciPy function **scipy.optimize.fmin_cg()**.

fmin_cg() takes our calculate_cost and calculate_gradient functions as parameters, as well as the number of iterations:

```python
import scipy.optimize

# Regularization parameter
reg_param = 30.0

# fprime simply refers to the derivative (gradient) of the calculate_cost function
# We iterate 100 times
minimized_cost_and_optimal_params = scipy.optimize.fmin_cg(calculate_cost, fprime=calculate_gradient,
                                                           x0=initial_X_and_theta,
                                                           args=(ratings, did_rate, num_users, num_movies, num_features, reg_param),
                                                           maxiter=100, disp=True, full_output=True)
```

**Learner’s check**:

- If you are unfamiliar with regularization, you don’t need to worry about what reg_param means.
- Each iteration, we calculate the cost and its gradients. We use the gradients to update X and theta

Let’s grab the minimized cost and the optimal values of the movie_features (X) and user_prefs (theta) matrices:

```python
# Retrieve the minimized cost and the optimal values of the movie_features (X) and user_prefs (theta) matrices
cost, optimal_movie_features_and_user_prefs = minimized_cost_and_optimal_params[1], minimized_cost_and_optimal_params[0]
```

Let’s extract movie_features and user_prefs from optimal_movie_features_and_user_prefs:

```python
first_30 = optimal_movie_features_and_user_prefs[:num_movies * num_features]
movie_features = first_30.reshape((num_features, num_movies)).transpose()
last_18 = optimal_movie_features_and_user_prefs[num_movies * num_features:]
user_prefs = last_18.reshape(num_features, num_users).transpose()
```
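This reshaping logic appears three times (in calculate_cost, calculate_gradient and here), so you could pull it into a single helper. Here is a sketch of such a (hypothetical) unroll_params function:

```python
from numpy import arange

def unroll_params(X_and_theta, num_users, num_movies, num_features):
    # The first num_movies * num_features entries belong to X (movie_features)
    first_part = X_and_theta[:num_movies * num_features]
    X = first_part.reshape((num_features, num_movies)).transpose()
    # The remaining entries belong to theta (user_prefs)
    last_part = X_and_theta[num_movies * num_features:]
    theta = last_part.reshape((num_features, num_users)).transpose()
    return X, theta

# Quick shape check with our dimensions: 10 movies, 6 users, 3 features (48 values total)
X, theta = unroll_params(arange(48.0), num_users=6, num_movies=10, num_features=3)
print(X.shape, theta.shape)  # (10, 3) (6, 3)
```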

**Step 8: Make Movie Predictions!…Finally**

Recall Step 4, where we rated some movies. Now let’s use the learning algorithm we just built, together with our ‘nikhil_ratings’ column vector, to predict the ratings we would give each movie.

Let’s calculate the dot product of the movie_features and user_prefs matrices

```python
all_predictions = movie_features.dot(user_prefs.T)
```

**Learner’s check**:

- all_predictions is a 10 X 6 matrix

Here’s what all_predictions looks like:

```
[[ 0.0620198   0.34417418  0.39180877  0.30865685  0.35675296  0.4402625 ]
 [ 0.07483434  0.41528746  0.47276431  0.37243154  0.43046527  0.53122956]
 [ 0.08869726  0.4922187   0.56034303  0.4414238   0.51020818  0.62963886]
 [ 0.16391762  0.90964837  1.03554604  0.81577648  0.94289396  1.1636087 ]
 [ 0.00508621  0.02822554  0.03213203  0.02531279  0.02925712  0.0361057 ]
 [ 0.04386498  0.24342535  0.27711604  0.21830488  0.25232199  0.31138609]
 [ 0.06129902  0.34017427  0.38725526  0.30506971  0.35260687  0.43514588]
 [ 0.03426023  0.19012456  0.21643829  0.17050451  0.19707319  0.24320452]
 [ 0.07780538  0.43177505  0.49153383  0.38721768  0.44755545  0.55232024]
 [ 0.09182245  0.50956171  0.58008636  0.45697709  0.52818504  0.65182379]]
```

Now I’ll get my predictions by extracting the first column vector from all_predictions.

```python
predictions_for_nikhil = all_predictions[:, 0:1] + ratings_mean
```

**Learner’s check**:

- Recall in Step 5 where we mean normalized all the ‘ratings’. Since we subtracted the mean of each movie’s ratings from each rating for that movie, we **added back** ‘ratings_mean’ to our predicted ratings.

Let’s display our predictions. First, we need to have our movies in an iterable and index-accessible Python data structure, like a **dictionary**. Recall that in step 4 we had all of our movies in a text file called ‘movies.txt’. Here’s how to get these movies into a Python dictionary:

```python
def loadMovies():
    movie_dict = {}
    movie_index = 0
    with open('/Users/nikhilbhaskar/Desktop/SmoothOperator/movies.txt', 'rb') as yo:
        file_contents = yo.readlines()
        for content in file_contents:
            movie_dict[movie_index] = content.strip().split(' ', 1)[1]
            movie_index += 1
    return movie_dict
```

Let’s call this function and store the returned dictionary in a variable called ‘all_movies’:

```python
all_movies = loadMovies()
```

Here’s what the python dictionary all_movies looks like:

```
{0: 'Harold and Kumar Escape From Guantanamo Bay (2008)', 1: 'Ted (2012)',
 2: 'Straight Outta Compton (2015)', 3: 'A Very Harold and Kumar Christmas (2011)',
 4: 'Notorious (2009)', 5: "Get Rich Or Die Tryin' (2005)", 6: 'Frozen (2013)',
 7: 'Tangled (2010)', 8: 'Cinderella (2015)', 9: 'Toy Story 3 (2010)'}
```

Before we display our predictions, let’s get the indexes of the ‘predictions_for_nikhil’ column vector, sorted in descending order of predicted rating:

```python
# We use argsort rather than sort(predictions_for_nikhil), because we need to keep
# track of which movie each prediction belongs to
sorted_indexes = predictions_for_nikhil.argsort(axis=0)[::-1]
```

Let’s display our predictions:

```python
# Since we only have 10 movies, let's display all ratings
for i in range(num_movies):
    # Grab the index (integer) of the ith highest predicted rating
    index = sorted_indexes[i, 0]
    print "Predicting rating %.1f for movie %s" % (predictions_for_nikhil[index], all_movies[index])
```

The result looks as follows:

```
Predicting rating 9.8 for movie Harold and Kumar Escape From Guantanamo Bay (2008)
Predicting rating 8.1 for movie Ted (2012)
Predicting rating 8.0 for movie Straight Outta Compton (2015)
Predicting rating 7.4 for movie A Very Harold and Kumar Christmas (2011)
Predicting rating 6.7 for movie Notorious (2009)
Predicting rating 6.6 for movie Get Rich Or Die Tryin' (2005)
Predicting rating 6.1 for movie Frozen (2013)
Predicting rating 5.8 for movie Tangled (2010)
Predicting rating 4.4 for movie Cinderella (2015)
Predicting rating 4.0 for movie Toy Story 3 (2010)
```

**Step 9: Take It Further**

You should try to build your own recommendation engine. Perhaps not just for movies, but for anything else you can think of. We can’t always find what we’re looking for by ourselves. Sometimes a good recommendation is all we need.

Perhaps you can implement a **clustering** algorithm such as k-means or DBSCAN to group users with similar features together, and thereby recommend the same movies to users belonging to the same cluster.
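As a starting point, here is a bare-bones k-means sketch in pure NumPy (the 2-feature preference vectors below are made up for illustration):

```python
from numpy import array, argmin, linalg

def kmeans(points, centroids, num_iterations=10):
    # Bare-bones k-means: assign each point to its nearest centroid,
    # then move each centroid to the mean of its assigned points
    for _ in range(num_iterations):
        labels = array([argmin([linalg.norm(p - c) for c in centroids]) for p in points])
        centroids = array([points[labels == k].mean(axis=0) for k in range(len(centroids))])
    return labels, centroids

# Hypothetical 2-feature preference vectors for 6 users: two obvious taste groups
user_prefs = array([[4.8, 1.0], [4.5, 0.8], [4.9, 1.2],   # comedy lovers
                    [1.0, 4.7], [0.9, 4.9], [1.2, 4.5]])  # action lovers
labels, centroids = kmeans(user_prefs, centroids=user_prefs[[0, 3]])
print(labels)  # users 0-2 land in one cluster, users 3-5 in the other
```

Users in the same cluster could then be recommended each other’s highest-rated movies. (A production system would use a library implementation such as scikit-learn’s KMeans, which also handles initialization and empty clusters.)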

In our example, the more movies you rate, the more ‘personalized’ (and possibly accurate) your recommendations will be. This is because you are giving the recommendation engine (learning algorithm) more of your data to observe and learn from.

So, maybe if you* actually* ‘Netflix and chill’ed more often, Netflix will know you better and make better movie recommendations for you 😉

PS: The entire code for my tutorial can be found here, in my GitHub repository.
