Sunday, March 4, 2012

Adding recommendations in Ruby

I am developing a small movies web application, and I wanted to add a simple but functional recommendation engine to it. So I started to work on it.

Algorithms and formulas from this post are based on the great book Algorithms of the Intelligent Web


Recommendations in applications are based on similarities. Similarities between two or more users, or similarity between two or more items.

I wanted a simple Recommender that would look something like:

  1. class Recommender
  2.  def recommendations_for(user)
  3.  end
  4. end

the recommendations_for method would return the recommended movies for a user.

I wanted to start the recommendation first with the User similarity.
This means that we will compare each user with each other looking at the movies they like (and don’t like) based on the ratings they give to these movies. This way we establish similarities between users based on their common tastes.

Most Recommendation implementations are based on the concept of similarity. Mainly similarity between users and similarity between items.
So first let's see the concept of User Similarity.

User similarity simply means how much a user's rating patterns compare with another user's. So if for example two users rank the same 10 movies with the same rankings we can think that they movie tastes are similar so if we can recommend movies between this users.

For having the best values when looking at user similarity we compare every user with each other, and rank their mutual similarities in order so we have a full arrangements of users and their similar users.

Comparing each user with each other can be a long task in systems with many users and many reviews per user. So the best idea is probably to precalculate this values and not do it in real time. But for now let's just focus on the algorithm I need as a first step in creating recommendations for my site

So let's see how this is done with some code:

The first step is as said find a way to figure out if my users are similar between them.

My current structure is a MongoDB database with the user reviews embedded inside the Movies document, I would need to extract that, but I won't do it right now. Instead I assume that I already extracted it and ended up with the same structure as the file downloaded from movielens.

The structure of this file is:

user id | item id | rating | timestamp

So what exactly we need?

Our user similarity measure between two users will be based on the ranking they each give to a common movie, and the ratio of how many common movies they have watched. so for example

Mario | Titanic | 4
Luigi | Titanic | 4
Koopa | Titanic | 4
Mario | Braveheart | 5
Luigi | Braveheart | 5
Koopa | LOTR | 5

Mario and Luigi are more similar as they have more movies, from all their movies, in common with each other, even though in the common movie (Titanic) all three of the users have the same ranking.

We wan't to create a result of the form:

user_1_id | user_2_id | similarity

For now we will store this in a simple memory list

So we'll start with the most trivial but expensive algorithm and later see if we can improve it somehow:

For doing this for all the users and all the items we need the expensive n^2 * m^2 algorithm:

The following algortihm and formulas can be found in the great book Algorithms of the Intelligent Web

class Similarity

 attr_reader :user_similarity_list

 def initialize(extractor)

    @extractor = extractor

    @user_similarity_list = []


 def generate_similarities

    users = @extractor.extract_users

    users.each do |user_1|

     user_1_rankings = @extractor.extract_movies_with_rankings(user_1)

     users.each do |user_2|

       common_items = 0


       user_2_rankings = @extractor.extract_movies_with_rankings(user_2)

       user_1_rankings.each do |ranking_1|

         user_2_rankings.each do |ranking_2|

           if ranking_1['movie'] == ranking_2['movie']

             common_items += 1

             similarity += (ranking_1['rating']-ranking_2['rating'])**2




       if common_items>0

         similarity = Math.sqrt(similarity/common_items)

         similarity = 1.0 - Math.tan(similarity)

         max_common_items =  [user_1_rankings.size, user_2_rankings.size].min

         similarity = similarity * (common_items.to_f/max_common_items.to_f)


       @user_similarity_list <<, user_2, similarity)





class UserSimilarityRelation

 def initialize(user_1, user_2, similarity)





 def to_s

    "#{@user_1}, #{@user_2} : #{@similarity}"



So let's test our algorithm with a mock extractor and see what results we get:

class MockExtractor

 def initialize

    @rankings = {'Koopa'=>[{'movie'=>'Titanic', 'rating'=>4}, {'movie'=>'LOTR', 'rating'=>5}], 'Mario'=>[{'movie'=>'Titanic', 'rating'=>4}, {'movie'=>'Braveheart', 'rating'=>5}], 'Luigi'=>[{'movie'=>'Titanic', 'rating'=>4}, {'movie'=>'Braveheart', 'rating'=>5}]}


 def extract_users

    ['Mario', 'Luigi', 'Koopa']


 def extract_movies_with_rankings(user)




similarity =


puts similarity.user_similarity_list

we run that and we get the following result:

Mario, Mario : 1.0
Mario, Luigi : 1.0
Mario, Koopa : 0.5
Luigi, Mario : 1.0
Luigi, Luigi : 1.0
Luigi, Koopa : 0.5
Koopa, Mario : 0.5
Koopa, Luigi : 0.5
Koopa, Koopa : 1.0

We can see that Mario and Luigi are as similar as possible while Koopa is not as similar based on the different movie he reviewed.

So how to make recommendations based on this similarity information:

The recommendation algorithm:

We would want a Recommender of the form

First we create an implementation in our extractor to extract the movie names. In our MockExtractor we just do:

  1.   def extract_movies
  2.     ['Titanic', 'LOTR', 'Braveheart', 'Harry Potter']
  3.   end

We extend our data to have the following now:

Mario | Titanic | 4
Luigi | Titanic | 4
Koopa | Titanic | 4
Mario | Braveheart | 5
Luigi | Braveheart | 5
Koopa | LOTR | 5
Luigi | LOTR | 4
Luigi | Harry Potter | 4
Koopa | Rambo | 3
Mario | Rocky | 4

When we run the similarity we now get:

Mario, Mario : 1.0
Mario, Luigi : 0.6666666666666666
Mario, Koopa : 0.3333333333333333
Luigi, Mario : 0.6666666666666666
Luigi, Luigi : 1.0
Luigi, Koopa : 0.09699304532693201
Koopa, Mario : 0.3333333333333333
Koopa, Luigi : 0.09699304532693201
Koopa, Koopa : 1.0

We will see how the Users with more similarity to other users affect the recommendation more than users that are not that similar:
then our recommendations_for implementation is:

class Recommender

 def initialize(extractor, similarity_data)

    @extractor = extractor

    @data_set = extractor.rankings



 def recommendations_for(user)

    movies = @extractor.extract_movies

    recommendations = []

    movies.each do |movie|

     recommendations << [movie, get_estimated_rating_for_movie(movie, user)]


    recommendations.delete_if do |recommendation|



    recommendations.sort! do |e1,e2|






 def find_user_rating_for_movie(user, movie)

    the_movie_rating = @data_set[user].find { |movie_rating| movie_rating['movie']==movie }

    if the_movie_rating.nil?

     return nil




 def get_estimated_rating_for_movie(movie, user)

    users = @extractor.extract_users

    similarity_sum = 0.0

    weighted_rating_sum = 0.0

    estimated_rating = 0.0

    if find_user_rating_for_movie(user, movie) == nil

     users.each do |other_user|

       next if other_user == user

       if find_user_rating_for_movie(other_user, movie) != nil

         similarity = @similarity_data.find { |similarity_relation| similarity_relation.user_1 == user and similarity_relation.user_2 == other_user }

         similarity = similarity.similarity

         other_user_rating = find_user_rating_for_movie(other_user, movie)

         weighted_rating = similarity * other_user_rating

         weighted_rating_sum += weighted_rating

         similarity_sum += similarity



     if (similarity_sum > 0.0)

       estimated_rating = weighted_rating_sum / similarity_sum



     estimated_rating = nil





Then we execute:

similarity =


puts similarity.user_similarity_list

recommender =,similarity.user_similarity_list)

recommendations = recommender.recommendations_for('Mario')

puts recommendations

And we get:

Harry Potter

If we change now the rating Koopa gives to LOTR to 4 and the rating Luigi gives to LOTR to 5 we get:

Harry Potter

We see that because Mario and Luigi are more similar than Mario and Koopa, Luigi's ratings have a bigger impact on the recommended movies to Mario.

This is it. We now have a recommendation engine based on user similarities and rating systems.

Again the performance of the algorithms presented are not good and not suitable for real time recommendation generation, and maybe not even for pre-calculated recommendations. However this is the system (with a couple of fast performance improvements like changing a couple of the iterations to Hash access) I will start to use in my own application and will evolve it if I see it is not a good fit.

No comments: