carlo scarioni: Adding recommendations in Ruby

I am developing a small movies web application, and I wanted to add a simple but functional recommendation engine to it. So I started to work on it.

Algorithms and formulas from this post are based on the great book Algorithms of the Intelligent Web

Recommendations in applications are based on similarities. Similarities between two or more users, or similarity between two or more items.

I wanted a simple Recommender that would look something like:

class Recommender
 def recommendations_for(user)
 end
end

the recommendations_for method would return the recommended movies for a user.

I wanted to start the recommendation first with the User similarity.
This means that we will compare each user with each other looking at the movies they like (and don’t like) based on the ratings they give to these movies. This way we establish similarities between users based on their common tastes.

Most Recommendation implementations are based on the concept of similarity. Mainly similarity between users and similarity between items.
So first let's see the concept of User Similarity.

User similarity simply means how much a user's rating patterns compare with another user's. So if for example two users rank the same 10 movies with the same rankings we can think that they movie tastes are similar so if we can recommend movies between this users.

For having the best values when looking at user similarity we compare every user with each other, and rank their mutual similarities in order so we have a full arrangements of users and their similar users.

Comparing each user with each other can be a long task in systems with many users and many reviews per user. So the best idea is probably to precalculate this values and not do it in real time. But for now let's just focus on the algorithm I need as a first step in creating recommendations for my site

So let's see how this is done with some code:

The first step is as said find a way to figure out if my users are similar between them.

My current structure is a MongoDB database with the user reviews embedded inside the Movies document, I would need to extract that, but I won't do it right now. Instead I assume that I already extracted it and ended up with the same structure as the file downloaded from movielens.

The structure of this file is:

user id | item id | rating | timestamp

So what exactly we need?

Our user similarity measure between two users will be based on the ranking they each give to a common movie, and the ratio of how many common movies they have watched. so for example

Mario | Titanic | 4
Luigi | Titanic | 4
Koopa | Titanic | 4
Mario | Braveheart | 5
Luigi | Braveheart | 5
Koopa | LOTR | 5

Mario and Luigi are more similar as they have more movies, from all their movies, in common with each other, even though in the common movie (Titanic) all three of the users have the same ranking.

We wan't to create a result of the form:

user_1_id | user_2_id | similarity

For now we will store this in a simple memory list

So we'll start with the most trivial but expensive algorithm and later see if we can improve it somehow:

For doing this for all the users and all the items we need the expensive n^2 * m^2 algorithm:

The following algortihm and formulas can be found in the great book Algorithms of the Intelligent Web

class Similarity

 attr_reader :user_similarity_list

 def initialize(extractor)

    @extractor = extractor

    @user_similarity_list = []

 end

 def generate_similarities

    users = @extractor.extract_users

    users.each do |user_1|

     user_1_rankings = @extractor.extract_movies_with_rankings(user_1)

     users.each do |user_2|

       common_items = 0

       similarity=0.0;

       user_2_rankings = @extractor.extract_movies_with_rankings(user_2)

       user_1_rankings.each do |ranking_1|

         user_2_rankings.each do |ranking_2|

           if ranking_1['movie'] == ranking_2['movie']

             common_items += 1

             similarity += (ranking_1['rating']-ranking_2['rating'])**2

           end

         end

       end

       if common_items>0

         similarity = Math.sqrt(similarity/common_items)

         similarity = 1.0 - Math.tan(similarity)

         max_common_items =  [user_1_rankings.size, user_2_rankings.size].min

         similarity = similarity * (common_items.to_f/max_common_items.to_f)

       end

       @user_similarity_list << UserSimilarityRelation.new(user_1, user_2, similarity)

     end

    end

 end

end

class UserSimilarityRelation

 def initialize(user_1, user_2, similarity)

    @user_1=user_1

    @user_2=user_2

    @similarity=similarity

 end

 def to_s

    "#{@user_1}, #{@user_2} : #{@similarity}"

 end

end

So let's test our algorithm with a mock extractor and see what results we get:

class MockExtractor

 def initialize

    @rankings = {'Koopa'=>[{'movie'=>'Titanic', 'rating'=>4}, {'movie'=>'LOTR', 'rating'=>5}], 'Mario'=>[{'movie'=>'Titanic', 'rating'=>4}, {'movie'=>'Braveheart', 'rating'=>5}], 'Luigi'=>[{'movie'=>'Titanic', 'rating'=>4}, {'movie'=>'Braveheart', 'rating'=>5}]}

 end

 def extract_users

    ['Mario', 'Luigi', 'Koopa']

 end

 def extract_movies_with_rankings(user)

    @rankings[user]

 end

end

similarity = Similarity.new(MockExtractor.new)

similarity.generate_similarities

puts similarity.user_similarity_list

we run that and we get the following result:

Mario, Mario : 1.0

Mario, Luigi : 1.0

Mario, Koopa : 0.5

Luigi, Mario : 1.0

Luigi, Luigi : 1.0

Luigi, Koopa : 0.5

Koopa, Mario : 0.5

Koopa, Luigi : 0.5

Koopa, Koopa : 1.0

We can see that Mario and Luigi are as similar as possible while Koopa is not as similar based on the different movie he reviewed.

So how to make recommendations based on this similarity information:

The recommendation algorithm:

We would want a Recommender of the form

First we create an implementation in our extractor to extract the movie names. In our MockExtractor we just do:

  def extract_movies
    ['Titanic', 'LOTR', 'Braveheart', 'Harry Potter']
  end

We extend our data to have the following now:

Mario | Titanic | 4

Luigi | Titanic | 4

Koopa | Titanic | 4

Mario | Braveheart | 5

Luigi | Braveheart | 5

Koopa | LOTR | 5

Luigi | LOTR | 4

Luigi | Harry Potter | 4

Koopa | Rambo | 3

Mario | Rocky | 4

When we run the similarity we now get:

Mario, Mario : 1.0

Mario, Luigi : 0.6666666666666666

Mario, Koopa : 0.3333333333333333

Luigi, Mario : 0.6666666666666666

Luigi, Luigi : 1.0

Luigi, Koopa : 0.09699304532693201

Koopa, Mario : 0.3333333333333333

Koopa, Luigi : 0.09699304532693201

Koopa, Koopa : 1.0

We will see how the Users with more similarity to other users affect the recommendation more than users that are not that similar:
then our recommendations_for implementation is:

class Recommender

 def initialize(extractor, similarity_data)

    @extractor = extractor

    @data_set = extractor.rankings

    @similarity_data=similarity_data

 end

 def recommendations_for(user)

    movies = @extractor.extract_movies

    recommendations = []

    movies.each do |movie|

     recommendations << [movie, get_estimated_rating_for_movie(movie, user)]

    end

    recommendations.delete_if do |recommendation|

     recommendation[1].nil?

    end

    recommendations.sort! do |e1,e2|

     e1[1]<=>e2[1]

    end

    recommendations.reverse!

 end

 private

 def find_user_rating_for_movie(user, movie)

    the_movie_rating = @data_set[user].find { |movie_rating| movie_rating['movie']==movie }

    if the_movie_rating.nil?

     return nil

    end

    the_movie_rating['rating']

 end

 def get_estimated_rating_for_movie(movie, user)

    users = @extractor.extract_users

    similarity_sum = 0.0

    weighted_rating_sum = 0.0

    estimated_rating = 0.0

    if find_user_rating_for_movie(user, movie) == nil

     users.each do |other_user|

       next if other_user == user

       if find_user_rating_for_movie(other_user, movie) != nil

         similarity = @similarity_data.find { |similarity_relation| similarity_relation.user_1 == user and similarity_relation.user_2 == other_user }

         similarity = similarity.similarity

         other_user_rating = find_user_rating_for_movie(other_user, movie)

         weighted_rating = similarity * other_user_rating

         weighted_rating_sum += weighted_rating

         similarity_sum += similarity

       end

     end

     if (similarity_sum > 0.0)

       estimated_rating = weighted_rating_sum / similarity_sum

     end

    else

     estimated_rating = nil

    end

    estimated_rating

 end

end

Then we execute:

similarity = Similarity.new(MockExtractor.new)

similarity.generate_similarities

puts similarity.user_similarity_list

recommender = Recommender.new(MockExtractor.new,similarity.user_similarity_list)

recommendations = recommender.recommendations_for('Mario')

puts recommendations

And we get:

Harry Potter
5.0
LOTR
4.333333333333333

If we change now the rating Koopa gives to LOTR to 4 and the rating Luigi gives to LOTR to 5 we get:

Harry Potter
5.0
LOTR
4.666666666666666

We see that because Mario and Luigi are more similar than Mario and Koopa, Luigi's ratings have a bigger impact on the recommended movies to Mario.

This is it. We now have a recommendation engine based on user similarities and rating systems.

Again the performance of the algorithms presented are not good and not suitable for real time recommendation generation, and maybe not even for pre-calculated recommendations. However this is the system (with a couple of fast performance improvements like changing a couple of the iterations to Hash access) I will start to use in my own application and will evolve it if I see it is not a good fit.

carlo scarioni

Sunday, March 4, 2012

Adding recommendations in Ruby

No comments:

About Me