Algorithms and formulas from this post are based on the great book Algorithms of the Intelligent Web
Recommendations in applications are based on similarities. Similarities between two or more users, or similarity between two or more items.
I wanted a simple Recommender that would look something like:
- class Recommender
- def recommendations_for(user)
- end
- end
the recommendations_for method would return the recommended movies for a user.
I wanted to start the recommendation first with the User similarity.
This means that we will compare each user with each other looking at the movies they like (and don’t like) based on the ratings they give to these movies. This way we establish similarities between users based on their common tastes.
Most Recommendation implementations are based on the concept of similarity. Mainly similarity between users and similarity between items.
So first let's see the concept of User Similarity.
User similarity simply means how much a user's rating patterns compare with another user's. So if for example two users rank the same 10 movies with the same rankings we can think that they movie tastes are similar so if we can recommend movies between this users.
For having the best values when looking at user similarity we compare every user with each other, and rank their mutual similarities in order so we have a full arrangements of users and their similar users.
Comparing each user with each other can be a long task in systems with many users and many reviews per user. So the best idea is probably to precalculate this values and not do it in real time. But for now let's just focus on the algorithm I need as a first step in creating recommendations for my site
So let's see how this is done with some code:
The first step is as said find a way to figure out if my users are similar between them.
My current structure is a MongoDB database with the user reviews embedded inside the Movies document, I would need to extract that, but I won't do it right now. Instead I assume that I already extracted it and ended up with the same structure as the file downloaded from movielens.
The structure of this file is:
user id | item id | rating | timestamp
So what exactly we need?
Our user similarity measure between two users will be based on the ranking they each give to a common movie, and the ratio of how many common movies they have watched. so for example
Mario | Titanic | 4
Luigi | Titanic | 4
Koopa | Titanic | 4
Mario | Braveheart | 5
Luigi | Braveheart | 5
Koopa | LOTR | 5
Mario and Luigi are more similar as they have more movies, from all their movies, in common with each other, even though in the common movie (Titanic) all three of the users have the same ranking.
We wan't to create a result of the form:
user_1_id | user_2_id | similarity
For now we will store this in a simple memory list
So we'll start with the most trivial but expensive algorithm and later see if we can improve it somehow:
For doing this for all the users and all the items we need the expensive n^2 * m^2 algorithm:
The following algortihm and formulas can be found in the great book Algorithms of the Intelligent Web
class Similarity
attr_reader :user_similarity_list
def initialize(extractor)
@extractor = extractor
@user_similarity_list = []
end
def generate_similarities
users = @extractor.extract_users
users.each do |user_1|
user_1_rankings = @extractor.extract_movies_with_rankings(user_1)
users.each do |user_2|
common_items = 0
similarity=0.0;
user_2_rankings = @extractor.extract_movies_with_rankings(user_2)
user_1_rankings.each do |ranking_1|
user_2_rankings.each do |ranking_2|
if ranking_1['movie'] == ranking_2['movie']
common_items += 1
similarity += (ranking_1['rating']-ranking_2['rating'])**2
end
end
end
if common_items>0
similarity = Math.sqrt(similarity/common_items)
similarity = 1.0 - Math.tan(similarity)
max_common_items = [user_1_rankings.size, user_2_rankings.size].min
similarity = similarity * (common_items.to_f/max_common_items.to_f)
end
@user_similarity_list << UserSimilarityRelation.new(user_1, user_2, similarity)
end
end
end
end
class UserSimilarityRelation
def initialize(user_1, user_2, similarity)
@user_1=user_1
@user_2=user_2
@similarity=similarity
end
def to_s
"#{@user_1}, #{@user_2} : #{@similarity}"
end
end
attr_reader :user_similarity_list
def initialize(extractor)
@extractor = extractor
@user_similarity_list = []
end
def generate_similarities
users = @extractor.extract_users
users.each do |user_1|
user_1_rankings = @extractor.extract_movies_with_rankings(user_1)
users.each do |user_2|
common_items = 0
similarity=0.0;
user_2_rankings = @extractor.extract_movies_with_rankings(user_2)
user_1_rankings.each do |ranking_1|
user_2_rankings.each do |ranking_2|
if ranking_1['movie'] == ranking_2['movie']
common_items += 1
similarity += (ranking_1['rating']-ranking_2['rating'])**2
end
end
end
if common_items>0
similarity = Math.sqrt(similarity/common_items)
similarity = 1.0 - Math.tan(similarity)
max_common_items = [user_1_rankings.size, user_2_rankings.size].min
similarity = similarity * (common_items.to_f/max_common_items.to_f)
end
@user_similarity_list << UserSimilarityRelation.new(user_1, user_2, similarity)
end
end
end
end
class UserSimilarityRelation
def initialize(user_1, user_2, similarity)
@user_1=user_1
@user_2=user_2
@similarity=similarity
end
def to_s
"#{@user_1}, #{@user_2} : #{@similarity}"
end
end
So let's test our algorithm with a mock extractor and see what results we get:
class MockExtractor
def initialize
@rankings = {'Koopa'=>[{'movie'=>'Titanic', 'rating'=>4}, {'movie'=>'LOTR', 'rating'=>5}], 'Mario'=>[{'movie'=>'Titanic', 'rating'=>4}, {'movie'=>'Braveheart', 'rating'=>5}], 'Luigi'=>[{'movie'=>'Titanic', 'rating'=>4}, {'movie'=>'Braveheart', 'rating'=>5}]}
end
def extract_users
['Mario', 'Luigi', 'Koopa']
end
def extract_movies_with_rankings(user)
@rankings[user]
end
end
similarity = Similarity.new(MockExtractor.new)
similarity.generate_similarities
puts similarity.user_similarity_list
def initialize
@rankings = {'Koopa'=>[{'movie'=>'Titanic', 'rating'=>4}, {'movie'=>'LOTR', 'rating'=>5}], 'Mario'=>[{'movie'=>'Titanic', 'rating'=>4}, {'movie'=>'Braveheart', 'rating'=>5}], 'Luigi'=>[{'movie'=>'Titanic', 'rating'=>4}, {'movie'=>'Braveheart', 'rating'=>5}]}
end
def extract_users
['Mario', 'Luigi', 'Koopa']
end
def extract_movies_with_rankings(user)
@rankings[user]
end
end
similarity = Similarity.new(MockExtractor.new)
similarity.generate_similarities
puts similarity.user_similarity_list
we run that and we get the following result:
Mario, Mario : 1.0
Mario, Luigi : 1.0
Mario, Koopa : 0.5
Luigi, Mario : 1.0
Luigi, Luigi : 1.0
Luigi, Koopa : 0.5
Koopa, Mario : 0.5
Koopa, Luigi : 0.5
Koopa, Koopa : 1.0
We can see that Mario and Luigi are as similar as possible while Koopa is not as similar based on the different movie he reviewed.
So how to make recommendations based on this similarity information:
The recommendation algorithm:
We would want a Recommender of the form
First we create an implementation in our extractor to extract the movie names. In our MockExtractor we just do:
- def extract_movies
- ['Titanic', 'LOTR', 'Braveheart', 'Harry Potter']
- end
We extend our data to have the following now:
Mario | Titanic | 4
Luigi | Titanic | 4
Koopa | Titanic | 4
Mario | Braveheart | 5
Luigi | Braveheart | 5
Koopa | LOTR | 5
Luigi | LOTR | 4
Luigi | Harry Potter | 4
Koopa | Rambo | 3
Mario | Rocky | 4
When we run the similarity we now get:
Mario, Mario : 1.0
Mario, Luigi : 0.6666666666666666
Mario, Koopa : 0.3333333333333333
Luigi, Mario : 0.6666666666666666
Luigi, Luigi : 1.0
Luigi, Koopa : 0.09699304532693201
Koopa, Mario : 0.3333333333333333
Koopa, Luigi : 0.09699304532693201
Koopa, Koopa : 1.0
We will see how the Users with more similarity to other users affect the recommendation more than users that are not that similar:
then our recommendations_for implementation is:
class Recommender
def initialize(extractor, similarity_data)
@extractor = extractor
@data_set = extractor.rankings
@similarity_data=similarity_data
end
def recommendations_for(user)
movies = @extractor.extract_movies
recommendations = []
movies.each do |movie|
recommendations << [movie, get_estimated_rating_for_movie(movie, user)]
end
recommendations.delete_if do |recommendation|
recommendation[1].nil?
end
recommendations.sort! do |e1,e2|
e1[1]<=>e2[1]
end
recommendations.reverse!
end
private
def find_user_rating_for_movie(user, movie)
the_movie_rating = @data_set[user].find { |movie_rating| movie_rating['movie']==movie }
if the_movie_rating.nil?
return nil
end
the_movie_rating['rating']
end
def get_estimated_rating_for_movie(movie, user)
users = @extractor.extract_users
similarity_sum = 0.0
weighted_rating_sum = 0.0
estimated_rating = 0.0
if find_user_rating_for_movie(user, movie) == nil
users.each do |other_user|
next if other_user == user
if find_user_rating_for_movie(other_user, movie) != nil
similarity = @similarity_data.find { |similarity_relation| similarity_relation.user_1 == user and similarity_relation.user_2 == other_user }
similarity = similarity.similarity
other_user_rating = find_user_rating_for_movie(other_user, movie)
weighted_rating = similarity * other_user_rating
weighted_rating_sum += weighted_rating
similarity_sum += similarity
end
end
if (similarity_sum > 0.0)
estimated_rating = weighted_rating_sum / similarity_sum
end
else
estimated_rating = nil
end
estimated_rating
end
end
def initialize(extractor, similarity_data)
@extractor = extractor
@data_set = extractor.rankings
@similarity_data=similarity_data
end
def recommendations_for(user)
movies = @extractor.extract_movies
recommendations = []
movies.each do |movie|
recommendations << [movie, get_estimated_rating_for_movie(movie, user)]
end
recommendations.delete_if do |recommendation|
recommendation[1].nil?
end
recommendations.sort! do |e1,e2|
e1[1]<=>e2[1]
end
recommendations.reverse!
end
private
def find_user_rating_for_movie(user, movie)
the_movie_rating = @data_set[user].find { |movie_rating| movie_rating['movie']==movie }
if the_movie_rating.nil?
return nil
end
the_movie_rating['rating']
end
def get_estimated_rating_for_movie(movie, user)
users = @extractor.extract_users
similarity_sum = 0.0
weighted_rating_sum = 0.0
estimated_rating = 0.0
if find_user_rating_for_movie(user, movie) == nil
users.each do |other_user|
next if other_user == user
if find_user_rating_for_movie(other_user, movie) != nil
similarity = @similarity_data.find { |similarity_relation| similarity_relation.user_1 == user and similarity_relation.user_2 == other_user }
similarity = similarity.similarity
other_user_rating = find_user_rating_for_movie(other_user, movie)
weighted_rating = similarity * other_user_rating
weighted_rating_sum += weighted_rating
similarity_sum += similarity
end
end
if (similarity_sum > 0.0)
estimated_rating = weighted_rating_sum / similarity_sum
end
else
estimated_rating = nil
end
estimated_rating
end
end
Then we execute:
similarity = Similarity.new(MockExtractor.new)
similarity.generate_similarities
puts similarity.user_similarity_list
recommender = Recommender.new(MockExtractor.new,similarity.user_similarity_list)
recommendations = recommender.recommendations_for('Mario')
puts recommendations
similarity.generate_similarities
puts similarity.user_similarity_list
recommender = Recommender.new(MockExtractor.new,similarity.user_similarity_list)
recommendations = recommender.recommendations_for('Mario')
puts recommendations
And we get:
Harry Potter
5.0
LOTR
4.333333333333333
If we change now the rating Koopa gives to LOTR to 4 and the rating Luigi gives to LOTR to 5 we get:
Harry Potter
5.0
LOTR
4.666666666666666
We see that because Mario and Luigi are more similar than Mario and Koopa, Luigi's ratings have a bigger impact on the recommended movies to Mario.
This is it. We now have a recommendation engine based on user similarities and rating systems.
Again the performance of the algorithms presented are not good and not suitable for real time recommendation generation, and maybe not even for pre-calculated recommendations. However this is the system (with a couple of fast performance improvements like changing a couple of the iterations to Hash access) I will start to use in my own application and will evolve it if I see it is not a good fit.
No comments:
Post a Comment