Diversified Algorithm

The Algorithm

The goal of this algorithm is to recommend content which broadens a user's horizons (i.e. educates them).

Like matrix-factorisation-based collaborative filtering, the diversified algorithm gives higher scores to content liked by users similar to you, on the assumption that you share their tastes. Recommendations should therefore feel relevant. In addition, the diversified algorithm selects the most mutually different items among these high-scoring items (with distance measured on latent factors), so the resulting set is both diverse and still relevant.
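The scoring step described above can be illustrated with a minimal sketch. It assumes that a matrix-factorisation model has already produced one latent vector per user and per item, and that a score is the dot product of the two. The function name top_scored_items and the toy factor values are made up for illustration and are not part of PEACH.

```python
import numpy as np

def top_scored_items(user_factors, item_factors, n_candidates):
    """Return the indices and scores of the n_candidates best-scoring items.

    Assumption: the score of an item for a user is the dot product of
    their latent factor vectors, as in a typical factorisation model.
    """
    scores = item_factors @ user_factors
    # Sort by descending score; a stable sort keeps ties in index order.
    order = np.argsort(-scores, kind="stable")[:n_candidates]
    return order, scores[order]

# Toy latent factors (hypothetical values): 4 items, 2 latent dimensions.
item_factors = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.5, 0.5],
])
user_factors = np.array([1.0, 0.2])

candidates, candidate_scores = top_scored_items(user_factors, item_factors, 3)
print(candidates, candidate_scores)
```

The diversified algorithm then works on such a set of high-scoring candidates rather than returning them directly.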


This algorithm, just like classical collaborative filtering, does not make use of content metadata, which is a great advantage when metadata quality is poor, as is often the case.


Collaborative approaches tend to suffer from both item and user cold-start: an item cannot be recommended before enough users have watched it, and a user does not get reasonable recommendations before they have watched enough content.


Diversified recommendations alone might appear more random than a user would expect recommendations to be. Therefore they should be annotated properly to establish user expectations (e.g. "Something new for you"). Moreover, this should not be the exclusive algorithm used everywhere on your website. Create several blocks of recommendations produced by different algorithms and annotate them. To learn about other approaches, see this tutorial on content-based filtering, this one on collaborative filtering and this one on trending.


To follow this tutorial successfully, you need to be familiar with notebooks and the pipe Manager Interface. If you are not, follow this tutorial first. You should also be familiar with classical collaborative filtering and its implementation in PEACH. Refer to this tutorial to learn about it.

The Structure

The algorithm is based on ALS and uses the exact same model. Items sorted by score (for the user) are used as input, but in contrast to the normal CF algorithm which returns random/top X recommendations, this algorithm finds the most diverse X items to be recommended. This is done by first selecting candidates from the centroid of the initial recommendation set, then iteratively adding items to the final set based on their distance to the rest of the current set. The distance metric used is Manhattan distance.
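The selection step can be sketched as a greedy procedure. This is an illustration under stated assumptions, not the PEACH implementation: the text does not spell out how the centroid is used or how "distance to the rest of the current set" is aggregated, so the sketch seeds the result with the candidate closest to the centroid and then repeatedly adds the candidate with the largest summed Manhattan distance to the items already chosen. The function name select_diverse and the toy factors are hypothetical.

```python
import numpy as np

def select_diverse(candidate_factors, k):
    """Greedily pick k mutually distant candidates (Manhattan distance).

    candidate_factors: array of shape (n_candidates, n_latent_dims),
    holding the latent factors of the top-scored items for one user.
    Returns a list of row indices into candidate_factors.
    """
    n = candidate_factors.shape[0]
    k = min(k, n)
    if k == 0:
        return []

    # Assumption: start from the candidate nearest to the centroid.
    centroid = candidate_factors.mean(axis=0)
    seed = int(np.argmin(np.abs(candidate_factors - centroid).sum(axis=1)))
    selected = [seed]

    # Summed Manhattan distance from every candidate to the selected set.
    total_dist = np.abs(candidate_factors - candidate_factors[seed]).sum(axis=1)

    while len(selected) < k:
        masked = total_dist.copy()
        masked[selected] = -np.inf  # never pick the same item twice
        nxt = int(np.argmax(masked))
        selected.append(nxt)
        total_dist += np.abs(candidate_factors - candidate_factors[nxt]).sum(axis=1)

    return selected

# Toy latent factors (hypothetical values) for five top-scored candidates.
factors = np.array([
    [0.0, 0.0],
    [0.1, 0.0],
    [1.0, 1.0],
    [0.4, 0.5],
    [1.0, 0.2],
])
picked = select_diverse(factors, 3)
print(picked)
```

Summing distances to the selected set is only one possible aggregation; using the minimum distance to any selected item instead would favour candidates that are far from all chosen items rather than far on average.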

The algorithm is divided into two notebooks: a task and a query notebook. To find out more about model training, please refer to this tutorial.


If you have both classical collaborative filtering and the diversified algorithm, you do not need to train and maintain two models - the algorithms can work on the same one.


Below we omit the part about adapting the @pipe_task, as it is described in the tutorial on collaborative filtering. You don't need to do anything if you have already adapted the task for your classical collaborative filtering algorithm.

The Workflow

1. Recommendation query

Go to the notebook which contains @pipe_query. You might notice that this part is quite different from what you saw in the collaborative filtering notebook. However, you don't need to make any further changes if you followed the other tutorial.


You've just diversified your recommendations! Now you can train a model by running the @pipe_task function from the collaborative filtering notebook (or from a separate one if you didn't have it) and then request recommendations for a test user by calling the @pipe_query in the query notebook.

Final notes

Here you can find a blog post on the motivation behind the diversified algorithm as well as a detailed description of it.

If something is still unclear or you have any problem, please don't hesitate to contact us.

There will soon be an example of diversified collaborative filtering which uses the MovieLens dataset.