Hidden Factors and Hidden Topics

Hidden Factors and Hidden Topics: Understanding Rating Dimensions with Review Text

While I was looking around for recommender systems data sets I stumbled upon an excellent reviews dataset. One of the things that really caught my eye was the paper associated with this dataset: "Hidden Factors and Hidden Topics: Understanding Rating Dimensions with Review Text" by Julian McAuley and Jure Leskovec. I decided to give the paper a read and was delighted to find that it looks at a class of algorithms I've been interested in for a while: algorithms that combine ratings with reviews in recommender systems. A copy of the paper can be found on the same page as the dataset (sidenote: it's worth pointing out that this dataset has since been updated and the up-to-date version can be found here).

The basic insight of the authors is that many websites now collect both ratings and reviews for their items. These websites use the ratings to help recommend items, but often ignore the reviews, which is kind of silly. Reviews are a very valuable way to know what people thought of a movie, book, or restaurant. Therefore the authors designed an algorithm that uses both ratings and reviews to provide better predictions. To do this they combined two ideas: latent factor models for modeling ratings, and latent topic models for modeling review text.

Latent factor models

Latent factor models are one of the most popular approaches to recommender system algorithms. For a great overview of the classic SVD-style recommender algorithm based on this technique, see Simon Funk's influential blog post from 2006. These algorithms assume that there is a set of latent factors that explains why people rate the way they do. For example, when rating books one reader might like a romantic side plot in their novel, while another would not. One latent feature could measure the degree of romantic side plot and explain why the two readers disagree on a given book, but otherwise tend to agree.
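To make this concrete, here is a minimal sketch of a Funk-style latent factor model trained with stochastic gradient descent. The toy ratings, the factor dimensionality, and the learning rate are illustrative choices of mine, not anything from the paper.

```python
# A minimal sketch of a Funk-style latent factor model trained with SGD.
# The ratings, dimensions, and hyperparameters are illustrative only.
import numpy as np

def train_latent_factors(ratings, n_users, n_items, n_factors=10,
                         lr=0.005, reg=0.02, n_epochs=20):
    """ratings: list of (user, item, rating) triples."""
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(n_users, n_factors))  # user factors
    Q = rng.normal(scale=0.1, size=(n_items, n_factors))  # item factors
    for _ in range(n_epochs):
        for u, i, r in ratings:
            pu = P[u].copy()
            err = r - pu @ Q[i]                       # prediction error for this rating
            P[u] += lr * (err * Q[i] - reg * pu)      # gradient step on user factors
            Q[i] += lr * (err * pu - reg * Q[i])      # gradient step on item factors
    return P, Q

# Example: predict user 0's rating for item 2 after training on a toy dataset.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 2.0), (2, 1, 1.0)]
P, Q = train_latent_factors(ratings, n_users=3, n_items=3)
print(P[0] @ Q[2])
```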

One core issue with these algorithms is that we have no way of interpreting the latent features. In our previous example, it could be that feature five measures romantic side plots, but we would have no way of knowing. Without also giving the algorithm access to descriptions of the books, all it can do is model patterns in the ratings. This labeling problem limits the latent factors to being a handy algorithmic abstraction with no direct use in the interface.

Latent factor models have also been used to understand text. Algorithms like LDA (Latent Dirichlet Allocation) cluster the text in a collection of documents into topics to try to make sense of the documents. These algorithms assume that there is a collection of latent topics (factors) and that each document has its own unique mix over these topics. After running LDA on a body of text we have estimates not only of which topics are relevant to each document, but also of which words are associated with each topic. For example, I could run LDA on this blog to automatically extract the latent topics that I tend to write about, automatically label each post with its relevant topics, and generate a list of words associated with each topic. From the word list I would then be able to label the topics (since tagging a blog post with "topic #6" doesn't really help readers).
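As an illustration, here is a small sketch of fitting LDA with scikit-learn. The toy documents and the choice of two topics are assumptions made for the example; the point is just that each document gets a mixture over topics, and each topic gets a list of associated words we could use to hand-label it.

```python
# A sketch of fitting LDA to a handful of toy documents with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the robot fights the evil robot army",
    "a slow romance between two poets",
    "space battles and laser robots",
    "a love letter and a quiet romance",
]
counts = CountVectorizer(stop_words="english").fit(docs)
X = counts.transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Each document gets a mixture over the latent topics...
doc_topics = lda.transform(X)
print(doc_topics)

# ...and each topic gets a distribution over words, which we can inspect
# to hand-label the topics (e.g. "romance" vs. "robots").
words = counts.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_words = [words[j] for j in topic.argsort()[-3:]]
    print(f"topic {k}: {top_words}")
```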

The authors noticed that these two models both produce a topic / feature vector description of each document / item. By connecting these, so that the features that help predict ratings are also the topics that help explain the text, we can combine the two algorithms and hopefully improve performance. Not only does this let us bring more data to bear on the problems of recommending and classifying text, it also helps solve the labeling problem of using latent factor models for recommendation. After presenting this background and insight, the authors show how to derive and optimize a joint loss function that allows the two models to be learned simultaneously. More information on these details can be found in the paper.
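Very roughly, the combined objective trades squared error on the ratings against the likelihood of the review text, with each item's factors tied to its topic distribution. The sketch below shows the general shape of such a joint objective; the link function, variable names, and the way the text term is computed are my simplifications for illustration, not the paper's exact formulation or optimization procedure.

```python
# A rough sketch of the shape of a joint rating + text objective, loosely in
# the spirit of the paper. Details here are simplified assumptions, not the
# paper's exact model.
import numpy as np

def joint_loss(ratings, reviews, P, Q, phi, mu=0.1, kappa=1.0):
    """
    ratings: list of (user, item, rating) triples
    reviews: dict mapping item -> list of word ids from that item's reviews
    P, Q:    user and item latent factors (n_users x K, n_items x K)
    phi:     topic-word distributions (K x vocab_size), rows sum to 1
    mu:      trade-off between the rating term and the text term
    kappa:   "peakiness" of the softmax linking item factors to topics
    """
    # Squared error on the ratings, using the usual dot-product prediction.
    rating_term = sum((r - P[u] @ Q[i]) ** 2 for u, i, r in ratings)

    # Link each item's factors to a topic distribution via a softmax, so the
    # same item vector that predicts ratings also explains its review text.
    text_term = 0.0
    for i, word_ids in reviews.items():
        theta_i = np.exp(kappa * Q[i])
        theta_i /= theta_i.sum()
        word_probs = theta_i @ phi                     # vocab-sized vector
        text_term += np.sum(np.log(word_probs[word_ids] + 1e-12))

    # Minimize rating error while maximizing the review-text likelihood.
    return rating_term - mu * text_term
```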

Thoughts

Overall I think this is an early paper looking into a really cool direction for new algorithms. The writing in the paper is very clear, and the authors do an excellent job of motivating this direction and explaining their algorithm without getting so deep into the math that it becomes hard to follow. Having seen some of the work that followed in 2014, this algorithm feels a little primitive, but that doesn't really matter. The authors successfully showed that these two models could be combined into a single algorithm, and that the combined algorithm outperforms its simpler predecessors. With that shown, I expect to see continuing work refining the combination over the next few years.

I don't have a lot of thoughts specifically about this paper, but I do have some ideas about ways we can apply these "ratings and reviews" algorithms. I have two major questions that I think would be cool to address. The first has to do with another paper I heard about a while ago: "Reading Tea Leaves: How Humans Interpret Topic Models" (which I will probably blog about at some point). The tea leaves paper looks at how people interpret topic models and how much sense the topics really make. By manipulating the models and asking humans to detect the manipulations, the authors demonstrated that these models don't make as much sense to humans as we think they do. Many authors in the recommender systems field would probably say "that's all fine and good, but as long as we improve recommender performance I don't care if the topics make sense," but I think that understanding this issue is important, especially if we want to do cool stuff with these algorithms.

My second concern is more theoretical, and I don't actually know how to tackle it: how do we know that we should combine these models? What evidence do we have that what people write about in reviews actually captures what they did or did not like about something? Sure, the extra data makes the algorithm work better, but why? I would love to see a careful argument that lays out what we are assuming about human behavior when we say that combining these algorithms is a good idea.

With all that said, I still think these algorithms are interesting. They are interesting because they map all three things (users, items, and descriptions) into a common space. This opens up a new avenue of algorithms on the latent space that couldn't have worked without words and descriptions. It would be cool to see if we can learn about users, not just items, with these algorithms. For example, could we learn to separate the different reasons a user didn't like a film from the complaints that they make? Could we learn to predict what someone will or will not like, allowing us to make more nuanced predictions like "You will like the action and the cool robot in this film, but you will dislike the acting and the story; overall you will probably think it's only OK, and it would have been a lot better with a better lead actor"? On the other side of the system, if we can learn about users from their reviews, can we compare reviews with ratings? What is the best way to learn about a user: asking for more ratings, or for more in-depth reviews? What makes a review useful to the system? Obviously, some reviews ("I liked it") don't contain any information beyond the rating, but others probably do. I would love to see work exploring these options down the line; I think it could open up an interesting new avenue in recommender system interactions.