Learning to Learn to Rank

Learning to rank algorithms

Traditional machine learning thinking tends to focus on regression and classification tasks. In the recommendation domain, this is the predict task: given one user and one item, tell me what that user will think of that item. This task has led to great leaps in our ability to automatically support users, but it doesn't match how we ultimately use the algorithms. Most systems (search engines, recommender systems, etc.) don't actually care about the one-user, one-item task. Most systems use the predict task as a stepping stone that lets them rank items, to produce recommendations or search results.
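To make this concrete, here's a minimal Python sketch of the stepping-stone pattern: the `predict` argument stands in for any one-user, one-item model (the name is mine, for illustration), and ranking is nothing more than scoring and sorting.

```python
def rank_items(user, candidate_items, predict):
    """Rank candidates for one user, highest predicted score first.

    `predict(user, item)` is any one-user, one-item rating model;
    the ranking is just a sort over its outputs.
    """
    return sorted(candidate_items,
                  key=lambda item: predict(user, item),
                  reverse=True)
```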

Learning to rank is a (relatively, like last 10 years) new idea in machine learning. Learning to rank approaches try to sidestep using prediction as a stepping stone and instead optimize a model directly for its quality as a ranking function. By forsaking the "one item, one user" mindset, researchers can create new ways to optimize models that directly (or at least approximately) optimize ranking quality metrics. While no longer capable of making predictions, these algorithms have shown significant improvements on recommendation quality metrics in the recommender system field over the last few years.

Learning to learn to rank

Learning about learning to rank algorithms was an interesting experience. It had been a while since I had to learn a truly new algorithm, and I had this expectation in my mind that learning to rank algorithms would be some unassailable fortress, foreign to every algorithm I had previously known. I was, fortunately, mistaken. There are a few learning to rank algorithms that are quite different from the recommender algorithms I'm used to, but as a whole I was surprised to find that most learning to rank algorithms, once built, are equivalent to traditional models.

That last statement requires some pulling apart. How can a model focused on recommendation be equivalent to one focused on prediction? Aren't learning to rank algorithms different and new?

Well... yes and no.

Looking at almost every learning to rank recommender algorithm I know of, I find that the core labor of algorithm development has not been designing a new model (a new arrangement of variables); instead, the work has focused on designing new metrics.

The problem with ranking metrics is that they look at, well, ranking. Ranking is inherently non-linear and non-smooth: a metric like mean reciprocal rank is piecewise constant, changing value only when two items swap positions, so its gradient is zero almost everywhere and there is no easy "smooth optimization" for it. Therefore the real work of learning to rank is building metrics and evaluations that are smooth, but are still tied to real ranking functions. This can be done by finding upper or lower bounds, or by replacing the metric with a smooth approximation.
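To illustrate the smoothing trick, here's a small Python sketch. Pairwise metrics like AUC are built from the indicator "did the preferred item outscore the other one?", which is flat almost everywhere; one common move (the one BPR uses, for example) is to swap in a sigmoid of the score difference, which approximates the indicator but has a useful gradient everywhere. The function names are mine.

```python
import numpy as np

def hard_pairwise(s_pos, s_neg):
    # The piecewise-constant building block of metrics like AUC:
    # 1 if the preferred item outscores the other, else 0. Its
    # gradient is zero almost everywhere, so it can't drive
    # gradient-based training.
    return float(s_pos > s_neg)

def smooth_pairwise(s_pos, s_neg):
    # A common smooth surrogate: the logistic sigmoid of the score
    # difference. It approaches the indicator as the scores separate,
    # but is differentiable everywhere.
    return 1.0 / (1.0 + np.exp(-(s_pos - s_neg)))
```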

Once found, these new smooth error metrics can be used as a loss function for optimizing any standard model. In the recommender system domain this normally means a metric looking at, say, the area under the ROC curve, used to optimize a standard matrix factorization model. When implementing these algorithms, all you need is a new tool for training the model. This tool may be different from other model trainers, as the definition of one "training sample" changes (a pair of items, say, instead of a single rating), but once built the model can be used in the same recommendation logic you were already using.
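Here's a minimal numpy sketch of what that looks like in practice: a completely standard matrix factorization model, trained with a BPR-style pairwise update built on the sigmoid surrogate above. The sizes, hyperparameters, and sampling scheme are illustrative assumptions, not a reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 100, 500, 16             # illustrative sizes
P = rng.normal(scale=0.1, size=(n_users, k))   # user factors
Q = rng.normal(scale=0.1, size=(n_items, k))   # item factors

def pairwise_sgd_step(u, i, j, lr=0.05, reg=0.01):
    """One SGD step on a pairwise sample: user u prefers item i over
    item j. Note the "training sample" is a pair of items, not a
    single (user, item, rating) triple."""
    p_u, q_i, q_j = P[u].copy(), Q[i].copy(), Q[j].copy()
    x = p_u @ (q_i - q_j)            # score difference for the pair
    g = 1.0 / (1.0 + np.exp(x))      # gradient weight, sigmoid(-x)
    P[u] += lr * (g * (q_i - q_j) - reg * p_u)
    Q[i] += lr * (g * p_u - reg * q_i)
    Q[j] += lr * (-g * p_u - reg * q_j)
```

Training just loops this step over sampled triples (u, i, j) where u liked i but not j; once trained, P and Q plug into exactly the dot-product recommendation logic you were already using.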

I found all this quite surprising. In some ways I was expecting more. Maybe my expectations were screwed up, or maybe it's just been too long since I bothered to learn a new algorithm. Do other people find "new and different" algorithms a little intimidating?

The next step

Learning to rank algorithms are cool, and I'm looking forward to seeing these approaches explored more in the coming years. However, I'm also looking forward to the big thing after that. My biggest takeaway from learning to rank is that if we have a metric, we can optimize it. Give a human a number they can manipulate, convince them it should be bigger, and they will make that number go up. So why stop at ranking? I want to see learning to recommend.

Recommendation and ranking are two very similar tasks. If you can't accurately predict that a user will like one item better than another, or that one item should be at the front of a list instead of another, then recommendation is impossible. Simply ranking, however, isn't the same as recommendation; it's necessary, but not sufficient. An example: we know from past work that properties like the diversity of a recommendation list matter to user satisfaction. We can measure diversity (in many ways), we can manipulate diversity (in many ways), and we have some ideas about how diversity relates to satisfaction. Current learning to rank recommender algorithms make no account of the diversity of their recommendations. Furthermore, I don't see how to account for diversity in a deep way in these algorithms and metrics.
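To give a sense of what "we can measure diversity" means, here's one standard formulation as a hedged Python sketch: intra-list diversity, the average pairwise dissimilarity between the recommended items, computed from item feature vectors that I'm assuming we have.

```python
import numpy as np

def intra_list_diversity(item_vectors):
    """Average pairwise (1 - cosine similarity) over a recommendation
    list. `item_vectors` is an (n, d) array of feature vectors for the
    n >= 2 recommended items; higher values mean a more diverse list."""
    V = item_vectors / np.linalg.norm(item_vectors, axis=1, keepdims=True)
    sims = V @ V.T                            # pairwise cosine similarities
    n = len(V)
    off_diag = sims.sum() - np.trace(sims)    # drop self-similarities
    return 1.0 - off_diag / (n * (n - 1))
```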

What I want to see is learning to recommend algorithms that directly optimize (well, approximately, through well-founded proxies I suppose) the system's users' satisfaction with the recommendations, or the amount of support the recommendations actually offer the user. These algorithms will need to build on the learning to rank algorithms, but go further to account for properties like diversity. I think we can do it; I think it will be cool.