Movie Rating Data with Side Information on Movies


The attachments below are two movie recommendation data sets. Each contains a movie rating matrix and side information for the movies. The rating matrix comes from the MovieLens data, and the side information (each movie's genre, plot, and cast) comes from IMDb.

If you use these data sets, please cite: H. Shan and A. Banerjee. Generalized Probabilistic Matrix Factorization for Collaborative Filtering. ICDM, 2010.

Each zip file contains 8 files:

rating:       N*M matrix of ratings for N users and M movies.
movie_names:  Names of the M movies.
genre:        M*G matrix. G is the total number of genres. If movie m belongs to genre g, entry (m,g) is 1, otherwise 0.
genre_names:  Names of the G genres.
plot:         M*P matrix of word counts. P is the total number of words in the dictionary.
plot_words:   The P words in the dictionary.
cast:         M*C matrix. C is the total number of actors. If actor c performs in movie m, entry (m,c) is 1, otherwise 0.
cast_names:   Names of the C actors.
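
The dimension conventions above can be sketched in a few lines of Python. This is only an illustration on made-up toy data: the on-disk format of the files is not described on this page, so loading is left out, and the helper names (shape, check_side_info, labels_for_movie) are hypothetical. It shows how the M rows of a side-information matrix line up with the M columns of the rating matrix, and how a 0/1 row is decoded into names.

```python
# Toy stand-ins for four of the eight files. All values are invented for
# illustration; they are not taken from the real data sets.
rating = [[5, 3, 4],
          [2, 4, 1]]                      # N=2 users, M=3 movies
movie_names = ["Movie A", "Movie B", "Movie C"]
genre = [[1, 0],
         [0, 1],
         [1, 1]]                          # M=3 movies, G=2 genres
genre_names = ["Comedy", "Drama"]


def shape(matrix):
    """Return (rows, columns) of a rectangular list-of-lists matrix."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    if any(len(row) != cols for row in matrix):
        raise ValueError("matrix rows have different lengths")
    return rows, cols


def check_side_info(rating, movie_names, side, side_names):
    """Check that a side-information matrix lines up with the rating matrix.

    The rating matrix has one column per movie, the side matrix has one row
    per movie, and the side matrix has one column per entry in side_names.
    """
    _, num_movies = shape(rating)
    side_rows, side_cols = shape(side)
    return (num_movies == len(movie_names) == side_rows
            and side_cols == len(side_names))


def labels_for_movie(side, side_names, m):
    """Names of the columns whose entry in row m is nonzero."""
    return [name for name, value in zip(side_names, side[m]) if value != 0]


print(check_side_info(rating, movie_names, genre, genre_names))
print(labels_for_movie(genre, genre_names, 2))
```

The same two helpers apply unchanged to the cast matrix with cast_names, and to the plot matrix with plot_words, where a nonzero entry means the word occurs in the plot at least once.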


100kml_imdb.zip:  100K ratings from 1000 users on 1700 movies.
1mml_imdb.zip:    1M ratings from 6000 users on 4000 movies.
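
As a rough guide to how sparse the rating matrices are, the figures above can be turned into a fill ratio (ratings divided by N*M). The sketch below takes the stated counts at face value; they appear to be rounded, so the results are approximate, and the function name density is a hypothetical choice.

```python
def density(num_ratings, num_users, num_movies):
    """Fraction of the N*M rating matrix that holds a rating."""
    return num_ratings / (num_users * num_movies)


# Counts as stated in the file descriptions (assumed to be rounded).
datasets = {
    "100kml_imdb.zip": (100_000, 1000, 1700),
    "1mml_imdb.zip": (1_000_000, 6000, 4000),
}

for name, (ratings, users, movies) in datasets.items():
    print(f"{name}: about {density(ratings, users, movies):.1%} of entries rated")
```

With these figures, roughly 6% of the entries in the smaller matrix and roughly 4% in the larger one carry a rating.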
