I have a PhD in Computer Science from GroupLens Research at the University of Minnesota. My research focuses on the production and consumption of quality content in peer production communities like Wikipedia and OpenStreetMap. One part of this is studying what content quality means in a given community, and then applying machine learning to assess it at scale. A second part is studying how these communities produce quality content to see if there are ways to improve the process, for example by using recommender systems to point contributors to contribution opportunities. Lastly, I look at whether peer production communities succeed at producing quality content where their audience is, finding that there is commonly a mismatch between the demand for and the supply of quality content.
Misalignment Between Supply and Demand of Quality Content in Peer Production Communities (ICWSM 2015)
Warncke-Wang, M., Ranjan, V., Terveen, L., and Hecht, B. "Misalignment Between Supply and Demand of Quality Content in Peer Production Communities", in the proceedings of the 9th International AAAI Conference on Web and Social Media (ICWSM).
Abstract: In peer production communities, individual community members typically decide for themselves where to make contributions, often driven by factors such as “fun” or a belief that “information should be free”. However, the extent to which this bottom-up, interest-driven content production paradigm meets the needs of consumers of this content is unclear. In this paper, we introduce an analytical framework for studying the relationship between content production and consumption in peer production communities. Applying our framework to four large Wikipedia language editions, we find extensive misalignment between production and consumption in all of them. We also show that this misalignment has an enormous effect on Wikipedia's readers. For example, over 1.5 billion monthly pageviews in the English Wikipedia go to articles that would be of much higher quality if editors optimally distributed their work to meet reader demand. Examining misalignment in more detail, we observe that there is an excess of high-quality content about certain specific topics, and that the majority of articles with insufficient quality are in a stable state (i.e. not breaking news). Finally, we discuss technologies and community practices that can help reduce the misalignment between the supply of and demand for high-quality content in peer production communities.
You can download a pre-print PDF of this paper.
User Session Identification Based on Strong Regularities in Inter-activity Time (WWW 2015)
Halfaker, A., Keyes, O., Kluver, D., Thebault-Spieker, J., Nguyen, T., Shores, K., Uduwage, A., and Warncke-Wang, M. "User Session Identification Based on Strong Regularities in Inter-activity Time", in the proceedings of the 24th International World Wide Web Conference (WWW).
Abstract: Session identification is a common strategy used to develop metrics for web analytics and behavioral analyses of user-facing systems. Past work has either argued that session identification strategies based on an inactivity threshold are inherently arbitrary or advocated that thresholds be set at about 30 minutes. In this work, we demonstrate a strong regularity in the temporal rhythms of user-initiated events across several different domains of online activity (incl. video gaming, search, page views and volunteer contributions). We describe a methodology for identifying clusters of user activity and argue that the regularity with which these activity clusters appear implies a good rule-of-thumb inactivity threshold of about 1 hour. We conclude with implications that these temporal rhythms may have for system design based on our observations and theories of goal-directed human activity.
A PDF of this paper can be downloaded from arXiv.
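To make the inactivity-threshold idea from the abstract above concrete, here is a minimal sketch of threshold-based session splitting. It is an illustration only, not the code used in the paper: the function name split_sessions, the sample timestamps, and the choice to treat a gap of exactly one hour as part of the same session are all assumptions made for this example.

```python
from datetime import datetime, timedelta


def split_sessions(timestamps, threshold=timedelta(hours=1)):
    """Group event timestamps into sessions.

    A new session starts whenever the gap since the previous event
    exceeds `threshold` (hypothetical default: the 1-hour rule of thumb).
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= threshold:
            # Gap is within the threshold: same session as the previous event.
            sessions[-1].append(ts)
        else:
            # First event, or the gap is too long: start a new session.
            sessions.append([ts])
    return sessions


# Made-up activity log for a single user.
events = [
    datetime(2015, 1, 1, 9, 0),
    datetime(2015, 1, 1, 9, 20),
    datetime(2015, 1, 1, 10, 5),
    datetime(2015, 1, 1, 13, 0),
    datetime(2015, 1, 1, 13, 45),
]
sessions = split_sessions(events)
print([len(s) for s in sessions])  # [3, 2]
```

With these sample events, the first three fall into one session because no gap exceeds an hour, while the nearly three-hour pause before 13:00 starts a second session. Passing a different threshold, for example timedelta(minutes=30), reproduces the more conventional 30-minute cutoff mentioned in the abstract.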
The Success and Failure of Quality Improvement Projects in Peer Production Communities (CSCW 2015)
Warncke-Wang, M., Ayukaev, V. R., Hecht, B., and Terveen, L. "The Success and Failure of Quality Improvement Projects in Peer Production Communities", in the proceedings of the 18th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW).
Abstract: Peer production communities have been proven to be successful at creating valuable artefacts, with Wikipedia as a prime example. However, a number of studies have shown that work in these communities tends to be of uneven quality and certain content areas receive more attention than others. In this paper, we examine the efficacy of a range of targeted strategies to increase the quality of under-attended content areas in peer production communities. Mining data from five quality improvement projects in the English Wikipedia, the largest peer production community in the world, we show that certain types of strategies (e.g. creating artefacts from scratch) have better quality outcomes than others (e.g. improving existing artefacts), even if both are done by a similar cohort of participants. We discuss the implications of our findings for Wikipedia as well as other peer production communities.
You can download a PDF of this paper.
Tell Me More: An Actionable Quality Model for Wikipedia (WikiSym 2013)
Warncke-Wang, M., Cosley, D., and Riedl, J. "Tell Me More: An Actionable Quality Model for Wikipedia", in the proceedings of WikiSym 2013.
Abstract: In this paper we address the problem of developing actionable quality models for Wikipedia, models whose features directly suggest strategies for improving the quality of a given article. We first survey the literature in order to understand the notion of article quality in the context of Wikipedia and existing approaches to automatically assess article quality. We then develop classification models with varying combinations of more or less actionable features, and find that a model that only contains clearly actionable features delivers solid performance. Lastly we discuss the implications of these results in terms of how they can help improve the quality of articles across Wikipedia.
You can download a PDF of this paper.
In Search of the Ur-Wikipedia: Universality, Similarity, and Translation in the Wikipedia Inter-language Link Network (WikiSym 2012)
Warncke-Wang, M., Uduwage, A., Dong, Z., and Riedl, J. "In Search of the Ur-Wikipedia: Universality, Similarity, and Translation in the Wikipedia Inter-language Link Network", in the proceedings of WikiSym 2012.
Abstract: Wikipedia has become one of the primary encyclopaedic information repositories on the World Wide Web. It started in 2001 with a single edition in the English language and has since expanded to more than 20 million articles in 283 languages. Criss-crossing between the Wikipedias is an interlanguage link network, connecting the articles of one edition of Wikipedia to another. We describe characteristics of articles covered by nearly all Wikipedias and those covered by only a single language edition, we use the network to understand how we can judge the similarity between Wikipedias based on concept coverage, and we investigate the flow of translation between a selection of the larger Wikipedias. Our findings indicate that the relationships between Wikipedia editions follow Tobler's first law of geography: similarity decreases with increasing distance. The number of articles in a Wikipedia edition is found to be the strongest predictor of similarity, while language similarity also appears to have an influence. The English Wikipedia edition is by far the primary source of translations. We discuss the impact of these results for Wikipedia as well as user-generated content communities in general.
You can download a PDF of this paper.
From the introduction: "Wikipedia’s best content is mainly where its readers aren’t. For instance, the article about weddings is seen thousands of times every day, yet the community labels it “quite incomplete”, its prose “distinctly unencyclopedic”, and a call for additional sources to verify its content has been featured prominently at the top of the article for over four years. It turns out that this is not uncommon; each month Wikipedia’s articles are viewed billions of times, and over 40% of these views are to articles that would be of significantly higher quality if the encyclopaedia’s contributors followed their readers."