The Algorithmic Foundations of Data Privacy
CSCI 8980
Fall 2018

Monday/Wedsnesday 4:00 PM -- 5:15 PM
Amundson Hall 116

Instructor

Steven Wu, Keller 6-225E. Office hour: Monday 9:00 AM -- 10:00 AM.

Canvas

Please sign up for the course page on Canvas and monitor it throughout the term. We will be using Canvas to post assignments and updates, and to conduct class discussions. Please post your questions there (you can also post only for the instructor to see).

Overview

With the help of computers and mobile devices, companies, researchers and government can now easily collect vast amounts of personal data, including medical records, GPS locations, web-search logs, and social exchanges. The analysis of such a wide range of data can reveal important insights that can benefit healthcare, research, and policy-making, but it also imposes the risk of leaking sensitive information about the individual citizens. This tension between utility and privacy motivates us to study the following question in this course:

How do we perform useful analysis on a data set that contains sensitive information about individuals without compromising the privacy of those individuals?

To study this question, we will introduce the notion of differential privacy, which provides a framework of designing data analysis algorithms with strong, meaningful, and mathematically provable privacy guarantees. We will survey a set of algorithmic tools that allow us to privately perform a wide range of statistical analyses. Of course, privacy does not come for free, and we will also study some of the fundamental limitations imposed by the requirement of differential privacy. Through the discussion of these results, we will also demonstrate some of the most novel and surprising connections between differential privacy and other areas of theoretical computer science, including machine learning theory, cryptography, convex geometry, and game theory.

Prerequisites

This is a theory-oriented course, intended for graduate students and advanced undergraduates. The (informal) prerequisites are mathematical maturity, ability to read and engage with original research, and familiarity with probability and introductory algorithms. Prior coursework in algorithms, computational complexity, machine learning, and probability will be helpful.

Textbooks

Most of the course will be based on the excellent textbook The Algorithmic Foundations of Differential Privacy by Cynthia Dwork and Aaron Roth. For certain topics, we might also be using Salil Vadhan's fantastic survey The Complexity of Differential Privacy.

Evaluation

Your grade in the course will be based on a mix of work completed individually and that completed in cooperation with your reading and research group. This is a preliminary breakdown that may change during the term:

2 Homeworks (10% each)
Attendance (15% total)
Participation (e.g., speaking up in class, participation in your reading and research group discussions) (5%)
Group research/reading project (10% for interim writeup; 30% for final writeup)
Project Presentation (20%)

Resources from similar courses

This course's design, content, and website are based in part on similar courses:

The Algorithmic Foundations of Adaptive Data Analysis, co-taught by Aaron Roth at Penn and Adam Smith at Boston University.
Special Topics in Data Privacy, taught by Katrina Ligett at Caltech, who generously offered her html and css code for her course page.
The Algorithmic Foundations of Data Privacy, taught by Aaron Roth at Penn.
Algorithmic Challenges in Data Privacy, taught by Sofya Raskhodnikova and Adam Smith at Penn State.
Foundations of Privacy, taught by Moni Naor, at the Weizmann Institute.

The Algorithmic Foundations of Data PrivacyCSCI 8980Fall 2018