Adapted from the Fall 2016 Data8 course at Berkeley (http://data8.org/)
The course was designed as a multidisciplinary course for non-computer-science majors. The first several weeks introduce programming concepts, as well as Python packages created for the course. Interspersed with the programming are concepts from data science and classification. The goal over the 3 weeks of the REU DataSciences BootCamp is to complete the three course projects, through which you will learn classification techniques. The last week of BootCamp will explore Weka, Matlab, and Processing (processing.org). The pace at which you move through this material will depend on your experience.
The textbook and lecture slides present the same material, which is designed to prepare you for the labs. Much of the material is presented in Jupyter notebooks, but we will implement everything in native Python. Outside of lab time, you are expected to read the textbook and/or watch the lecture video. In class, you can work through the labs to build skills or work directly on a project.
If you are using your own laptop, then you will need to install Jupyter Notebook, as well as Python. Instructions can be found here: http://jupyter.org/install.html.
If you need help getting started with Python, try Dr. Larson's tutorial on the language: http://wwwusers.cs.umn.edu/~larson/repowebsiteresources/website/examples/csresources/python.html
Download or clone the repo data8assets from GitHub. If you don't know git, start with Dr. Larson's tutorial: http://wwwusers.cs.umn.edu/~larson/repowebsiteresources/website/examples/csresources/git.html
Download or clone the repo datascience from GitHub (https://github.com/data8/datascience).
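Before opening the labs, a short Python script can confirm that your environment is set up. The sketch below is only an illustration, not part of the course materials. It assumes that Jupyter Notebook is importable as the module `notebook` and that the package from the datascience repo has been installed so that `import datascience` works; the names `REQUIRED_MODULES` and `check_environment` are hypothetical. Adjust the module list if your setup differs.

```python
import importlib.util
import sys

# Assumed module names (adjust as needed): Jupyter Notebook is importable as
# "notebook", and the course package from the data8/datascience repo as
# "datascience".
REQUIRED_MODULES = ("notebook", "datascience")


def check_environment(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be imported."""
    # find_spec locates a top-level module without importing it and
    # returns None when the module cannot be found.
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_environment().items():
        print(f"{name:12s} {'found' if found else 'MISSING'}")
```

Run with Python, the script prints the interpreter version and flags each module that cannot be found as MISSING. It uses `find_spec` rather than `import` so that the check does not run any of the packages' own startup code.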
The Berkeley course schedule can be found at http://data8.org/fa16/ and lab material is in the data8assets repo under materials/fa16/lab. See the README in the forked repo (https://github.com/lars1050/data8assets/tree/ghpages/materials/fa16) for directions on how to open these in Jupyter Notebook.
#   LAB                               READING                                   VIDEO
1                                     Introduction, Data Science                2016-08-24, starts at 5:15; NEW: 15:33, Data Science
2                                     Causality and Experiments                 2016-08-26, starts at 22:00
3                                     Programming in Python                     2016-08-29, starts at 11:30
4                                     Data Types                                2016-08-31, starts at 8:55
5                                     Tables                                    2016-09-02, starts at 16:15
6                                     Tables                                    2016-09-07, starts at 17:50
7                                     Visualization                             2016-09-09, starts at 9:20
8                                     Visualization                             2016-09-12, starts at 8:05
9                                     Functions and Tables                      2016-09-14, starts at 13:03 (Regression and Nearest Neighbor)
10                                    Functions and Tables                      2016-09-16, starts at 10:00
11                                    Functions and Tables                      2016-09-19, starts at 5:30 (this video is more of a supplement to the reading)
12  California Water Usage            Randomness Intro, Conditional Statements  2016-09-21, starts at 11:00 (NEW)
13                                    Iteration, Monty Hall                     2016-09-23, starts at 4:30 (PROJECT)
14                                    Finding Probabilities, Sampling           2016-09-26, starts at 7:30; 20:40 (Probabilities)
15                                    Empirical Distributions                   2016-09-28, starts at 10:40 (NEW: Sampling); 22:55 (Distributions)
16                                    Empirical Distributions                   2016-09-30
17                                    Testing Hypotheses                        2016-10-03
18                                    Testing Hypotheses                        2016-10-05
19                                    Testing Hypotheses                        2016-10-07
20                                    Testing Hypotheses                        2016-10-10
21                                    Estimation                                2016-10-17
22                                    Estimation                                2016-10-19
23                                    Estimation                                2016-10-21
24                                    Why the Mean Matters                      2016-10-24
25  Inference and Capital Punishment  Why the Mean Matters                      2016-10-26
26                                    Why the Mean Matters                      2016-10-28
27                                    Prediction                                2016-10-31
28  lab07: Regression                 Prediction                                2016-11-02
29                                    Prediction                                2016-11-04
30                                    Inference for Regression                  2016-11-07
31  lab08: Age of the Universe        Inference for Regression                  2016-11-09
32                                    Classification                            2016-11-14
33                                    Classification                            2016-11-16
34  lab09: classification discussion  Comparing Two Samples                     2016-11-18
35                                    Comparing Two Samples                     2016-11-21
36                                    Comparing Two Samples                     2016-11-28
37  lab10: Conditional Probability    Updating Predictions                      2016-11-30