Big Data Algorithms

This course introduces algorithms and techniques for processing very large data sets, including those that may not be suitable or even available for offline processing. A big data algorithm typically addresses a problem in one of three areas: data mining (obtaining an appropriate statistical model or summary features for the data), machine learning (using data samples or data repositories as training sets for discovering models), or online processing of large data sets or data streams.

Week 1: Preliminaries; probability and elementary statistics; working with IPython and IPython notebooks; basics of data structures and algorithms.

Week 2: Introduction to data mining (Chapter 1 of Mining of Massive Datasets, abbreviated MMDS); the Python ecosystem for scientific and numerical data processing and exploratory data analysis.

Weeks 3 and 4: Detecting Similarity (Chapter 3 of MMDS) 

Week 5: Streaming algorithms (Chapter 4 of MMDS) 

Week 6: Link analysis of web documents (Chapter 5 of MMDS) 

Week 7: Frequent itemsets and association rules (Chapter 6 of MMDS) 

Week 8: Clustering (Chapter 7 of MMDS); Exam 1 

Week 9: Recommendation systems (Chapter 9 of MMDS) 

Weeks 10 and 11: Dimension reduction (Chapter 11 of MMDS) 

Weeks 12 and 13: Classical machine learning, including linear and logistic regression and support-vector machines (Chapter 12 of MMDS)

Week 14: Project demonstrations.