Data-driven modeling

Spring 2012
Department: Applied Mathematics, Columbia University
Instructor: Jake Hofman
Course number: E4990
Time: Mondays, 4:10-6:40pm
Location: 627 Seeley W. Mudd Building

This course is an introduction to applied problems in statistics and machine learning. Lectures will cover the theory behind simple but effective methods for supervised and unsupervised learning, as well as tools and techniques for acquiring, cleaning, and using data to solve real-world problems. Emphasis will be on formulating real-world modeling and prediction tasks as optimization problems and on comparing methods in terms of practical efficacy and scalability. Students will gain direct experience in acquiring data from online sources and will develop the computing skills necessary to address problems such as spam filtering and recommendation systems. The course will also feature guest lectures from prominent local practitioners in academia and industry, highlighting how these skills are used in both research and business settings.

Prerequisites: Linear algebra (APMA E3101 or equivalent) and Probability & Statistics (SIEO W4150 or equivalent). Previous exposure to a high-level programming language such as Python, MATLAB (e.g., COMS W1005), Ruby, Perl, R, or similar is recommended.