Syllabus

The following is a tentative list of the topics to be covered in the course, the application topics for assignments, and related reading materials. (Subject to change.)

Techniques/topics:
Data representation / Feature spaces
K-nearest neighbors
Naive Bayes
Maximum likelihood inference
Linear regression
Logistic regression
Kernel methods
Support vector machines
K-means
Mixture models
Principal components analysis
Collaborative filtering
Matrix factorization
Model assessment

Practical applications:
Image recognition
Spam filtering
Recommendation systems
Document clustering

Tools:
Bash / Unix tools (sed/awk/grep, etc.)
APIs / Screen scraping
Visualization / R
Hadoop / Pig

Reading materials:
(note: a subset of the listed chapters will be selected)
“Programming Collective Intelligence”, Toby Segaran [2007] (Chapters 3,5,6,9)
“The Elements of Statistical Learning: Data Mining, Inference, and Prediction”, Trevor Hastie, Robert Tibshirani, & Jerome Friedman [2009] (Chapters 1,3,4,6,12,14)
“Pattern Recognition and Machine Learning”, Christopher Bishop [2006] (Chapters 3,4,7,8,9,11,12)