This week we had our second guest lecture. Max Shron presented a live demo of using Google Transit data to analyze the effects of budget cuts on passenger wait times, adapting his original analysis for Chicago to more current New York City MTA data. Max highlighted several useful Python modules, including csv.DictReader for easily parsing CSV files, doctest for simple unit testing, and itertools for efficient group-by operations. See the course GitHub repository for the source code.
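To give a flavor of how those three modules fit together, here is a minimal sketch (not Max's actual code; see the repository for that). It assumes rows shaped like a GTFS stop_times.txt file with trip_id, departure_time, and stop_id columns. The sample data and the helper names to_minutes and mean_headways are made up for illustration.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Made-up sample rows, shaped like a GTFS stop_times.txt file (an assumption).
SAMPLE = """trip_id,departure_time,stop_id
t1,08:00:00,A
t2,08:10:00,A
t3,08:30:00,A
t1,08:05:00,B
t2,08:25:00,B
"""


def to_minutes(hms):
    """Convert an HH:MM:SS string to minutes past midnight.

    The example below is a doctest; run `python -m doctest thisfile.py`.

    >>> to_minutes("08:30:00")
    510.0
    """
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 60 + minutes + seconds / 60


def mean_headways(csv_file):
    """Return {stop_id: mean gap in minutes between consecutive departures}."""
    # DictReader yields one dict per row, keyed by the header line.
    rows = list(csv.DictReader(csv_file))
    # groupby only merges adjacent rows, so sort by the grouping key first.
    rows.sort(key=itemgetter("stop_id"))
    result = {}
    for stop_id, group in groupby(rows, key=itemgetter("stop_id")):
        times = sorted(to_minutes(row["departure_time"]) for row in group)
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        if gaps:  # a stop with a single departure has no gap to average
            result[stop_id] = sum(gaps) / len(gaps)
    return result


headways = mean_headways(io.StringIO(SAMPLE))
print(headways)
```

The one gotcha worth remembering is that itertools.groupby groups consecutive items only, which is why the rows are sorted by stop_id before grouping.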
In the second half of class we discussed unsupervised learning. Specifically, we introduced k-means as a simple but effective clustering method, with applications in clustering image data. We then discussed Gaussian mixture models (GMMs) and expectation-maximization as a more flexible clustering framework. See the slides for more details on k-means (notes on GMMs coming soon).
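For readers who want to see the k-means loop spelled out, here is a minimal pure-Python sketch of the standard alternating procedure (assign each point to its nearest centroid, then move each centroid to the mean of its points). It is not taken from the slides. The function names, the random initialization from the data, and the toy points are all assumptions made for illustration.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points (a list of tuples) into k groups.

    Returns (centroids, assignments), where assignments[i] is the index
    of the centroid closest to points[i].
    """
    rng = random.Random(seed)
    # Assumed initialization: pick k distinct data points as starting centroids.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(iterations):
        # Assignment step: label each point with its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no label changed, so the centroids will not move either
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments


# Toy data: two well-separated blobs of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, assignments = kmeans(points, k=2)
print(centroids, assignments)
```

For image data, the same loop applies if each pixel is treated as a point, for example an (r, g, b) tuple. Keep in mind that k-means only finds a local optimum, so results on harder data can depend on the initialization.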