PCA (worked example) and Logistic Regression for handwritten digit recognition

Notebooks

In this lab we consider two topics:

  1. Principal Components Analysis (PCA) is a powerful technique for reducing the dimensionality of a feature space while retaining as much of its predictive power as possible. This notebook (right-click and download as Digits/PCA_introduction.ipynb) walks through the procedure and explains how it works by applying it to a very simple example, in which the number of dimensions is low enough to visualise easily.

  2. Logistic Regression was covered in class. This notebook gives students an opportunity to walk through this classification procedure, using the implementation in scikit-learn. The data we choose to classify is the well-known NIST handwriting recognition dataset. This notebook (right-click and download as Digits/LogisticRegression.ipynb) brings together train-test split, cross-validation, classifier performance analysis and ridge regression, so students are encouraged to study it carefully.
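The two notebooks can be combined into a single pipeline. The following is a minimal sketch, not the notebooks' own code: it applies PCA to the scikit-learn digits data and then fits an L2-regularised logistic regression classifier, with a train-test split and cross-validation. The hyperparameter values (20 components, 5 folds) are illustrative assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import make_pipeline

# The scikit-learn digits dataset: 1797 samples, 64 pixel features.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Reduce the 64-dimensional pixel space to 20 principal components
# (an illustrative choice), then fit logistic regression on top.
model = make_pipeline(
    PCA(n_components=20),
    LogisticRegression(max_iter=1000))

# 5-fold cross-validation on the training set, then a held-out test score.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
model.fit(X_train, y_train)
print(f"CV accuracy:   {cv_scores.mean():.3f}")
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")
```

Wrapping PCA and the classifier in a single pipeline ensures the components are re-fitted on each cross-validation fold, avoiding information leakage from the held-out data.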

The second of these notebooks also suggests an additional exercise for further practice.

Additional resources

More information on how forward selection works can be obtained from this document.
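As a quick illustration of the idea, the following is a minimal hand-rolled sketch of forward selection (not the linked document's code): starting from an empty feature set, it greedily adds, one at a time, the feature that most improves cross-validated accuracy. The choice of 5 features, 3 folds, and logistic regression as the scorer are illustrative assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)
estimator = LogisticRegression(max_iter=1000)

selected = []                        # indices of features chosen so far
remaining = list(range(X.shape[1]))  # candidate features not yet chosen
for _ in range(5):                   # select 5 features for the sketch
    # Score each candidate when added to the current selection.
    scores = {
        j: cross_val_score(estimator, X[:, selected + [j]], y, cv=3).mean()
        for j in remaining
    }
    best = max(scores, key=scores.get)  # greedy step: keep the best one
    selected.append(best)
    remaining.remove(best)

print(selected)  # indices of the 5 pixels chosen, in selection order
```

Note the greedy nature of the procedure: a feature chosen early is never reconsidered, so the result is not guaranteed to be the best subset of that size overall.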