The lab for this topic covers two techniques.
Principal Components Analysis (PCA) is a powerful technique for reducing
the dimensionality of a feature space while retaining as much of the
predictive power of the full feature space as possible. This
notebook (right-click and download as
Digits/PCA_introduction.ipynb) walks through the procedure and describes how
it works by applying it to a very simple example, where the number of
dimensions is low enough to be visualised easily.
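As a taster for the notebook, the core PCA workflow can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic data, not the example from the notebook itself; the variable names are ours.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data: three measured features, but the third is almost a
# copy of the first, so the data is effectively two-dimensional.
X = rng.normal(size=(200, 3))
X[:, 2] = 0.9 * X[:, 0] + 0.1 * rng.normal(size=200)

# Project onto the two directions of greatest variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                       # (200, 2)
print(pca.explained_variance_ratio_.sum())   # close to 1: little variance lost
```

The `explained_variance_ratio_` attribute reports the fraction of the total variance captured by each retained component, which is the usual guide for choosing how many components to keep.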
Logistic Regression was covered in class. This notebook gives students an
opportunity to walk through this classification procedure, using the
implementation in scikit-learn. The data we choose to classify is the
well-known NIST handwriting recognition dataset. This
notebook (right-click and download as
Digits/LogisticRegression.ipynb) brings together train-test split,
cross-validation, classifier performance analysis and ridge regression, so
students are encouraged to study it carefully.
The second of these notebooks suggests the following exercise:
More information on how forward selection works can be obtained from this document.
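In outline, forward selection greedily grows a feature set: starting from nothing, at each step it adds whichever remaining feature most improves a cross-validated score. A minimal hand-rolled sketch, using an illustrative model and a subsample of the digits data to keep it fast, might look like this (it is not the procedure from the linked document):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)
X, y = X[:500], y[:500]              # subsample to keep the sketch quick

selected = []                        # features chosen so far
remaining = list(range(X.shape[1]))  # candidate feature indices
model = LogisticRegression(max_iter=2000)

for _ in range(3):                   # greedily pick 3 features
    # Score each candidate when added to the current selection.
    scores = {j: cross_val_score(model, X[:, selected + [j]], y, cv=3).mean()
              for j in remaining}
    best = max(scores, key=scores.get)
    selected.append(best)
    remaining.remove(best)

print(selected)                      # indices of the 3 chosen features
```

scikit-learn also provides `SequentialFeatureSelector(direction="forward")`, which implements the same idea behind a standard estimator interface.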