Clustering: k-means, Gaussian Mixtures, DBSCAN, AGNES, K-modes and K-prototypes

As with previous weeks, the working environment for this week's practicals is a set of Jupyter notebooks.

Each of these notebooks should be downloaded as a text file and run using JupyterLab or VS Code.

The notebooks can be obtained from the following links:

Students should right-click and download each file to the same directory, and then follow the advice below.

Each notebook is extensively annotated. Students are advised to work through the notebooks one at a time, and to consider how the techniques in each could contribute to their CA3 attempts.

Installation

You may need to update your Python environment by installing the hdbscan and python-graphviz packages, using the following commands:

conda install -c conda-forge hdbscan

conda install -c conda-forge python-graphviz
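
To confirm that the two conda commands above took effect, a check along the following lines can be run in a notebook cell. This is a minimal sketch rather than part of the course material: the helper name missing_packages is hypothetical, and it assumes the packages are imported as hdbscan and graphviz respectively.

```python
import importlib.util

# Assumed mapping from conda package name to the module name used in "import ...".
REQUIRED = {"hdbscan": "hdbscan", "python-graphviz": "graphviz"}


def missing_packages(required=REQUIRED):
    """Return the conda package names whose import module cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Still to install:", ", ".join(missing))
else:
    print("hdbscan and graphviz are both importable.")
```

If a package is reported as missing after installation, the notebook is probably running in a different conda environment from the one where the commands were executed.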

Resources

The datafiles needed for the practicals can be obtained from here:

Students should download these data files to a data/ folder, placed in the same folder as the downloaded notebooks (the data/ folder may need to be created first).

There is also a zipped folder of shared Python support functions. This file should be unzipped, and the resulting folder placed in the same folder as the notebooks.
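
The folder layout described above can be checked from a notebook cell with a short script such as the one below. It is only a sketch: the helper name summarise_layout is hypothetical, and it assumes the notebook was started from the folder that holds the notebooks. It creates data/ if it is missing, then lists the notebooks, the data files, and any other folders it finds, so that the unzipped support folder can be spotted by eye.

```python
from pathlib import Path


def summarise_layout(base="."):
    """Create data/ under base if needed and report what sits alongside it."""
    base = Path(base)
    data_dir = base / "data"
    data_dir.mkdir(exist_ok=True)  # no error if the folder already exists
    return {
        "notebooks": sorted(p.name for p in base.glob("*.ipynb")),
        "data_files": sorted(p.name for p in data_dir.iterdir() if p.is_file()),
        # Any other visible folder; the unzipped support folder should appear here.
        "other_folders": sorted(
            p.name
            for p in base.iterdir()
            if p.is_dir() and p.name != "data" and not p.name.startswith(".")
        ),
    }


layout = summarise_layout()
for key, names in layout.items():
    print(f"{key}: {names if names else 'none found'}")
```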