Module Overview

Welcome to Data Mining.
We will start this module with a warm-up lab to set up the software and carry out some basic data analysis. The formal lecture will give an overview of the module: what will be covered, and how it will be delivered and assessed.

Motivating Example

This week we review some of the most useful pandas commands and look at how to classify iris plants by species.
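
As a taste of the kind of commands reviewed this week, the sketch below applies a few common pandas operations to a small table in the style of the iris data. The measurements are illustrative values rather than the real data set, and the column names, the `simple_rule` function and its thresholds are assumptions chosen by hand for this toy table.

```python
import pandas as pd

# Toy table in the style of the iris data (illustrative values, not the real data set).
df = pd.DataFrame({
    "petal_length": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

print(df.head())       # first rows of the table
print(df.describe())   # summary statistics for the numeric columns

# Mean petal length per species
species_means = df.groupby("species")["petal_length"].mean()
print(species_means)


def simple_rule(petal_length):
    """Hand-picked thresholds for this toy table only (an assumption)."""
    if petal_length < 2.5:
        return "setosa"
    if petal_length < 4.9:
        return "versicolor"
    return "virginica"


# Apply the rule to every row and compare with the known species
df["predicted"] = df["petal_length"].apply(simple_rule)
accuracy = (df["predicted"] == df["species"]).mean()
print("accuracy on the toy table:", accuracy)
```

A rule tuned by eye on six rows says nothing about new plants; the later weeks on classification cover how to build and evaluate such models properly.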

Data Handling

Introduction to Python and NumPy

Python features for data manipulation
Array handling with NumPy
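
A minimal sketch of the array handling this topic refers to, assuming NumPy is installed. The array contents and variable names are arbitrary choices for illustration.

```python
import numpy as np

# A 3x4 array holding the integers 0..11
a = np.arange(12).reshape(3, 4)

second_row = a[1, :]          # slicing: one row
last_column = a[:, -1]        # slicing: one column

col_means = a.mean(axis=0)    # reduce over rows, giving one mean per column
centred = a - col_means       # broadcasting: subtract the column means from every row

evens = a[a % 2 == 0]         # boolean mask: keep only the even entries

print(second_row, last_column, col_means, evens, sep="\n")
```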

Exploratory Data Analysis

Before we begin constructing models of our data, we need to see what kind of data we have, what is being measured, how clean it is, etc.

Data and metadata
Statistics for understanding
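
One way to start answering these questions is sketched below with pandas. The table, its column names and its values are made up for illustration, with some entries deliberately left missing.

```python
import numpy as np
import pandas as pd

# Made-up records with some missing entries (illustrative only)
df = pd.DataFrame({
    "age": [34, 51, np.nan, 29],
    "income": [42000.0, 58000.0, 61000.0, np.nan],
    "region": ["north", "south", "south", None],
})

print(df.dtypes)                       # metadata: what type each column holds

missing_per_column = df.isna().sum()   # cleanliness: missing values per column
print(missing_per_column)

numeric_summary = df.describe()        # count, mean, std and quartiles (numeric columns)
print(numeric_summary)

region_counts = df["region"].value_counts()   # frequencies of a categorical column
print(region_counts)
```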

Exploratory Data Analysis 2

Continuing our review of exploratory data analysis, we consider richer analytics on data, leading to the identification of features for prediction.

Review of EDA Phase 1
Analysing features and targets

Data Modelling

This week we will discuss general concepts/issues in the construction of data mining models.

Types of models
Model building as a process
Modelling concerns

Regression 1

Sometimes we need to predict a numeric value, or a set of such values, given existing (training) data.

Motivation - fitting a line to data
Perspectives - optimisation, linear algebra, statistics
Measuring the quality of the results
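
The topics above can be illustrated with a short sketch: fitting a line is posed as a least-squares optimisation, solved here with linear algebra via NumPy's `np.linalg.lstsq`, and the quality of the fit is summarised with the mean squared error and R squared. The data points are made up for illustration.

```python
import numpy as np

# Made-up points lying roughly on a straight line (illustrative only)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

# Design matrix: a column of ones for the intercept, then x for the slope
X = np.column_stack([np.ones_like(x), x])

# Least-squares solution: minimises the sum of squared residuals ||X w - y||^2
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef

# Quality of the fit
y_hat = X @ coef
ss_res = np.sum((y - y_hat) ** 2)       # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares about the mean
mse = ss_res / len(y)
r2 = 1.0 - ss_res / ss_tot

print(f"slope={slope:.3f} intercept={intercept:.3f} mse={mse:.4f} r2={r2:.4f}")
```

Measuring quality on the same points used for fitting is optimistic; in practice the measures are usually reported on data held back from training.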

Classification 1

Given labelled training data, develop models to classify new data based on what we have seen in the training data.

Contrast with regression
Classification metrics
Classification using logistic regression
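
A minimal sketch of these ideas, written with NumPy only so that each step is visible: a logistic regression fitted by batch gradient descent, followed by three common classification metrics. The function names, learning rate, iteration count and toy data are assumptions made for illustration; in practice a library implementation would normally be used.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit weights by batch gradient descent on the mean log-loss."""
    Xb = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ w)                      # predicted probability of class 1
        grad = Xb.T @ (p - y) / len(y)           # gradient of the mean log-loss
        w -= lr * grad
    return w


def predict(X, w, threshold=0.5):
    Xb = np.column_stack([np.ones(len(X)), X])
    return (sigmoid(Xb @ w) >= threshold).astype(int)


def classification_metrics(y_true, y_pred):
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy one-feature data: negative values are class 0, positive values are class 1
X = np.array([[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]])
y = np.array([0, 0, 0, 1, 1, 1])

w = fit_logistic(X, y)
y_pred = predict(X, w)
train_metrics = classification_metrics(y, y_pred)
print(train_metrics)
```

Unlike regression, the model outputs a probability that is thresholded into a class label, which is why the metrics count correct and incorrect labels rather than measuring numeric error.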

Regression 2

We continue our introduction to regression, considering how to make it more robust.

Regularisation
Transformation of variables
Regression in practice
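
To illustrate regularisation, the sketch below uses ridge regression as one example, in its closed form w = (X^T X + lambda I)^(-1) X^T y, on synthetic data with two nearly collinear features. The data, the `ridge_fit` helper and the value of lambda are assumptions chosen for illustration; the data are centred so that no intercept is needed.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge solution; lam = 0 gives ordinary least squares."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


# Synthetic data: x2 is almost a copy of x1, so the two features are nearly collinear
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + 0.01 * rng.normal(size=50)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=50)

# Centre features and target so the model needs no intercept term
X = X - X.mean(axis=0)
y = y - y.mean()

w_ols = ridge_fit(X, y, lam=0.0)     # unregularised fit
w_ridge = ridge_fit(X, y, lam=1.0)   # penalised fit

print("OLS coefficients:  ", w_ols)
print("ridge coefficients:", w_ridge)
print("coefficient norms: ", np.linalg.norm(w_ols), np.linalg.norm(w_ridge))
```

The penalty shrinks the coefficient vector, which makes the fit less sensitive to noise when features carry almost the same information.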

Classification 2

Classification using techniques built on probability-based models.

Decision Trees
Naive Bayes
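
As an illustration of a probability-based classifier, here is a minimal Gaussian Naive Bayes sketch in NumPy. It assumes each feature is normally distributed within a class and independent of the other features given the class. The function names, the small variance floor and the toy data are assumptions made for illustration; decision trees are not shown.

```python
import numpy as np


def fit_gaussian_nb(X, y):
    """Estimate a prior, per-feature mean and per-feature variance for each class."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        prior = len(Xc) / len(X)
        params[c] = (prior, Xc.mean(axis=0), Xc.var(axis=0) + 1e-9)  # floor avoids zero variance
    return params


def predict_gaussian_nb(X, params):
    """Pick the class with the highest log prior plus log likelihood."""
    classes = list(params)
    scores = []
    for c in classes:
        prior, mean, var = params[c]
        # Sum of per-feature Gaussian log densities (the "naive" independence assumption)
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mean) ** 2 / var, axis=1)
        scores.append(np.log(prior) + log_lik)
    best = np.argmax(np.column_stack(scores), axis=1)
    return np.array(classes)[best]


# Toy training data: two well-separated groups (illustrative values)
X_train = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
                    [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])

params = fit_gaussian_nb(X_train, y_train)
X_new = np.array([[1.1, 2.1], [5.1, 5.9]])
y_new = predict_gaussian_nb(X_new, params)
print(y_new)
```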

Clustering

Given unlabelled data, look for subsets that help to improve understanding of the overall data set.

Partitioning
Hierarchies
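
Partitioning can be illustrated with a minimal k-means sketch in NumPy, which alternates between assigning points to their nearest centroid and moving each centroid to the mean of its points. The `kmeans` function, the fixed number of iterations, the choice of starting centroids and the toy data are all assumptions made for illustration; hierarchical methods are not shown.

```python
import numpy as np


def kmeans(X, init_centroids, n_iter=10):
    """Plain k-means with fixed starting centroids and a fixed number of iterations."""
    centroids = init_centroids.astype(float).copy()
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)            # assign each point to its nearest centroid
        for k in range(len(centroids)):
            members = X[labels == k]
            if len(members) > 0:                 # keep the old centroid if a cluster is empty
                centroids[k] = members.mean(axis=0)
    return labels, centroids


# Toy data: two visibly separate groups (illustrative values)
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
              [8.0, 8.0], [9.0, 9.0], [8.5, 9.5]])

# Start from one point in each group (a hand-picked initialisation)
labels, centroids = kmeans(X, init_centroids=X[[0, 3]])
print(labels)
print(centroids)
```

No labels are used at any point: the groups emerge only from the distances between points, which is the key contrast with classification.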