Using Python-based tools to investigate the Auto-MPG dataset

Last week's practical included an exercise to investigate the well-known Auto-MPG dataset, which describes cars available to American customers during the 1970s and early 1980s.

Several questions were posed, and students were invited to answer them using tools they were familiar with.

For today's lab, I have shared a heavily annotated notebook that uses Python-based tools to investigate the Auto-MPG dataset; the annotations explain what each step does and why.

For convenience, the Auto-MPG dataset that was shared previously can also be obtained from here.

Your task is to step through the notebook, running each cell in turn and learning how its outputs are generated.

Notice that the analysis consists of operations applied to the Auto-MPG data after it has been loaded into a pandas DataFrame.
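To give a flavour of what "operations applied to a DataFrame" means, here is a minimal sketch; it is not taken from the shared notebook. The column names (`mpg`, `cylinders`, `weight`, `model_year`) are assumptions about how the file is labelled, and the six rows are made-up stand-in values rather than real records, so that the example runs without the data file.

```python
import pandas as pd

# Illustrative stand-in rows only: these values are invented for the example
# and are NOT records from the real Auto-MPG file. Column names are assumed.
cars = pd.DataFrame({
    "mpg": [18.0, 15.0, 24.0, 31.0, 27.0, 36.0],
    "cylinders": [8, 8, 4, 4, 4, 4],
    "weight": [3500, 3700, 2400, 2000, 2300, 1900],
    "model_year": [70, 70, 74, 76, 80, 82],
})

# Summary statistics (count, mean, std, quartiles) for every numeric column.
print(cars.describe())

# Group rows by cylinder count and average the fuel economy within each group.
mean_mpg_by_cyl = cars.groupby("cylinders")["mpg"].mean()
print(mean_mpg_by_cyl)

# Boolean indexing: keep only the rows whose weight exceeds 3000.
heavy = cars[cars["weight"] > 3000]
print(heavy)
```

With the real data you would typically replace the hand-built DataFrame with a call such as `pd.read_csv(...)` on the downloaded file; the operations afterwards stay the same.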

The notebook also shows how the matplotlib and seaborn libraries can be used to visualise the data.
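As a rough illustration of the two plotting libraries working side by side, the sketch below draws a histogram with matplotlib and a scatter plot with seaborn. It is not the notebook's own code: the column names are assumed, and the rows are invented stand-in values rather than real Auto-MPG records.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs as a plain script; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Illustrative stand-in rows only: invented values, assumed column names.
cars = pd.DataFrame({
    "mpg": [18.0, 15.0, 24.0, 31.0, 27.0, 36.0],
    "cylinders": [8, 8, 4, 4, 4, 4],
    "weight": [3500, 3700, 2400, 2000, 2300, 1900],
})

# One figure with two side-by-side axes.
fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# matplotlib: distribution of fuel economy.
ax_hist.hist(cars["mpg"], bins=5)
ax_hist.set_xlabel("mpg")
ax_hist.set_ylabel("number of cars")

# seaborn: weight against mpg, coloured by cylinder count.
sns.scatterplot(data=cars, x="weight", y="mpg", hue="cylinders", ax=ax_scatter)

fig.tight_layout()
```

seaborn draws onto ordinary matplotlib axes, which is why the `ax=` argument lets both plots share one figure.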

For the rest of this module, we will use Python libraries such as these for data exploration, model building, prediction and validation.