I recommend that you create an Advertising folder and save the notebook you are working on in that folder.
This is where pandas is very useful, as it provides much of the basic data handling required for machine learning in Python.
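Below is a minimal sketch of loading the data with pandas. It assumes the CSV has columns named `TV`, `Radio`, `Newspaper` and `Sales`, and that these may be preceded by an unnamed row-index column. The function name `load_advertising` is hypothetical. The default path comes from the download instructions at the end of this section, so adjust it to wherever the file sits relative to your notebook.

```python
from pathlib import Path

import pandas as pd

# Assumed location of the downloaded file; adjust relative to your notebook.
DATA_PATH = Path("Advertising/data/Advertising.csv")


def load_advertising(path=DATA_PATH) -> pd.DataFrame:
    """Read the Advertising data into a DataFrame (hypothetical helper).

    Assumes columns TV, Radio, Newspaper and Sales, possibly preceded by an
    unnamed row-index column, which is dropped if present.
    """
    df = pd.read_csv(path)
    # pandas names a header-less first column "Unnamed: 0"; drop any such column.
    keep = ~df.columns.str.startswith("Unnamed")
    return df.loc[:, keep]
```

In the notebook, run `ads = load_advertising()` and then `ads.head()` and `ads.describe()` for a quick first look at the data.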
This is an opportunity to use the exploratory data analysis (EDA) techniques that we covered earlier in the module.
Use scatter plots, overlaid on the same plot, to get a sense of how Sales varies with Newspaper, Radio and TV.
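The sketch below shows one way to overlay the three scatter plots with matplotlib. It uses the same assumed column names as above, and `plot_sales_vs_media` is a hypothetical helper, not part of any library.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_sales_vs_media(df: pd.DataFrame):
    """Overlay scatter plots of Sales against each advertising budget."""
    fig, ax = plt.subplots(figsize=(7, 5))
    # One marker style per medium so the overlaid clouds stay distinguishable.
    for column, marker in [("TV", "o"), ("Radio", "s"), ("Newspaper", "^")]:
        ax.scatter(df[column], df["Sales"], marker=marker, alpha=0.6, label=column)
    ax.set_xlabel("Advertising budget")
    ax.set_ylabel("Sales")
    ax.legend()
    return fig, ax
```

`plot_sales_vs_media(ads)` draws all three budgets on one set of axes. If the budgets cover very different ranges, separate panels (for example `plt.subplots(1, 3, sharey=True)`) can be easier to read alongside the overlaid version.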
As we did in class, we are going to compare different linear models, beginning with the full set of natural predictors: TV, Radio and Newspaper.
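The statsmodels formula interface can fit the full model. It uses the same `Sales ~ TV + Radio` style of formula as the rest of this exercise. This sketch assumes statsmodels is installed and the column names used above. `fit_full_model` is a hypothetical name.

```python
import pandas as pd
import statsmodels.formula.api as smf


def fit_full_model(df: pd.DataFrame):
    """Fit OLS of Sales on all three advertising budgets (assumed column names)."""
    # The formula interface adds an intercept term automatically.
    model = smf.ols("Sales ~ TV + Radio + Newspaper", data=df)
    return model.fit()
```

`print(fit_full_model(ads).summary())` shows the coefficient table, R², and the residual statistics (Omnibus, Durbin-Watson, Jarque-Bera) that statsmodels reports for OLS.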
As with the Diamond data, you will also need the R² and root-mean-square residual error (RMSRE) of each fit as metrics for this and the other OLS fits to the Advertising data.
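The two helpers below are both hypothetical. They sketch one way to compute the two metrics for any fitted statsmodels OLS result and to tabulate them for several candidate formulas. Here the RMSRE divides the residual sum of squares by n. That choice is an assumption: if the course definition divides by the residual degrees of freedom instead, `np.sqrt(results.mse_resid)` gives that variant.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def fit_metrics(results):
    """Return R^2 and RMSRE for a fitted statsmodels OLS result.

    RMSRE is assumed to be sqrt(mean(residual^2)), i.e. dividing by n.
    """
    residuals = np.asarray(results.resid)
    rmsre = np.sqrt(np.mean(residuals ** 2))
    return {"R2": results.rsquared, "RMSRE": rmsre}


def compare_formulas(df: pd.DataFrame, formulas):
    """Fit each OLS formula and collect its metrics in one table."""
    rows = []
    for formula in formulas:
        results = smf.ols(formula, data=df).fit()
        rows.append({"formula": formula, **fit_metrics(results)})
    return pd.DataFrame(rows)
```

For example, `compare_formulas(ads, ["Sales ~ TV + Radio + Newspaper", "Sales ~ TV + Radio"])` puts the fits side by side.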
You should review the model summary, residuals, etc., to look for evidence that the fit satisfies the assumptions of linear regression. You should make a note of your findings in nearby markdown cells.
The slides presented in class describe how to proceed. This is a "greedy" search technique (in the sense that you choose the best available option at each stage, without trying to decide what would be the best option over all stages). You should find that the best model with the existing predictors has the formula `Sales ~ TV + Radio`, but that it can be improved if the TV-Radio interaction term `TV:Radio` is added as a candidate feature.
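The greedy search can be sketched as forward selection. The criterion here, adjusted R², is an assumption; the slides may use a different one, such as a p-value threshold or AIC. `forward_select` is a hypothetical helper. The search starts from the intercept-only model. At each stage it adds whichever remaining term most improves the criterion, and it stops when no term helps.

```python
import pandas as pd
import statsmodels.formula.api as smf


def forward_select(df: pd.DataFrame, candidates, response="Sales"):
    """Greedy forward selection of OLS terms by adjusted R^2 (assumed criterion).

    Returns the selected terms in the order they were added, plus a history of
    (formula, adjusted R^2) pairs, one per accepted stage.
    """
    def formula_for(terms):
        return f"{response} ~ " + " + ".join(terms)

    selected = []
    remaining = list(candidates)
    best_score = 0.0  # adjusted R^2 of the intercept-only model
    history = []
    while remaining:
        # Score every model that adds exactly one of the remaining terms.
        scored = [
            (smf.ols(formula_for(selected + [term]), data=df).fit().rsquared_adj, term)
            for term in remaining
        ]
        score, term = max(scored)
        if score <= best_score:
            break  # no single addition improves the criterion, so stop
        best_score = score
        selected.append(term)
        remaining.remove(term)
        history.append((formula_for(selected), score))
    return selected, history
```

Run it once with `["TV", "Radio", "Newspaper"]` as candidates, then again with `"TV:Radio"` added. This sketch does not force `TV` and `Radio` into the model before `TV:Radio`. If you want to respect that hierarchy, restrict the candidates at each stage accordingly.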
With the Diamond data from Regression 1, there was only one feature, so it was easy to visualise the residuals, since they were functions of one variable (weight). For the Advertising data, with more than one feature (as in the best model above), plotting residuals is more difficult. It is possible to plot the residuals against one feature at a time (e.g., TV or Radio), suppressing the other, and this provides some indication of their distribution, but interpreting the resulting pair of plots requires care.
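The one-feature-at-a-time residual plots can be sketched as below. The sketch assumes the `Sales ~ TV + Radio` model and the column names used above, and `plot_residuals_by_feature` is a hypothetical helper. Each panel shows the residuals against one feature while ignoring the other, which is why the pair of plots has to be read together.

```python
import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.formula.api as smf


def plot_residuals_by_feature(df: pd.DataFrame, formula="Sales ~ TV + Radio", features=("TV", "Radio")):
    """Fit the given OLS formula and plot its residuals against each feature in turn."""
    results = smf.ols(formula, data=df).fit()
    fig, axes = plt.subplots(
        1, len(features), figsize=(5 * len(features), 4), sharey=True, squeeze=False
    )
    for ax, feature in zip(axes[0], features):
        # Each panel ignores the other feature(s), so a pattern here may be driven by them.
        ax.scatter(df[feature], results.resid, alpha=0.6)
        ax.axhline(0, color="grey", linestyle="--")
        ax.set_xlabel(feature)
    axes[0][0].set_ylabel("Residual")
    return fig
```

Passing `formula="Sales ~ TV + Radio + TV:Radio"` lets you compare the residuals before and after the interaction term is added.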
Plotting multivariate residuals directly requires advanced visualisation techniques.
The data file (right-click and download as `Advertising/data/Advertising.csv`) is available for your use.