Objective: Practice regression by using data to defeat wine snobs.
Defeat of the wine snobs
The book SuperCrunchers (1) tells the story of Orley Ashenfelter, an economist who is really into wine. Most red wines are not drunk until years after they are made, and the problem with marketing these wines is that it is difficult to determine the quality of the wine until it is opened. This means that a wine producer may bottle a new harvest of wine in 2012, and from that wine she will be able to know something about its future quality. However, even though she has spent perhaps a million dollars producing the 2012 wine, it will not be sold and consumed until 2016, and she doesn't know how much revenue to expect. Acquiring a loan using the wine as collateral is difficult if the bank doesn't know the wine's true quality, and the amount of money spent promoting a wine needs to be determined in advance, before the wine quality is known!
Predictions of future wine quality are available by having wine experts taste the wine in 2012 and guess at its future quality, but these guesses are not very reliable. Orley is not only a wine enthusiast but also enjoys crunching data, and he wondered whether regression might provide useful forecasts. What explanatory variables could help predict the quality of wine in the future?
The climatic conditions when the grapes were grown should have some impact. The best wines are produced from grapes with concentrated juices, and that requires hot temperatures and dry conditions. Rain might be helpful in the winter when the vines are growing, however. Certainly, then, rainfall and temperature might predict wine quality, and Orley constructed the following regression to see if this was the case.
[Equation 1] predicted wine quality = a0 + a1(winter rainfall) + a2(average temperature) + a3(harvest rainfall)
Orley estimated his regression and liked what he saw, and he decided not only to predict future wine quality but to announce his predictions in advance through a newsletter called Liquid Assets. His regression received some attention, including a front-page article in the New York Times, but it was his prediction about 1989 Bordeaux wines that really generated interest.
It was still 1989, the wine was freshly bottled, and it would not be drunk until years later, but he predicted that the 1989 Bordeaux would be unbelievably good, and he then voiced even more praise for the 1990 wines. That alone wasn't shocking. What was shocking was that all the wine snobs disagreed and loudly criticized Orley and his methods, seemingly insulted that an economist would question the opinions of the world's greatest wine authorities.
Orley stoically accepted the criticism, waiting for the future to vindicate him, and it did. When the 1989 and 1990 Bordeaux were opened and consumed, they were exactly what Orley predicted. An economist with one equation defeated the entire wine-snob industry! Luckily for us, Orley made his data available to me, and we will estimate his regression now.
The wine regression
Download Orley's wine data here. The dependent variable is the wine price index, as quality is generally measured by the price at which the wine is sold. The other three variables are explanatory variables, and you estimate the regression exactly like the softball regression in the previous article, except that there are three explanatory variables instead of two. Those who forgot how to execute a regression can consult the YouTube tutorial here.
Figure 1—Regression Predicting Wine Quality
From the regression: predicted wine price = a0 + a1(winter rainfall) + a2(average temperature) + a3(harvest rainfall)
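For readers who would rather script the estimate than run it in Excel, here is a minimal sketch of the same kind of least-squares fit in plain Python. It is not Orley's procedure or his data: the climate rows below are made up, and the toy prices are manufactured from the coefficients printed in Equation 2 with no noise added, so the fit should simply hand those coefficients back. The names fit_ols and solve_linear_system are hypothetical helpers written for this illustration.

```python
# Coefficients copied from Equation 2; used here only to manufacture toy prices.
TRUE_COEFS = [-3.612871, 0.000602, 0.222877, -0.000954]

# Made-up rows of (winter rainfall, average temperature, harvest rainfall).
# These are NOT Orley's observations.
TOY_ROWS = [
    (600.0, 17.1, 160.0),
    (690.0, 16.7, 80.0),
    (502.0, 17.2, 130.0),
    (420.0, 16.1, 110.0),
    (582.0, 16.4, 187.0),
    (830.0, 17.5, 38.0),
    (574.0, 16.0, 292.0),
]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so each row operation updates both sides.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [a0, a1, a2, a3] by solving the normal equations (X'X) a = X'y."""
    design = [[1.0, *row] for row in rows]  # the leading 1.0 carries the intercept a0
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Toy prices generated exactly from Equation 2, so the fit is a consistency check.
toy_prices = [
    TRUE_COEFS[0] + sum(c * v for c, v in zip(TRUE_COEFS[1:], row))
    for row in TOY_ROWS
]
estimates = fit_ols(TOY_ROWS, toy_prices)

for name, value in zip(["a0", "a1", "a2", "a3"], estimates):
    print(f"{name} = {value:.6f}")
```

With real data the prices contain noise, so the estimates would not match any preset values, and a statistics package would also report the p-values that this sketch leaves out.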
The first thing we ask of the regression output is whether each variable is statistically significant, meaning: can we really say the variable affects wine quality? Our rule is to deem a variable statistically significant if its p-value is 0.05 or less, and according to this rule all three are significant. Greater winter rainfall and higher growing season temperatures really do increase the wine's future price, and more rainfall at harvest lowers its expected future price.
Orley did not just assess the influence of each climatic variable; he gave precise predictions. He plugged the 1989 values of winter rainfall, average temperature, and harvest rainfall into the regression, and it computed a very high price. The regression summarizes what happened in the past, and it found that in past years when climatic conditions were similar to 1989, those wines ended up selling for a high price. Naturally, the 1989 wines should sell for a high price also.
Graphing the relationship between harvest rainfall and wine quality
If someone wants to understand how harvest rainfall affects wine quality but finds that the coefficients provide little insight, they are likely to want a graph of wine quality against harvest rainfall. This would be easy if harvest rainfall were the only explanatory variable, but what do we do with the other two?
The typical strategy is to assume all other variables equal their averages in the data. This requires us to calculate the average winter rainfall using the same data the regression came from and to set winter rainfall equal to this average. After doing the same thing for growing season temperature, the regression is reduced to one explanatory variable: harvest rainfall.
A quick remark about calculating averages: this can be done easily in Excel without a formula. Simply select all the data for the variable of interest and then look in the lower right-hand corner, where it tells you the sum, the count (number of observations), and the average. Do this, and the winter rainfall average will equal 608.41 and the average temperature will be 16.48. Plug these numbers into the regression for the following result.
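The plug-in step can be sketched in a few lines of Python. The coefficients are the ones printed in Equation 2 later in this article; the function name and the two example vintages are hypothetical, chosen only to show the mechanics, and they are not the actual 1989 values.

```python
# Coefficients as printed in Equation 2 of this article.
A0 = -3.612871         # intercept
A_WINTER = 0.000602    # effect of one more unit of winter rainfall
A_TEMP = 0.222877      # effect of one more degree of average temperature
A_HARVEST = -0.000954  # effect of one more unit of harvest rainfall


def predict_price_index(winter_rain, avg_temp, harvest_rain):
    """Plug one vintage's climate values into the estimated regression."""
    return (A0
            + A_WINTER * winter_rain
            + A_TEMP * avg_temp
            + A_HARVEST * harvest_rain)


# Hypothetical vintages (assumed values for illustration only).
typical = predict_price_index(608.41, 16.48, 150.0)  # average winter rain and temperature
hot_dry = predict_price_index(608.41, 17.50, 50.0)   # warmer season, drier harvest

print(f"typical vintage: {typical:.4f}")
print(f"hot, dry vintage: {hot_dry:.4f}")
```

The hot, dry vintage comes out higher because the temperature coefficient is positive and the harvest rainfall coefficient is negative, which is the same logic behind a high predicted price for a year with favorable weather.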
[Equation 2] predicted wine quality = -3.612871 + 0.000602(608.41) + 0.222877(16.48) - 0.000954(harvest rainfall) = 0.426457 - 0.000954(harvest rainfall)
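The arithmetic behind the collapsed intercept can be checked with a short Python sketch using only the numbers printed above. The variable names are my own. Because the printed coefficients and averages are rounded, the result agrees with the printed 0.426457 to about four decimal places rather than exactly; the small gap presumably reflects unrounded values in the original calculation.

```python
# Numbers as printed in Equation 2 and the preceding paragraph.
intercept = -3.612871
winter_coef, winter_avg = 0.000602, 608.41
temp_coef, temp_avg = 0.222877, 16.48

# Fold the two averaged variables into a single constant term.
winter_term = winter_coef * winter_avg
temp_term = temp_coef * temp_avg
reduced_intercept = intercept + winter_term + temp_term

print(f"winter rainfall term:     {winter_term:.6f}")
print(f"average temperature term: {temp_term:.6f}")
print(f"reduced intercept:        {reduced_intercept:.4f}")
```

What remains is a straight line in harvest rainfall alone, with this constant as its intercept and -0.000954 as its slope.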
Now that the regression is reduced to one explanatory variable, it can be easily graphed by inserting different values for harvest rainfall and plotting the results. One might pause and ask what values of harvest rainfall should be used. A logical answer, I believe, is to set the minimum value to zero and the maximum value in the chart to a little beyond the maximum value in the sample. That maximum value in the sample is 292, so plotting rainfall from 0 to 300 seems reasonable.
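As a rough sketch of how the points for such a graph could be generated, the Python below evaluates the reduced line from Equation 2 over harvest rainfall from 0 to 300. The step of 50 and the function name are arbitrary choices for illustration; the resulting pairs can be handed to Excel or any plotting tool.

```python
# Reduced one-variable line as printed in Equation 2.
REDUCED_INTERCEPT = 0.426457
HARVEST_SLOPE = -0.000954


def predicted_quality(harvest_rain):
    """Predicted wine price index with the other two variables held at their averages."""
    return REDUCED_INTERCEPT + HARVEST_SLOPE * harvest_rain


# Harvest rainfall from 0 to 300 inclusive; the step of 50 is an assumption.
rain_values = list(range(0, 301, 50))
points = [(rain, predicted_quality(rain)) for rain in rain_values]

# Print a small table: harvest rainfall, then predicted quality.
for rain, quality in points:
    print(f"{rain:>4}  {quality:.4f}")
```

Because the regression is linear, two points would be enough to draw the line; the extra points just make the table easier to read.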
Figure 2—Graph From The Wine Regression
References
(1) Ayres, Ian. 2007. SuperCrunchers. Bantam Dell.