Each year we get bombarded with projections, predictions, and betting lines about season win totals. Such an endeavor is often foolhardy because injuries and depth charts are exceptionally difficult to estimate. Nonetheless, we try year in and year out. I think it is useful in that it gives us the chance to temper our expectations each year. The problem, though, is that we often use the projections and predictions at the beginning of the year, promptly forget them, and never assess how well they fit the actual results. Also, one thing I want to make clear . . . I know nothing about betting. I just grabbed the over/under. In no way am I suggesting that the Vegas line is the only line or that any difference in accuracy between the systems will make you any money. None of that is my concern. I am more focused on the Vegas line as being representative of a generic "mob of people" model.
One thing to recognize is the usage of the terms prediction and projection. A projection in this exercise is an estimation of what would happen given a set of assumptions. When you read about games won using the PECOTA, CHONE, or CAIRO projections . . . it is not a prediction. No one is saying that the Orioles will win 79 games. They are saying they are projected to win 79 games. A prediction is an estimate of the actual outcome, a foretelling of a future event or series of events. Does that make sense? Projections are often used within the framework of a prediction, but they really are not synonymous. Anyway, I digress.
I will be using the current Vegas over/under for season wins as a sort of crowd model, while using a composite of ZiPS, CHONE, CAIRO, MARCEL, and PECOTA from the Replacement Level Yankees Blog to represent a projection-based model. First, I will present the current Vegas lines and the current model projections (only CAIRO, so far). After the jump, a few graphs and some analysis of how each model performed from 2006 to 2009.
Current Vegas MLB 2010 OVER/UNDER SEASON WIN TOTALS
In parentheses are the current projected wins based on CAIRO (this will be adjusted when all projection systems are incorporated)
AL East
YANKEES 95.5 (99)
RED SOX 94.5 (95)
RAYS 90 (95)
ORIOLES 76 (71)
BLUE JAYS 72.5 (70)
AL Central
TWINS 84.5 (82)
WHITE SOX 82.5 (87)
TIGERS 78.5 (72)
INDIANS 75.5 (77)
ROYALS 72 (70)
AL West
ANGELS 85 (80)
RANGERS 83.5 (82)
MARINERS 82.5 (80)
A'S 79 (78)
NL East
PHILLIES 92.5 (90)
BRAVES 85.5 (85)
MARLINS 81 (72)
METS 81 (80)
NATIONALS 70.5 (71)
NL Central
CARDINALS 87.5 (91)
CUBS 83.5 (86)
BREWERS 80.5 (81)
REDS 79.5 (84)
ASTROS 73.5 (66)
PIRATES 69 (73)
NL West
DODGERS 85.5 (91)
ROCKIES 84 (83)
GIANTS 82.5 (77)
DIAMONDBACKS 82 (83)
PADRES 72.5 (78)
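For anyone who wants to see where the two columns disagree most, here is a rough sketch that ranks teams by the gap between the CAIRO number and the Vegas line. The numbers are copied from the listing above; the function name `biggest_gaps` and the choice to rank by absolute gap are my own assumptions for illustration, not part of either system.

```python
# (Vegas over/under, CAIRO projected wins), copied from the listing above.
LINES = {
    "YANKEES": (95.5, 99), "RED SOX": (94.5, 95), "RAYS": (90, 95),
    "ORIOLES": (76, 71), "BLUE JAYS": (72.5, 70),
    "TWINS": (84.5, 82), "WHITE SOX": (82.5, 87), "TIGERS": (78.5, 72),
    "INDIANS": (75.5, 77), "ROYALS": (72, 70),
    "ANGELS": (85, 80), "RANGERS": (83.5, 82), "MARINERS": (82.5, 80),
    "A'S": (79, 78),
    "PHILLIES": (92.5, 90), "BRAVES": (85.5, 85), "MARLINS": (81, 72),
    "METS": (81, 80), "NATIONALS": (70.5, 71),
    "CARDINALS": (87.5, 91), "CUBS": (83.5, 86), "BREWERS": (80.5, 81),
    "REDS": (79.5, 84), "ASTROS": (73.5, 66), "PIRATES": (69, 73),
    "DODGERS": (85.5, 91), "ROCKIES": (84, 83), "GIANTS": (82.5, 77),
    "DIAMONDBACKS": (82, 83), "PADRES": (72.5, 78),
}


def biggest_gaps(lines, top_n=3):
    """Return (team, cairo - vegas) pairs, largest absolute gap first.

    A negative gap means CAIRO is lower than the Vegas line.
    """
    gaps = [(team, cairo - vegas) for team, (vegas, cairo) in lines.items()]
    return sorted(gaps, key=lambda pair: abs(pair[1]), reverse=True)[:top_n]


for team, gap in biggest_gaps(LINES):
    print(f"{team:13s} {gap:+.1f}")
```

On these numbers the largest gaps are the Marlins (CAIRO 9 wins under the line), the Astros (7.5 under), and the Tigers (6.5 under).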
The first graph shows the predictive ability of both the Vegas and Projection systems for every data point generated from 2006 to 2009. As you can see, the R² values for the Vegas (0.28) and Projection (0.30) systems are pretty much equivalent. In a raw sense, they predict equally well. As most projection systems can predict about 75% of performance, it is understandable that quantitative systems and qualitative systems would be pretty similar.
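If you want to run the same kind of check on your own numbers, a minimal sketch of an R² calculation follows. It assumes R² here means the squared Pearson correlation between predicted and actual wins (what a simple linear trendline reports). The sample win totals are made-up placeholders, not the 2006 to 2009 data.

```python
def r_squared(predicted, actual):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return (cov * cov) / (var_p * var_a)


# Hypothetical placeholder values, for illustration only.
predicted_wins = [92, 85, 81, 78, 70]
actual_wins = [88, 90, 75, 83, 66]
print(round(r_squared(predicted_wins, actual_wins), 2))
```

An R² near 0.3 means the preseason number explains roughly 30% of the variation in actual wins, which leaves a lot of room for surprises.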
The next graphic is a table of the standard deviation of the difference between each system's number and the actual value. It varied from around 6 to 11 during these four years, with the Projection system being narrowly more accurate in three of the four years, but not significantly so. A rough estimate is that 95% of all teams will fall within two standard deviations of their predicted total. So, using the composite standard deviation, a team predicted to win 79 games would range between 64 and 94 wins. That is the realm of possibility. Since 2001, only the 2007 Yankees were able to get the Wild Card with as few as 94 wins. A 79 win team is effectively out of the postseason based on these numbers, although there is a 1 in 40 chance for a team to defy those odds and win more than 94 games. The only example of this would be the Vegas prediction for the 2008 Tampa Bay Rays.
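The arithmetic behind that range can be sketched in a few lines. The 7.5 win standard deviation is inferred from the 64 to 94 range around 79, and treating the misses as normally distributed is my assumption for illustration; the function names are hypothetical.

```python
from statistics import NormalDist


def two_sd_range(predicted_wins, sd):
    """Range covering roughly 95% of outcomes under a normal assumption."""
    return predicted_wins - 2 * sd, predicted_wins + 2 * sd


def upper_tail(predicted_wins, sd, threshold):
    """Chance of finishing above `threshold` wins, assuming normal errors."""
    return 1 - NormalDist(mu=predicted_wins, sigma=sd).cdf(threshold)


low, high = two_sd_range(79, 7.5)       # SD of 7.5 inferred from the text
print(low, high)                        # 64.0 94.0
print(upper_tail(79, 7.5, 94))          # a little over 2%
```

The 1 in 40 figure comes from the rule of thumb: if 95% of teams land inside the range, the remaining 5% split evenly above and below, leaving 2.5% on the high side. The exact normal tail is slightly smaller, closer to 1 in 44.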
Another way to gauge the potential for teams to outproduce their expected wins is to see when the systems miss by a large amount. The second graph shows all data points that are more than one standard deviation off the actual value. As one would expect, there is a line that the predicted and actual values do not cross. In other words, if your team is predicted to win 90 games, then it is very difficult to win more than one standard deviation above that. In fact, the highest prediction on this graph that ever resulted in a somewhat significant underestimation was 89 wins. Once you hit that level, there really is not much room to break out. It is exceptionally difficult to win more than 97 games. The opposite end is not as much of a hard line, as teams predicted to win as few as 68 games still underperformed. This may be a result of a dispersal of assets at the trade deadline or an influx of substandard talent that is permitted to play more often in September.
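Picking out the big misses is simple to reproduce. The sketch below computes the standard deviation of the misses and keeps the data points that fall more than one standard deviation from the actual total. The team records are hypothetical placeholders, and using the sample standard deviation (`statistics.stdev`) is my assumption about how the cutoff is defined.

```python
import statistics


def error_sd(records):
    """Sample standard deviation of (actual - predicted) across records."""
    errors = [actual - predicted for _, predicted, actual in records]
    return statistics.stdev(errors)


def large_misses(records, sd):
    """Keep records whose miss is larger than one standard deviation."""
    return [
        (team, predicted, actual)
        for team, predicted, actual in records
        if abs(actual - predicted) > sd
    ]


# (team, predicted wins, actual wins): made-up values for illustration.
records = [
    ("Team A", 90, 88),
    ("Team B", 75, 90),
    ("Team C", 82, 80),
    ("Team D", 70, 55),
    ("Team E", 85, 86),
]

sd = error_sd(records)
for team, predicted, actual in large_misses(records, sd):
    print(team, predicted, actual, actual - predicted)
```

Plotting the surviving points with predicted wins on one axis and actual wins on the other gives the same kind of picture as the second graph.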
Part II will go a bit more into the data, looking at situations where the Vegas and Projection systems disagree and whether this disagreement isolates one system as being more accurate than the other.