03 December 2015

Why ERA Doesn't Predict The Future

There is a recurrent debate about the value of ERA. Recently, it flared up when Ryan Pollack wrote about Miguel Gonzalez. Ryan argued that Miguel performed as expected because his FIP and xFIP were within his career norms. Others argued that Miguel Gonzalez underperformed this season because his ERA was so much worse than in previous seasons. They note that he had a strong ERA in the first half of the season and was bad in the second half of the season due to injuries.

On the one hand, a starter with a good ERA is generally successful and therefore this stat quantifies past performance. On the other hand, year-to-year ERA has low correlation and therefore a good ERA has little predictive value. Correlation, while important, is only a single number and doesn’t fully elucidate how drastically ERA changes from year to year. More analysis is necessary to comprehend its relevance to the future.

To test ERA’s volatility, I built a dataset of each starter’s ERA and innings pitched from 1998 to 2015. For each season, I rank the starters by ERA and sort them into eight groups, with each group covering the same number of innings, so that the most successful pitchers are in the first group and the least successful are in the eighth and last group. I use eight groups instead of five to better mimic real life, because teams are forced to use substandard options when a member of their original starting rotation gets injured or performs unsatisfactorily. These options wouldn’t ideally be in the rotation and therefore shouldn’t be considered #5 starters. Finally, I take all starters who threw at least 100 innings as a starter in a given season and merge them with their performance in the following season, provided they threw at least 100 innings as a starter in that season too. Pitchers who throw fewer than 100 innings as a starter in the next season are considered not to qualify.
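The grouping step can be sketched in Python as follows. This is a minimal illustration, not the code behind the post: the `StarterSeason` record, the `assign_groups` function and the rule for pitchers who straddle a group boundary (placing each one by the midpoint of his innings) are all assumptions, since the post doesn't say how boundary cases were handled.

```python
from typing import NamedTuple


class StarterSeason(NamedTuple):
    """One pitcher's season as a starter (hypothetical record layout)."""
    name: str
    era: float
    innings: float


def assign_groups(seasons, n_groups=8):
    """Rank starters by ERA (best first) and cut the ranking into
    n_groups slices that each cover roughly the same total innings."""
    ranked = sorted(seasons, key=lambda s: s.era)
    total = sum(s.innings for s in ranked)
    groups = {}
    cumulative = 0.0
    for s in ranked:
        # Assumption: a pitcher belongs to the slice that contains
        # the midpoint of his innings in the cumulative ranking.
        midpoint = cumulative + s.innings / 2
        group = min(int(midpoint / total * n_groups) + 1, n_groups)
        groups[s.name] = group
        cumulative += s.innings
    return groups


# Made-up data: eight starters with 100 innings each and rising ERAs.
sample = [StarterSeason(f"p{i}", 2.0 + 0.5 * i, 100.0) for i in range(8)]
groups = assign_groups(sample)
```

With equal innings for everyone, each made-up starter lands in his own group, best ERA first. With real data, a group would hold several pitchers whose innings add up to about one eighth of the league total.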

The results from 1998 to 2015 indicate that, on average, pitchers in the top groups regress to the mean while qualified pitchers in the lower groups improve toward the mean. Over 180 innings, the difference between pitchers in the first and second groups is 6.4 runs, between the second and third groups 5 runs, and between the third and fourth groups 4.6 runs. There is minimal difference between qualified pitchers in the fourth through eighth groups. The chart below illustrates performance for these groups.
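For a sense of scale, ERA is earned runs per nine innings, so a run gap over 180 innings is 20 times the ERA gap. The short Python sketch below does that conversion in both directions. The function names are illustrative, and the implied ERA gaps are back-calculated from the run figures above, not taken from the dataset.

```python
def era_gap_to_runs(era_gap, innings=180.0):
    """Earned runs separating two pitchers over the given innings."""
    return era_gap * innings / 9.0


def runs_to_era_gap(runs, innings=180.0):
    """ERA difference implied by a run gap over the given innings."""
    return runs * 9.0 / innings


# Run gaps quoted above for groups 1 vs 2, 2 vs 3 and 3 vs 4.
implied_gaps = [runs_to_era_gap(r) for r in (6.4, 5.0, 4.6)]
```

The three run gaps work out to ERA differences of about 0.32, 0.25 and 0.23.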





Qualified starters in the top group have an average ranking of 2.84 in the following year, while qualified starters in the second group have an average ranking of 3.54. Qualified starters in the third through sixth groups, as well as the eighth group, have an average ranking between 4.2 and 4.9, while qualified starters in the seventh group have an average ranking of 5.21. In short, as demonstrated by the second chart, there is little difference between qualified starters in the third through eighth groups.






ERA does seem to be relevant in determining which starters are most likely to throw 100 innings in the following season. Over 70% of starters ranked in the top group remain in one of the first five groups in the following year, which means that they’re still worth having in the rotation even if their performance is disappointing. That number drops to roughly 66% for starters in the second group, slightly more than 50% for starters in the third group, roughly 40% for starters in the fourth through sixth groups, around 30% for the seventh group and 20% for the eighth group. This suggests that starters who did poorly in one year generally don’t get the opportunity to throw 100 innings the following year if they struggle again.

This becomes even clearer when looking just at pitchers who throw 100 innings in both seasons. About 85% of #1 starters and 78% of #2 starters are in one of the first five groups. It drops to about 65% for #3 to #6 starters and 50% for #7 to #8 starters. Better pitchers have a higher success rate but only by a limited amount.
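The two sets of percentages differ only in their denominator: the first counts every starter in a group, the second only those who qualified again the next year. A minimal Python sketch of that distinction, assuming a hypothetical list of `(group_this_year, group_next_year)` pairs where `None` marks a pitcher who did not throw 100 innings as a starter the next season:

```python
def top_five_share(pairs, start_group, qualified_only=False):
    """Share of starters from start_group who land in groups 1-5
    the following year. Returns None if no starters match."""
    outcomes = [nxt for cur, nxt in pairs if cur == start_group]
    if qualified_only:
        # Drop pitchers who missed the 100-inning cutoff next season.
        outcomes = [nxt for nxt in outcomes if nxt is not None]
    if not outcomes:
        return None
    kept = sum(1 for nxt in outcomes if nxt is not None and nxt <= 5)
    return kept / len(outcomes)


# Made-up example: four group-1 starters, one of whom fails to qualify.
pairs = [(1, 1), (1, 3), (1, 7), (1, None)]
share_all = top_five_share(pairs, 1)
share_qualified = top_five_share(pairs, 1, qualified_only=True)
```

In this made-up sample, two of the four starters stay in the top five groups, but the share rises to two of three once the non-qualifier is dropped. That is the same direction as the gap between the two paragraphs above.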




This next chart shows how pitchers in each group perform when we disregard the inning limit for the following season. As discussed above, it seems reasonable that successful pitchers are more likely to throw 100 innings than unsuccessful pitchers, and therefore looking only at starters that threw 100 innings the following season may be a mistake. This chart indicates that there’s minimal difference between a #4, #5 or #6 starter and only a small difference between these starters and starters ranked either #7 or #8. Over a 180-inning period, the difference between the average #4 starter and the average #8 starter is 5.6 runs or about $4 million.





All in all, this shows that using ERA to project future performance has limited utility, because only the best pitchers go on to post better ERAs in the following year than everyone else. Such an analysis indicates that #1 and #2 starters should be highly valued because they’ll probably perform well for at least a year or two in the future, but that there’s little difference between back-end starters. ERA does, however, help predict which pitchers are more likely to throw a considerable number of innings as a starter in future years.

This further suggests that pitcher performance is highly variable. One shouldn’t expect a pitcher with the skill level of a #5 starter to perform like one in any given year. Indeed, that happened only about 10% of the time from 1998 to 2015. A #5 starter who actually throws 100 innings in a season is more likely to regress to the mean and improve in the following year. Such a starter is also just as likely to perform like a #2 starter as he is to perform like a #7 starter. A #1 starter performs like a #1 starter the next year only 28% of the time. It’s very possible, and should perhaps be expected, for decent starters to have either good or bad seasons.
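Figures like these come from a year-to-year transition table: for each starting group, the share of pitchers who land in each group the next season. The Python sketch below shows one way to build it. The `(group_this_year, group_next_year)` input format and the choice to drop non-qualifiers (marked `None`) are assumptions; the post doesn't specify which denominator its 10% and 28% figures use.

```python
from collections import Counter, defaultdict


def transition_rates(pairs):
    """Map each starting group to {next_group: share of qualified starters}."""
    counts = defaultdict(Counter)
    for cur, nxt in pairs:
        if nxt is not None:  # assumption: skip pitchers who did not qualify
            counts[cur][nxt] += 1
    rates = {}
    for cur, row in counts.items():
        total = sum(row.values())
        rates[cur] = {nxt: n / total for nxt, n in row.items()}
    return rates


def mean_next_group(pairs, start_group):
    """Average next-year group for qualified starters from start_group."""
    nexts = [nxt for cur, nxt in pairs if cur == start_group and nxt is not None]
    return sum(nexts) / len(nexts) if nexts else None


# Made-up example: five #5 starters, one of whom fails to qualify.
pairs = [(5, 5), (5, 2), (5, 7), (5, 4), (5, None)]
rates = transition_rates(pairs)
repeat_rate = rates[5].get(5, 0.0)  # share who repeat as #5 starters
```

`mean_next_group` gives the kind of average following-year ranking quoted earlier for each group, while `rates[i][i]` is the share of pitchers who repeat in group `i`.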

It’s worth noting that this has been the experience for the Orioles since 2012. By this metric, the Orioles have had three starter seasons ranked in groups 1, 2 or 3: Hammel in 2012, Miguel Gonzalez in 2014 and Chris Tillman in 2014. All three of these pitchers had an ERA around 5 in the following season and were ranked in the seventh group.

If teams do use ERA to predict future performance, then they should pay high premiums to get elite arms. Failing that, they should look at other metrics to see whether they can find some statistics with better results.

1 comment:

Anonymous said...

Recently non-tendered folks of interest include Henderson Alvarez. Seems like he might make a good cheap addition to the rotation if he can get healthy. Whaddya' think?