WAR, as you probably know, doesn’t think much of relief pitchers. The very best relievers in the game are generally worth +2 to +2.5 wins over a full season, or about the same as an average everyday player. This has caused quite a few people to state that WAR doesn’t work for relievers, because the results of the metric don’t match what they believe to be true about relief pitcher value. I think it works just fine.
While the quality of their work is very high, the quantity is low, which limits their total value. It’s nearly impossible to rack up huge win values while facing fewer than 300 batters per season. Yes, each of those batters faced is more critical to a win than a typical batter faced, but this is accounted for in WAR.
Since that article, a consensus has emerged that WAR-based systems still seem to undervalue relievers compared to actual baseball clubs, and people are now trying to understand the reasons for that gap. Rob Arthur of 538 noted that teams with a good bullpen are more likely to hold a one-run lead, and hypothesized that this may be one reason why relievers seem to be valued more highly by front offices than by the sabermetric consensus. An article in Baseball Prospectus asked why teams seem willing to spend millions on relievers when WAR(P) tells them that spending on relievers is a mistake. The author argues that WAR(P) doesn’t do a good job measuring reliever value and that Win Probability Added (WPA) helps explain teams’ reasoning. Finally, a more recent article in Fangraphs notes that there’s a sizable and growing gap between the public’s valuation of elite relief arms and the industry’s valuation.
I think the reason why WAR-based systems struggle to quantify the value of relievers is a focus on value over replacement rather than value over average. If one focuses solely on value over replacement, then a 1 WAR starter who pitches 200 innings is just as valuable as a 1 WAR reliever who pitches 60 innings. In this scenario, since a replacement player by definition produces 0 WAR, each player contributed 1 WAR more than a replacement player. And since a replacement-level starter is better than a replacement-level reliever, it makes sense to argue that the 1 WAR starter is even more valuable than the 1 WAR reliever.
However, the results are different if one focuses on value over average. Suppose there are two pitchers on two different teams: one produces 2 WAR over 60 innings, the other 2 WAR over 200 innings. If we presume that each team’s pitchers throw 1450 innings total and that the average team produces 14.3 pitching WAR, then the team with the pitcher who threw 60 innings needs to earn 12.3 WAR over its remaining 1390 innings to be average, while the other team needs to earn 12.3 WAR over its remaining 1250 innings. This example illustrates how two pitchers can earn the same amount of WAR, yet one can help his team more than the other.
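To make the arithmetic concrete, here is a small sketch of the example above. The 1450-inning staff total and the 14.3-WAR league-average staff come from the text; the function name is my own.

```python
# Worked version of the two-team example. Constants are taken
# from the article's assumptions, not from real league data.
TEAM_IP = 1450        # innings thrown by a full pitching staff
AVG_TEAM_WAR = 14.3   # pitching WAR of an average team

def rest_of_staff_rate(pitcher_war, pitcher_ip):
    """WAR per inning the rest of the staff must produce to reach average."""
    remaining_war = AVG_TEAM_WAR - pitcher_war
    remaining_ip = TEAM_IP - pitcher_ip
    return remaining_war / remaining_ip

reliever_team = rest_of_staff_rate(2, 60)    # 12.3 WAR over 1390 IP
starter_team = rest_of_staff_rate(2, 200)    # 12.3 WAR over 1250 IP
league_rate = AVG_TEAM_WAR / TEAM_IP         # ~0.00986 WAR per inning

print(round(reliever_team, 5))  # 0.00885
print(round(starter_team, 5))   # 0.00984
```

The rest of the reliever's team needs to produce noticeably less per remaining inning than the league rate, while the starter's teammates need to produce at nearly the league rate, so the same 2 WAR in fewer innings eases the team's burden more.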
Focusing on value over average is probably a better strategy than focusing on value over replacement. Each year, pitchers earn roughly 430 fWAR spread across all 30 teams, and from 2000-2016 there was only one team with negative pitching WAR. Ultimately, most teams have a large population of pitchers who are above replacement, and therefore need to consider not only how to maximize WAR but how to do so within a fixed budget of innings. Innings are a finite constraint that isn’t given enough consideration in WAR-based systems.
Recently, maybe even this week, Fangraphs updated its methodology for calculating pitching WAR. It’s complicated, but a simple and incomplete version of their method is that they determine Wins Per Game Above Replacement (WPGAR), multiply this by innings pitched, and then apply a few more adjustments. For our purposes, the only relevant fact is that their methodology focuses on replacement rather than average.
In order to determine a Wins Above Average metric, I found that from 2010-2016, an average pitcher earned 1 WAR per 101 innings. Therefore, to determine Wins Above Average, I take a pitcher’s WAR and subtract from it (Innings/101). For a more complicated Wins Above Average metric, I’d use Fangraphs’ methodology for determining wins above average and see whether our numbers are similar, but I didn’t learn about their updated methodology in enough time to perform the necessary calculations.
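The metric described above reduces to a one-line formula. A minimal sketch, using the article's 1-WAR-per-101-innings estimate and applying it to the earlier 2-WAR-in-60 vs. 2-WAR-in-200 example:

```python
# Sketch of the article's simple Wins Above Average metric:
# WAA = WAR - (IP / 101), where 101 is the estimated innings an
# average pitcher needs to earn 1 WAR (2010-2016).
AVG_IP_PER_WAR = 101

def wins_above_average(war, innings):
    return war - innings / AVG_IP_PER_WAR

# Same WAR, very different innings totals:
print(round(wins_above_average(2, 60), 2))   # reliever -> 1.41
print(round(wins_above_average(2, 200), 2))  # starter  -> 0.02
```

Under this formula the 60-inning pitcher is far above an average pitcher's pace, while the 200-inning pitcher is almost exactly average.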
This metric is friendlier towards elite relievers than Wins Above Replacement. For example, among all pitcher seasons from 2010-2016, Zach Britton’s 2016 season ranks 402nd in WAR but 184th in Wins Above Average.
There aren’t many surprises in the top ten pitchers. The absolutely amazing Clayton Kershaw is ranked #1 despite throwing only 149 innings. Rich Hill ranks 10th with a strong 110 innings. But the next ten pitchers include relievers Jansen, Miller, and Betances. Chapman is ranked 21st and Britton is ranked 25th.
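The re-ranking effect can be shown with a toy example. The pitcher names and stat lines below are purely illustrative, not the actual 2010-2016 leaderboard:

```python
# Hypothetical pitcher seasons (name, WAR, innings pitched) showing how
# sorting by Wins Above Average promotes an elite low-innings reliever.
pitchers = [
    ("Starter A", 4.0, 200),
    ("Starter B", 3.0, 190),
    ("Reliever C", 2.5, 65),
]

def waa(war, ip):
    return war - ip / 101  # the article's 1-WAR-per-101-innings average

by_war = sorted(pitchers, key=lambda p: p[1], reverse=True)
by_waa = sorted(pitchers, key=lambda p: waa(p[1], p[2]), reverse=True)

print([p[0] for p in by_war])  # ['Starter A', 'Starter B', 'Reliever C']
print([p[0] for p in by_waa])  # ['Starter A', 'Reliever C', 'Starter B']
```

The reliever's 2.5 WAR in 65 innings converts to roughly 1.86 WAA, leapfrogging the 190-inning starter's 1.12 WAA, which mirrors how Jansen, Miller, and Betances climb the leaderboard above.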
I built a basic model converting Wins Above Average to projected salary to see how elite reliever salaries might compare under this metric rather than Wins Above Replacement. The model could use some improvement and isn’t ready for primetime, but it seemed to indicate that relievers might be 30-40% more valuable using Wins Above Average instead of Wins Above Replacement. If so, this could help explain why elite relievers are receiving higher salaries than WAR-based systems suggest.
WAR-based systems presume that players replacing other players are only at replacement level. This presumption means that relievers who earn 2 WAR in just 60 innings are valued equally to starters who earn 2 WAR in 200 innings. Until WAR-based systems find a way to account for production over a limited number of innings, it seems plausible that they will continue to underestimate the value of elite relievers.