11 September 2014

The Problem With WAR

Jeff Passan recently wrote an article about WAR that sparked some major debates on Twitter. In response, Dave Cameron wrote this article. Reading these articles got me thinking about WAR and one of the major issues with it.

When I was a kid I didn’t know about WAR, wRC+, ISO, wOBA or other advanced stats that quantify a player’s offensive contributions. I’m not even sure I was familiar with OBP, SLG or OPS. I judged a player’s offensive ability based on his batting average, home runs and RBIs. If I really wanted to look closely at a player I would maybe consider how many doubles and triples he hit as well as how many times he walked and struck out.

By themselves these stats have limited utility but they are still useful. Batting average does give a basic idea of how often a player gets on base even if it's not as good as OBP. Home runs and RBIs do indicate how much power a player has even if they're not as useful as ISO or even SLG. Most people would have agreed back then that a home run is more valuable than a triple, a triple more valuable than a double, a double more valuable than a single and a single more valuable than a walk.

Because these basic statistics exist, one understands the need to compare different facets of offense. For example, is a player batting .260/30/80 (average/home runs/RBIs) more valuable than a player batting .310/15/50? You can't really answer the question with those tools, and in fact the answer doesn't really matter. What’s important is that these basic statistics allow us to ask the question. It becomes clear that both the number of hits and the quality of hits matter.

The fact that these basic offensive statistics exist makes it easier for us to understand more advanced offensive statistics. Take SLG for example. It makes sense that certain hits are more valuable than others. And it makes logical sense that a home run, which is worth four bases, has four times the value of a single, which is worth one base. Basic offensive statistics allow us to build that model. Once we start assigning arbitrary values to home runs, triples, doubles, singles, walks, etc., it becomes easier to understand how we can assign values to them based on historical data.
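As a rough sketch of the total-bases model described above, here is SLG in a few lines of Python. The stat line is made up for illustration and the function name is my own.

```python
def slugging(singles, doubles, triples, home_runs, at_bats):
    """Slugging percentage: total bases divided by at-bats."""
    # Each hit is weighted by the number of bases it is worth.
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


# Hypothetical season: 100 singles, 30 doubles, 5 triples and
# 25 home runs in 550 at-bats.
slg = slugging(100, 30, 5, 25, 550)
print(f"SLG: {slg:.3f}")
```

The weights 1, 2, 3 and 4 are the "arbitrary values" in question. Swapping them for values estimated from historical data is the step that leads toward stats like wOBA.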

Furthermore, most basic offensive statistics are easy to define. There’s little difficulty in defining when a player hits a double. It’s reasonably straightforward because either the batter is on second or he isn’t. One can argue about the value of a double but in the vast majority of cases it’s hard to argue whether or not a double occurred.

Decades of statistics have accustomed us to quantifying the value of offensive production and have given us easily defined and understood tools to do so.

But basic statistics aren’t nearly as helpful in quantifying defensive contributions. The only basic defensive statistics are errors, putouts, assists and fielding percentage. Putouts and assists are fairly straightforward but errors are often subjective. Basic defensive statistics aren’t as objective as basic offensive statistics.

Furthermore, you can’t really use fielding percentage to compare two players playing the same position, let alone compare players playing at different positions. Fielding percentage simply doesn't quantify range. It doesn’t quantify whether one fielder made a lot of excellent plays or a few excellent plays. I would say that it’s similar to batting average. Batting average is a helpful basic statistic but one wouldn’t use it by itself to quantify offensive contributions.

All I’m saying is that I’d feel far more comfortable claiming that a player with a .330/30/120 line is better offensively than a player with a .210/15/50 line than I would claiming a player with a .985 fielding percentage is better defensively than a player with a .975 fielding percentage.
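To make the range problem concrete, here is a small Python sketch using the standard fielding percentage formula, (putouts + assists) / (putouts + assists + errors). Both shortstops and their totals are invented, and I am assuming they played the same number of innings.

```python
def fielding_percentage(putouts, assists, errors):
    """Share of fielding chances handled without an error."""
    chances = putouts + assists + errors
    return (putouts + assists) / chances


# Two hypothetical shortstops over the same (assumed) number of innings.
steady = {"putouts": 200, "assists": 400, "errors": 9}
rangy = {"putouts": 250, "assists": 490, "errors": 19}

for name, line in (("steady", steady), ("rangy", rangy)):
    plays_made = line["putouts"] + line["assists"]
    pct = fielding_percentage(**line)
    print(f"{name}: {plays_made} plays made, fielding pct {pct:.3f}")
```

The second fielder converts 140 more balls into outs and still ends up with the lower fielding percentage, because the stat only counts chances he reached. Nothing in it records the balls the first fielder never got to.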

Basic defensive statistics tell us very little about defense. Prior to advanced defensive statistics, it was pretty much impossible for the casual fan to objectively quantify defense for large populations of players. After all, most casual fans simply aren’t watching a majority of the games for a majority of teams. It would be pretty time consuming.

This means that advanced statistics like UZR were really the first attempts to actually objectively quantify defense. The problem is that basic defensive statistics don’t give people a frame of reference. It’s hard to explain why defense is as valuable as UZR says it is because it has never been quantified before. People watching games probably would agree that Cal Ripken is a good defender. No one watching games would have claimed that Cal Ripken’s glove is worth 15 runs a season and seriously meant exactly 15 runs. And without using an advanced statistic like UZR it would be hard to tell whether being worth 15 runs defensively is good or bad.

This makes explaining defensive statistics a challenge because people don't really have the background to understand them. The way to explain them is by understanding that the concept is difficult and by openly disclosing all of the data and the formulas. The concept behind UZR has been explained as comparing the play that actually happened (hit/out/error) to data on similarly hit balls in the past to determine how much better or worse the fielder did than the "average" player.
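The concept can be sketched in Python, with the caveat that this is not the actual UZR calculation. The out probabilities, the runs-per-play value and all the names below are assumptions I made up for illustration, since the real inputs aren't public.

```python
from dataclasses import dataclass


@dataclass
class BattedBall:
    # Assumed share of similarly hit balls that fielders have
    # historically turned into outs (hypothetical numbers).
    avg_out_prob: float
    # Whether this fielder actually made the play.
    was_out: bool


# Assumed run value of converting one extra ball into an out.
RUNS_PER_PLAY = 0.8


def plays_above_average(balls):
    """Sum of (actual result - expected result) over every ball."""
    return sum((1.0 if b.was_out else 0.0) - b.avg_out_prob for b in balls)


# Three hypothetical chances: an easy play made, a hard play made,
# and a moderately difficult play missed.
chances = [
    BattedBall(avg_out_prob=0.9, was_out=True),
    BattedBall(avg_out_prob=0.3, was_out=True),
    BattedBall(avg_out_prob=0.6, was_out=False),
]

plays = plays_above_average(chances)
runs = plays * RUNS_PER_PLAY
print(f"{plays:+.2f} plays, {runs:+.2f} runs versus an average fielder")
```

Everything hinges on the avg_out_prob attached to each ball. Change those numbers and the fielder's rating changes with them.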

But the public isn’t informed of how likely it is for a fielder to successfully field a certain play. For a given player I can see whether he was successful in any given at bat. I can’t tell whether a player was successful defensively on any given play. If a ball is hit to the outfield I don't know how likely it is that a different fielder would have made the play according to UZR. Unlike with offense, there is no play index for UZR. It is impossible to determine a fielder’s UZR when at home or away. It is impossible to determine a fielder’s UZR for a given month. In short, UZR and most other advanced defensive metrics pretty much boil down to one number for a given player and you can either take it or leave it. They may be the best numbers that we have available. But we’re pretty much taking their creators’ word that they’re accurate. It shouldn’t come as a surprise that many people simply choose to leave it.

The problem with WAR is that it relies on defensive statistics that haven't been adequately explained. Unlike statistics like FIP and WAR, it is impossible to derive UZR numbers on our own. It doesn't really matter whether UZR is accurate or not. The point is that we have no background to determine its accuracy and it is impossible to test it ourselves.
The discussion about WAR has nothing to do with whether position players deserve 57% of WAR or 52% of WAR. It isn't about whether pitchers are responsible for 93% of run prevention. It's about the fact that defensive metrics aren't adequately available to the public for study.

Without further disclosure, people are going to resist accepting advanced defensive metrics and as a result are going to question WAR. If the public is unable to repeat the methodology, then these metrics start to resemble opinion. People aren’t going to fully trust a metric that can’t be fully understood or duplicated, and they shouldn't be asked to do so.


Anonymous said...

The population of baseball fans who understand WAR is in the minority. The population of those same individuals who have a basic understanding of how UZR, DRS, etc. are measured is even smaller. That doesn't mean that WAR is flawed. I don't understand the entire science behind how every piece of my car works, but that doesn't make my car defective.
Currently, there is no conceivable means of measuring defensive impact with great statistical precision. Sources citing defensive metrics often caution that the numbers are subject to some controversy. To discredit WAR, in its entirety, as a result of this shortcoming is naive.

Matt Perez said...

Fully agree that even though there's a problem with WAR it doesn't mean the stat is wrong, flawed or should be discredited.

Pat Holden said...

I don't think the car analogy fairly represents Matt's point. His point isn't that he doesn't understand UZR, as you say you don't about your car, but that the information to fully understand UZR isn't available to us.

FearItself said...

Whether the aspect of advanced defensive metrics you describe can be called a "flaw" depends on what you consider the function of those metrics. If they're intended (1) primarily as a way for professional insiders (GMs, managers, journalists) to objectively assess player value, it's not a flaw (as long as the metrics are accurate). If they're intended to also (2) appeal to hardcore fans willing and able to understand how they're derived (like many of this blog's readers and most/all of its writers), and as a signalling mechanism to build a sense of community among those hardcore fans, then it's not a flaw.
If they're intended to also (3) enhance the casual baseball fan's appreciation of the game, then it is a flaw. In this last case, the fans in question probably need to not only know how the metrics are derived, but be both able AND WILLING to do it themselves (at least a few times) to buy into the reality and importance of the metrics. Frankly, that doesn't seem all that likely. How much of a problem is that, though? Not every fan engages the game in the same way, and advanced metrics don't have to make sense to everyone in order to be valuable for purposes (1) and (2). (I speak as a fan with only a marginal interest in such metrics myself; I'm interested enough to read and enjoy the posts here, but not enough to want to learn how to write one.)

Jon Shepherd said...

I would agree that this is not a flaw of that metric.

It is more a limitation to public acceptance of the metric.

I think the diction difference there is minor, but quite important.

Matt Perez said...

The word "flaw" was not used in this post. It is a problem that UZR and other advanced defensive metrics are "closed-source". I agree that doesn't mean they're flawed. Jon had a good summary of my point.

It is possible to understand the theory behind UZR. It is impossible to understand how it works practically.

The importance is huge. It is possible to explain stats like OPS or wRC+ to the average fan who is willing to listen. How are you possibly supposed to explain why a player has such and such UZR?

You can discuss the theory but in the final analysis you have to say that we use UZR numbers or whatever because they're the best numbers available even though they could have been created in a random-number generator for all we can prove. And if that's the case, how can you possibly know whether they're right? Or whether there isn't something like chaining for fielding?