08 January 2016

The Challenge Of Quantifying Outfield Defense

A few weeks ago, a commenter asked whether the Orioles should consider having Wieters play in the outfield. Jon asked for predictions of Wieters' defensive ability as an outfielder via Twitter, and 54% of respondents thought he’d have a -50 UZR/150. While the Orioles aren't going to use Wieters in the outfield, this discussion piqued my interest because that is a large number of runs for a corner outfielder to concede.

UZR seems to suggest that such an outcome is reasonable. Hanley Ramirez was terrible defensively last year in LF and had a -30 UZR/150. Manny Ramirez had a few years where he had a -30 UZR/150 in LF. Unlike these players, Wieters has no experience playing in a corner outfield position and could very possibly struggle to make even simple defensive plays. If the worst corner outfielders are worth -30 runs, then Wieters could be worth -50 runs.

On the other hand, there are a limited number of balls hit to corner outfielders in a game and most of them are either easy or impossible plays. According to Inside Edge Data, only 40 to 50 balls hit at corner outfielders per season are even in question. A defensive outfielder would need to consistently botch the easiest plays that require minimal range to be worth 50 runs below average. Even a slow professional baseball player would be challenged to perform that poorly.

For context, Chris Davis’ offense was ranked at only 40 runs above average and Machado’s offense was ranked at 30 runs above average, meaning that having Wieters in the outfield would have at least the same impact as replacing Machado's offense with that of a league average third baseman.

Furthermore, Machado is an excellent fielder and is only about 10 runs above average defensively at third base. Simmons is an excellent shortstop and is usually 20 runs above average defensively. If UZR values the worst corner outfielders at -30 runs defensively, does this mean that there's a problem with UZR? It is difficult to see how an experienced corner outfielder could cost his team that many runs.

Fangraphs argues that WAR works because there is a strong correlation between a team’s total WAR and its actual record. Glenn DuPaul from the Hardball Times argues that if WAR does a good job of explaining where wins come from, the linear regression equation should have an intercept roughly equal to replacement level (47.7 wins) while each WAR should be worth roughly one win.

When I ran this test with data from 2002-2015, the resulting formula had an intercept of 47.6 wins with each WAR being worth 1.00093 wins. When I split WAR into pitching and position WAR, the resulting formula was 46.84 + 0.89 * Position WAR + 1.20 * Pitching WAR, which has interesting implications when thinking about this article. In any event, there can be no debate about whether WAR works, even if it could potentially be improved.
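The team-level test described above amounts to an ordinary least-squares fit of wins on WAR. The snippet below is a minimal sketch of that idea, not the code used for this article: the `fit_line` helper and the five team WAR totals are made up for illustration, and the wins are generated from a 47.7-win baseline plus one win per WAR so that the fit has a known answer to recover.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical team WAR totals, NOT real data.
team_war = [20.0, 33.5, 41.0, 48.5, 27.0]
# Wins built from the relationship the test looks for:
# a replacement-level baseline plus one win per WAR.
team_wins = [47.7 + 1.0 * w for w in team_war]

intercept, slope = fit_line(team_war, team_wins)
print(f"wins = {intercept:.1f} + {slope:.2f} * WAR")
```

With real data, `team_war` and `team_wins` would hold one entry per team-season, and an intercept near 47.7 with a slope near 1 is the pattern the test is checking for.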

This is relevant because UZR is part of the WAR formula. One way to determine whether UZR works is by breaking WAR up into its components and seeing whether UZR is a significant factor for predicting teams’ total wins. I ran such a test using a stepwise regression and found that it is significant. It was the third most important factor behind pitching and hitting, although considerably less important than either of the other two. This implies that UZR’s efficacy is limited, but still better than nothing.
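For readers who want to try something similar, here is a rough sketch of one kind of stepwise regression: forward selection that keeps adding whichever component raises R^2 the most. The article does not say which stepwise procedure or stopping rule was used, so the `forward_stepwise` helper, the `min_gain` threshold and the random "team-seasons" below are all assumptions for illustration, and none of the numbers are real.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of a least-squares fit of y on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    total = y - y.mean()
    return 1.0 - (residuals @ residuals) / (total @ total)

def forward_stepwise(features, y, min_gain=0.01):
    """Greedily add the feature with the largest R^2 gain; stop below min_gain.

    features: dict mapping a name to a 1-D array.
    Returns a list of (name, cumulative R^2) in the order features were added.
    """
    chosen, history, best_r2 = [], [], 0.0
    remaining = list(features)
    while remaining:
        scores = {}
        for name in remaining:
            cols = np.column_stack([features[k] for k in chosen + [name]])
            scores[name] = r_squared(cols, y)
        winner = max(scores, key=scores.get)
        if scores[winner] - best_r2 < min_gain:
            break
        best_r2 = scores[winner]
        chosen.append(winner)
        remaining.remove(winner)
        history.append((winner, best_r2))
    return history

# Synthetic run totals (random numbers, NOT real team data).
rng = np.random.default_rng(0)
n_teams = 400
components = {
    "batting": rng.normal(0, 60, n_teams),
    "pitching": rng.normal(0, 50, n_teams),
    "uzr": rng.normal(0, 25, n_teams),
    "noise": rng.normal(0, 25, n_teams),  # unrelated to wins by construction
}
wins = (81
        + (components["batting"] + components["pitching"] + components["uzr"]) / 10
        + rng.normal(0, 1.5, n_teams))

selected = forward_stepwise(components, wins)
for name, r2 in selected:
    print(f"{name:9s} cumulative R^2 = {r2:.3f}")
```

In this toy setup the fielding column is picked up after the two larger components and the pure-noise column is left out, which is the shape of result described above. A real analysis would use significance tests or an information criterion rather than a fixed R^2 gain.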

The next question is whether UZR works for each position and specifically LF or RF. For this test, I input each team’s defense score at each position for each season (to replicate this, it’s necessary to grab the data from the Fielding Tab rather than the Batting Tab at Fangraphs). For most positions, UZR is helpful. Surprisingly, first base defense seems to have the highest correlation with wins (which possibly implies something about how defense is measured at other positions), while third base and center field data are also helpful. Data from shortstops, catchers, left fielders and right fielders are less helpful but still add some certainty. Surprisingly, second base defense as measured by UZR doesn’t add predictive power to the model.

The last question is which aspects of UZR are relevant for each position. Just because one aspect of UZR is relevant for a given position doesn’t mean that all of the aspects are relevant. In order to test this, I used each team’s range, error, arm and double play scores for each position to see which ones can be used to model wins. None of these factors apply to catchers and therefore this method can’t measure their performance.

My results suggested that range is relevant for first basemen, third basemen, center fielders and shortstops, is only minimally relevant for second basemen, and has no relevance for corner outfielders. This is problematic because range is the largest component of UZR, which suggests that defense at three positions is valued improperly and in a form that doesn't help us predict team wins.

Likewise, errors are only relevant for third basemen and shortstops. The average team’s left fielders commit 5.2 errors over a season. The 75th percentile is 7 errors while the 25th percentile is 3 errors. Right fielders average 5.75 errors over an entire season, with the 75th percentile at 7 errors and the 25th percentile at 4 errors. Even if an outfield error costs a team .8 runs, the gap between the 25th and 75th percentiles is still only about three runs a season for left fielders and about two runs for right fielders. Similar stories can be told for center fielders (average = 4.8, 75th percentile = 6, 25th percentile = 3), first base (average = 10, 75th percentile = 13, 25th percentile = 7) and second base (average = 13, 75th percentile = 16, 25th percentile = 10).
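One reading of the "about three runs" and "about two runs" figures is the spread between a 25th-percentile team and a 75th-percentile team, priced at .8 runs per error. The sketch below works through that arithmetic for the three outfield spots; the `quartile_gap_runs` helper is just an illustrative name, and the .8 figure is the "even if" value from the paragraph above rather than a measured run value.

```python
RUNS_PER_ERROR = 0.8  # the article's "even if" cost of an outfield error

# (25th percentile, 75th percentile) errors per team-season, from the text above
error_quartiles = {
    "LF": (3, 7),
    "RF": (4, 7),
    "CF": (3, 6),
}

def quartile_gap_runs(low, high, runs_per_error=RUNS_PER_ERROR):
    """Runs separating a 25th-percentile team from a 75th-percentile team."""
    return (high - low) * runs_per_error

for position, (low, high) in error_quartiles.items():
    print(f"{position}: {quartile_gap_runs(low, high):.1f} runs")
# LF: 3.2 runs
# RF: 2.4 runs
# CF: 2.4 runs
```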

The only statistic that is relevant for corner outfielders is arm strength. This would work in favor of a player like Wieters, as he’d likely struggle with range and errors but has a strong arm. It also means that teams are right if they discount UZR range values for corner outfielders, and players like Heyward could be overvalued due to possibly inflated defensive ratings.

The bottom line is that it is hard for us to accurately measure corner outfield defense, and we therefore can't determine how much damage a bad defensive outfielder inflicts on his team.


Hot Dog said...

I was one of the people who voted that Wieters would be worth -40 runs in left field. You have done nothing that makes me think I might be wrong. "According to Inside Edge Data, only 40 to 50 balls hit at corner outfielders per season are even in question." What does this even mean? A surgeon will tell you different types of surgery have different levels of risk. That said, you get a family doctor to do that surgery and what you think is a risk-free surgery suddenly becomes very risky. You cannot take the Inside Edge data at face value because you are applying it to a situation that it is not designed to address.

Wieters is big, slow, and has not seen anything other than a couple games at first base since little league. Hanley Ramirez had some trouble, but he has speed to make up for many of his problems with taking routes to the ball. How many balls are hit to left field in a year? 800-1000? I have trouble thinking that Matt Wieters would have trouble with only 50 of those balls.

You also confuse relevant with required. What may separate corner outfielders is the value of their arms; it does not mean the other attributes are not important. Hanley Ramirez has a strong arm. A very strong arm. The article disagrees with itself. I like what you guys do here, but sometimes you all get lost in your spreadsheets and need to look up at the actual game being played.

Jon Shepherd said...

I would not say that the article is without merit. I think it does raise a good point about what kind of skills might separate outfield talent and how that relates to overall winnings. That said, one great challenge of all of this is that the metrics are tied around who plays the positions. Management already weeds out players it finds unacceptable for certain positions. It may well be that Wieters is unlike left fielders and that metrics designed to describe left fielders may poorly describe non-left fielders.

Matt Perez said...

"I like what you guys do here, but sometimes you all get lost in your spreadsheets and need to look up at the actual game being played."

You're seriously arguing that I should look at the actual game being played to attempt to measure a situation that would never happen in a real game? Is a player like Wieters ever going to play in the outfield? Don't you realize the only way to possibly judge this situation would be to "look in spreadsheets"? Well, maybe a scout could answer it.

"You have done nothing that makes me think I might be wrong."

I have no opinion on whether you're right or wrong about Wieters. The only things I know are that -50 runs is a large number and that using UZR to gauge his value is a mistake. If UZR doesn't work, then Wieters (presuming other metrics also fail) could be worth -10 or -100 runs. Of course, the same goes for other corner outfielders also.

To a large extent, you're thinking that this article is making a point that simply never was intended.

"How many balls are hit to left field in a year? 800-1000? I have trouble thinking that Matt Wieters would have trouble with only 50 of those balls."

I was presuming familiarity with Inside Edge data. It's probably closer to 500 or 600. The vast majority of balls (practically all of them) fall into one of three categories.

The first category doesn't show up on Inside Edge. Basically, suppose a guy hits a ball between left and center field. This ball is uncatchable and can't be credited to either the left or center fielder. Therefore, the data is omitted. Needless to say, neither Wieters nor anyone else will catch these balls.

The second category is labeled impossible. These can be credited to a fielder but no one is going to catch them. Wieters won't catch these balls but neither will anyone else. It's not that he won't have trouble with them; it's that you can replace him with anyone and they won't catch them either.

The third category is labeled routine. These are easy plays and are typically converted at a 99% success rate. The question in judging Wieters' defense is whether he'll convert these easy plays at a 99% success rate, a 97-98% success rate or a 60% success rate. Realistically, any outfielder who actually plays in the corners will have at least a 98% success rate or will be moved to a different position.

The other categories are the balls in question. Presumably, an outfielder that will stick at a position will convert at least some of these plays.
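To put rough numbers on the routine-play question, here is a back-of-the-envelope sketch. Both inputs are placeholders and not Inside Edge figures: it assumes 400 routine chances in a season, and it borrows the article's .8 runs per outfield error as the cost of a missed routine play (a dropped fly ball could easily cost more). The function names are made up for illustration.

```python
def runs_lost_on_routine_plays(chances, actual_rate, baseline_rate=0.99, runs_per_miss=0.8):
    """Extra runs conceded versus a fielder converting routine plays at baseline_rate."""
    extra_misses = chances * (baseline_rate - actual_rate)
    return extra_misses * runs_per_miss

def rate_needed_to_lose(target_runs, chances, baseline_rate=0.99, runs_per_miss=0.8):
    """Conversion rate at which a fielder gives back target_runs on routine plays alone."""
    return baseline_rate - target_runs / (chances * runs_per_miss)

ROUTINE_CHANCES = 400  # placeholder assumption, NOT an Inside Edge number

for rate in (0.98, 0.97, 0.60):
    lost = runs_lost_on_routine_plays(ROUTINE_CHANCES, rate)
    print(f"{rate:.0%} conversion: about {lost:.1f} runs lost")

needed = rate_needed_to_lose(50, ROUTINE_CHANCES)
print(f"rate needed to lose 50 runs on routine plays: {needed:.1%}")
```

Changing `ROUTINE_CHANCES` or `runs_per_miss` moves the answers a lot, which is really the point: the estimate depends heavily on inputs nobody has measured for a player like Wieters.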

Philip said...

What is the most valuable skill of an outfielder? Eye (choosing routes to the ball), speed (covering ground), or arm strength/accuracy?
I'm no expert but I think speed is far and away the most important quality.
And Wieters has none. He Lumbers, he Trundles, he does not Dash.

Matt Perez said...

Philip - Speed/Routes are probably the most important.

But the real question is whether any of these skills actually have much of an impact. Outfield defense should be given one weight if it's potentially worth 60 runs (on a -30 to 30 scale) as UZR suggests and another if it's potentially worth 20 runs (on a -10 to 10 scale).

If the public considers outfield defense to be worth a 60 run spread and teams consider it to be worth a 20 run spread, then there's going to be a disagreement about the proper emphasis to place on defense.

Jon - If corner outfield UZR doesn't have a relationship to overall winnings, then it's meaningless. WAR works because it describes the relationship between performance and wins.

Jon Shepherd said...

Matt - Well, I have issues with some of your statements. A lack of significance does not mean that something is without meaning. A variable may lack statistical significance, but still improve the accuracy of a model. That is part of the nitty-gritty, dirty nature of variable co-dependencies that we often see. As such, it can be difficult to tease apart confounding factors. We can certainly say that given the population and the metrics at hand certain aspects appear to drive the relationship, but I think it is premature to say the converse.

I also think the underlying nature of the statistic probably merits more consideration. What does it mean to miss a ball, runwise? What really is a catchable ball? How does the definition of replacement level player as it reflects defensive ability impact WAR, in general? How does the year-in sample size issue impact UZR and, in turn, WAR? How does positional alignment impact individual metrics? Without going into this beyond a cursory level, I think the safe thing we can say is that UZR is noisy and that it has some trouble being as clearly related to wins as better established metrics like pitching and batting metrics.

This brings us back to whether a foggy or scratched spyglass has any utility. Issuing absolutes like "meaningless" just does not seem justified yet.

Matt Perez said...

I just have a few minutes and probably won't have a chance to respond until after the weekend.

Sure, I could have included some more of the analysis that I did on the subject. I felt that it would have been overkill but it seems you feel differently.

I think UZR in general is noisy but unquestionably has utility. I'm less convinced when it comes to corner outfield UZR (specifically range and error). I would also note that I'm not arguing that corner outfield defense has no impact on wins. Certainly a good outfielder has some impact even if I'm not convinced we know how to quantify that number.

Anonymous said...

There is so much to discuss and comment on here that I cannot keep it all straight in my head. Basically, I side with Matt on this because the statistics back up what you should expect from common sense. For example, infielders should get more chances than outfielders because the ball has to go a shorter distance to get to them. 2nd base is less important than shortstop because there are fewer LH hitters. Are ground balls to the outfield counted as routine or impossible? Either way, they are hits.... The only reason you need an "arm" in RF is to throw to 3rd. Except RF get fewer chances than LF because there are fewer RH hitters. Elsewise, RF and LF are exactly the same. So one would want speed in LF and arm in RF. To compensate for range, one would tend to shift the OF so speedier players have more area to cover. Wieters would make a perfectly serviceable RF, as would Davis. I think the difference between them and Heyward is not as large as often stated. 30-50 DRS is an outrageous number - an idea borne out by Matt's statistics. Isn't arm strength's importance reduced by cutoff men? Heyward's low .200s spring batting averages hurt a team more than his defense saves. Davis' tears (and J Upt's for that matter) can do more to carry a team for a short time than any measure of defense which doesn't vary much from day to day. Atlanta rode an Upton tear in APRIL in 2013 to a division championship they could not blow no matter how they tried. The O's rode Davis' bat to respectability in September last year. Machado's and Hardy's defense is vastly more important than Jones'.

Jon Shepherd said...

It is not always purely about chances. A muffed play in the outfield is worse than a muffed play in the infield. Likewise...a muffed play by the 3B is worse than one by the SS. Where the ball goes matters quite a bit.

Anonymous said...

Matt's statistics are telling us how much it matters.

Jon Shepherd said...

Or, more accurately, how well this methodology can discern significance.

Jon Shepherd said...

Just estimating here...but I would imagine that secondary power would largely come up as not being related to winning percentage. Second, we might see secondary power as important for some positions rather than others because of selection bias.

It reminds me of when I was breaking apart performance of hitters by batting position. The final model found it statistically significant that speed related measures were negative for a leadoff man. I provided some conjecture why this failed the sniff test and my conclusion was that teams value speed so highly for the leadoff spot that it appears to be a significant variable that negatively impacts a lineup. At face value, that is likely true as it might show a major issue in team decision making. However, it does not mean speed is bad even though the model directly suggested that.

Matt Perez said...

My stats tell us nothing about how many runs Wieters would cost his club. I just think that an actual corner outfielder costing his team 30 runs is awfully high and that Wieters wouldn't cost his team 50 runs unless he showed an inability to convert more than 98% of routine plays (average is 99%+).

It makes complete sense that a player with Billy Hamilton's offensive numbers wouldn't be able to start unless they also had his speed although I'd be curious to know whether you included stats like stolen bases in your analysis. If, after taking speed and defense into account, the numbers were still negative then this would suggest that teams are overvaluing speed and should focus more on hitting ability.

Fortunately for me, the case I'm focusing on is relatively simple. Fangraphs has already defined my variables. All I need to do is the proper testing to ensure relevance. Your situation, where you need to actually define the relevant variables yourself, is far more complex.

Anonymous said...

Leadoff is another good subject for this kind of analysis. I really do think speed is overvalued at leadoff. There should be no question that OBP is the primary statistic for leadoff. A primo example of this (admittedly anecdotal) is the case of Markakis at leadoff on a very successful O's offense (and to a lesser degree Machado). He seemed out of place but he was always on base for Machado, Davis, and Jones. I also agree, again, with the assessment that 30-50 DRS (or lost) is an awfully high number for a corner OF of any sort. Besides, Buck is likely to rotate players a lot in ways where their defense does the least damage. I don't really think Cruz in LF or Davis in RF cost the O's much in runs. Rotating Wieters, Trumbo, Davis, Paredes through DH/RF/1B and C (in Wieters' case) is not going to result in defensive disaster at any of those positions. The O's could still pick up an Austin Jackson on the cheap or use Joey Rickard - these would be solid additional options much better than last year. And Reimold has potential to be a Markakis-type leadoff guy - good OBP and not much else.

Anonymous said...

I'm the reader who raised the idea of moving Wieters to LF (but only as a last resort if all else fails)... and I must say that you've done quite a bit of extra work for a literally-out-of-left-field suggestion that will never be adopted in reality. This analysis does raise an interesting question for GMs, though:

* To what extent can elite hitters (which definitely does not include Wieters) be shoehorned into positions that they can't play? For example, how well does a corner outfielder need to hit for the 50-run defensive penalty to drop him to about +2.0 WAR?

* As a corollary, can any team afford to take this idea to its logical conclusion-- say, by putting nine 1B/DH types into their starting lineup and forgoing defense completely?

Matt Perez said...

It's more a question for analysts, honestly. Lots of analysts think that a player like Lough has the potential to be an average LF because of his defense, despite his inability to hit. Many GMs tend to differ. If you believe that UZR accurately depicts defensive value, then this supports the analysts' view. If UZR doesn't, then this supports the GMs' view.

Mike Trout was worth 55 runs offensively last year. If his defense was valued at 50 runs below average in a corner, then he'd be on pace to be about a 2 win player in a corner (after taking positional adjustments into account). So, we're talking something like a .300/.400/.590 line.

It is unlikely that any team would take this idea to its logical conclusion unless it wants to lose 100 games.

Anonymous said...

Re: Matt Perez

Thanks for the response-- the "you can't forgo defense completely even in LF" conclusion is pretty much what people would expect from common sense anyway.

As a follow-up question, what would the minimum offensive output required for +2.0 WAR look like if the defensive penalty is more realistic, say -20 runs (e.g. Miguel Cabrera at 3B in 2013) or -30 runs (your Hanley Ramirez example)?

Alternatively, what if we're talking about a -50 UZR/150 defender at 2B/SS? (Theoretically, this information could be useful for teams that are stuck with a very poor hitter at a position where defense is traditionally valued...)

Matt Perez said...

In general, ten runs is worth roughly 1 win or 1 WAR.

Replacement Value is +2 WAR.
Position Value is based on the position and can be found via Fangraphs.

That means in your first case, Wieters would have a -7.5 position penalty for playing corner outfield, a +20 replacement bonus, roughly a -5 base running score (he's slow) and maybe a +2 league score. If he was a -20 defender, then he'd need to be a +30 hitter to be worth 2 WAR, or do slightly better than Machado did last year (.286/.360/.502).

If he was a -30 defender, he'd have to be a +40 hitter or put up a similar line to Davis's .260/.360/.560.

In your second case, presuming he was a SS, he'd get a 7.5 position bonus, a +20 replacement bonus, I'll presume a +0 base running bonus and a +2 league score. Figure he's at +30. A -50 defender would need to put up Chris Davis numbers to be worth 2 WAR.
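The arithmetic in these cases can be written out as a short sketch. It only adds up the run components named above and divides by ten; the `batting_needed` helper and its parameter names are made up for illustration and are not Fangraphs' actual WAR implementation.

```python
RUNS_PER_WIN = 10.0  # "ten runs is worth roughly 1 win"

def batting_needed(target_war, fielding, positional,
                   replacement=20.0, baserunning=0.0, league=2.0):
    """Batting runs above average required to reach target_war.

    All inputs are in runs; the defaults are the +20 replacement bonus
    and +2 league score used in the comment above.
    """
    other_runs = fielding + positional + replacement + baserunning + league
    return target_war * RUNS_PER_WIN - other_runs

# Corner outfield: -7.5 positional penalty, -5 base running.
print(batting_needed(2.0, fielding=-20, positional=-7.5, baserunning=-5))  # 30.5
print(batting_needed(2.0, fielding=-30, positional=-7.5, baserunning=-5))  # 40.5

# Shortstop: +7.5 positional bonus, +0 base running, -50 defender.
print(batting_needed(2.0, fielding=-50, positional=7.5))  # 40.5
```

The exact outputs of 30.5 and 40.5 runs line up with the rounded "+30 hitter" and "+40 hitter" figures above.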