14 January 2015

Orioles' Defense Isn't as Good as You Might Think

On Monday, Ryan wrote an article discussing how the Orioles' pitching isn't as good as you think. Commenters reasonably argued that FIP consistently penalizes teams with good defense and therefore using that metric will underestimate the Orioles' pitching. However, this raises a few questions. Do teams with good defenses usually have an ERA lower than what their FIP suggests? If so, how large is the average impact? Steamer projects the Orioles' pitching staff to have an ERA of 4.04 and a FIP of 4.30, which implies that the O's defense will prevent about 42 runs over an entire season. Does this underestimate or overestimate the projected impact of the defense?
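As a rough check on that 42-run figure, here is a minimal Python sketch of the conversion from a per-nine-innings gap to runs per season. It assumes a full season of 162 nine-inning games; the function name and that innings assumption are mine, not Steamer's.

```python
# Assumption: a full season is 162 games of nine innings each.
GAMES_PER_SEASON = 162


def era_fip_gap_to_runs(era, fip, games=GAMES_PER_SEASON):
    """Runs per season implied by the gap between FIP and ERA.

    Both ERA and FIP are expressed per nine innings, so the gap is
    multiplied by the number of nine-inning games in a season.
    """
    return (fip - era) * games


# Steamer's projected ERA and FIP for the Orioles, as quoted above.
projected_runs_saved = era_fip_gap_to_runs(era=4.04, fip=4.30)
print(round(projected_runs_saved))  # 42
```

Actual team innings vary a little from year to year, so figures computed from real innings totals will not match this approximation exactly.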

In order to answer this question, I looked at each team from 2000-2014 and determined their ERA, FIP, and fielding score defined by Fangraphs. I then split them into quintiles based on fielding score. I didn’t use Fangraphs' defense metric because that punishes AL teams for having a DH. For this kind of analysis it simply isn’t as accurate as the fielding score metric. I also included the 2014 Orioles as their own special category to see whether they were an outlier. This chart shows the results:


Group       ERA    FIP    ERA-FIP   ERA-FIP (runs)   Fielding (runs)
1_Worst     4.55   4.32    0.24          38.32           -48.64
2_Bad       4.36   4.28    0.08          12.74           -17.05
3_Average   4.29   4.25    0.04           7.20            -0.27
4_Good      4.03   4.17   -0.14         -22.84            18.92
5_Best      4.08   4.29   -0.21         -34.69            47.93
2014 Os     3.44   3.96   -0.52         -84.24            56.40

There does appear to be a relationship between teams’ fielding scores and whether or not they do better than their FIP suggests. The teams with the worst defense had an ERA about .24 points larger than their FIP, which meant they allowed over 38 runs per year more than their FIP suggested. Teams with the best defenses had a FIP that was nearly .21 more than their ERA, which meant they allowed nearly 35 fewer runs than their FIP suggests. Fangraphs' fielding metric successfully predicts which teams will do better or worse than their FIP.
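For readers who want to try the grouping themselves, here is a minimal Python sketch of the quintile split. The rows are synthetic placeholders generated from a made-up formula, not real team data, and the function names are hypothetical. The real analysis would load each team-season's ERA, FIP, and fielding score instead.

```python
from statistics import mean


def split_into_quintiles(teams):
    """Sort team-seasons by fielding score and cut them into five near-equal groups."""
    ordered = sorted(teams, key=lambda t: t["fielding"])
    n = len(ordered)
    return [ordered[i * n // 5:(i + 1) * n // 5] for i in range(5)]


def summarize(group):
    """Average ERA-FIP gap and average fielding score for one group."""
    return {
        "era_minus_fip": mean(t["era"] - t["fip"] for t in group),
        "fielding": mean(t["fielding"] for t in group),
    }


# Synthetic placeholder rows (NOT real data): fielding runs from -45 to 45,
# with ERA set by an invented linear rule purely so the example runs.
teams = [
    {"fielding": f, "fip": 4.20, "era": 4.20 - f / 200}
    for f in range(-45, 50, 10)
]

quintiles = split_into_quintiles(teams)
summaries = [summarize(group) for group in quintiles]
for label, row in zip(["1_Worst", "2_Bad", "3_Average", "4_Good", "5_Best"], summaries):
    print(f"{label}: ERA-FIP {row['era_minus_fip']:+.2f}, fielding {row['fielding']:+.1f}")
```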

However, one would expect the difference between ERA and FIP, expressed in runs, to be similar in size to the average fielding score for each group. As the number of teams sampled increases, factors such as sequencing and luck should matter less. Yet as the fielding score moves further from zero (whether negative or positive), the gap between it and a team's ERA-FIP becomes more pronounced. For example, the worst clubs have an average fielding score of -49 runs, but their ERA is only .24 runs higher than their FIP, or about 38 extra runs allowed. The best clubs have an average fielding score of 48 runs, but their ERA is only .21 runs lower than their FIP, or about 35 runs saved. This suggests either that Fangraphs' fielding metric inflates the value of defense or that defense has diminishing returns.
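One way to make that comparison concrete is to divide each group's ERA-FIP gap (in runs) by its fielding score. The Python sketch below uses the figures from the table. The ratio itself is my own illustrative choice rather than part of the original analysis, and the 3_Average group is left out because its fielding score is almost zero.

```python
# (group, ERA-FIP in runs, fielding score in runs), copied from the table.
groups = [
    ("1_Worst", 38.32, -48.64),
    ("2_Bad", 12.74, -17.05),
    ("4_Good", -22.84, 18.92),
    ("5_Best", -34.69, 47.93),
    ("2014 Os", -84.24, 56.40),
]


def share_reflected(era_fip_runs, fielding_runs):
    """Fraction of the fielding score that shows up in the ERA-FIP gap.

    The sign is flipped because good fielding (positive score) should
    produce an ERA below FIP (negative gap).
    """
    return -era_fip_runs / fielding_runs


ratios = {name: share_reflected(gap, fielding) for name, gap, fielding in groups}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

A ratio below 1 means fewer runs showed up in ERA-FIP than the fielding score credits, and a ratio above 1 means more did. The two extreme quintiles come in below 1, while the 2014 Orioles come in well above it.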

The 2014 Orioles had a stunning .52 run difference between FIP and ERA. This is more than twice as high as the average difference between FIP and ERA for teams with even the best fielding scores. In fact, the 2014 Orioles had the third-largest difference between ERA and FIP from 2005-2014. This suggests that the Orioles' defense could be elite in 2015 and the staff still wouldn’t be expected to outperform its FIP by such a drastic amount. The Orioles' defense does explain why they were better than their FIP, but not why they were better by nearly 85 runs. It seems that the difference should be closer to 40 runs.

The other test that I did was a regression of each team's ERA-FIP difference on Fangraphs' fielding metric for every team from 2000 to 2014. My results were statistically significant with an R^2 of .3836 (R of .6194). When I tested all data from 1935 to 2014, my results were statistically significant with an R^2 of .4365 (R of .66). These are moderate to high correlations and suggest that Fangraphs' fielding metric can be used to predict which pitching staffs will do better than their FIPs suggest. However, an R^2 in this range also leaves most of the variance unexplained, which suggests that there are other relevant variables that the fielding metric alone doesn't capture. Furthermore, it is highly unlikely that uncontrollable factors such as hit sequencing would drive the results in a sample of nearly 2,000 seasons. As Beyond the Box Score notes, disparities between ERA and FIP are likely affected by pitching performance as well as fielding. This could further explain why the 2014 Orioles had such a large difference between their ERA and FIP, and could suggest even more inflation in Fangraphs' fielding metric.
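For anyone who wants to run this kind of regression, here is a minimal least-squares sketch in plain Python. It is fit to the five quintile averages from the table only because those are the numbers printed in this post. Group averages smooth out team-level noise, so the fit will look much tighter than the team-level R^2 values quoted above and does not reproduce them. The function name is hypothetical.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Quintile averages from the table (group means, not team-level data).
fielding_runs = [-48.64, -17.05, -0.27, 18.92, 47.93]
era_fip_runs = [38.32, 12.74, 7.20, -22.84, -34.69]

slope, intercept, r = linear_fit(fielding_runs, era_fip_runs)
r_squared = r ** 2
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R={r:.2f}, R^2={r_squared:.2f}")

# R is the square root of R^2 in magnitude, which is how the quoted pairs relate.
reported_r = math.sqrt(0.3836)   # about .6194
reported_r_long = math.sqrt(0.4365)  # about .66
```

A slope whose size is below 1 in this fit would mean that each run of fielding score corresponds to less than one run of ERA-FIP, which is the same pattern the article describes for the extreme groups.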

Having an excellent defense means that pitchers will give up potentially 30 to 40 runs fewer than their FIP suggests, which is roughly the impact that Steamer projects. This doesn’t explain why the 2014 Orioles were able to allow 85 fewer runs than their FIP suggests, and suggests that unless other factors can explain this discrepancy we should expect significant regression in 2015.

32 comments:

  1. Statistics Don't Lie, January 14, 2015 at 9:56 AM

    Nice analysis. The basic premise of defense justifying higher FIP than actual ERA holds up, but the 2014 Orioles are a statistical outlier on the scale being used.

    The judgment factor here is whether the analysis is pointing to a presently undefined synergy captured in last year's Os which could be repeated, or that some sort of regression is inevitable. Also, if team defensive data is in some way deficient, perhaps the conclusions could change?

    I think back to the Os historic bullpen of 2012, which reverted in 2013, but was still pretty decent in 2014. Could it be that the Os FIP-ERA differential will again be skewed far to the right of average in 2015, just not as extreme as 2014?

  2. Yet, it is possible to have more playing time from Manny and a better outfield defense. While the luck factor may decrease, disappear or reverse, the skill factor could improve taking some bite out of the regression. If Gausman makes some serious improvements in his third pitch, and Bundy finds a way to legitimately make the big leagues, the total picture could be rosier in August/September.

  3. Not to mention, the possibility of a total turn-around from Ubaldo, which while too much to ask for, is no less likely than his total collapse.

  4. "The judgment factor here is whether the analysis is pointing to a presently undefined synergy captured in last year's Os which could be repeated, or that some sort of regression is inevitable. Also, if team defensive data is in some way deficient, perhaps the conclusions could change?"

    A .52 difference between ERA and FIP is the 51st best over the past sixty years. There have been eight times in the last sixty years where a team has overperformed their FIP to that extent for two consecutive years (and zero for three straight years). Logic indicates that there's considerable luck involved in overachieving to that extent even with an elite defense.

    Then again, over the past five years, the Oakland Athletics have had three years when they've outperformed their FIP by more than .40 runs. In those three years their fielding has been worth 51.7 runs total or about 17.3 runs on average. Over the past five years, the As have outperformed their FIP by an average of .35 runs despite their fielding being worth 35 runs. To put that in perspective, the Orioles defense in 2014 was worth 56.4 runs. Presumably this shows that there are other relevant factors beyond the fielding metric.

    This is an example that illustrates that team fielding data isn't exact and therefore the Orioles could legitimately be good defensively in a way that this metric doesn't measure. I question whether the As have done that well for five years just due to luck.

    I think regression should be expected. But based on the results of other unknown factors it's possible that the extent of it will be less than I expected. Not necessarily likely but probably a 5-10% chance.

  5. Erik - That could happen. It's also possible that Machado and Hardy go down and we're forced to use Flaherty at shortstop with unfortunate results.

    My point isn't whether Steamer is accurate or not. It's that expecting us to outperform our FIP by .52 runs for a second year straight is unlikely.

    Pat - Thanks

  6. Statistics Don't Lie, January 14, 2015 at 12:15 PM

    Matt, thanks for addressing my thoughts on the question. I am impressed that you've got the facts ready to share, and that you're able to consider opinions so objectively. Too many writers see their opinions as definitive, rather than the beginning of exploration on a topic.

  7. Statistics Don't Lie, January 14, 2015 at 2:23 PM

    As long as you have the historical FIP-ERA differentials handy, I am wondering how the 1970s Orioles compare to the 1990s Orioles?

    Specifically, Palmer's WHIP and K/BB ratios don't indicate as much dominance as Mussina's stats. (Ignoring CGs.) Trying to make a stronger case for HOF Mussina by showing that if Mussina had the same quality defense behind him as Palmer, his HOF worthiness would be even clearer.

  8. I think my comments last season, along the lines that there was a lot of magic in the results the Orioles were getting and that I was happy to admit it and not question it, showed I knew it at the time.

    I just don't expect my lightning to strike twice in the same place. I can still pray for two lightning strikes.

  9. Thanks Statistics.

    "As long as you have the historical FIP-ERA differentials handy, I am wondering how the 1970s Orioles compare to the 1990s Orioles?"

    I plan to discuss something related to that in more detail. Might take a few weeks.

    Fair enough Erik. And certainly there are different types of lightning strikes. After all, in 2012 it was wins in one-run games.

  10. How much difference would be considered "regression"?
    Wouldn't it be logical to expect improvement, as good as things were?
    Machado is better than Flaherty, Schoop has a year of experience and working with Hardy, Lough and De Aza consistently are better than what we had last year.
    It really seems as if there are too many unknown factors to say much with confidence, except that the Orioles were outstanding on defense last year and we should expect more outstanding defense.

  11. Phillip - The question isn't whether the Orioles defense will be elite. We're presuming that as a condition.

    The problem is that even elite defenses (as defined by Fangraphs) aren't good enough to prevent 80 runs. Rather they usually prevent 30 to 40 runs.

    The Orioles defense can improve and it still wouldn't be expected to prevent 80 runs like it did last year.

  12. This team has consistently beaten the numbers that are projected by these types of stats. The team as a unit works and no numbers can measure it. Try to make sense of the 2012 or 2014 teams, other than that they just win games. As they will again this year. The favs in the east, period. That is before they add players, and they will before and during the year. SMH

  13. If no numbers can measure it, then what are wins? I think it is off base to be so dismissive of these things.

    I think including 2014 with 2012 is a bit liberal in grouping things. It is also peculiar why we should ignore 2013. That year happened.

    We can wave hands and believe in magic, but really we are talking about uncertainty which is much more tangible than magic.

  14. This entire blog is a joke

  15. You mean, like, the whole entire blog?

  16. Yes, the entire thing. No one cares about these bogus made up statistics that mean nothing to the average Orioles fan. Every article on this blog is the same gibberish. Also, nearly every article is negative towards the Orioles.

  17. Do you get different results if you use RA/9 rather than ERA? Seems intuitive that good defenses usually make fewer errors, so the FIP-RA/9 difference might be greater than the ERA-RA/9 difference.

  18. Actually, I think this is a pretty cool blog. It uses statistics that are pretty common and kind of old school now in front offices. Always am surprised when Orioles fans get upset about advanced metrics because this is exactly what Dan Duquette brought to the organization. Perhaps fans should be open-minded about them because these ways to measure things are common now.

  19. I think we need to abandon the notion that FIP is a good indicator of "true pitching ability," because to be frank there is no solid justification (statistical or otherwise) for that claim.

    The Orioles are outperforming their FIP because FIP is a flawed (to say the least) stat and it so happens that it is flawed in a way that undervalues the Orioles' pitching staff, which does not abound in strikeouts.

  20. FIP is a significantly better predictor of future ERA than past ERA is until about 200 IP, after which they are pretty equivalent. So here is your evidence. In fact, if one could swing a dead cat on the internet, you would find a lot of research on this. http://www.baseballprospectus.com/article.php?articleid=12844

  21. Alex B. - Using RA/9 instead of ERA is an excellent idea and what I should have done.

    Anonymous - There is solid justification showing that pitchers have limited control over what happens when the ball ends up in play. I have a whole bunch of data showing that to be the case.

    Here's the thing. You're arguing that FIP undervalues the Orioles pitching staff. I'm arguing that the ERA-FIP difference shows the value of the Orioles defense. You'd agree that our defense is good, right?

    It makes sense that a good defense will make pitchers look better. That's basically what we're saying here.

  22. If pitchers have "limited control over what happens to balls put in play," why include home runs in FIP? This is a rather silly and arbitrary distinction to make - that hard hit balls are not pitcher-controlled unless they're hit hard enough to go over the fence.

    I wholly expect that if/when the hitFX data becomes public, much of this nonsense will be cleared up. There is value in attempting to create pitcher metrics that don't depend on fielding, but we do not have the data to do it well and I see no reason to believe that current attempts at it are anything other than meaningless data-dredging.

  23. Because fielders are not on the other side of the fence.

    I have seen nothing in your comments except unattached cynicism.

  24. So what? There is no reason that the presence of fielders would nullify the ability of a pitcher to influence outcomes. Fielders have a harder time fielding balls that are hit well than balls that are hit poorly. Home runs are an extreme of that, but it is absolutely ridiculous to suggest that outcomes of balls in play are luck-dominated unless that ball happens to be hit hard enough to go over the fence. There is no magical fundamental divide here, simply a silly and arbitrary one drawn by people overfitting the data we have.

    There is a very big difference between cynicism and skepticism. The latter is sorely lacking in (among many other places) most sports statistics, largely because the readership (and many of the writers, unfortunately) simply do not have the necessary mathematics background to question the established metrics along the proper lines.

  25. Except you are wrong about FIP's sampling bias. Tango developed that and he is not some casual dabbler in statistics. That is why I highly question your doubts because they have been wrong in their assumptions and overly generic in their application.

  26. Double dipping is not sampling bias.

    If you don't understand what double dipping is, or why it is bad, it is discussed in any decent book on statistical analysis. Even Wikipedia does a decent job of explaining it - see their page on multiple comparisons.

  27. It is awkward trying to have a conversation with multiple anonymous people especially when they disagree with each other.

    Historical data shows that pitchers have control over home runs but limited control over other hits. If you want further information discussing this then I suggest reading all of the stuff that Voros McCracken wrote for BP.

    Failing that, I'll talk about it some more in a future post.

  28. Machado and Hardy: Our present day Brooks Robinson and Mark Belanger? Hardy hits way better than Belanger did. Machado has shown glimpses of the great Brooks Robinson on defense. He may show to be a better hitter than was Robinson.

  29. Two interesting ideas lurk in between the lines of this very good post: first, the idea of diminishing returns from improved defense implies that many effects may be non-linear. Sabermetrics seems to be based on the idea that every relationship is strictly linear, which would make baseball the only complex system that could be so described. Second, the difference between FIP and ERA might be a better metric for team defense than the defense metrics that poorly describe individual performance and basically ignore combined performance. This latter issue is important for defense, because almost every play involves multiple individuals, each of whose capabilities may reinforce or detract from their teammates'.

  30. That is not what sabermetrics says. That would be like saying that medicine thinks that height and weight are the only needed variables when telling if someone is dangerously overweight. FIP is one measure of many that are used. No single measure or approach is universal.

  31. Didn't realize people were still commenting.

    Hardy does hit better than Belanger but shortstops were weaker offensively back then. In retrospect, Cal was pretty amazing offensively and turned our shortstop position into a strength.

    I think that ERA-FIP does provide a check for team defense especially over long periods of time. There should be a correlation between those numbers and fielding metrics. That relationship does exist.
