16 July 2014

Worst Is First: Orioles' Projected Second Half

The Orioles are currently sitting four games up in the AL East.  That is considerably better than what Baseball Prospectus and their PECOTA model projected at the beginning of the year.  Their 50th percentile projection had the Orioles entering the break with 45 wins (of course, many of their scenarios had the Orioles at 52 wins and above, but we tend to focus on the median result), seven wins short of where the club actually stands.  Likewise, the Orioles were given a 13.6% chance to make the playoffs.  They now sit at 58.9% and are the favorite to win the division.

That said, BP's system says that not only will they win the division, but they will do so while losing the most second-half games of any AL East team.  In other words, they have built up enough of a buffer to shield them from future poor play.  Below are the actual first-half marks and the projected second-half marks.

             1st Half     Projected 2nd Half     Projected Final
AL East      W     L      W     L                Wins    GB
Baltimore    52    42     33    35               85      --
Toronto      49    47     34    32               83      2
NYY          47    47     34    34               81      4
Tampa        44    53     34    31               78      7
Boston       43    52     33    34               76      9
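
For anyone who wants to check the arithmetic, here is a minimal sketch that rebuilds the last two columns from the first four. The numbers are copied from the table; the function name and data layout are made up for illustration. Because every team plays 162 games, games behind is treated here as simply the gap in projected final wins.

```python
# First-half (W, L) and projected second-half (W, L), copied from the table above.
teams = {
    "Baltimore": ((52, 42), (33, 35)),
    "Toronto": ((49, 47), (34, 32)),
    "NYY": ((47, 47), (34, 34)),
    "Tampa": ((44, 53), (34, 31)),
    "Boston": ((43, 52), (33, 34)),
}


def projected_standings(teams):
    """Return (team, projected final wins, games behind), best team first."""
    totals = {
        name: first[0] + second[0]
        for name, (first, second) in teams.items()
    }
    leader_wins = max(totals.values())
    rows = [(name, wins, leader_wins - wins) for name, wins in totals.items()]
    # Sort by projected final wins, highest first.
    return sorted(rows, key=lambda row: row[1], reverse=True)


standings = projected_standings(teams)
for name, wins, games_behind in standings:
    print(f"{name:<10} {wins:>3} {games_behind:>3}")
```

The Orioles finish on top here only because of the wins they already banked; their 33 projected second-half wins are tied with Boston for the fewest in the division.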

There are a few points to take away from the above table.  First, those projected records are all fairly similar.  That means that the PECOTA system, using up-to-date performances, sees little difference between the clubs.  In fact, I would imagine that it would take a gap of about five games before we could say that any one team is significantly better than any other here.  So, yes, the Orioles have the worst projected record, but it is likely not all that different from what would be expected of Tampa.
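
A rough way to see why a one-game gap in projected wins means so little: treat each remaining game as an independent coin flip and ask how much win totals bounce around by luck alone. This is a simplifying assumption for illustration, not how PECOTA works, and the 68-game figure is just the Orioles' remaining schedule from the table.

```python
import math


def win_total_sd(games, win_prob=0.5):
    """Standard deviation of wins over `games` independent games (binomial model)."""
    return math.sqrt(games * win_prob * (1 - win_prob))


# Assumption: a true .500 team with 68 games left.
sd_one_team = win_total_sd(68)

# Spread of the difference in wins between two such teams, assuming their
# results are independent of each other.
sd_difference = math.sqrt(2) * sd_one_team

print(f"luck alone, one team:  +/- {sd_one_team:.1f} wins")
print(f"luck alone, two teams: +/- {sd_difference:.1f} wins apart")
```

Under those assumptions, luck alone moves a single team's second-half win total by roughly four games, so the one-win spread between the best and worst projected second halves is well inside the noise.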

Second, these projected records are based on teams that have yet to be finalized.  Large deals may still be completed, which would likely drop teams like the Rays and Red Sox a couple games and improve the Orioles or Blue Jays by a game or two.  With that in mind, we may be able to reach some level of significance between the best and worst team by the end of July.

Third, you should also keep in mind that these systems do not know certain things.  For instance, PECOTA likely does not know that Tanaka is gone for the season.  That may seem like a major omission for these projections, but also keep in mind that the projections did not foresee Tanaka pitching as well as he has, so the expected performance marks have not changed much.  Also, if Chris Davis' poor bat is related to a hex placed upon him, BP has no way to measure that.  If Nelson Cruz has mechanically done anything to permanently improve his performance, BP's system will not know that either.

There is actually a decent amount of data suggesting that knowing these things largely does not matter.  That is, if a player has done exceptionally well or poorly in the first half, he is much more likely to perform to his projection the rest of the way than to repeat what he did in the first half of the season.  That is good news for Chris Davis fans in that he is likely to regain value.  However, that knowledge does not bode well for Nelson Cruz.

The take home should be that these projections are not worthless.  We noted at the beginning of the year how valuable these tools are in that there is a good positive correlation between how teams are projected and how they perform.  Yes, there is uncertainty.  These projections are considered to be about as accurate as rain forecasts 4-7 days out.  In other words, it is good to know it is likely to rain next week when planning events, but do not go nuts preparing everything for rain on that day.

All of that said, it is good to see that the club has a four-game lead going into the second half, as they will likely need every game they can muster with the Yankees and Blue Jays capable of making up some ground.  This is one of those years where it may make sense for the team to part with a Dylan Bundy or Hunter Harvey.  We will see.


Anonymous said...

"The take home should be that these projections are not worthless."

The projections for the first half of the season were. Why should these be any better? Maybe statistical projections are... stupid?

Jon Shepherd said...

Actually, the projections for the first half were not useless. You are probably noticing outliers, and there tend to be a good number of outliers. That should be expected for any system that has an r of .7 or so, which all systems have.

If you had bothered to read MGL's work, which we linked to, then you might have noticed what I wrote about in the above article. That is, projections are more accurate than half a season of data.

That said, I think maybe you read the title and then took six seconds to respond, so a more thoughtful discussion on projection models was probably difficult for you to accomplish.

Anonymous said...

Clearly the producers of that stuff haven't seen the Red Sox play or heard that Tanaka - the only thing keeping the Yankees afloat - is hurt.

John Morgan said...

The projections aren't worthless, but nearly so. They always emphasize regression to the mean; these projections seldom predict any team will win 95 games, but every year teams do that. The interesting part of baseball is what happens when teams and individuals exceed expectations (or fail to live up to them). That will happen in the second half, and I would guess that the standings will not resemble these "averagey" projections as a result.

Jon Shepherd said...

These projections predict 95-win seasons in every simulation. What you are looking at are 50th percentile projections. A projection is not a prediction.