More considered statements expressed doubt over the repeatability of the statistic year to year, which is actually a great point. Career accumulations of extreme numbers, for instance Mike Scioscia's 47 wins above projected over the past eleven years, are easy to see as being of some worth and as being the result of the manager, the front office, or some combination thereof. What is less easy to see is whether a single season, such as Bobby Valentine's 2012 Red Sox season with -22 wins above projected, is meaningful.
In order to address this in the simplest and most straightforward way I could imagine, I decided to compare accumulations season by season. In other words, I took a manager's first full season in my database of seasons since 2003 and compared it to his second season to generate an r value. I then took the average of his first two seasons and compared it to the average of his first three seasons. Finally, I took the average of his first three seasons and compared it to the average of his first four seasons. By doing this, we get a rough idea of how repeatable the metric is and how many seasons of data are probably required before the statistic levels out. The total number of managers in the data set with four seasons was 29.
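For readers who want to try the same comparison on their own numbers, here is a minimal sketch of the three correlations described above. It is not the code used for this post: the function names and the MWAP values are made up for illustration, and it assumes each manager's seasons are listed in chronological order.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def average_first(seasons, k):
    """Average of a manager's first k seasons."""
    return sum(seasons[:k]) / k

def repeatability_table(all_seasons):
    """r values for the three comparisons described in the text.

    all_seasons: one list per manager, at least four seasons each.
    """
    return {
        # season 1 versus season 2
        "1 season": pearson_r([s[0] for s in all_seasons],
                              [s[1] for s in all_seasons]),
        # average of seasons 1-2 versus average of seasons 1-3
        "2 seasons": pearson_r([average_first(s, 2) for s in all_seasons],
                               [average_first(s, 3) for s in all_seasons]),
        # average of seasons 1-3 versus average of seasons 1-4
        "3 seasons": pearson_r([average_first(s, 3) for s in all_seasons],
                               [average_first(s, 4) for s in all_seasons]),
    }

# Hypothetical MWAP values, NOT the real data set of 29 managers.
mwap = {
    "Manager A": [5.0, 3.0, 4.0, 6.0],
    "Manager B": [-4.0, -1.0, -3.0, -2.0],
    "Manager C": [1.0, -2.0, 2.0, 0.0],
    "Manager D": [-6.0, 2.0, -5.0, -3.0],
    "Manager E": [3.0, 4.0, -1.0, 2.0],
}

table = repeatability_table(list(mwap.values()))
for label, r in table.items():
    print(f"{label}: r = {r:.2f}")
```

Note that the second and third comparisons share seasons on both sides (the first two seasons appear in both the two-season and three-season averages), so part of their correlation is built in by construction.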
Here is the r value table.
Seasons of data    MWAP r value
1 season           0.64
2 seasons          0.85
3 seasons          0.86
For those who may be more visually inclined, here is the graph that produced the middle value (two seasons versus three seasons).
There certainly is variability in this data set, but in a general sense the fit is strong. Even if you take it as a litmus test of good versus bad, there are only three points out of 29 that shift from a positive average to a negative average from the second to the third season, with only one of those making a major leap. We may not know if a manager is fully behind this association, but he certainly is one of the common factors here.
To explore the impact of the front office, I took managers from the very same database who had spent two seasons with one club and then two with another. The idea here is that perhaps the only similarity between the two situations is the manager. It might help us discern whether the manager is really the one driving this difference in projected vs. actual results or not. In other words, is Buck Showalter doing what Dave Trembley could not, or is Dan Duquette doing something Andy MacPhail could not? I would not read too much into that sentence and what we are doing here. I would only take into consideration that if there are no significant differences between the managers' experiences with each club, then maybe the difference has a good bit to do with them.
I only had 13 managers meet these criteria: Bob Melvin, Bruce Bochy, Buck Showalter, Clint Hurdle, Dusty Baker, Eric Wedge, Fredi Gonzalez, Jim Tracy, Joe Torre, Ken Macha, Lou Piniella, Manny Acta, and Ned Yost. I compared their final two seasons with their first organization to the first two seasons with their second organization using a paired t-test. Using this statistic, we would be inclined to think that the difference was mainly due to the front office if the p value for this population was 0.10 or less. What we actually find is that the p value is 0.71. This means that there is no significant difference between the last two seasons with team one and the first two seasons with team two. This suggests that the deviation between the projected win total and the actual win total likely has to do with the manager, in that his ability to deviate is repeatable. Therefore, it is a skill.
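As a rough illustration of the paired t-test used here, the sketch below computes the t statistic from each manager's difference between the two clubs. The function name and the numbers are hypothetical, not the values for the 13 managers above. Turning the t statistic into a p value requires the t distribution with n - 1 degrees of freedom; a statistics package can do both steps at once (for example, `scipy.stats.ttest_rel`).

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(first_club, second_club):
    """Paired t statistic and degrees of freedom.

    first_club[i] and second_club[i] must belong to the same manager.
    """
    diffs = [b - a for a, b in zip(first_club, second_club)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical two-season MWAP averages for four managers.
last_two_with_first_club = [1.0, 2.0, 3.0, 4.0]
first_two_with_second_club = [2.0, 4.0, 3.0, 7.0]

t, df = paired_t_statistic(last_two_with_first_club,
                           first_two_with_second_club)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

A t statistic close to zero corresponds to a large p value, which is the situation described above: the managers' results before and after the move cannot be told apart.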
Finally, from the original data set of 29 managers, I thought it might be interesting to sub-select one two-year average and see how it compares to the next two-year average. I chose the average of the third and fourth seasons and compared it to the average of the fifth and sixth seasons. 24 managers fit this requirement. The r value was determined to be 0.89. Here is the graph:
Again, if we use this simply as a litmus test, we find that only three of the 24 managers crossed from one designation (above or below) to the other side of the axis.
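The litmus test itself is easy to reproduce. The sketch below, with made-up numbers and hypothetical function names, averages seasons three and four and seasons five and six for each manager and then counts how many managers end up on the other side of zero.

```python
from statistics import mean

def window_average(seasons, start, stop):
    """Average MWAP over seasons start..stop (1-indexed, inclusive)."""
    return mean(seasons[start - 1:stop])

def count_side_changes(earlier, later):
    """Count managers whose average crosses from above zero to below
    zero, or the reverse. An average of exactly zero never counts."""
    return sum(1 for e, l in zip(earlier, later) if e * l < 0)

# Hypothetical MWAP values for three managers with six seasons each.
mwap = {
    "Manager A": [5.0, 3.0, 4.0, 6.0, 2.0, 3.0],
    "Manager B": [-4.0, -1.0, -3.0, -2.0, 1.0, 2.0],
    "Manager C": [1.0, -2.0, 2.0, 0.0, -1.0, -3.0],
}

seasons_3_4 = [window_average(s, 3, 4) for s in mwap.values()]
seasons_5_6 = [window_average(s, 5, 6) for s in mwap.values()]
changes = count_side_changes(seasons_3_4, seasons_5_6)
print(f"{changes} of {len(mwap)} managers changed sides")
```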
This suggests that managers do have an actual impact on the team's performance and that this impact might actually be, to some extent, measurable. Of course, having just one season of data does not appear to be ideal when attempting to figure out what exactly this value is.