Mark Trumbo - Ideal Leadoff Hitter
Here, we will use that approach to assess the Orioles. As a proof of concept, we shall do something simple. Let's assume that Welington Castillo, Chris Davis, Jonathan Schoop, Manny Machado, J.J. Hardy, Hyun-soo Kim, Adam Jones, Seth Smith, and Mark Trumbo would play every game and that their performance would be in line with 2017 ZIPS projections. We can plug in their projected OBP and SLG to find out what lineup would be best for the Orioles. The tool finds two lineups that produce equal value and rank above all other lineups:
LF Hyun-soo Kim
1B Chris Davis or 3B Manny Machado
C Welington Castillo
DH Mark Trumbo
3B Manny Machado or 1B Chris Davis
2B Jonathan Schoop
CF Adam Jones
SS J.J. Hardy
RF Seth Smith
In general, parts of the lineup make sense and other parts are rather curious. Kim leading off makes sense because the tool values leadoff men who do not make outs and, according to ZIPS, he will have a .370 OBP. That sets the table for the batters who follow. Davis or Machado following him makes sense because you want to maximize your chances of driving in the OBP-abled Kim. Castillo as the third hitter seems questionable, but this model acknowledges that league data shows the third hitter in the lineup faces remarkably fewer RBI situations than hitters in the second, fourth, or fifth slots. The rest of the lineup makes some traditional sense. You can read up more about this kind of lineup optimization here.
Perhaps what is more interesting about the above lineup tool is that the difference between the projected best lineup and the worst is 59 runs. That difference of six to seven wins is large, but less so when you consider that the worst fathomable lineups are ones no manager would ever use. Meanwhile, the best lineups are quite close to what we traditionally envision. For instance, the worst projected lineup is one with Davis, Machado, and Trumbo filling out the final three slots in the batting order. It would never occur to Buck to arrange his hitters like that. So, the major take-home message has been, in effect, that lineup order rarely matters because a manager's lineup is usually very similar to what this tool projects to be the best lineup.
Now, I think there are obvious problems with this tool. By using a league-wide population as a data set and then applying regression, we are assuming that each batter in each lineup position exists separately from the other batters. What I mean is that Manny Machado in this tool does not have Kim in front of him and Castillo behind him. Machado, instead, follows the league average leadoff hitter and is followed by a league average third slot hitter. This lack of connectivity between players is an issue. Yes, ideas like lineup protection are poorly evidenced, but I am referring more to how the ability of surrounding hitters improves run scoring chances. This makes sense. If you have an elite OBP generator in front of you, your lineup position is potentially more productive than the league average lineup position. Overall, that may have a great impact.
With that in mind, I decided to create a new tool and run a different regression model. This model did not rely on OBP or SLG. Those metrics were strangely revolutionary over a decade ago, but they have their limitations. They bundle together a great deal of information about different skills that may be useful in different scenarios. Instead, I focused on event rates of walks, strikeouts, and various batted ball results, regressed against Runs Batted In minus Home Runs (based on the assumption that home run RBIs of the batter were lineup independent). Each lineup position took into consideration the performance of that player, but also of the players who bat before that player. The data set I used was league-wide and by team from 2007 to 2016.
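To make the "batter plus the hitters ahead of him" idea concrete, here is a minimal sketch of how one row of regression inputs could be assembled for a lineup slot. Everything named here is an assumption for illustration: the event categories in `EVENTS`, the `slot_features` helper, the choice to look back three hitters, and the placeholder rates. The article does not specify how many preceding hitters the formulas used or which batted ball types were included.

```python
# Assumed event categories; the article only says "walks, strikeouts, and
# various batted ball results", so the batted-ball split here is a guess.
EVENTS = ("bb", "k", "gb", "fb", "ld")

def slot_features(lineup, slot, lookback=3):
    """Concatenate event rates for the hitter in `slot` (0-based) and the
    `lookback` hitters batting ahead of him, wrapping around the order."""
    row = []
    for offset in range(lookback + 1):
        hitter = lineup[(slot - offset) % len(lineup)]
        row.extend(hitter[e] for e in EVENTS)
    return row

# Placeholder per-plate-appearance rates (made up, not ZIPS projections).
lineup = [{"bb": 0.05 + 0.01 * i, "k": 0.20, "gb": 0.30, "fb": 0.25, "ld": 0.15}
          for i in range(9)]

row = slot_features(lineup, slot=3)  # cleanup hitter, then hitters 3, 2, 1
print(len(row))  # 20 values: 4 hitters x 5 rates
```

In a setup like this, each row would be paired with that slot's RBI minus HR as the regression target, with one fitted formula per batting position.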
Using this approach frees us from considering a player only by a context-free lineup position. Once I developed the formulas for each batting position, I compared expected runs to actual runs, which resulted in a trendline fit with an R² of 0.84. I wondered how well the model would work if each lineup position was normalized and wound up with an R² of 0.68. In other words, accounting for lineup order was a major factor in improving the fit between expected runs and actual runs.
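For readers who want to run the same kind of fit check, the sketch below computes R² for expected versus actual runs. It assumes the figure comes from a straight-line trendline, in which case R² equals the squared Pearson correlation. The function name `r_squared` and the run totals are placeholders, not the data behind the 0.84 and 0.68 figures.

```python
from statistics import mean

def r_squared(expected, actual):
    """Squared Pearson correlation between two equal-length sequences,
    which equals R^2 for a simple least-squares trendline."""
    mx, my = mean(expected), mean(actual)
    sxy = sum((x - mx) * (y - my) for x, y in zip(expected, actual))
    sxx = sum((x - mx) ** 2 for x in expected)
    syy = sum((y - my) ** 2 for y in actual)
    return (sxy * sxy) / (sxx * syy)

# Placeholder team-season run totals (made up for illustration).
expected_runs = [650, 700, 720, 760, 810]
actual_runs = [640, 715, 700, 775, 800]
print(round(r_squared(expected_runs, actual_runs), 2))
```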
At this point, we can go back to that original dataset of nine Orioles hitters. Remember, this is a concept piece, so we should not take this exercise as how many runs the Orioles will score or even that this lineup is universal and invulnerable to handedness. Instead, we should merely look at this as a simple exercise to see where the different kinds of production appear to fit best using this lineup position model.
In this post, my limited coding know-how leaves me unable to create a computer program to figure out the best lineup. Therefore, I decided to go about this using some knowledge about where certain players might fit best (until Patrick Dougherty finishes the build and runs the model, which will be a later post). I began with the assumption that Chris Davis is ideally the cleanup hitter. From there I took the other eight hitters to see who increased his value the most in the three spots ahead of him. What I found is that Davis has the most expected RBIs if Seth Smith, Hyun-soo Kim, and Manny Machado bat in front of him. He would stand to see 87 RBIs in addition to his 46 HR RBIs (over the course of 162 games played).
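For what it is worth, the exhaustive search itself is small: nine hitters give 9! = 362,880 batting orders, which a short script can score one by one. The sketch below shows only that search loop. The `best_lineup` helper is hypothetical, and `toy_score` with its ratings and slot weights is a made-up stand-in, not the regression model described here; a real run would swap in the position-by-position formulas.

```python
from itertools import permutations

def best_lineup(players, score):
    """Return (best_order, best_score) after scoring every batting order."""
    return max(((order, score(order)) for order in permutations(players)),
               key=lambda pair: pair[1])

# Toy scorer, NOT the article's model: made-up hitter ratings and made-up
# slot weights, used only to show the search running end to end.
toy_rating = {"A": 0.370, "B": 0.340, "C": 0.310, "D": 0.300}
toy_slot_weight = [1.3, 1.2, 1.1, 1.0]

def toy_score(order):
    return sum(w * toy_rating[p] for w, p in zip(toy_slot_weight, order))

order, value = best_lineup(list(toy_rating), toy_score)
print(order, round(value, 3))
```

Because the scorer is passed in as a function, the same loop works whether the scoring treats each slot in isolation or looks at the hitters ahead of each slot.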
I then moved on to Manny Machado in the third slot, which goes against the rationale of the lineup optimization perspective that began this article. While Smith and Kim were a good one-two punch before Machado, a little shifting around of names found a far better solution with minimal impact on Chris Davis' projected RBIs. The result was fairly surprising in that the model appears to think that the best one through four for the Orioles is Trumbo, Smith, Machado, and Davis. With great certainty, I can tell you that this model is the only thing on this Earth that has suggested that Trumbo should lead off.
Before revealing the rest of this "ideal" lineup, let me explain some things about run opportunities. Trumbo leading off does make some sense in that each position in a lineup is greatly dependent on the abilities of those who come before the player. For instance, if you are a cleanup hitter, then you will not exactly want a great OBP player leading off. Why? A good OBP player leading off will let the inning go to the second and third hitters. Past the first inning, that leadoff hitter stands a good chance of batting when the worst batters in the lineup have hit right in front of him and likely were turned into outs. Those second and third hitters who follow the leadoff hitter will also become outs the majority of the time. This means that there is a great chance of the inning ending and the cleanup hitter coming up as the first or second batter with no one on base.
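One piece of that reasoning, how much the outs already on the board hurt the cleanup hitter's chances of batting with someone on, can be checked with a toy calculation. The sketch below is not the regression model and does not settle who should lead off. It assumes every plate appearance is either "reaches and stays on base" or "out", with no double plays, steals, or home runs, and the function name and the OBP values are placeholders.

```python
from itertools import product

def p_cleanup_with_runner(obps, outs_before=0):
    """Chance the 4th hitter bats with at least one runner on base.

    obps: on-base probabilities of the three hitters ahead of him.
    outs_before: outs already recorded when the first of them bats.
    A third out ends the inning and wipes the bases (toy assumption).
    """
    total = 0.0
    for reached in product([True, False], repeat=len(obps)):
        prob, outs, runners = 1.0, outs_before, 0
        for p, on in zip(obps, reached):
            prob *= p if on else 1 - p
            if on:
                runners += 1
            else:
                outs += 1
                if outs == 3:  # inning over, next hitter starts fresh
                    outs, runners = 0, 0
        if runners > 0:
            total += prob
    return total

top_three = [0.35, 0.33, 0.34]  # placeholder OBPs, not projections
print(p_cleanup_with_runner(top_three, outs_before=0))
print(p_cleanup_with_runner(top_three, outs_before=2))
```

Comparing the two printed values shows how much is lost when the bottom of the order has already made two outs before the top of the lineup comes up.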
That makes sense, right? You want to maximize the batters on base immediately before your best base-clearing hitter, but also isolate them enough from inferior hitters who rack up outs and put the base-clearing hitter in scenarios where there is nothing on base to clear.
Well, the next question is why have such an extreme home run hitter batting first and not fifth to clean up what Davis cannot get to. The reason is that Davis does two things really well: (1) knocking in base runners with a lot of home runs and (2) striking out a lot, which ends innings. This means Trumbo would have to contend with a player who will often clear the table by home run or strikeout. With a strikeout, the inning ends or players do not move up a base. That decreases run opportunities. There is a logic there that the model is expressing. It is possible that by putting a secondary base-clearing threat at leadoff, you give him more plate appearances to knock himself in while also making the most of a poor situation at the bottom of the order, where poor hitters rack up outs.
After some more tinkering, the final model projection is:
DH Mark Trumbo
RF Seth Smith
3B Manny Machado
1B Chris Davis
CF Adam Jones
2B Jonathan Schoop
LF Hyun-soo Kim
C Welington Castillo
SS J.J. Hardy
In the end, this lineup looks wholly reasonable if the only thing you did was flip Trumbo and Jones. That flip will often be made due to the belief that speed is needed in the leadoff position, which might be a questionable conviction. The Trumbo leadoff model suggests a 162 game production of 834 runs, while a Jones leadoff model nets 827 runs. Seven runs, so not that big of a deal.
What is interesting is what happens if one flips Seth Smith with Mark Trumbo: a simple flip of the first two batters while leaving everyone else the same. Run production drops from 834 to 797. Thirty-seven runs. That seems very drastic to me. Very, very, very drastic. In the traditional data model above, a flip of two players would result in a very minor change in run production. Is that because it would literally result in a minor change in run production, or is it because that model assumes every position is context neutral?
One other lineup to test would be this one: Kim/Smith/Machado/Davis/Trumbo/Jones/Schoop/Castillo/Hardy. This is a very generic, normal lineup. How is it viewed? 782 runs. Here we have an "ideal" lineup generating 834 runs and a perfectly normal lineup getting dropped to 782 runs. That spread is nearly equal to what the traditional model thinks the difference is between the best and worst lineups possible.
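The gaps quoted above are easy to line up side by side. This small sketch only redoes the arithmetic on the run totals given in this post. The one outside assumption is `RUNS_PER_WIN = 10`, a common rule of thumb for converting runs to wins that is not taken from either model.

```python
# Run totals quoted in this post for 162 games.
ideal = 834
alternatives = {
    "Jones leads off (Trumbo and Jones flipped)": 827,
    "Smith leads off (Smith and Trumbo flipped)": 797,
    "Generic lineup (Kim/Smith/Machado/Davis/...)": 782,
}
traditional_spread = 59  # best-minus-worst gap from the traditional tool
RUNS_PER_WIN = 10        # assumption: rule of thumb, not from the article

gaps = {name: ideal - runs for name, runs in alternatives.items()}
for name, gap in gaps.items():
    print(f"{name}: {gap} runs, about {gap / RUNS_PER_WIN:.1f} wins, "
          f"{gap / traditional_spread:.0%} of the traditional spread")
```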
It may well be that a useful lineup optimization tool needs to consider chaining production, linking the players in the lineup into a greater entity, rather than treating a player's talent as independent of others by assuming he is surrounded by league average talent and abilities.
I am unsure whether I truly believe this, but, after several days of hammering at it, I am at a loss as to what I might not be considering. Have we really neglected the importance of lineup construction because of a simple, overly normalized lineup tool presented over a decade ago?