08 March 2017

Mark Trumbo is Not the Ideal Orioles Leadoff Hitter - Chris Davis Is

This post runs as a follow-up to Jon Shepherd's lineup optimization proof of concept.

Buck Showalter tinkers with lineups, but he'll never be able to try out every combination. Lineups can be considered permutations of the roster: arrangements in which order matters, so the same nine players batting in a different order make a new lineup. The total number of distinct permutations of nine players drawn from a 25-man roster is given by the following equation:
25! / (25-9)! = 741,354,768,000
Given the possibility of trades and acquisitions, not to mention the Orioles' penchant for shuttling players between the Majors and the Minors, the true number of possible lineups over the course of a season is even higher. Given only 9 batters, as Jon Shepherd worked with in his proof of concept research, the number of lineup permutations drops to a much more manageable 362,880 distinct possibilities. No amount of lineup tinkering will allow Showalter to test each of these lineups; he would need 2,240 seasons of 162 games to see them all take the field, and I doubt Manny Machado will even be an Oriole after all that time.
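The counts above are easy to double-check. This is a minimal sketch, assuming Python 3.8 or later (for `math.perm`); the constant names are mine, not anything from Jon's work.

```python
import math

ROSTER_SIZE = 25
LINEUP_SIZE = 9
GAMES_PER_SEASON = 162

# Ordered lineups of 9 drawn from a 25-man roster: 25! / (25 - 9)!
roster_lineups = math.perm(ROSTER_SIZE, LINEUP_SIZE)

# Orderings of one fixed group of nine batters: 9!
fixed_nine_lineups = math.factorial(LINEUP_SIZE)

# Seasons needed to field every ordering of the fixed nine exactly once
seasons_needed = fixed_nine_lineups / GAMES_PER_SEASON

print(f"{roster_lineups:,}")      # 741,354,768,000
print(f"{fixed_nine_lineups:,}")  # 362,880
print(f"{seasons_needed:,.0f}")   # 2,240
```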

As a fan equipped with myriad tools for lineup optimization (and Jon's algorithm, which explains the variance in run production quite nicely), I sought to find the best Orioles lineup given those same nine batters that we can reasonably expect to feature as starters at least very frequently in 2017. I sifted through all 362,880 possibilities thanks to the built-in permutation generator in Python, ran each through Jon's algorithm, and recorded the results.
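The search itself can be sketched in a few lines. This is only an illustration of the brute-force idea, not the script I ran: `toy_score` is a hypothetical stand-in for Jon's algorithm (which is not reproduced here), and the four generic batters and their rates are made up so the example finishes instantly.

```python
from itertools import permutations

def find_best_lineup(batters, score_lineup):
    """Return (best_order, best_score) over every ordering of batters."""
    best_order, best_score = None, float("-inf")
    # permutations() yields one ordering at a time, so the full set
    # of lineups is never held in memory at once.
    for order in permutations(batters):
        score = score_lineup(order)
        if score > best_score:
            best_order, best_score = order, score
    return best_order, best_score

# Hypothetical, made-up rates for four generic batters (NOT real players).
TOY_RATE = {"A": 0.30, "B": 0.35, "C": 0.32, "D": 0.28}

def toy_score(order):
    """Placeholder scorer: weights earlier lineup slots more heavily.

    This is NOT Jon's run estimator; swap in the real model here.
    """
    size = len(order)
    return sum(TOY_RATE[name] * (size - slot) for slot, name in enumerate(order))

best_order, best_score = find_best_lineup(list(TOY_RATE), toy_score)
print(best_order)
```

With the real nine batters and the real scoring function, the same loop visits all 362,880 orderings; only the time per evaluation changes.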

Because I considered all possible lineups, I had no need to establish a set of assumptions that would guide me. I don't need to start with Chris Davis batting fourth because of his skillset. I expected to see a sort of cycle in the most productive lineups, resembling something like 3-batter groups that end with a power hitter. This follows Jon's suggestion that a batter's ability to produce runs is predicated largely on whether the batters before him can reach base. I did not expect to see a prototypical leadoff hitter batting first, because innings rarely end so tidily as to allow the following inning to start over at the top of the lineup. More often the man leading off an inning will not be the leadoff hitter, and in my eyes, this indicates that the importance of a "true leadoff hitter" is vastly overstated (the importance of a player with true on base skills is not).

The most productive lineup suggested by this exercise is the following:
1B Chris Davis
LF Hyun-soo Kim
2B Jonathan Schoop
DH Mark Trumbo
3B Manny Machado
RF Seth Smith
CF Adam Jones
C Welington Castillo
SS J.J. Hardy

This lineup is worth an estimated 885 runs over the course of 162 games, not accounting for handedness splits. That would be 141 runs more than the Orioles scored in 2016, and 50 runs more than the Mark Trumbo-led lineup that Jon suggested last week. In fact, Davis was the leadoff hitter in four of the 10 most productive lineup permutations. Perhaps Trumbo is not the ideal leadoff hitter after all, but Davis, his left-handed counterpart with a similar batter profile, is!

This lineup is worth 287 runs more than the worst lineup combination possible from these nine players, a Machado-led abomination that slotted Trumbo, Adam Jones, and Davis as the 6, 8, and 9 hitters, respectively. Such a batting order would fly in the face of traditional lineup construction as well as this new machine-led practice.

More importantly, if we assume that last year's run prevention is indicative of this year's run prevention, we can estimate how well the best and worst lineups would perform according to Pythagorean win-loss. Again, assuming that the 715 runs scored against the Orioles in 2016 carry over unchanged to 2017 (unlikely, and a tenuous assumption at best), Pythagorean win-loss would estimate the following results for the best and worst lineup permutations:

Table columns: Projected Runs Scored, 2017 | Runs Against, 2016 | Pythagorean Win % | Pythagorean W-L (rows for the best and worst lineups; values not preserved)

By this estimate, the best possible lineup is worth 28 wins more than the worst one. This matches up with the rule of thumb that 10 runs is equivalent to one win, given the 287-run gap between the two. The best possible lineup, with a projected 885 runs, is expected to score 100 runs more, or 10 wins better, than the traditional lineup put forth in Jon's article.
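For anyone who wants to rerun the Pythagorean arithmetic, here is a rough sketch using only figures quoted in this post: 885 projected runs for the best lineup, 287 fewer for the worst, and 715 runs allowed. The exponent is an assumption on my part; 2 is the classic value, other versions of the formula use something closer to 1.83, and the exact win gap shifts by a few wins depending on that choice.

```python
GAMES = 162
RUNS_ALLOWED_2016 = 715        # runs scored against the Orioles in 2016 (from the post)
BEST_RUNS = 885                # projected runs for the best lineup (from the post)
WORST_RUNS = BEST_RUNS - 287   # the post puts the worst lineup 287 runs lower

def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean expectation: RS^x / (RS^x + RA^x).

    exponent=2.0 is the classic form; it is an assumption here, since
    the post does not say which exponent was used.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

best_pct = pythagorean_win_pct(BEST_RUNS, RUNS_ALLOWED_2016)
worst_pct = pythagorean_win_pct(WORST_RUNS, RUNS_ALLOWED_2016)

for label, pct in (("best", best_pct), ("worst", worst_pct)):
    wins = round(pct * GAMES)
    print(f"{label}: {pct:.3f} win pct, about {wins}-{GAMES - wins}")
```

Passing `exponent=1.83` to `pythagorean_win_pct` shows how sensitive the estimate is to that one parameter.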

If this exercise is to be considered accurate, then lineup optimization is critical to a team's success. It boggles the mind that the more analytical front office and managerial combinations haven't embraced context-dependent lineup optimization if, with minimal tinkering, it can be the difference between a team fighting for a playoff spot and one of the best teams in the league.

I end with the same question Jon posited: have teams neglected the importance of lineup optimization because of some normalized tools and broad rules of thumb?

I choose to believe that there are human factors pushing teams away from this sort of radical overhaul, specifically that players wouldn't like it. As antiquated as they are, RBIs and lineup position seem to be points of pride for many players, and it's not a stretch to think that a prototypical leadoff hitter can market himself as such and earn a higher payday than he would if the skillset asked of the first lineup spot were fungible. Making players uncomfortable likely has real effects, even if there's no technical reason why batting seventh should be any different than batting second. It may also drive free agents to consider other teams that won't torpedo their ability to market themselves, or toss their routine into a blender every time someone new joins the team.

Further, many fans and owners would likely be too quick to call an experimental lineup a failure. One bad game from the batters would be enough to lampoon the manager who organized it, and persistence in the face of a handful of failures would probably lead to the manager's and/or GM's ousting. In terms of self-preservation for a manager or GM, it makes far more sense to leave those wins on the table and use the standard, sub-optimal batting order formula that every other team uses. It's similar logic to why NFL coaches kick field goals and PATs more often than they should, when going for a first down or two-point conversion improves win expectancy: it's safer to lose doing what's accepted than to lose doing something radical, even if the radical idea made the loss less likely.

There may also be technical limitations to this process that have prevented teams from truly optimizing lineups. It took nearly three days to run all 362,880 batting order permutations through Jon's algorithm, and that was only with 9 batters. Generating all possible lineup permutations for the full 25-man roster caused a memory error on my computer. I can't imagine trying to expand this algorithm to consider the 40-man roster, which would yield over 99 trillion permutations. Doing this on a regular basis for each team would take more than just modeling and coding knowledge; it would require a deep understanding of how to efficiently manage physical storage, and likely a huge amount of it at that.
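To put that scaling in perspective, here is a back-of-the-envelope sketch. It treats "nearly three days" as a flat three days, assumes the cost per lineup stays constant, and uses a hypothetical `toy_score` in place of Jon's algorithm. The `top_lineups` helper shows one way to keep only the best few results while streaming through the generator; I am assuming, not asserting, that holding every result at once is what triggered the memory error.

```python
import heapq
import math
from itertools import permutations

SECONDS_PER_DAY = 86_400
NINE_MAN_LINEUPS = math.factorial(9)           # 362,880
observed_seconds = 3 * SECONDS_PER_DAY         # "nearly three days", rounded up
seconds_per_lineup = observed_seconds / NINE_MAN_LINEUPS

for roster in (25, 40):
    count = math.perm(roster, 9)               # ordered lineups of 9 from the roster
    years = count * seconds_per_lineup / (365 * SECONDS_PER_DAY)
    print(f"{roster}-man roster: {count:,} lineups, roughly {years:,.0f} years")

def top_lineups(players, lineup_size, score_lineup, n=10):
    """Return the n best lineups without storing every permutation.

    heapq.nlargest consumes the generator lazily and keeps only n
    candidates, so memory stays small; run time is still the bottleneck.
    """
    return heapq.nlargest(n, permutations(players, lineup_size), key=score_lineup)

# Hypothetical weights for five generic players (NOT real data).
TOY_WEIGHT = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}

def toy_score(order):
    """Placeholder scorer standing in for the real run estimator."""
    return sum(TOY_WEIGHT[p] * (len(order) - slot) for slot, p in enumerate(order))

top_three = top_lineups(list(TOY_WEIGHT), 3, toy_score, n=3)
print(top_three[0])
```

Streaming fixes the memory side only. At the observed pace, the time needed for anything beyond a fixed group of nine remains far out of reach without a much faster scoring function or a smarter search.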

However, analyzing the order of just the nine batters the team expects to play most often seems to offer real benefit without requiring a supercomputer or a superanalyst. I then return to the thought that shaking up the lineup may be akin to shaking a hornet's nest, both in terms of upsetting players and risking careers.


Anonymous said...

More often than not, a strikeout to start the game....I think not!

Unknown said...

More often than not, the game starts with an out of some kind. What does it matter if it's a strikeout or a groundout? I would argue that of the outs to lead off a game, strikeouts might be most advantageous because they tend to take more pitches than other types of outs.

Anonymous said...

If Davis batted leadoff, he could approach 300!!! strikeouts, a record that would NEVER be broken!

Jon Shepherd said...

I am unaware of any leadoff hitter getting 900 PA.

Anonymous said...

Davis would only need about 700, and if you batted leadoff every game, you would be close. One thing Davis will hold when done...All time strikeout leader, he will blow Reggie away!

Jon Shepherd said...


700 PA and 300 Ks is a 40% strikeout rate. Davis is a 31-33% K batter. It would be a monumental collapse to see him go to 40%.

Common Sense.

40% K batters are not productive batters. Davis would not play every game.