Camden Depot: Yes, Chris Davis Should Lead Off

29 March 2018

A year ago, I decided to experiment a bit with lineup optimization. If you have spent time working in baseball data science, you know that lineup optimization tools come down to a few realizations:

1. No one uses what a tool would consider an optimized lineup.
2. Even so, no team is far off that optimal lineup.
3. The difference between the best and worst conceivable lineups is usually about 30 runs.
4. These tools rarely include enough of the right information to build an informed lineup.

Recognizing that state of the research, I decided to take on a challenge that most tools do not consider: the linear relationship within a lineup.

What I mean by lineup linearity is that each member of a batting lineup exists in the context of those who bat around him. While much of baseball data science is about isolation, isolation, isolation, I tried to consider context, context, context. Yes, true talent level is best measured in a vacuum, but talent effect might well be best measured by recognizing how a player's talent is affected by the talent of others.

Let us consider an extreme example. Say we have a singles hitter who is very fast and who walks a lot, too. Let's give him a .350/.450/.400 line and 80 steals in 85 attempts. The individual we created is a super Juan Pierre, and a player like that would be worth about 5-6 WAR. Now, what if I told you that all of his teammates struck out every single time, without exception? While super Pierre has 5-6 WAR of "talent," his "talent effect" is below replacement level.

Why? WAR works by assigning a run value to every event, and that run value is determined by league averages. In effect, WAR asks: what is super Pierre's talent in the average lineup, in the average position in that lineup, in the average base-out state, with as many other factors as possible averaged out? That is a great way to determine Pierre's true talent, but not his effect, because his effect is tied to his context.
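The talent-versus-effect point can be made concrete with a small Monte Carlo simulation. The Python sketch below is not the model behind this post and not how WAR is computed; it is a toy under stated assumptions. Super Pierre's per-PA rates are derived from his .350/.450/.400 line by treating the extra slugging as doubles and ignoring triples, hit-by-pitches, and sacrifice flies. The `ORDINARY` teammate rates are hypothetical placeholders. Base running is simplified (walks move only forced runners; singles and doubles move every runner exactly one or two bases), and stolen bases are not modeled.

```python
import random

# Per-PA event probabilities. All values are illustrative assumptions.
# Super Pierre: derived from .350/.450/.400, treating extra bases as doubles
# and ignoring triples, HBP, and sacrifice flies.
SUPER_PIERRE = {"BB": 0.154, "1B": 0.254, "2B": 0.042, "HR": 0.0}
# Teammates who strike out every time.
STRIKEOUT = {"BB": 0.0, "1B": 0.0, "2B": 0.0, "HR": 0.0}
# A rough, hypothetical stand-in for an ordinary hitter.
ORDINARY = {"BB": 0.085, "1B": 0.150, "2B": 0.045, "HR": 0.030}

EVENTS = ("BB", "1B", "2B", "HR")


def plate_appearance(rates, rng):
    """Draw one outcome: 'BB', '1B', '2B', 'HR', or 'OUT'."""
    r = rng.random()
    cumulative = 0.0
    for event in EVENTS:
        cumulative += rates[event]
        if r < cumulative:
            return event
    return "OUT"


def advance(bases, event):
    """Move runners for one event; return (new_bases, runs_scored).

    bases is (first, second, third). Simplified rules: walks move only
    forced runners; singles and doubles move every runner one or two bases.
    """
    first, second, third = bases
    if event == "BB":
        runs = int(first and second and third)
        return (True, first or second, third or (first and second)), runs
    if event == "1B":
        return (True, first, second), int(third)
    if event == "2B":
        return (False, True, first), int(second) + int(third)
    # Home run: the batter and every runner score.
    return (False, False, False), 1 + int(first) + int(second) + int(third)


def simulate_game(lineup, rng, innings=9):
    """Play one game; the batting order carries over between innings.
    Stolen bases are not modeled."""
    runs, batter = 0, 0
    for _ in range(innings):
        outs, bases = 0, (False, False, False)
        while outs < 3:
            event = plate_appearance(lineup[batter % len(lineup)], rng)
            batter += 1
            if event == "OUT":
                outs += 1
            else:
                bases, scored = advance(bases, event)
                runs += scored
    return runs


def runs_per_game(lineup, games=10_000, seed=2018):
    """Average runs per game over many simulated games."""
    rng = random.Random(seed)
    return sum(simulate_game(lineup, rng) for _ in range(games)) / games


alone = [SUPER_PIERRE] + [STRIKEOUT] * 8
supported = [SUPER_PIERRE] + [ORDINARY] * 8
print(runs_per_game(alone), runs_per_game(supported))
```

Surrounded by automatic strikeouts, super Pierre's lineup scores exactly 0.0 runs per game in this toy: he hits no home runs, nobody behind him ever advances him, and he can never bat twice in one inning. Put ordinary teammates around the same hitter and the lineup scores. Modeling his 80 steals would let him reach third, but he would still need a teammate to bring him home.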

With that in mind, I created (part 1, part 2) a lineup optimization tool that considered how a player performs in a particular lineup position in relation to those who bat before him. The model worked well and correlated with actual run production. It weighs doubles, home runs, walks, and strikeouts heavily; those were the primary determinants of run scoring. It should be noted that the model is limited by who has actually batted in each lineup position. A big, bruising hitter batting leadoff is highly uncommon, so the model may well be extrapolating beyond its data, and weird things may happen outside the data set. But what was remarkable about that work was that it suggested it might be a bad idea to group power hitters together, and that perhaps your best home run-centric power hitter who gets on base should bat leadoff.
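The post does not publish the model's equations, so the following Python sketch is only a hypothetical illustration of the general approach: score every batting order with a function in which a hitter's value depends on his slot and on the hitter batting ahead of him, then search all orders exhaustively. The inputs (OBP and ISO rather than the doubles, home runs, walks, and strikeouts the actual model uses), the `SLOT_WEIGHTS`, the `context_weight` bonus, and the sample hitters are all assumptions chosen for readability, not fitted values.

```python
from itertools import permutations

# Hypothetical slot weights: earlier slots bat more often over a season.
# Placeholders only; supports lineups of up to nine hitters.
SLOT_WEIGHTS = [1.00, 0.98, 0.96, 0.94, 0.92, 0.90, 0.88, 0.86, 0.84]


def lineup_score(order, obp, iso, context_weight=0.5):
    """Toy additive score for a batting order.

    Each hitter contributes his own OBP + ISO, scaled by slot weight, plus a
    context bonus when he has power and the man batting before him gets on
    base. Index -1 wraps around, so the leadoff man's predecessor is the last
    hitter, as it is every time through the order after the first.
    """
    total = 0.0
    for slot, name in enumerate(order):
        previous = order[slot - 1]
        own = obp[name] + iso[name]
        context = context_weight * iso[name] * obp[previous]
        total += SLOT_WEIGHTS[slot] * (own + context)
    return total


def best_lineup(names, obp, iso, context_weight=0.5):
    """Exhaustive search over every batting order (9! = 362,880 for nine)."""
    return max(
        permutations(names),
        key=lambda order: lineup_score(order, obp, iso, context_weight),
    )


# Hypothetical hitters, for illustration only.
obp = {"A": 0.300, "B": 0.350, "C": 0.400, "D": 0.320}
iso = {"A": 0.100, "B": 0.150, "C": 0.250, "D": 0.120}
print(best_lineup(list(obp), obp, iso))
```

Brute force is reasonable here because a nine-man lineup has only 362,880 orderings, which is tractable even in pure Python. The context term is what makes the order of neighbors matter; with `context_weight=0`, the search simply sorts hitters by their own value, since the slot weights are strictly decreasing.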

The model declared that Chris Davis was the best leadoff hitter for the Orioles.

Fast forward to this spring training, and a major point of discussion was that Chris Davis was in fact leading off games. The stated reason was to get him more plate appearances, but it was also seen as a test of whether he should be batting leadoff. It is a situation in which Davis tends to do well. Last year, with no outs and the bases empty (130 PA), he hit .259/.338/.534 (129 wRC+), while he fared more poorly in other situations with a .184/.308/.308 line (64 wRC+). All of that, though, is just gravy. The model does not know those situational stats. What it recognizes is Davis' overall statistics and what they mean based on how leadoff hitters have hit in the past.

This post looks at only two lineups. Yes, the season will offer a myriad of batting orders, but we will just play around with this one comparison.

Slot	Traditional	Optimized
1	Tim Beckham, 3B	Chris Davis, 1B
2	Trey Mancini, LF	Trey Mancini, LF
3	Manny Machado, SS	Manny Machado, SS
4	Jonathan Schoop, 2B	Jonathan Schoop, 2B
5	Chris Davis, 1B	Adam Jones, CF
6	Adam Jones, CF	Tim Beckham, 3B
7	Anthony Santander, DH	Anthony Santander, DH
8	Caleb Joseph, CF	Caleb Joseph, CF
9	Colby Rasmus, RF	Colby Rasmus, RF
Projected runs	738	790

This is one of those stunning model results. Optimizing the lineup according to the model produces predicted gains in run scoring at the leadoff spot (+12 runs), the sixth spot (+14 runs), and the seventh spot (+26 runs). The leadoff difference can largely be explained by Davis' power advantage over Beckham. The sixth-spot gain has more to do with run opportunities than with differences in hitter makeup. The seventh-spot gain has nothing to do with the hitter and everything to do with the opportunities he now sees. Still, I want to reiterate that it is astounding that the model predicts a 52-run difference between these lineups. That would be worth about five wins and would be a sizable improvement on the runs scored by last year's team (743).
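The run arithmetic above can be checked in a few lines of Python. The per-slot gains and the two projected totals come from the model output in this post; the 10-runs-per-win conversion is a common rule of thumb and an assumption here.

```python
# Per-slot gains the model predicts for the optimized lineup (from the post).
slot_gains = {1: 12, 6: 14, 7: 26}
traditional, optimized = 738, 790

total_gain = sum(slot_gains.values())
# The slot-by-slot gains should account for the whole difference.
assert total_gain == optimized - traditional

RUNS_PER_WIN = 10  # rule-of-thumb conversion, an assumption
print(total_gain, total_gain / RUNS_PER_WIN)  # prints: 52 5.2
```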

These results, however, are not astounding to us, because we reached this conclusion last year and the surprise has worn off. We also saw, a month or two after publishing our results, that several teams began experimenting with this approach (e.g., Kyle Schwarber batting leadoff). For a club like the Orioles, one without an obvious leadoff hitter and with a need to find value in something few others are doing, this might well be an advantage they can exploit, if the model is actually correct.

Maybe the Orioles will venture to give this idea a chance. Or maybe they will do what everyone else is doing and hope to beat them by playing the same game.

4 comments:

stevej said...: Chris Tillman should be the Opening Day starter. He is the key to being competitive. If his velocity is back, they are OK, if not, the trade deadline will be the only interesting part of the season. Note that if he had his typical season in '17, they would have been tied for the 2nd wild card one week before the season's end.; March 29, 2018 at 10:53 AM
Pip said...: This is very interesting, I really appreciate it.; March 29, 2018 at 9:24 PM
Will Sisco said...: I was with you until your model recommended Caleb playing CF; March 30, 2018 at 5:24 PM
Jon Shepherd said...: Either autofill or we really stumbled onto something.; March 30, 2018 at 5:49 PM
