Over the past few weeks, Patrick Dougherty and I have been throwing lineup optimization regression models at you. I introduced the decade-old lineup model that identifies run value on a positional basis, along with a new model that identifies runs batted in on a co-dependent positional basis. Patrick then did a thorough best fit of the new model and found that Chris Davis leading off was the best iteration of the starting nine we evaluated. However, Chris Davis is plainly an atypical leadoff hitter, so how atypical is he?
A best fit line on a scatter plot is an easy visual to understand. You can observe the range of data on the x-axis and on the y-axis. You have a decent handle on whether a new data point falls within that range or is an exceptional outlier. Intuitively, the further a data point sits outside that range, the more you should worry about whether the model can realistically handle it.
David Freedman, a statistician, is known in some circles for his cheeky Conservation of Rabbits Principle. He states that in order "to pull a rabbit from a hat, a rabbit must first be placed into the hat." In other words, a model outcome that looks nothing like the model's inputs should be questioned closely, so let us explore Chris Davis as a leadoff hitter.
We shall ignore the seven, eight, and nine hitters. A quick glance over them shows us that they are reasonable bottom-third lineup hitters. Chris Davis at the top of the order feels a bit more peculiar. The model considers walk rate, strikeout rate, doubles rate, and home run rate. Those metrics were the most relevant based on significance testing.
Chris Davis is projected by ZiPS to walk in 11.6% of his plate appearances. Of the 300 data points over the past ten years for a team's leadoff hitter, 15 are within 10% of Davis' projection. Overall, that rate would rank 17th and would be on par with the excellent walk rates put forth in 2007 and 2008 by the Orioles' own Brian Roberts. A top ten percent walk rate certainly stretches the model, but it stays within the boundaries set by the data. With doubles, Davis is well within the range of the model's data with his 3.7% projected rate. That, however, is not very impressive among the data points in the data set. He would rank 248th out of 300.
Davis also has a projected 33.6% strikeout rate. That is off the model's radar. As noted, the model has 300 team entries, and the highest rate belongs to the 2016 Brewers at 26.5%. Davis' rate would be roughly a 27% increase over that. Davis is also projected to have a very impressive 6.6% home run rate. That is about a 30% increase over the next closest number, which belongs to the 2016 Twins. With respect to these metrics, we are in territory where the model is not well equipped to use the information.
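The range check described in the last two paragraphs is simple arithmetic, and a small sketch may make it concrete. The helper names below (`percent_beyond_max`, `rank_percentile`) are my own illustrations and are not part of the model Patrick fit; the only inputs are figures already quoted in the text.

```python
def percent_beyond_max(projection, observed_max):
    """Percent by which a projection exceeds the largest value in the fitted data.

    Returns 0.0 when the projection is at or below the observed maximum,
    meaning the model would be interpolating rather than extrapolating.
    """
    if projection <= observed_max:
        return 0.0
    return (projection - observed_max) / observed_max * 100.0


def rank_percentile(rank, n):
    """Convert a 1-based rank out of n entries into a percentile position."""
    return rank / n * 100.0


# Figures quoted in the text: Davis' projected strikeout rate against the
# highest leadoff strikeout rate among the 300 entries (2016 Brewers).
davis_k_excess = percent_beyond_max(33.6, 26.5)

# Walk rate would rank 17th of 300; doubles rate would rank 248th of 300.
walk_position = rank_percentile(17, 300)
doubles_position = rank_percentile(248, 300)

print(f"Strikeout rate beyond the data: {davis_k_excess:.1f}%")
print(f"Walk rate position: top {walk_position:.1f}%")
print(f"Doubles rate position: {doubles_position:.1f}% of the way down the list")
```

Anything that returns zero from `percent_beyond_max` is inside the data the model was fit on. A positive number is a rough measure of how far we are asking the regression to extrapolate. For the strikeout rate that excess works out to roughly 27%, in line with the approximate figure given above.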
What about the aforementioned Mark Trumbo? For home runs, he is 10% beyond the upper extent of the data in the model. His doubles rate is right smack dab in the middle. His strikeout rate would be third worst in this data set. His walk rate would also be in the middle. As a whole, we should feel more comfortable with Trumbo's projection as a leadoff man than with Chris Davis', but both are so unconventional that a regression model like this might be extrapolating effects beyond where we should feel comfortable.
The lesson here really should extend beyond the exercise Patrick and I have been performing. It is important to understand causality and the limits on our ability to determine what exactly causes anything else. Certainly, I would think we all agree that induction is useful for getting a better grasp on causation, but we must be quite transparent and acknowledge the uncertainty involved in our methods of induction.
When we put forward such unconventional answers in well-trodden fields, we must note that we have certainly extended ourselves beyond practiced reality. True, this extrapolation may one day be shown to be correct, but it is more of a leap of faith than any sober trust put into our methods. And that is really the crux of it. When our universe is limited to what we have experienced, our intellectual foundation beyond that scope is weak. No, I do not think Trumbo or Davis is an ideal leadoff man, but I would suggest that it is a perfectly good hypothesis to offer that they might well be ideal leadoff men.
I doubt, though, that with tens of millions of dollars at play we will ever be able to fill in our data set.