Comments on Camden Depot: Mark Trumbo is the Ideal Orioles Leadoff Hitter

The Rangers used Brian Downing as their lead off h...

2017-03-04T20:36:14.600-05:00

The Rangers used Brian Downing as their leadoff hitter back in the day. It worked OK.

Email the Depot... I have a couple ideas.

2017-03-02T05:45:03.140-05:00

Email the Depot... I have a couple ideas.

Jon, I can try to work on it, but I can't prom...

2017-03-02T03:59:09.142-05:00

Jon, I can try to work on it, but I can't promise anything. In order to write anything truly interesting I'd have to run the Monte Carlo analysis. Unfortunately I'm a VERY rudimentary level coder, so I'm not 100% confident that I can build a baseball sim that will spit out sufficiently predictive data to be interesting. I should have time to work on something like that next week, so I'll try to think about what a basic sim might look like in the meantime.

A quick and dirty workaround might be to find close historical analogs to the players' projections and build several test lineups and use the WhatifSports simulation engine to test those against a few 2016 pitching staffs. If nothing else I might try that. Presumably the guys charging $13 a season to simulate baseball are doing a decent job of it. No way to do it for free on Strat...

Sounds like a great idea to me! Mr. Smith seems li...

2017-03-01T10:11:34.009-05:00

Sounds like a great idea to me! Mr. Smith seems like he'd be a great contributor to this site (or any serious baseball site).
I'm not a statistician and it's been a long time since I've taken a stats course, but I'd certainly welcome seeing him contribute his analyses not only to the question of how best to order the lineup but to other baseball topics that involve statistical analysis.

Any desire to write up your concept for post here?...

2017-03-01T08:10:49.863-05:00

Any desire to write up your concept for post here?

My whole reply is maybe. I think it is an assumpti...

2017-03-01T05:54:04.297-05:00

My whole reply is maybe. I think it is an assumption that more PA over a year means more runs, but I think it is equally valid that ordering can produce more runs than having more players come to the plate. I recognize this goes against the conclusions drawn from real baseball data, but lineup quality obviously has importance.

So, I think your point is possibly valid, but I do not find the concept more compelling. From a results standpoint, I understand the contention.

This was basically my interpretation too. But by h...

2017-03-01T01:48:06.356-05:00

This was basically my interpretation too. But by handling everything on an individual basis the model inherently ignores long-term opportunity costs. Obviously Trumbo not getting on base impacts the guys batting behind him in the lineup and their opportunity to drive in runs. But it also shortens the game and ultimately costs everybody in the lineup a little more on top. If we assume that each lineup position has an equal chance to come up last in a game, swapping the guys at 1 and 7 in the order with 63 points of difference in OBP (.063) shortens the season by roughly 7 outs. That's not a huge number, but it's a few runs, and it highlights an inherent weakness of considering only individual contributions, particularly when the model is based on a regression analysis of a fairly small data set.
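The "roughly 7 outs" figure can be reproduced with a few lines of arithmetic. This is only a sketch under the stated equal-chance assumption plus a 162-game season; the function names are made up for illustration. If every lineup spot is equally likely to make the last plate appearance of a game, each step down the order costs 1/9 of a PA per game.

```python
GAMES = 162          # assumed full season
LINEUP_SLOTS = 9

def extra_pa_per_season(early_slot, late_slot, games=GAMES):
    """Extra PAs the earlier slot gets over the later one in a season.

    Assumes each slot is equally likely to bat last in a game, so each
    step down the order costs 1/9 of a plate appearance per game.
    """
    return (late_slot - early_slot) * games / LINEUP_SLOTS

def extra_outs(obp_gap, early_slot=1, late_slot=7):
    """Extra outs from giving those PAs to the hitter with the lower OBP."""
    return obp_gap * extra_pa_per_season(early_slot, late_slot)

print(round(extra_pa_per_season(1, 7)))   # 108
print(round(extra_outs(0.063), 1))        # 6.8
```

So the leadoff spot sees about 108 more PAs than the 7 spot, and a .063 OBP gap over those PAs comes to a little under 7 outs.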

At first glance 10 seasons of 30 teams, thousands of individual player-seasons, feels like a big sample. But when you consider the number and range of the variables involved - including those that aren't included in the model - it really becomes quite small. Your fitting assumes that the model is reliable everywhere, but the number of guys with a specific statistical profile like Trumbo's is quite small. In fact, the sample of players from 2007-2016 with 500 PAs and within 5 points of Trumbo's projected AVG and OBP and 10 points of his projected SLG contains exactly one player-season: 2015 Todd Frazier. When you further break it up into individual events you see that Frazier's HR rate definitely lags behind Trumbo's, while his doubles rate is substantially higher. If you expand the sample down to 0 PAs, you can add 2011 Chris Heisey, who is a better statistical match for Trumbo. So our reasonable comparison group for Trumbo is 1 or 2 guys, with 1 or 2 sets of teammates. RBI is a stat with huge error bars. How good an idea can we possibly have of how somebody with Trumbo's statistical profile really interacts with different types of players to produce runs? It's not just Trumbo in the leadoff spot that creates projection issues. A few thousand player-seasons in a sport with massive statistical variance leave Trumbo and many others with a limited or entirely nonexistent data set.

After further thought, I'm not surprised that the regression fitting was good. If you add enough parameters you can make a model fit any data well. In such a situation it doesn't necessarily imply predictive power. Another big concern I have with this model is the apparent lack of consideration for SB/CS. Guys who steal bases - even with a success rate below the break-even point - tend to score more runs than statistically similar players who don't steal. By correlation, guys who bat behind them would tend to have more RBI. Again, given the scarcity of data, such factors could really skew the model.

It seems to me that you could obtain a similar type of model with much more predictive power through a Monte Carlo analysis: initiate innings with random orderings of batters, draw statistical outcomes for each at-bat, and see how runs are scored. To do an even better job, use a statistical distribution of "pitcher skills" and include logic to massage hit, walk, K, and HR rates in a manner consistent with the variability in pitching. From such an exercise you could ostensibly extract the ordering of players that correlates with the greatest run-scoring for this team specifically, and with a much better-developed data set for the players involved. Obviously there are some people who will take issue with it not being grounded in a "real" data set, but most of those people aren't going to trust a lineup produced from a regression analysis either.
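For what it's worth, a very basic version of that kind of sim might look like the sketch below. Everything in it is an assumption for illustration: the outcome probabilities are invented rather than anybody's real projections, runners advance by fixed rules (one base on a single, two on a double, forced moves only on a walk), and there are no steals, double plays, triples, or pitcher-skill adjustments.

```python
import random

def make_batter(bb, single, double, hr):
    """Per-PA outcome probabilities; whatever is left over is an out."""
    return {"BB": bb, "1B": single, "2B": double, "HR": hr}

def plate_appearance(batter, rng):
    """Draw one outcome for a batter from his probability table."""
    r, cum = rng.random(), 0.0
    for outcome, p in batter.items():
        cum += p
        if r < cum:
            return outcome
    return "OUT"

def advance(bases, outcome):
    """Apply a non-out outcome to bases [1st, 2nd, 3rd]; return (new_bases, runs)."""
    first, second, third = bases
    if outcome == "BB":   # only forced runners move
        new_bases = [True, first or second, third or (first and second)]
        return new_bases, int(first and second and third)
    if outcome == "1B":   # simplification: every runner moves up one base
        return [True, first, second], int(third)
    if outcome == "2B":   # simplification: every runner moves up two bases
        return [False, True, first], int(second) + int(third)
    if outcome == "HR":
        return [False, False, False], 1 + int(first) + int(second) + int(third)
    raise ValueError(outcome)

def simulate_game(lineup, rng, innings=9):
    """Play one game; the batting order carries over between innings."""
    runs, spot = 0, 0
    for _ in range(innings):
        outs, bases = 0, [False, False, False]
        while outs < 3:
            outcome = plate_appearance(lineup[spot], rng)
            spot = (spot + 1) % len(lineup)
            if outcome == "OUT":
                outs += 1
            else:
                bases, scored = advance(bases, outcome)
                runs += scored
    return runs

def mean_runs(lineup, games=2000, seed=0):
    """Average runs per game over many simulated games."""
    rng = random.Random(seed)
    return sum(simulate_game(lineup, rng) for _ in range(games)) / games

# Hypothetical hitter types, not real player projections.
slugger = make_batter(bb=0.06, single=0.13, double=0.04, hr=0.07)
on_base = make_batter(bb=0.12, single=0.20, double=0.05, hr=0.01)
average = make_batter(bb=0.08, single=0.15, double=0.045, hr=0.03)

slugger_first = [slugger, on_base] + [average] * 7
on_base_first = [on_base, slugger] + [average] * 7
print(mean_runs(slugger_first), mean_runs(on_base_first))
```

With only a couple thousand games per lineup the gap between two orderings can easily be smaller than the noise, so a real comparison would need far more games and many more orderings than this.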

I think what the model says is that a player is mo...

2017-02-28T19:30:07.261-05:00

I think what the model says is that a player is more than his OBP, and that Kim's ability to get on base, along with his other abilities, is more important elsewhere.

The whole model is based on maximizing RBIs, which indirectly values players who score runs.

The out issue is indirectly included in the model.

The real weakness of the model is that Trumbo in the leadoff slot is a clear projection outside of the available data, because teams do not bat a guy like that there.

Given that your model is based on regression analy...

2017-02-28T18:32:59.598-05:00

Given that your model is based on regression analyses of individual events, I'm guessing that it doesn't have a way of counting outs? Granted, your model fit surprisingly well to the league run-scoring data, but it does seem like an obvious weakness and could explain how it finds something "optimal" with the team's highest-OBP player batting in the bottom third and a guy with a projected OBP of .307 leading off. The traditional model for lineup optimization basically just convolves effectiveness with number of opportunities, admittedly ignoring correlated events (e.g. hits and walks might be much more similar outcomes for players batting behind Trumbo and Schoop, who don't spend a lot of time on first and second, than for players batting behind Kim). But you may have moved too far in the opposite direction and missed major game trends (e.g. you put high-OBP guys at the top of the order and you will get more PAs).

At least the last lineup, with Jones and Trumbo sw...

2017-02-28T08:04:02.711-05:00

At least the last lineup, with Jones and Trumbo switched as you suggested, would be something that Buck might use. The first lineup you suggested (with Machado in second) is also not a bad idea to try. Considering how the O's hit (Ks and HRs) and that there were an awful lot of solo shots last year, a statistically predicted lineup seems like a good way to get an instant boost in production.