Camden Depot: Does Half Make a Whole: Jason Vargas and Kevin Gausman

28 November 2017

Unless you won the World Series today, baseball is about tomorrow. And tomorrow is about yesterday and some number of the days before. What I mean is that regardless of your level of statistical thought, you are likely looking at yesterday to figure out just what tomorrow might look like. Some players make these predictions feel more certain. For instance, if a pitcher pitches the same every month of the year, you probably feel rather confident that next year will be more of the same. Meanwhile, a pitcher with remarkable differences between his first and second halves may provide a more exciting effort at fortune telling.

There are three schools of thought. The first is that the first half of the season is indicative of what a player really is. Offseason preparation and spring training include approaches that typically stay the same, so the first half is a better illustration of a player, while the second half results from things that happened in the first half that are not reproducible.

The second perspective is that the second half performance is indicative of future performance because it is most recent. A slight tinker to approach may unlock a player and that player will take that forward. Or perhaps injury occurs and that injury moves forward with the player.

The third perspective is that the first and second halves are simply unimportant series of events that are better treated as a whole than in parts. Development does not progress neatly in a line, nor as arbitrarily episodically as season halves are set up. Therefore, go with the whole enchilada instead of reducing your sight by going halfsies.

This offseason, two pitchers of interest to the Orioles fall into the half season versus whole season argument. Jason Vargas was lights out in the first half of the season and the lights went out in the second half:

1st half: 106.1 IP, 2.62 ERA, 3.80 FIP, 2.0 fWAR
2nd half: 73.1 IP, 6.38 ERA, 5.94 FIP, -0.3 fWAR

His first half was so good that BORAS projected him for a 4/80 contract even at the ripe age of 35. The second half lowered him down to 2/17.2, which is a considerable drop. Of course, the two halves could not be more different in terms of production. Is he a good, but fortunate, 2 WAR pitcher? Or is he a replacement level workhorse? Supposedly, the Orioles are kicking his tires.

A little different and closer to home is Baltimore's own Kevin Gausman.

1st half: 97 IP, 5.85 ERA, 4.75 FIP, 1.1 fWAR
2nd half: 89.2 IP, 3.41 ERA, 4.19 FIP, 1.4 fWAR

OK, so Gausman being a second half starter is a bit of a product of his ERA and a never dying narrative, but he is included here as an introductory example because, well, ok, I don't know. Look! An Oriole player. I guess we can always go back to Ubaldo Jimenez' last season with the Indians.

Anyway, what is the method? I looked at all pitchers from 2012-2016 whose WAR/100 in the first half was at least 1 win better or worse than in the second half. I then devised two models. One model simply looked at their year long performance and matched it against their performance the following year. The second model was applied to each group (i.e., better second half, worse second half) to devise a way to weight the halves in the prediction for the following season.
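A minimal sketch of how the cohort rule above could be applied to the two stat lines quoted earlier. It assumes that "WAR/100" means WAR per 100 innings pitched and that innings are written in the usual baseball notation (106.1 meaning 106 and one third); the helper names are hypothetical and this is not the actual code behind the study.

```python
def innings(ip_notation: str) -> float:
    """Convert baseball innings notation ("106.1" = 106 1/3) to true innings."""
    whole, _, outs = ip_notation.partition(".")
    return int(whole) + (int(outs) if outs else 0) / 3


def war_per_100(war: float, ip_notation: str) -> float:
    # Assumption: "WAR/100" is WAR per 100 innings pitched.
    return 100 * war / innings(ip_notation)


def cohort(first_rate: float, second_rate: float, threshold: float = 1.0) -> str:
    """Label a pitcher by how much his rate moved between halves."""
    diff = second_rate - first_rate
    if diff >= threshold:
        return "better second half"
    if diff <= -threshold:
        return "worse second half"
    return "neither"


# Half-season lines as quoted in the article (fWAR, IP).
vargas = cohort(war_per_100(2.0, "106.1"), war_per_100(-0.3, "73.1"))
gausman = cohort(war_per_100(1.1, "97"), war_per_100(1.4, "89.2"))
print(vargas, "|", gausman)
```

Under those assumptions Vargas lands in the worse second half group, while Gausman's fWAR rates are too close to qualify for either one, which fits the point that his split is mostly an ERA story.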

Pitchers who performed remarkably better in the second half
Linear Model
2018 production = 0.547*(2017) + 0.071
R2 = 0.50

As expected, the previous year does a decent job projecting WAR rate for the upcoming season. Even with these exceptional swings in performance, a full season's worth of performance was not a much better prediction tool than a half season (first half model, R2 = 0.47; second half model, R2 = 0.43). In other words, the linear models appear to show that half or full season performance for this cohort almost equally accounts for next season's performance. Therefore, a big second half bump in performance does not appear to indicate a new, established level of performance. If anything, there is a slight indication in this group that the lower first half performance is more indicative of next season's performance.
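For readers who want to see where a slope, an intercept, and an R2 like the ones above come from, here is a rough sketch of an ordinary least squares fit with one predictor. The post does not say which software or exact procedure was used, so treat `fit_line` as a hypothetical stand-in, not the actual method.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x and co-movement of x and y around their means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 is the share of the variance in y that the fitted line accounts for.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot
```

Here `xs` would be one season's WAR rate for each pitcher in the cohort (full season, first half, or second half) and `ys` the following season's rate. Comparing the R2 from the three choices of `xs` is the comparison made in the paragraph above.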

Regression Model
2018 production = 0.571*(2017 1st Half) + 0.311*(2017 2nd Half) + 0.016
R2 = 0.48

For this cohort, the regression model appears to agree with the linear models in that the lower first half performance (much greater coefficient value) has more weight in estimating future performance. However, the regression model does not account for next season performance any better than the simpler linear models. Again, the take home is basically that if all things are equal then maybe look at the first half performance of pitchers in this cohort.
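To put a number on "more weight", this quick sketch uses only the coefficients quoted above. The two example pitchers are made up for illustration, and the share calculation is just one hedged way to read the coefficients, not something taken from the original analysis.

```python
# Coefficients quoted above for the better-second-half cohort.
FIRST_HALF_COEF = 0.571
SECOND_HALF_COEF = 0.311
INTERCEPT = 0.016


def project(first_half_rate: float, second_half_rate: float) -> float:
    """Next-season WAR rate from the two half-season rates."""
    return (FIRST_HALF_COEF * first_half_rate
            + SECOND_HALF_COEF * second_half_rate
            + INTERCEPT)


# Share of the combined coefficient weight carried by the first half.
first_half_share = FIRST_HALF_COEF / (FIRST_HALF_COEF + SECOND_HALF_COEF)

# Two hypothetical pitchers with mirrored halves (not real players).
late_bloomer = project(1.0, 2.0)   # weaker first half, stronger second
fast_starter = project(2.0, 1.0)   # stronger first half, weaker second
print(round(first_half_share, 3), round(fast_starter - late_bloomer, 3))
```

The first half carries roughly 65% of the combined weight, so swapping one unit of WAR rate between halves moves the projection by the difference between the coefficients, 0.26.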

Pitchers who performed remarkably worse in the second half
Linear Model
2018 Production = 0.665*(2017) + 0.072
R2 = 0.49

The model values are slightly different from the previous cohort's, which is likely an aspect of regression to the mean. Pitchers who perform well enough to last the first half and then see a big uptick in performance likely had a decent year, so they would be more inclined to see a decrease in their rates. Meanwhile, a pitcher who was kept in the rotation despite a poorer second half has probably displayed a higher level of talent in the long run, so the regression is less. Regardless, the linear model here decently accounts for the data and the linear models for the halves are not remarkably different (R2 = 0.48 for both).

Regression Model
2018 Production = 0.361*(2017 1st Half) + 0.393*(2017 2nd Half) + 0.015
R2 = 0.51

As with the linear models, the two halves were valued similarly, as expressed in their coefficients. The regression model accounted for the data about as well as the linear model did.
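As an illustration only, here is what the two models for this cohort would say about the Vargas line quoted earlier. The sketch assumes the models take WAR per 100 innings pitched as input and that the full season rate is simply total fWAR over total innings. Neither assumption is spelled out in the post, and these are not projections published by the author.

```python
def innings(ip_notation: str) -> float:
    """Convert baseball innings notation ("73.1" = 73 1/3) to true innings."""
    whole, _, outs = ip_notation.partition(".")
    return int(whole) + (int(outs) if outs else 0) / 3


# Vargas' 2017 halves as quoted in the article.
first_ip, first_war = innings("106.1"), 2.0
second_ip, second_war = innings("73.1"), -0.3

# Assumption: rates are WAR per 100 innings pitched.
first_rate = 100 * first_war / first_ip
second_rate = 100 * second_war / second_ip
full_rate = 100 * (first_war + second_war) / (first_ip + second_ip)

# Models quoted above for pitchers who were worse in the second half.
linear_projection = 0.665 * full_rate + 0.072
regression_projection = 0.361 * first_rate + 0.393 * second_rate + 0.015

print(round(linear_projection, 2), round(regression_projection, 2))
```

Under those assumptions the full season model gives about 0.70 WAR per 100 innings and the split model about 0.53, both far closer to each other than to his first half rate of roughly 1.9.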

Conclusion
The safe conclusion is that the data are incredibly noisy and that improved or decreased performances in the second half are not terribly more informative than full season performance. The only caveat might be that pitchers who see an improvement in their second half performance might be better judged by their first half performance. Doing so probably would not be meaningful in most cases, but in a close decision between a couple of pitchers it might be prudent to entertain the idea that the first half performance is more meaningful moving forward to next year.

3 comments:

Pip said...: I'm sure the statistics account for this, but Vargas only has a 40% ground ball rate.
That means that a tremendous amount of his success must have been the support given by the outstanding outfield playing behind him. The Orioles OF defense is horrible, so it stands to reason that regardless of how good he was or wasn't in KC, he would be considerably worse in Baltimore.
I'm not in love with Doug Fister, but I would much rather have signed him than go anywhere near Jason Vargas.
By the way, because Vargas is so old, it would have been interesting for you to include the element of age in your calculations. It's quite possible that we just saw a "dead cat bounce" from him last season, and remember, dead cats only bounce once.; November 28, 2017 at 2:34 PM
Unknown said...: He's ace like compared to ubaldo and Miley. Who was the last O to win 18 games, Mike Boddicker?; November 28, 2017 at 9:25 PM
Anonymous said...: No Vargas. Jeez, we got to be able to do better than that. Have you guys been watching Kristen at Bird Watchers play faux GM? Supposedly the Braves faux GM traded both Markakis and Jim Johnson back to the O's with the Braves only paying about $3M in salary. Please tell me a REAL GM would not ever do that - especially without getting real SP prospects in return (as opposed to giving some nebulous prospects up to the Braves).; November 29, 2017 at 8:38 AM
