28 November 2017

Does Half Make a Whole: Jason Vargas and Kevin Gausman

Unless you won the World Series today, baseball is about tomorrow.  And tomorrow is about yesterday and some number of the days before.  What I mean is that regardless of your level of statistical thought, you are likely looking at yesterday to figure out just what tomorrow might look like.  Some players make these predictions feel more certain.  For instance, if a pitcher pitches the same every month of the year, you probably feel rather confident that next year will be more of the same.  Meanwhile, a pitcher with remarkable differences between his first and second halves may provide a more exciting effort at fortune telling.

There are three schools of thought.  The first is that the first half of the season is indicative of what a player really is.  Offseason preparation and spring training include approaches that typically stay the same, so the first half is a better illustration of a player, while the second half results from things that happened in the first half that are not reproducible.

The second perspective is that the second half performance is indicative of future performance because it is most recent.  A slight tinker to approach may unlock a player and that player will take that forward.  Or perhaps injury occurs and that injury moves forward with the player.

The third perspective is that the first and second half are simply unimportant series of events that are better treated as a whole than in parts.  Development does not progress neatly in a straight line, nor in episodes as arbitrary as season halves.  Therefore, go with the whole enchilada instead of reducing your sight by going halfsies.

This offseason, the Orioles have two players that fall into the whole half season argument.  Jason Vargas was lights out in the first half of the season and the lights went out in the second half:
1st half: 106.1 IP, 2.62 ERA, 3.80 FIP, 2.0 fWAR
2nd half: 73.1 IP, 6.38 ERA, 5.94 FIP, -0.3 fWAR
His first half was so good that BORAS considered him as a 4/80 contract even at the ripe age of 35.  The second half lowered him down to 2/17.2, which is a considerable drop.  Of course, the two halves could not be more different in terms of production.  Is he a good, but fortunate, 2 WAR pitcher?  Or is he a replacement level workhorse?  Supposedly, the Orioles are kicking his tires.

A little different and closer to home is Baltimore's own Kevin Gausman.
1st half: 97 IP, 5.85 ERA, 4.75 FIP, 1.1 fWAR
2nd half: 89.2 IP, 3.41 ERA, 4.19 FIP, 1.4 fWAR
OK, so Gausman being a second half starter is a bit of a product of his ERA and a never dying narrative, but he is included here as an introductory example because, well, ok, I don't know. Look! An Oriole player.  I guess we can always go back to Ubaldo Jimenez' last season with the Indians.

Anyway, what is the method?  I looked at all pitchers from 2012-2016 whose WAR/100 in the first half was 1 win better or worse than the second half.  I then devised two models. One model simply looked at their year long performance and matched it against their performance the following year.  The second model was applied to each group (i.e., better second half, worse second half) to devise a way to weight the halves in the prediction for the following season.
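
As a rough illustration of the cohort selection described above, here is a minimal Python sketch. It assumes that WAR/100 means fWAR per 100 innings pitched, that the 1-win cutoff is inclusive, and that innings are written in baseball notation (106.1 is 106 and one third). The function names are hypothetical, and this is not the code used for the study.

```python
def innings_to_decimal(ip: float) -> float:
    """Convert baseball innings notation (106.1 = 106 1/3) to a decimal."""
    whole = int(ip)
    outs = round((ip - whole) * 10)  # .1 = one out, .2 = two outs
    return whole + outs / 3


def war_per_100(war: float, ip: float) -> float:
    """WAR per 100 innings pitched (assumed meaning of WAR/100)."""
    return war / innings_to_decimal(ip) * 100


def classify_split(first_rate: float, second_rate: float,
                   threshold: float = 1.0) -> str:
    """Assign a pitcher to a cohort based on the half-to-half swing."""
    diff = second_rate - first_rate
    if diff >= threshold:
        return "better_second_half"
    if diff <= -threshold:
        return "worse_second_half"
    return "no_large_split"


# 2017 half-season lines quoted in this article
vargas = classify_split(war_per_100(2.0, 106.1), war_per_100(-0.3, 73.1))
gausman = classify_split(war_per_100(1.1, 97.0), war_per_100(1.4, 89.2))
print(vargas, gausman)
```

Under these assumptions Vargas lands in the worse second half group, while Gausman's fWAR split is less than one win per 100 innings and would not qualify for either cohort, which fits the point that his second half reputation is mostly a product of ERA.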

Pitchers who performed remarkably better in the second half.
Linear Model
2018 production = 0.547*(2017) + 0.071
R2 = 0.50

As expected, the previous year does a decent job projecting WAR rate for the upcoming season.  For pitchers with these exceptional swings in performance, a full season's worth of performance was not a much better prediction tool than a half season (first half model, R2 = 0.47; second half model, R2 = 0.43).  In other words, the linear models appear to show that half or full season performance for this cohort almost equally accounts for the next season's performance.  Therefore, a big second half bump in performance does not appear to provide any indication of a new set level of improved performance.  If anything, there is a slight indication in this group that the lower first half performance might be more indicative of next season's performance.

Regression Model
2018 production = 0.571*(2017 1st Half) + 0.311*(2017 2nd Half) + 0.016
R2 = 0.48
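
To make the two fits for this cohort concrete, here is a small Python sketch that plugs a made-up pitcher into both equations. The coefficients are the ones reported above; the pitcher, the function names, and the assumption that all inputs are WAR per 100 innings are illustrative only.

```python
def linear_projection(full_season_rate: float) -> float:
    """Full-season linear fit for the better second half cohort."""
    return 0.547 * full_season_rate + 0.071


def regression_projection(first_half_rate: float,
                          second_half_rate: float) -> float:
    """Two-term fit that weights each half separately."""
    return 0.571 * first_half_rate + 0.311 * second_half_rate + 0.016


# Hypothetical pitcher: 100 IP in each half, 0.5 WAR then 2.0 WAR
first_half_rate = 0.5   # WAR per 100 IP, first half
second_half_rate = 2.0  # WAR per 100 IP, second half
full_season_rate = (0.5 + 2.0) / 200 * 100  # 1.25 WAR per 100 IP

linear_next = linear_projection(full_season_rate)
regression_next = regression_projection(first_half_rate, second_half_rate)
print(f"linear: {linear_next:.3f}, regression: {regression_next:.3f}")
```

For this invented pitcher both projections come in under 1 WAR per 100 innings, well short of the 2.0 second half rate, which is the practical meaning of a big second half not setting a new level.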

For this cohort, the regression model appears to agree with the linear models in that the lower first half performance (which carries a much greater coefficient) has more weight in estimating future performance.  However, the regression model does not account for next season performance any better than the simpler linear models.  Again, the take home is basically that if all things are equal then maybe look at the first half performance of pitchers in this cohort.

Pitchers who performed remarkably worse in the second half
Linear Model
2018 Production = 0.665*(2017) + 0.072
R2 = 0.49

The model values are slightly different from the previous cohort's, which is likely an aspect of regression to the mean.  Pitchers who perform well enough to last the first half and then see a big uptick in performance likely had a decent year, so they would be more inclined to see a decrease in their rates.  Meanwhile, a pitcher who was kept in the rotation with a poorer second half probably has displayed a higher level of talent in the long run, so the regression is less.  Regardless, the linear model here decently accounts for the data and the linear models for the halves are not remarkably different (R2 = 0.48 for both).

Regression Model
2018 Production = 0.361*(2017 1st Half) + 0.393*(2017 2nd Half) + 0.015
R2 = 0.51
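
As a worked example, the Python sketch below runs Jason Vargas's 2017 halves through both equations for this cohort. It assumes the models take fWAR per 100 innings pitched as input and that innings are in baseball notation (73.1 is 73 and one third); the helper names are hypothetical, and the output is an illustration of the arithmetic, not a published projection.

```python
def innings_to_decimal(ip: float) -> float:
    """Convert baseball innings notation (73.1 = 73 1/3) to a decimal."""
    whole = int(ip)
    outs = round((ip - whole) * 10)
    return whole + outs / 3


def rate(war: float, innings: float) -> float:
    """WAR per 100 innings, with innings already in decimal form."""
    return war / innings * 100


# Vargas, 2017, from the lines quoted earlier in the article
ip_first = innings_to_decimal(106.1)
ip_second = innings_to_decimal(73.1)
war_first, war_second = 2.0, -0.3

first_rate = rate(war_first, ip_first)
second_rate = rate(war_second, ip_second)
full_rate = rate(war_first + war_second, ip_first + ip_second)

# Worse second half cohort: full-season fit and two-term fit
linear_next = 0.665 * full_rate + 0.072
regression_next = 0.361 * first_rate + 0.393 * second_rate + 0.015

print(f"linear: {linear_next:.2f} WAR/100")
print(f"regression: {regression_next:.2f} WAR/100")
```

Both fits put him somewhere between the two halves, roughly half a win to seven tenths of a win per 100 innings under these assumptions, and the gap between the two is small next to the noise the R2 values imply.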

Similar to the linear model, the halves were valued similarly as expressed in their coefficient weighting.  The regression model in general did as well accounting for the data as the linear model did.

The safe conclusion is that the data are incredibly noisy and that improved or decreased performances in the second half are not terribly more informative than full season performance.  The only caveat is that pitchers who see an improvement in their second half performance might be better judged by their first half performance.  Doing so probably would not be meaningful in most cases, but in a close decision between a couple of pitchers it might be prudent to entertain the idea that the first half performance is more meaningful moving forward to next year.


Pip said...

I'm sure the statistics account for this, but Vargas only has a 40% ground ball rate.
That means that a tremendous amount of his success must have been the support given by the outstanding outfield playing behind him. The Orioles OF defense is horrible, so it stands to reason that regardless of how good he was or wasn't in KC, he would be considerably worse in Baltimore.
I'm not in love with Doug Fister, but I would much rather have signed him than go anywhere near Jason Vargas.
By the way, because Vargas is so old, it would have been interesting for you to include the element of age in your calculations. It's quite possible that we just saw a "dead cat bounce" from him last season, and remember, dead cats only bounce once.

Unknown said...

He's ace like compared to ubaldo and Miley. Who was the last O to win 18 games, Mike Boddicker?

Anonymous said...

No Vargas. Jeez, we got to be able to do better than that. Have you guys been watching Kristen at Bird Watchers play faux GM? Supposedly the Braves faux GM traded both Markakis and Jim Johnson back to the O's with the Braves only paying about $3M in salary. Please tell me a REAL GM would not ever do that - especially without getting real SP prospects in return (as opposed to giving some nebulous prospects up to the Braves).