30 October 2017

BORAS (po) 2.0: Improving Upon the Original BORAS Position Player Contract Model

A few years back, I introduced two models to predict contract terms: BORAS (pi) and BORAS (po). That acronym, like many in the data science world, came first and the words came later, but it stands for Baseball Observation-based Remuneration Assumption System. The idea was that the bias in determining contract value (even in comp model systems) could perhaps be removed by throwing all the recent contracts together in a big pot and regression modeling them against certain performance parameters.
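The post doesn't spell out the regression setup, so here is a minimal sketch, assuming plain ordinary least squares with one row per signed contract. Each row holds a player's performance parameters (which metrics to use is left open here) and the target is the contract's total value in MM. `fit_ols` and `predict` are hypothetical helpers written for illustration, not the actual BORAS code.

```python
from typing import List, Sequence


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Fit y ~ b0 + b1*x1 + ... + bk*xk by solving the normal equations (X'X)b = X'y.

    Returns [b0, b1, ..., bk]. Assumption: plain OLS, not necessarily what BORAS uses.
    """
    rows = [[1.0, *map(float, r)] for r in X]  # prepend an intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; normal equations are singular")
        a[col], a[pivot] = a[pivot], a[col]
        div = a[col][col]
        a[col] = [v / div for v in a[col]]
        for r in range(p):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][p] for i in range(p)]


def predict(coefs: Sequence[float], x: Sequence[float]) -> float:
    """Projected contract value for one player's performance parameters."""
    return coefs[0] + sum(c * xi for c, xi in zip(coefs[1:], x))
```

In practice you would fit on the pot of recent contracts and then call `predict` on each free agent's parameters; with only a handful of contracts per position, the number of parameters has to stay small or the fit becomes unstable.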

The BORAS (pi) system, which is for pitcher contracts, worked remarkably well. The cumulative r2 of the model is 0.86 and the median miss is 1.7 MM. Some of the misses are large, such as being off on Ian Kennedy by 11 MM AAV. Simply put, highly encapsulated pitching metrics were rather effective at projecting what a contract would be for pitchers. Traditional metrics were batched together, and more "advanced" metrics were batched together. The previous three years of performance were considered. It worked out.
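The two numbers used to grade the models can be computed as below. This is a sketch assuming r2 is the usual coefficient of determination between actual and projected values and "median miss" is the median absolute error; the post doesn't define either more precisely, and `r_squared` and `median_miss` are hypothetical names.

```python
from statistics import median
from typing import Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values have no variance")
    return 1 - ss_res / ss_tot


def median_miss(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Median absolute difference between actual and projected values (e.g. in MM)."""
    return median([abs(a - p) for a, p in zip(actual, predicted)])
```

The median miss is less sensitive to a single large miss (like the Kennedy projection) than a mean would be, which is why it pairs well with r2.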

I used the same approach with the BORAS (po) system for position players, and encapsulating traditional and more "advanced" metrics appeared to be the wrong move. Where the metrics for starting pitchers largely all met the needed confidence points, the data for position players was more varied. For instance, fielding metrics for position players need about two to three years to really stabilize, and I was batching this information with more certain offensive data. In other words (and this is stating it very simply), I was valuing a great fielding season the same as a great offensive season, which is a no good, terrible, oh so awful way to do things. It yielded an r2 of 0.47 and a median miss of 3.9 MM, which is decent if you know nothing. However, even a passing interest in contract terms would make you far more aware of probable terms than this model would.

This performance led me to think about other ways to use the data and come up with, perhaps, a better-performing model. First, I wanted to thin down the data sources. I noted that, unlike the pitching model, the hitting model valued similar statistics regardless of whether they were traditional or more "advanced", so I decided to use the less encapsulated "advanced" metrics and ignore the traditional ones. I also separated out the fielding metrics and put them in a more appropriate bin with data of similar certainty. So, now we have BORAS (po) 2.0.
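One way to picture the new binning is a feature builder where offense enters season by season while fielding is pooled into a single three-year bin, so one great fielding year can no longer count like one great offensive year. This is a sketch under assumptions: the post doesn't name the metrics, so `offense` and `fielding` are hypothetical single-number stand-ins, and averaging is just one plausible way to pool the fielding bin.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Season:
    offense: float   # hypothetical: one "advanced" offensive metric for the season
    fielding: float  # hypothetical: one fielding metric for the season


def build_features(last_three: Sequence[Season]) -> List[float]:
    """Offense stays per season; fielding is pooled into one three-year bin.

    Expects exactly three seasons, most recent first.
    """
    if len(last_three) != 3:
        raise ValueError("expected exactly three seasons, most recent first")
    offense = [s.offense for s in last_three]
    fielding_bin = sum(s.fielding for s in last_three) / 3  # pooled, slower-to-stabilize data
    return offense + [fielding_bin]
```

The resulting rows could then feed a regression like the one BORAS uses, with the fielding bin carrying its own coefficient instead of sharing one with offense.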

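Below is how the models compare. Projections are listed as years/total value in MM, and NRI means the model projects only a non-roster invite.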

Player       BORAS 1.0   BORAS 2.0
Alonso       2/18.7      3/34
Duda         2/19.6      2/21.4
Holliday     NRI         NRI
Hosmer       4/61.2      5/93.8
Lind         1/7.9       2/17.5
Moreland     2/21.4      2/16.1
Morrison     3/34.8      4/59.8
Reynolds     1/9.5       1/7.7
Santana      3/45.2      3/51.4
Kendrick     1/10.6      2/20.4
Utley        NRI         1/6.8
Walker       2/22.3      3/41
Cozart       4/59.3      5/83.7
Escobar      NRI         1/8
Frazier      3/50.6      3/37.7
Moustakas    3/34.2      3/35.2
Reyes        NRI         1/8.7
Nunez        2/20.8      4/53
Cabrera      NRI         1/8
Dyson        2/25.2      3/32.5
Granderson   1/12.7      2/25.9
Jay          1/9.2       2/17.6
Martinez     4/64.8      4/85.1
Maybin       2/21.8      2/20.5
Upton        5/83.9      5/100.8
Cain         5/92.2      4/64.7
Gomez        2/21.2      3/31.3
Jackson      2/21.8      2/20.6
Bautista     NRI         NRI
Bruce        3/30.1      3/37.5
Ethier       NRI         1/7.2
Gonzalez     1/8.6       2/15.7
Smith        NRI         1/8
Sogard       2/17        2/14.1

BORAS 2.0 appears to do better with the higher-end players. Justin Upton at 5/100.8 makes more sense than 5/83.9, and Lorenzo Cain makes more sense as a 4/64.7 player than a 5/92.2 one. However, BORAS 1.0 seems far more inclined to hand out NRIs (8 of the 34 players, versus 2 for BORAS 2.0). It will be interesting to see how this offseason plays out and whether BORAS 2.0 is an actual improvement on the original model.
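To check the NRI tally and the Upton and Cain comparisons, the rows of the comparison table can be parsed and summarized. This is a small sketch, not part of BORAS: `parse_table`, `count_nri`, and `aav` are hypothetical helpers, the rows are copied from the table in whitespace-separated form, and AAV is simply total value divided by years.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Projection = Optional[Tuple[int, float]]  # (years, total MM); None means NRI

# Rows from the table: name, BORAS 1.0 projection, BORAS 2.0 projection.
ROWS = """
Alonso 2 18.7 3 34
Duda 2 19.6 2 21.4
Holliday NRI NRI
Hosmer 4 61.2 5 93.8
Lind 1 7.9 2 17.5
Moreland 2 21.4 2 16.1
Morrison 3 34.8 4 59.8
Reynolds 1 9.5 1 7.7
Santana 3 45.2 3 51.4
Kendrick 1 10.6 2 20.4
Utley NRI 1 6.8
Walker 2 22.3 3 41
Cozart 4 59.3 5 83.7
Escobar NRI 1 8
Frazier 3 50.6 3 37.7
Moustakas 3 34.2 3 35.2
Reyes NRI 1 8.7
Nunez 2 20.8 4 53
Cabrera NRI 1 8
Dyson 2 25.2 3 32.5
Granderson 1 12.7 2 25.9
Jay 1 9.2 2 17.6
Martinez 4 64.8 4 85.1
Maybin 2 21.8 2 20.5
Upton 5 83.9 5 100.8
Cain 5 92.2 4 64.7
Gomez 2 21.2 3 31.3
Jackson 2 21.8 2 20.6
Bautista NRI NRI
Bruce 3 30.1 3 37.5
Ethier NRI 1 7.2
Gonzalez 1 8.6 2 15.7
Smith NRI 1 8
Sogard 2 17 2 14.1
"""


def parse_cells(tokens: Sequence[str]) -> List[Projection]:
    """Turn tokens like ['NRI', '1', '6.8'] into [None, (1, 6.8)]."""
    out: List[Projection] = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "NRI":
            out.append(None)
            i += 1
        else:
            out.append((int(tokens[i]), float(tokens[i + 1])))
            i += 2
    return out


def parse_table(text: str) -> Dict[str, Tuple[Projection, Projection]]:
    table = {}
    for line in text.strip().splitlines():
        name, *rest = line.split()
        cells = parse_cells(rest)
        if len(cells) != 2:
            raise ValueError(f"expected two projections for {name}")
        table[name] = (cells[0], cells[1])
    return table


def count_nri(table: Dict[str, Tuple[Projection, Projection]], model: int) -> int:
    """Number of NRI projections for model 0 (BORAS 1.0) or 1 (BORAS 2.0)."""
    return sum(1 for cells in table.values() if cells[model] is None)


def aav(p: Projection) -> Optional[float]:
    """Average annual value in MM, or None for an NRI."""
    return None if p is None else p[1] / p[0]
```

For example, Upton's AAV rises from about 16.8 MM under BORAS 1.0 to about 20.2 MM under BORAS 2.0, while Cain's falls from about 18.4 MM to about 16.2 MM.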


Anonymous said...

It sure looks more like what's going around in the chats/blogs. I'd be interested to see what it has to say about the big upcoming FAs (Machado/Harper/Donaldson, etc.). I notice that Dyson comes out closer to the "platoon player" numbers now and Nunez's cost goes up (which would have kept me from making an offer to him in my blueprint). I honestly don't think that guys like Werth or Holliday would accept an NRI at this point in their careers. They'd want some guaranteed money or they'd retire. There may be some tweaking to be done there. Werth is at least as good a bench option as Smith (1/8). Holliday as a DH only... not so sure. Why don't you do some comparisons with last year's FA group and see how the results came out? That might be a good initial check.

Anonymous said...

Then again, Sogard signed for one year. Maybe there is a home-team discount to be had.

Jon Shepherd said...

The r2 values are those results.

Jon Shepherd said...

Also, NRI in this case would be equivalent to retiring.