Big Data Baseball: Math, Miracles, and the End of a 20-year Losing Streak is the next logical step in the sabermetric non-fiction collection, preceded by Michael Lewis' Moneyball and Jonah Keri's The Extra 2%. Those books took meandering walks through concepts devised by the front office and how those concepts were applied in the clubhouse and on the field, while noting that it took the perfect collection of personnel at the right time to make it all work. In Big Data Baseball, the premise is largely that there are a great many data points available now, and the key is knowing not only what to do with them but also how to communicate what you learn to the people who actually step onto the field. It is a story of how to turn 10 million dollars into 10 additional wins... and of having the right people at just the right time.
Perhaps the heart of this book is Clint Hurdle. He is the Billy Beane of this story: a man broken by the game and on his way out, who challenges his own convictions about it. Hurdle overcame his failure as a player (if you can call making the Major Leagues and burning out a failure) and his difficulties winning as a manager to fully embrace a deeper application of what the Pirates' data analysis department was producing. This included not only analytical scouting reports and frequent team meetings, but also bringing the data science team into the clubhouse to interact with the players. The book presents this as quite revolutionary.
For readers less interested in the narrative, the main draw is the focus on defensive shifts, along with the development, acquisition, and deployment of players who fit the shifting style the club was incorporating. After experimenting with minor league clubs, the organization decided to adopt defensive shifting more fully. The Pirates are certainly not on an island; other clubs, like the Orioles, have dedicated themselves to the shift as well. Still, the Pirates are one of the few teams at the tip of the shifting spear, and shifting accounted for a plurality of those added wins.
The rest of those added wins came from Russell Martin, a Yankees castoff. Martin's pitch-framing ability gave many runs back to the team simply by converting a few called balls into called strikes. This helped keep starters in the game longer by keeping hitters in pitchers' counts. It is a concept that has been written about extensively here at the Depot and elsewhere, and, as with shifting, it has become largely mainstream within the game. The section on Martin does get a little loose: the book describes his pitch calling as being like jazz, even though Martin hates jazz. It then describes his approach as having the pitcher throw what the batter is not expecting, which is actually quite ordered.
The other aspect of the book that I found a little off was that this is a book about data science, yet it did not seem to have been edited by anyone well versed in data science. For instance, the book makes a point about Pirates pitcher Gerrit Cole and how Cole's father was well studied in baseball analysis. The main point of this aside is that his father applied the Verducci rule, which centers on a gradual year-by-year buildup of total innings in order to prevent arm injuries. What is interesting is that the Verducci effect was poorly studied from the start and showed signs of confirmation bias. In the years since, it has been resoundingly discredited as having any useful application.
Perhaps debunking the Verducci effect was not this book's job, but I think it highlights something missing from this book as well as from those written by Lewis and Keri: science fails. It fails a lot. It is certainly better than going into something blind, but the marvelous thing about scientific endeavors is that we refine our understanding of reality as time moves on. While the book highlights how we have entered a new era of millions, even billions, of data points to digest, it fails to note that having so many data points can itself be a problem: test enough relationships and some will look meaningful purely by chance, producing a great many false positives. It is that false-positive story that is needed here. The Verducci effect was a proto-big-data false positive.
In the end, what this book does well is deliver a solid narrative with several interesting characters while also introducing many readers to current thinking on data analysis and on exploiting market inefficiencies. Travis Sawchik takes some relatively complicated concepts and gives them a soft, inviting touch for less data-obsessed readers. We are also quite pleased that former Camden Depot writer Stuart Wallace is name-dropped in the book as a significant hire as the club moves forward.
Big Data Baseball: Math, Miracles, and the End of a 20-year Losing Streak
Flatiron Books, 256 pg