The first test I ran used data from ESPN Stats and Information. I measured players’ wOBA on contact against pitches thrown in the strike zone, wOBA on contact against pitches not in the strike zone, and wOBA on at bats that ended in a hit by pitch, strikeout or unintentional walk. Then, for 2013-2015, I determined each player’s percentile rank in each of these three categories, as well as his percentile rank for the likelihood of each of these events occurring.
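The ranking step described above can be sketched in a few lines. This is only an illustration: the function name, the midpoint convention for ties and the sample values are assumptions, not the exact procedure or data used for the ESPN numbers.

```python
from bisect import bisect_left, bisect_right


def percentile_ranks(values):
    """Return each value's percentile rank (0-100) within the list.

    Assumed convention: count the values strictly below, plus half of the
    values tied with it, and divide by the number of players.
    """
    ordered = sorted(values)
    n = len(ordered)
    ranks = []
    for v in values:
        below = bisect_left(ordered, v)          # values strictly lower
        ties = bisect_right(ordered, v) - below  # values equal to v (incl. itself)
        ranks.append(100.0 * (below + 0.5 * ties) / n)
    return ranks


# Hypothetical in-zone contact wOBA for five players (not real data).
in_zone_woba = [0.310, 0.385, 0.420, 0.385, 0.450]
ranks = percentile_ranks(in_zone_woba)
```

The same function would be applied once per category, both to the wOBA values and to the rates at which each event occurs.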
I found that the average wOBA for pitches put into play in the strike zone was .385, for pitches out of the strike zone was .290 and for pitches not put into play was .204. I expected pitches hit in the strike zone to be the most productive type, but I was surprised to see that contact made against pitches thrown outside of the strike zone was more productive than at-bats not resulting in contact.
This indicates that batters are stuck in a game theory situation, since they want to make contact as often as possible. Swinging aggressively may result in higher contact rates but also more strikeouts, fewer walks and potentially less productive contact. This means they need to decide whether to swing at a bad pitch early in the count. Ideally, batters would never walk or strike out, because they have their best results when making contact. On average, they should be willing to give up roughly 11 walks in order to prevent 9 strikeouts. For comparison, a player like Adam Jones strikes out roughly 5 times for each unintentional walk he earns.
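As a rough sketch of the arithmetic behind a trade like this, suppose a walk is worth about 0.69 in wOBA terms and a strikeout is worth 0. Those weights are an assumption based on typical published wOBA weights, not figures taken from the data above, and this is not necessarily how the 11-for-9 figure was derived. Under that assumption, the code computes how productive the replacement contact has to be for the trade to break even.

```python
# Assumed linear weights (not from the dataset above): an unintentional walk
# is worth roughly 0.69 in wOBA terms and a strikeout is worth 0.
UBB_WEIGHT = 0.69
K_WEIGHT = 0.0


def break_even_contact_woba(walks_given_up, strikeouts_prevented):
    """wOBA on contact needed so that turning these walks and strikeouts
    into balls in play leaves total production unchanged."""
    value_lost = walks_given_up * UBB_WEIGHT + strikeouts_prevented * K_WEIGHT
    plate_appearances = walks_given_up + strikeouts_prevented
    return value_lost / plate_appearances


# Trading 11 walks and 9 strikeouts for 20 balls in play.
threshold = break_even_contact_woba(11, 9)
```

With these assumed weights the threshold comes out just under the .385 average for contact in the strike zone, so the trade only pays off if the extra contact is about as productive as in-zone contact.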
A stepwise regression analysis suggests that the percentile rank of wOBA on pitches hit inside the strike zone is the variable with the strongest relationship to actual wOBA percentile rank. The next most influential variable was wOBA on pitches that weren’t put into play. On average, 85% of all at bats end in one of these two scenarios. There is a weaker relationship between actual wOBA rank and the percentage of at bats falling into each of the three categories, suggesting that this has some relevance but not a huge amount. In general, the percentage of pitches that resulted in strikes or balls being put into play had a positive impact on total wOBA (obviously, putting strikes into play is better than putting balls into play), while failing to put a ball into play had a negative impact.
The R^2 for this analysis was .9225, indicating that these wOBA categories accurately describe overall wOBA. This is expected but important to verify.
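The reason a high R^2 is expected here is that overall wOBA is, by construction, a weighted average of the three category wOBAs, with the weights being how often each category occurs. A minimal sketch of that identity follows. The category wOBA values are the averages quoted above, but the individual shares are hypothetical: only the 85% combined share of in-zone contact and no contact comes from the text, and the result is illustrative rather than an actual league figure.

```python
def overall_woba(shares, wobas):
    """Combine per-category wOBA into overall wOBA as a share-weighted average."""
    total_share = sum(shares.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("category shares must sum to 1")
    return sum(shares[name] * wobas[name] for name in shares)


# Average wOBA per category, as reported for the ESPN data.
category_woba = {
    "in_zone_contact": 0.385,
    "out_of_zone_contact": 0.290,
    "no_contact": 0.204,
}

# Hypothetical split: the 0.50 / 0.35 breakdown is made up for illustration;
# only their 0.85 total (and hence the 0.15 remainder) follows from the text.
category_share = {
    "in_zone_contact": 0.50,
    "out_of_zone_contact": 0.15,
    "no_contact": 0.35,
}

combined = overall_woba(category_share, category_woba)
```

Because the regression works on percentile ranks rather than raw values, the fit is not perfect, which is why the check is still worth running.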
There is a high year-to-year correlation in how often a batter puts a pitch in the strike zone into play (.746), puts a pitch out of the strike zone into play (.725) or fails to put the ball into play (.836). There is some year-to-year correlation in a batter’s production when putting pitches in the strike zone into play (.544) or when not putting the ball into play at all (.639), but only a minimal one when putting pitches not in the strike zone into play (.169). This suggests that players largely have the same profile from year to year but that their production is considerably more variable. It also suggests that production on pitcher-friendly pitches is considerably more variable than production on hitter-friendly pitches. Finally, it suggests that while these results have some predictive value, they’re more helpful for describing actual results.
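A year-to-year correlation of this kind can be computed by pairing each player-season with the same player’s following season and taking the Pearson correlation of the pairs. The sketch below assumes that approach; the data layout, function names and sample values are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def year_to_year_pairs(stat_by_player_season):
    """Pair each (player, season) value with that player's next-season value."""
    current, following = [], []
    for (player, season), value in stat_by_player_season.items():
        next_value = stat_by_player_season.get((player, season + 1))
        if next_value is not None:
            current.append(value)
            following.append(next_value)
    return current, following


# Hypothetical rates of putting in-zone pitches into play (not real data).
rates = {
    ("A", 2013): 0.30, ("A", 2014): 0.32, ("A", 2015): 0.31,
    ("B", 2013): 0.40, ("B", 2014): 0.41,
    ("C", 2014): 0.25, ("C", 2015): 0.27,
}
current, following = year_to_year_pairs(rates)
r = pearson(current, following)
```

A player with three qualifying seasons contributes two pairs, so the 2013-2015 window yields at most two pairs per player.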
The second test I ran used PITCHf/x data from 2013-2015 and studied players who faced at least 1000 pitches in a given season. I then used zone information to determine whether a pitch was a clear strike, a clear ball or unclear, defined as being within 1 inch of the strike zone in any direction. Then I determined each player’s overall wOBA percentile rank, his wOBA percentile rank on pitches that are strikes, balls and unclear, and his percentile rank for the likelihood of putting a ball, strike or unclear pitch into play.
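One plausible way to implement the three-way split is to measure how far a pitch is past the nearest edge of the zone and treat anything within 1 inch of an edge, on either side, as unclear. The sketch below assumes a rectangular zone 17 inches wide (the width of home plate), ignores the radius of the ball and takes all measurements in inches. The function and parameter names are illustrative, and the actual zone definition used for the analysis may differ.

```python
PLATE_HALF_WIDTH_IN = 8.5  # home plate is 17 inches wide
UNCLEAR_MARGIN_IN = 1.0    # "within 1 inch" band around the zone edge


def classify_pitch(px_in, pz_in, zone_bottom_in, zone_top_in):
    """Classify a pitch as 'strike', 'ball' or 'unclear'.

    px_in: horizontal location relative to the middle of the plate (inches)
    pz_in: height above the ground (inches)
    zone_bottom_in / zone_top_in: the batter's strike zone limits (inches)
    """
    # Largest amount by which the pitch is past any edge of the zone.
    # Negative means the pitch is inside every edge by at least that much.
    overshoot = max(
        abs(px_in) - PLATE_HALF_WIDTH_IN,
        zone_bottom_in - pz_in,
        pz_in - zone_top_in,
    )
    if abs(overshoot) <= UNCLEAR_MARGIN_IN:
        return "unclear"
    return "strike" if overshoot < 0 else "ball"


# Hypothetical pitches against a zone running from 18 to 42 inches high.
middle = classify_pitch(0.0, 30.0, 18.0, 42.0)
just_off_plate = classify_pitch(9.0, 30.0, 18.0, 42.0)
well_outside = classify_pitch(12.0, 30.0, 18.0, 42.0)
```

Pitch-tracking data typically reports locations in feet, so real inputs would need to be converted to inches first.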
Unsurprisingly, batters were most successful against pitches in the strike zone with a .411 wOBA, worse against pitches that were questionable with a .342 wOBA and worst against pitches that were clearly balls with a .284 wOBA. It’s pretty clear that batters are most successful when swinging at strikes.
Roughly two-thirds of balls put into play were strikes. As a result, it should not be surprising that a regression analysis indicated that how a player does against strikes has the largest impact on his wOBA by a significant margin. Performance against unclear pitches and performance against pitches not in the strike zone are both significant variables, but with minimal impact. The model’s R^2 was .96, suggesting that these components do accurately describe what occurred.
As with the ESPN data, a player’s year-to-year profile stays reasonably static. There’s a strong correlation of about .73 between a player’s current “In Play Percentile Ranks” and his rankings the following year for pitches in the strike zone or pitches not in the strike zone. There is a smaller correlation of .516 between a player’s current “wOBA against strikes” percentile rank and his rank in the following year. All in all, it shows that we can be reasonably certain that a player who swings at good or bad pitches will continue to do so in the future, but that this dataset is better used to describe what happened than to predict what will happen. This chart can be found below:
The third dataset that I looked at was also PITCHf/x data from 2013-2015, based on players who faced at least 1000 pitches in a year. For this dataset, I determined their wOBA percentile rank based on batted ball type (fly ball, ground ball, line drive and pop-up).
As one might suspect, batters were most productive hitting line drives, with a .732 wOBA. Fly balls ranked second with a .354 wOBA, grounders were third with a .251 wOBA and pop-ups were fourth with a .022 wOBA.
A regression analysis suggested that fly balls were more predictive of future wOBA than line drives. Hitting a larger percentage of fly balls and line drives resulted in a higher wOBA, while hitting a larger percentage of grounders and pop-ups resulted in a lower wOBA. An R^2 of .91 suggests that these categories accurately describe what occurred.
I found a moderate year-to-year correlation for the likelihood of a player being ranked at a given percentile for his profile type. I also found a reasonable year-to-year correlation for a player hitting fly balls, line drives and pop-ups but only a minimal one for a player hitting ground balls. A chart summarizing this data can be found here:
All in all, these datasets largely show things that are obvious. They show that hitters do better against pitches in the strike zone than against pitches that aren’t. They show that hitters are more productive when hitting line drives than when hitting ground balls. But they also show a few things that aren’t obvious. They illustrate how strikeouts, walks and hit by pitches relate to balls put into play. And they also can illustrate a player’s strengths and weaknesses.
Being able to visualize the data like this can lead to some surprising findings. Hopefully I’ll get a chance to show you some.