24 July 2012

You are what your record says you are?

Many folks like to quote Bill Parcells' gem: You are what your record says you are.  It is a simple phrase and one that rings true.  However, many question the truth of that statement.  The following is a statistic known as the Pythagorean Expectation for Wins.

[The graphic below is not appearing nicely for everyone. The formula is as follows:
Win = (runs scored)^2 / ((runs scored)^2 + (runs allowed)^2)]

\mathrm{Win} = \frac{\text{runs scored}^2}{\text{runs scored}^2 + \text{runs allowed}^2} = \frac{1}{1+(\text{runs allowed}/\text{runs scored})^2}

The idea behind that formula is that runs scored and runs allowed are better indicators of talent than wins.
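For anyone who wants to run the numbers themselves, here is a minimal sketch of the formula in Python. The function name and the `exponent` parameter are my own choices for illustration; the formula above uses an exponent of 2, which is the default here.

```python
def pythagorean_expectation(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed.

    Implements Win = RS^2 / (RS^2 + RA^2). The exponent is left as a
    parameter only for convenience; 2 matches the formula in the post.
    """
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals cannot be negative")
    if runs_scored == 0 and runs_allowed == 0:
        raise ValueError("need at least one run scored or allowed")
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


if __name__ == "__main__":
    # Hypothetical team, not real data: 750 runs scored, 700 allowed.
    print(round(pythagorean_expectation(750, 700), 3))
```

A team that scores exactly as many runs as it allows comes out at .500, and the made-up 750/700 team above comes out at roughly .534.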

A few days ago, I put forward data showing that extra-inning winning percentage does not indicate whether a team is good or not.

Below is a simple graph comparing first half record, first half Pythagorean record, and first half 9-inning game record against second half record.



The data set is the 2011 AL teams.  Just 14 data points.  What we see here is that none of the three methods is a particularly good predictor of second half success in 2011.  However, the Pythagorean record does tend to reflect second half performance a little better than the other two.

It would probably be a good idea to repeat this for the past ten years and see whether these trends hold true.  At this point, it appears there may be slightly better ways than wins to figure out who you are.
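If someone wants to repeat the comparison over more seasons, one way to put a number on "reflects second half performance better" is the correlation between each first half metric and second half winning percentage. The sketch below is just one reasonable approach, not necessarily how the graph above was built, and the function name is my own. It takes two equal-length lists (one value per team) and returns Pearson's r; squaring the result gives r^2.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys):
        raise ValueError("both lists need one value per team")
    if len(xs) < 2:
        raise ValueError("need at least two teams")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)
```

Feed it the first half Pythagorean records as `xs` and the second half winning percentages as `ys`, then do the same with plain first half records, and compare the two r values.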

3 comments:

Anonymous said...

Can you label your axes? Not sure what exactly I'm looking at.

Jon Shepherd said...

Later...y-axis is 1st half metric and x-axis is 2nd half wins.

Matt P said...

I did a similar calculation with all the teams solely for win percentage from 2000-2011. Getting the run data would simply take too long.

Two differences: Instead of first half, I did a team's April, May and June record vs. its July, August and September record. I don't think this matters.

I didn't get the number of games played per month so I have to assume each month has about the same number of games. This isn't ideal but probably won't matter.

I got an r^2 of .2311 for win percentage which means that the correlation (r) is .48075.

I also did a comparison of a team's record from the start of the season to July and the record in August and Sept.

I got an r^2 of .2727 and an r of .5224.

On the positive side, the correlation between a team's full season record and its record at this point was .912 (r^2 of .83). That means that the wins the O's have should be considered in the books at this point.