To continue this practice of forecasting attendance, I decided to take a look at what factors influence single-game attendance at Camden Yards. Perhaps this work would qualify Camden Depot to help the Orioles plan promotional nights or concession inventory (always buy 1 extra bobblehead for me, Orioles)!
The features I decided to build my regression on were pretty straightforward:
- Day of week
- Month of year
- Opening Day
- Day/night game
- Game-time temperature
- Weather condition (rain, sun, overcast, drizzle, or dome)
- Winning percentage (or Pythagorean winning percentage)
- Runs per game, for and against
- Current streak type (has the team won or lost the last few games they've played?)
- Games back in the division
- Promotional night
- Kids' promotional night
- Size of promotion/number of items given away
- Family night/field trip days (kids run the bases, etc.)
- Fireworks night
- Gender-specific promotional night (Mother's and Father's Day, specifically)
I ended up dropping day/night game, runs for/against, family nights, fireworks nights, and gender-specific promotional giveaways because they all tended to fall on Fridays, Saturdays, and Sundays. Since day of week was already in the model, their effects would be hard to separate from the weekend effect. I also dropped the year from the analysis because I was concerned that it was too similar to winning percentage.
I chose to use a lasso regression model because, in short, it shrinks coefficients and zeroes out the ones that don't meaningfully help predict attendance. And interestingly, a lot of coefficients were zeroed out: most opposing teams, some months, some weather conditions - and winning percentage. The final list of included and meaningful coefficients is shown here:
As you can see, this is a much shorter list than we started out with! I want to call out the teams that affect attendance at Camden Yards when they come to town:
Within the AL East, the Yankees and the Red Sox cause attendance to spike, while the Rays drive attendance down. That fits our understanding of fan tendencies: Rays fans don't go to games anywhere, much less hundreds of miles from home, while Yankees and Red Sox fans are both everywhere and willing to travel to see their teams at Camden Yards. Since those two teams are the Orioles' primary rivals, I'm sure their presence makes O's fans want to show up as well.
Other nearby teams, the Phillies and the Nationals, also drive attendance up. The Nationals' annual visit to Baltimore is the largest single-game attendance boost from a team.
The White Sox probably don't really drive people away from the ballpark. They simply have the misfortune of being the team that was in town following the unrest in Baltimore City and, as a result, the team that played the Orioles in front of an empty stadium.
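For anyone curious how a lasso ends up dropping so many terms, here is a minimal sketch of the idea, not the code behind this analysis. It fits a lasso by coordinate descent on made-up data: the `weekend`, `promo`, and `irrelevant` features, the attendance figures (in thousands), and the `penalty` value are all assumptions chosen for illustration. The key step is `soft_threshold`, which sets a coefficient to exactly zero whenever the feature's pull on the residual is smaller than the penalty.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; anything within the penalty becomes 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(rows, targets, penalty, sweeps=100):
    """Minimize (1/2n) * sum(squared errors) + penalty * sum(|coefficients|)."""
    n, p = len(rows), len(rows[0])
    # Center the columns and the target so the intercept is not penalized.
    col_means = [sum(r[j] for r in rows) / n for j in range(p)]
    y_mean = sum(targets) / n
    columns = [[r[j] - col_means[j] for r in rows] for j in range(p)]
    residual = [t - y_mean for t in targets]  # all coefficients start at zero
    coefs = [0.0] * p
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            norm = sum(c * c for c in col) / n
            if norm == 0.0:
                continue  # a constant column carries no information
            # Covariance of feature j with the residual, ignoring j's own term.
            rho = sum(c * (res + c * coefs[j]) for c, res in zip(col, residual)) / n
            new_coef = soft_threshold(rho, penalty) / norm
            delta = new_coef - coefs[j]
            if delta != 0.0:
                residual = [res - c * delta for c, res in zip(col, residual)]
                coefs[j] = new_coef
    intercept = y_mean - sum(m * b for m, b in zip(col_means, coefs))
    return intercept, coefs


# Synthetic data (hypothetical): attendance in thousands depends on weekend
# and promo games, and not at all on the "irrelevant" feature.
random.seed(0)
rows, targets = [], []
for game in range(400):
    weekend = 1.0 if game % 7 in (4, 5, 6) else 0.0
    promo = 1.0 if game % 5 == 0 else 0.0
    irrelevant = random.gauss(0.0, 1.0)
    rows.append([weekend, promo, irrelevant])
    targets.append(25.0 + 8.0 * weekend + 5.0 * promo + random.gauss(0.0, 2.0))

intercept, coefs = lasso_coordinate_descent(rows, targets, penalty=0.5)
for name, coef in zip(["weekend", "promo", "irrelevant"], coefs):
    print(f"{name:>10}: {coef:.2f}")
```

Note that the surviving coefficients come out smaller than the true effects baked into the fake data. That shrinkage is the price of the penalty, and it is one reason a zeroed-out coefficient means "not needed for prediction given everything else in the model" rather than "has no effect at all."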
The most surprising thing for me was the effect, or lack thereof, of playing winning baseball on attendance. While a good team draws more fans throughout the year, it seems that fans are drawn to the ballpark day to day by how well a game fits their social schedule. Weekends are popular draws, as are the summer months, and people show up for promotions and family nights. Interestingly, attendance also decreases as temperature rises (I suspect that the relationship is actually parabolic, with a nice warm day drawing more fans than either a cold one or a very hot one). But how good the Orioles are seems less important than how convenient it is to attend a game.
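One way to test the parabolic hunch about temperature would be to add a squared temperature term and see where the fitted curve peaks. The sketch below is only an illustration under stated assumptions: the temperatures and attendance figures are invented (a noise-free curve peaking at 75 degrees), and `fit_quadratic` and `solve_linear_system` are hypothetical helper names, not part of the regression described here.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_quadratic(temps, attendance):
    """Least-squares fit of attendance = a + b*u + c*u**2, with u = temp - mean temp."""
    mean_t = sum(temps) / len(temps)
    basis = [[1.0, t - mean_t, (t - mean_t) ** 2] for t in temps]
    # Normal equations: (B^T B) beta = B^T y
    gram = [[sum(row[i] * row[j] for row in basis) for j in range(3)] for i in range(3)]
    moment = [sum(row[i] * y for row, y in zip(basis, attendance)) for i in range(3)]
    a, b, c = solve_linear_system(gram, moment)
    return mean_t, a, b, c


# Invented data: attendance (in thousands) peaks at 75 degrees and falls off
# on both colder and hotter days.
temps = list(range(45, 100, 5))
attendance = [30.0 - 0.02 * (t - 75) ** 2 for t in temps]

mean_t, a, b, c = fit_quadratic(temps, attendance)
peak_temp = mean_t - b / (2 * c)  # vertex of the fitted parabola
print(f"curvature: {c:.3f}, fitted peak near {peak_temp:.1f} degrees")
```

A negative curvature term with a peak inside the range of observed temperatures would support the "nice warm day" theory. If the peak landed well below the temperatures of most games, a plain downward slope like the one the model found would be a reasonable summary.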
To be fair, this regression uses data from 2012 to 2015, years in which the Orioles were generally pretty good. Winning percentage, or Pythagorean winning percentage, might be a stronger predictor of attendance if some of the more turbulent years were included as well. So how well does it work?
It does a decent job, but its predictions are usually only within about 10,000 attendees of the actual figure, one way or the other. Not terrible, but considering that some games have as few as 15,000 attendees, this might not be the model to base inventory plans on. This regression has an R^2 of 0.635, indicating that the model explains 63.5% of the variation in single-game attendance. Another limitation, as far as long-term inventory planning goes, is that this model uses weather as a predictor of attendance - which it most certainly is! Walk-up ticket sales spike when it's nice out, but the weather probably isn't something concession stands would know far enough in advance to use to their advantage.
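For reference, R^2 compares the model's squared errors with those of a baseline that always guesses the average attendance. Here is a small sketch of that calculation. The five `actual` and `predicted` attendance figures are made up for illustration and are not taken from the model above.

```python
def r_squared(actual, predicted):
    """Share of variance in actual explained by predicted: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(actual, predicted):
    """Average size of the miss, in the same units as attendance."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical single-game attendance figures and model predictions.
actual = [20000, 30000, 40000, 25000, 35000]
predicted = [25000, 28000, 36000, 30000, 31000]

r2 = r_squared(actual, predicted)
mae = mean_absolute_error(actual, predicted)
print(f"R^2 = {r2:.3f}, typical miss = {mae:.0f} fans")
# R^2 = 0.656, typical miss = 4000 fans
```

The two numbers answer different questions. R^2 says how much better the model is than guessing the average every night, while the typical miss is what matters if you are deciding how many hot dogs to order.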