New Developments

It's been a busy couple of weeks here at Minnesota SSA. We have been making strides in advancing our statistics, reviewing and tweaking our formulas in order to inch closer to complete accuracy.

In this post, we hope to give people a look into what we do, to help our readers better understand the stats that we have created. The first thing we did was a complete overhaul of our batting metrics. We kept our formula for batRuns completely intact, but we've remodeled our formula for calculating negRuns. NegRuns was initially seen as a way to measure the negative effect of a small selection of outs that we deemed to be worse than a normal out. This included outs on the base paths (e.g. being caught on a stolen base attempt), outs that limited the movement of other baserunners, and double plays, because they obviously have a much more negative effect than a simple out.

Our initial view on negRuns was that only those outs needed to be factored in, and that "simple" outs would be relatively uniform between most players. We found that this was true for average players, because our whole method is grounded in comparing players to averages. However, once you reached the extreme lows and extreme highs, there was some substantial variation. To fix this, we did some research to look for the simplest way to bridge the gap we were facing.

The method that we determined to be the most accurate was to take the same calculations we had used for positive at bats and apply them to simple outs. This is as far into the specifics as I can go without revealing our formulas.

I also wanted to give you an update on the newer stats that we have been working on. The biggest area we hope to expand into is projections. This is a much trickier thing to work with because it takes more time to prove its accuracy. Our current stats "predict the past", because they predict how many runs a team scored based on stats that we already have.

With projections, we hope to use the same numbers, but also look at other factors to give our best guess at future production. This will be far more useful in things like choosing starters and batting orders because it will factor in opponents and how hot/cold a player has been. In order to do this, we've started off by quantifying slumps/hot streaks. What we are doing is comparing a player's last 10 games to the rest of the season. We weight the last 10 games slightly more heavily than the rest of the season. For example: if we were exactly 50 games into the season, the natural "weight" would be 20% for the last 10 games and 80% for the other 40, because those are their actual shares of the season. Once weighted, it would be more like 30% and 70%.
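To make the 50-game example concrete, here is a minimal sketch of that kind of weighting. It is not our actual formula: the function names, the sample batting averages, and the idea of passing the 30% weight in directly are all assumptions made for illustration.

```python
def natural_weight(recent_games, total_games):
    """Share of the season that the recent games actually make up."""
    return recent_games / total_games


def blend(recent_stat, rest_stat, recent_weight):
    """Weighted average of a recent-games stat and a rest-of-season stat.

    recent_weight is the share given to the recent games; the rest of
    the season gets whatever is left over.
    """
    return recent_weight * recent_stat + (1 - recent_weight) * rest_stat


if __name__ == "__main__":
    # Hypothetical player: hitting .350 over the last 10 games,
    # .250 over the other 40 games of a 50-game season so far.
    recent_avg, rest_avg = 0.350, 0.250

    unweighted = blend(recent_avg, rest_avg, natural_weight(10, 50))
    weighted = blend(recent_avg, rest_avg, 0.30)  # assumed 30/70 split

    print(round(unweighted, 3), round(weighted, 3))
```

With these made-up numbers, the player's plain season average is about .270, while the version that leans on the last 10 games comes out to about .280, so a hot streak nudges the input up without taking it over.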

The new weighted stats are plugged into the formulas we have already been using, and the numbers we get more accurately represent the trend that we would expect to see in the next few games. The nice thing about this method is that it will update on its own as the season progresses.

The second thing we have been working on is what's called regression to the mean. That is the theory that stats that are extremely different from the norm are more likely to move back toward average over time than to stay the same. The way we determine "the norm" is based on the average pitching stats of the team's next 3 opponents (typically 9-10 games). We once again use weights to keep the resulting numbers closer to the batter's stats than to the opposing pitchers'.
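As a rough illustration of the idea (again, not our real formula), the sketch below pulls a batter's stat part of the way toward a norm built from the next three opponents. The 75% batter weight, the function names, and the sample averages are all hypothetical; the only thing taken from our method is that the batter's own stat gets more than half of the weight.

```python
def opponent_norm(opponent_stats):
    """Simple average of the upcoming opponents' pitching stats."""
    return sum(opponent_stats) / len(opponent_stats)


def regress_toward_norm(batter_stat, norm, batter_weight=0.75):
    """Pull a batter's stat part of the way toward the norm.

    batter_weight is an assumed value; it stays above 0.5 so the result
    remains closer to the batter's own stat than to the norm.
    """
    if not 0.5 < batter_weight <= 1.0:
        raise ValueError("batter_weight should be above 0.5 and at most 1.0")
    return batter_weight * batter_stat + (1 - batter_weight) * norm


if __name__ == "__main__":
    # Hypothetical batting averages allowed by the next 3 opponents.
    norm = opponent_norm([0.250, 0.260, 0.270])

    # A hypothetical batter hitting .340 gets pulled down toward that norm.
    print(round(regress_toward_norm(0.340, norm), 3))
```

In this made-up case the norm is .260, and the .340 hitter is projected closer to .320: still well above average, but not quite as extreme.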

Once the numbers have been run through these two methods, we use the same formulas as we usually do, and return per-game stats that can be used to better predict a player's performance in a game than our current stats do.

We will keep updating you as we develop as an organization. Remember that you can get regular updates on our work, as well as on the St. Paul Saints, at our Twitter page here.
