The All-Knowing Derbytron Roller Derby Rankings

Data used

Currently, data is only being taken from WFTDA-sanctioned games, which means only WFTDA teams are counted in the ratings. Eventually, I would like to include every interleague team in the country, including secondary travel teams and non-WFTDA teams. That is not possible at this time. I'm working on it, though.

Accountability

Accountability is important in anything like this. How can I say that my ratings system is accurate if nothing keeps track of how accurate it really is? This is why the percentage of correct predictions is visible on every page of the Derbytron, covering every official game. If a lower-rated team beats a higher-rated team, the prediction counts as a miss; if the higher-rated team wins, it counts as a hit. The only time a lower-rated team is predicted to win is when it is the home team and is rated less than .026 below the higher-rated away team.
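As a rough illustration of the prediction rule above, here is a minimal Python sketch. The function names and the tuple layout are my own placeholders, not the actual Derbytron code, and it assumes every game has a home and an away team (neutral-floor games are left out).

```python
HOME_EDGE = 0.026  # threshold quoted in the text above


def predicted_winner(home_rating, away_rating):
    """Return 'home' or 'away' for the side expected to win.

    The home team is picked when it is rated higher, or when it trails
    the away team by less than HOME_EDGE.
    """
    return "home" if away_rating - home_rating < HOME_EDGE else "away"


def accuracy(games):
    """Share of correct picks.

    games: iterable of (home_rating, away_rating, actual_winner) tuples,
    where actual_winner is 'home' or 'away' (hypothetical layout).
    """
    games = list(games)
    if not games:
        return 0.0
    correct = sum(1 for h, a, w in games if predicted_winner(h, a) == w)
    return correct / len(games)
```

So a .58 home team hosting a .60 away team would be picked to win, while a .57 home team would not.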

But, in reality, it's sports, which are unpredictable by nature, and that's why they're so fun.

Methodology

The basis of the Derbytron rankings is to order teams by their relative quality. Every part of this equation is used to determine where teams fit. #1 should beat #2. #34 should beat #35. And so on (assuming these games are played on a neutral floor). This isn't a system that will determine who is having the best season or who has the best history. So, how does it work?

My goal with this equation is to rate average teams as .5, with better teams moving closer to 1 and worse teams moving closer to 0.

50% - What happened in the game? That's important, right?
This is determined using the Pythagorean expectation. Essentially, it's an equation that uses the score of a game to predict a team's winning percentage. This method produces a number between 0 and 1. A close game gives both teams a number close to .5. A blowout gives the winning team a number approaching 1 and the losing team a number approaching 0. This is what I call the vacuum game rating, because it only takes into account the score of the game. It doesn't matter where the game is played or how good or bad either team is, so it's as if the game were played in a vacuum.
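For anyone who hasn't seen the Pythagorean expectation, here is a minimal Python sketch of its general form. The exponent is an assumption: 2 is the value from the original baseball formula, and the exponent Derbytron actually uses isn't stated here. The function name is also just a placeholder.

```python
def vacuum_rating(points_for, points_against, exponent=2.0):
    """Pythagorean expectation for one game, as a number between 0 and 1.

    exponent=2.0 is an assumed value (the classic baseball exponent),
    not necessarily the one Derbytron uses.
    """
    if points_for == 0 and points_against == 0:
        return 0.5  # scoreless game: treat as dead even
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)
```

With that assumed exponent, a 100-100 game rates both teams at .5, and a 200-50 win rates the winner at about .94. The two teams' ratings for the same game always add up to 1.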

25% - Who are you playing and what does their vacuum look like?
The vacuum rating is then combined with the opponent's average vacuum rating. This number gives a good indication of the quality of an opponent, but it is not everything. A team could be very good, but if they've only played the top 4 teams, they may have a very low vacuum rating. That is why it is only 25%.

25% - Who has your opponent played? Kind of important.
How has the opponent fared when compared to their competition? This number is calculated by combining game ratings with opponent ratings, which gives a more accurate description of the quality of the opponent.
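Putting the three pieces together, a minimal Python sketch of the 50/25/25 split might look like this. It assumes the pieces are combined as a plain weighted sum, which is my reading of the description rather than a confirmed formula, and the argument names are placeholders.

```python
def game_rating(vacuum, opp_avg_vacuum, opp_avg_rating):
    """Blend the three components with the 50% / 25% / 25% weights.

    vacuum:          this game's vacuum rating (score only)
    opp_avg_vacuum:  the opponent's average vacuum rating
    opp_avg_rating:  the opponent's average rating against its own opponents
    A weighted sum is an assumption about how the parts are combined.
    """
    return 0.50 * vacuum + 0.25 * opp_avg_vacuum + 0.25 * opp_avg_rating
```

Under this reading, a dead-even game against a perfectly average opponent (.5 everywhere) comes out to .5, which matches the goal of rating average teams as .5.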

Other Variables

Where was it played?
A percentage is added to all away games and taken away from all home games. There is an advantage to playing at home so that needs to be reflected in the ratings. Neutral games are unaffected by this variable.
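A minimal Python sketch of the venue adjustment follows. Two things here are assumptions: the adjustment is treated as a flat amount added or subtracted, and the size of .026 is only a placeholder borrowed from the home-team prediction threshold, since the actual percentage isn't given.

```python
VENUE_BONUS = 0.026  # placeholder size; the real percentage is not stated


def venue_adjusted(rating, venue):
    """Adjust a game rating for where the game was played.

    venue: 'home', 'away', or 'neutral' from the rated team's point of view.
    Away games get a boost, home games a penalty, neutral games no change.
    """
    if venue == "away":
        return rating + VENUE_BONUS
    if venue == "home":
        return rating - VENUE_BONUS
    if venue == "neutral":
        return rating
    raise ValueError(f"unknown venue: {venue!r}")
```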

When was it played?
Games from this season are far more important than games from last season. Teams can change dramatically from one year to the next. So, only games from the last six months of the previous year are counted. If a team played more than five games in those six months, only the last five are counted. The exception is when multiple games were played on the same weekend, with nationals and regionals being the prime example. These older games carry much less weight than games from the current year.
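Here is a minimal Python sketch of how the previous-year cutoff could be applied. It makes several assumptions: "last six months" means July 1 onward, "same weekend" means the same ISO calendar week, and the weekend exception keeps any extra games from the weekend of the oldest counted game. The function name is a placeholder, and the reduced weight for these games is left out because its size isn't given.

```python
from datetime import date


def eligible_previous_year_games(game_dates, current_year, max_games=5):
    """Pick which of a team's previous-year games still count.

    game_dates: iterable of datetime.date objects for one team.
    Keeps games from July 1 of the previous year onward, trimmed to the
    last max_games, plus any game from the same ISO week as the oldest
    kept game (an assumed reading of the same-weekend exception).
    """
    cutoff = date(current_year - 1, 7, 1)
    new_year = date(current_year, 1, 1)
    prior = sorted(d for d in game_dates if cutoff <= d < new_year)
    if len(prior) <= max_games:
        return prior
    kept = prior[-max_games:]
    oldest_week = kept[0].isocalendar()[:2]  # (ISO year, ISO week)
    extra = [d for d in prior[:-max_games] if d.isocalendar()[:2] == oldest_week]
    return extra + kept
```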

Blowouts shouldn't hurt you.
One of the problems I ran into with this system was highly ranked teams blowing out low-ranked teams and having their overall rating lowered. For example, if a team rated .7 plays a team rated .2, even if the .7 wins by 400 points, the .7's overall average would come down after its numbers are combined with the other team's low numbers. I made up for this by inserting a blowout contingency plan. A blowout cannot hurt you. I have determined that a blowout is a win by more than 80 points, so if a team beats another team by 87 points but that game rating is lower than their final rating, they are awarded their final rating for that game. So, a team cannot move down in the rankings after blowing out another team. It is a sliding scale from 80, so there isn't a huge difference between beating a team by 79 or by 80.
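To show the general idea of the blowout contingency, here is a minimal Python sketch. The shape of the sliding scale is an assumption: it is modeled as a straight-line ramp that starts at a made-up margin of 60 points and reaches full protection at 80. The names and the 60-point start are placeholders, not the real Derbytron values.

```python
def blowout_protected(game_rating, team_rating, margin,
                      ramp_start=60.0, full_at=80.0):
    """Keep a big win from dragging the winner's rating down.

    game_rating: the rating this game would normally earn the winner
    team_rating: the winner's overall rating
    margin:      winning margin in points
    ramp_start is hypothetical; only the 80-point mark comes from the text.
    """
    if game_rating >= team_rating or margin <= ramp_start:
        return game_rating  # nothing to protect against
    # Share of the gap that gets filled in, rising linearly to 1.0 at full_at.
    share = min(1.0, (margin - ramp_start) / (full_at - ramp_start))
    return game_rating + share * (team_rating - game_rating)
```

With these placeholder numbers, an 87-point win is lifted all the way up to the team's own rating, while a 79-point win is lifted almost as far, so there is no cliff at 80.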

Teams that haven't played anyone else suck.
Teams playing their first game are not counted. All of the averages that I've mentioned are determined without the current game as a factor. But the averages of teams that have only played two games wouldn't be averages at all. They'd just be the ratings of the one other game they've played. So, for teams that have only played two games, both games are calculated into their average ratings.

Hopefully all of this makes sense. Any questions you have should be posted in the comments (don't be shy) and I'll reply and update the method description with more accurate information.

Why Derbytron? Why?

We already have Roller Derby ratings? Why do we need more?

Well, when I look at the other rankings (DNN, FTS, WFTDA), I see faults. I'm not saying they're wrong or that their methods are invalid, I just disagree. Originally, my plan was just to bitch and moan every time the rankings were updated but then I had a slow weekend and decided to start on this ever-enlarging project.

On the surface, Derbytron is most similar to Flat Track Stats as they are both computer rankings systems (or mathematical equations). But, FTS uses much more data, dating back a few years. FTS also puts emphasis on tournaments where Derbytron does no such thing. Derbytron is much more of a here and now, who is better at this moment, kind of a rankings system, whereas FTS is ranking who has been better over time.

The obvious difference between the Derby News Network Power Rankings and the Derbytron Rankings is the human element. DNN has no mathematical equation. It's a few experts ranking teams based on what they've seen. The other difference is that non-WFTDA teams are eligible for the DNN rankings. These two rankings systems are similar in that they have the same goal, ranking who is better at the current moment. They're just concocted using completely different methods.

And, obviously, the WFTDA Rankings and Derbytron have almost no similarities other than they are both roller derby rankings systems. WFTDA is a regional human rankings system.

Official/Unofficial Rankings

Games in which either team hasn't played a single game in the current season will not be counted. Games in which either team has played fewer than 4 games in the current period (current year plus the last 6 months of the previous year) will not be counted in the accuracy number.

Early in the season, ratings will not be counted until the average number of games played in the current season is 3.0. The ratings will remain unofficial until that point and will not affect the season's accuracy rating because computer rankings need data and there just isn't enough to be confident.
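The rules in this section can be sketched as a single check in Python. The function name and argument layout are placeholders, and it assumes the game counts are taken before the game in question, which isn't spelled out here.

```python
def counts_toward_accuracy(season_games, period_games, avg_season_games):
    """Decide whether a game's prediction counts in the accuracy number.

    season_games:     (team_a, team_b) games played so far this season
    period_games:     (team_a, team_b) games played in the current period
                      (this year plus the last six months of last year)
    avg_season_games: average games played per team this season
    Counts are assumed to exclude the game being checked.
    """
    if avg_season_games < 3.0:
        return False  # ratings are still unofficial
    if min(season_games) < 1:
        return False  # a team with no games this season
    return min(period_games) >= 4  # both teams need 4 or more games
```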

The All-Knowing Derbytron is now on derbytron.com

4.30.2009
