The Size of Dota’s Skill Brackets

August 21, 2012

The first script I wrote for the API was more of a practice run really, but the results turned out to be rather interesting.  The goal was to count the number of games played in each skill level (Normal, High, and Very High, according to the client’s recent game search) on a given day.

The way I accomplished this was by running a search in a specific skill bracket over a narrow range of time, say an hour, to get the total number of matches that took place in that skill bracket.  I then shifted the search parameters back an hour until I had repeated the same search for all 24 hours.  Add together all the results and you have the number of games that took place in a day at that skill level.
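The hour-by-hour sweep described above can be sketched roughly as follows.  This is a minimal illustration rather than the original script: `count_matches` is a hypothetical callable standing in for whatever API search returns the number of matches in one skill bracket between two times, and the names and signature are assumptions.

```python
from datetime import datetime, timedelta


def count_day(count_matches, skill, day_end, window=timedelta(hours=1)):
    """Sum match counts over back-to-back windows covering the 24 hours before day_end.

    count_matches(skill, start, end) is a hypothetical stand-in for one API
    search; it should return the number of matches found in [start, end).
    """
    day_start = day_end - timedelta(days=1)
    total = 0
    end = day_end
    while end > day_start:
        # Step the search window back, never reaching past the start of the day.
        start = max(end - window, day_start)
        total += count_matches(skill, start, end)
        end = start
    return total


if __name__ == "__main__":
    # Fake search that pretends every window holds 10 matches.
    fake_search = lambda skill, start, end: 10
    print(count_day(fake_search, "Very High", datetime(2012, 8, 1)))
```

With one-hour windows the fake search is called 24 times; a smaller `window` gives more, narrower searches.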

One potential complication was that the API returns at most 500 results per search, so any search that came back with 500 results was almost certainly dropping matches.  To avoid this I adjusted the size of the time range by skill level.  Very High worked fine with hour-long searches.  Normal required a new search for every two and a half minutes.
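The post used a fixed range size per bracket.  A variation on that idea, sketched below, is to split a range in half automatically whenever a search hits the cap.  This is not the original method, and `fetch_matches` is a hypothetical callable standing in for one API search that returns a list of at most 500 matches.

```python
from datetime import datetime, timedelta

API_CAP = 500  # maximum results per search, as described in the post


def count_window(fetch_matches, skill, start, end, min_span=timedelta(seconds=1)):
    """Count matches in [start, end), halving the window whenever the cap is hit.

    fetch_matches(skill, start, end) is a hypothetical stand-in for one API
    search; it should return a list of at most API_CAP matches.
    """
    results = fetch_matches(skill, start, end)
    # Under the cap: nothing was dropped, so the count can be trusted.
    # At min_span we stop splitting and accept a possible undercount.
    if len(results) < API_CAP or end - start <= min_span:
        return len(results)
    mid = start + (end - start) / 2
    return (count_window(fetch_matches, skill, start, mid, min_span)
            + count_window(fetch_matches, skill, mid, end, min_span))
```

The trade-off is a few wasted searches on busy ranges in exchange for not having to guess a safe range size for each bracket in advance.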

The other complication, which I discovered later, is that Normal serves as a sort of catch-all bracket.  It includes bot matches and all matches with fewer than 10 players, which I assume are matches with early abandons that don’t count for matchmaking stats.  My original stats did not take this into account, but I’ve since established rough estimates of the average proportion of real Normal matches to bot and abandon matches.

The original stats were:

  • 103,799 – Normal
  • 17,702 – High
  • 3,652 – Very High

From my experience since then I estimate that 13% of Normal game returns are actually bot matches.  Of the remaining 87%, around 2% are abandoned games.  Adjusting the original results for this gives us:

  • 88,499 – Normal (80.6%)
  • 17,702 – High (16.1%)
  • 3,652 – Very High (3.3%)
  • 109,853 – Total
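The adjustment can be reproduced with a few lines of arithmetic.  The sketch below uses only the counts and the 13% and 2% estimates quoted in the post; the variable names are my own.

```python
# Raw counts from the original run.
raw = {"Normal": 103_799, "High": 17_702, "Very High": 3_652}

BOT_SHARE = 0.13      # estimated share of Normal results that are bot matches
ABANDON_SHARE = 0.02  # estimated share of the remaining Normal results that are abandons

# Only Normal is adjusted, since it is the catch-all bracket.
adjusted = dict(raw)
adjusted["Normal"] = round(raw["Normal"] * (1 - BOT_SHARE) * (1 - ABANDON_SHARE))

total = sum(adjusted.values())
shares = {name: 100 * count / total for name, count in adjusted.items()}

for name, count in adjusted.items():
    print(f"{name}: {count:,} ({shares[name]:.2f}%)")
print(f"Total: {total:,}")
```

Normal works out to 103,799 × 0.87 × 0.98 ≈ 88,499, which is roughly 80.56% of the 109,853 total.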

It’s important to keep in mind that this is a distribution of games, when what we really want is the distribution of players.  However, it’s likely that the player distribution is fairly close to the game distribution.  There are a few effects that might shift the game distribution away from the player distribution (differences in game frequency, group queues, and players near the border getting pulled up or down for matchmaking), but it’s unlikely that these add up to a swing larger than a percentage point or two.

So based on this, here’s some wild speculation:

1. The current Dota2 skill brackets are probably 80/16.5/3.5, since those look like the kind of numbers a human might pick.  These would only be approximate population percentages.  That is to say, some rating ‘x’ was chosen as the minimum for Very High skill because about 3.5% of players tend to be above it (assuming a period with no rating inflation).

2. The previous Dota2 skill brackets were likely either 33/33/33 or 30/40/30.  This is why people who were formerly in the highest skill bracket because they were in the 70th through 80th percentiles can now be in the lowest.

3.  If High is the top 20 percent of players, then solo queuing into high ranked games is comparable to somewhere around a 1350 rating in League of Legends, if Riot’s description of their rating system is to be trusted.  Unfortunately I’ve been unable to find a similar statistic for HoN.

And finally, a caveat about this test.  I was unfortunately only able to run it once before the recent API bug made it impossible to check against other days.  The day I chose for testing was the day after the release of Keeper of the Light, Nyx Assassin, and Visage, so the level of activity may have been unusually high.  Hopefully once the International is over and the API changes come out I’ll be able to run it again to see if the results are consistent.  In the future I’d also like to run the same test over multiple months to see if there’s any inflation or deflation in the number of High and Very High skill games.


What is DotaMetrics?

August 20, 2012

Back in mid-July, Valve released a WebAPI for Dota2.  I decided to have some fun with it.

By the time of the API release, Dota2 had already had some pretty interesting websites and articles devoted to game stats, stands out in particular, but the downside was that all of these approaches used complete aggregation.  We could see that Treant Protector had the highest win percentage, but we couldn’t really isolate the factors that lead to this.  We could see that Ursa and Lycan were the best performing ‘carries,’ but are they just low level pubstompers or is there something more substantial to their success?

What interested me most about the WebAPI is that it had filters based on Valve’s skill ratings that could allow me to break down the results according to relative skill levels.  This allows us to create a statistical progression in hero usage rates, hero win rates, item popularity and much more.  It’s not surprising in the least that strategies change as the skill level increases, but now we can get a glimpse into that strategic shift without relying solely on our often misleading intuitions.

I should be clear that nothing that follows should be treated as evidence for “the right way to play.”  Building an item because of a number in a spreadsheet is just as dumb as building it just because you saw a pro do it.  Probably dumber.  All statistics can do is measure trends.  This measuring has a certain value, but if you genuinely want to be a good player you have to be the one creating trends.  Maybe a < 45% win rate is indicative of an underpowered hero, or maybe that’s just a hero that the general population hasn’t quite figured out yet.  In case that left any ambiguity, let me be completely clear: if you try to use any of my writings to throw your teammates under a bus, you’re a greater detriment to your team than any possible hero choice or item build.  That’s not what this site is for.  Don’t do it.

To end on a lighter note, I’d like to show some appreciation to winxp from the Dota2 Dev forums.  Without his Python WebAPI interface I may not have gotten started with any of this.