The Death of Dota 2 Statistics Has Been Greatly Exaggerated

There’s been a lot of misinformation going around about DotaBuff’s closing (and current potential not-closing), the privacy setting, and what it means for statistical collection in Dota 2.  I don’t claim to have all the answers, but here are the facts as I know them.

You can divide DotaBuff into three features:

  1. Everything that existed before the unveil of DotaBuff Rating and DotaBuff Plus
  2. The new features unveiled with DotaBuff Plus, i.e. spirit bear items, skill builds, hero win rates by DBR, chat logs, etc.
  3. DotaBuff Rating

Let’s take them one by one and see how the privacy changes affect them.

1 is easy in that the answer is “not at all.”  You can check CyborgMatt’s mock-up for yourself.  Virtually every stat DotaBuff aggregated is available through the API.  You won’t be able to see the names of private players in a match, but every match is still available, so you can still compute hero win rates and item usage rates.  You won’t be able to see your personal stats and match history without disabling your own privacy setting.
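As a rough illustration of why private profiles don’t block aggregate stats, here is a minimal sketch that computes hero win rates from match records.  The record layout (`radiant_win`, `players`, `hero_id`, `player_slot`, `account_id`), the convention that slots below 128 are Radiant, and the use of `None` for a hidden account are all assumptions about what the API returns, and the sample matches are made up.

```python
from collections import defaultdict

# Hypothetical match records; field names are assumptions about the API output.
# account_id is None here for players who enabled the privacy setting.
SAMPLE_MATCHES = [
    {"radiant_win": True, "players": [
        {"account_id": 101, "hero_id": 1, "player_slot": 0},
        {"account_id": None, "hero_id": 2, "player_slot": 128},
    ]},
    {"radiant_win": False, "players": [
        {"account_id": None, "hero_id": 1, "player_slot": 0},
        {"account_id": 102, "hero_id": 2, "player_slot": 128},
    ]},
    {"radiant_win": True, "players": [
        {"account_id": None, "hero_id": 3, "player_slot": 0},
        {"account_id": None, "hero_id": 1, "player_slot": 128},
    ]},
]


def hero_win_rates(matches):
    """Return {hero_id: (wins, games, win_rate)} without using player identities."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for match in matches:
        for player in match["players"]:
            on_radiant = player["player_slot"] < 128  # assumed slot convention
            won = on_radiant == match["radiant_win"]
            games[player["hero_id"]] += 1
            wins[player["hero_id"]] += int(won)
    return {hero: (wins[hero], games[hero], wins[hero] / games[hero]) for hero in games}


if __name__ == "__main__":
    for hero, (w, g, rate) in sorted(hero_win_rates(SAMPLE_MATCHES).items()):
        print(f"hero {hero}: {w}/{g} wins ({rate:.0%})")
```

Note that `account_id` is never read: the hero, the side, and the result are enough, which is the whole point.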

2 is a bit trickier, at least at the moment.  As far as I know, most of these features were created through replay parsing.  Anyone could theoretically do this, but DotaBuff’s replay parser is likely significantly more advanced than any other parser outside of Valve.  Parsing hundreds of thousands of matches per day is also a rather extreme feat that should not be trivialized.

Aside from this, you could previously grab match replays through an entry in the API, and that entry appears to have been disabled.  Without access to it, DotaBuff would no longer be able to do any match parsing.

My assumption is that this has been disabled until Valve can find a way to make replays mesh with their new privacy policy.  I don’t personally know the details of what information replays store, but this post suggests that previously you could use replays as a way to bypass the privacy setting.  I expect this functionality will eventually be restored, which will allow all of the parsing features to be replicated.

One other twist is that there were a few new DotaBuff Plus features that were dependent on DBR, specifically the hero win rates and skill build by DBR bracket.  These features can be replicated using the in-game skill brackets that are available through the API.  Admittedly, these are match measurements rather than player, but the difference is relatively negligible.
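To show what “match measurements rather than player” looks like in practice, here is a small sketch that splits one hero’s win rate by the bracket attached to each match.  The record fields (`skill`, `radiant_heroes`, `dire_heroes`, `radiant_win`) and the sample data are hypothetical; the only thing taken from the text is that the bracket belongs to the match as a whole.

```python
from collections import defaultdict

# Hypothetical records: "skill" is the bracket assigned to the whole match,
# so every player in the match inherits it regardless of personal rating.
SAMPLE = [
    {"skill": "Very High", "radiant_win": True, "radiant_heroes": [1, 2], "dire_heroes": [3, 4]},
    {"skill": "Very High", "radiant_win": False, "radiant_heroes": [1, 5], "dire_heroes": [2, 6]},
    {"skill": "Normal", "radiant_win": True, "radiant_heroes": [3, 1], "dire_heroes": [2, 4]},
    {"skill": "High", "radiant_win": False, "radiant_heroes": [7, 8], "dire_heroes": [9, 10]},
]


def win_rate_by_bracket(matches, hero_id):
    """Return {bracket: win_rate} for one hero, using match-level brackets."""
    tally = defaultdict(lambda: [0, 0])  # bracket -> [wins, games]
    for match in matches:
        if hero_id in match["radiant_heroes"]:
            won = match["radiant_win"]
        elif hero_id in match["dire_heroes"]:
            won = not match["radiant_win"]
        else:
            continue  # hero was not picked in this match
        tally[match["skill"]][0] += int(won)
        tally[match["skill"]][1] += 1
    return {bracket: wins / games for bracket, (wins, games) in tally.items()}


if __name__ == "__main__":
    print(win_rate_by_bracket(SAMPLE, hero_id=1))
```

A player pulled into a higher bracket by a premade gets counted in that bracket, which is the imprecision the match-level label introduces.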

3 is absolutely broken.  Unless something drastic changes or something new comes to light, it will stay broken.

So let’s look at this from the perspectives of DotaBuff and Valve.

DotaBuff came out with DotaBuff Plus.  On one hand, their operation is pretty massive, and they need some way to offset expenses.  On the other hand, there’s no reason to believe that this was a non-profit organization.  I see no reason to laud or impugn their motives.  Ultimately it’s just business.

DotaBuff Plus came with a lot of features, but the shining star was clearly everything related to DBR.  Regardless of your personal opinions on a visible MMR, it was far and away the feature most capable of driving subscriptions, and DotaBuff likely knew this.  It was also the one feature that future stat sites would not be able to match.  Having it killed off puts a big damper on their financial plans, and however you feel about those plans, losing them makes it harder to cover their server expenses if nothing else.

That being said, they don’t appear to have made any contingency plans for if/when Valve took issue with DBR.  That was a bad move on their part, because the “when” turned out to be “very quickly.”

The obvious question is why was Valve so quick to respond?  DotaBuff Plus and DBR get announced and within days Valve suddenly announces a privacy setting AND the re-release of the API.  Both things have likely been in the works for a while, but the timing can only be seen as a response to DotaBuff’s announcement.

One reason suggested for this is that DotaBuff Plus in some way is the culprit, either that Valve didn’t like the attempt to monetize or that it somehow put them in an undesirable legal position.  I can’t really say much about this explanation, but in the absence of evidence I don’t find it very compelling.

Other people point out that DotaBuff was using a backdoor method to get their information.  Yeah, but it was basically an open secret that their stat aggregation wasn’t entirely on the up and up since before they were even called DotaBuff.  Valve had to have known what they were doing and neglected to do anything about it until now.  This explanation might serve as a pretext for Valve’s actions, but I don’t believe it is what drove them.

My preferred explanation is that Valve feels they cannot allow a public MMR to exist that they do not tightly control.  DBR broke that rule, and Valve transcended Valve Time and broke DBR.  So why is Valve so controlling when it comes to public MMR?

1. They want to tightly control how and when any kind of ratings get displayed in order to prevent poisonous behavior

Yes, I called DBR “relatively innocuous,” and it’s certainly less toxic than the way HoN handled ratings, but I can understand that weak assurance being not good enough.  There was nothing stopping DotaBuff from changing their policies in the future, so cutting off their access was much safer.

2. DBR undermines their plans for the future development of the game

Some have said that they have no problem with All Pick being a serious competitive environment.  Fine, there’s no accounting for taste.  What’s not ok is the primary competitive outlet for a game having absolutely no party size requirements or restrictions.

Some people claim that stacking isn’t actually a good way to boost your win rate.  There’s a kernel of truth to that, but only a kernel.  In reality, stacks have a built-in matchmaking handicap that compensates for their (assumed) superior coordination.  Many groups can’t quite live up to this handicap and experience sub-50% win rates.  Meanwhile, there’s a relatively small number of groups that can easily outperform the handicap and maintain massive win rates.  As of now this isn’t that big of an issue.  Group handicaps are likely a work in progress, and no system will ever be perfect.  If you suddenly turn the main matchmaking queue into a quest to get the highest DBR you’ll spur plenty of competition, but the most productive outlet for this competition will be to come up with a group design that best games the group matchmaking handicap.  This is a completely unacceptable state of affairs.  Any matchmaking with a visible MMR should either be 5v5 or solo queue.
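To make the handicap argument concrete, here is a toy model.  It borrows the standard Elo expected-score curve as a stand-in; nothing here reflects how Valve’s matchmaking actually works, and the `coordination_bonus` and `handicap` parameters and their values are invented purely for illustration.

```python
def expected_win_rate(rating_diff):
    """Elo expected score for a side that is rating_diff points stronger."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400.0))


def stack_win_rate(coordination_bonus, handicap):
    """Toy estimate of a stack's long-run win rate.

    Assumption: the matchmaker treats the stack as `handicap` points stronger
    than its members' ratings and finds opponents to match, while the stack's
    real edge from playing together is `coordination_bonus` points.
    """
    return expected_win_rate(coordination_bonus - handicap)


if __name__ == "__main__":
    handicap = 100  # hypothetical handicap applied to every five-stack
    for bonus in (50, 100, 300):
        rate = stack_win_rate(bonus, handicap)
        print(f"coordination bonus {bonus}: win rate {rate:.1%}")
```

In this model a stack whose real edge equals the handicap sits at exactly 50%, a weaker stack drops below it, and a stack that beats the handicap wins more than half its games.  The last group is the one a visible rating would reward.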

Aside from that, one thing most people neglect to consider is that every matchmaking option is essentially in competition with all the other matchmaking options.  The quality of your matches is dependent on how many people are looking for a match at the same time, and every alternative game mode will be drawing people away from your queue.  A visible personal rating is a feature that will draw many people to a mode, and having it in general matchmaking makes it significantly more difficult for Valve to test future matchmaking additions in a controlled environment.  Along these lines, I stand by my previous position that public matchmaking with a visible MMR will compete for player attention with any ranked 5v5 mode.  5v5 is the harder mode to get running.  The barriers to entry are higher, and it becomes very difficult to keep low end play rates high enough to keep low end matches sufficiently balanced, which in turn erodes match quality and leads to further low end player loss, etc.  5v5 needs to receive priority attention before any kind of solo matchmaking with a visible rating is implemented, and DBR interferes with this.

3. DBR could represent a potential security risk

The joke goes that Valve doesn’t want DBR because it would expose the fact that their matchmaking doesn’t work.  Besides the typical D-K Overdrive conspiracy theory, there could actually be some truth to this.  Valve presumably does not want its competitors knowing how its matchmaking system functions.  DBR provides a list of matches along with a relatively generic rating system.  If you find matches that don’t make sense within DBR, you could potentially use them to infer how the matchmaking system actually works.

I honestly find this one to be a bit of a stretch.  1 and 2 are sufficient alone, but I guess it’s possible that DBR represents a potential security breach in the future.

In any case, where we stand is this.  All the stats we’re used to are still there.  MMR is back to being completely hidden instead of partially, imperfectly revealed.  DotaBuff may or may not survive, but even if they go down the competition now has all the tools to replicate what they’ve done until this point.  That being said, no one is close to the scale of DotaBuff, so losing them would still be a loss in the short term.  I personally feel that Valve’s implied stance on public MMR is justified regardless of the implications.


7 Responses to The Death of Dota 2 Statistics Has Been Greatly Exaggerated

  1. Factory says:

    1) Isn’t going to work, unless Valve allows DotaBuff (or any aspiring website) to have unlimited logins or removes the requests-per-minute limitations. Given that they turned off the API in the first place due to overuse, I doubt Valve is eager to do this.

    2) I could see valve taking issue with this, the replays are fairly large, and the bandwidth costs must be massive.
    OTOH IMHO the 3 skill brackets are a bit too coarsely grained for good statistics. (at least that was my experience with 2011 data)

    • phantasmal says:

      If Valve didn’t want to foster the creation of stat tracking websites, they wouldn’t have bothered making the API in the first place. They’re aware of the growth rate of Dota 2 matches, and they know how many calls it would take to keep up with that growth rate. I don’t feel that they would roll out the API again ~7 months later only to be shocked by the same issue that caused them to take down the API in the first place. Also, I have it on good authority that further API updates are coming out ‘soon,’ so that will be a better time to evaluate the potential limitations of the system.

      We’ll see how replays develop. I think the current issue is that they break the privacy setting, but maybe we’ll run into a problem if multiple stat sites are trying to do the mass replay aggregation that DotaBuff was doing.

      I’m not convinced that the in-game skill brackets are significantly worse than the divisions DotaBuff was using. Very High seemed roughly comparable to Diamond, and High to Platinum. Normal is still the wild, wild west, but I’m not too torn up over that. There’s also the issue of players getting pulled up a bracket through premades or to match premades, but it’s not something that keeps me up at night.

  2. achiko says:

    Hey, another amazing article :) One question though, would you care to explain what the D-K Overdrive conspiracy theory is?

    • phantasmal says:


      It’s just me trying to come up with novel ways to refer to the Dunning-Kruger effect without running it into the ground.

  3. Decoud says:

    Nice article. Regarding stats category 1 (everything prior to DBR), I believe that will be replicated and available at some point. The API forum on dev.dota is buzzing with activity and there appears to be a lot of people at work on this. I think there will be a lot of good sites coming out in the next few months and the influx of competition will produce some neat things. In my mind, the only thing that could prevent this is Valve turning the API off.

    Another concern is whether enough API calls can be made in order to effectively keep up with the new data, but a little cooperation between the numerous people accessing the API could solve that.

    Regarding stats category 2, we’ll have to wait and see. If the ability to download replays is never reactivated then the community is going to lose a lot of useful data from parses. This will be a significant hit to the Dota 2 statistical community and those seeking to become better players.

    For example, on Reddit today someone created a topic asking what a good GPM is. With a site that downloaded and parsed replays, that poster could theoretically go to that site and see what a good GPM is with filters for hero, skill level, and minute of game time. The site could also have the gold by source so the poster could theoretically see what type of farm he is not taking advantage of.

    Theoretically if his GPM on a 25 minute Anti-Mage is subpar, he could see that he is not getting enough jungle or ancient gold (compared to the averages) and alter his play style accordingly.

    Dota 2 statistics are not dead by any means and we’ll soon, in my opinion, have access to an adequate amount of useful stats. However we’ll have to wait and see if that adequate amount of stats becomes something extraordinary.

    • phantasmal says:

      One thing I would suggest is to encourage people trying to evaluate carry performance to look at creep score per minute rather than GPM. GPM can be heavily influenced by kills and tower gold, but getting early kills and tower gold as a carry is often thanks to factors that are beyond your control. Your last hitting and eventual jungle efficiency are much more directly under your control, so they make a better metric for farming. Of course, game circumstances might prevent you from being able to farm, but it’s better to say “Here my CS/min fell behind but it’s ok because I needed to teleport to the opposite tower to help my team win a team fight” than to implicitly say “In this match I screwed up a lot of easy last hits but I made up for it in kill and assist gold.” The way you treat exceptions in a CS/min model tends to encourage good behaviors whereas the GPM model tends to excuse bad behaviors.

      This actually reinforces your point because we would need replays to know something like average CS/min by 10 minutes in the Very High bracket. And you already point out that replays could allow us to separate lane gold vs jungle gold which could lead to some cool things.

  4. Troy Barnes says:

    The last section point 3: I don’t think it really matters if Valve’s MMR system is flawed (it is flawed). The problem, as the article says, is people thinking their MMR system doesn’t work when in reality it’s just flawed for the convenience of their players.

    Many players will stop playing Dota2 if they have to wait 10 minutes to find a game that pairs them up with people of their exact level of skill (early in the Dota2 beta, remember the pain of waiting 5 minutes for a matchup?). Valve will not be willing to pay that ‘cost’ if they can most likely find you a decent matchup within 2 minutes.

    Ultimately, Valve’s MMR is flawed but that doesn’t matter – it’s good enough. The MMR system is not like Citibank credit transfers. Things also end up equalizing (~50% win rates) unless you’re Merlini.
