The Insignificance of Pub Stats, Part 1

The other day I came across Statistical Significance: The Value of Pub Stats, and I’m afraid I cannot let it go unanswered.  I don’t believe the author intends anything malicious, but he is breathtakingly wrong on nearly every point he raises.  And this isn’t merely some philosophical spat.  Bad statistics are poisonous to a game community.  They warp players’ conceptions of the game so that they chase higher stats at the expense of performing uncelebrated tasks.  They offer ammunition to our ceaseless desire to prove that everything bad that happens is someone else’s fault.  They create an implicit hierarchy along some aggregate number that no one understands but everybody trusts.  And finally they make us lazy.

Because I’m responding to an article, I’ll take the issues chronologically according to the article.  Hopefully this is convenient for those of you following at home.

A common criticism of tracking detailed statistics is that it provides ammunition for bad mannered players. With the DotA community’s infamy for bad mannered players, many charge that lifetime statistics will lead to increased harassment.

[But] I believe DOTABUFF is right that flaming exists in gaming communities regardless of stats, and Dota 2 is no exception. In the end, the availability of stats does not increase flaming, it merely alters the content of flaming. (emphasis theirs)

The marginally more minor objection to this is that the availability of stats does increase flaming.  When someone instant locks a carry, they might get flamed.  In certain skill brackets it can feel like everyone believes that their teams pick too many carries, and so everyone they get matched with should know better and pick supports so they can pick their favorite carry.  It’s a frustrating environment and flaming will happen from time to time.

If you take the same environment and give players the ability to see the lifetime KDA (Kills plus Assists divided by Deaths) of every player on their team with a single click, you will see the pre-game flames skyrocket.  A big part of this is that ease of availability is really important.  Trying to load up a site where you have to type in every player’s name to get their KDA is not only tedious; it’s also tough to do while the pick timer is counting down.  And then when you bring it up in chat you have to admit you perform a manual background check on everyone you play with, which makes you look like a creepy, obsessive moron.  This serves as a deterrent, which is good because if you do this you are a creepy, obsessive moron and anything that discourages you from using that to ruin games is a big plus.
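
For reference, the KDA formula in the parenthetical above is simple enough to sketch in a few lines.  This is only an illustration: the function name, the sample stat lines, and the choice to treat zero deaths as one (to avoid dividing by zero) are my own assumptions, not how any particular stats site computes it.

```python
def kda(kills: int, assists: int, deaths: int) -> float:
    """Kills plus assists, divided by deaths.

    Assumption: a deathless record is treated as one death so the
    ratio stays defined. Real stats sites may handle this differently.
    """
    return (kills + assists) / max(deaths, 1)


# Two made-up stat lines, purely for illustration.
carry_kda = kda(kills=12, assists=5, deaths=4)
support_kda = kda(kills=1, assists=14, deaths=9)

print(f"carry KDA:   {carry_kda:.2f}")
print(f"support KDA: {support_kda:.2f}")
```

Notice that the number says nothing about which hero was played, what role it filled, or whether the game was won, which is exactly why it makes such convenient ammunition.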

The bigger threat is that having stats available does alter the content of flaming.  If someone’s just ranting at the champion select screen about so-and-so picking a carry, the rest of the team is likely to roll their eyes and know precisely who they should /ignore at the first sign of conflict.  But if you suddenly give this person access to the carry player’s MMR or KDA, you’re basically handing them a facade of credibility.  People don’t have the time or energy to exhaustively research every statistic waved in front of them, and if you let something like MMR or KDA checking become common you risk giving these practices a legitimacy they do not deserve.

The article goes on to say

Comparatively, educated players will understand that poor stats, such as a low total KDR, are not relevant to evaluating the performance of the support players in a match, and that a person’s career KDR is not a useful metric of evaluation when they play a variety of heroes with different expected averages. These players will not be led to flame because they have statistics as ammo; they understand such criticisms are misplaced and faulty.

I mean no offense by this, but educated players are so rare that they aren’t statistically significant.  If you polled every Dota 2 player the majority of them wouldn’t begin to be able to tell you the multitude of ways KDR is flawed as a stat.  The majority might not even be able to tell you what KDR is in the first place.  What it comes down to is that we’re all struggling to come up with mental models that help explain an incredibly complicated game, and any time someone creates some aggregate stat that purports to tell you how well you’re doing, that stat immediately becomes a tempting shortcut.

But almost inevitably this metastat is a false idol, misleading us in ways that are difficult to notice due to their incredible subtlety.  Even worse, these stats absolve us of the responsibility of having to perform the sometimes tedious task of trying to understand the game for ourselves.  We create this superficial metric and let that metric become an authority figure that establishes an implicit hierarchy.  Then we just accept it because it lets us become comfortable in being able to pretend to know what we are talking about.  The only way to prevent this is to cultivate a healthy suspicion of statistics.  If you’re not constantly worried that every number in your spreadsheet might be lying to you, you’re not doing it right.

Valve is 100% in the right to not integrate statistics in the client.  Maybe some day when we actually have good, nuanced statistics for evaluating player performance that will change, but until then we should collectively admit that judging player performance is hard and that we’re incapable of pointing to some single number somewhere to determine whether you’re doing a good job or not.

Anyway, this is longer than I expected and I won’t have time to finish it tonight.  Stay tuned for the future installments where we explain why KDA isn’t a reliable stat, why statistics can never completely replace watching replays, how we can work to eventually build better statistics, and why stat shaming to try to create a more competitive community is the dumbest thing ever.  Also, I’d like to mention that for all my complaints about the article, I feel that the sources quoted in the article are generally much more reserved and cautious about the potential misuses of statistics.  Reminds me of this.

Continue to Part Two


4 Responses to The Insignificance of Pub Stats, Part 1

  1. Tommy says:

    What would be a good rating system?? With your experience Im sure you would have a pretty close guess.

    Im a solo queuer that plays in very high Bracket usually. However a couple days ago i was dropped down to high bracket and games were just different. In very high games, like you mentioned, players intend to win early. In the first game of my drop down to high, i experienced a very quiet and boring game. People sit in their lanes and farm. What I want to know is, how do you move up in percentile?? Is every 1 in high bracket close in percentile? I always wanted to appear on the first page in the watch tab. But it is very difficult for me as a solo queuer.

    In my experience i learned that the matchmaking system is decent. tho players flame at each other from time to time. (players have their reasons for bad performance) even in the normal bracket players think they’re the best and every1 else in the same game is inferior to them hahahah.

    Just a couple of days ago i started to queue with a team. of players from high and very high bracket. and I noticed that it was easier to win.

    I just want to know that is win rate important to move up high in the percentiles? also are players in very high close in the percentiles.
    and it be awesome if you could analize my stats. ID:canagh2

    I enjoyed reading every 1 of your articles, I hope you’d write more :P

    • phantasmal says:

      Part 2 is going to address a lot of that (evaluating the worth of stats and rating systems), and going in-depth into it here would be a bit much for a comment section. But one quick preview I can offer is that if we want to be able to analyze specific games or short series of games (like a pro team’s games during 6.77 or your own personal carry games within the last week) then we really need to start using time-sensitive stats. Dota is a game with very pronounced feedback loops, which is a great quirk of its design but makes establishing statistical causation really tricky. For a really simple example, any kill you get in a game after you’ve established a huge gold lead will tend to be less valuable than the kills you got when the teams were even on gold. On the other end of the spectrum, the CS you get in the first 10 minutes is more valuable than the CS you get after two of their barracks are down.
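
      To make the kill example concrete, here is a rough sketch of what discounting kills by the existing gold lead might look like. Everything in it is hypothetical: the weighting function, the 5000 gold scale, and the sample data are assumptions chosen for illustration, not an established metric or anything a stats site actually reports.

```python
def kill_weight(gold_lead: float, scale: float = 5000.0) -> float:
    """Discount a kill by how far ahead the killing team already was.

    gold_lead: killing team's gold minus the enemy team's gold at the
    moment of the kill. Kills made while even or behind count as 1.0;
    the weight shrinks toward 0 as the lead grows.
    The functional form and `scale` are illustrative assumptions.
    """
    return 1.0 / (1.0 + max(gold_lead, 0.0) / scale)


def weighted_kills(gold_leads_at_kill: list[float]) -> float:
    """Sum of discounted kills, one gold-lead value per kill."""
    return sum(kill_weight(lead) for lead in gold_leads_at_kill)


# Three kills while the teams were even vs. three kills with a 10k lead.
even_game = weighted_kills([0, 0, 0])
snowballed = weighted_kills([10_000, 10_000, 10_000])

print(f"even: {even_game:.2f}, snowballed: {snowballed:.2f}")
```

      Both stat lines show three kills on the scoreboard, but the weighted totals differ, which is the kind of distinction a flat lifetime number cannot make.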

      Unfortunately, these kinds of statistics aren’t readily available right now. Because of this, I mostly stick to big aggregate statistics in hopes that these unwanted effects get watered down over a large enough sample size.

      For personal evaluation right now, the best thing to do is just watch your own replays critically. One example of a drill you could run is pick Nature’s Prophet, do the typical jungle thing, but make a personal point of moving your camera to the lanes every time you start a new camp. Keep track of the dynamic of each lane in your head. Who is building what, which lanes tend to push, which lanes are taking harass. Skip Hand of Midas for an early Medallion of Courage and focus on getting as many teleport ganks as possible in the first 15 minutes.

      Then once the game is done, go watch the replay. Are there any kill opportunities that you missed? Are there ganks you attempted where you could have prevented them from escaping had you positioned differently? Were there ganks that backfired, and if so did you miss the warning signs on the minimap? Yeah, it’s a lot more work than just running the numbers, but for personal improvement it’s going to be a hell of a lot more effective.

      As for the brackets and percentiles, yeah, outside of the start of an account the most important thing is winning more consistently.

      The three biggest elements to winning more consistently in lower level pubs are improving your farm rates, swapping lanes regularly and early for ganks and counterganks (which includes good TP use), and recognizing which of those factors your team will need more during hero selection and picking accordingly.

      This is purely speculative, but I suspect that Very High games tend to be wider in both percentile and rating. The percentile part is because there are fewer people up there so the matchmaking has to stretch more to make timely matches. Another factor is that premade stacks might be more common in Very High and can have percentile spreads larger than you’re likely to see in purely matchmade games, but the matchmaking mechanics do try to compensate for this. As for rating, Very High is going to be where the most skillful outliers end up who are just exponentially better than even the best of the pubstars.

      This doesn’t mean that matchmaking in Very High is necessarily worse though. Normal games might have a much tighter percentile spread but normal ratings might also be more inaccurate on average due to housing the majority of the newer accounts that are still settling in to their rating. Normal players could also potentially have a greater game-to-game volatility of performance, which would be a pretty interesting thing to look into.

      And finally, don’t agonize too much over your bracket placements. If you’re playing AP, it’s preferable to lose a game and learn something than win for the 500th time with a strategy you could execute in your sleep. The real test will be 5v5 matchmaking, so in the meanwhile concentrate on preparing for that. While still having fun of course.

      And as a quick reply to TC, you’re touching on a lot of the stuff coming in part 3. Particularly how we treat Elo and similar MMR stats in a public 5v5 environment.

    • phantasmal says:

      I just want to know that is win rate important to move up high in the percentiles? also are players in very high close in the percentiles.
      and it be awesome if you could analize my stats. ID:canagh2

      One quick thought about this before I head out for the day. I looked at your stats and noticed that your character selection is a bit one dimensional. Very heavy on carries and hybrid carries/gankers. You might benefit as a player by taking a stretch of time to just play outside of your comfort zone. It will likely sink your MMR a bit in the short term, but in the long run the versatility can pay off and the time you spent playing other roles will help give you a greater insight into what your opponents are thinking.

      I don’t have much in the way of specific suggestions since the possibilities are really wide open, but Batrider and Undying are two examples of very popular competitive heroes that you haven’t touched at all and can fit into a wide variety of laning formats.

  2. TC says:

    The problem with having statistics is that it is too easy to focus on statistics. Too many people look solely at statistics and nothing else. In a single player game, I might agree that that’s a fair way to assess a player’s ability; but in these team oriented games, there are too many intangibles that can’t possibly be measured.

    Yet I’d be worried about ignoring statistics completely. For example, back in Wc3 days, when it was difficult to see how well others farmed, in a good game I’d get about 100 creep kills; but now, 100 seems like a standard game. Statistics can help us understand what parts of our game can be improved. However, statistics also, as you mentioned, have harmful components. People get focused on things like KDR and place less emphasis on actions that are not measured.

    If the community can approach statistics with the maturity to recognize that statistics is a useful tool instead of a definitive result, perhaps it’d be better off. But it also would need to learn the humility to recognize that there needs to be continuous improvement.

    The above poster asks for a rating system. I’d argue there can’t be a good one. While I like concepts like Elo, that seems more applicable to single person competitions. A WAR (wins above replacement, ESPN) might be useful, but again, it focuses too much on tangible statistics. There are too many intangibles when dealing with teams that you can’t capture with numbers.

    Ty for the post, keep up the good work.

