Trends in Dota 2 Hero Usage by Skill Bracket

Last week I completed a big general sample of 10k/12k/12k games from the Normal, High, and Very High brackets.  It’s still smaller and less random than I’d like it to be, but it’s still useful for some things, the most obvious of these being hero usage rates.  Unsurprisingly, the trends are very similar to the ones I found back in 6.74, but it can’t hurt to rehash a bit.

Let’s go over the basics first.  I define a hero’s usage rate as the percent of games in my sample in which that hero was played, or (Hero Uses / Total Games).  Because there are 10 heroes per game, the sum of all the usage rates in a given bracket is 1000%.  If all 98 current heroes were played evenly, every hero would have a usage rate of ~10.2% or (1000/98).
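As a rough illustration of that definition, here is a minimal Python sketch.  The `usage_rates` helper and the toy match lists (integer hero IDs standing in for heroes) are hypothetical and are not the code used for this post.

```python
from collections import Counter

def usage_rates(matches):
    """Return {hero: percent of games in which the hero appeared}."""
    total_games = len(matches)
    # set(match) makes sure a hero is counted at most once per game.
    uses = Counter(hero for match in matches for hero in set(match))
    return {hero: 100.0 * count / total_games for hero, count in uses.items()}

# Toy sample: two games, each a list of 10 hero IDs (made-up values).
toy_matches = [list(range(1, 11)), list(range(6, 16))]
rates = usage_rates(toy_matches)

# With 10 heroes per game, the rates in a bracket always sum to 1000%.
total_rate = sum(rates.values())

# Usage rate every hero would have if all 98 heroes were played evenly.
even_rate = 1000 / 98

print(total_rate, round(even_rate, 1))
```

Heroes 6 through 10 appear in both toy games and so sit at 100%, while the rest sit at 50%.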

I use some pretty basic linear regression to create a trend-line of each hero’s usage across the three skill brackets as defined in-game.  To be honest it’s a bit of a lazy solution, but it’s a good enough way to distill what’s going on into a single, rankable number.

There are two different measurements.  The first is just a trend in pure percentage points.  If a hero goes from 30% usage in Normal to 20% in High to 10% in Very High, the percentage point trend will be -10.  Only one hero actually managed to hit double digits in this measurement; see if you can guess who it was.
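Here is a minimal sketch of how a percentage point trend like this could be computed, assuming the three brackets are treated as evenly spaced points (0, 1, 2) and the trend is the ordinary least-squares slope.  The post does not spell out the exact regression setup, so the spacing and the `point_trend` name are assumptions.

```python
def point_trend(normal, high, very_high):
    """Least-squares slope of usage (in %) across brackets placed at x = 0, 1, 2."""
    xs = (0, 1, 2)  # assumption: brackets are evenly spaced
    ys = (normal, high, very_high)
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return numerator / denominator

example_trend = point_trend(30, 20, 10)
print(example_trend)  # -10.0
```

With three evenly spaced points the slope reduces to (Very High - Normal) / 2, so the High value only matters if a different spacing is used.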

But while this percentage point trend has its uses, it’s not great for measuring popularity shifts because a relatively small movement in the usage rate of an already popular character can completely drown out relatively larger increases in unpopular characters.  For this I have a second trend that measures the percent increase in usage based off their least popular bracket.  If a hero goes from 2% in Normal to 4% in High to 6% in Very High, this is a 100% trend.  The same in reverse, 6% to 4% to 2% would be a -100% trend.  The following tables have been sorted using this relative trend.
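One way to reproduce the relative trend examples above is to divide the percentage point slope by the hero's lowest usage rate among the three brackets.  This is a sketch under that assumption, again with evenly spaced brackets; the `relative_trend` name and the two comparison heroes at the end are made up for illustration.

```python
def relative_trend(normal, high, very_high):
    """Percentage point slope as a percent of the hero's least popular bracket."""
    # For three evenly spaced brackets the least-squares slope
    # simplifies to (last - first) / 2.
    slope = (very_high - normal) / 2
    baseline = min(normal, high, very_high)  # assumption: lowest usage is the base
    return 100.0 * slope / baseline

rising = relative_trend(2, 4, 6)
falling = relative_trend(6, 4, 2)

# Hypothetical comparison: a popular hero with a bigger point shift
# still has a far smaller relative shift than an unpopular one.
popular = relative_trend(30, 32, 34)
unpopular = relative_trend(1, 2, 3)

print(rising, falling, round(popular, 1), unpopular)
```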

So let’s start off with the heroes whose popularity falls off the hardest the higher you go in the skill brackets.

[Usage]Bottom

If you guessed Drow Ranger earlier you were correct.  In terms of net percentage point change Drow was completely unmatched.  She goes from appearing in over 1/3 of all Normal games to being in barely over 1/10 of Very High games.  Curiously though, nothing I’ve seen indicates that Drow is significantly less successful in Very High than she is in Normal.

One funny fact about the Percentage Points is that the largest shifts are almost exclusively in this top 10.  The top end only features 4 characters with a shift larger than 4% and the highest positive shift is 6%.  Another way of looking at this: suppose you have 100 games each in Normal and Very High using my sample’s statistics.  In the Normal games, these top 10 heroes would make up roughly 187 of the 1000 possible hero slots, so you’d expect to see about two of these heroes in every game you play.  In the Very High games their total slots would drop to 76, so you’d effectively see one fewer of these 10 heroes per game.
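The slot arithmetic can be checked in a few lines.  This sketch uses only the two totals quoted above (187 and 76 slots per 100 games); the variable names are mine.

```python
GAMES = 100
SLOTS_PER_GAME = 10

normal_slots = 187     # slots taken by these 10 heroes in 100 Normal games
very_high_slots = 76   # slots taken by the same heroes in 100 Very High games

# Expected number of these heroes seen per game in each bracket.
normal_per_game = normal_slots / GAMES
very_high_per_game = very_high_slots / GAMES
drop_per_game = normal_per_game - very_high_per_game

# Share of all hero slots in the bracket that these 10 heroes occupy.
normal_share = 100.0 * normal_slots / (GAMES * SLOTS_PER_GAME)
very_high_share = 100.0 * very_high_slots / (GAMES * SLOTS_PER_GAME)

print(normal_per_game, very_high_per_game, round(drop_per_game, 2))
```

That works out to roughly 1.87 of these heroes per Normal game against 0.76 per Very High game, a drop of about 1.11.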

As for the relative shifts, it’s unsurprising that this list is mostly composed of characters with very simple aggression (minus the somewhat inexplicable Razor) and is very carry heavy.  Under this sorting, the highest ‘support’ is Omniknight at 19.  If we restrict ourselves to more traditional supports, it’s Lich at 25.  Ogre Magi is at 33, and by that point the usage trends are nearly flat.  This bottom 35 only includes a total of 5 intelligence heroes: Zeus, Death Prophet, Lich, Outworld Whatever, and Ogre Magi.

Going back to the top 10, they’re mostly not terrible picks for lower level play.  Aside from the obvious Drow, Spirit Breaker, Zeus, and Viper are all actually pretty solid picks right now.  Huskar and Sniper aren’t great, but what I’ve seen suggests that they do much better in lower level play than they tend to do in more skilled games.  The really obvious sore thumb is Phantom Assassin who simply requires way more farm than the average player in this bracket is capable of delivering.

Now let’s move on to the positive end of the scale, the heroes whose usage surges in High and Very High play.

[Usage]Top

Look at that sea of blue.  Unsurprisingly, this list is the total inverse of the last, loaded with intelligence heroes, difficult to use heroes, supports, and initiators.  Rubick, Windrunner, Nature’s Prophet, and Invoker see the largest net change, but none of their shifts are even half of Drow’s.

Really there’s not a lot to say about this end of the list.  I suppose some other interesting inclusions that didn’t quite make the top 15 are Clockwerk at 16, Tinker at 17, Nyx Assassin at 19, Storm Spirit at 20, and Gyrocopter at 25.

If there’s demand I can release some kind of a web version of the data set, but it’s kinda a pain to make and historically they haven’t been used all that much.  We’ll be coming back to the sample over the next few weeks, including an upcoming post later this week that revisits the topic of the size of the Dota 2 skill brackets, which means we’ve finally come full circle in a way.

Update: Raw Data Available on Google Drive

Related: Using Hero Usage to Estimate Bracket Size

Also a long-time reader has created an Infographic if you’re the visual type.

 


15 Responses to Trends in Dota 2 Hero Usage by Skill Bracket

  1. jimmydorry says:

    As a potential enhancement if you do this again, it would be nice to see how these usage statistics correlate to win rate. Something less popular in higher tiers may end up being more effective, because it is used less.

    I would appreciate being able to peruse the summary data in a google doc, if it is not too much bother. It will be immortalised and sort-able. =P

    I was expecting to see Meepo usage rise fairly steeply, and Wisp’s usage to rise more than it did.

    It’s interesting how far the meta shifts between tiers of play. Myself being a normal player, have played in high a few times. My favourite heroes when I first started were PA, Bounty, and Sniper. It only took a few games of slightly higher play to realise the massive downsides to PA and Sniper. *_* If only I could stay in the worst tier, but with a team that does not make me want to pull my hair out… I would happily spend my time pub-stomping.

    Another interesting experiment for you to re-visit, would be the ideal team compositions analysis. There is currently a lot of theory and experience used to outright determine if a team will be successful or not, but we may find some interesting quantitative data to refute this. Your last analysis revealed an interesting trend… but at the time seemed to be hampered by lack of quality data.

    I would suggest breaking down by in-game role (ganker, support, semi-carry, etc.), hero type (agi, str, int), attack type (melee, range, hybrid), team (radiant, dire).

    Depending on which language you use to get the data from the API, I may be able to lend my assistance in potentially doubling your sample size. If you can collect the data in an autonomous fashion, my server that runs 24/7 may be of use.

    • phantasmal says:

      I have win rate statistics, but I’m not going to use them as extensively as I’d like because I have reason to believe they have significantly more error. It’s not the end of the world, but I’m hesitant to promote them as much because some of them have clearly crossed the line from “somewhat inaccurate” to “outright misleading.”

      As for sample collection, it actually might be possible that we could vastly increase the sample size, but it’s tricky.

      1. Normal/High/Very High data is only included in the GetMatchHistory API. Unless something changed while I wasn’t watching, this API won’t allow you to set date searches smaller than one complete day. That means that for any given search, you can only get 500 matches a day max. If this has been fixed or if there’s a direct way around it, I don’t know about it.

      2. Theoretically what you could do is use a timed script and the GetMatchHistory API. 15 minutes after the start of a GMT day, get the first < 500 matchIDs for that day and put them in a list. Every 15 minutes run the search again, append any new IDs, and stop the search as soon as you hit an ID you already have. Wait 15 minutes and run it again. Continue until you've collected all the High and Very High matches for your chosen time period. 15 minutes is a guess though. Really you need to have timing increments small enough that the number of High matches in that increment never goes above 500.

      3. With a list of all the High and Very High matches in a time period, you then use GetMatchHistoryBySequenceNum to get all the match details from that time period. You have a complete list of High and Very High matches, so everything unlisted should be a Normal match, Bot match, or Abandon match. Theoretically, you now have a complete match listing by skill bracket.

      4. Possible snags are perhaps GetMatchHistory doesn't update perfectly chronologically so that some new matches might end up placed earlier than matches you already have and get skipped. I don't know enough about how it behaves in realtime to eliminate this possibility. It's also potentially call intensive, but maybe not as bad as I originally thought. If you can get both High and Very High done in 15 minute intervals, then you're only looking at 40 calls ((500/25) * 2) max every 15 minutes. Most of the calls will be to bySequenceNum, which is way more efficient than what I'm using now and can be done at your leisure since there's no time constraint.
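The polling idea in steps 1 to 4 can be sketched without touching the real API.  In this minimal Python sketch, `merge_new_ids` and the fake result pages are hypothetical; each page stands in for what one timed GetMatchHistory search would return, newest match first.  No network calls are made.

```python
def merge_new_ids(seen_ids, latest_page):
    """Collect unseen match IDs from a newest-first page.

    Stops at the first ID that is already known, as described in step 2.
    """
    new_ids = []
    for match_id in latest_page:
        if match_id in seen_ids:
            break  # everything after this point was collected earlier
        new_ids.append(match_id)
    seen_ids.update(new_ids)
    return new_ids

# Fake results of three consecutive polls (made-up match IDs, newest first).
polls = [
    [103, 102, 101],
    [106, 105, 104, 103, 102],
    [107, 106, 105],
]

seen = set()
collected = []
for page in polls:
    collected.extend(merge_new_ids(seen, page))

# Call budget from step 4: 500 matches at 25 per call, for two brackets.
max_calls_per_interval = (500 // 25) * 2

print(sorted(collected), max_calls_per_interval)
```

Stopping at the first known ID is exactly what makes the snag in step 4 a risk: if a match were ever inserted out of order behind an ID that has already been seen, this loop would never reach it.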

      It's an idea I've been playing around with anyway. Get a couple of complete days and you're probably looking at 20k Very High matches, ~4 times that for High matches, and a couple hundred thousand for Normal. Certainly better than getting 10k/12k/12k out of most of a month.

      • jimmydorry says:

        I haven’t coded anything up for the API yet, but what I would do is use getmatchhistory (http://dev.dota2.com/showthread.php?t=58317).

        getmatchhistory is ordered by ID, so this means that the results it returns will mostly be abandoned and left games. This is quite a problem if you wanted to view the most recent data, but is a non-factor for all data that is at least 1.5 hours old (most games finish within an hour tops).

        You can then use the StartAtMatchID parameter, or worst case… date_max parameter.

        You can run a search using a date_min that is a day or two old, and then work back from there, for however long you need. You can also filter by game_type and skill… and I can only hope that it has been implemented properly. If I recall, the biggest gotcha is that all games that are not high or very high fall into normal. So that means there is a massive proportion of abandoned and bot games that are “normal”. There are a limited number of checks you can run, but as dotabuff has found… you can’t filter all of these trash games out.

        I would almost prefer if we could just get games as they were recorded, instead of by sequential ID.

      • phantasmal says:

        GetMatchHistory is what I use.

        start_at_match_id is great but it’s only functional to a point. Every set of call parameters returns a 500 match window. As far as I know, start_at_match_id lets you move around that window, but if you want to get a match before or after that window, start_at_match_id won’t take you there unless you change your parameters to define a different window.

        date_min and date_max were great for moving that window around back when the API was first released, but since a certain point back in July/August of last year any value you use “rounds” to 0:00.

        For example

        https://api.steampowered.com/IDOTA2Match_570/GetMatchHistory/V001/?key=&skill=3&hero_id=5&date_max=1360195200

        The first match it returns is
        “match_id”: 118636000,
        “start_time”: 1360195172,

        so let’s use 1360195171 as our date_max

        https://api.steampowered.com/IDOTA2Match_570/GetMatchHistory/V001/?key=&skill=3&hero_id=5&date_max=1360195171

        First match it returns is again
        “match_id”: 118636000,
        “start_time”: 1360195172,
        which shouldn’t be included since the start time is greater than the date_max setting.

        I first realized what was happening when I started plugging in the start_times of the first entry of my searches and found that they consistently began at ~23:59 GMT. The only way to explain it is that any date_max I use rounds to 0:00 GMT of the next day.
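The timestamps in the example above can be decoded to check this.  The sketch below converts them to UTC and models the hypothesis that date_max is rounded up to the next 0:00 GMT.  `rounded_date_max` only models that guess; it is not documented API behavior.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400

def to_utc(timestamp):
    """Convert a Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)

def rounded_date_max(timestamp):
    """Hypothesis only: round date_max up to the next UTC midnight."""
    # Ceiling division; a value already on midnight is left unchanged.
    return -(-timestamp // SECONDS_PER_DAY) * SECONDS_PER_DAY

first_match_start = to_utc(1360195172)
requested_max = to_utc(1360195171)
effective_max = rounded_date_max(1360195171)

print(first_match_start.isoformat())  # 2013-02-06T23:59:32+00:00
print(requested_max.isoformat())      # 2013-02-06T23:59:31+00:00
print(effective_max)                  # 1360195200
```

Under this model both requests in the example end up with the same effective date_max of 1360195200 (0:00 GMT on February 7), which would explain why the match that started at 23:59:32 is returned either way.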

        If there’s a way around this, I don’t know it. I suppose the Sequence API is the way around it because Valve wants people to cut down on excess calls, but it doesn’t have skill level data so it’s of limited use to me. The only real solution I have is trying to use new searches every x minutes to get a complete list of high/very high games for the time period and then filling in the skill information of a Sequence API match collection myself. Until then it appears I’m limited to a 500 match window per hero per bracket per day. If you or anyone else knows something I’m missing about the GetMatchHistory API I’d love to hear about it though.

      • jimmydorry says:

        Yea… Unfortunately it is what it is… and as is, it is somewhat restrictive.

        They are asking for people to slam the API with excessive calls, so there is no way around it, other than to slam it.

        I left out mentioning the date thing as I assumed you were aware. I would only use it to roughly find a starting match, and then work backwards from it.

        The sequence would too I guess… but involves quite a few excess calls. Up to you I guess in how much effort you want to spend bending around backwards to get the data.

        I am crossing my fingers hoping they make some significant improvements soon.

      • phantasmal says:

        What do you mean by “work backwards from it”? If it’s just day-by-day, that’s what I’m doing, but it means I’m limited to 14,000 matches per bracket for the month of February. It’s not the end of the world, but it would definitely be nice to get something like a 50k/100k sample from a period under a month so that it could be all from the same hero pool/patch period.

        As for Sequence and excess calls, I admit I haven’t played around with it so I don’t know how many trash matches you have to filter, but unless I’m terribly mistaken the filtering takes place entirely on your end. If only 100 of every 1000 are well-made High/Very High matches it seems like it would be more call efficient than what I’m doing now. The only trick would be getting the list of High/Very High matches in the first place.

        Currently it takes me 1.04 (1 + 1/25) calls per match. If you do a live match collection for every bracket every 30 minutes you’re looking at ~40 calls per hour to collect 1,000 games per hour per bracket, keeping us at the 1/25 ratio. So long as you average 10 matches per 1,000 Sequence matches you’re maintaining the same ratio of match details to calls, but getting 24,000 games per day per bracket instead of 500. As long as there aren’t 7.2 million matches a day you should be able to beat that 10 per 1,000, so Sequence should be significantly more efficient even under these awkward circumstances.
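The call arithmetic above can be laid out as a short sketch.  The 25 matches per GetMatchHistory call comes from the thread; the 100 matches per Sequence call is an assumption inferred from the 10-per-1,000 break-even figure, and the function name is made up.

```python
LIST_PAGE_SIZE = 25    # matches per GetMatchHistory call (from the thread)
SEQ_PAGE_SIZE = 100    # assumption: matches returned per Sequence call
BRACKETS = 3

# Current method: one detail call per match plus 1/25 of a listing call.
current_calls_per_match = 1 + 1 / LIST_PAGE_SIZE

def sequence_calls_per_match(useful_per_1000):
    """Calls per useful match when details come from the Sequence API."""
    listing_calls = 1 / LIST_PAGE_SIZE
    sequence_calls = (1000 / SEQ_PAGE_SIZE) / useful_per_1000
    return listing_calls + sequence_calls

# 1,000 games per hour per bracket, collected around the clock.
games_per_day_per_bracket = 1000 * 24

# Daily match volume at which only 10 in 1,000 Sequence matches are useful.
break_even_matches_per_day = games_per_day_per_bracket * BRACKETS * 1000 / 10

print(round(current_calls_per_match, 2))
print(round(sequence_calls_per_match(10), 2))
print(break_even_matches_per_day)
```

At 10 useful matches per 1,000 the two approaches cost the same 1.04 calls per match, and any higher hit rate favors the Sequence route.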

        But this is all blatant speculation, so if I’m off about something let me know.

      • noxvilleza says:

        Great article, nice insights.

        Why don’t you just poll more aggressively? I think the Valve API guidelines are 1 call per second? I recall pulling at close on that for 2 months grabbing all the historic data in an attempt to work out relative hero synergy and counters. And then the API broke. :)

        It would be nice if someone grabbed all of the games per month and provided as a single data dump, or just used that data and provided a more powerful API of their own, possibly with filtering, a custom Dota Query Language (such as “winrate of drow_ranger in matches between ’24/2/2013′ and ’28/2/2013′ “). Could use a java CC parser to convert to raw SQL.

      • phantasmal says:

        To get Normal/High/Very High breakdowns I need to use a version of the API that is limited to 500 matches per search. This wouldn’t be a problem, except that the date_min and date_max settings for this API won’t accept limits of less than a day. The Sequence API doesn’t have this limitation, but I would be giving up the Normal/High/Very High breakdown, and I prefer to have that skill bracket info (inexact as it might be) in order to isolate behavioral trends.

        I’m considering some workaround methods, but they’re a bit more complicated than what I’ve been doing, and I’m not sure if they’ll pan out.

        While I’m here, thanks for the approval middle row qwerty man!

      • noxvilleza says:

        Have you considered asking Valve to adjust their API? On the dev forum? I know the WebAPI is quite a low priority for them, but maybe they’ll be nice. Would be nice to add region and skill to GetMatchDetails.

        Yeah, that’s quite annoying. I was just doing a simulation on a modified Elo rating to rate players (and hence teams), providing significantly more accurate ratings than (Normal/High/Very High). That way I could build models and regressions on just the top percentile of matches.

        What you can also do is work on the assumption that the teams in public matchmaking are equally (skill) matched, and then work from there to build models.

      • phantasmal says:

        I’ve mentioned the issue in an API request thread, but yeah, they have more pressing issues and I’m just happy to have the thing back. It’s also entirely possible that in the aftermath of DBR they don’t want to draw attention to their own skill rankings, so hopefully I’m not stepping on any toes here.

        I feel that just assuming that teams are equally matched isn’t good enough. Evenly matched isn’t even really that big a deal. There are certain controls I can add if I want to eliminate matches that don’t look particularly even. Low level games just behave differently than high level games. This difference in and of itself is interesting, but besides that, let’s say I want to look at the farm dependency of Nature’s Prophet. If I just took a GetDetailsByMatchSequence dump of a day, it would appear that Nature’s Prophet is actually rather farm dependent, because ~84% of the legit matches will be Normal matches. When I can break it down by average player skill level I found in the past that his farm dependency (or farm benefit depending on how you want to look at it) drops significantly in High and Very High matches. So if I want to model farm dependency by hero I have to create some kind of separation on skill level or else the Normal results will drown out the more interesting results.
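To see how a pooled figure can hide the bracket differences described above, here is a small sketch.  Only the ~84% Normal share comes from the comment; the High and Very High shares and every per-bracket metric value are hypothetical numbers chosen purely for illustration.

```python
# Share of legit matches per bracket. 0.84 is from the comment above;
# the 0.13 / 0.03 split is an assumption for illustration.
match_share = {"Normal": 0.84, "High": 0.13, "Very High": 0.03}

# Hypothetical per-bracket values of some farm dependency metric.
farm_dependency = {"Normal": 0.30, "High": 0.15, "Very High": 0.10}

# What an unseparated dump would report: a match-weighted average.
pooled = sum(match_share[b] * farm_dependency[b] for b in match_share)

# How far the pooled figure sits from each bracket's own value.
gap = {b: round(pooled - farm_dependency[b], 4) for b in match_share}

print(round(pooled, 4), gap)
```

With these made-up inputs the pooled value lands within a few hundredths of the Normal figure and nowhere near the High or Very High ones, which is why the analysis needs some separation by skill level.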

        @jimmydorry — I played around with their 2011 dump when the API was down, and it was missing too much skill and match information. I think they were trying to fix both, but if there are still big skill gaps it might be too problematic for me to work with. If the missing matches are fixed it’s still pretty good for someone to create a DotaBuff-esque database without having to record all of 2012 themselves though.

        As long as it’s complete I might use it when/if I want to go back and do some 2012 specific testing, but to be honest I didn’t enjoy getting Python and Postgres to play nice with one another. Entirely possible I was just doing it wrong.

        @max1c — Item Usage is one of the things coming up. I have to test my old code to see whether I need to make any adjustments first, and I’ll probably do so by looking at a couple specific items first before recreating the entire item listing.

      • jimmydorry says:

        Unfortunately no one is making dumps of the latest data available… but here is a link to a dump of all the data from the start till Dec 2012.

        http://dev.dota2.com/showthread.php?t=74438

        Unfortunately it is a dump for PostgreSQL.

  2. asdasdad says:

    dat amazing work

  3. max1c says:

    Hey, could you please do Item Usage by Skill Bracket. Me and I’m sure a lot of other people would be very interested in this.

  4. Nedrapter says:

    The spreadsheet’s usage rates for Normal games are too low by a factor of 10. I guess this is a mistake.
