Better Skill Build Analysis

With Skill Builds being the big new feature of the latest API upgrade, I thought I’d talk a bit about ways to use this information more effectively.  I don’t intend to touch on the technical hurdles of gathering and storing the information and that includes the technical feasibility of my own suggestions.  If something I suggest is impossible for some given system, then there might be a variant that is possible while still preserving the essential features.  For now, I want to focus entirely on finding a better way to categorize these skill builds for both display and research.

The best starting point is DotaBuff’s skill build system, as theirs is the most developed public source that I know of.  To reach it, go to the hero page, click on a hero, and choose the ‘Skill Builds’ tab.  For instance, here are Alchemist’s most popular skill builds.

I have two big complaints with these pages as-is.  The first is that they contain a lot of redundant information.  Let’s say I want to look at Ursa builds while researching my last article.  What I find is that Earthshock-first builds cannot be found.  I know it’s a niche build, but there may be enough samples out there to still find out something useful.  Unfortunately, DotaBuff only lists the top 10 builds, and most of those are slight variants of two major build philosophies.  Some of these minor variants might be important, but most really aren’t.  Either way, it’d be preferable if the main page listed the major build styles and then linked to the minor variants of each build style.  Solving this requires developing a way to group similar builds together, preferably one that can be generalized across all heroes (perhaps with the exception of Invoker).

As for the second complaint, look at the numbers on any hero page.  The total build rates of the top 10 rarely add up to more than 20%, and all the win rates are 10-15% higher than the base win rate for the hero in question.  What I believe is happening is that DotaBuff is only counting the skill builds of players that reach level 18.  This inflates the win rates of the builds because a hero that hits level 18 or higher is more likely to have out-leveled its opponents than a hero that ends the game somewhere between level 1 and 17, and the winning team will typically have a higher overall XPM (these heroes are also less likely to have lost a game due to an early abandon).  What this highlights is that we want a system that can categorize a skill build well before level 18.  We’d also need to design our system so that we can control for certain factors in the way we measure and display results.  As an example, if I’m right that DotaBuff only uses level 18 skill builds, the page for each hero should display that hero’s win rate among all eligible matches.  That is, it should display the hero’s win rate in all matches where they end the game at level 18 or higher.  Given our limitations, this provides a better benchmark for judging these skill builds than just comparing their win rates to the global win rates of the hero.
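To make the benchmark idea concrete, here is a minimal sketch of computing a hero’s win rate among eligible matches only.  The `Appearance` record and the sample values are hypothetical and made up for illustration; they are not DotaBuff’s data or method.

```python
from dataclasses import dataclass


@dataclass
class Appearance:
    """One hero appearance in one match (hypothetical record layout)."""
    final_level: int
    won: bool


def win_rate(appearances):
    """Fraction of appearances that were wins, or None for an empty sample."""
    if not appearances:
        return None
    return sum(a.won for a in appearances) / len(appearances)


def eligible_win_rate(appearances, min_level=18):
    """Win rate restricted to appearances that ended at min_level or higher."""
    return win_rate([a for a in appearances if a.final_level >= min_level])


# Made-up sample: three games that reached level 18+, three that did not.
sample = [
    Appearance(final_level=25, won=True),
    Appearance(final_level=18, won=True),
    Appearance(final_level=19, won=False),
    Appearance(final_level=11, won=False),
    Appearance(final_level=9, won=False),
    Appearance(final_level=14, won=True),
]

overall = win_rate(sample)             # baseline over every game
benchmark = eligible_win_rate(sample)  # baseline over level 18+ games only
```

A build’s win rate would then be compared against `benchmark` rather than `overall`, since both are measured over the same kind of match.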

With these challenges in mind, what I propose is we categorize skill builds by their skill priorities between levels 7 and 9.  This allows us to include most heroes in most games that last at least 20 minutes, and in my opinion these levels contain the information for the most important skill decisions in the vast majority of skill builds.  My proposed divisions are Single-Skill Priority, Double-Skill Priority, and Split-Skill Priority.

(For upcoming reference, my skill build listings will be something like 4/2/0/1.  The first three are Q/W/E in any order.  The last is R.  At this stage we have no particular reason to distinguish between ‘Max Q, Ignore E, Rest of Points in W’ and ‘Max W, Ignore Q, Rest of Points in E.’  We can generalize the process to work for any variation.  If it comes up I will use +X at the end to indicate early points in Stats.)

Single-Skill Priority is probably the most common skill build and represents every skill build that maxes one of its skills by level 7 (or that maxes one skill before any other skill gets 3 points or before both other skills get 2 points).

Double-Skill Priority is any skill build that has two skills with 3 points by level 7 (or two skills have 4 points before the third skill has 2 points).

Split-Skill Priority is any skill build that has no skill with 3 points by level 5 (or every skill has 2 points before any skill has 4).

It should be noted that these distinctions are not exhaustive.  There are some potential disagreements (3/3/0 becoming 3/3/2 for example), and they might not actually catch all skill builds, but they should handle the vast majority.  Any major variants that fall outside of these rules can be integrated through adding additional rules, but I’d like to see how the most simple rulesets perform before any adjustments are made.  For the sake of completeness I included some potential adjustments inside the parentheses.
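As a rough sketch, the three primary rules (without the parenthetical adjustments) could be written like this.  The build-string encoding (one letter per level, with `R` for the ult and `S` for stats), the order in which the rules are checked, and the `Unclassified` fallback are my own assumptions for illustration.

```python
from collections import Counter

BASIC_SKILLS = ("Q", "W", "E")


def points_at(build, level):
    """Count points per skill after the first `level` skill picks."""
    return Counter(build[:level])


def classify(build):
    """Return (category, skills) for a build string such as 'EWEQERE'.

    Rules are tried in the order Single, Double, Split; that ordering is
    an assumption used to settle builds that could match more than one.
    """
    if len(build) < 7:
        return ("Unclassified", "")  # hero never reached level 7
    at7 = points_at(build, 7)
    at5 = points_at(build, 5)

    # Single-Skill: one basic skill is maxed (4 points) by level 7.
    maxed = [s for s in BASIC_SKILLS if at7[s] >= 4]
    if maxed:
        return ("Single-Skill", maxed[0])

    # Double-Skill: two basic skills have 3 points by level 7.
    three_plus = [s for s in BASIC_SKILLS if at7[s] >= 3]
    if len(three_plus) >= 2:
        return ("Double-Skill", "".join(three_plus))

    # Split-Skill: no basic skill has 3 points by level 5.
    if all(at5[s] < 3 for s in BASIC_SKILLS):
        return ("Split-Skill", "")

    return ("Unclassified", "")
```

For example, `classify("EWEQERE")` gives `("Single-Skill", "E")`, while a build like `"QWQSQRS"` (3 points in Q plus early stats) falls through every rule, which is the kind of gap that additional rules would need to cover.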

What we’re left with is 7 major categories: 3 variants of Single-Skill (Q/W/E), 3 variants of Double-Skill (QW, QE, WE), and a sort of other category with Split-Skill Priority.  What we’d expect to find is that not all 7 major categories are necessary for most heroes.  For example, Single-Skill E Skeleton King is almost certainly statistically negligible.  I’m not going to specify any particular criteria for eliminating a category as statistically irrelevant.  We want to be as concise as possible, but we also should be extremely interested in detecting rare but successful builds.  Whatever criteria we use should try to create a balance between these two concerns.  In any case, once we have eliminated the useless variants, we could create a basic pie chart that describes the distribution of each hero’s skill build preferences.
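Here is one possible sketch of turning a list of category labels into pie chart shares.  The `min_share` cutoff is purely a placeholder (I deliberately haven’t picked a real criterion), and the label counts in the example are invented.

```python
from collections import Counter


def category_shares(labels, min_share=0.02):
    """Map each category to its share of games, folding rare ones into 'Other'.

    min_share is a hypothetical cutoff; a real criterion should also weigh
    whether a rare category is unusually successful before hiding it.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {}
    other = 0
    for label, n in counts.items():
        if n / total >= min_share:
            shares[label] = n / total
        else:
            other += n
    if other:
        shares["Other"] = other / total
    return shares


# Invented counts, only to show the shape of the output.
labels = (
    ["Single-Skill E"] * 60
    + ["Double-Skill WE"] * 30
    + ["Single-Skill W"] * 9
    + ["Single-Skill Q"] * 1
)
shares = category_shares(labels, min_share=0.05)
```

Keeping the folded categories as an explicit "Other" slice means the shares still sum to 1, and the hidden categories remain available for a separate look at rare builds.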

To take Ursa as an example again, I’d expect to find Single-Skill E to be the most populous build followed closely by Double-Skill WE and Single-Skill W.  Single-Skill Q should have a small but significant population.  Most of the other variants will likely be statistically negligible.

With the major categorization out of the way, we now want to break each major build into its variants.

Single-Skill will tend to have one of three early structures: 4/2/0/1, 4/1/1/1, and 4/?/?/1 + stats.

Each Single-Skill build category also has 3 late build philosophies.  For Single-Skill Q they are Max W, Max E, Split Between W and E.

I define Max W and Max E as any build that puts 4 points in one of these skills before putting 2 points in the other.  They each come in two variants, hard and soft.  The hard variant skips the third skill entirely, while the soft variant puts 1 point in the third skill at some point before maxing the second.  For example, if we had Single-skill Q — Hard Max W, that would be any build that starts 4/2/0/1, and eventually achieves 4/4/0/1.  Soft variants could begin 3/1/1, or they could start 3/2/0 and put the first point in their third ability somewhere between levels 6 and 9.
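The hard/soft distinction can be sketched as a walk through the skill picks.  As before, the build-string encoding (one letter per level, `R` for ult, `S` for stats) and the `Undetermined` result for games that end too early are assumptions of mine.

```python
def late_philosophy(build, primary):
    """Classify the late build of a Single-Skill build with the given primary.

    Returns 'Hard Max X', 'Soft Max X', 'Split Between X and Y', or
    'Undetermined' if the game ended before the build committed either way.
    """
    others = [s for s in "QWE" if s != primary]
    a, b = others
    points = {a: 0, b: 0}
    for pick in build:
        if pick not in points:
            continue  # primary skill, ult, or stats
        points[pick] += 1
        for second, third in ((a, b), (b, a)):
            # Max: 4 points in one skill before 2 points in the other.
            if points[second] == 4 and points[third] < 2:
                kind = "Hard" if points[third] == 0 else "Soft"
                return f"{kind} Max {second}"
        # Split: both remaining skills reach 2 points before either is maxed.
        if points[a] >= 2 and points[b] >= 2:
            return f"Split Between {a} and {b}"
    return "Undetermined"
```

So `late_philosophy("QWQWQRQWW", "Q")` comes out as Hard Max W, while slipping a single point of E in before W is maxed (`"QWQEQRQWWW"`) makes it Soft Max W.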

There are also the issues of +stats and ult skipping.  In both cases I would ignore these until they happen enough within a major variant that they’re statistically noteworthy.  For some examples:

  • Single-Skill Q Juggernaut and Skeleton King should both have statistically significant +stats populations.
  • Various farming carries (Anti-mage, Phantom Assassin, Medusa) might have early stat builds so extreme that they’re effectively Split-Skill Priority builds.
  • Various Leshrac Double-Skill builds have large amounts of ult skipping.
  • Tinker and Dragon Knight might have noteworthy amounts of ult skipping (or in DK’s case 2nd point delaying) in all of their builds.

+stats and ult skipping may need to be handled on a case-by-case basis.  What we’d like to do regardless is be able to measure the percentage of builds that include early +stats, late +stats in lieu of maxing a skill (think of Ancient Apparition’s Chilling Touch pre-buff or the many Huskar builds that skip Burning Spear entirely), and ult skipping, and then also be able to measure those percentages within each of the major and minor build variants.

This is actually the big drive behind creating a grouping system for skill builds.  Sure, it could help sites like DotaBuff find a more effective way to display information, but the real benefit is that we can separate two skill build groups or sub-groups and make comparisons.  What kind of things can we learn from these comparisons?  Let’s return to Ursa as an example.

One really cool thing we could do is track the usage rate of a skill build category by skill bracket.  For instance, if Single-Skill Q Ursa is a niche build overall, but becomes significantly more common in High and Very High games over the stretch of time following 6.75, we could detect the emergence of a new build before it becomes public knowledge.  There’s really a significant amount of potential here.  Does Single-Skill Q + Stats Juggernaut have a higher popularity in the upper brackets?  What about builds that max Healing Ward early, either as part of a Double-Skill build or as a Single-Skill Q with a hard Max W?  Do the variants of builds that feature primary or secondary maxing of Crit become less popular or less successful?
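A minimal sketch of the bracket comparison, assuming each game has already been reduced to a `(bracket, category)` pair.  The records below are invented to show the output shape, not real usage figures.

```python
from collections import defaultdict


def usage_by_bracket(records, category):
    """Share of games in each bracket that used the given build category."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for bracket, label in records:
        totals[bracket] += 1
        if label == category:
            hits[bracket] += 1
    return {bracket: hits[bracket] / totals[bracket] for bracket in totals}


# Invented records: 10 games per bracket.
records = (
    [("Normal", "Single-Skill Q")] * 1
    + [("Normal", "Single-Skill E")] * 9
    + [("Very High", "Single-Skill Q")] * 3
    + [("Very High", "Single-Skill E")] * 7
)
rates = usage_by_bracket(records, "Single-Skill Q")
```

Running the same function over successive time windows after a patch would give the trend line for each bracket.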

But let’s even go beyond that.  We can create statistical distributions for each group and sub-group and compare them.  Suppose we’re looking at Leshrac.  We know he gets played both as a support and somewhat more rarely as a semi-carry.  We can create a stat called ‘Relative XPM’ that represents Leshrac’s ending XPM / rest of the team’s Average XPM.  In games where this value is high, Leshrac was most likely being played as a semi-carry.  When it’s low, he was likely played as a support.  Then we create a Relative XPM distribution for each skill group to determine which builds are seen as semi-carry builds and which as support.
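Relative XPM is simple enough to sketch directly.  The category labels and XPM values below are made up, and summarizing each group by its median is just one assumed way to compare the distributions.

```python
from statistics import mean, median


def relative_xpm(hero_xpm, teammate_xpms):
    """Hero's ending XPM divided by the average XPM of the rest of the team."""
    if not teammate_xpms:
        return None
    team_avg = mean(teammate_xpms)
    if team_avg == 0:
        return None  # avoid dividing by zero on degenerate games
    return hero_xpm / team_avg


def median_by_category(games):
    """games: iterable of (category, hero_xpm, teammate_xpms) tuples."""
    grouped = {}
    for category, hero_xpm, teammate_xpms in games:
        value = relative_xpm(hero_xpm, teammate_xpms)
        if value is not None:
            grouped.setdefault(category, []).append(value)
    return {category: median(values) for category, values in grouped.items()}


# Made-up games for illustration only.
games = [
    ("Double-Skill QW", 600, [400, 400, 400, 400]),
    ("Double-Skill QW", 480, [400, 400, 400, 400]),
    ("Single-Skill Q", 300, [500, 600, 400, 500]),
]
summary = median_by_category(games)
```

A group whose values sit well above 1 would read as a semi-carry build, and one well below 1 as a support build.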

With a little more effort we can do something similar with Ursa.  Replay parsing can differentiate between lane and jungle farm, so we use our skill groupings to create replay samples of Ursa games in each of the four big categories (Single-Skill Q, Single-Skill W, Single-Skill E, and Double-Skill WE).  We can parse the replays to find out where each category tends to get most of its farm in the first 5, 10, and 15 minutes by basically creating a ratio of Jungle Creep Gold to Lane Creep Gold (and of course programming some error-catching for divide by 0).  This would tell us the percentage of the time each build gets used to jungle or lane.
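Here is a sketch of the ratio with the divide-by-0 case handled.  It assumes the replay parser has already produced jungle and lane creep gold totals per game; the `cutoff` that decides what counts as a jungling game is a placeholder I picked for illustration.

```python
import math


def jungle_to_lane_ratio(jungle_gold, lane_gold):
    """Jungle Creep Gold / Lane Creep Gold, guarding against divide by 0."""
    if lane_gold == 0:
        # Pure jungling is reported as infinity; no creep gold at all is unknown.
        return math.inf if jungle_gold > 0 else None
    return jungle_gold / lane_gold


def jungle_rate(games, cutoff=1.0):
    """Fraction of games whose ratio exceeds cutoff (hypothetical threshold)."""
    ratios = [jungle_to_lane_ratio(jungle, lane) for jungle, lane in games]
    ratios = [r for r in ratios if r is not None]
    if not ratios:
        return None
    return sum(r > cutoff for r in ratios) / len(ratios)


# Made-up (jungle_gold, lane_gold) pairs at the 10 minute mark.
games = [(1500, 0), (1200, 300), (100, 1400), (0, 1600), (0, 0)]
rate = jungle_rate(games)
```

An alternative that sidesteps most of the special cases is to use jungle gold as a share of total creep gold, which is only undefined when a hero has no creep gold at all.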

From there we could also measure average jungle efficiencies of the builds or even the time-to-first-Roshan.  This could give us a much better idea of how these different builds compare in their efficiency at neutral killing than we currently have.  We could also use this to detect tactical outliers.  Suppose Single-Skill E on average beats Single-Skill W in Jungle Creep Gold in the first 10 minutes, but that there’s a small cluster of Single-Skill W replays that are comparable to the best Single-Skill E results.  We now have a collection of replays we can watch to find out why these Single-Skill W builds are outliers.  Maybe there’s an obscure tactic that lets Single-Skill W compete that we could adopt to give ourselves greater skill diversity.  Or maybe it’s just a matter of luck with jungle spawns.  Either way, we’ve accomplished a lot by using statistical outliers to separate out interesting results.

The basic takeaway is that whatever system we use to measure skill builds, we want a system that supports a form of reverse lookup.  That is, we should be able to take our list of matches and generate the distribution of skill builds, but we should also be able to describe a skill build category and create a slice of the total matches where every match features that skill build.
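One simple way to get both directions is to keep an index from (hero, category) to match IDs.  This is only a sketch under the assumption that each match has already been labeled with a category; the match IDs below are invented.

```python
from collections import defaultdict


def build_index(matches):
    """matches: iterable of (match_id, hero, category) tuples."""
    index = defaultdict(list)
    for match_id, hero, category in matches:
        index[(hero, category)].append(match_id)
    return index


def distribution(index, hero):
    """Forward direction: number of matches per category for one hero."""
    return {cat: len(ids) for (h, cat), ids in index.items() if h == hero}


def lookup(index, hero, category):
    """Reverse direction: every match ID that features the given category."""
    return list(index.get((hero, category), []))


# Invented match IDs for illustration.
matches = [
    (1001, "Ursa", "Single-Skill E"),
    (1002, "Ursa", "Single-Skill E"),
    (1003, "Ursa", "Single-Skill Q"),
    (1004, "Leshrac", "Double-Skill QW"),
]
index = build_index(matches)
```

The list returned by `lookup` is the slice of matches that can be handed to a replay parser or any further comparison.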

Data display is an entirely different issue (that I happen to feel a bit less passionately about), but the summary for that is as follows.

  • 7 broad skill build categories — 3 variants of Single-Skill, 3 variants of Double-Skill, and Split-Skill
  • Each broad category has sub-variants depending on how it behaves before 6, after 6, and whether it ult-skips or takes early stats
  • Remove the statistically insignificant categories and we can create a basic chart for both the broad skill categories and each sub-variant.
  • If necessary, we can then do the top 5 builds for particularly popular sub-variants.

And admittedly I have no idea whether this will all hold up in practice.  In fact I suspect much of it won’t.  I’m just presenting it as a possible starting point out of which a better system can evolve.


2 Responses to Better Skill Build Analysis

  1. Decoud says:

    I run a stats site for pro games and have been working on adding skill builds (currently I only have the skill builds broken down by the individual match, see link). Last night I was thinking about ways to show the aggregate skill builds and didn’t really come to any firm conclusions. The only decent idea I had was to show popular skill builds in segments: popular skill builds from level 1 to 6, popular builds from 1 to 11, popular builds from 1 to 16. This would partly alleviate the issue of doing comparisons when hero level can differ so wildly from game to game.

    It was a pleasant surprise to wake up today and see that you had written a post about this recently. I’ll try to incorporate some of your thoughts into whatever aggregate data on skill builds I prepare. In particular I like the distinction you make between single skill priority, double skill priority etc. Also, the reverse lookup would be something I’d really like to have. Maybe an option where you click on the “Single Skill E builds for hero X” and it shows a list of match IDs and the main hero’s stats for each of those games?

    The “rare but successful” will be tough to tease out unless the sample of games is very large and metrics for success are clearly defined. Unless a player does something completely off the wall, I doubt a skill build has a large effect on the overall win % in the small sample I’m working with (at least compared to the overall noise in the data). Looking at lane farm, neutral farm, last hits, ability effectiveness (stuns landed and damage done with spells) etc, could maybe form the beginning of analyzing success with a skill build. However the particulars of each hero, ability, role, and laning strat make this hard to generalize.

    Anyway, I’ll see what I can do and let you know how it comes out in case you have any feedback. For now I’m working on taking the API data and putting it in a table format I’m more comfortable working with and that is more conducive to analysis.

    • phantasmal says:

      I’m not going to respond too in depth because I’m actually in the middle of a pretty big prototype project that you’ll probably be interested in. Hopefully it will be out within a day or two.

      My approach is evolving somewhat in that I basically have two sliders that I’m capable of adjusting to create a balance between detail and maintaining sufficiently large sample sizes.

      The first is grouping method. The most forgiving but least specific is the skill priority system, and it seems to be working out ok. Besides that I also have been using clusters and pathing. Pathing is what DotaBuff uses, but in general it’s very limited because of sample size issues. Clusters is kinda like pathing except they don’t consider order, just however many points you have at a given level.

      The second is, like you touched upon, using the level to create a segment range. Shorten the range to create sufficiently large samples for each grouping. Lengthen the range when your sample size and grouping method allow.

      Right now, I’m using 8 for skill priority grouping, 5 for clusters, and I’m not comfortable going past 3 for pathing. But my current project only has samples of 2500-3000 games, so I’m a little more restricted than I’d like to be.

      As for “rare but successful,” it depends again on sample size. To measure success I’m using win rate. Ideally I’d like at least a 4 digit sample to feel confident about Win Rate, but I’m willing to consider as low as ~300. Since my total sample maxes at 3000, this puts me at ~10% as my threshold for max rarity worth considering.

      Pro games are going to be a bit of a different beast. If it weren’t for an API limitation I could easily have a sample size in the 10s of thousands for every hero in a given patch period, but a pro game sample can never come close to that. I think you’re right in that you’ll need to use the replay parser to make more detailed statistics than what’s available in an API return. It better fits both the priorities and limitations of working with pro game replays exclusively.

      But yeah, give me a couple days and I should have a working model together that’ll better explain how I’m approaching this.
