Going to interrupt your regularly scheduled programming for a bit. Most of my hits seem to be driven by a bracket size analysis I did way back when, so I feel the need to clarify my position and its extent before it gets telephoned too hard. To do this, we need to talk a bit about matchmaking. Matchmaking is also a hot topic in the fetid swamp that is http://dev.dota2.com/forumdisplay.php?f=30 so there’s some extra topicality.
I’m going to keep this a non-technical analysis with as little math as I can get away with. I cannot prove most of what I’m about to write. In reality, no one besides a couple of people at Valve knows the details. I’m just reasoning backwards from a general idea of how matchmaking works combined with the pieces I’ve seen while doing this. Inevitably some of my points will end up being inaccurate to one degree or another. Hopefully I can keep the scope of this inaccuracy limited, but either way, don’t take any of this for gospel and think things out for yourself.
1. How does Dota 2’s matchmaking work?
Dota 2 appears to use a Bayesian matchmaking system along the same lines as Microsoft’s TrueSkill system. I believe, but am not 100% certain, that Blizzard now uses similar techniques for their Starcraft 2 and World of Warcraft Arena matchmaking systems.
You’ll often hear people complaining that “Dota 2 should use Elo like LoL/HoN.” It is my understanding that most modern games that claim to use Elo have modified it so much that it is a stretch to claim it is still the same system. Elo was an interesting creation for its time and has inspired the evolution of many of the matchmaking systems in use today, but Elo has flaws that have become apparent over the years and it doesn’t translate very well to games that aren’t 1v1.
In actuality, the basic structure of Dota 2’s matchmaking is likely to be quite similar to LoL’s and HoN’s. There are certainly significant mechanical differences, but most of the differences that players notice are differences in population, culture, and rulesets, not technical features of the actual matchmaking.
2. Then why is Dota 2’s matchmaking handicapping me with players that have so many fewer wins than the enemy?
In most cases this means that the matchmaker believes you to be at a similar skill level. Matchmaking rating does not take number of wins, win percentage, or (wins – losses) into account. There may be certain correlations, but they’re a result of the system, not the driver.
Let’s take them one by one.
Wins is easy. Two players could have 1000 wins. One could have 1200 losses and the other 800 losses. No one in their right mind would say that a 1000-1200 player and a 1000-800 player are likely at the same skill level. The only thing Wins alone gives us is an estimate of how often they play using this account. From this we might be able to infer how confident the system is in its rating for the player, but we cannot infer anything about the rating itself.
But Win Percentage addresses that, so surely it reveals MMR, right? No. For this, let’s create a hypothetical example using LoL. I pick LoL because a lot of what goes on in its ranked system is visible, but similar interactions will occur in most matchmaking systems.
In LoL you start out with a rating of 1200 points. Every win you get a certain amount of points and every loss you lose a certain amount of points. These values vary depending on which team was (slightly) favored to win the match and tend to fall somewhere between 15 and 20 points a match last I checked. For the sake of mathematical simplicity, we’ll just assume an average of 20 points per win and 20 points per loss. LoL also uses an accelerated placement rate for your first 10 games or something. I don’t know the details offhand, and we’ll ignore it for this since it’s not hugely important in the long run.
If a player has a 55% win rate after 100 games, that’s a 55-45 record, which translates to a +10 win-loss differential. Since we’re assuming a shift of 20 points per game, this means that at the end of 100 games that player would be at a 1400 rating, up from 1200.
If a completely different player has a 55% win rate after 200 games, that’s a 110-90 record and a +20 win-loss differential. This player would be at 1600 rating despite having the same win percentage as the first player.
There’s nothing surprising about this because a 55% win rate is only as impressive as the quality of your opposition. The length of time that you can maintain a positive win rate is important because as long as you’re winning more games than you lose your rating continues to go up and you face stronger opponents. In reality, the second player in our example most likely had something like a 56% win rate their first 100 games and then a 54% win rate their second 100 games. If this trend continues linearly (which is unlikely, but math) their third 100 games would have a 52% win rate, leaving them at 1680 rating after 300 games.
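To make the arithmetic above easy to check, here is a minimal sketch of the simplified model used in this example: a 1200 starting rating and a flat 20 points per win or loss. The function names are mine, and this is the toy model from the example, not LoL’s or Valve’s actual formula.

```python
START_RATING = 1200    # starting rating from the example
POINTS_PER_GAME = 20   # simplifying assumption: flat 20 points won or lost per game


def rating_after(wins, losses, start=START_RATING, points=POINTS_PER_GAME):
    """Rating implied by a win-loss record under the flat points-per-game model."""
    return start + points * (wins - losses)


def win_rate(wins, losses):
    """Fraction of games won."""
    return wins / (wins + losses)


# The three records discussed above: 55-45, 110-90, and 56/54/52 wins per 100 games.
records = {
    "100 games at 55%": (55, 45),
    "200 games at 55%": (110, 90),
    "300 games at 56%, 54%, 52%": (56 + 54 + 52, 44 + 46 + 48),
}

for label, (wins, losses) in records.items():
    print(f"{label}: {wins}-{losses}, "
          f"win rate {win_rate(wins, losses):.1%}, "
          f"rating {rating_after(wins, losses)}")
```

The point to notice is that the rating depends only on the win-loss differential, so two players with the same win percentage can sit hundreds of points apart.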
So yeah, win percentage clearly has issues too.
3. So if win differential is the best way to estimate MMR, why do I keep seeing people with negative Win-Loss records in Very High games?
Well, the easy answer is “premades.” But that’s only a partial answer and something we’ll get to later.
As for the longer answer, Win Differential doesn’t really work as an MMR estimate in Valve’s system because Valve has done some interesting things. We’re now veering into speculative territory here, so hold on to your cautious skepticism hats.
Remember that question you answered when you first created your Dota 2 account, the one that asked what your skill level is? Chances are that determined your starting MMR. Not to a major extent. All three starting areas are likely within both the old and current ‘Normal’ brackets. But having multiple MMR entry points makes sense. One unfortunate consequence of the mechanics of Dota is that MMR systems take a lot longer to detect your true rating than they do in other genres. Having a high MMR starting point shortens this for the top end but has the undesirable consequence of forcing most of the player base to lose their way down to their skill level just when they’re starting out. Sure, people are unreliable when judging their own skill level, but making “I think I can handle the top MMR entry point” an opt-in question is a pretty good compromise solution.
As for the effects of this on Win Differential, let’s expand on our previous example.
Two players in a LoL system. One claims to be of a medium skill level and starts at 1200, and one claims to be experienced and starts at 1400. In actuality they both have a true rating of around 1300. The 1200 starter goes 55-50 in 105 games and hits 1300 with a 52.4% win ratio. Conversely, the 1400 starter goes 50-55 in 105 games and also ends at 1300, but with a 47.6% win ratio. Starting 100 points away from your true rating doesn’t seem like a big deal, but over the span of time that we’re dealing with it translates to a nearly 5% difference in win ratio. If they both go 50-50 in their next 100 games their resulting win ratios would be 51.2% and 48.8%. So the difference shrinks over time but doesn’t go away very quickly, and after looking up a game that they both lost on Dota, that 51.2% player would start some obnoxious thread about Valve placing him on teams with baddies despite being at the exact same MMR as the other player.
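The same toy model shows how the starting point skews win ratio. This sketch assumes a flat 20 points per game, a 1200 starter and a 1400 starter who both belong at 1300, and 100 further games at an even 50-50. The names are mine and the model is purely illustrative.

```python
POINTS_PER_GAME = 20  # simplifying assumption carried over from the earlier example


def rating_after(start, wins, losses, points=POINTS_PER_GAME):
    """Rating implied by a record under the flat points-per-game model."""
    return start + points * (wins - losses)


def win_ratio(wins, losses):
    """Win ratio as a percentage."""
    return 100 * wins / (wins + losses)


# (starting rating, wins, losses) over the first 105 games
players = {
    "medium starter": (1200, 55, 50),
    "experienced starter": (1400, 50, 55),
}

for name, (start, wins, losses) in players.items():
    settled = rating_after(start, wins, losses)
    early = win_ratio(wins, losses)
    # Both players then go 50-50 over their next 100 games.
    later = win_ratio(wins + 50, losses + 50)
    print(f"{name}: rating {settled}, "
          f"win ratio {early:.1f}% after 105 games, {later:.1f}% after 205")
```

Both players land on the same rating, yet their win ratios sit on opposite sides of 50% and only drift toward it slowly.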
Besides the question placement, it is my strong belief that Valve has some sort of anti-smurfing mechanism built into matchmaking. I’ve seen numerous reports of experienced players making a new account and having it placed in High or Very High games in an extremely small number of matches. My interpretation of this is that Valve has a metric for catching some percentage of the players who are playing at a level significantly lower than their actual skill level and has designed the system so that it boosts them to a point much closer to the estimate of their actual skill level, at which point traditional matchmaking takes over.
There’s a lot more to be said on this topic, but this post is going on longer than intended, so I’ll leave it for part two. But before we move on, there’s one other possible tweak Valve may have included in matchmaking that could de-couple win differential from MMR. Other matchmaking systems have included a certain measure of forgetfulness, meaning that games that occurred months earlier are treated as less relevant to your current MMR than games in the past week. Sudden shifts in performance would then allow players to quickly climb up in MMR without outright canceling a previous sub-50% winrate.
4. You claim that ~20% of players are in the High or Very High brackets, but Valve says that the brackets don’t exist in matchmaking, so what gives?
Ok, this is a fun mess of semantics.
For reference, the quote from Eric Tams is “As far as Matchmaking is concerned none of those buckets [low, medium, high] exist – We match on a continuous scale.” This is undeniably true, but it leaves the question, “So what are those buckets?”
Matchmaking in a team game like Dota takes a bunch of individual ratings and then combines them into one big rating. One way you could do this is by averaging them, and while averaging them may not precisely be what Valve’s system does (and there’s definitely something more going on for premades and partial premades), we’re just going to refer to the process as being like averaging them for the sake of simplicity.
So for any 5 potential players there’s a number that represents their team like-an-average-but-maybe-not. This average or team rating gets compared to other potential teams and if the two teams are sufficiently close in number a match gets made. Then someone declines at 9/10, but I digress.
In any case, a match was created without ever referring to anything like Normal, High, or Very High. But what we do have are 2 team numbers. The match recording system looks at one of the numbers, or maybe an average of both, and then according to that number places the match in a bucket. So we have two semi-arbitrary points ‘H’ and ‘V’. If your match rating is higher than ‘H’ but less than ‘V’ it gets placed in the High bracket. If it’s higher than ‘V’ it gets placed in the Very High bracket. If the game gets abandoned and stats don’t get recorded, the game gets sent automatically to the Normal bracket, regardless of the team ratings.
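As a rough sketch of that bucketing step: everything here is a guess at the shape of the process, not Valve’s code. The threshold values, the plain averaging, the choice to average the two team numbers, and the handling of a rating exactly at a threshold are all hypothetical.

```python
from statistics import mean

# Hypothetical thresholds: the real 'H' and 'V' values and the rating scale are not public.
H_THRESHOLD = 1500
V_THRESHOLD = 1800


def team_rating(player_ratings):
    """Stand-in for the unknown combining step: a plain average of the five ratings."""
    return mean(player_ratings)


def match_bucket(team_a, team_b, stats_recorded=True):
    """Place a finished match in a bucket based on the two team numbers."""
    if not stats_recorded:
        # Abandoned games with no recorded stats go to Normal regardless of ratings.
        return "Normal"
    # Assumption: the recorder averages both team numbers.
    match_rating = mean([team_rating(team_a), team_rating(team_b)])
    # Assumption: a rating exactly at a threshold counts as reaching it.
    if match_rating >= V_THRESHOLD:
        return "Very High"
    if match_rating >= H_THRESHOLD:
        return "High"
    return "Normal"


# One strong player can pull a match over 'H' even if the other nine sit below it.
carried = [2100, 1450, 1450, 1450, 1450]
ordinary = [1450] * 5
print(match_bucket(carried, ordinary))
print(match_bucket(ordinary, ordinary))
print(match_bucket(carried, ordinary, stats_recorded=False))
```

Note that the bucket is a label attached to the match after the fact. Nothing in the matching step itself needs to know the thresholds exist.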
The question then remains: are ‘H’ and ‘V’ arbitrary, or were they picked because they represent something? My theory is that ‘H’ is the value that approximately represents the personal rating of players one standard deviation above the mean in the rating distribution, which (assuming ratings are roughly normally distributed) would come out to be around the 84th percentile. ‘V’ would be two standard deviations above the mean and be close to halfway between the 97th and 98th percentile.
What this would mean is that if your personal rating is ‘H-1’ and you played on a team with 4 cloned accounts all solo queued, then your game would end up in the Normal bucket. If your personal rating was ‘H’ then your game would be in the High bucket.
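Those percentile figures follow from the normal distribution, which is an assumption on my part about the shape of the rating distribution, not something Valve has confirmed. A quick check using Python’s standard library:

```python
from statistics import NormalDist

# Assumption: player ratings are roughly normally distributed.
# A standard normal (mean 0, standard deviation 1) is enough to get percentiles.
STANDARD_NORMAL = NormalDist()


def percentile_at(sd_above_mean):
    """Percentile of a player sitting the given number of SDs above the mean."""
    return 100 * STANDARD_NORMAL.cdf(sd_above_mean)


for label, sds in (("H", 1), ("V", 2)):
    pct = percentile_at(sds)
    print(f"'{label}' at {sds} SD above the mean: "
          f"percentile {pct:.1f}, top {100 - pct:.1f}% of players")
```

If the real distribution is skewed or has fat tails, the percentiles attached to one and two standard deviations would shift accordingly.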
Of course this means that there are people who play in Very High games whose personal rating isn’t above the Very High threshold, so it doesn’t really make sense to say “Very High players.” Only games can definitively be in Very High. That being said, if the vast majority of your solo queue games are in Very High, then it’s likely that your personal rating is very close to the top 3% of the playerbase.
That’ll do it for part 1. Stay tuned for part two where we discuss the possibilities of the anti-smurfing mechanism, why finding your rating in a Dota-like takes so damn long, and probably like 500 different veiled references to the Dunning-Kruger effect.