For all you budding data analysts out there, I bring you the data dump of all ~300k players in my sample. It’s a zipped .txt file in comma-separated format that imports into both R and Excel (and possibly other programs, but I can’t guarantee it!).
The data is divided by skill level in the Bracket column: n is Normal, h is High, and v is Very High. All games should be from the 6.74 patch. I have filtered out all games with fewer than 10 players, which include bot games and early abandons. I’ve left off a few categories to keep the file easily manageable for now. I intend for a future release to include item data once I figure out how I want to display it, and possibly also a file of aggregate match data without player entries.
If you want to give it a whirl but don’t have a lot of experience, you can download R here.
1. Once in the program, use setwd("C:/directory"), where C:/directory is wherever you extracted the text file. (Type the quotes as plain straight quotes; R will not accept curly ones.)
2. data = read.csv("PlayerData.txt") ; This makes ‘data’ a data frame containing all of the player information.
3. data[1:10,] ; This will show you the first 10 player entries. As you can see, they all come from the same matchID, so you have there an entire set of match data.
4. Let’s say you want to do some filtering. We can create logical vectors for that. If you want to see all games played by Tiny, you can use L = data$Hero == 'Tiny'
data$Hero refers to the Hero column in our data frame. L is now a logical vector with one TRUE/FALSE value per entry in our data, TRUE wherever the player’s hero was Tiny. (I just realized I should have included a hero list for the naming conventions, oh well.)
5. data[L,] now shows all player entries who played Tiny. If we want to break it down further, we can add extra checks like L = data$Hero == 'Tiny' & data$Bracket == 'v', after which data[L,] only shows Tiny games in the Very High bracket.
6. If you want to turn the rows picked out by these logical checks into their own separate data frame, you can now do tiny = data[L,]. This gives us a data frame named ‘tiny’ that contains all Tiny players in the Very High bracket.
7. Want to see the average GPM of Very High Tinys? Just use mean(tiny$GPM)
8. Want to get even trickier? Let’s go back to data and make a new column that represents Creep Kills per minute. To do this we use data$csm = data$CS * 60 / data$Duration (The 60 is to convert Duration from seconds to minutes)
9. Now let’s find the top 1% of CS per minute in Normal. Create a separate frame using L = data$Bracket == 'n' and norm = data[L,]
10. We can now find the cutoff for the top 1% using quantile(norm$csm, c(.99)). This tells us that the 99th percentile is at 6.19 CS/min.
11. To actually see the entries, we can use L = norm$csm > 6.19 and then norm[L,]. Oh boy, look at all the Nature’s Prophet.
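If you’d like to try the filtering steps before downloading the real file, here is a minimal sketch of steps 4 through 11 as one script. It assumes the column names used above (Hero, Bracket, GPM, CS, Duration) and runs on a tiny made-up data frame, so every number in it is illustrative and none of it comes from the actual sample. It also uses the cutoff returned by quantile() directly instead of typing in 6.19 by hand, and writes assignments with <-, which does the same thing as = here.

```r
# Toy stand-in for the real data (made-up values, for illustration only).
# With the actual file you would use: data <- read.csv("PlayerData.txt")
data <- data.frame(
  Hero     = c("Tiny", "Tiny", "Nature's Prophet", "Tiny", "Lina", "Nature's Prophet"),
  Bracket  = c("v", "n", "n", "v", "n", "h"),
  GPM      = c(400, 300, 520, 500, 280, 450),
  CS       = c(200, 90, 300, 240, 60, 210),
  Duration = c(2400, 1800, 2400, 2400, 1800, 2100),  # seconds
  stringsAsFactors = FALSE
)

# Steps 4-6: logical check for Very High Tiny players, then a separate frame
is_vh_tiny <- data$Hero == "Tiny" & data$Bracket == "v"
tiny <- data[is_vh_tiny, ]

# Step 7: average GPM of those players
tiny_gpm <- mean(tiny$GPM)

# Step 8: creep kills per minute (Duration is in seconds, hence the 60)
data$csm <- data$CS * 60 / data$Duration

# Steps 9-11: Normal bracket only, 99th percentile cutoff, entries above it
norm     <- data[data$Bracket == "n", ]
cutoff   <- quantile(norm$csm, 0.99)
top_norm <- norm[norm$csm > cutoff, ]

print(tiny_gpm)
print(top_norm)
```

Note that csm has to be added to data before norm is created, otherwise norm will not have that column.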
That’s just a basic starting point, and I’m sure more experienced R users could have way more interesting stuff in no time.
As always, if you want to contact me about anything regarding this just use either the comment section or the e-mail up there in the right sidebar. Happy mining, data spelunkers.