Exploring heat maps for Tobin Heath’s NWSL April 2016 Player of the Month performance

Thanks to the hard work of many volunteers and WoSo fans who have contributed their time to the WoSo Stats Project, we have several matches with location data. The way this project tracks matches means that virtually every action (passes, tackles, pressuring an opponent, lost touches, etc.) has its location on the field logged according to this field breakdown.

The possibilities this creates for analyzing play on the field are numerous. However, logging location data on top of logging the match actions themselves is very time-consuming, so every match has its actions logged first and then location data is added after the fact. This means that only a portion of matches in our database have location data logged, but at this point we have enough matches completely logged that we can start to do some interesting things.

In this post I’ll show how you can use Excel spreadsheets to create heat maps for some of the data we’ve logged. This is just the beginning, but some of this is neat enough to show off already. To keep things focused, I also will show heat maps just for Tobin Heath for her April 2016 Portland Thorns matches.

How the heat maps work

The heat maps featured in this post are in an Excel spreadsheet that can be downloaded from the WoSo Stats GitHub repository (click on “View Raw” to download it). A more in-depth post will follow on how these heat maps function and how you can create your own for any match in our database with location data, but for now it’s important to keep in mind what stats these heat maps depict, how the color gradients are scaled, and just in general how to use them without accidentally breaking the whole thing.

In the Excel spreadsheet, all the data that can be represented on the heat map is in the large table to the right. Don’t mess with this! Each column in that large table is for a given stat in a given zone of the field.

The actual stats I’ve chosen for these heat maps are the following, with a description for what each acronym means listed below:

  • opPass.Att = Open play passes (not free kicks, throw-ins, corner kicks, goal kicks) attempted
  • opPass.Comp = Open play passes completed
  • opPass.Pct = Open play pass completion percentage
  • Int = Interceptions
  • TO.Won = Take ons won
  • TO.Lost = Take ons lost
  • AD.Won = Aerial duels won
  • AD.Lost = Aerial duels lost
  • Tackles = Tackles that dispossessed an opposing player with the ball
  • Pressure = Instances where a player applied pressure onto an opposing player’s action, without actually attempting a tackle or dispossession
  • Recoveries = Winning possession of a loose ball

More detail on some of the above stats and how they’re logged can be found here.

To actually create a heat map for a specific player from that heat map’s match, type in her name in the cell below “Enter name here:”. The heat map will be for the stat in the cell below “Enter stat here:”, which you can also change so long as it matches one of the stats listed above. The stat for that player for each zone will be listed in the grey-shaded cells.

A note on how the color gradient works. For the sake of simplicity, for now, in this Excel spreadsheet the minimum (white) is always 0 and the maximum for each stat (darkest green) is the largest number out of all the zones for all the players. For example, for the Portland Thorns vs. Orlando Pride match above for the opPass.Att stat, Emily Sonnett had the most open play passes attempted from any one zone, with 20 open play passes attempted from the center of the defensive 3rd (DC). So, that’s the maximum for the opPass.Att stat. The same logic holds true for all the other stats.
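The scaling described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the actual Excel formulas: the player names and the numbers are made up, and `gradient_intensity` is a hypothetical helper name.

```python
def gradient_intensity(value, stat_max):
    """Return 0.0 (white) to 1.0 (darkest green) for one zone's value."""
    if stat_max <= 0:
        return 0.0  # nothing logged for this stat, so leave everything white
    return max(0.0, min(1.0, value / stat_max))

# Made-up numbers for illustration only; the player names are placeholders.
op_pass_att = {
    "Player A": {"DC": 20, "DML": 6},
    "Player B": {"DC": 3, "DML": 8},
}

# The maximum is the largest single-zone value across all players.
stat_max = max(v for zones in op_pass_att.values() for v in zones.values())

intensities = {
    player: {zone: gradient_intensity(v, stat_max) for zone, v in zones.items()}
    for player, zones in op_pass_att.items()
}
```

With these numbers, Player A’s DC zone sets the maximum and gets the darkest green, while every other zone is shaded in proportion to it.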

I’ve also got formulas in the Excel spreadsheet that find the maximum for a team overall, so you can also just type in a team’s acronym in the cell under “Enter name here:”.

Tobin Heath’s April 2016 NWSL Player of the Month performance

If you’ve already downloaded the Excel spreadsheet, you’ve hopefully noticed you can create heat maps for the three NWSL matches in there for any of the players who played. However, I’m just going to spend the rest of this post looking through the stats for Tobin Heath, as she was voted the Player of the Month by NWSL Media for that month of April. That Tobin Heath is a great player who had a great start to the season, well, that’s not something for which you really need a heat map. But it will be fun to see how Heath’s passing and take ons were distributed across the field, especially compared to other players.

Tobin Heath in Portland Thorns vs Orlando Pride (Week 1)

The first thing that stood out to me for this match was the distribution of Heath’s passing across a large swath of the field. Her heat map for her open play passing attempts (opPass.Att) is light green, but there’s a lot of light green especially in the opponent’s half.

There weren’t a lot of players who had that much green in the opponent’s half. The only other player who came close to that many attempted passes was Alex Morgan, shown below.

And Heath’s passing completion percentage was also fairly high in the attacking third compared to other players (here’s the tweet with screenshots of heat maps for Alex Morgan, Kaylyn Kyle, and Mana Shim, who had similarly high pass completion percentages in the attacking 3rd).

Heath’s heat map for her take ons won is noteworthy because, after looking at the heat map for her entire team, it looks like every take on won by Portland in the attacking third (5) belonged to her.

Finally, aside from her passing and playmaking ability, Heath also appears to have applied a good amount of pressure on opponents. Below, her heat map for pressure applied is fairly green up and down her left side of the field.

By comparison, the two players who look like they applied the most pressure in the attacking third were Portland’s Nadia Nadim and Orlando’s Alex Morgan.

Tobin Heath in FC Kansas City vs Portland Thorns (Week 2)

Similar to her previous week, Heath had a very active passing day against FCKC in Week 2 along her left flank, with a high number of passing attempts centered around that attacking left corner.

The only other player with that many passing attempts in the opponent’s left or right corner was her teammate Klingenberg.

Heath’s passing completion percentage was also high across two-thirds of the field, again, especially in the attacking third.

The four other players from this game who appeared to have similarly high passing completion percentage numbers in the attacking third were Erika Tymrak, Allie Long, Mandy Laddish, and Lindsey Horan.

As for take ons won, Heath won a similar number of take ons in the opponent’s half compared to the previous week.

Take ons lost are worth looking at for Heath here, as it appears she lost a few more take on attempts against FCKC (5) compared to her previous week against ORL (2).

Other players that game with a similar number of take ons won in the attacking third were Meghan Klingenberg, Mandy Laddish, and Allie Long.

As far as recovering loose balls goes, Heath did a good amount of it in the attacking third, especially in that dangerous center of the attacking third in front of the 18-yard box.

Two other players whose recoveries in the attacking third looked similar to Heath were Jen Buczkowski and Desiree Scott.

Heath’s pressure applied to opponents in their defensive corners also looked impressive, comparable to just a few other players from this game.

Tobin Heath in Boston Breakers vs. Portland Thorns (Week 3)

Finally for this (short) NWSL month, there’s Heath’s performance at Boston where she had another day of passing attempts bunched into the attacking third’s left, similar to the previous week against FCKC.

Kristie Mewis had a heat map similar to this one, with almost as many passing attempts from the attacking third left, and a few more than Heath further down the center of the field.

Heath’s passing completion percentage in the attacking third looks somewhat inverted compared to Mewis. She “only” had a passing percentage of 50% in the attacking third’s left – I put “only” in quotation marks because with more data from more matches it’ll be interesting to see if that low of a passing percentage is actually relatively good compared to what the average player gets. Either way, Mewis had a much better passing percentage, 80%, in that attacking third left.

Compared to the rest of the month, Heath had what looks like significantly fewer take ons won across the field: 3 compared to 7 against Orlando and 10 against FCKC.

Heath also had the fewest recoveries of the month against Boston: 6 compared to 10 against Orlando and 14 against FCKC.

Final Thoughts & Next Steps

The stats shown here appear to back up the eye test; Heath was a presence, attempting passes across a wide span of the field. She was a threat in the attacking third left, getting off a good number of passes and take ons from that corner.

There’s a lot more that could be done with heat maps, though. I included only a few stats in these heat maps because some stats I didn’t include, such as assists, happen very rarely; I found that these heat maps and the use of a color gradient are most useful and interesting for actions that happen many times such as passes or pressuring an opponent. One idea that I will pursue is creating a heat map that incorporates several matches and, rather than using totals, will use a “per 90 minutes” stat. That is, to use interceptions as an example, instead of showing how many interceptions a player got in one match (which is rarely more than one in any one zone for one match), show how often she gets an interception in that zone per 90 minutes of play.
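As a rough sketch of the “per 90 minutes” idea, here is how that normalization could look in Python. The function name `per_90` and the numbers are hypothetical; the post doesn’t have real multi-match totals yet.

```python
def per_90(total, minutes_played):
    """Scale a raw total to a rate per 90 minutes of play."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return total * 90 / minutes_played

# Hypothetical example: 3 interceptions in one zone across 270 minutes played.
interceptions_per_90 = per_90(3, 270)
```

In that made-up case the player averages one interception per 90 minutes in that zone, which makes a much more readable gradient than a map full of 0s and 1s from a single match.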

There’s also the issue of how to account for a scale that is heavily skewed to one extreme, such as the passing completion percentage stat. Most players, even those with an average passing day, will have a passing completion percentage well above 50 percent, so having the minimum at 0 isn’t very useful.
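One possible way to handle a skewed scale is to raise the floor of the gradient instead of always starting at 0. The sketch below is just an assumption about how that could work; the floor of 50 is picked for illustration and `scaled_intensity` is a hypothetical name.

```python
def scaled_intensity(value, floor, ceiling):
    """Map value to 0.0-1.0 between a chosen floor and ceiling."""
    if ceiling <= floor:
        return 0.0
    # Values at or below the floor stay white; values at the ceiling are darkest.
    return max(0.0, min(1.0, (value - floor) / (ceiling - floor)))

# Hypothetical 80% completion rate, scaled two ways.
from_zero = scaled_intensity(80, 0, 100)    # floor at 0, as in the spreadsheet
from_fifty = scaled_intensity(80, 50, 100)  # floor raised to 50 (an assumption)
```

Raising the floor spreads the typical range of completion percentages over more of the gradient, so differences between players become easier to see.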

There’s also the matter of how to account for how hard it is to rack up certain stats in some zones compared to others. For example, in the Orlando game, Heath had a fairly evenly distributed number of passes across the field. However, it may not be giving Heath enough credit by having her 5 passing attempts in the defensive middle’s left (DML) be the same shade of green as her 5 passing attempts in the opponent’s 18-yard box. I’m assuming that any one player getting that many passing attempts in the 18-yard box in any one game is rare, so that zone could reasonably be even greener if it were put up against some sort of scale that took into account what the upper and lower quartiles are for that specific zone; I’d bet if we looked at every player from that month and looked at the number of passing attempts each one managed to make in the 18-yard box, Heath’s 5 would be somewhere in the top 25%. On the other side of the field, by that standard Heath’s 5 passing attempts in the defensive middle’s left would probably be an even lighter green, since that’s almost always a very active part of the field, especially for left backs, center backs, and defensive midfielders.
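To make the zone-specific idea a bit more concrete, here is a minimal Python sketch that ranks a value against what other players managed in the same zone. All of the numbers are invented for illustration (we don’t have these distributions yet), and `percentile_rank` is a hypothetical helper.

```python
def percentile_rank(value, zone_values):
    """Share of observed values in a zone that are at or below value (0.0-1.0)."""
    if not zone_values:
        raise ValueError("zone_values must not be empty")
    return sum(v <= value for v in zone_values) / len(zone_values)

# Invented passing-attempt counts for eight players in two zones.
attempts_by_zone = {
    "18-yard box": [0, 0, 1, 1, 2, 2, 3, 5],
    "DML": [2, 4, 5, 7, 9, 10, 12, 15],
}

box_rank = percentile_rank(5, attempts_by_zone["18-yard box"])
dml_rank = percentile_rank(5, attempts_by_zone["DML"])
```

With these invented numbers, the same 5 passing attempts rank at the very top of the 18-yard box but in the bottom half of the defensive middle’s left, which is the kind of difference a zone-aware color scale could show.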


A more in-depth post on how these heat maps work is coming, but for now you can browse the Excel spreadsheets with the raw match actions data for each match here:
* PTFC-ORL (Week 1)
* FCKC-PTFC (Week 2)
* BOS-PTFC (Week 3)

The stats tables next to the heat maps, all by themselves in their own csv files, can be found here:
* PTFC-ORL (Week 1) location-based stats
* FCKC-PTFC (Week 2) location-based stats
* BOS-PTFC (Week 3) location-based stats

And, finally, the R code that created the above csv tables can be found here. That code sources from this R code, which has functions for fine-tuning which matches from our database you’re looking for, and from this R code, which creates the stats tables.

Help us!

You made it this far down the post! Maybe you can help us out a little more. We need help logging this data. This data only happens because of fans like you who have put hours of their free time into logging data onto Excel spreadsheets. But we need more people helping out. If you’re interested, read more here about how to help, and then send me a DM on Twitter at @WoSoStats or email me at wosostats.team@gmail.com to get started. All it takes is a couple of hours of your free time, a willingness to learn, and knowing a thing or two about Excel.

How to explore USWNT passing stats with heat maps

Over the past several months, in addition to tracking actions such as passes and interceptions, we have also been adding location data to as many USWNT and NWSL 2016 matches as we can. The process for how that works is explained here, but here’s what it ends up looking like on the match’s actions spreadsheet (note the “poss.location” and “def.location” columns) for the USA-Germany SheBelieves Cup match:


This series of events can be seen at https://streamable.com/xskp

The values in the “poss.location” and “def.location” columns (as well as the “poss.play.destination” column, which is blank here) represent the location of the player from the “possessing” team, based on splitting up the field into different zones as shown here. In the series of events shown above, play is shown moving from Babett Peter in the defensive middle third’s right wing, back to Almuth Schult in the defensive third’s center, and then all the way to the attacking right third where Anja Mittag attempts a side pass that is recovered by Morgan Brian in her own 18-yard box. Also logged is the location of defenders doing certain defensive actions, such as applying pressure onto a pass (as Alex Morgan did) or engaging in an aerial duel with the possessing team (as Crystal Dunn did).
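To give a sense of how rows like these turn into per-zone numbers, here is a small Python sketch that tallies one player’s passes by zone. Only the “poss.location” column name comes from the spreadsheet; the other keys, the player names, and the rows themselves are placeholders, and this isn’t the R code the project actually uses.

```python
from collections import Counter

# Hypothetical rows: only the "poss.location" column name comes from the
# spreadsheet; the other keys and all values are placeholders.
rows = [
    {"player": "Player A", "action": "pass", "poss.location": "DC"},
    {"player": "Player A", "action": "pass", "poss.location": "DML"},
    {"player": "Player A", "action": "pass", "poss.location": "DML"},
    {"player": "Player B", "action": "pass", "poss.location": "DC"},
    {"player": "Player A", "action": "take-on", "poss.location": "DML"},
    {"player": "Player A", "action": "pass", "poss.location": ""},  # not yet logged
]

def count_by_zone(rows, player, action):
    """Count one player's actions of a given type in each zone."""
    return Counter(
        row["poss.location"]
        for row in rows
        if row["player"] == player
        and row["action"] == action
        and row["poss.location"]  # skip rows without location data
    )

pass_zones = count_by_zone(rows, "Player A", "pass")
```

Each zone’s count is what would end up as one cell of that player’s heat map.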

As you can imagine, analyzing something like this, especially over the course of an entire match, is best done in a two-dimensional format. There are only so many different stats tables you can make before you eventually need to put this on a heat map, like this!


I created heat maps like the one above for these eight 2016 USWNT matches for which we currently have location data:

  • USA-Ireland (1/23/16 – International Friendly)
  • USA-Costa Rica (2/10/16 – 2016 Olympic CONCACAF Qualifiers)
  • USA-Mexico (2/13/16 – 2016 Olympic CONCACAF Qualifiers)
  • USA-Canada (2/21/16 – 2016 Olympic CONCACAF Qualifiers)
  • USA-England (3/3/16 – 2016 SheBelieves Cup)
  • USA-France (3/6/16 – 2016 SheBelieves Cup)
  • USA-Germany (3/9/16 – 2016 SheBelieves Cup)
  • USA-Colombia (4/6/16 – International Friendly)

The heat maps were created with Excel and can be downloaded here (click on “View Raw” to download).

There’s a heat map for each match in the second sheet of the Excel workbook. Currently, the heat maps only depict completed passes that were made from within each zone. To change the player the heat map is depicting, just change the name of the player in the cell below where it says “Enter name here”.


Next to each heat map is a big table of stats and player info, which is where the heat map is getting its data. Don’t change any of this! Unless you really, really know what you’re doing. Make sure the player name you type in for a heat map matches the name of the player in that heat map’s adjacent stats table.

Worst comes to worst and you mess something up, just re-download the Excel workbook from the GitHub repo.

This is still a work in progress that I figured out over the course of a night. Way more than just passes can be put on this heat map, and we also have location data for more than just USWNT matches (we also have NWSL 2016 matches!). For now, though, this works.

If you run into any issues, send me a tweet at @WoSoStats or email me at wosostats.team@gmail.com.

We need volunteers!

If you made it this far, maybe you’re willing to help us log even more data! We are always in need of more volunteers to help us log match actions and location data for women’s soccer matches. Without the help of fans volunteering their time for this project, none of this data is possible. No experience is necessary, just a willingness to learn. Read more about how to help here: https://wosostats.wordpress.com/how-to-help/

Exploring passing stats – USWNT vs. GER (SheBelieves Cup 2016)

As part of our project to track stats for women’s soccer matches (please join and help us get more data!), we’ve been working on adding location data to virtually every action we track. Until now, if you’ve been following some of the stuff I’ve posted on Twitter or the WoSo Stats Shiny app, it’s largely been summary data devoid of location data. That is to say, it adds up aggregates of certain stats (such as total passes attempted by a player or team) or in some cases calculates additional stats based on those basic stats (such as a player’s passing completion percentage), none of which take into account where a player was on the field.

This time, I’m going to look at location-based data. In this post, to make things simple, I’m going to focus on one match, the USA-Germany SheBelieves 2016 match. To make things even simpler, I’m also just going to look at passing and possession. This is an early dive into the location data we’re getting from this project, and how it can complement what we already know about a match based on its summary stats and, well, actually watching the game.

Passing Stats

One of the most interesting things I found while exploring the stats this project is generating was the impact of pressure on a player’s passing completion percentage. I expected, based on intuition, to see a player’s passing completion percentage go down with pressure, but what I saw was that, on average, it barely had an impact.

Impact of Pressure on opPassing


What you’re looking at is the impact that pressure had on a player’s open play passing completion percentage. Open play passes are all passes that aren’t throw ins, free kicks, corner kicks, goal kicks, or goalkeeper throws or dropkicks. I excluded those because those, by definition, can never be “under pressure” by a defender. In the chart above, the further to the right the bar is, the better the player’s open play passing completion percentage got under pressure. To account for differences in open play passing attempts, the darker the green, the more open play passes that player attempted under pressure.
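For anyone curious about the arithmetic behind each bar, here is a minimal Python sketch. It assumes the comparison is between passes under pressure and passes not under pressure, which is my reading of the chart rather than something spelled out above, and the function names and numbers are hypothetical.

```python
def completion_pct(completed, attempted):
    """Completion percentage, or None when there were no attempts."""
    if attempted == 0:
        return None
    return 100 * completed / attempted

def pressure_difference(comp_pressured, att_pressured, comp_open, att_open):
    """Percentage-point change in completion rate when under pressure."""
    pressured = completion_pct(comp_pressured, att_pressured)
    unpressured = completion_pct(comp_open, att_open)
    if pressured is None or unpressured is None:
        return None  # no bar to draw for this player
    return pressured - unpressured

# Hypothetical player: 8 of 10 under pressure, 30 of 40 without pressure.
diff = pressure_difference(8, 10, 30, 40)
```

A positive result puts the bar to the right of zero, and the number of pressured attempts (10 in this made-up case) would decide how dark the bar is.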

For me, this was a bit of a head-scratcher at first, as I noticed similar numbers across different matches. The median difference is +15%, so it looks like more players’ passing completion percentage actually got better under pressure. I initially chalked this up to, well, these are the two best teams in the world and great players should continue to make good passes under pressure.

However, upon further thought, this does make some sense, which merits further analysis later on. A player under pressure is probably going to be more likely to revert to a “safer” pass, such as a backwards pass, or be forced into a riskier play, such as a take on, due to not having enough space or time to get a pass off. Inversely, a player who isn’t under pressure, with more time and space with the ball, might be more likely to attempt a riskier pass, such as a launched ball, or skip the pass altogether and opt for a shot.

It seems pressure might be a better predictor of a player’s passing completion percentage once we are able to break down those decisions a little better, but I’ll save that for another day. What I do want to get at is what happens to these passing stats when we break them down by location.

Adding Location Data

For each pass attempt, we tracked its origin (i.e. where the player was passing from) according to which one of the following “zones” on the field she was in.


For this analysis, I grouped together passes in the defensive middle third and attacking middle third as passes that generally happened in the middle third. Now, what happens to a player’s open play passing completion percentage when she’s passing from within that all-important attacking third?


It drops for pretty much everyone in the match who attempted an open play pass in the attacking third. Again, darker colors indicate more attacking third passing attempts, and the further to the right the bar is the better that player’s passing completion percentage got in the attacking third, compared to her passes in the defensive and middle third.

There are some outliers here. Lloyd, Horan, and Pugh had some very stark differences in completion percentage, but that is also because they barely attempted any passes from within the attacking third. In general, though, it appears that most players in this match had their passing completion percentage negatively affected.

Something interesting worth pointing out is that most of the players in the top half of the chart were German. This stands out even more when we take these two different passing completion percentages (in the attacking 3rd vs. everywhere else) and put them on a dot plot, with a color for each team, as shown below.

opPassing by Location - Dot Plot

The further to the right, the higher the player’s open play passing completion percentage in the defensive and middle third. The higher up, the higher the player’s open play passing completion percentage in the attacking third. The size of the dot indicates the number of open play pass attempts in the attacking third, so players who attempted more passes in that part of the field stand out more.

Almost every German player was above the median for open play passing completion percentage in the attacking third. Notably, Marozsan was the only player in the 75th percentile (better than 75% of all players in the match) for both categories. Meanwhile, it looks like Brian’s passing in this match was negatively affected the most when attempting a pass from within the attacking third.
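Here is a rough Python sketch of how you could check which players land in the top quartile on both axes of a dot plot like this. The percentages below are invented placeholders, not the real numbers from the match, and the cut points use the default method of Python’s `statistics.quantiles`, which may not match how Tableau or Excel computes percentiles.

```python
from statistics import quantiles

def top_quartile_in_both(players):
    """Names at or above the 75th percentile on both measures.

    players maps a name to (pct_defensive_and_middle, pct_attacking_third).
    """
    q3_elsewhere = quantiles([x for x, _ in players.values()], n=4)[2]
    q3_attacking = quantiles([y for _, y in players.values()], n=4)[2]
    return sorted(
        name
        for name, (x, y) in players.items()
        if x >= q3_elsewhere and y >= q3_attacking
    )

# Invented completion percentages for seven placeholder players.
players = {
    "Player A": (90, 80),
    "Player B": (85, 40),
    "Player C": (80, 70),
    "Player D": (75, 60),
    "Player E": (70, 50),
    "Player F": (65, 75),
    "Player G": (60, 30),
}

standouts = top_quartile_in_both(players)
```

In this made-up set only Player A clears both cut points, much like the one dot that sits in the top right corner of the plot.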

Unfortunately for Germany, despite having better passing completion percentages in the attacking third and applying what appears to have been great pressure on the U.S. defense, they still lost due to an incredible take-on by Alex Morgan in the penalty box that led to an equalizer and an equally incredible error from Almuth Schult, the German goalkeeper, that gave Sam Mewis the game-winner.

Better passing in the attacking third, then, wasn’t enough to get Germany the win, which is really all that ultimately matters in soccer. It’ll be interesting to see, though, as we get more data for more matches, if that’s out of the ordinary. All that pressure on the U.S. defense did get the Germans a goal and credit as the only team in 2016 to date to score a goal on the United States. It may not be a guarantee of victory, but I suspect it points most teams in the right direction.

Either way, the way the U.S. goals came about is a nice segue into an analysis of take-ons (and what a player does afterwards) and changes in possessions (and where they happen), which I hope to do in the coming week with the USA-Colombia matches.

You can view the stats and visualizations used in this blog post on Tableau and the WoSo Stats Shiny app. All the source data is freely available on the GitHub repository.


Okay, if you’ve scrolled this far down then hopefully you’ll be interested enough to help us contribute to our small but growing database of women’s soccer stats. As almost everyone who’s tried to search for something as simple as passing stats for their favorite player knows, there’s a dearth of even the most basic stats for women’s soccer and really women’s sports in general.

Please help us change that, one match at a time! We need people who are willing to volunteer some time and effort (any and all would be appreciated) into logging data for women’s soccer matches. To see which matches immediately need help, check out this month’s goals. To learn how to help and get started, read here. The hope is, for starters, to track every NWSL 2016 match but we still need more people!