USWNT passing – comparing positions and opponents’ FIFA rankings

Over the past couple of days I’ve been trying to figure out how to create a Tableau workbook that aggregates all our USWNT data in a similar fashion to the NWSL 2016 Tableau workbook. The main challenge has been figuring out how to best show and compare stats from USWNT that, quite frankly, are all over the place due to how varied the quality of opponents has been.

Thankfully, we’re able to use all the USWNT stats tables we’ve got in the GitHub repo and use the database.csv file, with data for all the matches in the WoSo Stats GitHub repo, to create something that can show something like passing stats adjusted for the opponent’s quality.

The visualizations for the USWNT data, for now, are the two worksheets in this Tableau workbook. Below, I’ll explain what each one is, along with some more detail on how the data was calculated and aggregated, to make it easier for you to make similar visualizations.

I won’t delve too much into an actual analysis of the data in the two charts. There’s too much there to go into right now – and why have all the fun when you can do that, too? Anyways, on to the charts.

Visualizing USWNT Open Play Passing Stats

First, this visualization of USWNT passing stats for the USWNT matches that we have in our database. Each mark on the chart below represents a USWNT player from a match in our database. The x-axis is her total number of open play passes attempted during that match, the y-axis is her open play passing completion percentage. The color is her designated “position” (more on this later) and the shape of the mark is whether or not the opponent, at the time, had a FIFA ranking in the top 15.

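[Chart: USWNT open play passes attempted (x-axis) vs. open play passing completion percentage (y-axis), colored by position, with mark shape showing whether the opponent had a top-15 FIFA ranking]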

Midfielders and defenders generally pass the ball more, which is to be expected. Forwards, who are often surrounded by defenders, and goalkeepers, who may often launch the ball forward, see less of the ball and have lower passing completion percentages. It’s pretty clear that differences in passes attempted and in passing completion percentage have to do with the nature of a player’s position. We need to better adjust for position.

Adjusting For A Player’s Position

This visualization shows passing stats adjusted for a USWNT player’s position by expressing each stat as the number of standard deviations it sits above or below the average for USWNT players in her position.

[Chart: USWNT open play passing stats, adjusted for position]

Now it’s easier to spot which players, given their “designated” position, attempted to pass the ball more than average and completed their passes at a higher percentage than average. On the other hand, it’s also easier to spot which players passed the ball less than average and completed their passes at a lower percentage than average.
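For anyone who wants to reproduce the position adjustment outside of Tableau, here is a rough sketch of the idea in Python. The rows, player names, and the choice of population standard deviation are all assumptions made for illustration; the workbook itself may calculate this differently.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical per-match rows: (player, position, passes attempted, passes completed)
rows = [
    ("Player A", "midfielder", 60, 51),
    ("Player B", "midfielder", 40, 30),
    ("Player C", "defender", 50, 45),
    ("Player D", "defender", 30, 21),
]

def position_z_scores(rows):
    """Return {player: (attempts z-score, completion z-score)} within each position."""
    by_position = defaultdict(list)
    for player, position, attempted, completed in rows:
        if attempted == 0:
            continue  # no completion percentage without at least one attempt
        by_position[position].append((player, attempted, completed / attempted))

    scores = {}
    for group in by_position.values():
        attempts = [a for _, a, _ in group]
        pcts = [p for _, _, p in group]
        att_mean, att_sd = mean(attempts), pstdev(attempts)
        pct_mean, pct_sd = mean(pcts), pstdev(pcts)
        for player, attempted, pct in group:
            # A spread of zero means everyone in the group is identical; report 0.
            att_z = (attempted - att_mean) / att_sd if att_sd else 0.0
            pct_z = (pct - pct_mean) / pct_sd if pct_sd else 0.0
            scores[player] = (att_z, pct_z)
    return scores

scores = position_z_scores(rows)
```

A positive score on both axes puts a player in the upper-right of the adjusted chart, and a negative score on both puts her in the bottom-left.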

To account for some outliers, in the chart below I used the filters to exclude performances from any USWNT players who played fewer than 30 minutes or who had fewer than 10 open play pass attempts.
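The same two cutoffs are easy to apply in code. This is only a sketch: the function name and the sample performances are made up, and only the thresholds (30 minutes, 10 open play pass attempts) come from the filters described above.

```python
MIN_MINUTES = 30
MIN_PASS_ATTEMPTS = 10

def keep_performance(minutes_played, op_pass_attempts):
    """True when a single-match performance clears both cutoffs."""
    return minutes_played >= MIN_MINUTES and op_pass_attempts >= MIN_PASS_ATTEMPTS

# Hypothetical performances: (player, minutes played, open play pass attempts)
performances = [
    ("Player A", 90, 45),
    ("Player B", 25, 12),  # dropped: played fewer than 30 minutes
    ("Player C", 60, 8),   # dropped: fewer than 10 open play pass attempts
]
kept = [name for name, minutes, attempts in performances
        if keep_performance(minutes, attempts)]
```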

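[Chart: position-adjusted USWNT open play passing stats, limited to performances of at least 30 minutes and at least 10 open play pass attempts]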

A few things stand out. One, it’s easier to rack up more passing attempts with a high passing completion percentage against lesser opponents, as indicated by how many more cross-shaped marks than circle-shaped marks are in the upper-right. Two, playing top opposition can drastically cut down on both, with several circle-shaped marks spread throughout the bottom-left corner.

Players’ “Designated” Positions and Next Steps

About the positions: players are only given one for all their matches, instead of one for each match. This means that a player like Allie Long, who in this chart is classified as a “midfielder,” is being misrepresented for games where she has played as a defender.

And even within positions, some further refinement could be used. Fullbacks like Kelley O’Hara and Ali Krieger, who are correctly classified as “defenders,” have a propensity towards lower passing completion percentages because, as fullbacks, they often play higher up the pitch where a completed pass is less likely. But because they’re compared against all defenders, including centerbacks, who are also correctly called “defenders” but have some of the highest completion percentages in the game, their passing completion percentage looks worse relative to the defender average than it really is.

A next step is to resolve that Allie Long problem by figuring out a player’s position on a match-by-match basis, and then to further break down some positions, like defenders, into fullbacks and centerbacks.

Another idea is to show passing stats broken down by thirds of the field. I suspect the difference in passing stats vs. Top 15 opponents and non-Top 15 opponents would be even more stark when we look at the attacking third.

You can help!

This data only happens because of help from fans like you (yes, you)! The WoSo Stats project needs help to log more stats and location data for USWNT matches and past NWSL seasons. With your help, we can get even richer data to expand on what we know about the sport.

If you’re interested in logging data for matches (that are all publicly available on YouTube), read more here and email me at wosostats.team@gmail.com or send me a DM at @WoSoStats on Twitter. All the data logged will be publicly available on the WoSo Stats GitHub repo and will help me and others do more analyses like these!


Exploring passing stats – USWNT vs. GER (SheBelieves Cup 2016)

As part of our project to track stats for women’s soccer matches (please join and help us get more data!), we’ve been working on adding location data to virtually every action we track. Until now, if you’ve been following some of the stuff I’ve posted on Twitter or the WoSo Stats Shiny app, it’s largely been summary data devoid of location data. That is to say, it adds up aggregates of certain stats (such as total passes attempted by a player or team) or in some cases calculates additional stats based on those basic stats (such as a player’s passing completion percentage), none of which take into account where a player was on the field.

This time, I’m going to look at location-based data. In this post, to make things simple, I’m going to focus on one match, the USA-Germany SheBelieves 2016 match. To make things even simpler, I’m also just going to look at passing and possession. This is an early dive into the location data we’re getting from this project, and how it can complement what we already know about a match based on its summary stats and, well, actually watching the game.

Passing Stats

One of the most interesting things I found while exploring the stats this project is generating was the impact of pressure on a player’s passing completion percentage. I expected, based on intuition, to see a player’s passing completion percentage go down with pressure, but what I saw was that, on average, it barely had an impact.

Impact of Pressure on opPassing

 

What you’re looking at is the impact that pressure had on a player’s open play passing completion percentage. Open play passes are all passes that aren’t throw ins, free kicks, corner kicks, goal kicks, or goalkeeper throws or dropkicks. I excluded those because those, by definition, can never be “under pressure” by a defender. In the chart above, the further to the right the bar is, the better the player’s open play passing completion percentage got under pressure. To account for differences in open play passing attempts, the darker the green, the more open play passes that player attempted under pressure.
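As a rough illustration of how each bar could be computed, here is a small Python sketch. It assumes the comparison is between a player’s open play passes under pressure and her open play passes not under pressure, which is my reading of the chart rather than a documented formula, and the sample passes are invented.

```python
def completion_pct(outcomes):
    """Completion percentage for a list of booleans, or None if it is empty."""
    return 100 * sum(outcomes) / len(outcomes) if outcomes else None

def pressure_impact(passes):
    """passes: list of (under_pressure, completed) pairs for one player's
    open play passes.

    Returns completion % under pressure minus completion % without pressure,
    in percentage points, or None when either group has no attempts.
    """
    pressured = [done for under_pressure, done in passes if under_pressure]
    unpressured = [done for under_pressure, done in passes if not under_pressure]
    with_pressure = completion_pct(pressured)
    without_pressure = completion_pct(unpressured)
    if with_pressure is None or without_pressure is None:
        return None
    return with_pressure - without_pressure

# Hypothetical player: 3 of 4 completed under pressure, 2 of 4 without.
example = [
    (True, True), (True, True), (True, True), (True, False),
    (False, True), (False, True), (False, False), (False, False),
]
impact = pressure_impact(example)
```

A positive result corresponds to a bar further to the right, and the number of pressured attempts is what would drive the shade of green.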

For me, this was a bit of a head-scratcher at first, as I noticed similar numbers across different matches. The median difference is +15%, so it looks like more players’ passing completion percentage actually got better under pressure. I initially chalked this up to, well, these are the two best teams in the world and great players should continue to make good passes under pressure.

However, upon further thought, this does make some sense, and it merits further analysis later on. A player under pressure is probably more likely to revert to a “safer” pass, such as a backwards pass, or be forced into a riskier play, such as a take-on, due to not having enough space or time to get a pass off. Conversely, a player who isn’t under pressure, with more time and space on the ball, might be more likely to attempt a riskier pass, such as a launched ball, or skip the pass altogether and opt for a shot.

It seems pressure might be a better predictor of a player’s passing completion percentage once we are able to break down those decisions a little better, but I’ll save that for another day. What I do want to get at is what happens to these passing stats when we break them down by location.

Adding Location Data

For each pass attempt, we tracked its origin (i.e. where the player was passing from) according to which one of the following “zones” on the field she was in.

[Image: diagram of the field zones, http://i.imgur.com/EQLmpYp.png]

For this analysis, I grouped together passes in the defensive middle third and attacking middle third as passes that generally happened in the middle third. Now, what happens to a player’s open play passing completion percentage when she’s passing from within that all-important attacking third?

impact-of-pressure-on-oppassing-by-location

It drops for pretty much everyone in the match who attempted an open play pass in the attacking third. Again, darker colors indicate more attacking third passing attempts, and the further to the right the bar is, the better that player’s passing completion percentage got in the attacking third, compared to her passes in the defensive and middle thirds.
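The grouping and comparison described above can be sketched in a few lines of Python. The zone labels here are placeholders I made up at the level of thirds; the project’s actual zone names are in the diagram and may be more granular.

```python
# Hypothetical zone labels; the project's real zone names may differ.
THIRD_BY_ZONE = {
    "defensive third": "defensive",
    "defensive middle third": "middle",
    "attacking middle third": "middle",
    "attacking third": "attacking",
}

def pct(outcomes):
    """Completion percentage for a list of booleans, or None if it is empty."""
    return 100 * sum(outcomes) / len(outcomes) if outcomes else None

def attacking_third_change(passes):
    """passes: list of (zone, completed) pairs for one player's open play passes.

    Returns attacking-third completion % minus completion % everywhere else,
    in percentage points, or None when either side has no attempts.
    """
    attacking = [done for zone, done in passes
                 if THIRD_BY_ZONE[zone] == "attacking"]
    elsewhere = [done for zone, done in passes
                 if THIRD_BY_ZONE[zone] != "attacking"]
    att_pct, else_pct = pct(attacking), pct(elsewhere)
    if att_pct is None or else_pct is None:
        return None
    return att_pct - else_pct

# Hypothetical player: 1 of 2 in the attacking third, 3 of 4 elsewhere.
example = [
    ("defensive third", True), ("defensive middle third", True),
    ("attacking middle third", True), ("attacking middle third", False),
    ("attacking third", True), ("attacking third", False),
]
change = attacking_third_change(example)
```

A negative value means the player completed a lower share of her passes from the attacking third than from the rest of the field, which is the pattern most players in this match showed.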

There are some outliers here. Lloyd, Horan, and Pugh had some very stark differences in completion percentage, but that is partly because they barely attempted any passes from within the attacking third. In general, though, it appears that most players in this match had their passing completion percentage negatively affected.

Something interesting worth pointing out is that most of the players in the top half of the chart were German. This stands out even more when we take these two different passing completion percentages (in the attacking 3rd vs. everywhere else) and put them on a dot plot, with a color for each team, as shown below.

opPassing by Location - Dot Plot.png

The further to the right, the higher the player’s open play passing completion percentage in the defensive and middle third. The higher up, the higher the player’s open play passing completion percentage in the attacking third. The size of the dot indicates the number of open play pass attempts in the attacking third, so players who attempted more passes in that part of the field stand out more.

Almost every German player was above the median for open play passing completion percentage in the attacking third. Notably, Marozsan was the only player in the 75th percentile (better than 75% of all players in the match) for both categories. Meanwhile, it looks like Brian’s passing in this match was negatively affected the most when attempting a pass from within the attacking third.
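To make the median and 75th percentile comparisons concrete, here is a hedged Python sketch using the standard library. The player names and percentages are invented, and treating “in the 75th percentile” as “at or above the third quartile, using inclusive interpolation” is an assumption; Tableau may compute percentiles a little differently.

```python
from statistics import median, quantiles

# Hypothetical players: name -> (completion % in defensive and middle thirds,
#                                completion % in attacking third)
players = {
    "P1": (90, 80),
    "P2": (80, 60),
    "P3": (70, 70),
    "P4": (60, 50),
    "P5": (50, 40),
}

def top_quartile_in_both(players):
    """Names at or above the 75th percentile in both categories."""
    rest = [r for r, _ in players.values()]
    attacking = [a for _, a in players.values()]
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3.
    rest_q3 = quantiles(rest, n=4, method="inclusive")[2]
    att_q3 = quantiles(attacking, n=4, method="inclusive")[2]
    return sorted(name for name, (r, a) in players.items()
                  if r >= rest_q3 and a >= att_q3)

def above_attacking_median(players):
    """Names strictly above the median attacking-third completion %."""
    att_median = median(a for _, a in players.values())
    return sorted(name for name, (_, a) in players.items() if a > att_median)

both = top_quartile_in_both(players)
above = above_attacking_median(players)
```

With real match data, the first function would pick out the players in the upper-right corner of the dot plot, and the second would list everyone sitting above the horizontal median line.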

Unfortunately for Germany, despite having better passing completion percentages in the attacking third and applying what appears to have been great pressure on the U.S. defense, they still lost due to an incredible take-on by Alex Morgan in the penalty box that led to an equalizer and an equally incredible error from Almuth Schult, the German goalkeeper, that gave Sam Mewis the game-winner.

Better passing in the attacking third, then, wasn’t enough to get Germany the win, which is really all that ultimately matters in soccer. It’ll be interesting to see, though, as we get more data for more matches, if that’s out of the ordinary. All that pressure on the U.S. defense did get the Germans a goal and credit as the only team in 2016 to date to score a goal on the United States. It may not be a guarantee of victory, but I suspect it points most teams in the right direction.

Either way, the way the U.S. goals came about is a nice segue into an analysis of take-ons (and what a player does afterwards) and changes in possessions (and where they happen), which I hope to do in the coming week with the USA-Colombia matches.

You can view the stats and visualizations used in this blog post on Tableau and the WoSo Stats Shiny app. All the source data is freely available on the GitHub repository.

Help!

Okay, if you’ve scrolled this far down then hopefully you’ll be interested enough to help us contribute to our small but growing database of women’s soccer stats. As almost everyone who’s tried to search for something as simple as passing stats for their favorite player knows, there’s a dearth of even the most basic stats for women’s soccer and really women’s sports in general.

Please help us change that, one match at a time! We need people who are willing to volunteer some time and effort (any and all would be appreciated) to log data for women’s soccer matches. To see which matches immediately need help, check out this month’s goals. To learn how to help and get started, read here. The hope is, for starters, to track every NWSL 2016 match, but we still need more people!