I wanted to work on my Python skills, so I put together a box plot + swarm plot of player ages by club for the PL season so far.
I’ve got a lot to do on this still if I want to clean it up, but it was a fun first go at this. Here’s what I did:
- Box plot by team color (in some cases, secondary or tertiary color, or in the case of Spurs, a color they’ve used for training kits)
- Swarm plot by base position from fbref.com, so GK/DF/MF/FW
- Sorted by current table rank
I styled the background, despined the axes, and used custom palettes for both the positions in the swarm plot and the boxes in the box plot, so that was fun. I also set the circle color and edge color on the swarm points.
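For anyone who wants to try something similar, here is a minimal sketch of that kind of layered plot. It assumes seaborn and matplotlib, and everything in it is made up for illustration: the placeholder clubs, their table ranks, the ages, the hex colors, and the column names (`club`, `age`, `pos`). It is not my actual data or script.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view the plot interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data: three placeholder clubs with 20 players each (not real squads).
rng = np.random.default_rng(0)
table_rank = {"Club B": 2, "Club A": 1, "Club C": 3}  # hypothetical table positions
positions = ["GK", "DF", "MF", "FW"]
players = pd.DataFrame({
    "club": np.repeat(list(table_rank), 20),
    "age": rng.normal(26, 3.5, size=60).clip(17, 38).round(1),
    "pos": rng.choice(positions, size=60),
})

# Sort clubs by current table rank (1 = top of the table).
club_order = sorted(table_rank, key=table_rank.get)

# Custom palettes (arbitrary colors): one for the boxes, one for the swarm.
club_colors = {"Club A": "#c8102e", "Club B": "#6cabdd", "Club C": "#132257"}
pos_colors = {"GK": "#f4d35e", "DF": "#5c946e", "MF": "#ee964b", "FW": "#e4572e"}

background = "#f2efe9"
fig, ax = plt.subplots(figsize=(10, 6))
fig.patch.set_facecolor(background)
ax.set_facecolor(background)

# Box per club, colored with the club palette.
# Note: newer seaborn versions may warn that passing palette without hue
# is deprecated; the plot is still drawn.
sns.boxplot(data=players, x="club", y="age", order=club_order,
            palette=club_colors, showfliers=False, width=0.6, ax=ax)

# Swarm on top, one dot per player, fill color by position plus a dark edge.
sns.swarmplot(data=players, x="club", y="age", hue="pos", order=club_order,
              hue_order=positions, palette=pos_colors, size=5,
              edgecolor="black", linewidth=0.6, ax=ax)

sns.despine(ax=ax)  # remove the top and right spines
ax.set_xlabel("Club (sorted by table rank)")
ax.set_ylabel("Age")
ax.legend(title="Position", frameon=False)
```

Drawing the box plot first and the swarm second keeps the dots on top of the boxes. Adding something like `ax.axhline(26, linestyle="--")` would give a reference line across the boxes at a chosen age.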
My hope was to see whether, going down the table from top to bottom, the boxes show any sort of visual trend. It looks like there is one from Manchester United down to Leeds United, right around age 26: if you drew a straight horizontal line across at that age, you’d hit the ‘meat’ of each box.
I normally use Tableau for building visuals that I want to just stare at, and see what I see (I should trademark that), but I keep reminding myself to Push Myself In Python (another tm?), so I built this there instead.
I want to somehow look at minutes too, but I’m not exactly sure how to incorporate that. Maybe a bikini chart, or butterfly chart, such as seen below:
If I put sub minutes on one side and starter minutes on the other, by age, and possibly even stacked by position, then built that out for each club, I think it would be an interesting visual. I don’t know how to do that yet (in Python), but trying is the best way to learn.
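As a rough starting point, a butterfly chart for one club could be sketched with plain matplotlib horizontal bars: sub minutes plotted as negative values to the left of zero, starter minutes to the right. The numbers and the column names (`age`, `start_min`, `sub_min`) below are invented for illustration, and the stacking by position is left out.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view the plot interactively
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter

# Made-up minutes for one hypothetical club, one row per player.
minutes = pd.DataFrame({
    "age":       [21, 21, 24, 26, 26, 29, 31],
    "start_min": [900, 0, 1200, 1500, 300, 1100, 400],
    "sub_min":   [60, 210, 45, 0, 150, 30, 120],
})

# Total minutes per age, one row per age.
by_age = minutes.groupby("age")[["start_min", "sub_min"]].sum().sort_index()

fig, ax = plt.subplots(figsize=(8, 5))

# Sub minutes go left (negated), starter minutes go right.
ax.barh(by_age.index, -by_age["sub_min"], color="#ee964b", label="Sub minutes")
ax.barh(by_age.index, by_age["start_min"], color="#5c946e", label="Starter minutes")

ax.axvline(0, color="black", linewidth=0.8)  # center line of the butterfly
# Show absolute values on the x axis so the left side does not read as negative.
ax.xaxis.set_major_formatter(FuncFormatter(lambda v, _: f"{abs(v):.0f}"))
ax.set_xlabel("Minutes")
ax.set_ylabel("Age")
ax.legend(frameon=False)
```

Repeating this per club (for example, one subplot each) would give the club-by-club view. Since sub minutes tend to be much smaller than starter minutes, the two sides may need separate scales to stay readable.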
I find it fascinating how clubs are distributed in terms of age and minutes, and where the sweet spot is. I’ve been thinking about this for a while now, and I’m hoping to keep hunting down answers on the best methodology for it. Stay tuned!