The goal is to find a better way of ranking riders for Zwift races without needing a complete race-result-based ranking system. These two articles explore a way of doing this using existing power data. The outcome will be a score for each rider between 0 and 1000, with each 100 points representing approximately 10% of the cycling population, calculated separately for each sex. That way, race organisers can set categories to suit their target riders.
In the first article, data from a recent race series showed that each of the eight power measures on Zwift Power influences race outcomes, and that looking at any one measure on its own is imperfect.
As a reminder, in that article eight measures of power were compared against race outcomes. The measures were: 15-second power, 1-minute power, 5-minute power, and 20-minute power, each in both watts and w/kg. The final conclusion was that the influence of each of the eight measures on race outcomes was something like this:

To turn this information into a rider ranking, it is reasonable to assume that high power in one measure goes with high power in another. The analysis so far has not removed this overlap. For simplicity, we assume that about half of the “shared percentage” in the bars above reflects this correlation. Removing that “shared contribution” (and rescaling the percentages so that they sum to 100%) accentuates the differences between the measures a little, without ignoring any of them, and gives this:
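The adjustment can be sketched in a few lines of Python, assuming each bar splits into a "unique" part and a "shared" part. The function name `adjust_influences`, the `removed_fraction` parameter and the example bar values are all hypothetical, chosen only to illustrate the arithmetic; the real bar values are in the chart.

```python
def adjust_influences(bars, removed_fraction=0.5):
    """Remove part of each measure's shared contribution, then rescale to 100%.

    bars maps a measure name to a (unique_pct, shared_pct) pair. Splitting
    each bar this way is an assumption made for this sketch.
    """
    # Keep all of the unique part and only the non-removed share of the rest.
    kept = {
        name: unique + (1 - removed_fraction) * shared
        for name, (unique, shared) in bars.items()
    }
    total = sum(kept.values())
    # Rescale so the adjusted influences sum to 100%.
    return {name: 100 * value / total for name, value in kept.items()}


# Hypothetical bar values for three measures, for illustration only.
example_bars = {
    "5 min w/kg": (20.0, 30.0),
    "1 min w/kg": (10.0, 20.0),
    "20 min watts": (5.0, 15.0),
}
adjusted = adjust_influences(example_bars)
for name, pct in adjusted.items():
    print(f"{name}: {pct:.1f}%")
```

Before the adjustment these made-up bars give 50%, 30% and 20%. Afterwards they give roughly 51.9%, 29.6% and 18.5%: the largest influence grows and the smallest shrinks, but none drops out.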

Turning Power Into a Rank
For any rider, any of the power measures can be turned into a rank by finding where that power sits within the power data for all riders. For example, a 5-minute power of 4 w/kg puts you 46.5% of the way up the list of male cyclists. If each measure is converted into a position this way, each rider ends up with eight “position” values between 0 and 100, one per power measure.
To get the final ranking, multiply each of those eight positions by its “percentage influence” from above, add the results, and multiply the total by 10. For example:
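A minimal Python sketch of the power-to-position step follows. It assumes you have a list of reference values for one measure from a population of riders. The article's own positions come from intervals.icu, so the function `position` and the reference list here are hypothetical.

```python
from bisect import bisect_left


def position(value, reference_values):
    """Return the percentage (0-100) of reference riders with a lower value."""
    ordered = sorted(reference_values)
    below = bisect_left(ordered, value)  # number of values strictly below
    return 100 * below / len(ordered)


# Hypothetical 5-minute w/kg values for a small reference population.
reference_5min_wkg = [2.1, 2.8, 3.2, 3.5, 3.9, 4.1, 4.4, 4.8, 5.2, 5.9]
print(position(4.0, reference_5min_wkg))  # prints 50.0
```

Counting only values strictly below is one choice among several. With a large reference population, other tie-handling rules give almost the same answer.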
| Measure | Value | Position (0-100) | Influence | Position × Influence |
|---|---|---|---|---|
| 15s watts | 621 | 41.0 | 11.1% | 4.555 |
| 1 min watts | 439 | 57.0 | 12.3% | 7.010 |
| 5 min watts | 324 | 64.1 | 12.2% | 7.850 |
| 20 min watts | 270 | 61.7 | 7.3% | 4.504 |
| 15s w/kg | 7.67 | 32.2 | 13.1% | 4.222 |
| 1 min w/kg | 5.42 | 43.5 | 14.8% | 6.435 |
| 5 min w/kg | 4.00 | 46.5 | 17.4% | 8.091 |
| 20 min w/kg | 3.26 | 41.0 | 11.7% | 4.813 |
| Total × 10 | | | | 475 |
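The weighted sum in the table can be checked with a short Python sketch. The positions and influences are copied from the table; the function name `rider_score` is hypothetical.

```python
# Positions (0-100) and influences (%) copied from the worked example table.
example = {
    "15s watts": (41.0, 11.1),
    "1 min watts": (57.0, 12.3),
    "5 min watts": (64.1, 12.2),
    "20 min watts": (61.7, 7.3),
    "15s w/kg": (32.2, 13.1),
    "1 min w/kg": (43.5, 14.8),
    "5 min w/kg": (46.5, 17.4),
    "20 min w/kg": (41.0, 11.7),
}


def rider_score(positions_and_influences):
    """Sum position x influence (as a fraction) over all measures, times 10."""
    total = sum(pos * pct / 100 for pos, pct in positions_and_influences.values())
    return 10 * total


score = rider_score(example)
print(round(score))  # prints 474
```

With the rounded influence values shown in the table, the total is about 474 rather than 475. The small gap presumably arises because the table's products were calculated from unrounded influences.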
So that rider would be welcome in, say, a 400-600 category race.
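Because organisers would set their own boundaries, deciding which band a score falls into is a simple lookup. The sketch below assumes each band includes its lower edge and excludes its upper edge, with the top band also including 1000. The function `category_for` and the example boundaries are hypothetical.

```python
from bisect import bisect_right


def category_for(score, edges):
    """Return the organiser-defined band containing score, e.g. "400-600".

    edges are ascending band boundaries. Each band includes its lower edge
    and excludes its upper edge; the top band also includes the last edge.
    """
    if not edges[0] <= score <= edges[-1]:
        raise ValueError("score outside the range covered by edges")
    index = min(bisect_right(edges, score) - 1, len(edges) - 2)
    return f"{edges[index]}-{edges[index + 1]}"


# Hypothetical boundaries an organiser might choose for one event.
event_edges = [0, 200, 400, 600, 800, 1000]
print(category_for(475, event_edges))  # prints 400-600
```

Narrow, uneven bands work the same way. For example, `[0, 500, 550, 1000]` puts a rider scoring 520 in the "500-550" band.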
Final Thoughts
This would give individual riders a much clearer sense of where they rank. Race organisers could easily construct their own race categories and target different groups of riders with confidence. The approach could also be extended to specialised classifications (e.g. for crits or iTTs) by varying the influence values.
What do you think? Let me know in the comments below!
Postscript
There is a better way to remove the “shared influence” of the measures than assuming half, as above. The approach is to calculate the correlation between each pair of power measures across all the data, and then identify the least correlated pair. The correlation of that pair can then be converted into a corrected “shared percentage”, which is removed instead of the 50% assumed in this article.
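A sketch of the first two steps in plain Python is below. It assumes the squared correlation (r², the standard "shared variance" between two variables) is the conversion used for the corrected shared percentage. The article does not specify the conversion, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def least_correlated_pair(measures):
    """Find the pair of measures with the lowest correlation across riders.

    measures maps a measure name to a list of values, one per rider, with
    riders in the same order in every list. Returns (name_a, name_b, r, r**2);
    r**2 is the assumed "shared percentage", expressed as a fraction.
    """
    best = None
    for a, b in combinations(measures, 2):
        r = pearson(measures[a], measures[b])
        if best is None or r < best[2]:
            best = (a, b, r)
    a, b, r = best
    return a, b, r, r * r


# Hypothetical values for five riders, for illustration only.
riders = {
    "15s watts": [950, 720, 1100, 800, 680],
    "5 min watts": [330, 300, 350, 310, 260],
    "20 min w/kg": [3.2, 3.6, 3.0, 3.4, 2.9],
}
print(least_correlated_pair(riders))
```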
Postscript 2: Data Sources
The power data used in the original analysis is available to any Zwift rider who connects to Zwift Power; it was anonymised during processing. The race data for the original power analysis is from the Dirt Racing Series, who explicitly gave permission for their data to be used anonymously in this analysis. The power-to-position data in the table above is my personal power-to-position data from intervals.icu.
Will this eliminate the people who use weight to gain an advantage?
The only way to do that would be to use only the watts (and not the w/kg) measurements. But a score which is less affected by weight doping, together with clear categories which are harder to “game”, should help significantly.
The only comment I’ll make is that if you break Zwift events into more than 4 categories, you are going to have really small fields in each, and they are too small already.
I probably race 4 to 5 times a week, and if you get 12-15 Cs on Zwiftpower in a race, that’s a good field, but it still sucks because it’s only 12-15. Adding more than the current 4 categories will make fields of 6-8, which would not be fun to race against.
The goal isn’t to make fields smaller, but to make for fairer, more fun racing. I agree that small fields don’t necessarily make for a great race, but neither do large fields where 70% of the riders stand no chance. Hopefully having a clear score will give race organisers more flexibility – you could have a race for everyone between 200 and 700 if you wanted a larger race which self-separates into groups, or you could have a very focussed “500-550” race which was smaller and very balanced.
It is not difficult at all to give yourself 1.2 or 2 times more watts.
Where do I find these 2 times more watts?! That sounds amazing
Training very hard, I’m guessing
You are absolutely right about small fields, especially folks who are on Zwiftpower. Frustrating.
I feel your pain!
Also women-specific fields, which are small already.
Yes! Hopefully a better ranking scheme could help with this, especially if event organisers are free to set the category boundaries for their events.
If there were fewer events to pick from, participation in the remaining events would increase. There are times when a dozen events start within 10 minutes of each other.
I suspect that (a) Zwift are hoping that organisers will pull some less-attended events so that Zwift don’t have to, and (b) everyone is waiting for the northern hemisphere winter to see what happens then in terms of usage and participation…
It’d be great if race organisers could experiment more and find out what works. I like this system because you can vary the boundaries. 500 might be at the pointy end of abilities in one race, but low-mid range in another, and riders can also use that to choose what sort of challenge they’re after.
Thank you!
When will this be applied? It seems very interesting.
I’m not Zwift or Zwift Power, so I can’t say, but I’m more than happy to work with them, or any race organiser, to refine and implement it.
Hi Neil, nice couple of articles. I’m guessing you are probably looking for a simple way of getting some “measurement” for race categories, because the first thing that came to my mind was: why not do a principal component analysis? As a D rider, I’m going to ask something related to us. My w/kg is almost 1.7, so in theory I should be close to the middle of the pack in my category (D being 1.0 to 2.5). But my experience is quite different: normally I’m last, or second last, or a bit higher, but far…
There are a number of other analyses that could be done, both technical ones (PCA does sound like an interesting one!) and checking whether the different power measures carry different “weights” in the current categories. In the races I had easy access to, C and D were combined, perhaps precisely because of the issues you raise. My hope is that a better categorisation scheme, with much more flexibility than D to A+, could allow for some great racing at every level.
Edited because I can’t read, apparently 😅
Didn’t quite follow that, but as more data becomes available (i.e. as more D and low C racers take part because the races improve for them), it would be appropriate to revisit the weightings to check that they still work for everyone. It’s a bit chicken and egg, but doable, I think.
This is downright genius. It would be interesting if this was used on your entire power curve. It would be even cooler if they used this kind of data to rank riders for different race types: climbing versus time trialling versus punchy races and so on.
Thank you! And yes – I would love to see each rider given, say, 2-4 “rankings” for different race types. More than that might get confusing, but having, say, “iTT”, “Crit” and “General” rankings could give race organisers great tools.
Great analysis, but in a game where people can cheat relatively easily, no power or w/kg numbers can be relied on. Just use race results. Super easy to understand: if you cheat, it will eventually catch up with you.
I’ve no objection to using race results at all, and there are a number of schemes out there for sports which have large participation and lots of people who need to be ranked against each other but who rarely, if ever, race against each other (e.g. the British Rowing approach). In the absence of something like that, this is a proposed way of improving power-based ranking with some actual analysis behind it. Let’s all keep pushing for better!
Does this not allow for the ‘skill’ element of Zwift? Sitting in? Drafting? Judging when to attack and when not to? The ‘art’ of it? (Or is it all science?) I gotta believe I can outwit my race compadres with my puny 15 sec/1 min power!
It depends what portion of races are won by skill and what portion by physiology. That will affect the setting of rider scores one way or the other. Ideally you would run the scheme for a trial period (once you had done the analysis on a larger data set), and the data from those races would be used to refine the weights, say every year. That would mean that any biases are gradually weeded out, or made sufficiently obvious that they can be mitigated.
I’m always interested in hearing about ways to make the racing “more fair”. It’s surprising how few folks participate in races and I wonder whether this was never something that interested them or they’ve just given up after a few bad experiences. Thanks for your efforts.
That said, I went into journalism because there was no math requirement, so much of your analysis was waaay over my head. To quote the Bard (Barbie, in this case) “Math is hard. Let’s go shopping.”
Let’s spend some of those drops 🙂
Not sure how closely you’ve followed the conversation a lot of people had on the Zwift forums, but the general consensus was that any ranking that’s not based on results will fail to create competitive categories. Results-based ranking is really the way to go, because results will sort people with respect to their strengths and weaknesses. The only place power-based categorization is useful is when people don’t have enough races under their belts.
I have no objection at all to a results-based system! If this article keeps people talking and working towards a better ranking system, whether a better power-based one or a results-based one, I’m happy! Let’s keep developing ideas that are easy to pursue, and together we’ll get there!
Great posts!! It definitely makes sense. To make it even more effective, I would combine this with a results-based ranking. Otherwise sandbaggers would reverse-engineer it one way or another and stay under the thresholds of any algorithm. In any case, it is super interesting. We should create a common research group for this topic; many people are sharing ideas here, and combining some of them might make the competition fairer.
I would be more than up for continuing the discussion. One of the key steps would be to be clear on all the types of cheating which occur, which ones might be addressed through ranking systems, and which ranking systems are better or worse for that. For example, cheating by lowering your weight does gain you w/kg (but not watts) and allows you to position yourself at the top of a fixed category. The system proposed in this article mitigates this (but doesn’t totally eliminate it) in two ways: by including watts in the ranking (not just w/kg), and by…
Agree 100% on the fixed nature of the categories facilitating sandbagging. Even with the system we currently have, we could have events that used different w/kg breakpoints, but the 2.5/3.2/4.0 breakpoints have become institutionalized across just about every event. This strands riders at the low end of their categories at the back of their fields forever, and incentivizes those at the upper end of their categories to find ways to remain there. I would love to see some events using (for example) 2.0/3.0/4.2 or 2.0/3.5/4.5 breakpoints, but this doesn’t ever seem to happen.
Yes – hence these articles, to try to create a metric that is more representative of the kind of racing done on Zwift and which makes bespoke boundaries for event categories more intuitive. And you are right – even varying the current boundaries could really mix things up and give more people a stake in their race.
Nice system! But the real question is whether Zwift is interested in, or actually working on, developing the category system, or whether it is a lot of talking for nothing?
Zwift’s communication with users on this kind of topic is really bad, I think…
I have no idea what Zwift are or aren’t doing – I’m a user with an idea! As I said in a previous comment response, I’d be more than willing to work with them to implement this.
I’m afraid that Zwift comes over to many users as a (sadly typical) “Big Tech” company that has its hands over its ears when users and subscribers are talking. They’ve developed this amazing software that everyone wants, so they must know best, right?
In the absence of credible (i.e. sand-bagging-free) results for ranking this system sounds good to me.
Thank you!