Sunday, March 17, 2013

Where did all the Cat 5s go?

Background

The flame war on the D20 and VAcycling email lists about why category 5s don't show up to race motivated me to publish at least some preliminary results of a USAC member retention analysis I've done on several occasions over the years.

Since 2008 I've downloaded a weekly snapshot of the USA Cycling member database. I also have individual snapshots back to 2006. Before 2006, I only have data for the Mid-Atlantic. This trove of data makes it possible to follow the progression of an individual rider, and perhaps put some solid numbers into the discussion of why riders stay in the sport.

Methodology

I should probably do this as a Sweave document, in support of Reproducible Research, but it's Sunday evening, and this is a blog. So there. Contact me if you want a custom analysis or if you want me to email you the script. Then you can see what a crappy R coder I am.

Here's what I did:
  1. Read in the USAC rider database closest to the end of each calendar year, since all licensed riders appear in the late December snapshot. Each database snapshot is about 50,000 records of twenty or so variables, including age, category, gender, and city/state.
  2. Exploit the fact that USAC issues licenses in numerical order to identify which licensees are new for each year.  (A more rigorous approach could specifically identify which licenses don't appear in previous years.) 
  3. For each First Year of License (2006-2012) identify the "freshman class" of racers. 
  4. For each subsequent year for that class, find the subset of racers from the first year class who are still licensed and add these to the master dataframe (using "rbind"). 
  5. Iterate on that dataframe to produce some summary statistics. (I'm sure I could have used some tapply mojo here, but maybe I'm really a Fortran programmer at heart...)
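
Steps 2 through 4 can be sketched roughly as follows. This is a Python illustration, not the R script used for this post. The snapshot layout (a dict mapping year to a dict keyed by license number) and the function names `new_licensees` and `retention_table` are assumptions made for the example. It uses the set-difference idea mentioned in step 2 rather than the numerical-order shortcut, so the earliest snapshot has no baseline and is skipped.

```python
def new_licensees(snapshots, year):
    """Licenses present in `year` that appear in no earlier snapshot."""
    earlier = set()
    for y, members in snapshots.items():
        if y < year:
            earlier.update(members)
    return set(snapshots[year]) - earlier


def retention_table(snapshots):
    """Return rows of (first_year, license_year, n_still_licensed)."""
    years = sorted(snapshots)
    rows = []
    for first_year in years[1:]:  # earliest snapshot has nothing to compare against
        freshmen = new_licensees(snapshots, first_year)
        for later in (y for y in years if y >= first_year):
            still = freshmen & set(snapshots[later])
            rows.append((first_year, later, len(still)))
    return rows


# Toy data: license numbers only, invented for illustration.
toy = {
    2006: {101: {}, 102: {}},
    2007: {101: {}, 201: {}, 202: {}, 203: {}},
    2008: {101: {}, 201: {}, 301: {}},
}
for row in retention_table(toy):
    print(row)
```

Each row plays the role of one line in the master dataframe: the freshman class year, the year being checked, and how many of that class still hold a license.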

Limitations

  • This method will specifically catch someone who misses a few years and then relicenses.
  • The analysis is for the whole USA, not just the Mid-Atlantic, though that analysis is also possible.
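
To make the first limitation concrete, here is a small Python sketch contrasting two ways of deciding whether a rider is "still licensed" in a given year. It assumes the per-year check simply asks whether the license appears in that year's snapshot; the stricter "continuously licensed" rule is shown only for comparison. The rider history and both function names are hypothetical.

```python
def licensed_in_year(history, year):
    """Retained if licensed in `year`, regardless of earlier gaps."""
    return year in history


def continuously_licensed(history, first_year, year):
    """Retained only if licensed every year from first_year through year."""
    return all(y in history for y in range(first_year, year + 1))


# Hypothetical rider: first licensed in 2006, skipped 2008-2009, back in 2010.
rider = {2006, 2007, 2010, 2011}

for year in range(2006, 2012):
    print(year,
          licensed_in_year(rider, year),
          continuously_licensed(rider, 2006, year))
```

Under the first rule the returning rider counts toward the 2010 and 2011 totals; under the second they drop out in 2008 and never come back.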

Results

First, let's look at the total number of new racers per year.
The top red curve shows that USAC has issued about 9000 new licenses each year since 2006. For some reason 2010 was a banner year for new licenses, ending a four-year decline. The lower curves on the plot below show the number of those racers who are still licensed in subsequent years. Since 2012 just ended, there is no year 1 data for 2012. The takeaway is that about 45% of the first-year racers do not renew their licenses for a subsequent season.

We can look at the fractions instead of absolute counts as well.
The plot above shows the "lifetime" of a racer. I've overplotted the data by year of first license, but the differences don't look significant to me. In other words, 2012 new racers are just like new racers from 2006. After only a year, 45% of the racers have quit the sport. By the end of the third year, only 36% of the racers remain, and after six years, only 20% remain. 
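
The fractions quoted above imply a year-over-year renewal rate, which a few lines of Python can work out. This is back-of-the-envelope arithmetic, not output from the original analysis: it assumes 55% remain after year 1 (100% minus the 45% who quit), 36% after year 3, and 20% after year 6, and it treats the renewal rate as constant within each interval, which is a simplification.

```python
# Fraction of a freshman class still licensed, keyed by years since first license.
# 0.55 is derived from "45% have quit"; 0.36 and 0.20 are the quoted figures.
remaining = {0: 1.00, 1: 0.55, 3: 0.36, 6: 0.20}


def annual_retention(remaining, start, end):
    """Geometric-mean yearly retention between two observed points."""
    ratio = remaining[end] / remaining[start]
    return ratio ** (1.0 / (end - start))


for start, end in [(0, 1), (1, 3), (3, 6)]:
    rate = annual_retention(remaining, start, end)
    print(f"years {start}-{end}: about {rate:.0%} renew each year")
```

Under these assumptions, roughly 55% renew after the first season, and a little over 80% of the survivors renew in each later year. Most of the attrition happens right at the start.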

Finally, we can also look at the category progression for those racers. 
The plot above is a little complicated. The individual panels break out by road category (1-5). Some trends are interesting (to me at least). 
  • Amazingly, after six years, 15% of the cadre are still category 5. Who are these people? Mountain bike racers who reflexively take out a road license each year and never use it? 
  • About 30% of the new licensees upgrade from Category 5 in the first year. 
  • About 0.16% of new licensees make it all the way to Category 1 in the first year. 
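
A breakdown like this can be tabulated by counting, for each year, how many members of a freshman class hold each road category and dividing by the original class size. The Python sketch below shows the idea on toy data; the input layout and the name `category_fractions` are assumptions for illustration, not taken from the original script.

```python
from collections import Counter


def category_fractions(cohort_size, categories_by_year):
    """Share of the original class in each road category, per year.

    categories_by_year maps year -> list of road categories (1-5) for the
    class members still licensed that year. Riders who did not renew do
    not appear, so each year's shares sum to that year's retention rate.
    """
    table = {}
    for year, cats in sorted(categories_by_year.items()):
        counts = Counter(cats)
        table[year] = {cat: counts.get(cat, 0) / cohort_size
                       for cat in range(1, 6)}
    return table


# Toy class of 10 riders, invented for illustration.
toy = {
    2006: [5] * 10,
    2007: [5, 5, 5, 4, 4, 3],
}
for year, shares in category_fractions(10, toy).items():
    print(year, shares)
```

For scale, 0.16% of a class of about 9000 works out to roughly 14 riders.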

Conclusions

  • "Infant mortality" is significant for new bicycle racers. About 45% don't come back for a second season.
  • The progression of riders through the sport is relatively unchanged over the past six years. 
Solutions to the infant mortality problem (if it is indeed a problem) are left to the reader. If solutions are even necessary. I could argue that it's better to let in 100 people and have 45 of them quit than to only have 65 people but keep them all.

5 comments:

  1. Perhaps the rising cost of triathlons hit some tipping point in 2010, which pushed a number of the tri crowd into cycling (or the recession encouraged a scale-back to the cheaper option). Either way, it's still interesting that the percent that stick remains steady.

    1. I think it's more likely that the huge spike in 2010 is due to the insane popularity of cyclocross (see the comment from marc below). Still thinking about ways to control for CX participants as distinct from "roadies."

  2. Bill- I think most CX racers do so on a road license. Even if you race a lot of 'cross, you don't progress past cat 5 on the road. -marc

    1. Interesting thought. I could re-do the analysis and control for CX category. We could really make some progress if USAC would open up its results API for queries. Then we could look at people who actually showed up at a race.

    2. Of course, the retention rate is unchanged since 06, despite the phenomenal growth of CX during that period.
