Thursday, December 11, 2008

Malcolm Gladwell's Outliers and making the NHL

In his book Outliers, Malcolm Gladwell discusses the odd distribution of birth months among NHL players. Because youth players are registered in leagues based on their year of birth, the biggest and strongest players tend to be those born in the first few months of the year. This selection process starts as early as age 8, and the effect persists more than a decade later in junior hockey in Canada. The effect has been visible for decades:

One thing that's very unintuitive about this effect is that, other things being equal, if you have a 17-year-old player who puts up the same number of points as an 18-year-old player, the 17-year-old will have a much higher performance ceiling. I like to call this the 'Wayne Gretzky-Dan Hodgson' effect - two players who had identical stats their last year in junior; but Gretzky was 16 and Hodgson was 19, and so it was obvious who would have the better NHL career.

At any rate, if you have identical junior players born in January and December, the December player was almost a year younger when he achieved his performance. If we project that performance forward to age 23, then we'd expect the December player, on average, to be better. And this is an effect we see when we compare the birthdates of junior hockey players to NHL players:

The first time that players aren't strictly grouped by birthdate is when they reach professional leagues. At this point, younger players outperform older players by a wide margin, making the jump from junior hockey to the NHL at a 50% higher rate. Gladwell mentions in his ESPN interview that "Canada is squandering the talents of hundreds of boys with late birthdays." It seems pretty clear that he's right.


One thing that's missing from this analysis is how this distribution compares to the birthdates of all hockey players. Intuitively we think that birthdates are distributed evenly throughout the year. But are they? The data are significant only if the distribution of NHL birthdates differs markedly from the birthdates of all hockey players.
Yes, birthdays are uniformly distributed throughout the year.
I believe Gladwell claims the birthdates are *not* uniformly distributed in the NHL. You would expect 25% of the players to be born in January-March, but he claims the number is 40+%. The explanation for the anomaly is the early age of hockey "streaming". The January-born 8-year-old competing at tryouts against the December-born 8-year-old is almost a year older, and that is a fair gap. The January-born player is more likely to make rep and the December-born player is more likely to be in house league. Once that house/rep split happens, the outcome is determined (more training, better coaching = more chance of NHL).
Colin, Elaine, Slapshot - I think you're all missing the point here. Junior players have a much more extreme birthdate distribution than NHL players do. This occurs because the NHL has the time to look at a player's ultimate ceiling instead of how he's going to do just this year. So the advantage that players get from Q1 birthdates up until they're 18 gets inverted when they get to the pros.
How can you have an advantage from ages 8 to 18 get "inverted"? Imagine that when you were 8 your parents sent you to the best private school, and they sent your twin brother to a poorly funded public school. After 10 years of school you both get to apply to university. Even if they don't see your school transcripts, chances are you're going to do better on the entrance exam.

So the crux of the issue is not the ceiling thing - it's that you never get a chance at being an NHL prospect if you got relegated to house league from an early age. You don't get the same coaching as the rep kid. That's not reversible later.

Gladwell claims that the NHL is just as poorly distributed as juniors...does anyone have the data?
OK, here's the data ESPN posted (updated December 8, 2008).

Of the 512 NHL players:
Jan-Mar 31%
Apr-Jun 28%
Jul-Sep 22%
Oct-Dec 19%
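As a rough check on whether these quarterly shares differ from an even 25% split, here is a minimal Python sketch of a chi-square goodness-of-fit test. It rests on two assumptions that are not in the ESPN data: the counts are reconstructed from the rounded percentages, and births are taken to be uniform across quarters. The function name is hypothetical, and 7.815 is the standard 5% critical value for 3 degrees of freedom.

```python
# Quarterly shares quoted above (ESPN, 512 NHL players).
# Counts are reconstructed from rounded percentages, so this is approximate.
TOTAL_PLAYERS = 512
QUARTER_SHARES = {"Jan-Mar": 0.31, "Apr-Jun": 0.28, "Jul-Sep": 0.22, "Oct-Dec": 0.19}

# Standard chi-square critical value for 3 degrees of freedom at the 5% level.
CHI2_CRITICAL_DF3_05 = 7.815


def chi_square_vs_uniform(shares, total):
    """Pearson chi-square statistic against equal expected counts per category.

    Assumes a uniform baseline (the same expected count in every quarter).
    """
    expected = total / len(shares)
    return sum(
        (share * total - expected) ** 2 / expected for share in shares.values()
    )


statistic = chi_square_vs_uniform(QUARTER_SHARES, TOTAL_PLAYERS)
print(f"chi-square = {statistic:.2f} (critical value {CHI2_CRITICAL_DF3_05})")
print("differs from uniform at the 5% level:", statistic > CHI2_CRITICAL_DF3_05)
```

Under those assumptions the statistic comes out well above the critical value, so the skew toward early-year birthdays is unlikely to be chance alone. It says nothing about the cause, and a baseline built from actual birth statistics would be a stricter comparison than a flat 25%.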
Colin and Elaine - read my most recent post on this topic. Overall, the ratio of Q1 to Q4 birthdays in the NHL is about 1.5:1 or so (31:19 ~ 1.63:1). But among high-scoring players, it's about 1.2:1. The imbalance is much higher among NHL players with a high number of penalties per game, and that skews the overall data.

Among the very best players, being younger is ultimately a huge advantage - but it doesn't show up until they're trying to make the jump to the NHL.
Are the NHL data for Canadian players only?

If it includes players who came up / were selected under systems which have a different cut-off date or variable cut-off dates this will blur the association.
Something like 98% of Canadian junior players came up through the Canadian system (1980-present).
I just read Outliers - Gladwell seems a good storyteller, but IMO he missed the net on the hockey study for a few reasons, basically drawing conclusions from the birth dates of one team from a cold part of Canada.

The following data is from Feb 15 2010.

NHL Players
- Jan-Apr: 136 CDN players, or 34.3% of Canadians in NHL (32.9% of 1991 live births), +1.4%
- May-Aug: 146 CDN players, or 36.8% of Canadians in NHL (33.8% of 1991 live births), +3.0%
- Sept-Dec: 115 CDN players, or 29.0% of Canadians in NHL (32.3% of 1991 live births), -3.3%

His findings may work for junior hockey, but not the NHL. The "outlier" is May-August born Canadian hockey players currently in the NHL, who are over-represented, based on when Canadians are born in the year.
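For anyone who wants to redo the comparison, here is a minimal Python sketch of the arithmetic behind the figures above: each period's share of Canadian NHL players minus its share of 1991 live births. The player counts and birth shares are the ones quoted in this comment, and the function and variable names are hypothetical.

```python
# Figures quoted above: Canadian NHL players as of Feb 15 2010, and the share
# of 1991 Canadian live births falling in each four-month period.
PERIODS = {
    "Jan-Apr": {"players": 136, "birth_share_pct": 32.9},
    "May-Aug": {"players": 146, "birth_share_pct": 33.8},
    "Sep-Dec": {"players": 115, "birth_share_pct": 32.3},
}


def representation_gaps(periods):
    """Return player share minus birth share, in percentage points, per period."""
    total_players = sum(p["players"] for p in periods.values())
    return {
        name: round(100 * p["players"] / total_players - p["birth_share_pct"], 1)
        for name, p in periods.items()
    }


gaps = representation_gaps(PERIODS)
for name, gap in gaps.items():
    print(f"{name}: {gap:+.1f} percentage points")
```

A positive gap means the period is over-represented among Canadian NHL players relative to births. Note that this compares current players of many ages against a single birth year, so it is only as good as the assumption that 1991 is typical.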

Also, sociologists Lam and Miron (1991) cover the distribution of births in Canada by month, over time.