Thursday, April 28, 2011

Consider Data Collection Biases

I am thinking about writing a series of posts in which each post illustrates how I look critically at scientific/medical data.

For instance, for each of the years 1999-2007 inclusive, the CDC has published a long document giving the number of death certificates issued that year for people at each year of age, broken down by the race and sex recorded on the certificate. Here is the link:

Now, there is a lot of data to be found in those tables. One thing that struck me was that 68 people are listed as having died at age 115 or older over that nine-year period. OK, so far so good. Of those 68, only 18 had their race listed as white. Only 11 of the 68 had a recorded sex of male, and three of those were recorded as white males.

Roughly 85% of all deaths were of white people, and roughly 50% of all deaths were of males. So if recorded age at death had nothing to do with sex or race, you'd expect 57 or 58 of the people who died at age 115 or older, rather than 18, to be white. You'd also expect about 34, rather than 11, to be male. In other words, being white or being male cut down the odds of making it to age 115 on the death certificate by about two thirds.
Looking only at people who've made it to 100, the picture shifts. Among people whose death is recorded at age 100 or greater, men make up about 1 in 6, so 11 men out of 68 at age 115 or older is not at all surprising. White people, however, make up about 10 in 11 of deaths recorded at age 100 or greater, which would predict about 62 white people among the 68. That makes their near-absence from the 115-or-older group even stranger.
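The arithmetic above is easy to check. Here is a quick Python sketch that uses only the counts and the rough 85%/50% shares quoted in this post. The variable and function names are my own, purely for illustration.

```python
# Counts quoted in the post: deaths recorded at age 115 or older, 1999-2007.
TOTAL = 68
OBSERVED = {"white": 18, "male": 11}

# Approximate share of ALL deaths in each group, as quoted in the post.
SHARE_OF_ALL_DEATHS = {"white": 0.85, "male": 0.50}


def expected_count(total: int, share: float) -> float:
    """Expected count if recorded age at death were unrelated to the group."""
    return total * share


for group, share in SHARE_OF_ALL_DEATHS.items():
    expected = expected_count(TOTAL, share)
    observed = OBSERVED[group]
    # Fraction by which the observed count falls short of the expectation.
    shortfall = 1 - observed / expected
    print(f"{group}: expected {expected:.1f}, observed {observed}, "
          f"shortfall {shortfall:.0%}")
```

Both shortfalls come out close to two thirds, which is where that figure comes from.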
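The same check works with the age-100-and-older baseline. This sketch assumes the rough shares quoted in that paragraph (1 in 6 for men, 10 in 11 for white people). A ratio near 1 means the 115+ group looks like the 100+ group, and a ratio well below 1 means the group is underrepresented.

```python
TOTAL = 68  # deaths recorded at age 115 or older, 1999-2007
OBSERVED = {"male": 11, "white": 18}

# Approximate share of deaths recorded at age 100 or greater, as quoted above.
SHARE_AT_100_PLUS = {"male": 1 / 6, "white": 10 / 11}


def observed_to_expected(observed: int, total: int, share: float) -> float:
    """Observed count divided by the count the 100+ baseline would predict."""
    return observed / (total * share)


for group, share in SHARE_AT_100_PLUS.items():
    ratio = observed_to_expected(OBSERVED[group], TOTAL, share)
    print(f"{group}: expected about {TOTAL * share:.1f}, "
          f"observed {OBSERVED[group]}, ratio {ratio:.2f}")
```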

Okay, good, now let's step back from the data for a moment. If you were a newspaper journalist and you wanted to report on the statistics above, you could choose a lot of different slants.
You could claim, "African Americans Weather Last Years Better, says CDC."
You could claim, "Whites Unlikely to Be Longest Lived."
You could claim, "Minorities Reach Majority at age 112."

Or you could be a little less certain. The death tables, as I see them, do prove something. They prove that in the United States during the period 1999-2007, white persons were less likely to have an age greater than 112 written on their death certificates. We can be reasonably confident that this also holds for 2008, 2009, and 2010, since the trend was not really changing and the margin was highly significant.
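To put a rough number on how significant the margin is, here is a one-sided exact binomial test in Python. It is a sketch under a simplifying assumption of my own: each of the 68 deaths is treated as an independent draw in which the chance of being white (or male) equals that group's share of all deaths. It only says the recorded pattern is very unlikely to be chance. It says nothing about whether the recorded ages are accurate.

```python
from math import comb


def binomial_lower_tail(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n = 68  # deaths recorded at age 115 or older, 1999-2007
# (observed count, share of all deaths): figures quoted in the post
cases = {"white": (18, 0.85), "male": (11, 0.50)}

for group, (observed, share) in cases.items():
    p_value = binomial_lower_tail(observed, n, share)
    print(f"{group}: P(at most {observed} of {n}) = {p_value:.1e}")
```

Both probabilities are tiny, the white one astronomically so. A real analysis would use a proper statistics library (for example SciPy's binomial test), but the hand-rolled sum keeps this self-contained.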

What we can't know, however, is whether those records reflect the actual ages at which people died. Maybe, for some reason, a gene that makes death less likely in the eleventh and twelfth decades of life is more common in non-white populations, or non-whites do a better job with elder care, or something. Maybe the CDC statistics do reflect that. I like to think so, just because it's kind of a cool possibility.
But the other possibility is that age is reported differently based on race (or on something that differs statistically between racial groups). Maybe, a hundred years ago, white people were more likely to have accurate birth certificates. Maybe, for cultural or other reasons, non-white persons are more likely to have their ages misrepresented on their death certificates. Maybe, particularly since these people were born before social security numbers existed, Black people were more likely to claim to be older than they were when applying for them. Maybe doctors are sloppier when writing up death certificates for non-white persons.
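Here is why misreporting matters so much at the extreme tail. The following is a toy model in Python, and both the model and every rate in it are hypothetical. None of it comes from CDC data. Two groups have the same true chance of reaching 115, but one group's certificates overstate age past 115 more often. Because true 115-year-olds are so rare, even a small error rate can produce a big recorded gap and make up most of the recorded cases.

```python
def recorded_rate(true_rate: float, misreport_rate: float) -> float:
    """Hypothetical rate at which 115+ gets written down: true cases plus errors.

    Assumes a death that truly happened before 115 is recorded as 115+
    with probability misreport_rate (a made-up parameter).
    """
    return true_rate + (1 - true_rate) * misreport_rate


def spurious_share(true_rate: float, misreport_rate: float) -> float:
    """Fraction of recorded 115+ deaths that are really misreported ages."""
    return (1 - true_rate) * misreport_rate / recorded_rate(true_rate, misreport_rate)


# Purely hypothetical rates, chosen only to show the shape of the effect.
# Both groups share the SAME true chance of dying at 115 or older...
TRUE_RATE = 1e-5
# ...but one group's records overstate age past 115 ten times as often.
for label, misreport in [("careful records", 1e-5), ("sloppier records", 1e-4)]:
    print(f"{label}: recorded 115+ rate {recorded_rate(TRUE_RATE, misreport):.1e}, "
          f"spurious share {spurious_share(TRUE_RATE, misreport):.0%}")
```

In this made-up scenario the sloppier group shows more than five times as many recorded 115+ deaths with no real difference in longevity. That is the kind of effect the extra data would need to rule in or out.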

Which leads to another tenet of mine: if data can support more than one conclusion... look for more data!
In this case, I don't have access to the other data that would help, but the kind of data I'd like to see is survival to age 115 or older in a variety of other countries. Do countries with better or worse record keeping show similar or different results? Will this trend in the death tables continue past the years when all the people dying were likely to have gotten social security numbers at birth? If we look at the American death certificates with no age written on them, will we find a racial bias? Will it be enough bias to explain our data away?
What will CDC data show in twenty or thirty years, when we can expect that the data collection at the births of our oldest citizens will have been much improved?

Not at all D-related: I have a kindergarten chess class on Wednesdays, and yesterday in class one of the students came up and tattled, "Nico said the B-word."
So I look, kind of dubiously, at Nico, who is looking teary-eyed. And the kid insists, "He did! He said, 'Sit on your...'" and then leans in and whispers, "butt."
It was all I could do not to laugh.

1 comment:

Reyna said...

My kids think the "S" word is "shut-up". Gotta love the innocence. So cute. AND interesting on the CDC stuff. Again, I love how you don't take anything at face value...the "easy" way I do. You are the analyzer and the skeptic. LOVE IT.