Hidden figures, misleading numbers
In times of
uncertainty, it is natural for people to seek solace in “hard data”. And the
lay press obliges with a deluge of numbers: daily new cases and deaths, number
of tests, scary estimates from Covid risk-calculators, vaccination efficacy
figures, and many more. But the provenance of some of these widely publicized
numbers is hardly ever clear to the consumer. Even when the origins are
apparent, the analyses and the jargon surrounding them are a very effective
deterrent to understanding. How many of us will persevere in the face of “proportional-hazards
regression models”, with some “gradient-boosted trees” thrown in for good
measure? Even more exasperating are the misleading ways in which some simple data
are presented to the public. In this post, I want to highlight a couple of
instances, one relating simply to bad presentation, and the other possibly
arising out of misinterpretation of the data. Now, please do note that I am not
accusing anyone of deliberate wrongdoing here (though in some cases, I can’t
help but wonder if unconscious biases are at play!).
Vaccine deployment: Jabbering on
As I write
this, vaccination drives are in full swing in many countries around the world. And
people are keeping score. But what I find irksome is how even some of the most
renowned bastions of scientific reporting choose to present these data. The percentage
of the population vaccinated is the preferred metric with most sources, and is
presented with great fanfare (literally in some cases! https://www.economist.com/podcasts/the-jab-a-new-podcast-from-the-economist).
By this measure, it would appear that Israel, Bahrain and Serbia are doing a
fantastic job, while most other countries are hopeless laggards. Now, I
completely understand that the proportion of people vaccinated is important to
understand how far away one is from achieving herd immunity. But, ranking
countries on the basis of this relative measure alone is akin to comparing athletes
running a 100-metre dash to those running a full marathon! Israel has no doubt
vaccinated over 60% of its population, but to be fair, the country’s entire
population is just about 9 million, equivalent to half the population of New
Delhi or Mexico City! (Figure) Serbia is even smaller. Nevertheless, we have
experts exhorting others to learn from the Israeli or Serbian vaccination
model. I would think that people would want to learn from countries that are
doing it at speed. But China and India don’t figure in the “rankings”, though
each of them (and the US) has vaccinated over 100 million citizens in 100
days or fewer. Paradoxically, when it comes to the number of cases or deaths,
it’s the absolute numbers that are put out prominently, when it would make more
sense to report them as a percentage of the country’s population! A more populous
country would of course have more cases! But that’s not usual practice. I found
the following report particularly galling: it somehow manages to highlight the bad
news and make light of a significant milestone:
India says it has become the
"fastest country in the world" to administer more than 100 million
doses of coronavirus vaccines…But the country reported a record daily increase
of over 150,000 cases - and more than 800 new deaths - on Sunday. (https://www.bbc.com/news/world-asia-india-56345591)
(Maybe the
unnecessary quotation marks were more irritating, but that is a discussion for
a different forum)
Figure: Total vaccination doses administered (millions)
Data as of
10th April 2021 (Source: https://ourworldindata.org/covid-vaccinations)
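The 100-metre-dash versus marathon comparison can be made concrete with a few lines of arithmetic. This is only a rough sketch using the round figures quoted in this post (a population of about 9 million and "over 60%" vaccinated for Israel, and "over 100 million" for each of the larger countries); both inputs are lower bounds, the variable names are mine, and the result is a ballpark rather than a precise comparison.

```python
# Rough figures quoted in the post; both are lower bounds ("over ...").
israel_population = 9_000_000       # "just about 9 million"
israel_share_vaccinated = 0.60      # "over 60% of its population"

# Absolute number of people behind Israel's headline percentage.
israel_people_vaccinated = israel_population * israel_share_vaccinated

# The post's round figure for China, India and the US (each).
large_country_people_vaccinated = 100_000_000

# How many "Israels" fit into one of the larger campaigns.
scale_ratio = large_country_people_vaccinated / israel_people_vaccinated

print(f"Israel, absolute: about {israel_people_vaccinated / 1e6:.1f} million people")
print(f"Larger campaigns are roughly {scale_ratio:.0f} times that size")
```

The percentage and the absolute count answer different questions: the first says how close a country is to herd immunity, the second says how big a logistical job has been done.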
Artificial intelligence, common sense,
and COVID-19 risk calculators
There are
numerous risk calculators out there that tell you your risk of getting
hospitalized, or dying of the disease. I was intrigued by one of the
well-publicized estimators (https://www.economist.com/graphic-detail/covid-pandemic-mortality-risk-estimator)
because it predicted a rather high average case fatality rate (CFR) of 2%. The
CFR is simply the proportion of people diagnosed with a disease who die of it. The data for
this calculator were obtained from a large US electronic health database (https://covid19researchdatabase.org/)
of people who came in contact with their healthcare providers, and received a
diagnosis of COVID-19. A lot of hard work and artificial intelligence (think “gradient-boosted
trees” here!) have apparently gone into building the model, in order for it to
reliably predict the risk of death, based on characteristics such as age and
the presence of co-morbidities (https://www.economist.com/graphic-detail/2021/03/11/how-we-built-our-covid-19-risk-estimator).
The paper’s journalists understandably appear somewhat alarmed by the potentially
high risk of death if they were to test positive. But it should be obvious to
anyone that there is a potential for serious bias here. Most people who are
asymptomatic or have mild symptoms (the majority of COVID-19 infections) are
unlikely to be represented in this database. The infection fatality rate (IFR),
the proportion of all infected people (diagnosed or not) who die, would be a more
useful figure for people who test positive for diseases such as COVID-19. But the IFR is
difficult to determine. (https://www.who.int/bulletin/volumes/99/1/20-265892/en/)
Other calculators such as QCovid (developed at the University of Oxford) have
taken a more pragmatic approach to estimating the risk of mortality, one that tries
to account for asymptomatic and mild infections. This calculator presents the
risk of catching and dying of COVID-19
in the community, based on your age, sex and comorbidities. One can indirectly
estimate the IFR from this calculator if we input a value for the risk of catching
the disease in the community.* I calculated the risk of death for a 45-year-old
man without comorbidities using both these calculators, and sure enough, the Economist’s calculator produced a risk
that was an order of magnitude greater
than that from the QCovid estimator.** (https://qcovid.org/Calculation;
and https://www.economist.com/graphic-detail/covid-pandemic-mortality-risk-estimator)
If only the correspondents (and the readers) of the Economist were to read this piece, they would feel less alarmed,
and be more sceptical of their data journalists’ work.
Numbers do
provide objective information that may be preferable to subjective and qualitative
data. But not all numbers provide valid information. Still, I do think that on
most occasions, armed with a little patience, common sense, and some middle-school
math, we can avoid drawing misleading conclusions.
____________________________________________________________________
* The risk of catching and dying of COVID-19 = (the risk of catching COVID-19 in the community) x (the risk of dying of the infection once you’ve caught it, i.e., the IFR)
IFR = (the risk of catching and dying of COVID-19) / (the risk of catching COVID-19 in the community)
** For a 45-year-old white male without comorbidities, QCovid estimates the risk of catching
and dying of COVID-19 to be 0.0031%. If one estimates the risk of catching the
disease in the community to be 10% (based on seroprevalence in the UK), the IFR
turns out to be 0.031%. This is about a tenth of the estimate from the Economist’s calculator (0.3%).
Theoretically, the only time both calculators would give the same result is when
your risk of catching COVID-19 is 100%!
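
For anyone who wants to redo the footnote arithmetic, here is a minimal Python sketch. It uses only the numbers quoted above (0.0031% from QCovid, an assumed 10% community risk of infection, and 0.3% from the Economist's calculator); the function name `implied_ifr` is my own and is not part of either calculator.

```python
def implied_ifr(risk_catch_and_die: float, risk_catch: float) -> float:
    """Back out the IFR from: P(catch and die) = P(catch) x IFR."""
    if not 0 < risk_catch <= 1:
        raise ValueError("risk_catch must be a probability in (0, 1]")
    return risk_catch_and_die / risk_catch


# Inputs taken from the footnote, converted from percentages to probabilities.
qcovid_catch_and_die = 0.0031 / 100   # QCovid: risk of catching AND dying
assumed_risk_of_catching = 0.10       # assumption: 10%, based on UK seroprevalence
economist_risk = 0.3 / 100            # the Economist's calculator: risk of death

ifr = implied_ifr(qcovid_catch_and_die, assumed_risk_of_catching)
ratio = economist_risk / ifr

print(f"Implied IFR from QCovid: {ifr:.3%}")
print(f"The Economist's estimate is about {ratio:.0f} times larger")
```

The result is sensitive to the assumed risk of catching the disease: a lower assumed risk gives a higher implied IFR, and at an assumed risk of 100% the implied IFR is simply the QCovid figure itself.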
