Hidden figures, misleading numbers

In times of uncertainty, it is natural for people to seek solace in “hard data”. And the lay press obliges with a deluge of numbers: daily new cases and deaths, number of tests, scary estimates from Covid risk-calculators, vaccine efficacy figures, and many more. But the provenance of some of these widely publicized numbers is hardly ever clear to the consumer. Even when the origins are apparent, the analyses and the jargon surrounding them are a very effective deterrent to understanding. How many of us will persevere in the face of “proportional-hazards regression models”, with some “gradient boosted trees” thrown in for good measure? Even more exasperating are the misleading ways in which some simple data are presented to the public. In this post, I want to highlight a couple of instances, one relating simply to bad presentation, and the other possibly arising out of misinterpretation of the data. Please note that I am not accusing anyone of deliberate wrongdoing here (though in some cases, I can’t help but wonder if unconscious biases are at play!).

Vaccine deployment: Jabbering on

As I write this, vaccination drives are in full swing in many countries around the world. And people are keeping score. But what I find irksome is how even some of the most renowned bastions of scientific reporting choose to present these data. The percentage of the population vaccinated is the preferred metric with most sources, and is presented with great fanfare (literally in some cases! https://www.economist.com/podcasts/the-jab-a-new-podcast-from-the-economist). By this measure, it would appear that Israel, Bahrain and Serbia are doing a fantastic job, while most other countries are hopeless laggards. Now, I completely understand that the proportion of people vaccinated is important for understanding how far away one is from achieving herd immunity. But ranking countries on the basis of this relative measure alone is akin to comparing athletes running a 100 metre dash to those running a full marathon! Israel has no doubt vaccinated over 60% of its population, but to be fair, the country’s entire population is just about 9 million, equivalent to half the population of New Delhi or Mexico City! (Figure) Serbia is even smaller. Nevertheless, we have experts exhorting others to learn from the Israeli or Serbian vaccination model. I would think that people would want to learn from countries that are vaccinating at speed and at scale. But China and India don’t figure in the “rankings”, though each of them (and the US) has vaccinated over 100 million citizens in 100 days or fewer. Paradoxically, when it comes to the number of cases or deaths, it’s the absolute numbers that are put out prominently, when it would make more sense to report them as a percentage of the country’s population! A more populous country would of course have more cases! But that’s not usual practice. I found this report particularly galling; it somehow manages to highlight the bad news and make light of a significant milestone:

India says it has become the "fastest country in the world" to administer more than 100 million doses of coronavirus vaccines…But the country reported a record daily increase of over 150,000 cases - and more than 800 new deaths - on Sunday. (https://www.bbc.com/news/world-asia-india-56345591)

(Maybe the unnecessary quotation marks were more irritating, but that is a discussion for a different forum.)

Figure: Total vaccination doses administered (millions)

Data as of 10th April 2021 (Source: https://ourworldindata.org/covid-vaccinations)
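To put the relative-versus-absolute point in numbers, here is a rough Python sketch using only the round figures quoted in this post (a population of about 9 million and 60% coverage for Israel, and the 100 million mark passed by India, China and the US). The variable names are mine, and the inputs are approximations and lower bounds rather than exact counts.

```python
# Round figures quoted in the post; approximations, not exact counts.
israel_population = 9_000_000           # "just about 9 million"
israel_percent_vaccinated = 60          # "over 60%", so a lower bound
large_country_vaccinated = 100_000_000  # "over 100 million", also a lower bound

# Convert Israel's percentage into an absolute head count.
israel_vaccinated = israel_population * israel_percent_vaccinated // 100

# How many times larger is the 100-million milestone?
times_more = large_country_vaccinated / israel_vaccinated

print(f"Israel, absolute: about {israel_vaccinated:,} people")
print(f"The 100 million mark is about {times_more:.1f} times that number")
```

Neither measure is wrong on its own; the sketch only shows how different the picture looks when the same data are expressed as a head count rather than a percentage.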

Artificial intelligence, common sense, and COVID-19 risk calculators

There are numerous risk calculators out there that tell you your risk of being hospitalized with, or dying of, the disease. I was intrigued by one of the well-publicized estimators (https://www.economist.com/graphic-detail/covid-pandemic-mortality-risk-estimator) because it predicted a rather high average case fatality rate (CFR) of 2%. The CFR is simply the proportion of people diagnosed with a disease who die of it. The data for this calculator were obtained from a large US electronic health database (https://covid19researchdatabase.org/) of people who came in contact with their healthcare providers and received a diagnosis of COVID-19. A lot of hard work and artificial intelligence (think “gradient-boosted trees” here!) has apparently gone into building the model, so that it can reliably predict the risk of death based on characteristics such as age and the presence of co-morbidities (https://www.economist.com/graphic-detail/2021/03/11/how-we-built-our-covid-19-risk-estimator). The paper’s journalists understandably appear somewhat alarmed by the potentially high risk of death if they were to test positive. But it should be obvious to anyone that there is a potential for serious bias here. Most people who are asymptomatic or have mild symptoms (the majority of COVID-19 infections) are unlikely to be represented in this database. The infection fatality rate (IFR), the proportion of all infected people (including those who are never diagnosed) who die, would be the more appropriate figure for people who test positive for diseases such as COVID-19. But the IFR is difficult to determine (https://www.who.int/bulletin/volumes/99/1/20-265892/en/). Other calculators such as QCovid (developed at the University of Oxford) have taken a more pragmatic approach to estimating the risk of mortality, one that tries to account for asymptomatic and mild infections. This calculator presents the risk of catching and dying of COVID-19 in the community, based on your age, sex and comorbidities.

One can indirectly estimate the IFR from this calculator by supplying a value for the risk of catching the disease in the community.* I calculated the risk of death for a 45 year old man without comorbidities using both these calculators, and sure enough, the Economist’s calculator produced a risk that was an order of magnitude greater than that from the QCovid estimator.** (https://qcovid.org/Calculation; and https://www.economist.com/graphic-detail/covid-pandemic-mortality-risk-estimator) If only the correspondents (and the readers) of the Economist were to read this piece, they would feel less alarmed, and be more sceptical of their data journalists’ work.

Numbers do provide objective information that may be preferable to subjective and qualitative data. But not all numbers provide valid information. Still, I do think that on most occasions, armed with a little patience, common sense, and some middle-school math, we can avoid drawing misleading conclusions.

____________________________________________________________________

* The risk of catching and dying of COVID-19 = (the risk of catching COVID-19 in the community) × (the risk of dying of the infection once you’ve caught it, i.e., the IFR)

IFR = (the risk of catching and dying of COVID-19) / (the risk of catching COVID-19 in the community)

** For a 45 year old white male without comorbidities, QCovid estimates the risk of catching and dying of COVID-19 to be 0.0031%. If one estimates the risk of catching the disease in the community to be 10% (based on seroprevalence in the UK), the IFR turns out to be 0.031%. This is about a tenth of the estimate from the Economist’s calculator (0.3%). Theoretically, the two calculators would be estimating the same quantity only when your risk of catching COVID-19 is 100%!
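For anyone who wants to check the middle-school math, the footnote arithmetic can be written as a few lines of Python. This is only a sketch: the function and variable names are my own, the 10% community risk is the assumption stated above, and the other inputs are the figures quoted from the two calculators.

```python
def implied_ifr(risk_catch_and_die: float, risk_catch: float) -> float:
    """Back out the IFR from a combined 'catch and die' risk.

    Both arguments and the result are fractions (0.10 means 10%).
    """
    if not 0 < risk_catch <= 1:
        raise ValueError("risk_catch must be a fraction in (0, 1]")
    return risk_catch_and_die / risk_catch


# Figures quoted in the footnote, converted from percentages to fractions.
qcovid_combined = 0.0031 / 100   # QCovid: risk of catching AND dying
assumed_catch_risk = 10 / 100    # assumption: 10% risk of catching it
economist_risk = 0.3 / 100       # Economist calculator, same profile

ifr = implied_ifr(qcovid_combined, assumed_catch_risk)
ratio = economist_risk / ifr

print(f"Implied IFR from QCovid: {ifr * 100:.3f}%")
print(f"Economist estimate is about {ratio:.1f} times higher")
```

Setting `assumed_catch_risk` to 1.0 makes the implied IFR equal to QCovid's combined risk, which is the 100% case mentioned in the footnote. Changing the 10% assumption changes the implied IFR in inverse proportion, so the result is only as good as that guess.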
