On the trustworthiness of tests

Susan Brommer | 19 August 2020 | statistics

Test, test, test

In the midst of the corona crisis, the entire world is trying to control the virus’ spread through personal hygiene, social distancing and travel restrictions. But Dr Tedros Adhanom Ghebreyesus, Director-General of the World Health Organization, warns that this is not enough. He says that “we must break the chains of transmission.” To do that, we must do one thing: “Test, test, test.” It is not possible to fight a virus if we don’t know where it is. Or, in Dr Tedros’ words, “We cannot fight a fire blindfolded.”

There are lots of tests that can tell you whether you currently have an infection, or whether you had one in the past. But no medical test is perfect, and now and then a test will give an incorrect result. To express the trustworthiness of a test, scientists often use the terms Sensitivity and Specificity. A test can, for example, have a 99% Sensitivity and a 95% Specificity. This might seem like a good test, but don’t be fooled by the high percentages. These numbers do not necessarily mean the test is as trustworthy as you think.

What are these terms anyway?

Let’s first look at the different situations you can be in when you get tested. First, you either are Infected with the virus, or you are Healthy. Second, you either test Positive for the virus, or you test Negative. This gives four situations:

|                    | Actually Infected 🤒 | Actually Healthy 😀 |
|--------------------|----------------------|---------------------|
| Result: Positive ✔ | True Positive        | False Positive      |
| Result: Negative ✖ | False Negative       | True Negative       |

Back to Sensitivity and Specificity. Sensitivity denotes how many of the Infected people get a Positive test result. We want this number to be high: if you are Infected, you want the test to show that you are Infected.

Mathematically, Sensitivity is how many True Positives (TP) we find amongst the Infected (I). Say we test 100 Infected people, and 99 of them test Positive. Then the Sensitivity is 99 / 100 = 99%.

Sensitivity = TP / I

Specificity denotes how many of the Healthy people get a Negative test result. We want this number to be high too: if you are Healthy, you want the test to show that you are Healthy.

Mathematically, Specificity is how many True Negatives (TN) we find amongst the Healthy (H). Say we test 100 Healthy people, and 95 of them test Negative. Then the Specificity is 95 / 100 = 95%.¹

Specificity = TN / H
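
Written out as a minimal Python sketch, using the hypothetical counts from the two examples above (the variable names simply mirror the abbreviations TP, FN, TN, FP, I and H):

```python
# Hypothetical counts from the two examples above.
TP = 99   # True Positives:  Infected people with a Positive result
FN = 1    # False Negatives: Infected people with a Negative result
TN = 95   # True Negatives:  Healthy people with a Negative result
FP = 5    # False Positives: Healthy people with a Positive result

I = TP + FN   # all Infected people (100)
H = TN + FP   # all Healthy people (100)

sensitivity = TP / I   # fraction of Infected people the test catches
specificity = TN / H   # fraction of Healthy people the test clears

print(f"Sensitivity = {sensitivity:.0%}")   # Sensitivity = 99%
print(f"Specificity = {specificity:.0%}")   # Specificity = 95%
```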

Sounds legit to me. What’s wrong with it?

A test with 99% Sensitivity and 95% Specificity is a good test, right? Well, not necessarily. The information we actually want a test to give us is not expressed by Sensitivity or Specificity. Let’s explain this with an example.

Say we live in a population of 1000 people, of which 100 people are Infected and 900 people are Healthy. A test with 99% Sensitivity and 95% Specificity would give the following results:

|                    | Actually Infected 🤒 | Actually Healthy 😀 | Total   |
|--------------------|----------------------|---------------------|---------|
| Result: Positive ✔ | TP = 99              | FP = 45             | P = 144 |
| Result: Negative ✖ | FN = 1               | TN = 855            | N = 856 |
| Total              | I = 100              | H = 900             |         |

Now say you get tested and want to know the probability that the result is correct. Let’s assume your test result is Negative. We are thus interested in the probability that you are Healthy. The table shows that 856 people test Negative, of which 855 are Healthy. That is 855 / 856 = 99.9%. So you can be 99.9% sure you are Healthy if the test result is Negative. We call this probability the Negative Predictive Value, or NPV.

Now assume your test result is Positive. We are thus interested in the probability that you are Infected. The table shows that 144 people test Positive, and 99 of them are Infected. That is 99 / 144 = 69%. So you can only be 69% sure you are Infected. We call this probability the Positive Predictive Value, or PPV.

Think about this number: only 69% of the Positively tested people actually are Infected, and 31% are Healthy. That is roughly a 1 in 3 chance that you are Healthy, despite testing Positive. That is not very convincing, is it?

|                    | Actually Infected 🤒 | Actually Healthy 😀 | Total   |             |
|--------------------|----------------------|---------------------|---------|-------------|
| Result: Positive ✔ | TP = 99              | FP = 45             | P = 144 | PPV = 69%   |
| Result: Negative ✖ | FN = 1               | TN = 855            | N = 856 | NPV = 99.9% |
| Total              | I = 100              | H = 900             |         |             |
|                    | Sensitivity = 99%    | Specificity = 95%   |         |             |
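
As a quick sanity check, the whole table can be rebuilt in a few lines of Python from just the population size, the number of Infected people, the Sensitivity and the Specificity. All numbers below are the hypothetical ones from the example above.

```python
# Hypothetical numbers from the example: 1000 people, 100 of them Infected,
# and a test with 99% Sensitivity and 95% Specificity.
population = 1000
infected = 100
healthy = population - infected

sensitivity = 0.99
specificity = 0.95

TP = sensitivity * infected   # Infected people who test Positive:  99
FN = infected - TP            # Infected people who test Negative:   1
TN = specificity * healthy    # Healthy people who test Negative:  855
FP = healthy - TN             # Healthy people who test Positive:   45

P = TP + FP                   # all Positive results: 144
N = TN + FN                   # all Negative results: 856

PPV = TP / P                  # probability you are Infected, given a Positive result
NPV = TN / N                  # probability you are Healthy, given a Negative result

print(f"PPV = {PPV:.1%}")     # PPV = 68.8%, the roughly 69% from the text
print(f"NPV = {NPV:.1%}")     # NPV = 99.9%
```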

How does this work?

So we have all these high probabilities: 99% Sensitivity, 95% Specificity, and a 99.9% Negative Predictive Value. Why, then, is the Positive Predictive Value of 69% so much lower? The explanation lies in the infection’s prevalence: the probability that you are Infected before you know anything about your test result.

Of the 1000 people, 900 are Healthy. This means that without a test, you would already have a 900 / 1000 = 90% probability of being Healthy. A Negative test result bumps that number up a little to 99.9%. But it was already pretty likely that you were Healthy.

On the other hand, of the 1000 people, 100 are Infected. This means that without a test, you would only have a 100 / 1000 = 10% probability of being Infected. A Positive test result bumps that number way up to 69%. So knowing the test is Positive makes it a lot more likely that you are Infected. But there is still a fair chance you are not Infected at all, because the probability that you were Infected was very low to start with.
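
One way to make this dependence on the prevalence explicit is to write the Positive Predictive Value directly in terms of the prevalence, the Sensitivity and the Specificity. This is just Bayes’ theorem applied to the quantities above:

PPV = (Sensitivity × prevalence) / (Sensitivity × prevalence + (1 − Specificity) × (1 − prevalence))

Filling in the numbers of our example, with a prevalence of 10%, gives 0.99 × 0.10 / (0.99 × 0.10 + 0.05 × 0.90) = 0.099 / 0.144 = 69%, exactly the Positive Predictive Value we found in the table.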

So we should use the NPV and PPV instead of Sensitivity and Specificity?

You might wonder why we even use Sensitivity and Specificity if we are only interested in the Negative and Positive Predictive Value. This is because the Specificity and Sensitivity do not depend on the prevalence of the infection. No matter how many people in the population are Infected, the Sensitivity and Specificity stay the same. In contrast, Negative and Positive Predictive Value depend on how many people are Infected or Healthy.

Let’s see this in action. Our previous example showed that only 69% of the Positively tested people were Infected. Now look at the table below, where not 100, but 500 people in the population are Infected. The Sensitivity and Specificity stay the same. However, out of the 520 people that test Positive, 495 are Infected. That is 495 / 520 = 95%, which is much more than the 69%. So a Positive result now makes you 95% sure that you are Infected, because it was already quite likely that you were Infected before taking the test.

|                    | Actually Infected 🤒 | Actually Healthy 😀 | Total   |            |
|--------------------|----------------------|---------------------|---------|------------|
| Result: Positive ✔ | TP = 495             | FP = 25             | P = 520 | PPV = 95%  |
| Result: Negative ✖ | FN = 5               | TN = 475            | N = 480 | NPV = 99%  |
| Total              | I = 500              | H = 500             |         |            |
|                    | Sensitivity = 99%    | Specificity = 95%   |         |            |
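
To see the effect of the prevalence in isolation, here is a short Python sketch that keeps the Sensitivity and Specificity fixed at 99% and 95% and only varies the prevalence. The 10% and 50% rows reproduce the two tables above; the 1% row is an extra, purely illustrative value that shows how a low prevalence drags the Positive Predictive Value down even further.

```python
# Fixed test characteristics from the examples above.
sensitivity = 0.99
specificity = 0.95

def predictive_values(prevalence):
    """Return (PPV, NPV) for a given prevalence, with the test held fixed."""
    tp = sensitivity * prevalence              # Infected and Positive
    fp = (1 - specificity) * (1 - prevalence)  # Healthy but Positive
    tn = specificity * (1 - prevalence)        # Healthy and Negative
    fn = (1 - sensitivity) * prevalence        # Infected but Negative
    return tp / (tp + fp), tn / (tn + fn)

# 1% is purely illustrative; 10% and 50% match the first and second example.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(prevalence)
    print(f"prevalence {prevalence:>4.0%}:  PPV = {ppv:5.1%},  NPV = {npv:6.2%}")
```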

Summarising: even though the Negative and Positive Predictive Values are what we are ultimately interested in, they are not good measures of the trustworthiness of a test, because they change with the prevalence of the infection. Sensitivity and Specificity, however, do not depend on the prevalence. These values depend only on the test itself.

So 69% is the best we can do?

In most countries the fraction of people infected with the corona virus is closer to the 10% of our first example than to the 50% of our second example. The question is: is 69% the best we can do with a test that has 99% Sensitivity and 95% Specificity? Luckily, it is not. There are several things we can do to be more certain of a test result. Let’s briefly go over them:

Often, when your test result comes back Positive, you get tested a second time. If that test also comes back Positive, you can be more certain you are Infected. Specifically, in our first example, a second Positive test result makes the probability that you are Infected go up from 69% to 90%.²

Another thing we could do is take into account more information. A nurse working with Infected people, who also just came back from a holiday in a risk area, and has a Positively tested spouse, has a higher chance of being Infected than a hermit who has not been in contact with any living being for the past weeks. You could say this information changes the prevalence that the Negative and Positive Predictive Values depend on.

Last, we should remind ourselves that numbers are not the whole story. The numbers are used to decide what we do, so we should take the consequences of our actions into account. You might want to be more than 69% sure you are Infected before you start an expensive and risky treatment. But 69% sure is enough to decide not to visit your grandparents for a few weeks, so as not to risk infecting them.

Footnotes

  1. Often there is a trade-off between Sensitivity and Specificity. A test can be made more sensitive by letting it give a Positive result more easily. This means the number of True Positives goes up, and so the Sensitivity goes up. But giving Positive results more easily also tends to increase the number of False Positives, which means the Specificity goes down. And vice versa.

  2. This is not entirely true, since the two test results may not be completely independent: there might be an underlying factor that causes the test to come back Positive both times.