Loft of an Eldritch Metaphor

Not Quite 30 Mutations

Posted on 29 Apr 2020

I sent out this tweet a few hours ago.

should I rant?

Needless to say, it started gaining traction and friends asked me to explain what it actually means and what it does not actually mean.

First, let’s head over to the article in question on Astro Awani and stop at the first paragraph, because there are problems already. I do not wish to say bad things about KKM (Kementerian Kesihatan Malaysia, the Ministry of Health) and the Institute for Medical Research (IMR). These are all honest people working around the clock.

Holy mad cow, 30 strain mutations. We are seriously doomed, and the world is going to end soon. Or is it? Let’s do a cursory analysis to trace how this “30 strain mutations” claim went viral.

Yep, it definitely went viral. I do not fault the media for this, because I understand that it is important to provide information to the public in a timely manner, and sometimes it is just challenging to find an expert with domain-level knowledge. I get that. So I would like to use this opportunity to explain why the claim of “30 strain mutations” is inaccurate and misleading.

So, where did it come from?

I found this news article in the South China Morning Post (SCMP for short), which went online on 20 Apr 2020: Coronavirus’s ability to mutate has been vastly underestimated, and mutations affect deadliness of strains, Chinese study finds. I found the right keywords here:

“Li’s team detected more than 30 mutations”.

Sadly and frustratingly, SCMP does not provide a link to the actual preprint. This is not the first time SCMP has done this, and it always pains me to spend 5–10 min finding a paper when the editor or writer over there could easily have linked it in the article.

If you want to be a cool journalist, or if you are cool already and would like to be cooler, please cite your source, especially if it came from a scientific paper.

I found the paper on medRxiv: Yao et al. 2020, Patient-derived mutations impact pathogenicity of SARS-CoV-2. The first thing not to be alarmed about is that the medRxiv page says the paper was published on 23 Apr 2020, while the SCMP article came out on 20 Apr 2020. If you look closely at the link, it ends in v2, meaning this is the second version of the preprint. According to my notes, I first found and read the paper on 22 Apr 2020, so the first version must have gone out at least 1–2 days before I first read it.

Let’s compare what SCMP says and what the preprint paper says.

SCMP says:

Li took an unusual approach to investigate the virus mutation. She analysed the viral strains isolated from 11 randomly chosen Covid-19 patients from Hangzhou in the eastern province of Zhejiang, and then tested how efficiently they could infect and kill cells.

Ah, an unusual approach. What might that be? Here is what they did:

  1. Characterized 11 SARS-CoV-2 viral isolates from infected patients.
  2. Performed super-deep sequencing on the 11 viral isolates to identify mutations.
  3. Infected cells in vitro (cell line: Vero-E6 from African green monkey) and looked at the virus several hours after infection.
  4. Yeah. That’s it.

There is nothing unusual about the approach, but it is certainly unusual to use it to claim that the viruses they isolated are different, dangerous, or even important to look at!

So, what is the problem, you ask?

First, the number of samples (denoted as N) is rather small. Eleven samples cannot tell us much.
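To put a number on how little 11 samples can tell us, here is a minimal sketch of the 95% Wilson score interval for a proportion. The counts are hypothetical (a feature seen in 5 of 11 isolates, compared with the same 45% rate in 110 isolates) and are not data from the paper; the function name is made up for illustration.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Hypothetical: the same observed rate with a small and a ten-times-larger sample
for k, n in [(5, 11), (50, 110)]:
    low, high = wilson_interval(k, n)
    print(f"{k}/{n}: 95% CI {low:.0%} to {high:.0%}")
```

With 11 samples the interval runs from roughly 21% to 72%, so almost any underlying rate is compatible with the data; with 110 samples it shrinks to roughly 36% to 55%.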

Second, they tested the 11 viral isolates on the Vero-E6 cell line. We scientists use many different cell lines to model human disease and pathology, and oftentimes we use cell lines from animals for several reasons. For my flu research, I often use MDCK (Madin-Darby Canine Kidney) cells, from a dog, because that is the standard. Vero-E6 is currently the standard for isolating, growing, and studying SARS-CoV-2.

But is it the best cell line to model human disease? Of course not. You cannot infer what is happening in a human lung by looking at what is happening in cells that originated from an African green monkey’s kidney. It is like comparing apples to durians. Furthermore, Vero cells cannot produce type I interferon, one of the first lines of the innate immune response. Think of it like a virus barging into a house with no doors or windows and money lying on the floor, begging to be stolen. The virus can stroll gleefully into the house, take everything, and leave without putting in much effort.

A healthy cell should have a robust innate immune response. The second it realizes it is being infected, it does everything it can to stop the virus from gaining control. You wouldn’t even realize it is happening.

For the biomedical students out there: this study lacks the evidence to support its claim. The authors used RNA copy number as their readout for increased pathogenicity, not infectious titer (e.g. TCID50, the 50% tissue culture infectious dose). In other words, they looked at how much viral RNA was produced after infection, but not at how many of those virus particles are actually infectious and can go on to cause another round of infection.
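For contrast with an RNA readout, here is a minimal sketch of the Reed-Muench method, one classic way to estimate a TCID50 endpoint: you inoculate a dilution series, score each well for infection (CPE), and interpolate the dilution at which half the wells would be infected. This is not what the authors did, the plate counts are made up, and the function name is hypothetical.

```python
def reed_muench_log10_tcid50(log10_dilutions, infected, total):
    """Estimate log10(TCID50 per inoculum volume) with the Reed-Muench method.

    log10_dilutions: dilution exponents from least to most dilute,
                     e.g. [1, 2, 3] for 10^-1, 10^-2, 10^-3.
    infected: wells showing CPE at each dilution.
    total: wells inoculated at each dilution.
    """
    n = len(infected)
    uninfected = [t - i for t, i in zip(total, infected)]
    # A well infected at a high dilution would also be infected at any lower
    # dilution, so infected counts accumulate toward the concentrated end...
    cum_inf = [sum(infected[k:]) for k in range(n)]
    # ...and uninfected counts accumulate toward the dilute end.
    cum_uninf = [sum(uninfected[: k + 1]) for k in range(n)]
    frac = [ci / (ci + cu) for ci, cu in zip(cum_inf, cum_uninf)]
    for k in range(n - 1):
        if frac[k] >= 0.5 > frac[k + 1]:
            # Proportionate distance between the two dilutions bracketing 50%
            pd = (frac[k] - 0.5) / (frac[k] - frac[k + 1])
            step = log10_dilutions[k + 1] - log10_dilutions[k]
            return log10_dilutions[k] + pd * step
    raise ValueError("the 50% endpoint is not bracketed by this dilution series")


# Hypothetical plate: 8 wells per tenfold dilution from 10^-1 to 10^-6
log_titer = reed_muench_log10_tcid50([1, 2, 3, 4, 5, 6], [8, 8, 8, 5, 1, 0], [8] * 6)
print(f"log10 TCID50 per inoculum volume: {log_titer:.2f}")  # 4.29
```

The point is that this kind of assay counts infectious units directly, which an RNA copy number cannot do on its own.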

But but, 30 mutations!!

The genome of the coronavirus is about 30,000 nucleotides long, so 30 mutations amount to roughly 0.1% of the whole genome. That alone is not significant. Moreover, changes in nucleotides do not necessarily lead to changes in the amino acid sequence, because the genetic code is redundant: most amino acids are encoded by more than one codon. You need changes at the amino acid level to have a real impact on the structure and function of the protein and, to an extent, of the virus itself. Explaining this properly would require a 4-hour lecture on molecular biology, so for now take my word for it.

For changes to be important and worth looking at, they must occur in the right part of the genome (for example, in the RNA-dependent RNA polymerase (RdRp) or in the receptor-binding domain (RBD) of the spike (S) protein), and they must be functionally meaningful. Changing an alanine to a valine, for example, would probably lead to nothing, since both are small hydrophobic amino acids.
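To make the last two paragraphs concrete, here is a minimal sketch using the standard genetic code (NCBI translation table 1). The function name is made up for illustration and the codons are arbitrary examples, not mutations from the paper. It shows the 0.1% arithmetic and how a single-nucleotide change can leave the amino acid unchanged (synonymous) or swap it, e.g. alanine to valine (missense).

```python
# Standard genetic code (NCBI table 1): codons enumerated in T, C, A, G order
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # "*" marks a stop codon


def classify_point_mutation(codon, position, new_base):
    """Classify a single-nucleotide change at position 0, 1 or 2 of one codon."""
    codon = codon.upper()
    mutant = codon[:position] + new_base.upper() + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        kind = "synonymous"
    elif after == "*":
        kind = "nonsense"
    elif before == "*":
        kind = "stop-loss"
    else:
        kind = "missense"
    return before, after, kind


GENOME_LENGTH = 30_000  # approximate genome size used in the text
print(f"30 mutations = {30 / GENOME_LENGTH:.1%} of the genome")  # 0.1%
print(classify_point_mutation("GCT", 2, "C"))  # ('A', 'A', 'synonymous')
print(classify_point_mutation("GCT", 1, "T"))  # ('A', 'V', 'missense')
```

Third-position changes like the first example are often silent, which is one reason a raw count of nucleotide mutations says little about function.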

Update, 30 Apr 2020, 12:15 AM (US Eastern Time)

I went looking for more and I found this:

They found changes at the nucleotide level, which may or may not affect the amino acids they code for. Further down, they mention that one of those mutations (the nucleotide change A22301C) causes an amino acid change in the spike protein (S247R, in the S1 region but not within the RBD), which could affect antigenicity. But again, this is only one amino acid change. You would probably need a couple more to substantially alter the antigenicity of the virus.
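The numbers in that sentence can be cross-checked with a little arithmetic. This sketch assumes the NC_045512.2 (Wuhan-Hu-1) reference annotation, where the spike ORF spans nucleotides 21563 to 25384, and uses approximate domain boundaries for S1 (residues 14 to 685) and the RBD (319 to 541); published boundaries differ by a few residues, and the helper names are made up for illustration.

```python
# Assumption: NC_045512.2 (Wuhan-Hu-1) annotation of the spike (S) ORF, 1-based coordinates
SPIKE_START, SPIKE_END = 21563, 25384
# Assumption: approximate residue ranges; published definitions vary by a few residues
S1_RANGE = (14, 685)
RBD_RANGE = (319, 541)

# Only the codons needed here, taken from the standard genetic code
CODONS = {"AGT": "S", "AGC": "S", "CGT": "R", "CGC": "R"}


def genome_to_spike_residue(genome_pos):
    """Map a genome coordinate to (spike residue number, position within its codon, 1-3)."""
    if not SPIKE_START <= genome_pos <= SPIKE_END:
        raise ValueError("position is outside the spike ORF")
    offset = genome_pos - SPIKE_START
    return offset // 3 + 1, offset % 3 + 1


def in_range(residue, bounds):
    low, high = bounds
    return low <= residue <= high


residue, codon_pos = genome_to_spike_residue(22301)
print(residue, codon_pos)  # 247 1
print("S1:", in_range(residue, S1_RANGE), "RBD:", in_range(residue, RBD_RANGE))  # S1: True RBD: False

# An A->C change at the first codon position turns either serine codon starting with A into arginine
for codon in ("AGT", "AGC"):
    mutant = "C" + codon[1:]
    print(codon, CODONS[codon], "->", mutant, CODONS[mutant])
```

Residue 247, first codon position, and a serine-to-arginine change from A to C are all consistent with each other. That is a quick sanity check on the numbers, not a verification of the authors’ sequencing.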

Minor points

  1. They used analyses from Nextstrain but did not cite it as the source in their figures, nor could I find any reference to Bedford et al., the researchers behind Nextstrain. Look at their Figures S3 and S8.
  2. Do people use CPE (cytopathic effect) as a quantitative measure? I don’t think that is appropriate. You need something like a plaque assay for that.
  3. Open the PDF version of the preprint, hit Ctrl+F to bring up the find dialog, and search for “30 mutations”. Yes, that’s right: you will find nothing. SCMP, your pants are on fire, and no thanks for wasting my time.
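For readers who have not run a plaque assay: instead of eyeballing CPE, you count discrete plaques (each ideally started by one infectious particle) and convert the count into a titer. A minimal sketch of that standard calculation follows; the plaque count, dilution, and plated volume are hypothetical, and the function name is made up for illustration.

```python
def plaque_titer_pfu_per_ml(plaque_count, dilution, volume_ml):
    """Titer (PFU/mL) = plaques counted / (dilution plated x volume plated in mL)."""
    if plaque_count < 0 or not 0 < dilution <= 1 or volume_ml <= 0:
        raise ValueError("invalid plaque count, dilution, or volume")
    return plaque_count / (dilution * volume_ml)


# Hypothetical well: 42 plaques from 0.1 mL of a 10^-5 dilution
print(f"{plaque_titer_pfu_per_ml(42, 1e-5, 0.1):.2e} PFU/mL")  # 4.20e+07 PFU/mL
```

In practice you would count wells with a workable number of plaques (too many and they merge, too few and the count is noisy) and average replicates.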