24 x 7 World News

Did a Chinese team ‘obscure’ early coronavirus sequences?

Science’s COVID-19 reporting is supported by the Heising-Simons Foundation

Many of the first COVID-19 cases were linked to a seafood market (center) in Wuhan, China, but a new analysis suggests a different strain of SARS-CoV-2 caused early cases elsewhere.

PHOTO: KYODO/AP IMAGES

In a world starved for data to clarify the origin of COVID-19, a study claiming to have unearthed early sequences of SARS-CoV-2 that were deliberately hidden was bound to ignite a sizzling debate. Last week a preprint by evolutionary biologist Jesse Bloom of the Fred Hutchinson Cancer Research Center asserted that Chinese researchers sampled viruses from some of the first COVID-19 patients in Wuhan, China, posted the viral sequences to a National Institutes of Health (NIH) database, and then later had the genetic information removed to “obscure their existence.”

The uproar over the preprint led Senator Josh Hawley (R–MO) to demand answers from NIH on why the agency agreed to “purge” the data and to call for an investigation into the matter. Even for some scientists, Bloom’s work reinforced suspicions that the Chinese government has tried to hide how the pandemic started. “This is a creative and rigorous approach to investigating the provenance of SARS-CoV-2,” says Ian Lipkin, a microbiologist at Columbia University’s Mailman School of Public Health. “There may have been active suppression of epidemiological and sequence data needed to track its origin.”

But critics of Bloom’s bioRxiv preprint call his detective work much ado about nothing, because the Chinese scientists later published the viral information in a different form, and the recovered sequences may add little to the origin hunt. “The idea that the group was trying to hide something is farcical,” says evolutionary biologist Andrew Rambaut at the University of Edinburgh.

Bloom says he has no bias toward a particular origin hypothesis for SARS-CoV-2, and he agrees that the 13 partial sequences he recovered don’t resolve whether the virus originally jumped to humans from an unknown animal or somehow leaked from a Wuhan virology center. “I don’t think this bolsters either the lab origin or zoonosis hypothesis.”

Chinese health officials on 31 December 2019 tied the Huanan Seafood Market in Wuhan to an outbreak of an “unexplained pneumonia,” but within a month it had become clear that many of the earliest COVID-19 cases had no link to the market. Bloom, who studies viral evolution, set out to investigate early cases after a controversial report on the pandemic’s origin issued in March by a joint commission of Chinese and foreign researchers overseen by the World Health Organization. The report deemed it “extremely unlikely” that SARS-CoV-2 escaped from a lab; Bloom helped organize a much-discussed letter, cosigned by 17 other scientists, criticizing that conclusion and calling for further investigation.

Bloom wanted to do his own analyses of the viruses detected in the earliest cases, which led him to a list of all SARS-CoV-2 sequences submitted before 31 March 2020 to the Sequence Read Archive (SRA), an NIH database. But when he checked the SRA for one of the listed projects, he couldn’t find its sequences. Googling some of the project’s information, he found a study, led by Ming Wang from Wuhan University’s Renmin Hospital, that had been posted as a preprint on 6 March 2020 on medRxiv and published in June of that year in Small, a journal little known to virologists. That paper lists some of the earliest COVID-19 patients in Wuhan and the specific mutations in their viruses, but doesn’t give the full sequence data. Further internet sleuthing led Bloom to discover that the SRA backs up its information in Google’s Cloud platform, and a search there turned up files containing some of the Wang team’s earlier data submissions.

The Small paper mentions no corrections to the viral sequences that might explain why they were removed from the SRA, which led Bloom to conclude that “the trusting structures of science have been abused to obscure sequences relevant to the early spread of SARS-CoV-2 in Wuhan.” (In the wake of criticisms of the initial preprint, Bloom toned down this sentence and other accusatory language.) Bloom acknowledges SARS-CoV-2 sequences can be derived from the data in the Small paper, but he says most virologists expect to be able to download whole viral genomes from a database like the SRA.

Several authors of the Small paper did not reply to requests for comment, but NIH last week noted in a statement that it removed the SRA sequences at the request of the submitting investigator, who the agency says holds the rights to the data. Bloom added NIH’s email exchange to his revised preprint. “I have submitted an updated version of this SRA data to another website,” reads a 15 June email sent to NIH from a Wuhan University researcher whose name was redacted. Yet Bloom says he cannot find the sequences in any other virology database.

Bloom asserts that because the deleted sequences lack three mutations seen in COVID-19 cases linked to the seafood market, the patient viruses Wang’s team analyzed more likely represent a progenitor SARS-CoV-2. But Rambaut says the differences Bloom highlighted are too few to distinguish the “roots” of the SARS-CoV-2 family tree.

Leaving aside the meaning of the sequences Bloom found, the demonstration that researchers can potentially find “new” SARS-CoV-2 data in the cloud is an exciting advance and may prompt similar sleuthing, says genomicist Sudhir Kumar of Temple University, who has also analyzed early SARS-CoV-2 sequences. “Many people feel that there is a lot more Chinese data out there, and they don’t have access to it.”
