Genomic Data Compression

2019 ◽  
Vol 2 (1) ◽  
pp. 19-37 ◽  
Author(s):  
Mikel Hernaez ◽  
Dmitri Pavlichin ◽  
Tsachy Weissman ◽  
Idoia Ochoa

Recently, there has been growing interest in genome sequencing, driven by advances in sequencing technology, in terms of both efficiency and affordability. These developments have allowed many to envision whole-genome sequencing as an invaluable tool for both personalized medical care and public health. As a result, increasingly large and ubiquitous genomic data sets are being generated. This poses a significant challenge for the storage and transmission of these data. Already, it is more expensive to store genomic data for a decade than it is to obtain the data in the first place. This situation calls for efficient representations of genomic information. In this review, we emphasize the need for designing specialized compressors tailored to genomic data and describe the main solutions already proposed. We also give general guidelines for storing these data and conclude with our thoughts on the future of genomic formats and compressors.

mBio ◽  
2016 ◽  
Vol 7 (3) ◽  
Author(s):  
David M. Aanensen ◽  
Edward J. Feil ◽  
Matthew T. G. Holden ◽  
Janina Dordel ◽  
Corin A. Yeats ◽  
...  

ABSTRACT The implementation of routine whole-genome sequencing (WGS) promises to transform our ability to monitor the emergence and spread of bacterial pathogens. Here we combined WGS data from 308 invasive Staphylococcus aureus isolates corresponding to a pan-European population snapshot, with epidemiological and resistance data. Geospatial visualization of the data is made possible by a generic software tool designed for public health purposes that is available at the project URL (http://www.microreact.org/project/EkUvg9uY?tt=rc). Our analysis demonstrates that high-risk clones can be identified on the basis of population level properties such as clonal relatedness, abundance, and spatial structuring and by inferring virulence and resistance properties on the basis of gene content. We also show that in silico predictions of antibiotic resistance profiles are at least as reliable as phenotypic testing. We argue that this work provides a comprehensive road map illustrating the three vital components for future molecular epidemiological surveillance: (i) large-scale structured surveys, (ii) WGS, and (iii) community-oriented database infrastructure and analysis tools. IMPORTANCE The spread of antibiotic-resistant bacteria is a public health emergency of global concern, threatening medical intervention at every level of health care delivery. Several recent studies have demonstrated the promise of routine whole-genome sequencing (WGS) of bacterial pathogens for epidemiological surveillance, outbreak detection, and infection control. However, as this technology becomes more widely adopted, the key challenges of generating representative national and international data sets and the development of bioinformatic tools to manage and interpret the data become increasingly pertinent. This study provides a road map for the integration of WGS data into routine pathogen surveillance. 
We emphasize the importance of large-scale routine surveys to provide the population context for more targeted or localized investigation and the development of open-access bioinformatic tools to provide the means to combine and compare independently generated data with publicly available data sets.


2019 ◽  
Author(s):  
Silvia Argimón ◽  
Melissa A. L. Masim ◽  
June M. Gayeta ◽  
Marietta L. Lagrada ◽  
Polle K. V. Macaranas ◽  
...  

Abstract Drug-resistant bacterial infections constitute a growing threat to public health globally [1]. National networks of laboratory-based surveillance of antimicrobial resistance (AMR) monitor the emergence and spread of resistance and are central to the dissemination of these data to AMR stakeholders [2]. Whole-genome sequencing (WGS) can support these efforts by pinpointing resistance mechanisms and uncovering transmission patterns [3, 4]. However, genomic surveillance is rare in low- and middle-income countries (LMICs), which are predicted to be the most affected by AMR [5]. We implemented WGS within the established Antimicrobial Resistance Surveillance Program (ARSP) of the Philippines via ongoing technology transfer, capacity building, and binational collaboration. In parallel, we conducted an initial large-scale retrospective sequencing survey to characterize bacterial populations and dissect resistance phenotypes of key bug-drug combinations, which is the focus of this article. Starting in 2010, the ARSP phenotypic data indicated increasing carbapenem resistance rates for Pseudomonas aeruginosa, Acinetobacter baumannii, Klebsiella pneumoniae and Escherichia coli. We first identified that this coincided with a marked expansion of specific resistance phenotypes. By then linking the resistance phenotypes to genomic data, we revealed the diversity of genetic lineages (strains), AMR mechanisms, and AMR vehicles underlying this expansion. We discovered a previously unreported plasmid-driven hospital outbreak of carbapenem-resistant K. pneumoniae, uncovered the interplay of carbapenem resistance genes and plasmids in the geographic circulation of epidemic K. pneumoniae ST147, and found that carbapenem-resistant E. coli ST410 consisted of diverse lineages of global circulation that carried both international and local plasmids, resulting in a combination of carbapenemase gene variants previously unreported for this organism. 
Thus, the WGS data provided an enhanced understanding of the interplay between strains, genes and vehicles driving the dissemination of carbapenem resistance in the Philippines. In addition, our retrospective survey served both as the genetic background to contextualize local prospective surveillance, and as a comprehensive dataset for training in bioinformatics and genomic epidemiology. Continued prospective sequencing, capacity building and collaboration will strengthen genomic surveillance of AMR in the Philippines and the translation of genomic data into public-health action. We generated a blueprint for the integration of WGS and genomic epidemiology into an established national system of laboratory-based surveillance of AMR through international collaboration that can be adapted and utilized within other locations to tackle the global challenge of AMR.


2021 ◽  
Vol 9 (5) ◽  
pp. 955
Author(s):  
Linda Chui ◽  
Christina Ferrato ◽  
Vincent Li ◽  
Sara Christianson

Salmonella surveillance and outbreak management are key functions of public health. Laboratories are shifting from antigenic serotype determination to molecular methods including microarray or whole genome sequencing technologies. The objective of this study was to compare the Check&Trace Salmonella™ DNA microarray (CTS), a commercially available assay, with the Salmonella in silico typing resource (SISTR), which uses whole genome sequencing technology, for serotyping clinical Salmonella strains in Alberta, Canada, collected over an 18-month period. A high proportion of isolates (96.3%) were successfully typed by both systems. SISTR is a powerful tool for laboratories which already have a WGS infrastructure in place, whereas smaller laboratories can benefit from a commercial microarray system and reduce the processing cost per isolate compared to traditional serotyping.


2020 ◽  
Author(s):  
Femke Wolters ◽  
Jordy P.M. Coolen ◽  
Alma Tostmann ◽  
Lenneke F.J. van Groningen ◽  
Chantal P. Bleeker-Rovers ◽  
...  

Abstract. Background: Current transmission rates of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) are still increasing and many countries are facing second waves of infections. Rapid SARS-CoV-2 whole-genome sequencing (WGS) is often unavailable but could support public health organizations and hospitals in monitoring and determining transmission links. Here we report the use of reverse complement polymerase chain reaction (RC-PCR), a novel technology for WGS of SARS-CoV-2 that enables library preparation in a single PCR, saving time and resources and enabling high-throughput screening. Additionally, we show SARS-CoV-2 diversity and possible transmission within the Radboud university medical center (Radboudumc) during September 2020 using RC-PCR WGS. Methods: A total of 173 samples that tested positive for SARS-CoV-2 between March and September 2020 were selected for whole-genome sequencing. Ct values of the samples ranged from 16 to 42. They were collected from 83 healthcare workers and three patients at the Radboudumc, in addition to 64 people living in the area around the hospital and tested by the local health services. For validation purposes, nineteen of the included samples were previously sequenced using Oxford Nanopore Technologies and compared to RC-PCR WGS results. The applicability of RC-PCR WGS in outbreak analysis for public health service and hospitals was tested on six suspected clusters containing samples of healthcare workers and patients with an epidemiological link. Findings: RC-PCR resulted in sequencing data for 146 samples. It showed a genome coverage of up to 98.2% for samples with a maximum Ct value of 32. Comparison with Oxford Nanopore Technologies sequencing showed near-perfect agreement for 95% of the samples (18 out of 19). Three out of six clusters with a suspected epidemiological link were fully confirmed; in the others, four healthcare workers were not associated. 
In the public health service samples, a previously unknown chain of transmission was confirmed. Significance statement: SARS-CoV-2 whole-genome sequencing using RC-PCR is a reliable technique and applicable for use in outbreak analysis and surveillance. Its ease of use, high-throughput screening capacity and wide applicability make it a valuable addition or replacement during this ongoing SARS-CoV-2 pandemic. Funding: None. Research in context. Evidence before this study: At present, whole genome sequencing techniques for SARS-CoV-2 have a long turnaround time and are not widely available. Only a few laboratories are currently able to perform large scale SARS-CoV-2 sequencing. This restricts the use of sequencing to aid hospital and community infection prevention. Added value of this study: Here we present clinical and technical data on a novel whole genome sequencing technology, implementing reverse-complement PCR. It is able to obtain high genome coverage of SARS-CoV-2 and confirm and exclude epidemiological links in 173 healthcare workers and patients. The RC-PCR technology simplifies the workflow, thereby reducing hands-on time. It combines targeted PCR and sequence library construction in a single PCR, which normally takes several steps. Additionally, this technology can be used in conjunction with the widely available range of Illumina sequencers. Implications of all the available evidence: RC-PCR whole genome sequencing technology enables rapid and targeted surveillance and response to an ongoing outbreak that has great impact on public health and society. Increased use of sequencing technologies in local laboratories can help limit the spread of SARS-CoV-2 through a better understanding of modes of transmission.


2020 ◽  
Vol 58 (4) ◽  
Author(s):  
Ellen N. Kersh ◽  
Cau D. Pham ◽  
John R. Papp ◽  
Robert Myers ◽  
Richard Steece ◽  
...  

ABSTRACT U.S. gonorrhea rates are rising, and antibiotic-resistant Neisseria gonorrhoeae (AR-Ng) is an urgent public health threat. Since implementation of nucleic acid amplification tests for N. gonorrhoeae identification, the capacity for culturing N. gonorrhoeae in the United States has declined, along with the ability to perform culture-based antimicrobial susceptibility testing (AST). Yet AST is critical for detecting and monitoring AR-Ng. In 2016, the CDC established the Antibiotic Resistance Laboratory Network (AR Lab Network) to shore up the national capacity for detecting several resistance threats including N. gonorrhoeae. AR-Ng testing, a subactivity of the CDC’s AR Lab Network, is performed in a tiered network of approximately 35 local laboratories, four regional laboratories (state public health laboratories in Maryland, Tennessee, Texas, and Washington), and the CDC’s national reference laboratory. Local laboratories receive specimens from approximately 60 clinics associated with the Gonococcal Isolate Surveillance Project (GISP), enhanced GISP (eGISP), and the program Strengthening the U.S. Response to Resistant Gonorrhea (SURRG). They isolate and ship up to 20,000 isolates to regional laboratories for culture-based agar dilution AST with seven antibiotics and for whole-genome sequencing of up to 5,000 isolates. The CDC further examines concerning isolates and monitors genetic AR markers. During 2017 and 2018, the network tested 8,214 and 8,628 N. gonorrhoeae isolates, respectively, and the CDC received 531 and 646 concerning isolates and 605 and 3,159 sequences, respectively. In summary, the AR Lab Network supported the laboratory capacity for N. gonorrhoeae AST and associated genetic marker detection, expanding preexisting notification and analysis systems for resistance detection. Continued, robust AST and genomic capacity can help inform national public health monitoring and intervention.


2021 ◽  
pp. 1-12
Author(s):  
Holly Etchegary ◽  
Daryl Pullman ◽  
Charlene Simmonds ◽  
Zoha Rabie ◽  
Proton Rahman

Introduction: The growth of global sequencing initiatives and commercial genomic test offerings suggests the public will increasingly be confronted with decisions about sequencing. Understanding public attitudes can assist efforts to integrate sequencing into care and inform the development of public education and outreach strategies. Methods: A 48-item online survey was advertised on Facebook in Eastern Canada and hosted on SurveyMonkey in late 2018. The survey measured public interest in whole genome sequencing and attitudes toward various aspects of sequencing using vignettes, scaled items, and open-ended items. Results: While interest in sequencing was high, critical attitudes were observed. In particular, items measuring features of patient control and choice regarding genomic data were strongly endorsed by respondents. The majority wanted to specify upfront how their data could be used, retain the ability to withdraw their sample at a later date, sign a written consent form, and speak to a genetic counselor prior to sequencing. Concerns about privacy and unauthorized access to data were frequently observed. Education level was the sociodemographic variable most often related to attitude statements, such that those with higher levels of education generally displayed more critical attitudes. Conclusions: Attitudes identified here could be used to inform the development of implementation strategies for genomic medicine. Findings suggest health systems must address patient concerns about privacy, consent practices, and the strong desire to control what happens to their genomic data through public outreach and education. Specific oversight procedures and policies that are clearly communicated to the public will be required.


2021 ◽  
Vol 12 ◽  
Author(s):  
Kwan Woo Kim ◽  
Sungmi Choi ◽  
Su-Kyoung Shin ◽  
Imchang Lee ◽  
Keun Bon Ku ◽  
...  

Recent coronavirus (CoV) outbreaks, including that of Middle East respiratory syndrome (MERS), have presented a threat to public health worldwide. A primary concern in these outbreaks is the extent of mutations in the CoV and the content of viral variation, which can be determined only by whole genome sequencing (WGS). We aimed to develop a time-efficient WGS protocol, using universal primers spanning the entire MERS-CoV genome. MERS and synthetic Neoromicia capensis bat CoV genomes were successfully amplified using our developed PCR primer set and sequenced with MinION. All experimental and analytical processes took 6 h to complete and were also applied to synthetic animal serum samples, wherein the MERS-CoV genome sequence was completely recovered. Results showed that the complete genome of MERS-CoV and related variants could be directly obtained from clinical samples within half a day. Consequently, this method will contribute to rapid MERS diagnosis, particularly in future CoV epidemics.


2018 ◽  
Author(s):  
David R. Greig ◽  
Ulf Schafer ◽  
Sophie Octavia ◽  
Ebony Hunter ◽  
Marie A. Chattaway ◽  
...  

Abstract Epidemiological and microbiological data on Vibrio cholerae isolated between 2004 and 2017 (n=836) and held in the Public Health England culture archive were reviewed. The traditional biochemical species identification and serological typing results were compared with the genome-derived species identification and serotype for a subset of isolates (n=152). Of the 836 isolates, 750 (89.7%) were from faecal specimens, 206 (24.6%) belonged to serogroup O1 and seven (0.8%) were serogroup O139, and 792 (94.7%) were from patients reporting recent travel abroad, most commonly to India (n=209) and Pakistan (n=104). Of the 152 isolates of V. cholerae speciated by kmer identification, 149 (98.1%) were concordant with the traditional biochemical approach. Traditional serotyping results were 100% concordant with the whole genome sequencing (WGS) analysis for identification of serogroups O1 and O139 and Classical and El Tor biotypes. ctxA was detected in all isolates of V. cholerae O1 El Tor and O139 belonging to sequence type (ST) 69, and in V. cholerae O1 Classical variants belonging to ST73. A phylogeny of isolates belonging to ST69 from UK travellers clustered geographically, with isolates from India and Pakistan located on separate branches. Moving forward, WGS data from UK travellers will contribute to global surveillance programs, and the monitoring of emerging threats to public health and the global dissemination of pathogenic lineages. At the national level, these WGS data will inform the timely reinforcement of direct public health messaging to travellers and mitigate the impact of imported infections and the associated risks to public health.


2020 ◽  
Author(s):  
Sivakumar Shanmugam ◽  
Nathan L Bachmann ◽  
Elena Martinez ◽  
Ranjeeta Menon ◽  
Gopalan Narendran ◽  
...  

Abstract Differentiation between relapse and reinfection in cases with tuberculosis (TB) recurrence has important implications for public health, especially in patients with human immunodeficiency virus (HIV) co-infection. Forty-one paired M. tuberculosis isolates collected from 20 HIV-positive and 21 HIV-negative patients, who experienced TB recurrence after previous successful treatment, were subjected to whole genome sequencing (WGS) in addition to spoligotyping and mycobacterial interspersed repeat unit (MIRU) typing. Comparison of M. tuberculosis genomes indicated that 95% of TB recurrences in the HIV-negative cohort were due to relapse, while the majority of TB recurrences (75%) in the HIV-positive cohort were due to reinfection (P=0.0001). Drug resistance-conferring mutations were documented in four pairs (9%) of isolates associated with relapse. The high contribution of reinfection to TB among HIV patients warrants further study to explore risk factors for TB exposure in the community.

