genome reconstruction Latest Research Papers

Rapid screening and identification of viral pathogens in metagenomic data

BMC Medical Genomics ◽

10.1186/s12920-021-01138-z ◽

2021 ◽

Vol 14 (S6) ◽

Author(s):

Shiyang Song ◽

Liangxiao Ma ◽

Xintian Xu ◽

Han Shi ◽

Xuan Li ◽

...

Keyword(s):

Viral Genome ◽

Rapid Screening ◽

Metagenomic Data ◽

Viral Sequence ◽

Viral Pathogens ◽

Viral Pathogen ◽

Genome Reconstruction ◽

Screening And Identification ◽

Metagenomics Data ◽

Ngs Data

Background: Virus screening and viral genome reconstruction are urgent and crucial for the rapid identification of viral pathogens, i.e., tracing the source and understanding the pathogenesis when a viral outbreak occurs. Next-generation sequencing (NGS) provides an efficient and unbiased way to identify viral pathogens in host-associated and environmental samples without prior knowledge. Despite the availability of software, data analysis still requires human operations. A mature pipeline is urgently needed when thousands of samples must be rapidly screened for viral pathogens and have their viral genomes reconstructed. Results: In this paper, we present a rapid and accurate workflow to screen metagenomics sequencing data for viral pathogens and other compositions, as well as to enable a reference-based assembler to reconstruct viral genomes. Moreover, we tested our workflow on several metagenomics datasets, including a SARS-CoV-2 patient sample with NGS data, pangolin tissues with NGS data, Middle East Respiratory Syndrome (MERS)-infected cells with NGS data, etc. Our workflow demonstrated high accuracy and efficiency when identifying target viruses from large-scale NGS metagenomics data. Our workflow was flexible when working with a broad range of NGS datasets from small (kb) to large (100 Gb). Each task took from a few minutes to a few hours to complete. At the same time, our workflow automatically generates reports that incorporate visualized feedback (e.g., metagenomics data quality statistics, host and viral sequence compositions, details about each of the identified viral pathogens and their coverages, and reassembled viral pathogen sequences based on their closest references). Conclusions: Overall, our system enabled the rapid screening and identification of viral pathogens from metagenomics data, providing an important piece to support viral pathogen research during a pandemic.
The visualized report contains information from raw sequence quality to a reconstructed viral sequence, which allows non-professional people to screen their samples for viruses by themselves (Additional file 1).

ChromeBat: A Bio-Inspired Approach to 3D Genome Reconstruction

Genes ◽

10.3390/genes12111757 ◽

2021 ◽

Vol 12 (11) ◽

pp. 1757

Author(s):

Brandon Collins ◽

Oluwatosin Oluwadare ◽

Philip Brown

Keyword(s):

Genome Structure ◽

Three Dimensional ◽

Genomic Analysis ◽

Bat Algorithm ◽

Empirical Measure ◽

Reconstruction Problem ◽

Genome Reconstruction ◽

3D Genome ◽

A Genome ◽

Contact Data

With the advent of Next Generation Sequencing and the Hi-C experiment, high-quality genome-wide contact data are becoming increasingly available. These data represent an empirical measure of how a genome interacts inside the nucleus. Genome conformation is of particular interest as it has been experimentally shown to be a driving force for many genomic functions from regulation to transcription. Thus, the Three Dimensional-Genome Reconstruction Problem (3D-GRP) seeks to take Hi-C data and produce a complete physical genome structure as it appears in the nucleus for genomic analysis. We propose and develop a novel method to solve the Chromosome and Genome Reconstruction problem based on the Bat Algorithm (BA), which we call ChromeBat. We demonstrate on real Hi-C data that ChromeBat is capable of state-of-the-art performance. Additionally, the domain of Genome Reconstruction has been criticized for lacking algorithmic diversity, and the bio-inspired nature of ChromeBat contributes algorithmic diversity to the problem domain. ChromeBat is an effective approach for solving the Genome Reconstruction Problem.
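
As a rough illustration of what "taking Hi-C data and producing a structure" involves, the sketch below scores a candidate 3D structure against contact data. It is not ChromeBat itself: the conversion of a contact count to a target distance as 1 / count^alpha is a common convention in distance-based reconstruction and is assumed here, not taken from the abstract, and the names `contacts_to_distances`, `rmse`, and `alpha` are hypothetical. A metaheuristic such as the Bat Algorithm would search over coordinates to minimize a score of this kind; that search is not shown.

```python
import math


def contacts_to_distances(contacts, alpha=1.0):
    """Map each positive contact count to a target distance 1 / count**alpha.

    Assumption: frequent contacts imply spatial proximity. Pairs with zero
    contacts carry no distance information and are skipped.
    """
    return {
        pair: 1.0 / (count ** alpha)
        for pair, count in contacts.items()
        if count > 0
    }


def rmse(coords, target):
    """Root-mean-square error between structure distances and target distances."""
    total = 0.0
    for (i, j), wanted in target.items():
        total += (math.dist(coords[i], coords[j]) - wanted) ** 2
    return math.sqrt(total / len(target))


# Toy contact counts between three genomic bins (made-up numbers).
contacts = {(0, 1): 2, (1, 2): 4, (0, 2): 0}
target = contacts_to_distances(contacts)

# A candidate structure: one (x, y, z) point per bin.
candidate = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.75, 0.0, 0.0)]
score = rmse(candidate, target)
```

A lower score means the candidate's pairwise distances agree better with the distances implied by the contact data.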

poreCov-An Easy to Use, Fast, and Robust Workflow for SARS-CoV-2 Genome Reconstruction via Nanopore Sequencing

Frontiers in Genetics ◽

10.3389/fgene.2021.711437 ◽

2021 ◽

Vol 12 ◽

Author(s):

Christian Brandt ◽

Sebastian Krautwurst ◽

Riccardo Spott ◽

Mara Lohde ◽

Mateusz Jundzill ◽

...

Keyword(s):

Sequence Data ◽

Viral Evolution ◽

Relevant Information ◽

Data Sets ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Genome Reconstruction ◽

Long Read ◽

Major Bottleneck ◽

Parallel Workflow

In response to the SARS-CoV-2 pandemic, a highly increased sequencing effort has been established worldwide to track and trace ongoing viral evolution. Technologies such as nanopore sequencing via the ARTIC protocol are used to reliably generate genomes from raw sequencing data as a crucial base for molecular surveillance. However, for many labs that perform SARS-CoV-2 sequencing, bioinformatics is still a major bottleneck, especially if hundreds of samples need to be processed in a recurring fashion. Pipelines developed for short-read data cannot be applied to nanopore data. Therefore, specific long-read tools and parameter settings need to be orchestrated to enable accurate genotyping and robust reference-based genome reconstruction of SARS-CoV-2 genomes from nanopore data. Here we present poreCov, a highly parallel workflow written in Nextflow, using containers to wrap all the tools necessary for a routine SARS-CoV-2 sequencing lab into one program. The ease of installation, combined with concise summary reports that clearly highlight all relevant information, enables rapid and reliable analysis of hundreds of SARS-CoV-2 raw sequence data sets or genomes. poreCov is freely available on GitHub under the GNUv3 license: github.com/replikation/poreCov.

Accurate viral genome reconstruction and host assignment with proximity-ligation sequencing

10.1101/2021.06.14.448389 ◽

2021 ◽

Author(s):

Gherman Uritskiy ◽

Maximillian Press ◽

Christine Sun ◽

Guillermo Dominguez Huerta ◽

Ahmed A. Zayed ◽

...

Keyword(s):

Viral Genome ◽

Metagenomic Data ◽

Metagenomic Sequencing ◽

Fecal Microbiome ◽

Genome Reconstruction ◽

Viral Genomes ◽

Proximity Ligation ◽

Sequencing Technologies ◽

Microbiome Research ◽

Long Read

Viruses play crucial roles in the ecology of microbial communities, yet they remain relatively understudied in their native environments. Despite many advancements in high-throughput whole-genome sequencing (WGS), sequence assembly, and annotation of viruses, the reconstruction of full-length viral genomes directly from metagenomic sequencing is possible only for the most abundant phages and requires long-read sequencing technologies. Additionally, the prediction of their cellular hosts remains difficult from conventional metagenomic sequencing alone. To address these gaps in the field and to accelerate the study of viruses directly in their native microbiomes, we developed an end-to-end bioinformatics platform for viral genome reconstruction and host attribution from metagenomic data using proximity-ligation sequencing (i.e., Hi-C). We demonstrate the capabilities of the platform by recovering and characterizing the metavirome of a variety of metagenomes, including a fecal microbiome that has also been sequenced with accurate long reads, allowing for the assessment and benchmarking of the new methods. The platform can accurately extract numerous near-complete viral genomes even from highly fragmented short-read assemblies and can reliably predict their cellular hosts with minimal false positives. To our knowledge, this is the first software for performing these tasks. Being significantly cheaper than long-read sequencing of comparable depth, the incorporation of proximity-ligation sequencing in microbiome research shows promise to greatly accelerate future advancements in the field.

What do Eulerian and Hamiltonian cycles have to do with genome assembly?

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008928 ◽

2021 ◽

Vol 17 (5) ◽

pp. e1008928

Author(s):

Paul Medvedev ◽

Mihai Pop

Keyword(s):

Genome Assembly ◽

Linear Time ◽

Hamiltonian Cycles ◽

De Bruijn Graphs ◽

Genome Reconstruction ◽

Assembly Algorithm ◽

A Genome ◽

De Bruijn ◽

Do So

Many students are taught about genome assembly using the dichotomy between the complexity of finding Eulerian and Hamiltonian cycles (easy versus hard, respectively). This dichotomy is sometimes used to motivate the use of de Bruijn graphs in practice. In this paper, we explain that while de Bruijn graphs have indeed been very useful, the reason has nothing to do with the complexity of the Hamiltonian and Eulerian cycle problems. We give 2 arguments. The first is that a genome reconstruction is never unique and hence an algorithm for finding Eulerian or Hamiltonian cycles is not part of any assembly algorithm used in practice. The second is that even if an arbitrary genome reconstruction was desired, one could do so in linear time in both the Eulerian and Hamiltonian paradigms.
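
The Eulerian side of this discussion can be made concrete with a small sketch. The code below is a textbook illustration, not code from the paper: it builds a de Bruijn graph in which every k-mer is an edge from its (k-1)-prefix to its (k-1)-suffix, then walks every edge exactly once with Hierholzer's algorithm, which runs in time linear in the number of edges. The function names and the toy sequence are assumptions made for illustration, and the sketch assumes the graph is connected and has an Eulerian path.

```python
from collections import defaultdict


def kmers_of(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def de_bruijn(kmers):
    """Build a de Bruijn multigraph: each k-mer is an edge prefix -> suffix."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm: traverse every edge exactly once."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for node in targets:
            in_deg[node] += 1
    # Start where out-degree exceeds in-degree by one, if such a node exists;
    # otherwise the path is a cycle and any node with outgoing edges will do.
    start = next(
        (n for n in graph if len(graph[n]) - in_deg.get(n, 0) == 1),
        next(iter(graph)),
    )
    remaining = {node: list(targets) for node, targets in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in the path
    return path[::-1]


def spell(path):
    """Spell the sequence obtained by overlapping consecutive nodes."""
    return path[0] + "".join(node[-1] for node in path[1:])


original = "ATGGCGTGCA"
kmers = kmers_of(original, 3)
reconstruction = spell(eulerian_path(de_bruijn(kmers)))
```

The reconstruction is guaranteed to contain exactly the same k-mers as the input, but it need not equal `original`: when a node has several outgoing edges, more than one Eulerian path exists, which is the non-uniqueness the abstract points to.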

Genetic insights into the dark matter of the mammalian gut microbiota through targeted genome reconstruction

Environmental Microbiology ◽

10.1111/1462-2920.15559 ◽

2021 ◽

Author(s):

Gabriele Andrea Lugli ◽

Giulia Alessandri ◽

Christian Milani ◽

Alice Viappiani ◽

Federico Fontana ◽

...

Keyword(s):

Dark Matter ◽

Gut Microbiota ◽

Genome Reconstruction

poreCov - an easy to use, fast, and robust workflow for SARS-CoV-2 genome reconstruction via nanopore sequencing

10.1101/2021.05.07.443089 ◽

2021 ◽

Author(s):

Christian Brandt ◽

Sebastian Krautwurst ◽

Riccardo Spott ◽

Mara Lohde ◽

Mateusz Jundzill ◽

...

Keyword(s):

Sequence Data ◽

Viral Evolution ◽

Relevant Information ◽

Data Sets ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Genome Reconstruction ◽

Long Read ◽

Major Bottleneck ◽

Parallel Workflow

In response to the SARS-CoV-2 pandemic, a highly increased sequencing effort has been established worldwide to track and trace ongoing viral evolution. Technologies such as nanopore sequencing via the ARTIC protocol are used to reliably generate genomes from raw sequencing data as a crucial base for molecular surveillance. However, for many labs that perform SARS-CoV-2 sequencing, bioinformatics is still a major bottleneck, especially if hundreds of samples need to be processed in a recurring fashion. Pipelines developed for short-read data cannot be applied to nanopore data. Therefore, specific long-read tools and parameter settings need to be orchestrated to enable accurate genotyping and robust reference-based genome reconstruction of SARS-CoV-2 genomes from nanopore data. Here we present poreCov, a highly parallel workflow written in Nextflow, using containers to wrap all the tools necessary for a routine SARS-CoV-2 sequencing lab into one program. The ease of installation, combined with concise summary reports that clearly highlight all relevant information, enables rapid and reliable analysis of hundreds of SARS-CoV-2 raw sequence data sets or genomes. poreCov is freely available on GitHub under the GNUv3 license: github.com/replikation/poreCov.

Genome Reconstruction problem : By using De Bruijn graph

10.1109/caibda53561.2021.00019 ◽

2021 ◽

Author(s):

YuTong Li

Keyword(s):

De Bruijn Graph ◽

Reconstruction Problem ◽

Genome Reconstruction ◽

De Bruijn

Genome Reconstruction Attacks Against Genomic Data-Sharing Beacons

Proceedings on Privacy Enhancing Technologies ◽

10.2478/popets-2021-0036 ◽

2021 ◽

Vol 2021 (3) ◽

pp. 28-48

Author(s):

Kerem Ayoz ◽

Erman Ayday ◽

A. Ercument Cicek

Keyword(s):

Data Sharing ◽

Scientific Progress ◽

Genomic Data ◽

Substantial Part ◽

Genome Reconstruction ◽

Eye Color ◽

Genome Data ◽

Hair Type ◽

Major Bottleneck ◽

Inference Attacks

Sharing genome data in a privacy-preserving way stands as a major bottleneck to the scientific progress promised by the big data era in genomics. A community-driven protocol named the genomic data-sharing beacon protocol has been widely adopted for sharing genomic data. The system aims to provide a secure, easy-to-implement, and standardized interface for data sharing by only allowing yes/no queries on the presence of specific alleles in the dataset. However, the beacon protocol was recently shown to be vulnerable to membership inference attacks. In this paper, we show that privacy threats against genomic data-sharing beacons are not limited to membership inference. We identify and analyze a novel vulnerability of genomic data-sharing beacons: genome reconstruction. We show that it is possible to successfully reconstruct a substantial part of the genome of a victim when the attacker knows the victim has been added to the beacon in a recent update. In particular, we show how an attacker can use the inherent correlations in the genome and clustering techniques to run such an attack in an efficient and accurate way. We also show that even if multiple individuals are added to the beacon during the same update, it is possible to identify the victim’s genome with high confidence using traits that are easily accessible by the attacker (e.g., eye color or hair type). Moreover, we show how a genome reconstructed using a beacon that is not associated with a sensitive phenotype can be used for membership inference attacks on beacons with sensitive phenotypes (e.g., HIV+). The outcome of this work will guide beacon operators on when and how to update the content of the beacon and help them (along with the beacon participants) make informed decisions.
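
The basic leak behind an update-based attack can be shown with a toy model. The sketch below is not the authors' method: it only compares yes/no answers before and after an update, and it omits the correlation and clustering steps the abstract describes. The `Beacon` class, its methods, and the allele encoding as `(position, allele)` pairs are hypothetical and do not correspond to any real beacon API.

```python
class Beacon:
    """Toy beacon: answers only yes/no on whether an allele is in the dataset."""

    def __init__(self):
        self._genomes = []  # each genome is a set of (position, allele) pairs

    def add(self, genome):
        self._genomes.append(frozenset(genome))

    def query(self, position, allele):
        return any((position, allele) in genome for genome in self._genomes)


def snapshot(beacon, candidates):
    """Record the beacon's yes/no answer for every candidate allele."""
    return {c: beacon.query(*c) for c in candidates}


def newly_present(before, after):
    """Alleles whose answer flipped from 'no' to 'yes' across the update."""
    return {c for c in after if after[c] and not before[c]}


beacon = Beacon()
beacon.add({(1, "A"), (2, "T")})  # an existing participant

candidates = [(1, "A"), (2, "T"), (3, "G"), (4, "C")]
before = snapshot(beacon, candidates)

beacon.add({(1, "A"), (3, "G")})  # the update adds the victim
after = snapshot(beacon, candidates)

leaked = newly_present(before, after)
```

Only alleles that no earlier participant carried flip from "no" to "yes", so this difference alone recovers just part of the victim's genome; alleles already present before the update stay hidden from it.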

ACoRE: Accurate SARS-CoV-2 genome reconstruction for the characterization of intra-host and inter-host viral diversity in clinical samples and for the evaluation of re-infections

Genomics ◽

10.1016/j.ygeno.2021.04.008 ◽

2021 ◽

Author(s):

Luca Marcolungo ◽

Cristina Beltrami ◽

Chiara Degli Esposti ◽

Giulia Lopatriello ◽

Chiara Piubelli ◽

...

Keyword(s):

Clinical Samples ◽

Viral Diversity ◽

Genome Reconstruction
