Clustering Metagenome Short Reads Using Weighted Proteins

0748 - Let-it-bin an optimised workflow for binning metagenomic short reads from multiple samples

10.26226/morressier.5b5199c0b1b87b000ecf0365 ◽

2018 ◽

Author(s):

Quentin Letourneur

Keyword(s):

Short Reads ◽

Multiple Samples

An Error Correction and DeNovo Assembly Approach for Nanopore Reads Using Short Reads

Current Bioinformatics ◽

10.2174/1574893612666170530073736 ◽

2018 ◽

Vol 13 (3) ◽

pp. 241-252 ◽

Cited By ~ 2

Author(s):

Mehdi Kchouk ◽

Mourad Elloumi

Keyword(s):

Error Correction ◽

Short Reads

nPhase: an accurate and contiguous phasing method for polyploids

Genome Biology ◽

10.1186/s13059-021-02342-x ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Omar Abou Saada ◽

Andreas Tsouris ◽

Chris Eberlein ◽

Anne Friedrich ◽

Joseph Schacherer

Keyword(s):

Genome Sequencing ◽

Population Genomics ◽

Short Reads ◽

Link Type ◽

Long Reads

Abstract: While genome sequencing and assembly are now routine, we do not have a full, precise picture of polyploid genomes. No existing polyploid phasing method provides accurate and contiguous haplotype predictions. We developed nPhase, a ploidy-agnostic tool that leverages long reads and accurate short reads to solve alignment-based phasing for samples of unspecified ploidy (https://github.com/OmarOakheart/nPhase). nPhase is validated by tests on simulated and real polyploids. nPhase obtains on average over 95% accuracy and a contiguous 1.25 haplotigs per haplotype to cover more than 90% of each chromosome (heterozygosity rate ≥ 0.5%). nPhase allows population genomics and hybrid studies of polyploids.

Evaluating the accuracy of Listeria monocytogenes assemblies from quasimetagenomic samples using long and short reads

BMC Genomics ◽

10.1186/s12864-021-07702-2 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Seth Commichaux ◽

Kiran Javkar ◽

Padmini Ramachandran ◽

Niranjan Nagarajan ◽

Denis Bertrand ◽

...

Keyword(s):

Public Health ◽

Public Health Response ◽

High Quality ◽

Short Read ◽

Short Reads ◽

The Core ◽

Long Reads ◽

Health Response ◽

Long Read ◽

Core Genes

Abstract

Background: Whole genome sequencing of cultured pathogens is the state-of-the-art public health response for the bioinformatic source tracking of illness outbreaks. Quasimetagenomics can substantially reduce the amount of culturing needed before a high quality genome can be recovered. Highly accurate short read data is analyzed for single nucleotide polymorphisms and multi-locus sequence types to differentiate strains but cannot span many genomic repeats, resulting in highly fragmented assemblies. Long reads can span repeats, resulting in much more contiguous assemblies, but have lower accuracy than short reads.

Results: We evaluated the accuracy of Listeria monocytogenes assemblies from enrichments (quasimetagenomes) of naturally-contaminated ice cream using long read (Oxford Nanopore) and short read (Illumina) sequencing data. Accuracy of ten assembly approaches, over a range of sequencing depths, was evaluated by comparing sequence similarity of genes in assemblies to a complete reference genome. Long read assemblies reconstructed a circularized genome as well as a 71 kbp plasmid after 24 h of enrichment; however, high error rates prevented high fidelity gene assembly, even at 150X depth of coverage. Short read assemblies accurately reconstructed the core genes after 28 h of enrichment but produced highly fragmented genomes. Hybrid approaches demonstrated promising results but had biases based upon the initial assembly strategy. Short read assemblies scaffolded with long reads accurately assembled the core genes after just 24 h of enrichment, but were highly fragmented. Long read assemblies polished with short reads reconstructed a circularized genome and plasmid and assembled all the genes after 24 h enrichment but with less fidelity for the core genes than the short read assemblies.

Conclusion: The integration of long and short read sequencing of quasimetagenomes expedited the reconstruction of a high quality pathogen genome compared to either platform alone. A new and more complete level of information about genome structure, gene order and mobile elements can be added to the public health response by incorporating long read analyses with the standard short read WGS outbreak response.

A Novel Binning Algorithm Using Topic Modelling and k-mer Frequency on Groups of Non-Overlapping Short Reads

2020 5th International Conference on Green Technology and Sustainable Development (GTSD) ◽

10.1109/gtsd50082.2020.9303095 ◽

2020 ◽

Author(s):

Hoang D. Quach ◽

Hoang T. Lam ◽

Dang H. N. Nguyen ◽

Phuong V. D. Van ◽

Van Hoai Tran

Keyword(s):

Topic Modelling ◽

Short Reads

Sequencing an F1 hybrid of Silurus asotus and S. meridionalis enabled the assembly of high-quality parental genomes

Scientific Reports ◽

10.1038/s41598-021-93257-x ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Weitao Chen ◽

Ming Zou ◽

Yuefei Li ◽

Shuli Zhu ◽

Xinhui Li ◽

...

Keyword(s):

De Novo ◽

Parental Species ◽

F1 Hybrids ◽

Pelteobagrus Fulvidraco ◽

F1 Hybrid ◽

Short Reads ◽

Final Assembly ◽

Genome Complexity ◽

Hybrid Genome ◽

Silurus Asotus

Abstract: Genome complexity such as heterozygosity may heavily influence de novo assembly. Sequencing somatic cells of F1 hybrids, which harbor two sets of genetic material, one from each of the paternal and maternal species, may avoid allele discrimination during assembly. However, the feasibility of this strategy needs further assessment. We sequenced and assembled the genome of an F1 hybrid between Silurus asotus and S. meridionalis using the Sequel II platform and Hi-C scaffolding technologies. More than 300 Gb of raw data were generated, and the final assembly obtained 2344 scaffolds composed of 3017 contigs. The N50 length of scaffolds and contigs was 28.55 Mb and 7.49 Mb, respectively. Based on the mapping results of short reads generated for the paternal and maternal species, each of the 29 chromosomes originating from S. asotus and S. meridionalis was recognized. We recovered nearly 94% and 96% of the total length of S. asotus and S. meridionalis. BUSCO assessments and mapping analyses suggested that both genomes had high completeness and accuracy. Further analyses demonstrated the high collinearity between S. asotus, S. meridionalis, and the related Pelteobagrus fulvidraco. Comparison of the two genomes with those assembled using only the short reads from non-hybrid parental species detected a small portion of sequences that may be incorrectly assigned to the different species. We supposed that at least part of these cases may have resulted from mitotic recombination. The strategy of sequencing the F1 hybrid genome can recover the vast majority of the parental genomes and may improve the assembly of complex genomes.

Profiling variable-number tandem repeat variation across populations using repeat-pangenome graphs

Nature Communications ◽

10.1038/s41467-021-24378-0 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Tsung-Yu Lu ◽

Katherine M. Munson ◽

Alexandra P. Lewis ◽

Qihui Zhu ◽

Luke J. Tallon ◽

...

Keyword(s):

Tandem Repeats ◽

Traditional Approach ◽

Variable Number Tandem Repeat ◽

Variable Number ◽

Population Diversity ◽

Protein Coding ◽

Short Reads ◽

Repeat Structure ◽

Continental Population ◽

Develop Software

Abstract: Variable number tandem repeats (VNTRs) are composed of consecutive repetitive DNA with hypervariable repeat count and composition. They include protein-coding sequences and associations with clinical disorders. It has been difficult to incorporate VNTR analysis in disease studies that use short-read sequencing because the traditional approach of mapping to the human reference is less effective for repetitive and divergent sequences. In this work, we solve VNTR mapping for short reads with a repeat-pangenome graph (RPGG), a data structure that encodes both the population diversity and repeat structure of VNTR loci from multiple haplotype-resolved assemblies. We develop software to build an RPGG, and use the RPGG to estimate VNTR composition with short reads. We use this to discover VNTRs with length stratified by continental population, and expression quantitative trait loci, indicating that RPGG analysis of VNTRs will be critical for future studies of diversity and disease.

Population Sequencing Using Short Reads: HIV as a Case Study

Biocomputing 2008 ◽

10.1142/9789812776136_0013 ◽

2007 ◽

Cited By ~ 2

Author(s):

Vladimir Jojic ◽

Tomer Hertz ◽

Nebojsa Jojic

Keyword(s):

Short Reads

A new algorithm for genome assembly from short reads

2008 1st International Conference on Information Technology ◽

10.1109/inftech.2008.4621681 ◽

2008 ◽

Cited By ~ 1

Author(s):

Jacek Blazewicz ◽

Marcin Bryja ◽

Marek Figlerowicz ◽

Piotr Gawron ◽

Marta Kasprzak ◽

...

Keyword(s):

Genome Assembly ◽

Short Reads

Draft Genome Sequence of Saccharomyces cerevisiae Strain Awamori Number 101, Commonly Used to Make Awamori, a Traditional Spirit, in Okinawa, Japan

Microbiology Resource Announcements ◽

10.1128/mra.01414-20 ◽

2021 ◽

Vol 10 (25) ◽

Author(s):

Masatoshi Tsukahara ◽

Kotaro Ise ◽

Maiko Nezuo ◽

Haruna Azuma ◽

Takeshi Akao ◽

...

Keyword(s):

Saccharomyces Cerevisiae ◽

Next Generation Sequencing ◽

Genome Sequence ◽

Draft Genome ◽

Draft Genome Sequence ◽

Next Generation ◽

Industrial Strain ◽

Content Type ◽

Short Reads ◽

Generation Sequencing

We report here the draft genome sequence for Saccharomyces cerevisiae strain Awamori number 101, an industrial strain used for producing awamori, a distilled alcohol beverage. It was constructed by assembling the short reads obtained by next-generation sequencing. The 315 contigs constitute an 11.5-Mbp genome sequence coding 6,185 predicted proteins.
