scholarly journals Clustering Metagenome Short Reads Using Weighted Proteins

Author(s):  
Gianluigi Folino ◽  
Fabio Gori ◽  
Mike S. M. Jetten ◽  
Elena Marchiori
Keyword(s):  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Omar Abou Saada ◽  
Andreas Tsouris ◽  
Chris Eberlein ◽  
Anne Friedrich ◽  
Joseph Schacherer

AbstractWhile genome sequencing and assembly are now routine, we do not have a full, precise picture of polyploid genomes. No existing polyploid phasing method provides accurate and contiguous haplotype predictions. We developed nPhase, a ploidy agnostic tool that leverages long reads and accurate short reads to solve alignment-based phasing for samples of unspecified ploidy (https://github.com/OmarOakheart/nPhase). nPhase is validated by tests on simulated and real polyploids. nPhase obtains on average over 95% accuracy and a contiguous 1.25 haplotigs per haplotype to cover more than 90% of each chromosome (heterozygosity rate ≥ 0.5%). nPhase allows population genomics and hybrid studies of polyploids.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Seth Commichaux ◽  
Kiran Javkar ◽  
Padmini Ramachandran ◽  
Niranjan Nagarajan ◽  
Denis Bertrand ◽  
...  

Abstract Background Whole genome sequencing of cultured pathogens is the state of the art public health response for the bioinformatic source tracking of illness outbreaks. Quasimetagenomics can substantially reduce the amount of culturing needed before a high quality genome can be recovered. Highly accurate short read data is analyzed for single nucleotide polymorphisms and multi-locus sequence types to differentiate strains but cannot span many genomic repeats, resulting in highly fragmented assemblies. Long reads can span repeats, resulting in much more contiguous assemblies, but have lower accuracy than short reads. Results We evaluated the accuracy of Listeria monocytogenes assemblies from enrichments (quasimetagenomes) of naturally-contaminated ice cream using long read (Oxford Nanopore) and short read (Illumina) sequencing data. Accuracy of ten assembly approaches, over a range of sequencing depths, was evaluated by comparing sequence similarity of genes in assemblies to a complete reference genome. Long read assemblies reconstructed a circularized genome as well as a 71 kbp plasmid after 24 h of enrichment; however, high error rates prevented high fidelity gene assembly, even at 150X depth of coverage. Short read assemblies accurately reconstructed the core genes after 28 h of enrichment but produced highly fragmented genomes. Hybrid approaches demonstrated promising results but had biases based upon the initial assembly strategy. Short read assemblies scaffolded with long reads accurately assembled the core genes after just 24 h of enrichment, but were highly fragmented. Long read assemblies polished with short reads reconstructed a circularized genome and plasmid and assembled all the genes after 24 h enrichment but with less fidelity for the core genes than the short read assemblies. Conclusion The integration of long and short read sequencing of quasimetagenomes expedited the reconstruction of a high quality pathogen genome compared to either platform alone. A new and more complete level of information about genome structure, gene order and mobile elements can be added to the public health response by incorporating long read analyses with the standard short read WGS outbreak response.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Weitao Chen ◽  
Ming Zou ◽  
Yuefei Li ◽  
Shuli Zhu ◽  
Xinhui Li ◽  
...  

AbstractGenome complexity such as heterozygosity may heavily influence its de novo assembly. Sequencing somatic cells of the F1 hybrids harboring two sets of genetic materials from both of the paternal and maternal species may avoid alleles discrimination during assembly. However, the feasibility of this strategy needs further assessments. We sequenced and assembled the genome of an F1 hybrid between Silurus asotus and S. meridionalis using the SequelII platform and Hi-C scaffolding technologies. More than 300 Gb raw data were generated, and the final assembly obtained 2344 scaffolds composed of 3017 contigs. The N50 length of scaffolds and contigs was 28.55 Mb and 7.49 Mb, respectively. Based on the mapping results of short reads generated for the paternal and maternal species, each of the 29 chromosomes originating from S. asotus and S. meridionalis was recognized. We recovered nearly 94% and 96% of the total length of S. asotus and S. meridionalis. BUSCO assessments and mapping analyses suggested that both genomes had high completeness and accuracy. Further analyses demonstrated the high collinearity between S. asotus, S. meridionalis, and the related Pelteobagrus fulvidraco. Comparison of the two genomes with that assembled only using the short reads from non-hybrid parental species detected a small portion of sequences that may be incorrectly assigned to the different species. We supposed that at least part of these situations may have resulted from mitotic recombination. The strategy of sequencing the F1 hybrid genome can recover the vast majority of the parental genomes and may improve the assembly of complex genomes.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Tsung-Yu Lu ◽  
Katherine M. Munson ◽  
Alexandra P. Lewis ◽  
Qihui Zhu ◽  
Luke J. Tallon ◽  
...  

AbstractVariable number tandem repeats (VNTRs) are composed of consecutive repetitive DNA with hypervariable repeat count and composition. They include protein coding sequences and associations with clinical disorders. It has been difficult to incorporate VNTR analysis in disease studies that use short-read sequencing because the traditional approach of mapping to the human reference is less effective for repetitive and divergent sequences. In this work, we solve VNTR mapping for short reads with a repeat-pangenome graph (RPGG), a data structure that encodes both the population diversity and repeat structure of VNTR loci from multiple haplotype-resolved assemblies. We develop software to build a RPGG, and use the RPGG to estimate VNTR composition with short reads. We use this to discover VNTRs with length stratified by continental population, and expression quantitative trait loci, indicating that RPGG analysis of VNTRs will be critical for future studies of diversity and disease.


Author(s):  
VLADIMIR JOJIC ◽  
TOMER HERTZ ◽  
NEBOJSA JOJIC
Keyword(s):  

Author(s):  
Jacek Blazewicz ◽  
Marcin Bryja ◽  
Marek Figlerowicz ◽  
Piotr Gawron ◽  
Marta Kasprzak ◽  
...  
Keyword(s):  

2021 ◽  
Vol 10 (25) ◽  
Author(s):  
Masatoshi Tsukahara ◽  
Kotaro Ise ◽  
Maiko Nezuo ◽  
Haruna Azuma ◽  
Takeshi Akao ◽  
...  

We report here the draft genome sequence for Saccharomyces cerevisiae strain Awamori number 101, an industrial strain used for producing awamori, a distilled alcohol beverage. It was constructed by assembling the short reads obtained by next-generation sequencing. The 315 contigs constitute an 11.5-Mbp genome sequence coding 6,185 predicted proteins.


Sign in / Sign up

Export Citation Format

Share Document