Improved metagenome assemblies and taxonomic binning using long-read circular consensus sequence data

Mapping Intimacies ◽

10.1101/026922 ◽

2015 ◽

Cited By ~ 2

Author(s):

Jeremy A. Frank ◽

Yao Pan ◽

Ave Tooming-Klunderud ◽

Vincent G.H. Eijsink ◽

Alice C. McHardy ◽

...

Keyword(s):

Sequence Data ◽

Consensus Sequence ◽

Dna Assembly ◽

Illumina Hiseq ◽

Average Contig Length ◽

Long Read ◽

And Performance ◽

And Function ◽

Taxonomic Binning ◽

Large Contigs

DNA assembly is a core methodological step in metagenomic pipelines used to study the structure and function of microbial communities. Here we investigate the utility of Pacific Biosciences long, high-accuracy circular consensus sequencing (CCS) reads for metagenomics projects. We compared the application and performance of both PacBio CCS and Illumina HiSeq data with assembly and taxonomic binning algorithms using metagenomic samples representing a complex microbial community. Eight SMRT cells produced approximately 94 Mb of CCS reads from a biogas reactor microbiome sample, which averaged 1319 nt in length and 99.7% accuracy. Assembly of the CCS data generated a number of large contigs (greater than 1 kb) comparable to that assembled from a ~190x larger HiSeq dataset (~18 Gb) produced from the same sample (i.e., approximately 62% of total contigs). Hybrid assemblies using PacBio CCS and HiSeq contigs produced improvements in assembly statistics, including an increase in the average contig length and number of large contigs. The incorporation of CCS data produced significant enhancements in taxonomic binning and genome reconstruction of two dominant phylotypes, which assembled and binned poorly using HiSeq data alone. Collectively, these results illustrate the value of PacBio CCS reads in certain metagenomics applications.
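The dataset sizes quoted in this abstract can be cross-checked with a little arithmetic. The minimal Python sketch below uses only figures stated in the abstract (94 Mb of CCS reads, 1319 nt mean read length, ~18 Gb of HiSeq data). The read count it derives is an estimate, not a number reported by the authors, and the helper `contig_stats` with its toy contig lengths is hypothetical, included only to show how the two assembly statistics named in the abstract are typically computed.

```python
# Figures taken from the abstract; decimal units (1 Mb = 1e6 bases) are an assumption.
CCS_BASES = 94e6
HISEQ_BASES = 18e9
MEAN_CCS_READ_LEN = 1319

# Size ratio of the two datasets (the abstract rounds this to "~190x").
fold_difference = HISEQ_BASES / CCS_BASES

# Derived estimate of the number of CCS reads; not reported in the abstract.
approx_ccs_reads = CCS_BASES / MEAN_CCS_READ_LEN


def contig_stats(lengths, large_cutoff=1000):
    """Return (average contig length, number of contigs longer than large_cutoff)."""
    if not lengths:
        return 0.0, 0
    average = sum(lengths) / len(lengths)
    n_large = sum(1 for n in lengths if n > large_cutoff)
    return average, n_large


# Hypothetical contig lengths, for illustration only.
example_lengths = [500, 1200, 3000, 800]
avg_len, n_large = contig_stats(example_lengths)

print(f"HiSeq/CCS size ratio: {fold_difference:.0f}x")
print(f"Estimated CCS reads: {approx_ccs_reads:.0f}")
print(f"Average contig length: {avg_len:.0f} nt; contigs > 1 kb: {n_large}")
```

The 1 kb cutoff mirrors the "large contigs greater than 1 kb" threshold in the abstract; any other threshold can be passed as `large_cutoff`.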

Improved metagenome assemblies and taxonomic binning using long-read circular consensus sequence data

Scientific Reports ◽

10.1038/srep25373 ◽

2016 ◽

Vol 6 (1) ◽

Cited By ~ 87

Author(s):

J. A. Frank ◽

Y. Pan ◽

A. Tooming-Klunderud ◽

V. G. H. Eijsink ◽

A. C. McHardy ◽

...

Keyword(s):

Sequence Data ◽

Consensus Sequence ◽

Long Read ◽

Taxonomic Binning

Draft Genome Sequence of Pandrug-Resistant Pseudomonas aeruginosa SPA03, Isolated from a Patient with Benign Prostatic Hyperplasia

Microbiology Resource Announcements ◽

10.1128/mra.00336-21 ◽

2021 ◽

Vol 10 (22) ◽

Author(s):

Chanakya Pachi Pulusu ◽

Balaram Khamari ◽

Manmath Lama ◽

Arun Sai Kumar Peketi ◽

Prakash Kumar ◽

...

Keyword(s):

Pseudomonas Aeruginosa ◽

Benign Prostatic Hyperplasia ◽

Sequence Data ◽

Draft Genome ◽

Prostatic Hyperplasia ◽

Illumina Hiseq ◽

Content Type ◽

Oxford Nanopore ◽

Long Read ◽

Oxford Nanopore Technologies

The draft genome of pandrug-resistant Pseudomonas aeruginosa strain SPA03, which belongs to global high-risk sequence type 357 (ST357) and was isolated from a patient with benign prostatic hyperplasia, is presented in this report. The genome assembly was generated by combining short-read Illumina HiSeq-X Ten and long-read Oxford Nanopore Technologies MinION sequence data using the Unicycler assembler.

Annotated mitochondrial genome with Nanopore R9 signal for Nippostrongylus brasiliensis

F1000Research ◽

10.12688/f1000research.10545.1 ◽

2017 ◽

Vol 6 ◽

pp. 56 ◽

Cited By ~ 11

Author(s):

Jodie Chandler ◽

Mali Camberis ◽

Tiffany Bouchery ◽

Mark Blaxter ◽

Graham Le Gros ◽

...

Keyword(s):

Mitochondrial Genome ◽

Reference Genome ◽

De Novo ◽

Consensus Sequence ◽

Nippostrongylus Brasiliensis ◽

Hookworm Infection ◽

Illumina Hiseq ◽

Long Read ◽

Achievable Goal ◽

Parasitic Life Cycle

Nippostrongylus brasiliensis, a nematode parasite of rodents, has a parasitic life cycle that is an extremely useful model for the study of human hookworm infection, particularly with regard to the induced immune response. The current reference genome for this parasite is highly fragmented with minimal annotation, but new advances in long-read sequencing suggest that a more complete and annotated assembly should be an achievable goal. We de novo assembled a single-contig mitochondrial genome from N. brasiliensis using MinION R9 nanopore data. The assembly was error-corrected using existing Illumina HiSeq reads, and annotated in full (i.e., gene boundary definitions without substantial gaps) by comparison with annotated genomes from related parasites. The mitochondrial genome has also been annotated with a preliminary electrical consensus sequence, using raw signal data generated from a Nanopore R9 flow cell.

Bioinformatic Analysis of Structure and Function of LIM Domains of Human Zyxin Family Proteins

International Journal of Molecular Sciences ◽

10.3390/ijms22052647 ◽

2021 ◽

Vol 22 (5) ◽

pp. 2647

Author(s):

M. Quadir Siddiqui ◽

Maulik D. Badmalia ◽

Trushar R. Patel

Keyword(s):

Nucleic Acid ◽

Nuclear Export ◽

Consensus Sequence ◽

Nuclear Export Signal ◽

Bioinformatic Analysis ◽

Nucleic Acid Binding ◽

Protein Protein Interaction ◽

Lim Domains ◽

Protein Nucleic Acid ◽

And Function

Members of the human Zyxin family are LIM domain-containing proteins that perform critical cellular functions and are indispensable for cellular integrity. Despite their importance, not much is known about their structure, functions, interactions and dynamics. To provide insights into these, we used a set of in-silico tools and databases and analyzed their amino acid sequence, phylogeny, post-translational modifications, structure-dynamics, molecular interactions, and functions. Our analysis revealed that zyxin members are ohnologs. The presence of a conserved nuclear export signal composed of the LxxLxL/LxxxLxL consensus sequence, as well as a possible nuclear localization signal, suggests that Zyxin family members may have nuclear and cytoplasmic roles. The molecular modeling and structural analysis indicated that Zyxin family LIM domains share similarities with transcriptional regulators and have positively charged electrostatic patches, which may indicate that they have previously unanticipated nucleic acid binding properties. Intrinsic dynamics analysis of the LIM domains suggests that only Lim1 has similar internal dynamics properties, unlike Lim2/3. Furthermore, we analyzed protein expression and mutational frequency in various malignancies, and mapped the protein-protein interaction networks in which they are involved. Overall, our comprehensive bioinformatic analysis suggests that these proteins may play important roles in mediating protein-protein and protein-nucleic acid interactions.
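The LxxLxL/LxxxLxL consensus mentioned in this abstract lends itself to a simple pattern search. The sketch below is a minimal illustration, assuming that "x" stands for any amino acid; it is not the authors' method, and plain pattern matching of this kind will typically report many false positives that need further filtering. The function name `find_nes_candidates` and the toy peptide are hypothetical.

```python
import re

# "x" is taken to mean any residue (an assumption). Lookahead groups are used
# so that overlapping candidate motifs are all reported.
NES_PATTERNS = {
    "LxxLxL": re.compile(r"(?=(L[A-Z]{2}L[A-Z]L))"),
    "LxxxLxL": re.compile(r"(?=(L[A-Z]{3}L[A-Z]L))"),
}


def find_nes_candidates(protein_seq):
    """Return sorted (start, motif_name, matched_text) tuples; start is 0-based."""
    seq = protein_seq.upper()
    hits = []
    for name, pattern in NES_PATTERNS.items():
        for match in pattern.finditer(seq):
            hits.append((match.start(), name, match.group(1)))
    return sorted(hits)


# Made-up peptide for illustration; not a real Zyxin family sequence.
toy_peptide = "MKLAELSLGGPLQRALELTD"
for start, name, text in find_nes_candidates(toy_peptide):
    print(start, name, text)
```

Searching the two motif forms separately, rather than with a single `L.{2,3}L.L` pattern, keeps both hits when the two forms start at the same position.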

Genome sequences of human cytomegalovirus strain TB40/E variants propagated in fibroblasts and epithelial cells

Virology Journal ◽

10.1186/s12985-021-01583-3 ◽

2021 ◽

Vol 18 (1) ◽

Author(s):

Ahmed Al Qaffas ◽

Salvatore Camiolo ◽

Mai Vo ◽

Alexis Aguiar ◽

Amine Ourahmane ◽

...

Keyword(s):

Epithelial Cells ◽

Human Cytomegalovirus ◽

Viral Entry ◽

Sequence Data ◽

Laboratory Strain ◽

Serial Passage ◽

Wild Type Virus ◽

Protein Coding ◽

Genetic Changes ◽

Long Read

The advent of whole genome sequencing has revealed that common laboratory strains of human cytomegalovirus (HCMV) have major genetic deficiencies resulting from serial passage in fibroblasts. In particular, tropism for epithelial and endothelial cells is lost due to mutations disrupting genes UL128, UL130, or UL131A, which encode subunits of a virion-associated pentameric complex (PC) important for viral entry into these cells but not for entry into fibroblasts. The endothelial cell-adapted strain TB40/E has a relatively intact genome and has emerged as a laboratory strain that closely resembles wild-type virus. However, several heterogeneous TB40/E stocks and cloned variants exist that display a range of sequence and tropism properties. Here, we report the use of PacBio sequencing to elucidate the genetic changes that occurred, both at the consensus level and within subpopulations, upon passaging a TB40/E stock on ARPE-19 epithelial cells. The long-read data also facilitated examination of the linkage between mutations. Consistent with inefficient ARPE-19 cell entry, at least 83% of viral genomes present before adaptation contained changes impacting PC subunits. In contrast, and consistent with the importance of the PC for entry into endothelial and epithelial cells, genomes after adaptation lacked these or additional mutations impacting PC subunits. The sequence data also revealed six single noncoding substitutions in the inverted repeat regions, single nonsynonymous substitutions in genes UL26, UL69, US28, and UL122, and a frameshift truncating gene UL141. Among the changes affecting protein-coding regions, only the one in UL122 was strongly selected. This change, resulting in a D390H substitution in the encoded protein IE2, has been previously implicated in rendering another viral protein, UL84, essential for viral replication in fibroblasts. This finding suggests that IE2, and perhaps its interactions with UL84, have important functions unique to HCMV replication in epithelial cells.

Comprehensive identification of transposable element insertions using multiple sequencing technologies

Nature Communications ◽

10.1038/s41467-021-24041-8 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Chong Chu ◽

Rebeca Borges-Monroy ◽

Vinayak V. Viswanadham ◽

Soohyun Lee ◽

Heng Li ◽

...

Keyword(s):

Transposable Element ◽

Structure And Function ◽

Endogenous Retroviruses ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Short Read ◽

Sequencing Technologies ◽

Long Read ◽

And Function

Transposable elements (TEs) help shape the structure and function of the human genome. When inserted into some locations, TEs may disrupt gene regulation and cause diseases. Here, we present xTea (x-Transposable element analyzer), a tool for identifying TE insertions in whole-genome sequencing data. Whereas existing methods are mostly designed for short-read data, xTea can be applied to both short-read and long-read data. Our analysis shows that xTea outperforms other short read-based methods for both germline and somatic TE insertion discovery. With long-read data, we created a catalogue of polymorphic insertions with full assembly and annotation of insertional sequences for various types of retroelements, including pseudogenes and endogenous retroviruses. Notably, we find that individual genomes have an average of nine groups of full-length L1s in centromeres, suggesting that centromeres and other highly repetitive regions such as telomeres are a significant yet unexplored source of active L1s. xTea is available at https://github.com/parklab/xTea.

Physical activity in elderly

European Journal of Translational Myology ◽

10.4081/ejtm.2015.5280 ◽

2015 ◽

Vol 25 (4) ◽

pp. 249 ◽

Cited By ~ 21

Author(s):

Jan Cvecka ◽

Veronika Tirpakova ◽

Milan Sedliak ◽

Helmut Kern ◽

Winfried Mayr ◽

...

Keyword(s):

Physical Activity ◽

Electrical Stimulation ◽

Muscle Mass ◽

Irreversible Process ◽

Significant Decline ◽

Positive Effects ◽

Age Related ◽

And Performance ◽

And Function ◽

Age Related Changes

Aging is a multifactorial, irreversible process associated with a significant decline in muscle mass and neuromuscular functions. One of the most efficient methods to counteract age-related changes in muscle mass and function is physical exercise. An alternative effective intervention to improve muscle structure and performance is electrical stimulation. In the present work we present the positive effects of physical activity in the elderly and a study in which the effects of an 8-week period of functional electrical stimulation and of strength training with proprioceptive stimulation in the elderly are compared.

A species-specific satellite DNA from the entomopathogenic nematode Heterorhabditis indicus

Genome ◽

10.1139/g98-005 ◽

1998 ◽

Vol 41 (2) ◽

pp. 148-153 ◽

Cited By ~ 8

Author(s):

Monique Abadon ◽

Eric Grenier ◽

Christian Laumond ◽

Pierre Abad

Keyword(s):

Dna Sequence ◽

Satellite Dna ◽

Tandem Repeats ◽

Sequence Data ◽

Entomopathogenic Nematode ◽

Consensus Sequence ◽

Repeated Sequence ◽

Nucleotide Sequence Analysis ◽

Specific Sequence ◽

Species Specific

An AluI satellite DNA family has been cloned from the entomopathogenic nematode Heterorhabditis indicus. This repeated sequence appears to be an unusually abundant satellite DNA, since it constitutes about 45% of the H. indicus genome. The consensus sequence is 174 nucleotides long and has an A + T content of 56%, with the presence of direct and inverted repeat clusters. DNA sequence data reveal that monomers are quite homogeneous. Such homogeneity suggests that some mechanism is acting to maintain the homogeneity of this satellite DNA, despite its abundance, or that this repeated sequence could have appeared recently in the genome of H. indicus. Hybridization analysis of genomic DNAs from different Heterorhabditis species shows that this satellite DNA sequence is specific to the H. indicus genome. Considering the species specificity and the high copy number of this AluI satellite DNA sequence, it could provide a rapid and powerful tool for identifying H. indicus strains.

Key words: AluI repeated DNA, tandem repeats, species-specific sequence, nucleotide sequence analysis.
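The A + T content quoted in this abstract is a simple base-composition fraction: for a 174-nucleotide consensus, 56% corresponds to roughly 97 A or T positions. The minimal Python sketch below shows how such a figure is computed. The function `at_content` and the toy sequence are hypothetical; the actual H. indicus consensus sequence is not given in this listing and is not reproduced here.

```python
def at_content(seq):
    """Fraction of A and T bases among the unambiguous A/C/G/T bases of seq."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A/C/G/T bases")
    return sum(base in "AT" for base in counted) / len(counted)


# Made-up sequence for illustration only; not the H. indicus satellite monomer.
toy_monomer = "ATTAGCTAACGTTAGC"
print(f"A+T content: {at_content(toy_monomer):.1%}")

# Approximate number of A/T positions implied by the abstract's figures.
print(round(0.56 * 174))
```

Ambiguous bases such as N are ignored rather than counted in the denominator, which is one common convention; other conventions would give slightly different values for sequences containing ambiguity codes.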

Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions

Briefings in Bioinformatics ◽

10.1093/bib/bby017 ◽

2018 ◽

Vol 20 (4) ◽

pp. 1542-1559 ◽

Cited By ~ 44

Author(s):

Damla Senol Cali ◽

Jeremie S Kim ◽

Saugata Ghose ◽

Can Alkan ◽

Onur Mutlu

Keyword(s):

Sequence Analysis ◽

Genome Assembly ◽

Sequence Data ◽

Error Rates ◽

Nanopore Sequencing ◽

Memory Usage ◽

Sequencing Technology ◽

Assembly Pipeline ◽

And Performance ◽

Polishing Tool

Nanopore sequencing technology has the potential to render other sequencing technologies obsolete with its ability to generate long reads and provide portability. However, high error rates of the technology pose a challenge while generating accurate genome assemblies. The tools used for nanopore sequence analysis are of critical importance, as they should overcome the high error rates of the technology. Our goal in this work is to comprehensively analyze current publicly available tools for nanopore sequence analysis to understand their advantages, disadvantages and performance bottlenecks. It is important to understand where the current tools do not perform well to develop better tools. To this end, we (1) analyze the multiple steps and the associated tools in the genome assembly pipeline using nanopore sequence data, and (2) provide guidelines for determining the appropriate tools for each step. Based on our analyses, we make four key observations: (1) The choice of the tool for basecalling plays a critical role in overcoming the high error rates of nanopore sequencing technology. (2) Read-to-read overlap finding tools, GraphMap and Minimap, perform similarly in terms of accuracy. However, Minimap has a lower memory usage, and it is faster than GraphMap. (3) There is a trade-off between accuracy and performance when deciding on the appropriate tool for the assembly step. The fast but less accurate assembler Miniasm can be used for quick initial assembly, and further polishing can be applied on top of it to increase the accuracy, which leads to faster overall assembly. (4) The state-of-the-art polishing tool, Racon, generates high-quality consensus sequences while providing a significant speedup over another polishing tool, Nanopolish. We analyze various combinations of different tools and expose the trade-offs between accuracy, performance, memory usage and scalability. We conclude that our observations can guide researchers and practitioners in making conscious and effective choices for each step of the genome assembly pipeline using nanopore sequence data. Also, with the help of bottlenecks we have found, developers can improve the current tools or build new ones that are both accurate and fast, to overcome the high error rates of the nanopore sequencing technology.

Predicting Chromosome Flexibility from the Genomic Sequence Based on Deep Learning Neural Networks

Current Bioinformatics ◽

10.2174/1574893616666210827095829 ◽

2021 ◽

Vol 16 ◽

Author(s):

Jinghao Peng ◽

Jiajie Peng ◽

Haiyin Piao ◽

Zhang Luo ◽

Kelin Xia ◽

...

Keyword(s):

Deep Learning ◽

High Performance ◽

Genomic Sequence ◽

Sequence Data ◽

Function Analysis ◽

Double Helix ◽

Gm12878 Cell ◽

Genomic Sequence Analysis ◽

And Function ◽

Nuclear Processes

Background: The open and accessible regions of the chromosome are more likely to be bound by transcription factors, which are important for nuclear processes and biological functions. Studying changes in chromosome flexibility can help to discover and analyze disease markers and improve the efficiency of clinical diagnosis. Current methods for predicting chromosome flexibility from Hi-C data include the flexibility-rigidity index (FRI) and the Gaussian network model (GNM). However, these methods require chromosome structure data from 3D biological experiments, which are time-consuming and expensive. Objective: Generally, the folding and curling of the DNA double helix have a great impact on chromosome flexibility and function. Motivated by the success of genomic sequence analysis in biomolecular function analysis, we aim to propose a method that predicts chromosome flexibility based only on genomic sequence data. Method: We propose a new method (named "DeepCFP") that uses deep learning models to predict chromosome flexibility based only on genomic sequence features. The model has been tested in the GM12878 cell line. Results: The maximum accuracy of our model reached 91%. The performance of DeepCFP is close to that of FRI and GNM. Conclusion: DeepCFP can achieve high performance based on genomic sequence alone.