rKOMICS: an R package for processing mitochondrial minicircle assemblies in population-scale genome projects

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Manon Geerts ◽  
Achim Schnaufer ◽  
Frederik Van den Broeck

Abstract Background The advent of population-scale genome projects has revolutionized our biological understanding of parasitic protozoa. However, while hundreds to thousands of nuclear genomes of parasitic protozoa have been generated and analyzed, information about the diversity, structure and evolution of their mitochondrial genomes remains fragmentary, mainly because of their extraordinary complexity. Indeed, unicellular flagellates of the order Kinetoplastida contain the most structurally complex mitochondrial genome of all eukaryotes, organized as a giant network of homogeneous maxicircles and heterogeneous minicircles. We recently developed KOMICS, an analysis toolkit that automates the assembly and circularization of the mitochondrial genomes of Kinetoplastid parasites. While this tool overcomes the limitation of extracting mitochondrial assemblies from Next-Generation Sequencing datasets, interpreting and visualizing the genetic (dis)similarity within and between samples remains a time-consuming process. Results Here, we present a new analysis toolkit, rKOMICS, to streamline the analysis of minicircle sequence diversity in population-scale genome projects. rKOMICS is a user-friendly R package that has simple installation requirements and that is applicable to all 27 trypanosomatid genera. Once minicircle sequence alignments are generated, rKOMICS allows users to examine, summarize and visualize minicircle sequence diversity within and between samples through the analysis of minicircle sequence clusters. We showcase the functionalities of the (r)KOMICS tool suite using a whole-genome sequencing dataset from a recently published study on the history of diversification of the Leishmania braziliensis species complex in Peru. Analyses of population diversity and structure highlighted differences in minicircle sequence richness and composition between Leishmania subspecies, and between subpopulations within subspecies.
Conclusion The rKOMICS package establishes a critical framework to manipulate, explore and extract biologically relevant information from mitochondrial minicircle assemblies in tens to hundreds of samples simultaneously and efficiently. This should facilitate research that aims to develop new molecular markers for identifying species-specific minicircles, or to study the ancestry of parasites for complementary insights into their evolutionary history.

Plant Disease ◽  
2021 ◽  
Author(s):  
Xuejin Cui ◽  
Kehong Liu ◽  
Jie Huang ◽  
Shimin Fu ◽  
Qingdong Chen ◽  
...  

Citrus Huanglongbing (HLB) is present in 10 provinces in China and is associated with “Candidatus Liberibacter asiaticus” (CLas), which is transmitted by the Asian citrus psyllid (Diaphorina citri, ACP). To date, HLB and ACP have expanded to Yibin city of Sichuan Province, posing an imminent threat to the citrus belt of the upper and middle reaches of the Yangtze River, an important late-maturing citrus-producing area in China. To understand the epidemiological route of CLas and ACP in newly invaded regions of Sichuan and thereby better establish an HLB-interception zone ranging from Leibo to Yibin, we evaluated the molecular variability of 19 CLas draft genomes from citrus or dodder (Cuscuta campestris), based on three type-specific prophage loci, three variable number of tandem repeat (VNTR) loci and miniature inverted-repeat transposable element (MITE) types, together with the population diversity of 44 ACP mitochondrial genomes. The results indicated that CLas isolates in the newly invaded area (Pingshan) were more diverse than those in the HLB endemic areas (Leibo and Ningnan). Phylogenetic analysis based on mitochondrial genomes demonstrated that ACPs in Leibo, Pingshan and Xuzhou (rural areas) represent a new group (MG4), distinguished by three unique SNPs in cox1, nad4 and cytb. However, the ACPs sampled from the urban areas of Cuiping and Xuzhou belonged to the southeastern China group (MG2-1). Altogether, our study revealed multiple sources of ACP and CLas in the HLB-interception zone and proposed their transmission route. This study contributes to the formulation of precise HLB prevention and control strategies in the HLB-interception zone in Sichuan and could be useful for HLB management efforts in other regions.


2018 ◽  
Author(s):  
Zhe Sun ◽  
Li Chen ◽  
Hongyi Xin ◽  
Qianhui Huang ◽  
Anthony R Cillo ◽  
...  

Abstract The recently developed droplet-based single cell transcriptome sequencing (scRNA-seq) technology makes it feasible to perform a population-scale scRNA-seq study, in which the transcriptome is measured for tens of thousands of single cells from multiple individuals. Despite the advances of many clustering methods, there are few tailored methods for population-scale scRNA-seq studies. Here, we have developed a BAyesian Mixture Model for Single Cell sequencing (BAMM-SC) method to cluster scRNA-seq data from multiple individuals simultaneously. Specifically, BAMM-SC takes raw data as input and can account for data heterogeneity and batch effects among multiple individuals in a unified Bayesian hierarchical model framework. Results from extensive simulations and application of BAMM-SC to in-house scRNA-seq datasets using blood, lung and skin cells from humans or mice demonstrated that BAMM-SC outperformed existing clustering methods with improved clustering accuracy and reduced impact from batch effects. BAMM-SC has been implemented in a user-friendly R package with a detailed tutorial available on www.pitt.edu/~Cwec47/singlecell.html.


2020 ◽  
Author(s):  
Cory D. Dunn

Abstract Phylogenetic analyses can take advantage of multiple sequence alignments as input. These alignments typically consist of homologous nucleic acid or protein sequences, and the inclusion of outlier or aberrant sequences can compromise downstream analyses. Here, I describe a program, SequenceBouncer, that uses the Shannon entropy values of alignment columns to identify outlier alignment sequences in a manner responsive to overall alignment context. I demonstrate the utility of this software using alignments of available mammalian mitochondrial genomes, bird cytochrome c oxidase-derived DNA barcodes, and COVID-19 sequences.
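The per-column Shannon entropy mentioned in this abstract can be sketched in a few lines of Python. This is only an illustration of the quantity itself, not SequenceBouncer's actual procedure: the function names `column_entropy` and `alignment_entropies` are hypothetical, and ignoring gap characters is an assumption made here for simplicity.

```python
import math
from collections import Counter


def column_entropy(column, gap_chars="-."):
    """Shannon entropy (in bits) of one alignment column.

    Assumption: gap characters are ignored rather than counted as a symbol.
    """
    symbols = [c for c in column.upper() if c not in gap_chars]
    if not symbols:
        return 0.0
    total = len(symbols)
    counts = Counter(symbols)
    # H = sum over symbols of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def alignment_entropies(sequences):
    """Entropy of every column of an alignment given as equal-length strings."""
    if len({len(s) for s in sequences}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy("".join(col)) for col in zip(*sequences)]


# Toy alignment (invented data, for illustration only).
toy_alignment = ["ACGT", "ACGA", "AC-T", "AGGT"]
entropies = alignment_entropies(toy_alignment)
```

A fully conserved column has entropy 0, while a column in which every symbol differs has the highest entropy. How such values are combined to flag outlier sequences is specific to the tool and is not reproduced here.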


2020 ◽  
Vol 48 (18) ◽  
pp. e105-e105 ◽  
Author(s):  
Volodymyr Tsybulskyi ◽  
Mohamed Mounir ◽  
Irmtraud M Meyer

Abstract Interactions between biological entities are key to understanding their potential functional roles. Three fields of research have recently made particular progress: the investigation of trans RNA–RNA and RNA–DNA transcriptome interactions and of trans DNA–DNA genome interactions. We now have both experimental and computational methods for examining these interactions in vivo and on a transcriptome- and genome-wide scale, respectively. Often, key insights can be gained by visually inspecting figures that manage to combine different sources of evidence and quantitative information. We here present R-chie, a web server and R package for visualizing cis and trans RNA–RNA, RNA–DNA and DNA–DNA interactions. For this, we have completely revised and significantly extended an earlier version of R-chie (1) which was initially introduced for visualizing RNA secondary structure features. The new R-chie offers a range of unique features for visualizing cis and trans RNA–RNA, RNA–DNA and DNA–DNA interactions. Particularly noteworthy features include the ability to incorporate evolutionary information, e.g. multiple-sequence alignments, to compare two alternative sets of information and to incorporate detailed, quantitative information. R-chie is readily available via a web server as well as a corresponding R package called R4RNA which can be used to run the software locally.


2016 ◽  
Author(s):  
Vasco Elbrecht ◽  
Florian Leese

1) DNA metabarcoding is a powerful tool to assess biodiversity by amplifying and sequencing a standardized gene marker region. Its success is often limited by variable binding sites that introduce amplification biases. Thus the development of optimized primers for communities or taxa under study in a certain geographic region and/or ecosystem is of critical importance. However, no tool for obtaining and processing reference sequence data in bulk to serve as a backbone for primer design is currently available. 2) We developed the R package PrimerMiner, which batch downloads DNA barcode gene sequences from BOLD and NCBI databases for specified target taxonomic groups and then applies sequence clustering into operational taxonomic units (OTUs) to reduce biases introduced by the different number of available sequences per species. Additionally, PrimerMiner offers functionalities to evaluate primers in silico, which are in our opinion more realistic than the strategy employed in another software tool available for that purpose, ecoPCR. 3) We used PrimerMiner to download cytochrome c oxidase subunit I (COI) sequences for 15 important freshwater invertebrate groups, relevant for ecosystem assessment. By processing COI markers from both databases, we were able to increase the amount of reference data 249-fold on average, compared to using complete mitochondrial genomes alone. Furthermore, we visualized the generated OTU sequence alignments and describe how to evaluate primers in silico using PrimerMiner. 4) With PrimerMiner we provide a useful tool to obtain relevant sequence data for targeted primer development and evaluation. The OTU-based reference alignments generated with PrimerMiner can be used for manual primer design, or processed with bioinformatic tools for primer development.


2017 ◽  
Author(s):  
Kushal K. Dey ◽  
Dongyue Xie ◽  
Matthew Stephens

Abstract Background Sequence logo plots have become a standard graphical tool for visualizing sequence motifs in DNA, RNA or protein sequences. However, standard logo plots primarily highlight enrichment of symbols, and may fail to highlight interesting depletions. Current alternatives that try to highlight depletion often produce visually cluttered logos. Results We introduce a new sequence logo plot, the EDLogo plot, that highlights both enrichment and depletion, while minimizing visual clutter. We provide an easy-to-use and highly customizable R package Logolas to produce a range of logo plots, including EDLogo plots. This software also allows elements in the logo plot to be strings of characters, rather than a single character, extending the range of applications beyond the usual DNA, RNA or protein sequences. We illustrate our methods and software on applications to transcription factor binding site motifs, protein sequence alignments and cancer mutation signature profiles. Conclusion Our new EDLogo plots, and flexible software implementation, can help data analysts visualize both enrichment and depletion of characters (DNA sequence bases, amino acids, etc.) across a wide range of applications.
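To make concrete why a standard logo emphasizes enrichment, the sketch below computes the letter heights of a conventional information-content logo for a DNA alignment. It is not the EDLogo scoring and does not use the Logolas API: the function name `logo_heights` is hypothetical, a uniform background over the four bases is assumed, and no small-sample correction is applied.

```python
import math
from collections import Counter


def logo_heights(sequences, alphabet="ACGT"):
    """Letter heights (bits) of a standard information-content logo.

    Assumptions: uniform background, no small-sample correction,
    characters outside `alphabet` (e.g. gaps) are skipped.
    """
    max_bits = math.log2(len(alphabet))
    heights = []
    for col in zip(*sequences):
        counts = Counter(c for c in col if c in alphabet)
        total = sum(counts.values())
        if total == 0:
            heights.append({})
            continue
        freqs = {s: counts[s] / total for s in alphabet if counts[s] > 0}
        entropy = sum(p * math.log2(1 / p) for p in freqs.values())
        info = max_bits - entropy  # information content of the position
        # Each observed symbol gets a share of the stack proportional to its frequency.
        heights.append({s: p * info for s, p in freqs.items()})
    return heights


# Toy motif (invented data, for illustration only).
toy_heights = logo_heights(["AC", "AG"])
```

In this scheme a symbol that never occurs at a position receives no height at all, so a strongly depleted base is simply absent from the stack rather than drawn attention to, which is the limitation the abstract describes.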


2011 ◽  
Vol 10 (4) ◽  
pp. 483-493 ◽  
Author(s):  
Scott M. Landfear

ABSTRACT Parasitic protozoa, such as malaria parasites, trypanosomes, and Leishmania, acquire a plethora of nutrients from their hosts, employing transport proteins located in the plasma membrane of the parasite. Application of molecular genetic approaches and the completion of genome projects have allowed the identification and functional characterization of a cohort of transporters and their genes in these parasites. This review focuses on a subset of these permeases that have been studied in some detail, that import critical nutrients, and that provide examples of approaches being undertaken broadly with these and other parasite transporters. Permeases reviewed include those for hexoses, purines, iron, polyamines, carboxylates, and amino acids. Topics of special emphasis include structure-function approaches, critical roles for transporters in parasite viability and physiology, regulation of transporter expression, and subcellular targeting. Investigations of parasite transporters impact a broad spectrum of basic biological problems in these protozoa.


Author(s):  
Walter R Gilks ◽  
Chinying Wang

Specificity determining sites (SDSs) in alignments of protein sequences are sites at which subfamilies of the aligned sequences have been under differential selective pressure. Identifying SDSs is important because they are key in understanding the functional specificity of each subfamily. Differential selection at an SDS will result in differences between subfamilies in the distribution of amino-acids at that site. However, statistical analysis of such differences is complicated by phylogenetic relationships within each subfamily, which profoundly influence these differences. We develop a non-parametric approach to evaluating purely statistical SDS evidence in a sequence alignment, taking account of phylogeny through a novel tree-respecting randomisation based on the principle of parsimony. Our approach does not exploit bioinformatic measures based on amino-acid properties or rates of evolution, as do other methods. Our intention is thereby to supplement and strengthen other methods of SDS prediction, not to compete with them. Our methodology is implemented in the R package called SDSparsimony, freely downloadable from http://www.maths.leeds.ac.uk/%7Ewally.gilks/SDSparsimonyPackage/Welcome.html.

