BASE: a novel workflow to integrate non-ubiquitous genes in comparative genomics analyses for selection

2020 ◽  
Author(s):  
Giobbe Forni ◽  
Angelo Alberto Ruggeri ◽  
Giovanni Piccinini ◽  
Andrea Luchetti

Abstract Inferring the selective forces that orthologous genes underwent across different lineages helps us understand the evolutionary processes that shaped their extant diversity. The most widespread metric for estimating the selection regimes of coding sequences across their sites and the species phylogeny is the ratio of nonsynonymous to synonymous substitutions (dN/dS, also known as ω). Nowadays, modern sequencing technologies and the large amount of already available sequence data allow the retrieval of thousands of gene orthology groups across large numbers of species. Nonetheless, the tools available to explore selection regimes are not designed to automatically process all orthogroups, and practical usage is often restricted to those consisting of single-copy genes which are ubiquitous across the species considered (i.e. the subset of genes which is shared by all the species considered). This approach limits the scale of the analysis to a fraction of single-copy genes, which can be as much as an order of magnitude fewer than the non-ubiquitous ones (i.e. those which are not present across all the species considered). Here we present a workflow named BASE that, leveraging the CodeML framework, eases the inference and interpretation of selection regimes in the context of comparative genomics. Although a number of bioinformatics tools have already been developed to facilitate this kind of analysis, BASE is the first to be specifically designed to ease the integration of non-ubiquitous gene orthogroups. The workflow, along with all the relevant documentation, is available at github.com/for-giobbe/BASE.
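The distinction between ubiquitous and non-ubiquitous single-copy orthogroups can be illustrated with a minimal Python sketch. It is not taken from BASE: the function name `classify_orthogroups` and the input layout (a dictionary of per-species copy counts) are assumptions made only for illustration.

```python
def classify_orthogroups(orthogroups, species):
    """Split single-copy orthogroups into ubiquitous and non-ubiquitous lists.

    orthogroups: hypothetical mapping {orthogroup_id: {species_name: copy_count}}
    species:     list of all species considered in the analysis
    """
    ubiquitous, non_ubiquitous = [], []
    for og_id, counts in orthogroups.items():
        # Copy count per species; a species missing from the mapping has 0 copies.
        present = {sp: counts.get(sp, 0) for sp in species}
        # Skip orthogroups with paralogs (more than one copy in any species).
        if any(n > 1 for n in present.values()):
            continue
        n_present = sum(1 for n in present.values() if n == 1)
        if n_present == len(species):
            ubiquitous.append(og_id)       # shared by all species
        elif n_present > 0:
            non_ubiquitous.append(og_id)   # missing in at least one species
    return ubiquitous, non_ubiquitous


species = ["A", "B", "C"]
orthogroups = {
    "OG1": {"A": 1, "B": 1, "C": 1},  # ubiquitous single-copy
    "OG2": {"A": 1, "B": 1},          # single-copy, absent from species C
    "OG3": {"A": 2, "B": 1, "C": 1},  # multi-copy, excluded here
}
ubiquitous, non_ubiquitous = classify_orthogroups(orthogroups, species)
print(ubiquitous, non_ubiquitous)
```

An analysis restricted to the first list would discard every orthogroup in the second, which is the limitation the abstract describes.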

2021 ◽  
Author(s):  
Parsoa Khorsand ◽  
Fereydoun Hormozdiari

Abstract Large-scale catalogs of common genetic variants (including indels and structural variants) are being created using data from second- and third-generation whole-genome sequencing technologies. However, the genotyping of these variants in newly sequenced samples is a nontrivial task that requires extensive computational resources. Furthermore, current approaches are mostly limited to specific types of variants and are generally prone to various errors and ambiguities when genotyping complex events. We propose an ultra-efficient approach for genotyping any type of structural variation that is not limited by the shortcomings and complexities of current mapping-based approaches. Our method, Nebula, utilizes the changes in the count of k-mers to predict the genotype of structural variants. We show that Nebula is not only an order of magnitude faster than mapping-based approaches for genotyping structural variants, but also has accuracy comparable to state-of-the-art approaches. Furthermore, Nebula is a generic framework not limited to any specific type of event. Nebula is publicly available at https://github.com/Parsoa/Nebula.
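The general idea of genotyping from k-mer counts, without mapping reads, can be sketched in Python as follows. This is a toy illustration and not Nebula's algorithm: the functions `count_kmers` and `call_genotype`, the allele-specific k-mer sets, and the 0.25/0.75 thresholds are all assumptions chosen for the example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def call_genotype(counts, alt_kmers, ref_kmers):
    """Toy genotype call from k-mers specific to each allele.

    alt_kmers / ref_kmers: hypothetical sets of k-mers that occur only when
    the variant (alt) or the reference (ref) allele is present.
    """
    alt = sum(counts[km] for km in alt_kmers)
    ref = sum(counts[km] for km in ref_kmers)
    total = alt + ref
    if total == 0:
        return "./."  # no informative k-mers observed
    frac = alt / total
    # Illustrative thresholds only; a real caller would model coverage and errors.
    if frac < 0.25:
        return "0/0"
    if frac > 0.75:
        return "1/1"
    return "0/1"


reads = ["ACGTAC", "ACGTAC", "TTGCAA"]
counts = count_kmers(reads, k=4)
print(call_genotype(counts, alt_kmers={"TTGC"}, ref_kmers={"ACGT"}))
```

Because only k-mer lookups are needed, the cost per sample is dominated by a single pass over the reads, which is why approaches of this kind can avoid the expense of alignment.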


1971 ◽  
Vol 10 ◽  
pp. 15-19
Author(s):  
George B. Rybicki

Abstract It is shown that the time of relaxation by particle encounters of self-gravitating systems in the plane interacting by 1/r² forces is of the same order of magnitude as the mean orbit time. Therefore such a system does not have a Vlasov limit for large numbers of particles, unless appeal is made to some non-zero thickness of the disk. The relevance of this result to numerical experiments on galactic structure is discussed.


2013 ◽  
Vol 5 ◽  
pp. BECB.S10886 ◽  
Author(s):  
Brijesh Singh Yadav ◽  
Venkateswarlu Ronda ◽  
Dinesh P. Vashista ◽  
Bhaskar Sharma

The recent advances in sequencing technologies and computational approaches are propelling scientists ever closer towards complete understanding of human-microbial interactions. The powerful sequencing platforms are rapidly producing huge amounts of nucleotide sequence data which are compiled into huge databases. This sequence data can be retrieved, assembled, and analyzed for identification of microbial pathogens and diagnosis of diseases. In this article, we present a commentary on how the metagenomics incorporated with microarray and new sequencing techniques are helping microbial detection and characterization.


2017 ◽  
Vol 2 ◽  
pp. 73 ◽  
Author(s):  
Muna F. Abry ◽  
Kelvin M. Kimenyi ◽  
Daniel K Masiga ◽  
Benard W. Kulohoma

Accessory gland proteins (ACPs) are important reproductive proteins produced by the male accessory glands (MAGs) of most insect species. These proteins are essential for male insect fertility, and are transferred alongside semen to females during copulation. ACPs are poorly characterized in Glossina species (tsetse fly), the principal vector of the parasite that causes life-threatening Human African Trypanosomiasis and Animal trypanosomiasis in endemic regions in Africa. The tsetse fly has a peculiar reproductive cycle because of the absence of oviposition. Females mate once and store sperm in a spermatheca, and produce a single fully developed larva at a time that pupates within minutes of exiting the uterus. This slow reproductive cycle, compared to other insects, significantly restricts reproduction to only 3 to 6 larvae per female lifespan. This unique reproductive cycle is an attractive entry point for vector control strategies. We exploit comparative genomics approaches to explore the diversity of ACPs in the recently available whole genome sequence data from five tsetse fly species (Glossina morsitans, G. austeni, G. brevipalpis, G. pallidipes and G. fuscipes). We used previously described ACPs in Drosophila melanogaster and Anopheles gambiae as reference sequences. We identified 36, 27, 31, 29 and 33 diverse ACP orthologous genes in G. austeni, G. brevipalpis, G. fuscipes, G. pallidipes and G. morsitans genomes respectively, which we classified into 21 functional classes. Our findings provide genetic evidence of MAG proteins in five recently sequenced Glossina genomes. They highlight new avenues for molecular studies that evaluate potential field control strategies of these important vectors of human and animal disease.


2019 ◽  
Author(s):  
Kamela Charmaine S. Ng ◽  
Jean Claude S. Ngabonziza ◽  
Pauline Lempens ◽  
Bouke Catherine de Jong ◽  
Frank van Leth ◽  
...  

Abstract Background: Mycobacterium tuberculosis rapid diagnostic tests (RDTs) are widely employed in routine laboratories and national surveys for detection of rifampicin-resistant (RR)-TB. However, as next generation sequencing technologies have become more commonplace in research and surveillance programs, RDTs are being increasingly complemented by whole genome sequencing (WGS). While comparison between RDTs is difficult, all RDT results can be derived from WGS data. This can facilitate continuous analysis of RR-TB burden regardless of the data generation technology employed. By converting WGS to RDT results, we enable comparison of data with different formats and sources, particularly for low and middle income high TB burden countries that employ different diagnostic algorithms for drug resistance surveys. This allows national TB control programs (NTPs) and epidemiologists to utilize all available data in the setting for improved RR-TB surveillance. Methods: We developed the Python-based MTB Genome to Test (MTBGT) tool that transforms WGS-derived data into laboratory-validated results of the primary RDTs: Xpert MTB/RIF, Xpert MTB/RIF Ultra, GenoType MDRTBplus v2.0, and Genoscholar NTM+MDRTB II. The tool was validated through RDT results of RR-TB strains with diverse resistance patterns and geographic origins and applied on routine-derived WGS data. Results: The MTBGT tool correctly transformed the SNP data into the RDT results and generated tabulated frequencies of the RDT probes as well as rifampicin susceptible cases. The tool supplemented the RDT probe reactions output with the RR-conferring mutation based on identified SNPs. The MTBGT tool facilitated continuous analysis of RR-TB and Xpert probe reactions from different platforms and collection periods in Rwanda. Conclusion: Overall, the MTBGT tool allows low and middle income countries to make sense of the increasingly generated WGS data in light of the readily available RDT results, and to assess whether currently implemented RDTs adequately detect RR-TB in their setting. With its feature to transform WGS to RDT results and facilitate continuous RR-TB data analysis, the MTBGT tool may bridge the gap between and among data from periodic surveys, continuous surveillance, research, and routine tests, and may be integrated within the existing national connectivity platform for use by the NTP and epidemiologists to improve setting-specific RR-TB control. The MTBGT source code and accompanying documentation are available at https://github.com/KamelaNg/MTBGT.


Author(s):  
Robert C. Edgar

Abstract Mapping of reads to reference sequences is an essential step in a wide range of biological studies. The large size of datasets generated with next-generation sequencing technologies motivates the development of fast mapping software. Here, I describe URMAP, a new read mapping algorithm. URMAP is an order of magnitude faster than BWA and Bowtie2 with comparable accuracy on a benchmark test using simulated paired 150nt reads of a well-studied human genome. Software is freely available at https://drive5.com/urmap.


2021 ◽  
Author(s):  
Romain Feron ◽  
Robert Michael Waterhouse

Ambitious initiatives to coordinate genome sequencing of Earth's biodiversity mean that the accumulation of genomic data is growing rapidly. In addition to cataloguing biodiversity, these data provide the basis for understanding biological function and evolution. Accurate and complete genome assemblies offer a comprehensive and reliable foundation upon which to advance our understanding of organismal biology at genetic, species, and ecosystem levels. However, ever-changing sequencing technologies and analysis methods mean that available data are often heterogeneous in quality. In order to guide forthcoming genome generation efforts and promote efficient prioritisation of resources, it is thus essential to define and monitor taxonomic coverage and quality of the data. Here we present an automated analysis workflow that surveys genome assemblies from the United States National Center for Biotechnology Information (NCBI), assesses their completeness using the relevant Benchmarking Universal Single-Copy Orthologue (BUSCO) datasets, and collates the results into an interactively browsable resource. We apply our workflow to produce a community resource of available assemblies from the phylum Arthropoda, the Arthropoda Assembly Assessment Catalogue. Using this resource, we survey current taxonomic coverage and assembly quality at the NCBI, we examine how key assembly metrics relate to gene content completeness, and we compare results from using different BUSCO lineage datasets. These results demonstrate how the workflow can be used to build a community resource that enables large-scale assessments to survey species coverage and data quality of available genome assemblies, and to guide prioritisations for ongoing and future sampling, sequencing, and genome generation initiatives.


2012 ◽  
pp. 1885-1903
Author(s):  
Bertil Schmidt ◽  
Chen Chen ◽  
Weiguo Liu ◽  
Wayne P. Mitchell

In this chapter we present PheGee@Home, a grid-based comparative genomics tool that nominates candidate genes responsible for a given phenotype. A phenotype is the physical manifestation of the interplay of genetic, epigenetic and environmental factors. Our tool is designed to facilitate the discovery and prioritization of candidate genes controlling or contributing to the genetically determined portion of a specified phenotype. However, in order to make reliable nominations of candidate genes from sequence data, several genome-size sequence datasets are required. This makes the approach impractical on traditional computer architectures leading to prohibitively long runtimes. Therefore, we use a computational architecture based on a desktop grid environment and commodity graphics hardware to significantly accelerate PheGee. We validate this approach by showing the deployment and evaluation on a grid testbed for the comparison of microbial genomes.


Molecules ◽  
2019 ◽  
Vol 24 (2) ◽  
pp. 261 ◽  
Author(s):  
Yongfu Li ◽  
Steven Paul Sylvester ◽  
Meng Li ◽  
Cheng Zhang ◽  
Xuan Li ◽  
...  

Magnolia zenii is a critically endangered species known from only 18 trees that survive on Baohua Mountain in Jiangsu province, China. Little information is available regarding its molecular biology, with no genomic study performed on M. zenii until now. We determined the complete plastid genome of M. zenii and identified microsatellites. Whole sequence alignment and phylogenetic analysis using BI and ML methods were also conducted. The plastome of M. zenii was 160,048 bp long with 39.2% GC content and included a pair of inverted repeats (IRs) of 26,596 bp that separated a large single-copy (LSC) region of 88,098 bp and a small single-copy (SSC) region of 18,757 bp. One hundred thirty genes were identified, of which 79 were protein-coding genes, 37 were transfer RNAs, and eight were ribosomal RNAs. Thirty-seven simple sequence repeats (SSRs) were also identified. Comparative analyses of genome structure and sequence data of closely related species revealed five mutation hotspots, useful for future phylogenetic research. Magnolia zenii was placed as sister to M. biondii with strong support in all analyses. Overall, by providing genomic resources for M. zenii, this study will be beneficial for the evolutionary study and phylogenetic reconstruction of Magnoliaceae.


2018 ◽  
Vol 29 (08) ◽  
pp. 1249-1255
Author(s):  
Kamil Salikhov

Modern DNA sequencing technologies generate prodigious volumes of sequence data consisting of short DNA fragments (reads). Storing and transferring this data is often challenging. With this motivation, several specialized compression methods have been developed. In this paper, we present an improvement of the lossless reference-free compression algorithm, suggested by Rozov et al., based on the technique of cascading Bloom filters. Through computational experiments on real data, we demonstrate that our method results in a significant associated memory reduction in practice.
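The cascading idea can be sketched in Python with a two-level cascade over a small, known query universe. This is a generic illustration and not the compression algorithm of the paper or of Rozov et al.: the classes `BloomFilter` and `TwoLevelCascade`, the filter sizes, and the SHA-256-based hashing are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter; one byte per bit is used for readability, not compactness."""

    def __init__(self, size, num_hashes):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = bytearray(size)

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


class TwoLevelCascade:
    """Exact membership for queries drawn from a known universe.

    Level 1 stores the members. Level 2 stores the universe items that level 1
    wrongly accepts. Members that level 2 wrongly accepts are kept in a small
    explicit set, so every query from the universe is answered exactly.
    """

    def __init__(self, members, universe, size=64, num_hashes=2):
        members = set(members)
        self.b1 = BloomFilter(size, num_hashes)
        for x in members:
            self.b1.add(x)
        false_positives = {x for x in universe if x not in members and x in self.b1}
        self.b2 = BloomFilter(size, num_hashes)
        for x in false_positives:
            self.b2.add(x)
        self.exact = {x for x in members if x in self.b2}

    def __contains__(self, item):
        if item not in self.b1:
            return False          # Bloom filters have no false negatives
        if item not in self.b2:
            return True           # accepted by level 1, not a known false positive
        return item in self.exact  # ambiguous case resolved by the explicit set
```

Each additional level only has to store the errors of the previous one, so the later structures are typically much smaller than a single filter sized for the same error rate; how much memory this saves in practice depends on the data and parameters.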

