cognac: rapid generation of concatenated gene alignments for phylogenetic inference from large, bacterial whole genome sequencing datasets

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ryan D. Crawford ◽  
Evan S. Snitkin

Abstract Background: The quantity of genomic data is expanding at an increasing rate, and tools for phylogenetic analysis that scale to the quantity of available data are required. To address this need, we present cognac, a user-friendly software package to rapidly generate concatenated gene alignments for phylogenetic analysis. Results: We show that cognac rapidly identifies phylogenetic marker genes using a data-driven approach and efficiently generates concatenated gene alignments for very large genomic datasets. To benchmark our tool, we generated core gene alignments for eight distinct bacterial genera, including a dataset of over 11,000 genomes from the genus Escherichia that produced an alignment of 1,353 genes in less than 17 h. Conclusions: We demonstrate that cognac is an efficient method for generating concatenated gene alignments for phylogenetic analysis. We have released cognac as an R package (https://github.com/rdcrawford/cognac) with customizable parameters for adaptation to diverse applications.
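
The abstract does not detail cognac's pipeline. The following is a minimal sketch of only the final concatenation step, assuming per-gene alignments are already available in memory: genes present in enough genomes are kept, and their alignments are joined column-wise into one row per genome, with gap padding for genomes that lack a gene. The names `core_genes` and `concatenate` and the `min_fraction` threshold are hypothetical illustrations, not cognac's R API.

```python
from typing import Dict, List

# Hypothetical layout: gene name -> {genome id -> aligned sequence}
GeneAlignments = Dict[str, Dict[str, str]]


def core_genes(alignments: GeneAlignments, genomes: List[str],
               min_fraction: float = 0.99) -> GeneAlignments:
    """Keep genes present in at least min_fraction of the genomes.

    min_fraction is an illustrative threshold, not a cognac default.
    """
    n = len(genomes)
    return {gene: aln for gene, aln in alignments.items()
            if sum(g in aln for g in genomes) / n >= min_fraction}


def concatenate(alignments: GeneAlignments, genomes: List[str],
                gap: str = "-") -> Dict[str, str]:
    """Join per-gene alignments into one row per genome (a supermatrix).

    A genome lacking a gene gets a run of gap characters as wide as that
    gene's alignment, so every row has the same length and column order.
    """
    rows: Dict[str, List[str]] = {g: [] for g in genomes}
    for gene in sorted(alignments):          # fixed gene order for reproducibility
        aln = alignments[gene]
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"{gene}: sequences are missing or not aligned")
        width = widths.pop()
        for g in genomes:
            rows[g].append(aln.get(g, gap * width))
    return {g: "".join(parts) for g, parts in rows.items()}


if __name__ == "__main__":
    genes = {
        "geneA": {"g1": "ATG-", "g2": "ATGC", "g3": "A-GC"},
        "geneB": {"g1": "TT", "g2": "TA"},   # absent from g3
    }
    genomes = ["g1", "g2", "g3"]
    print(concatenate(genes, genomes))
    # {'g1': 'ATG-TT', 'g2': 'ATGCTA', 'g3': 'A-GC--'}
    print(sorted(core_genes(genes, genomes, min_fraction=1.0)))
    # ['geneA']
```

Padding with gaps rather than dropping genomes keeps every genome in the matrix; filtering on presence first limits how much of each row is missing data.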

2020 ◽  
Author(s):  
Ryan D. Crawford ◽  
Evan S. Snitkin

Abstract The quantity of genomic data is expanding at an increasing rate. Tools for phylogenetic analysis that scale to the quantity of available data are required. We present cognac, a user-friendly software package to rapidly generate concatenated gene alignments for phylogenetic analysis. We applied this tool to generate core gene alignments for very large genomic datasets, including a dataset of over 11,000 genomes from the genus Escherichia, producing an alignment of 1,353 genes in less than 17 hours. We have released cognac as an R package (https://github.com/rdcrawford/cognac) with customizable parameters for adaptation to diverse applications.


2021 ◽  
Author(s):  
Leonie V. D. E. Vogelsmeier ◽  
Jeroen K. Vermunt ◽  
Kim De Roover

Intensive longitudinal data (ILD) have become popular for studying within-person dynamics in psychological constructs (or between-person differences therein). Before investigating what the dynamics look like, it is important to examine whether the measurement model (MM) is the same across subjects and time and, thus, whether the measured constructs have the same meaning. If the MM differs (e.g., because of changes in item interpretation or response styles), observations cannot be validly compared. Differences in the MM for ILD can be explored with latent Markov factor analysis (LMFA), which classifies observations based on the underlying MM (for many subjects and time points simultaneously) and thus shows which observations are comparable. However, the complexity of the method, and the fact that no open-source software for LMFA existed until now, may have hindered researchers from applying it in practice. In this article, we introduce the new user-friendly software package lmfa, which allows researchers to perform the analysis in the freely available software R. We provide a step-by-step tutorial for the lmfa package so that researchers can easily investigate MM differences in their own ILD.
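
To make "classifying observations by their measurement model" concrete, here is a minimal sketch, assuming two hypothetical latent states whose measurement models are one-factor models with known intercepts, loadings, and unique variances, plus a known transition matrix. It runs only the forward (filtering) pass of a latent Markov model; estimating the parameters (e.g., with EM) and any backward smoothing are omitted, and none of the names correspond to functions in the lmfa package.

```python
import math
from typing import List, Sequence, Tuple

# (item intercepts mu, factor loadings lambda, unique variances psi) for one state
MeasurementModel = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def one_factor_logpdf(x, mu, loadings, uniq) -> float:
    """log N(x; mu, lambda lambda^T + diag(psi)) for a one-factor model."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    dl = [l / p for l, p in zip(loadings, uniq)]          # D^-1 lambda
    c = 1.0 + sum(l * q for l, q in zip(loadings, dl))    # 1 + lambda^T D^-1 lambda
    # Sherman-Morrison: d^T Sigma^-1 d = d^T D^-1 d - (lambda^T D^-1 d)^2 / c
    quad = (sum(di * di / p for di, p in zip(d, uniq))
            - sum(q * di for q, di in zip(dl, d)) ** 2 / c)
    # Matrix determinant lemma: log|Sigma| = log|D| + log c
    logdet = sum(math.log(p) for p in uniq) + math.log(c)
    return -0.5 * (len(x) * math.log(2 * math.pi) + logdet + quad)


def filter_states(obs, init, trans, mms: List[MeasurementModel]) -> List[List[float]]:
    """Forward pass: P(state at t | observations up to t), for each t."""
    k = len(init)
    out: List[List[float]] = []
    for x in obs:
        logs = [one_factor_logpdf(x, *mms[s]) for s in range(k)]
        top = max(logs)
        emis = [math.exp(v - top) for v in logs]  # common rescaling avoids underflow
        prior = init if not out else [
            sum(out[-1][r] * trans[r][s] for r in range(k)) for s in range(k)]
        unnorm = [p * e for p, e in zip(prior, emis)]
        z = sum(unnorm)
        out.append([u / z for u in unnorm])
    return out


if __name__ == "__main__":
    mms = [([0, 0, 0], [1, 1, 1], [0.5] * 3),     # all items load positively
           ([0, 0, 0], [1, -1, 1], [0.5] * 3)]    # item 2 is interpreted in reverse
    probs = filter_states([[1, 1, 1], [1, -1, 1]], [0.5, 0.5],
                          [[0.9, 0.1], [0.1, 0.9]], mms)
    for t, p in enumerate(probs):
        print(t, [round(v, 3) for v in p])
```

The one-factor restriction keeps the covariance algebra closed-form through the Sherman-Morrison identity and the matrix determinant lemma; models with more factors would need a general matrix inverse and determinant.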


2021 ◽  
Author(s):  
Piyal Karunarathne ◽  
Nicolas Pocquet ◽  
Pierrick Labbé ◽  
Pascal Milesi

Abstract Dose-response relationships reflect the effects of a substance on organisms and are widely used in research areas ranging from medicine and physiology to vector control and pest management in agronomy. Furthermore, reporting on the response of organisms to stressors is an essential component of many public policies (e.g., public health, environment), and the assessment of xenobiotic responses is an integral part of World Health Organization recommendations. Building upon an R script that we previously made available, and given its popularity, we have now developed a software package in the R environment, BioRssay, to efficiently analyze dose-response relationships. It offers more user-friendly functions, more flexibility, and easier interpretation of the results. The functions in the BioRssay package are built on robust statistical analyses to compare the dose/exposure-response of various bioassays and to visualize them effectively in probit graphs.
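
The classical probit-graph idea behind this kind of analysis can be sketched as follows, assuming simple dead/total counts per dose: transform each observed mortality into a probit (standard normal quantile), fit a straight line against log10(dose), and read off the dose for a chosen mortality. This sketch uses ordinary least squares on the transformed points (the older graphical approach), not the likelihood-based fitting a package such as BioRssay may use, and the function names and example counts are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def fit_probit_line(doses: Sequence[float], dead: Sequence[float],
                    total: Sequence[float]) -> Tuple[float, float]:
    """Fit probit(mortality) = a + b * log10(dose) by least squares."""
    xs, ys = [], []
    for d, k, n in zip(doses, dead, total):
        p = k / n
        if d > 0 and 0 < p < 1:   # probits of 0 % and 100 % mortality are infinite
            xs.append(math.log10(d))
            ys.append(STD_NORMAL.inv_cdf(p))
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct doses with partial mortality")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def lethal_dose(a: float, b: float, mortality: float = 0.5) -> float:
    """Dose at which the fitted line predicts the given mortality (LD50 by default)."""
    return 10 ** ((STD_NORMAL.inv_cdf(mortality) - a) / b)


if __name__ == "__main__":
    # Hypothetical bioassay: 50 individuals exposed per dose, number dead recorded.
    a, b = fit_probit_line([0.5, 1, 2, 4, 8], [3, 12, 26, 41, 47], [50] * 5)
    print(f"slope = {b:.2f}, LD50 = {lethal_dose(a, b):.2f}, "
          f"LD90 = {lethal_dose(a, b, 0.9):.2f}")
```

Doses with 0 % or 100 % mortality are simply dropped here; likelihood-based probit regression can use them, which is one reason it is generally preferred over the graphical fit.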


2020 ◽  
Vol 37 (5) ◽  
pp. 1530-1534 ◽  
Author(s):  
Bui Quang Minh ◽  
Heiko A Schmidt ◽  
Olga Chernomor ◽  
Dominik Schrempf ◽  
Michael D Woodhams ◽  
...  

Abstract IQ-TREE (http://www.iqtree.org, last accessed February 6, 2020) is a user-friendly and widely used software package for phylogenetic inference using maximum likelihood. Since the release of version 1 in 2014, we have continuously expanded IQ-TREE to integrate a plethora of new models of sequence evolution and efficient computational approaches of phylogenetic inference to deal with genomic data. Here, we describe notable features of IQ-TREE version 2 and highlight the key advantages over other software.
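
IQ-TREE searches tree space under many substitution models. As a much smaller illustration of the maximum-likelihood principle it rests on, the sketch below estimates the evolutionary distance between two aligned DNA sequences under the Jukes-Cantor (JC69) model by numerically maximizing the log-likelihood with golden-section search, and compares the result with the known closed-form estimate. It is not IQ-TREE code, and the function names are hypothetical.

```python
import math


def jc69_log_likelihood(t: float, n_same: int, n_diff: int) -> float:
    """Log-likelihood of a two-sequence alignment under JC69 for branch length t.

    t is in expected substitutions per site; the constant factor for the
    base frequency of the first sequence is omitted.
    """
    e = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * e        # P(same base at the other end of the branch)
    p_diff = 0.25 - 0.25 * e        # P(one specific different base)
    return n_same * math.log(p_same) + n_diff * math.log(p_diff)


def ml_distance(seq1: str, seq2: str, lo: float = 1e-8, hi: float = 10.0,
                tol: float = 1e-10) -> float:
    """Maximize the JC69 likelihood over t by golden-section search."""
    pairs = [(x, y) for x, y in zip(seq1, seq2) if x in "ACGT" and y in "ACGT"]
    n_diff = sum(x != y for x, y in pairs)
    n_same = len(pairs) - n_diff

    def f(t: float) -> float:
        return jc69_log_likelihood(t, n_same, n_diff)

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):      # maximum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:                # maximum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2.0


def jc69_distance(p: float) -> float:
    """Closed-form ML estimate for an observed proportion p < 3/4 of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    s1 = "ACGT" * 25
    shift = {"A": "C", "C": "G", "G": "T", "T": "A"}
    s2 = "".join(shift[ch] for ch in s1[:10]) + s1[10:]   # 10 of 100 sites differ
    print(ml_distance(s1, s2), jc69_distance(0.1))
```

The numerical and closed-form answers agree because the JC69 likelihood is unimodal in t; tree inference generalizes this by optimizing many branch lengths and model parameters jointly over candidate topologies.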


2019 ◽  
Author(s):  
Bui Quang Minh ◽  
Heiko Schmidt ◽  
Olga Chernomor ◽  
Dominik Schrempf ◽  
Michael Woodhams ◽  
...  

Abstract IQ-TREE (http://www.iqtree.org) is a user-friendly and widely used software package for phylogenetic inference using maximum likelihood. Since the release of version 1 in 2014, we have continuously expanded IQ-TREE to integrate a plethora of new models of sequence evolution and efficient computational approaches of phylogenetic inference to deal with genomic data. Here, we describe notable features of IQ-TREE version 2 and highlight the key advantages over other software.


2020 ◽  
Vol 2 (4) ◽  
Author(s):  
Thomas P Quinn ◽  
Ionas Erb

Abstract Many next-generation sequencing datasets contain only relative information because biological and technical factors limit the total number of transcripts observed for a given sample. As a result, no single component can be interpreted in isolation. The field of compositional data analysis has emerged with alternative methods for relative data based on log-ratio transforms. However, these data often contain many more features than samples, and thus require creative new ways to reduce their dimensionality. The summation of parts, called amalgamation, is a practical way to reduce dimensionality, but it can introduce a non-linear distortion to the data. We exploit this non-linearity to propose a powerful yet interpretable dimension reduction method called data-driven amalgamation. Our new method, implemented in the user-friendly R package amalgam, reduces the dimensionality of compositional data by finding amalgamations that optimally (i) preserve the distance between samples, or (ii) classify samples as diseased or not. Our benchmark on 13 real datasets confirms that these amalgamations compete with state-of-the-art methods in terms of performance, while producing new features that are easily understood: groups of parts added together.
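
The two ingredients named here, log-ratio geometry and amalgamation by summing parts, can be illustrated with a brute-force toy in the spirit of criterion (i). It enumerates every split of the parts into two groups, amalgamates each sample accordingly, and keeps the split whose between-sample Aitchison distances (Euclidean distances between centered log-ratio vectors) correlate best with those of the full data. Exhaustive search only works for a handful of parts; this is an illustration under those assumptions, not the optimization used in the amalgam package, and all function names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def clr(x: Sequence[float]) -> List[float]:
    """Centered log-ratio transform of a strictly positive composition."""
    logs = [math.log(v) for v in x]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]


def aitchison_distances(samples: Sequence[Sequence[float]]) -> List[float]:
    """Euclidean distances between CLR vectors for every pair of samples."""
    return [math.dist(clr(a), clr(b)) for a, b in itertools.combinations(samples, 2)]


def amalgamate(x: Sequence[float], groups: Sequence[Sequence[int]]) -> List[float]:
    """Sum the parts of composition x within each group of part indices."""
    return [sum(x[i] for i in g) for g in groups]


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        return float("-inf")              # undefined correlation: rank it last
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (su * sv)


def best_two_group_amalgamation(samples) -> Tuple[float, Tuple[Tuple[int, ...], ...]]:
    """Exhaustively search two-group amalgamations that best preserve distances."""
    n_parts = len(samples[0])
    target = aitchison_distances(samples)
    best: Tuple[float, Tuple[Tuple[int, ...], ...]] = (float("-inf"), ())
    for size in range(1, n_parts):
        for first in itertools.combinations(range(n_parts), size):
            if 0 not in first:
                continue                  # count each split once: part 0 goes first
            groups = (first, tuple(i for i in range(n_parts) if i not in first))
            reduced = [amalgamate(s, groups) for s in samples]
            score = pearson(target, aitchison_distances(reduced))
            if score > best[0]:
                best = (score, groups)
    return best


if __name__ == "__main__":
    # Hypothetical data: only part 0 varies between samples.
    samples = [[1, 1, 10, 10], [2, 1, 10, 10], [4, 1, 10, 10], [8, 1, 10, 10]]
    print(best_two_group_amalgamation(samples))
```

In the toy data, isolating the only varying part preserves the distance structure exactly, while every split that sums it with a constant part distorts the log-ratios non-linearly, which is the effect the abstract describes.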


Author(s):  
Erhan Batuhan Arisoy ◽  
Guannan Ren ◽  
Erva Ulu ◽  
Nurcan Gecer Ulu ◽  
Suraj Musuvathy

The widespread use of 3D acquisition devices with high-performance processing tools has facilitated the rapid generation of digital twin models of large production plants and factories, which are used to optimize work cell layouts and improve human operator effectiveness, safety, and ergonomics. Although recent advances in digital simulation tools have enabled users to analyze the workspace using virtual human and environment models, these tools still depend heavily on user input to configure the simulation environment, such as specifying how humans pick and move different objects during manufacturing. As a step towards reducing user involvement in such analysis, we introduce a data-driven approach for estimating natural grasp point locations on objects that humans interact with in industrial applications. The proposed system takes a CAD model as input and outputs a list of candidate natural grasping point locations. We start by generating a crowdsourced grasping database that consists of CAD models and corresponding grasping point locations labeled as natural or not. Next, we employ a Bayesian network classifier to learn a mapping between object geometry and natural grasping locations using a set of geometrical features. Then, for a novel object, we create a list of candidate grasping positions and select a subset of these possible locations as natural grasping contacts using our machine learning model. We evaluate the advantages and limitations of our method by investigating the ergonomics of the resulting grasp postures.
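
The learning and selection steps can be sketched with the simplest Bayesian network classifier, naive Bayes, in which each geometric feature depends only on the class label (natural vs. not natural). The sketch assumes Gaussian class-conditional features and a hypothetical numeric feature vector per candidate grasp point; the authors' network structure and features are not given here, so this is a stand-in rather than their model, and the names `GaussianNaiveBayes` and `select_natural` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class GaussianNaiveBayes:
    """Class label -> independent Gaussian per feature (a naive Bayes network)."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[bool]) -> "GaussianNaiveBayes":
        by_class: Dict[bool, List[Sequence[float]]] = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.params: Dict[bool, Tuple[float, List[float], List[float]]] = {}
        for label, rows in by_class.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # a variance floor keeps constant features from dividing by zero
            var = [max(sum((v - m) ** 2 for v in c) / len(c), 1e-9)
                   for c, m in zip(cols, means)]
            self.params[label] = (math.log(len(rows) / len(y)), means, var)
        return self

    def log_joint(self, row: Sequence[float]) -> Dict[bool, float]:
        """log P(class) + sum of log P(feature | class), for each class."""
        scores = {}
        for label, (log_prior, means, var) in self.params.items():
            s = log_prior
            for x, m, v in zip(row, means, var):
                s -= 0.5 * math.log(2 * math.pi * v) + (x - m) ** 2 / (2 * v)
            scores[label] = s
        return scores

    def predict(self, row: Sequence[float]) -> bool:
        scores = self.log_joint(row)
        return max(scores, key=scores.get)


def select_natural(model: GaussianNaiveBayes, candidates: Sequence[Sequence[float]],
                   k: int) -> List[int]:
    """Indices of the k candidates with the largest log-odds of being natural.

    Assumes the model was trained on both True (natural) and False labels.
    """
    def log_odds(i: int) -> float:
        s = model.log_joint(candidates[i])
        return s[True] - s[False]
    return sorted(range(len(candidates)), key=log_odds, reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical two-feature training data labeled natural (True) or not (False).
    X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15], [0.9, 0.8], [0.8, 0.9], [0.85, 0.85]]
    y = [True, True, True, False, False, False]
    model = GaussianNaiveBayes().fit(X, y)
    print(select_natural(model, [[0.9, 0.9], [0.1, 0.1], [0.5, 0.5]], k=2))
```

Ranking by log-odds rather than by hard labels lets the selection step keep a fixed number of contacts per object even when few candidates clear a 50 % threshold.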


2020 ◽  
Author(s):  
Xiaoyu Lu ◽  
Szu-Wei Tu ◽  
Wennan Chang ◽  
Changlin Wan ◽  
Jiashi Wang ◽  
...  

ABSTRACT Deconvolution of mouse transcriptomic data is challenged by the fact that mouse models carry various genetic and physiological perturbations, making it questionable to assume fixed cell types and cell type marker genes across different dataset scenarios. We developed a Semi-Supervised Mouse data Deconvolution (SSMD) method to study the mouse tissue microenvironment (TME). SSMD features (i) a novel non-parametric method to discover dataset-specific cell type signature genes; (ii) a community detection approach for fixing cell types and their marker genes; and (iii) a constrained matrix decomposition method to solve for relative cell type proportions that is robust to diverse experimental platforms. In summary, SSMD addresses several key challenges in the deconvolution of mouse tissue data, including: (1) varied cell types and marker genes caused by highly divergent genotypic and phenotypic conditions in mouse experiments, (2) diverse experimental platforms for mouse transcriptomics data, (3) small sample sizes and limited training data sources, and (4) estimation of the proportions of 35 cell types in the blood, inflammatory, central nervous, or hematopoietic systems. In silico and experimental validation of SSMD demonstrated its high sensitivity and accuracy in identifying (sub)cell types and predicting cell proportions compared with state-of-the-art methods.
A user-friendly R package and a web server for SSMD are released via https://github.com/xiaoyulu95/SSMD.

Key points:
- We provide a novel tissue deconvolution method, SSMD, which is specifically designed for mouse data to handle the variations caused by different mouse strains, genetic and phenotypic backgrounds, and experimental platforms.
- SSMD can detect dataset- and tissue microenvironment-specific cell markers for more than 30 cell types in mouse blood, inflammatory tissue, cancer, and the central nervous system.
- SSMD achieves much improved performance in estimating the relative proportions of cell types compared with state-of-the-art methods.
- The semi-supervised setting enables the application of SSMD to transcriptomics, DNA methylation, and ATAC-seq data.
- A user-friendly R package and an R Shiny-based web server for SSMD have also been developed.
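
A constrained decomposition step of this kind can be pictured as constrained least squares: given a signature matrix S (genes × cell types) and a bulk expression profile b, find proportions w ≥ 0 that sum to 1 and minimize ||Sw − b||². Below is a minimal sketch that solves this by projected gradient descent with Euclidean projection onto the probability simplex. It assumes the signature matrix is already known (SSMD derives its own signatures from the data), uses a generic solver, and is not SSMD's implementation; the function names are hypothetical.

```python
from typing import List, Sequence


def project_to_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:
            theta = t               # the last j meeting the condition fixes the shift
    return [max(x - theta, 0.0) for x in v]


def estimate_proportions(signature: Sequence[Sequence[float]], bulk: Sequence[float],
                         n_iter: int = 5000) -> List[float]:
    """Minimize 0.5 * ||S w - b||^2 over the simplex by projected gradient descent."""
    n_genes, n_types = len(signature), len(signature[0])
    # 1 / ||S||_F^2 is a safe step: ||S||_F^2 bounds the largest eigenvalue of S^T S.
    step = 1.0 / sum(s * s for row in signature for s in row)
    w = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        resid = [sum(signature[g][k] * w[k] for k in range(n_types)) - bulk[g]
                 for g in range(n_genes)]
        grad = [sum(signature[g][k] * resid[g] for g in range(n_genes))
                for k in range(n_types)]
        w = project_to_simplex([w[k] - step * grad[k] for k in range(n_types)])
    return w


if __name__ == "__main__":
    # Hypothetical 4-gene x 3-cell-type signature and a bulk profile mixed from it.
    S = [[10, 0, 1], [0, 8, 1], [1, 1, 9], [5, 5, 0]]
    true_w = [0.5, 0.3, 0.2]
    b = [sum(row[k] * true_w[k] for k in range(3)) for row in S]
    print([round(x, 4) for x in estimate_proportions(S, b)])
```

Projecting after every gradient step enforces non-negativity and the sum-to-one constraint exactly, so the output can be read directly as relative proportions.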

