alignment length
Recently Published Documents


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Hayato Takihara ◽  
Nobuaki Miura ◽  
Kiyoko F. Aoki-Kinoshita ◽  
Shujiro Okuda

Background: Glycan-related genes play a fundamental role in various processes for energy acquisition and homeostasis maintenance while adapting to the environment in which the organism exists; however, their role in environmental microbiomes is unclear. Methods: Sequence alignment was performed between known glycan-related genes and complete genomes of microorganisms, and optimal parameters for identifying glycan-related genes were determined based on the alignments. Using the constructed scheme (> 90% identity and > 25 aa of alignment length), glycan-related genes in various environments were identified from 198 metagenome datasets. Results: We identified 86.73 million glycan-related genes from the metagenome data. Among the 12 environments classified in this study, the percentage of glycan-related genes was high in the human-associated environment, suggesting that these environments utilize glycan metabolism better than other environments. On the other hand, the relative abundances of both glycoside hydrolases and glycosyltransferases surprisingly had a coverage of over 80% in all the environments. These glycoside hydrolases and glycosyltransferases were classified into two groups: (1) general enzyme families identified in various environments and (2) specific enzymes found only in certain environments. The general enzyme families were mostly from genes involved in monosaccharide metabolism, and most of the specific enzymes were polysaccharide-degrading enzymes. Conclusion: These findings suggest that environmental microorganisms could change the composition of their glycan-related genes to adapt the processes involved in acquiring energy from glycans in their environments. Our functional glyco-metagenomics approach has made it possible to clarify the relationship between the environment and genes from the perspective of carbohydrates, and to identify glycan-related genes that occur specifically in particular environments.
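The identification scheme quoted above (> 90% identity and > 25 aa of alignment length) can be illustrated with a small filter. This is a minimal sketch, not the authors' pipeline: the abstract does not name the aligner or its output format, so the code assumes tab-separated hits in the common BLAST-style layout where the third column is percent identity and the fourth is alignment length. The names `Hit`, `parse_hits`, `passes_thresholds` and `filter_hits` are hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float  # percent identity of the alignment
    length: int    # alignment length in residues (aa)


def parse_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse tab-separated hits; assumes columns 3 and 4 hold identity and length."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        yield Hit(fields[0], fields[1], float(fields[2]), int(fields[3]))


def passes_thresholds(hit: Hit, min_identity: float = 90.0, min_length: int = 25) -> bool:
    """Strict inequalities, mirroring '> 90% identity and > 25 aa'."""
    return hit.pident > min_identity and hit.length > min_length


def filter_hits(lines: Iterable[str]) -> List[Hit]:
    """Keep only hits that satisfy both thresholds."""
    return [hit for hit in parse_hits(lines) if passes_thresholds(hit)]
```

Both comparisons are strict, so a hit with exactly 90% identity or exactly 25 aligned residues is rejected; whether the original study treated boundary values this way is an assumption.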


Antibiotics ◽  
2021 ◽  
Vol 10 (6) ◽  
pp. 632
Author(s):  
Frida Svanberg Frisinger ◽  
Bimal Jana ◽  
Stefano Donadio ◽  
Luca Guardabassi

Novel antimicrobials interfering with pathogen-specific targets can minimize the risk of perturbations of the gut microbiota (dysbiosis) during therapy. We employed an in silico approach to identify essential proteins in Escherichia coli that are either absent or have low sequence identity in seven beneficial taxa of the gut microbiota: Faecalibacterium, Prevotella, Ruminococcus, Bacteroides, Lactobacillus, Lachnospiraceae and Bifidobacterium. We identified 36 essential proteins that are present in hyper-virulent E. coli ST131 and have low similarity (bitscore < 50 or identity < 30% and alignment length < 25%) to proteins in mammalian hosts and beneficial taxa. Of these, 35 are also present in Klebsiella pneumoniae. None of the proteins are targets of clinically used antibiotics, and 3D structure is available for 23 of them. Four proteins (LptD, LptE, LolB and BamD) are easily accessible as drug targets due to their location in the outer membrane, especially LptD, which contains extracellular domains. Our results indicate that it may be possible to selectively interfere with essential biological processes in Enterobacteriaceae that are absent or mediated by unrelated proteins in beneficial taxa residing in the gut. The identified targets can be used to discover antimicrobial drugs effective against these opportunistic pathogens with a decreased risk of causing dysbiosis.
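The low-similarity criterion quoted above (bitscore < 50 or identity < 30% and alignment length < 25%) can be written as a predicate. This is a sketch under two labeled assumptions: the condition is read with the usual operator precedence, as bitscore < 50 OR (identity < 30% AND alignment length < 25%), and the percentage for alignment length is taken relative to the query protein length. The abstract specifies neither point, and the function name `is_low_similarity` is hypothetical.

```python
def is_low_similarity(
    bitscore: float,
    pident: float,
    aln_length: int,
    query_length: int,
    max_bitscore: float = 50.0,
    max_identity: float = 30.0,
    max_length_pct: float = 25.0,
) -> bool:
    """Return True when a hit counts as 'low similarity'.

    Assumption: 'alignment length < 25%' means the aligned region covers
    less than 25% of the query protein.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    length_pct = 100.0 * aln_length / query_length
    # Assumed grouping: bitscore < 50 OR (identity < 30% AND length < 25%)
    return bitscore < max_bitscore or (
        pident < max_identity and length_pct < max_length_pct
    )
```

Under the alternative grouping, (bitscore < 50 or identity < 30%) and alignment length < 25%, a weak full-length hit would no longer qualify, so the choice of grouping changes which proteins are reported.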


2021 ◽  
Vol 4 (3) ◽  
pp. e202000880
Author(s):  
Esther Wershof ◽  
Danielle Park ◽  
David J Barry ◽  
Robert P Jenkins ◽  
Antonio Rullan ◽  
...  

Diverse extracellular matrix patterns are observed in both normal and pathological tissue. However, most current tools for quantitative analysis focus on a single aspect of matrix patterning. Thus, an automated pipeline that simultaneously quantifies a broad range of metrics and enables a comprehensive description of varied matrix patterns is needed. To this end, we have developed an ImageJ plugin called TWOMBLI, which stands for The Workflow Of Matrix BioLogy Informatics. This pipeline includes metrics of matrix alignment, length, branching, end points, gaps, fractal dimension, curvature, and the distribution of fibre thickness. TWOMBLI is designed to be quick, versatile and easy-to-use particularly for non-computational scientists. TWOMBLI can be downloaded from https://github.com/wershofe/TWOMBLI together with detailed documentation and tutorial video. Although developed with the extracellular matrix in mind, TWOMBLI is versatile and can be applied to vascular and cytoskeletal networks. Here we present an overview of the pipeline together with examples from a wide range of contexts where matrix patterns are generated.


2020 ◽  
Author(s):  
Andrew F. Magee ◽  
Sarah K. Hilton ◽  
William S. DeWitt

Likelihood-based phylogenetic inference posits a probabilistic model of character state change along branches of a phylogenetic tree. These models typically assume statistical independence of sites in the sequence alignment. This is a restrictive assumption that facilitates computational tractability, but ignores how epistasis, the effect of genetic background on mutational effects, influences the evolution of functional sequences. We consider the effect of using a misspecified site-independent model on the accuracy of Bayesian phylogenetic inference in the setting of pairwise-site epistasis. Previous work has shown that as alignment length increases, tree reconstruction accuracy also increases. Here, we present a simulation study demonstrating that accuracy increases with alignment size even if the additional sites are epistatically coupled. We introduce an alignment-based test statistic that is a diagnostic for pair-wise epistasis and can be used in posterior predictive checks.


Author(s):  
Joseph F. Walker ◽  
Xing-Xing Shen ◽  
Antonis Rokas ◽  
Stephen A. Smith ◽  
Edwige Moyroud

The genomic data revolution has enabled biologists to develop innovative ways to infer key episodes in the history of life. Whether genome-scale data will eventually resolve all branches of the Tree of Life remains uncertain. However, through novel means of interrogating data, some explanations for why evolutionary relationships remain recalcitrant are emerging. Here, we provide four biological and analytical factors that explain why certain genes may exhibit “outlier” behavior, namely, rate of molecular evolution, alignment length, misidentified orthology, and errors in modeling. Using empirical and simulated data we show how excluding genes based on their likelihood or inferring processes from the topology they support in a supermatrix can mislead biological inference of conflict. We next show alignment length accounts for the high influence of two genes reported in empirical datasets. Finally, we also reiterate the impact misidentified orthology and short alignments have on likelihoods in large scale phylogenetics. We suggest that researchers should systematically investigate and describe the source of influential genes, as opposed to discarding them as outliers. Disentangling whether analytical or biological factors are the source of outliers will help uncover new patterns and processes that are shaping the Tree of Life.


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e7747 ◽  
Author(s):  
Joseph F. Walker ◽  
Nathanael Walker-Hale ◽  
Oscar M. Vargas ◽  
Drew A. Larson ◽  
Gregory W. Stull

Evolutionary relationships among plants have been inferred primarily using chloroplast data. To date, no study has comprehensively examined the plastome for gene tree conflict. Using a broad sampling of angiosperm plastomes, we characterize gene tree conflict among plastid genes at various time scales and explore correlates to conflict (e.g., evolutionary rate, gene length, molecule type). We uncover notable gene tree conflict against a backdrop of largely uninformative genes. We find alignment length and tree length are strong predictors of concordance, and that nucleotides outperform amino acids. Of the most commonly used markers, matK greatly outperforms rbcL; however, the rarely used gene rpoC2 is the top-performing gene in every analysis. We find that rpoC2 reconstructs angiosperm phylogeny as well as the entire concatenated set of protein-coding chloroplast genes. Our results suggest that longer genes are superior for phylogeny reconstruction. The alleviation of some conflict through the use of nucleotides suggests that stochastic and systematic error is likely the root of most of the observed conflict, but further research on biological conflict within the plastome is warranted given documented cases of heteroplasmic recombination. We suggest that researchers should filter genes for topological concordance when performing downstream comparative analyses on phylogenetic data, even when using chloroplast genomes.


2019 ◽  
Author(s):  
Denis Jacob Machado ◽  
Santiago Castroviejo-Fisher ◽  
Taran Grant

We evaluated the effects of variation in the number and distribution of gaps (i.e., no base; coded as IUPAC “.” or “–”) treated as missing data (i.e., any base, coded as “?” or IUPAC “N”) in standard maximum likelihood (ML) analysis. We obtained alignments with variable numbers and arrangements of gaps by aligning seven diverse empirical datasets under different gap opening costs using MAFFT. We selected the optimal substitution model for each alignment using the corrected Akaike Information Criterion (AICc) in jModelTest2 and searched for the optimal trees for each alignment using default search parameters and the selected models in GARLI. We also employed a Monte Carlo approach to randomly insert gaps (treated as missing data) into an empirical dataset to understand more precisely the effects of their variable numbers and distributions. To compare alignments quantitatively, we used several measures to quantify the number and distribution of gaps in all alignments (e.g., alignment length, total number of gaps, total number of characters containing gaps, number of gap openings). We then used these variables to derive four indices (ranging from 0 to 1) that summarize the distribution of gaps both within and among terminals, including an index that takes into account their optimization on the tree. Our most important observation is that ML scores correlate negatively with gap opening costs and the amount of missing data. These variables also cause unpredictable effects on tree topologies. We discuss the implications of our results for the traditional and tree-alignment approaches in ML.
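The basic gap measures listed above (alignment length, total number of gaps, number of characters containing gaps, number of gap openings) can be computed directly from an alignment. The sketch below is an illustration only: it assumes the alignment is given as equal-length strings, treats "-" and "." as gap symbols, and counts a gap opening as the start of each run of consecutive gap symbols within a sequence. The four derived indices mentioned in the abstract are not reproduced, since their definitions are not given here. The name `gap_summary` is hypothetical.

```python
from typing import Dict, Sequence

GAP_CHARS = frozenset("-.")  # assumed gap symbols


def gap_summary(alignment: Sequence[str]) -> Dict[str, int]:
    """Summarize gaps in an alignment given as equal-length strings."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")

    # Total number of gap symbols across all sequences.
    total_gaps = sum(ch in GAP_CHARS for seq in alignment for ch in seq)

    # Characters (columns) in which at least one sequence has a gap.
    gapped_columns = sum(
        any(seq[i] in GAP_CHARS for seq in alignment) for i in range(length)
    )

    # Gap openings: each run of consecutive gap symbols counts once.
    gap_openings = 0
    for seq in alignment:
        previous_was_gap = False
        for ch in seq:
            is_gap = ch in GAP_CHARS
            if is_gap and not previous_was_gap:
                gap_openings += 1
            previous_was_gap = is_gap

    return {
        "alignment_length": length,
        "total_gaps": total_gaps,
        "gapped_columns": gapped_columns,
        "gap_openings": gap_openings,
    }
```

Counting openings per sequence rather than per column is a design choice made for this sketch; a tree-aware count, as hinted at by the index that accounts for optimization on the tree, would need the topology as additional input.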


2019 ◽  
Vol 36 (4) ◽  
pp. 757-765 ◽  
Author(s):  
Jürgen F H Strassert ◽  
Mahwash Jamy ◽  
Alexander P Mylnikov ◽  
Denis V Tikhonenkov ◽  
Fabien Burki

The resolution of the broad-scale tree of eukaryotes is constantly improving, but the evolutionary origin of several major groups remains unknown. Resolving the phylogenetic position of these “orphan” groups is important, especially those that originated early in evolution, because they represent missing evolutionary links between established groups. Telonemia is one such orphan taxon for which little is known. The group is composed of molecularly diverse biflagellated protists, often prevalent although not abundant in aquatic environments. Telonemia has been hypothesized to represent a deeply diverging eukaryotic phylum but no consensus exists as to where it is placed in the tree. Here, we established cultures and report the phylogenomic analyses of three new transcriptome data sets for divergent telonemid lineages. All our phylogenetic reconstructions, based on 248 genes and using site-heterogeneous mixture models, robustly resolve the evolutionary origin of Telonemia as sister to the Sar supergroup. This grouping remains well supported when as few as 60% of the genes are randomly subsampled, thus is not sensitive to the sets of genes used but requires a minimal alignment length to recover enough phylogenetic signal. Telonemia occupies a crucial position in the tree to examine the origin of Sar, one of the most lineage-rich eukaryote supergroups. We propose the moniker “TSAR” to accommodate this new mega-assemblage in the phylogeny of eukaryotes.
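The random gene subsampling described above, and its link to total alignment length, can be sketched as follows. This is a hypothetical illustration rather than the authors' procedure: the per-gene alignment lengths, the rounding rule for the number of genes drawn, and the function name `subsample_genes` are all assumptions.

```python
import random
from typing import Dict, List, Optional, Tuple


def subsample_genes(
    gene_lengths: Dict[str, int],
    fraction: float,
    seed: Optional[int] = None,
) -> Tuple[List[str], int]:
    """Draw a random subset of genes and report the concatenated alignment length.

    gene_lengths maps a gene name to its alignment length in sites
    (hypothetical input). At least one gene is always drawn.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not gene_lengths:
        raise ValueError("gene_lengths is empty")
    n_draw = max(1, round(fraction * len(gene_lengths)))
    rng = random.Random(seed)  # a local generator keeps the draw reproducible
    chosen = rng.sample(sorted(gene_lengths), n_draw)
    total_length = sum(gene_lengths[gene] for gene in chosen)
    return chosen, total_length
```

Repeating the draw with different seeds and rebuilding the tree from each subset gives a rough view of how support depends on the concatenated alignment length rather than on any particular set of genes.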


2018 ◽  
Author(s):  
Jürgen F. H. Strassert ◽  
Mahwash Jamy ◽  
Alexander P. Mylnikov ◽  
Denis V. Tikhonenkov ◽  
Fabien Burki

The broad-scale tree of eukaryotes is constantly improving, but the evolutionary origin of several major groups remains unknown. Resolving the phylogenetic position of these ‘orphan’ groups is important, especially those that originated early in evolution, because they represent missing evolutionary links between established groups. Telonemia is one such orphan taxon for which little is known. The group is composed of molecularly diverse biflagellated protists, often prevalent although not abundant in aquatic environments. Telonemia has been hypothesized to represent a deeply diverging eukaryotic phylum but no consensus exists as to where it is placed in the tree. Here, we established cultures and report the phylogenomic analyses of three new transcriptome datasets for divergent telonemid lineages. All our phylogenetic reconstructions, based on 248 genes and using site-heterogeneous mixture models, robustly resolve the evolutionary origin of Telonemia as sister to the Sar supergroup. This grouping remains well supported when as few as 60% of the genes are randomly subsampled, thus is not sensitive to the sets of genes used but requires a minimal alignment length to recover enough phylogenetic signal. Telonemia occupies a crucial position in the tree to examine the origin of Sar, one of the most lineage-rich eukaryote supergroups. We propose the moniker ‘TSAR’ to accommodate this new mega-assemblage in the phylogeny of eukaryotes.

