sequence logos
Recently Published Documents



2021 ◽  
Author(s):  
Yaowen Chen ◽  
Zhen He ◽  
Yahui Men ◽  
Guohua Dong ◽  
Shuofeng Hu ◽  
...  

Sequence logos are used to visually display sequence conservation and variation. They can indicate the fixed patterns or conserved motifs in a batch of DNA or protein sequences. However, most popular sequence logo generators can only draw logos for sequences of the same length, and they cannot handle groups of sequences that differ in characteristics other than length. To solve these problems, we developed MetaLogo, which can draw sequence logos for sequences of different lengths or from different groups in a single plot and align multiple logos to highlight the sequence pattern dynamics across groups, thus allowing users to investigate functional motifs from a finer-grained and more dynamic perspective. We provide users with a public MetaLogo web server (http://metalogo.omicsnet.org), a standalone Python package (https://github.com/labomics/MetaLogo), and a built-in web server available for local deployment. Using MetaLogo, users can draw informative, customized, aesthetic, and publishable sequence logos without any programming experience.
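As background for how a logo encodes conservation, the sketch below computes the per-position information content (in bits) that conventionally sets the stack height in a sequence logo. It is a minimal illustration for equal-length DNA sequences without small-sample correction, not MetaLogo's implementation; the function name `column_information` and the example sequences are assumptions made here.

```python
import math
from collections import Counter


def column_information(sequences, alphabet="ACGT"):
    """Return the information content (bits) of each alignment column.

    Assumes equal-length sequences and applies no small-sample correction.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("this sketch assumes equal-length sequences")
    max_bits = math.log2(len(alphabet))  # 2 bits for DNA
    info = []
    for i in range(length):
        counts = Counter(s[i] for s in sequences)
        total = sum(counts[c] for c in alphabet)
        if total == 0:
            info.append(0.0)  # no recognised letters in this column
            continue
        entropy = 0.0
        for c in alphabet:
            if counts[c]:
                p = counts[c] / total
                entropy -= p * math.log2(p)
        info.append(max_bits - entropy)
    return info


# Illustrative input: two fully conserved columns, two variable ones.
seqs = ["ACGT", "ACGA", "ACTT", "ACCC"]
bits = column_information(seqs)
```

Fully conserved columns reach the maximum of 2 bits, while the variable columns score lower; each letter's height within a stack is its frequency multiplied by that column's information content.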


PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0253836
Author(s):  
Shoaib Ur Rehman ◽  
Ghulam Qanmber ◽  
Muhammad Hammad Nadeem Tahir ◽  
Ahsan Irshad ◽  
Sajid Fiaz ◽  
...  

Vascular plant one-zinc-finger (VOZ) transcription factors regulate plant growth and development under drought conditions. Six genes encoding VOZ transcription factors exist in the soybean genome (in both Glycine max and Glycine soja). Herein, GmVOZs and GsVOZs were identified through in silico analysis and characterized with different bioinformatics tools and expression analysis. Phylogenetic analysis classified VOZ genes into four groups. Sequence logo analysis of G. max and G. soja amino acid residues revealed high conservation. The presence of stress-related cis-elements in the upstream regions of GmVOZs and GsVOZs highlights their role in tolerance to abiotic stresses. The collinearity analysis identified 14 paralogous/orthologous gene pairs within and between G. max and G. soja. The Ka/Ks values showed that soybean VOZ genes underwent selection pressure with limited functional deviation arising from whole-genome and segmental duplication. The GmVOZs and GsVOZs were found to be expressed in roots and leaves at the seedling stage. qRT-PCR revealed that GmVOZ and GsVOZ transcripts can be regulated by abiotic stresses such as polyethylene glycol (PEG). The findings of this study will provide a reference for deciphering the physiological and molecular functions of VOZ genes in soybean.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Nicola De Maio ◽  
Alexander V. Alekseyenko ◽  
William J. Coleman-Smith ◽  
Fabio Pardi ◽  
Marc A. Suchard ◽  
...  

Background: Many important applications in bioinformatics, including sequence alignment and protein family profiling, employ sequence weighting schemes to mitigate the effects of non-independence of homologous sequences and under- or over-representation of certain taxa in a dataset. These schemes aim to assign high weights to sequences that are ‘novel’ compared to the others in the same dataset, and low weights to sequences that are over-represented. Results: We formalise this principle by rigorously defining the evolutionary ‘novelty’ of a sequence within an alignment. This results in new sequence weights that we call ‘phylogenetic novelty scores’. These scores have various desirable properties, and we showcase their use by considering, as an example application, the inference of character frequencies at an alignment column, which is important, for example, in protein family profiling. We give computationally efficient algorithms for calculating our scores and, using simulations, show that they are versatile and can improve the accuracy of character frequency estimation compared to existing sequence weighting schemes. Conclusions: Our phylogenetic novelty scores can be useful when an evolutionarily meaningful system for adjusting for uneven taxon sampling is desired. They have numerous possible applications, including estimation of evolutionary conservation scores and sequence logos, identification of targets in conservation biology, and improving and measuring sequence alignment accuracy.
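To make the example application concrete, the sketch below shows how per-sequence weights change the estimated character frequencies at one alignment column. It does not compute phylogenetic novelty scores; the weights are hypothetical numbers chosen for illustration, and `weighted_column_frequencies` is a name introduced here.

```python
from collections import defaultdict


def weighted_column_frequencies(column, weights):
    """Estimate character frequencies at one column from weighted sequences."""
    if len(column) != len(weights):
        raise ValueError("need exactly one weight per sequence")
    totals = defaultdict(float)
    for char, weight in zip(column, weights):
        totals[char] += weight
    weight_sum = sum(totals.values())
    return {char: total / weight_sum for char, total in totals.items()}


# Three near-identical sequences carry 'A'; one distinct sequence carries 'G'.
column = ["A", "A", "A", "G"]
weights = [0.5, 0.5, 0.5, 1.5]  # hypothetical: over-represented sequences down-weighted
freqs = weighted_column_frequencies(column, weights)
```

With uniform weights the estimate for 'A' would be 0.75; with these illustrative weights both characters come out at 0.5, which is the kind of correction for uneven sampling that a weighting scheme aims to provide.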


PLoS ONE ◽  
2021 ◽  
Vol 16 (3) ◽  
pp. e0240947
Author(s):  
Ping Li ◽  
Chao Chen ◽  
Ping Li ◽  
Yibo Dong

Rocky desertification is a bottleneck that reduces ecological and environmental security in karst areas. Paper mulberry, a unique deciduous tree, shows good performance in rocky desertification areas. Its resistance mechanisms are therefore of high interest. In this study, a lysine acetylation proteomics analysis of paper mulberry seedling leaves was conducted in combination with the purification of acetylated protein by high-precision nano LC-MS/MS. We identified a total of 7130 acetylation sites in 3179 proteins. Analysis of the modified sites showed a predominance of nine motifs. Six residues, the positively charged lysine (K), arginine (R), and histidine (H) together with serine (S), threonine (T), and tyrosine (Y), occurred most frequently at the +1 position; phenylalanine (F) was detected both upstream and downstream of the acetylated lysines; and the sequence logos showed a strong preference for lysine and arginine around acetylated lysines. Functional annotation revealed that the identified enzymes were mainly involved in translation, transcription, ribosomal structure and biological processes, showing that lysine acetylation can regulate various aspects of primary carbon and nitrogen metabolism and secondary metabolism. Acetylated proteins were enriched in the chloroplast, cytoplasm, and nucleus, and many stress response-related proteins were also found to be acetylated, including PAL, HSP70, and ERF. HSP70, an important protein involved in plant abiotic and disease stress responses, was identified in paper mulberry, although it is rarely found in woody plants. This may be further examined in research in other plants and could explain the good adaptation of paper mulberry to the karst environment. However, these hypotheses require further verification. Our data can provide a new starting point for the further analysis of the acetylation function in paper mulberry and other plants.


2020 ◽  
Author(s):  
Nicola De Maio ◽  
Alexander V. Alekseyenko ◽  
William J. Coleman-Smith ◽  
Fabio Pardi ◽  
Marc A. Suchard ◽  
...  

Background: Many important applications in bioinformatics, including sequence alignment and protein family profiling, employ sequence weighting schemes to mitigate the effects of non-independence of homologous sequences and under- or over-representation of certain taxa in a dataset. These schemes aim to assign high weights to sequences that are ‘novel’ compared to the others in the same dataset, and low weights to sequences that are over-represented. Results: We formalise this principle by rigorously defining the evolutionary ‘novelty’ of a sequence within an alignment. This results in new sequence weights that we call ‘phylogenetic novelty scores’. These scores have various desirable properties, and we showcase their use by considering, as an example application, the inference of character frequencies at an alignment column, which is important, for example, in protein family profiling. We give computationally efficient algorithms for calculating our scores and, using simulations, show that they improve the accuracy of character frequency estimation compared to existing sequence weighting schemes. Conclusions: Our phylogenetic novelty scores can be useful when an evolutionarily meaningful system for adjusting for uneven taxon sampling is desired. They have numerous possible applications, including estimation of evolutionary conservation scores and sequence logos, identification of targets in conservation biology, and improving and measuring sequence alignment accuracy.


Genetics ◽  
2020 ◽  
Vol 216 (2) ◽  
pp. 353-358
Author(s):  
Mengchi Wang ◽  
David Wang ◽  
Kai Zhang ◽  
Vu Ngo ◽  
Shicai Fan ◽  
...  

Sequence analysis frequently requires intuitive understanding and convenient representation of motifs. Typically, motifs are represented as position weight matrices (PWMs) and visualized using sequence logos. However, in many scenarios, in order to interpret the motif information or search for motif matches, it is more compact, and sufficient, to represent motifs by wildcard-style consensus sequences (such as [GC][AT]GATAAG[GAC]). Based on mutual information theory and Jensen-Shannon divergence, we propose a mathematical framework to minimize the information loss in converting PWMs to consensus sequences. We name this representation sequence Motto and have implemented an efficient algorithm with flexible options for converting motif PWMs into Motto from nucleotides, amino acids, and customized characters. We show that this representation provides a simple and efficient way to identify the binding sites of 1156 common transcription factors (TFs) in the human genome. The effectiveness of the method was benchmarked by comparing sequence matches found by Motto with PWM scanning results found by FIMO. On average, our method achieves a 0.81 area under the precision-recall curve, significantly (P-value < 0.01) outperforming all existing methods, including maximal positional weight, Cavener’s method, and minimal mean square error. We believe this representation provides a distilled summary of a motif, as well as the statistical justification.
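To illustrate what a wildcard-style consensus looks like when derived from a PWM, the sketch below uses a plain probability threshold per position. This is deliberately not the Motto algorithm, which minimizes information loss using Jensen-Shannon divergence; the function `pwm_to_consensus`, the threshold value, and the example matrix are all assumptions made for illustration.

```python
def pwm_to_consensus(pwm, threshold=0.25):
    """Convert a PWM (list of {letter: probability}) to a bracketed consensus.

    Simple threshold heuristic: keep every letter whose probability is at
    least `threshold`, ordered from most to least probable.
    """
    parts = []
    for column in pwm:
        ranked = sorted(column.items(), key=lambda item: -item[1])
        kept = [letter for letter, prob in ranked if prob >= threshold]
        if not kept:
            kept = [ranked[0][0]]  # fall back to the single most probable letter
        parts.append(kept[0] if len(kept) == 1 else "[" + "".join(kept) + "]")
    return "".join(parts)


# Hypothetical three-position PWM.
example_pwm = [
    {"G": 0.50, "C": 0.40, "A": 0.05, "T": 0.05},
    {"A": 0.90, "C": 0.04, "G": 0.03, "T": 0.03},
    {"T": 0.97, "A": 0.01, "C": 0.01, "G": 0.01},
]
consensus = pwm_to_consensus(example_pwm)  # "[GC]AT"
```

A fixed threshold is easy to read but treats every position the same way regardless of how much information it carries, which is the limitation that an information-theoretic criterion is meant to address.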


2020 ◽  
Author(s):  
Carlos Soto ◽  
Darshan Bryner ◽  
Nicola Neretti ◽  
Anuj Srivastava

The study of the 3-dimensional (3D) structure of chromosomes – the largest macromolecules in biology – is one of the most challenging problems to date in structural biology. Here, we develop a novel representation of chromosomes, as sequences of shape letters from a finite shape alphabet, which provides a compact and efficient way to analyze ensembles of chromosome shape data, akin to the analysis of texts in a language by using letters. We construct a Chromosome Shape Alphabet (CSA) from an ensemble of chromosome 3D structures inferred from Hi-C data – via SIMBA3D or other methods – by segmenting curves based on the boundaries of topologically associating domains (TADs), and by clustering all TADs’ 3D structures into groups of similar shapes. The median shapes of these groups, with some pruning and processing, form the Chromosome Shape Letters (CSLs) of the alphabet. We provide a proof-of-concept for these CSLs by reconstructing independent test curves using only CSLs (and corresponding transformations) and comparing these reconstructions with the original curves. Finally, we demonstrate how CSLs can be used to summarize the variability of shapes in an ensemble of chromosome 3D structures using generalized sequence logos.


Author(s):  
Lei Zheng ◽  
Dongyang Liu ◽  
Wuritu Yang ◽  
Lei Yang ◽  
Yongchun Zuo

Sequence logos give a fast and concise visual display of consensus sequences. Proteins exhibit greater complexity and diversity than DNA, which usually complicates the graphical representation of the logo. Reduced amino acid alphabets are a powerful way to simplify the complexity of sequence alignments, which motivated us to establish RaacLogo. As a new sequence logo generator based on reduced amino acid alphabets, RaacLogo can easily generate many different simplified logos tailored to users, who select among reduced amino acid alphabets derived from more than 40 clustering algorithms. The current web server provides 74 types of reduced amino acid alphabet, which were manually extracted to generate 673 reduced amino acid clusters (RAACs) for dealing with protein alignments. A two-dimensional selector is provided for easily selecting the desired RAACs using underlying biological knowledge. We anticipate that the RaacLogo web server will be useful for protein sequence alignment, topological estimation and protein design experiments. RaacLogo is freely available at http://bioinfor.imu.edu.cn/raaclogo.
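The idea of a reduced amino acid alphabet can be sketched as a simple mapping applied before a logo is drawn. The grouping below is a hypothetical physicochemical clustering chosen for illustration; it is not one of the alphabets provided by RaacLogo, and `reduce_sequence` is a name introduced here.

```python
# Hypothetical clusters: each group is represented by its first letter.
GROUPS = ["AVLIMC", "FWY", "STNQ", "KRH", "DE", "GP"]
REDUCE = {aa: group[0] for group in GROUPS for aa in group}


def reduce_sequence(sequence):
    """Map each residue to its cluster representative; unknown symbols pass through."""
    return "".join(REDUCE.get(aa, aa) for aa in sequence.upper())


# Two sequences that differ residue by residue but share a cluster pattern.
reduced = [reduce_sequence(s) for s in ["MKRDE", "LRKEE"]]  # ["AKKDD", "AKKDD"]
```

After reduction the two sequences become identical, so a logo built from the reduced alignment would show full conservation at positions where the original 20-letter logo shows a mix of chemically similar residues.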


2020 ◽  
Vol 36 (11) ◽  
pp. 3573-3575
Author(s):  
Henry Pratt ◽  
Zhiping Weng

Summary: Sequence logos were introduced nearly 30 years ago as a human-readable format for representing consensus sequences, and they remain widely used. As new experimental and computational techniques have developed, logos have been extended: extra symbols represent covalent modifications to nucleotides, logos with multiple letters at each position illustrate models with multi-nucleotide features, and symbols extending below the x-axis may represent a binding energy penalty for a residue or a negative weight output from a neural network. Web-based visualization tools for genomic data are increasingly taking advantage of modern web technology to offer dynamic, interactive figures to users, but support for sequence logos remains limited. Here, we present LogoJS, a JavaScript package for rendering customizable, interactive, vector-graphic sequence logos and embedding them in web applications. LogoJS supports all the aforementioned logo extensions and is bundled with a companion web application for creating and sharing logos. Availability and implementation: LogoJS is implemented both in plain JavaScript and ReactJS, a popular user-interface framework. The web application is hosted at logojs.wenglab.org. All major browsers and operating systems are supported. The package and application are open-source; code is available at GitHub. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 36 (7) ◽  
pp. 2272-2274 ◽  
Author(s):  
Ammar Tareen ◽  
Justin B Kinney

Summary: Sequence logos are visually compelling ways of illustrating the biological properties of DNA, RNA and protein sequences, yet it is currently difficult to generate and customize such logos within the Python programming environment. Here we introduce Logomaker, a Python API for creating publication-quality sequence logos. Logomaker can produce both standard and highly customized logos from either a matrix-like array of numbers or a multiple-sequence alignment. Logos are rendered as native matplotlib objects that are easy to stylize and incorporate into multi-panel figures. Availability and implementation: Logomaker can be installed using the pip package manager and is compatible with both Python 2.7 and Python 3.6. Documentation is provided at http://logomaker.readthedocs.io; source code is available at http://github.com/jbkinney/logomaker.

