position weight matrix
Recently Published Documents


TOTAL DOCUMENTS

36
(FIVE YEARS 7)

H-INDEX

9
(FIVE YEARS 1)

Genes ◽  
2021 ◽  
Vol 12 (11) ◽  
pp. 1809
Author(s):  
Xuhua Xia

Multiple sequence alignment (MSA) is the basis for almost all sequence comparison and molecular phylogenetic inferences. Large-scale genomic analyses are typically associated with automated progressive MSA without subsequent manual adjustment, which itself is often error-prone because of the lack of a consistent and explicit criterion. Here, I outlined several commonly encountered alignment errors that cannot be avoided by progressive MSA for nucleotide, amino acid, and codon sequences. Methods that could be automated to fix such alignment errors were then presented. I emphasized the utility of position weight matrix as a new tool for MSA refinement and illustrated its usage by refining the MSA of nucleotide and amino acid sequences. The main advantages of the position weight matrix approach include (1) its use of information from all sequences, in contrast to other commonly used methods based on pairwise alignment scores and inconsistency measures, and (2) its speedy computation, making it suitable for a large number of long viral genomic sequences.


2021 ◽  
Author(s):  
Jaya Srivastava ◽  
Petety V. Balaji

Novel functions can emerge in an enzyme family while conserving catalytic mechanism, motif or fold. PLP-dependent enzymes have evolved into seven fold types and catalyse diverse reactions using the same mechanism for the formation of external aldimine. Nucleotide sugar aminotransferases (NSATs) and dehydratases (NSDs) belong to fold type I and mediate the biosynthesis of several monosaccharides. NSATs use diverse substrates but are highly selective to the C3 or C4 carbon to which amine group is transferred. Factors responsible for reaction specificity in NSDs are known but remain unexplored in NSATs. Profile HMMs were able to identify NSATs but could not capture reaction specificity. A search for discriminating features led to the discovery of a sequence motif that is located near the pyranose binding site suggesting their role in imparting reaction specificity. Using a position weight matrix for this motif, we were able to assign reaction specificity to a large number of NSATs. Residues which upon mutation could convert NSD to NSAT have been reported in literature and we deduced that these are not conserved. This suggested the occurrence of non-generic family specific mutations underlying the evolution of dehydratases. Inferences from this analysis set way for future experiments that can shed light on mechanisms of functional diversification in enzymes of fold type I.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yulia M. Suvorova ◽  
Anastasia M. Kamionskaya ◽  
Eugene V. Korotkov

Abstract Background Transposable elements (TEs) constitute a significant part of eukaryotic genomes. Short interspersed nuclear elements (SINEs) are non-autonomous TEs, which are widely represented in mammalian genomes and also found in plants. After insertion in a new position in the genome, TEs quickly accumulate mutations, which complicate their identification and annotation by modern bioinformatics methods. In this study, we searched for highly divergent SINE copies in the genome of rice (Oryza sativa subsp. japonica) using the Highly Divergent Repeat Search Method (HDRSM). Results The HDRSM considers correlations of neighboring symbols to construct position weight matrix (PWM) for a SINE family, which is then used to perform a search for new copies. In order to evaluate the accuracy of the method and compare it with the RepeatMasker program, we generated a set of SINE copies containing nucleotide substitutions and indels and inserted them into an artificial chromosome for analysis. The HDRSM showed better results both in terms of the number of identified inserted repeats and the accuracy of determining their boundaries. A search for the copies of 39 SINE families in the rice genome produced 14,030 hits; among them, 5704 were not detected by RepeatMasker. Conclusions The HDRSM could find divergent SINE copies, correctly determine their boundaries, and offer a high level of statistical significance. We also found that RepeatMasker is able to find relatively short copies of the SINE families with a higher level of similarity, while HDRSM is able to find more diverged copies. To obtain a comprehensive profile of SINE distribution in the genome, combined application of the HDRSM and RepeatMasker is recommended.


Genes ◽  
2019 ◽  
Vol 10 (10) ◽  
pp. 765
Author(s):  
Cui ◽  
Xu ◽  
Li

Nucleosomes are the basic units of eukaryotes. The accurate positioning of nucleosomes plays a significant role in understanding many biological processes such as transcriptional regulation mechanisms and DNA replication and repair. Here, we describe the development of a novel method, termed ZCMM, based on Z-curve theory and position weight matrix (PWM). The ZCMM was trained and tested using the nucleosomal and linker sequences determined by support vector machine (SVM) in Saccharomyces cerevisiae (S. cerevisiae), and experimental results showed that the sensitivity (Sn), specificity (Sp), accuracy (Acc), and Matthews correlation coefficient (MCC) values for ZCMM were 91.40%, 96.56%, 96.75%, and 0.88, respectively, and the average area under the receiver operating characteristic curve (AUC) value was 0.972. A ZCMM predictor was developed to predict nucleosome positioning in Homo sapiens (H. sapiens), Caenorhabditis elegans (C. elegans), and Drosophila melanogaster (D. melanogaster) genomes, and the accuracy (Acc) values were 77.72%, 85.34%, and 93.62%, respectively. The maximum AUC values of the four species were 0.982, 0.861, 0.912 and 0.911, respectively. Another independent dataset for S. cerevisiae was used to predict nucleosome positioning. Compared with the results of Wu's method, it was found that the Sn, Sp, Acc, and MCC of ZCMM results for S. cerevisiae were all higher, reaching 96.72%, 96.54%, 94.10%, and 0.88. Compared with the Guo’s method ‘iNuc-PseKNC’, the results of ZCMM for D. melanogaster were better. Meanwhile, the ZCMM was compared with some experimental data in vitro and in vivo for S. cerevisiae, and the results showed that the nucleosomes predicted by ZCMM were highly consistent with those confirmed by these experiments. Therefore, it was further confirmed that the ZCMM method has good accuracy and reliability in predicting nucleosome positioning.


2018 ◽  
Vol 16 (02) ◽  
pp. 1840013 ◽  
Author(s):  
Oxana A. Volkova ◽  
Yury V. Kondrakhin ◽  
Timur A. Kashapov ◽  
Ruslan N. Sharipov

RNA plays an important role in the intracellular cell life and in the organism in general. Besides the well-established protein coding RNAs (messenger RNAs, mRNAs), long non-coding RNAs (lncRNAs) have gained the attention of recent researchers. Although lncRNAs have been classified as non-coding, some authors reported the presence of corresponding sequences in ribosome profiling data (Ribo-seq). Ribo-seq technology is a powerful experimental tool utilized to characterize RNA translation in cell with focus on initiation (harringtonine, lactimidomycin) and elongation (cycloheximide). By exploiting translation starts obtained from the Ribo-seq experiment, we developed a novel position weight matrix model for the prediction of translation starts. This model allowed us to achieve 96% accuracy of discrimination between human mRNAs and lncRNAs. When the same model was used for the prediction of putative ORFs in RNAs, we discovered that the majority of lncRNAs contained only small ORFs ([Formula: see text][Formula: see text]nt) in contrast to mRNAs.


Sign in / Sign up

Export Citation Format

Share Document