MULTIPLE ALIGNMENT OF PROMOTER SEQUENCES FROM THE HUMAN GENOME

A new multiple alignment algorithm has been developed and used to calculate an alignment of promoter sequences from the human genome. Based on the calculated multiple alignments, 17 classes of promoter sequences were created.

2020 ◽  
Vol 36 (4) ◽  
pp. 7-14
Author(s):  
A.M. Kamionskaya ◽  
M.A. Korotkova

A new algorithm for multiple alignment of nucleotide sequences, MAHDS, has been developed. Using this algorithm, a statistically significant multiple alignment of promoter sequences from the human genome was created for the first time. Based on the constructed alignments, 25 classes of promoter sequences were created, each containing more than 100 sequences. The classes of promoters can be used to search for promoter sequences in eukaryotic genomes. Keywords: promoter, class, dynamic programming, human genome. The work was partially supported by the Russian Foundation for Basic Research (Grant no. 20-016-00057).


2010 ◽  
Vol 08 (03) ◽  
pp. 503-517 ◽  
Author(s):  
BORIS BURKOV ◽  
BORIS NAGAEV ◽  
SERGEI SPIRIN ◽  
ANDREI ALEXEEVSKI

It makes sense to speak of alignment of protein sequences only within the regions where the sequences are related to each other. This simple consideration is often disregarded by programs for multiple alignment construction. A package for alignment analysis, MAlAKiTE (Multiple Alignment Automatic Kinship Tiling Engine), is introduced. It aims to find blocks of reliable alignment, which contain related regions only, within the whole alignment, and allows users to work with them. The validity of the detection of reliable blocks was verified by comparison with structural data.


2004 ◽  
Vol 02 (04) ◽  
pp. 719-745 ◽  
Author(s):  
ARUN SIDDHARTH KONAGURTHU ◽  
JAMES WHISSTOCK ◽  
PETER J. STUCKEY

In this paper we demonstrate a practical approach to constructing progressive multiple alignments using sequence triplet optimizations rather than a conventional pairwise approach. Using the sequence triplet alignments progressively provides scope for the synthesis of a three-residue exchange amino acid substitution matrix. We develop such a 20×20×20 matrix for the first time and demonstrate how its use in optimal sequence triplet alignments increases the sensitivity of building multiple alignments. Various comparisons were made between alignments generated using the progressive triplet methods and the conventional progressive pairwise procedure. The assessment of these data reveals that, in general, the triplet-based approaches generate more accurate sequence alignments than the traditional pairwise-based procedures, especially between more divergent sets of sequences.


2005 ◽  
Vol 2005 (2) ◽  
pp. 124-131 ◽  
Author(s):  
Anna Gambin ◽  
Rafał Otto

In a recently proposed contextual alignment model, efficient algorithms exist for global and local pairwise alignment of protein sequences. Preliminary results obtained for biological data are very promising. Our main motivation was to adapt the idea of context dependency to the multiple-alignment setting. To this end, a relaxation of the model was developed (we call this new model averaged contextual alignment) and a new family of amino acid substitution matrices was constructed. In this paper we present a contextual multiple-alignment algorithm and report the outcomes of experiments performed on the BAliBASE test set. The contextual approach turned out to give much better results for the set of sequences containing orphan genes.


2005 ◽  
Vol 03 (02) ◽  
pp. 243-255 ◽  
Author(s):  
YI WANG ◽  
KUO-BIN LI

We describe an exhaustive and greedy algorithm for improving the accuracy of multiple sequence alignment. A simple progressive alignment approach is employed to provide initial alignments. The initial alignment is then iteratively optimized against an objective function. For any working alignment, the optimization involves three operations: insertions, deletions and shuffles of gaps. The optimization is exhaustive since the algorithm applies the above operations to all eligible positions of an alignment. It is also greedy since only the operation that gives the best improvement in the objective score is accepted. The algorithms have been implemented in the EGMA (Exhaustive and Greedy Multiple Alignment) package using the Java programming language, and have been evaluated using the BAliBASE benchmark alignment database. Although EGMA is not guaranteed to produce a globally optimal alignment, the tests indicate that EGMA consistently builds high-quality alignments compared with other commonly used iterative and non-iterative alignment programs. It is also useful for refining multiple alignments obtained by other methods.
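The exhaustive-and-greedy loop described above can be illustrated with a minimal Python sketch. This is not the EGMA implementation (which is written in Java). It assumes a simple sum-of-pairs objective with illustrative scores, and it implements only the gap-shuffle operation (swapping a gap with a neighbouring residue), leaving gap insertion and deletion out. The function names `sp_score`, `gap_shuffles`, and `refine` are hypothetical.

```python
from itertools import combinations


def sp_score(rows, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score over all columns (assumed objective, illustrative values)."""
    total = 0
    for col in zip(*rows):
        for a, b in combinations(col, 2):
            if a == "-" and b == "-":
                continue  # gap-gap pairs are ignored
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


def gap_shuffles(rows):
    """Yield every alignment reachable by swapping one gap with an adjacent residue."""
    for r, row in enumerate(rows):
        for i in range(len(row) - 1):
            a, b = row[i], row[i + 1]
            if (a == "-") != (b == "-"):  # exactly one of the two is a gap
                swapped = row[:i] + b + a + row[i + 2:]
                yield rows[:r] + [swapped] + rows[r + 1:]


def refine(rows):
    """Exhaustive: score every eligible move. Greedy: accept only the best one."""
    rows = list(rows)
    best = sp_score(rows)
    while True:
        candidates = [(sp_score(c), c) for c in gap_shuffles(rows)]
        if not candidates:
            return rows, best
        top_score, top = max(candidates, key=lambda t: t[0])
        if top_score <= best:  # no move improves the objective: stop
            return rows, best
        rows, best = top, top_score


if __name__ == "__main__":
    refined, score = refine(["AC-GT", "ACG-T"])
    print(refined, score)
```

Because each accepted move strictly increases a bounded score, the loop always terminates, but only at a local optimum, which matches the abstract's caveat that a globally optimal alignment is not guaranteed.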


2003 ◽  
Vol 01 (03) ◽  
pp. 505-520 ◽  
Author(s):  
Mounir Errami ◽  
Christophe Geourjon ◽  
Gilbert Deléage

We present an original strategy that involves a bioinformatic software structure in order to perform an exhaustive and objective statistical analysis of three-dimensional structures of proteins. We establish the relationship between multiple sequence alignments and various structural features of proteins. We show that amino acids involved in disulfide bonds, salt bridges and hydrophobic interactions are particularly conserved. Effects of identity, global similarity within alignments, and accessibility of interactions have been studied. Furthermore, we point out that the more variable the sequences within a multiple alignment, the more informative the multiple alignment. The results support the usefulness of multiple alignments for predictions of structural features.


Genes ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 135
Author(s):  
Eugene V. Korotkov ◽  
Yulia M. Suvorova ◽  
Dmitrii O. Kostenko ◽  
Maria A. Korotkova

In this study, we developed a new mathematical method for performing multiple alignment of highly divergent sequences (MAHDS), i.e., sequences that have on average more than 2.5 substitutions per position (x). We generated sets of artificial DNA sequences with x ranging from 0 to 4.4 and applied MAHDS as well as currently used multiple sequence alignment algorithms, including ClustalW, MAFFT, T-Coffee, Kalign, and Muscle to these sets. The results indicated that most of the existing methods could produce statistically significant alignments only for the sets with x < 2.5, whereas MAHDS could operate on sequences with x = 4.4. We also used MAHDS to analyze a set of promoter sequences from the Arabidopsis thaliana genome and discovered many conserved regions upstream of the transcription initiation site (from −499 to +1 bp); a part of the downstream region (from +1 to +70 bp) also significantly contributed to the obtained alignments. The possibilities of applying the newly developed method for the identification of promoter sequences in any genome are discussed. A server for multiple alignment of nucleotide sequences has been created.
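To make the parameter x concrete, the following Python sketch builds a set of artificial DNA sequences from a random ancestor. The generation procedure is an assumption, since the abstract does not say how the artificial sets were produced: here each descendant receives round(x × length) substitution events at random positions, so a position can be hit more than once. The names `mutate` and `make_set` are hypothetical.

```python
import random

BASES = "ACGT"


def mutate(seq, x, rng):
    """Apply round(x * len(seq)) substitution events at random positions.

    Assumption: x is the average number of substitution events per position
    relative to the ancestor. Repeated hits on one position are allowed.
    """
    seq = list(seq)
    n_events = round(x * len(seq))
    for _ in range(n_events):
        i = rng.randrange(len(seq))
        # Each event replaces the base with a different one.
        seq[i] = rng.choice([b for b in BASES if b != seq[i]])
    return "".join(seq)


def make_set(length, n_seqs, x, seed=0):
    """Return a random ancestor and n_seqs independently mutated descendants."""
    rng = random.Random(seed)
    ancestor = "".join(rng.choice(BASES) for _ in range(length))
    return ancestor, [mutate(ancestor, x, rng) for _ in range(n_seqs)]


if __name__ == "__main__":
    ancestor, seqs = make_set(length=60, n_seqs=3, x=2.5)
    print(ancestor)
    for s in seqs:
        print(s)
```

Under this model, sequences with large x share little visible identity with the ancestor, which is the regime in which the abstract reports that conventional aligners stop producing statistically significant alignments.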


Author(s):  
Wenbin Chen ◽  
Andrea M. Rocha ◽  
William Hendrix ◽  
Matthew Schmidt ◽  
Nagiza F. Samatova

2019 ◽  
Vol 20 (19) ◽  
pp. 4842 ◽  
Author(s):  
Andreia Albuquerque-Wendt ◽  
Hermann J. Hütte ◽  
Falk F. R. Buettner ◽  
Françoise H. Routier ◽  
Hans Bakker

Glycosyltransferases that use polyisoprenol-linked donor substrates are categorized in the GT-C superfamily. In eukaryotes, they act in the endoplasmic reticulum (ER) lumen and are involved in N-glycosylation, glypiation, O-mannosylation, and C-mannosylation of proteins. We generated a membrane topology model of C-mannosyltransferases (DPY19 family) that concurred perfectly with the 13 transmembrane domains (TMDs) observed in oligosaccharyltransferase (STT3 family) structures. A multiple alignment of family members from diverse organisms highlighted the presence of only a few conserved amino acids between DPY19s and STT3s. Most of these residues were shown to be essential for DPY19 function and are positioned in luminal loops that showed high conservation within the DPY19 family. Multiple alignments of other eukaryotic GT-C families underlined the presence of similar conserved motifs in luminal loops in all enzymes of the superfamily. Most GT-C enzymes are proposed to have an uneven number of TMDs, with 11 (POMT, TMTC, ALG9, ALG12, PIGB, PIGV, and PIGZ) or 13 (DPY19, STT3, and ALG10) membrane-spanning helices. In contrast, PIGM, ALG3, ALG6, and ALG8 have 12 or 14 TMDs and display a C-terminal dilysine ER-retrieval motif oriented towards the cytoplasm. We propose that all members of the GT-C superfamily are evolutionarily related enzymes with preserved membrane topology.


2010 ◽  
Vol 26 (15) ◽  
pp. 1903-1904 ◽  
Author(s):  
P. Di Tommaso ◽  
M. Orobitg ◽  
F. Guirado ◽  
F. Cores ◽  
T. Espinosa ◽  
...  
