Post-Alignment Adjustment and Its Automation

Xuhua Xia

doi:10.3390/genes12111809

Genes ◽

10.3390/genes12111809 ◽

2021 ◽

Vol 12 (11) ◽

pp. 1809

Author(s):

Xuhua Xia

Keyword(s):

Amino Acid ◽

Large Scale ◽

Position Weight Matrix ◽

Pairwise Alignment ◽

Amino Acid Sequences ◽

Weight Matrix ◽

Multiple Sequence ◽

Manual Adjustment ◽

Alignment Errors ◽

Almost All

Multiple sequence alignment (MSA) is the basis for almost all sequence comparison and molecular phylogenetic inference. Large-scale genomic analyses typically rely on automated progressive MSA without subsequent manual adjustment, and manual adjustment is itself often error-prone because it lacks a consistent and explicit criterion. Here, I outlined several commonly encountered alignment errors that progressive MSA cannot avoid for nucleotide, amino acid, and codon sequences. Methods that could be automated to fix such alignment errors were then presented. I emphasized the utility of the position weight matrix as a new tool for MSA refinement and illustrated its usage by refining the MSA of nucleotide and amino acid sequences. The main advantages of the position weight matrix approach include (1) its use of information from all sequences, in contrast to other commonly used methods based on pairwise alignment scores and inconsistency measures, and (2) its fast computation, making it suitable for a large number of long viral genomic sequences.
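The abstract credits the position weight matrix (PWM) with using information from all sequences and with fast computation. A minimal sketch of a generic log-odds PWM follows. It assumes a uniform background, a pseudocount of 0.5, and gaps excluded from column counts; these choices and the names `build_pwm` and `score` are illustrative assumptions, not the paper's exact procedure. A candidate gap placement in one sequence can be scored against a PWM built from the other sequences, keeping the higher-scoring placement.

```python
import math
from collections import Counter

def build_pwm(alignment, alphabet="ACGT", pseudocount=0.5):
    """Log-odds (log2) PWM from equal-length aligned sequences.

    Assumptions: gaps ('-') are not counted, background is uniform.
    """
    length = len(alignment[0])
    background = 1.0 / len(alphabet)
    pwm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        pwm.append({
            b: math.log2((counts[b] + pseudocount) / total / background)
            for b in alphabet
        })
    return pwm

def score(seq, pwm):
    """Sum of per-column log-odds; gap positions contribute nothing."""
    return sum(col[c] for c, col in zip(seq, pwm) if c in col)
```

For a PWM built from `["ACGAT", "ACGCT", "ACGTT"]`, the placement `"ACG-T"` outscores `"AC-GT"`, so the gap would go in column 3. Building the matrix is one pass over the columns, which is in line with the speed advantage the abstract describes.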

Correlated mutations in hydroxysteroid dehydrogenases family

Bio-Algorithms and Med-Systems ◽

10.1515/bams-2016-0024 ◽

2017 ◽

Vol 13 (1) ◽

Author(s):

Agata Żyźniewska ◽

Jacek Leluk ◽

Gabriela Żaroffe

Keyword(s):

Amino Acid ◽

Consensus Sequence ◽

Hydroxysteroid Dehydrogenase ◽

Amino Acid Sequences ◽

Multiple Sequence ◽

Hydroxysteroid Dehydrogenases ◽

Uniprot Database ◽

Correlated Mutations ◽

Almost All ◽

Degree Of Similarity

Background: Hydroxysteroid dehydrogenase enzymes belong to the short-chain dehydrogenase/reductase (SDR) and aldo-keto reductase (AKR) superfamilies. SDR enzymes are involved in the metabolism of many compounds (hormones, lipids, etc.) and are present in almost all studied genomes. The second superfamily, AKR, comprises enzymes that catalyse the oxidation and reduction of many substrates by binding NAD(P)H as a cofactor. Two hundred hydroxysteroid dehydrogenases have been analysed in terms of natural mutational variability. This is the first study of its kind for the hydroxysteroid dehydrogenase family, and the information it provides is of practical value for designing specific drugs against diseases caused by mutations. Methods: Amino acid sequences of representatives of the hydroxysteroid dehydrogenase family were extracted from the UniProt database. In total, the 200 sequences with the highest degree of similarity identified by BLAST searches were analysed. The following software was used: ClustalX (multiple sequence alignment), Consensus Constructor (consensus sequence construction), and CORM (detection of correlated mutations). Results: The CORM program identified potential sites of correlated mutations in hydroxysteroid dehydrogenases. It generated 18 tables of results listing the amino acid positions of the mutations, seven of which are presented in this paper. Conclusions: The primary structure of the hydroxysteroid dehydrogenase family shows high variation.
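The abstract does not describe how CORM detects correlated mutations. As a hedged stand-in, the sketch below scores covariation between alignment columns with mutual information (MI), a common measure; it is not CORM's algorithm. The 0.5-bit threshold and the function names are arbitrary illustrative choices, and gap characters are treated as an ordinary symbol.

```python
import math
from collections import Counter
from itertools import combinations

def column(alignment, i):
    """Characters of column i across all aligned sequences."""
    return [seq[i] for seq in alignment]

def mutual_information(col_a, col_b):
    """MI in bits between two alignment columns, from observed frequencies."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in pab.items():
        p_joint = c / n
        mi += p_joint * math.log2(p_joint / ((pa[a] / n) * (pb[b] / n)))
    return mi

def correlated_pairs(alignment, threshold=0.5):
    """(i, j, MI) for every column pair whose MI exceeds the threshold."""
    length = len(alignment[0])
    hits = []
    for i, j in combinations(range(length), 2):
        mi = mutual_information(column(alignment, i), column(alignment, j))
        if mi > threshold:
            hits.append((i, j, round(mi, 3)))
    return hits
```

In the toy alignment `["AKLE", "AKLE", "GRLE", "GRLD"]`, columns 0 and 1 co-vary perfectly (A with K, G with R) and are the only pair reported, with MI = 1 bit.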

Computational Analysis of Therapeutic Enzyme Uricase from Different Source Organisms

Current Proteomics ◽

10.2174/1570164616666190617165107 ◽

2020 ◽

Vol 17 (1) ◽

pp. 59-77

Author(s):

Anand Kumar Nelapati ◽

JagadeeshBabu PonnanEttiyappan

Keyword(s):

Uric Acid ◽

Amino Acid ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Protein Sequences ◽

Amino Acid Sequences ◽

Amino Acid Residues ◽

Multiple Sequence ◽

Physiochemical Properties ◽

Pharmaceutical Industries

Background: Hyperuricemia and gout are conditions that result from the accumulation of uric acid in the blood and urine. Uric acid is a product of the purine metabolic pathway in humans. Uricase is a therapeutic enzyme that lowers the concentration of uric acid in serum and urine by converting it into the more soluble allantoin. Uricases are found in many sources, including bacteria, fungi, yeast, plants, and animals. Objective: The present study aims to elucidate the structure and physicochemical properties of uricase by in silico analysis. Methods: A total of sixty uricase amino acid sequences from different sources were obtained from NCBI and subjected to multiple sequence alignment (MSA), homology search, phylogenetic analysis, motif search, and analysis of domain architecture and physicochemical properties, including pI, EC, Ai, and Ii. Results: Multiple sequence alignment of all the selected protein sequences revealed distinct differences between bacterial, fungal, plant, and animal sources, based on the position-specific occurrence of conserved amino acid residues. Across all selected sequences, the region of maximum homology lies between positions 51 and 388. Within individual groups, homology spans positions 16–337 for bacterial, 14–339 for fungal, 12–317 for plant, and 37–361 for animal uricases. The phylogenetic tree constructed from the amino acid sequences revealed clusters corresponding to the different source organisms. The physicochemical analysis showed that uricases are 300–338 amino acid residues long, with molecular weights of 33–39 kDa and theoretical pI values ranging from 4.95 to 8.88. The amino acid composition results showed that valine has a high average frequency (8.79%) relative to other amino acids in all analyzed species. Conclusion: This work may serve as a stepping-stone for researchers in bioinformatics seeking information on the physicochemical features, evolutionary history, and structural motifs of uricase, an enzyme widely used in the biotechnological and pharmaceutical industries. The proposed in silico analysis can therefore inform protein engineering work as well as gout therapy.
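Composition figures such as the 8.79% valine frequency, and the aliphatic index (Ai), can be computed directly from sequence. A minimal sketch follows. It assumes the Ikai (1980) aliphatic index definition, AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)) with X in mole percent. The abstract does not name the tool used, so the method is an assumption, and the test sequences are toy examples.

```python
from collections import Counter

def composition(seq):
    """Percent frequency of each residue in a protein sequence."""
    counts = Counter(seq.upper())
    n = len(seq)
    return {aa: 100.0 * c / n for aa, c in counts.items()}

def aliphatic_index(seq):
    """Ikai (1980) aliphatic index from mole-percent composition."""
    x = composition(seq)
    return (x.get("A", 0.0) + 2.9 * x.get("V", 0.0)
            + 3.9 * (x.get("I", 0.0) + x.get("L", 0.0)))

def mean_frequency(sequences, residue):
    """Average percent frequency of one residue across several sequences."""
    return sum(composition(s).get(residue, 0.0) for s in sequences) / len(sequences)
```

Averaging per-sequence percentages, as `mean_frequency` does, weights every source organism equally regardless of sequence length. Pooling all residues first would weight longer sequences more.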

Molecular characterization and phylogenetic analysis of NBS-LRR genes in wild relatives of eggplant (Solanum melongena L.)

Indian Journal of Agricultural Research ◽

10.18805/ijare.a-4793 ◽

2018 ◽

Author(s):

Sona. S Dev ◽

P. Poornima ◽

Akhil Venu

Keyword(s):

Phylogenetic Analysis ◽

Amino Acid ◽

Sequence Similarity ◽

Interleukin 1 ◽

Preliminary Investigation ◽

Solanum Melongena ◽

Wild Relatives ◽

Amino Acid Sequences ◽

R Genes ◽

Multiple Sequence

Eggplant or brinjal (Solanum melongena L.) is highly susceptible to various soil-borne diseases. The extensive use of chemical fungicides to combat these diseases could be minimized by identifying resistance gene analogs (RGAs) in wild species related to cultivated plants. In the present study, degenerate PCR primers targeting the conserved regions of nucleotide binding site-leucine rich repeat (NBS-LRR) genes were used to amplify RGAs from wild relatives of eggplant (black nightshade (Solanum nigrum), Indian nightshade (Solanum violaceum), and Solanum incanum) that had shown resistance to the bacterial wilt pathogen Ralstonia solanacearum in a preliminary investigation. When compared with each other and with the amino acid sequences of known RGAs deposited in GenBank, the amino acid sequences of the amplicons showed significant sequence similarity. Phylogenetic analysis indicated that they belong to the Toll/interleukin-1 receptor (TIR)-NBS-LRR type of R genes. Multiple sequence alignment with other known R genes showed significant homology with the P-loop, kinase-2, and GLPL domains of NBS-LRR class genes. There have been no previous reports of R genes from these wild eggplants; hence, diversity analysis of these novel RGAs could lead to the identification of other novel R genes within the germplasm of different brinjal plants as well as other Solanum species.
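Degenerate primers like those used here encode ambiguous positions with IUPAC nucleotide codes, so one oligonucleotide matches several variants of a conserved region. The sketch below counts and expands the concrete sequences a degenerate primer represents. The abstract does not give the actual NBS-LRR primer sequences, so the primer strings in the test are hypothetical.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    n = 1
    for code in primer.upper():
        n *= len(IUPAC[code])
    return n

def expand(primer):
    """All concrete sequences encoded by a degenerate primer, in sorted order."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in primer.upper()))]
```

Degeneracy multiplies across positions, so a few N codes inflate it quickly. That trade-off between coverage of divergent targets and specificity is why degenerate primers are usually designed on the most conserved motifs.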

Insertions and deletions trigger adaptive walks in Drosophila proteins

Proceedings of the Royal Society B: Biological Sciences ◽

10.1098/rspb.2011.2571 ◽

2012 ◽

Vol 279 (1740) ◽

pp. 3075-3082 ◽

Cited By ~ 11

Author(s):

Evgeny V. Leushkin ◽

Georgii A. Bazykin ◽

Alexey S. Kondrashov

Keyword(s):

Amino Acid ◽

Amino Acid Sequences ◽

Molecular Adaptation ◽

Amino Acid Substitutions ◽

Protein Coding ◽

Evolution Of Life ◽

High Fraction ◽

Adaptive Walks ◽

The Difference ◽

Almost All

Maps that relate all possible genotypes or phenotypes to fitness, known as fitness landscapes, are central to the evolution of life but remain poorly known. An insertion or a deletion (indel) of one or several amino acids constitutes a substantial leap of a protein within the space of amino acid sequences, and it is unlikely that after such a leap the new sequence corresponds precisely to a fitness peak. Thus, one can expect an indel in a protein-coding sequence that becomes fixed in a population to be followed by some number of adaptive amino acid substitutions that move the new sequence towards a nearby fitness peak. Here, we study substitutions that occur after a frame-preserving indel in evolving proteins of Drosophila. An insertion triggers 1.03 ± 0.75 amino acid substitutions within the protein region centred at the site of insertion, whereas a deletion triggers 4.77 ± 1.03 substitutions within such a region. The difference between these values is probably owing to a higher fraction of effectively neutral insertions. Almost all of the triggered amino acid substitutions can be attributed to positive selection, and most of them occur relatively soon after the triggering indel and take place upstream of its site. A high fraction of substitutions that follow an indel occur at previously conserved sites, suggesting that an indel substantially changes the selection that shapes the protein region around it. Thus, an indel is often followed by an adaptive walk whose length agrees with the theory of molecular adaptation.
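A frame-preserving indel in a coding sequence is one whose length is a multiple of three nucleotides, so the downstream reading frame is unchanged. The sketch below lists gap runs in one aligned nucleotide sequence and flags those that preserve the frame. It is an illustrative assumption, not the authors' pipeline. It does not decide whether a gap reflects an insertion or a deletion, which would require comparison with an outgroup.

```python
import re

def gap_runs(aligned_seq):
    """(start, length) of each run of '-' in an aligned nucleotide sequence (0-based)."""
    return [(m.start(), m.end() - m.start()) for m in re.finditer(r"-+", aligned_seq)]

def frame_preserving(aligned_seq):
    """(start, length, keeps_frame) for each gap run; keeps_frame is True when length % 3 == 0."""
    return [(start, length, length % 3 == 0) for start, length in gap_runs(aligned_seq)]
```

For example, in `"ATG---AAAC-GT"` the three-base gap at column 3 keeps the frame, while the single-base gap at column 10 would shift it.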

Identification of amino acid sequences determining interaction between the cucumber mosaic virus-encoded 2a polymerase and 3a movement proteins

Journal of General Virology ◽

10.1099/vir.0.83207-0 ◽

2007 ◽

Vol 88 (12) ◽

pp. 3445-3451 ◽

Cited By ~ 11

Author(s):

Min Sook Hwang ◽

Kyung Nam Kim ◽

Jeong Hyun Lee ◽

Young In Park

Keyword(s):

Amino Acids ◽

Amino Acid ◽

Cucumber Mosaic Virus ◽

Mosaic Virus ◽

Critical Role ◽

Amino Acid Sequences ◽

Multiple Sequence ◽

Movement Proteins ◽

Gdd Motif

The cucumber mosaic virus (CMV)-encoded 3a movement protein (MP) is indispensable for CMV movement in plants. We have previously shown that MP interacts directly with the CMV-encoded 2a polymerase protein in vitro. Here, we further dissected this interaction and determined the amino acid sequences that are responsible for the MP and 2a polymerase protein interaction. Both the N-terminal 21 amino acids and the central GDD motif of the 2a polymerase protein were important for interacting with the MP. Although each of the regions alone was sufficient for the interaction with MP, quantitative yeast two-hybrid analyses showed that they acted synergistically to enhance the binding affinity. The MP N-terminal 20 amino acids were sufficient for interacting with the 2a polymerase protein, and the serine residue at position 14 played a critical role in the interaction. Multiple sequence alignment showed that the 2a protein interacting regions and the serine at position 14 in the MP are highly conserved among subgroup I and II CMV isolates.
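A statement such as "the serine at position 14 in the MP is highly conserved" requires mapping a residue number in a reference sequence to an alignment column, because gaps shift column indices. A minimal sketch follows. The isolate names and sequences in the test are hypothetical, and positions are 1-based residue numbers in the ungapped reference.

```python
def residue_to_column(aligned_seq, position):
    """Map a 1-based residue number of the ungapped sequence to a 0-based alignment column."""
    count = 0
    for col, ch in enumerate(aligned_seq):
        if ch != "-":
            count += 1
            if count == position:
                return col
    raise ValueError(f"sequence has fewer than {position} residues")

def residues_at(alignment, reference, position):
    """Residue of every sequence at the column holding `position` of the reference."""
    col = residue_to_column(alignment[reference], position)
    return {name: seq[col] for name, seq in alignment.items()}
```

For example, residue 3 of `"MA-SK"` sits in column 3, not column 2, because of the gap.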

Investigation of Two Evolutionarily Unrelated Halocarboxylic Acid Dehalogenase Gene Families

Journal of Bacteriology ◽

10.1128/jb.181.8.2535-2547.1999 ◽

1999 ◽

Vol 181 (8) ◽

pp. 2535-2547 ◽

Cited By ~ 72

Author(s):

Katja E. Hill ◽

Julian R. Marchesi ◽

Andrew J. Weightman

Keyword(s):

Amino Acid ◽

Molecular Analysis ◽

Phylogenetic Trees ◽

Gene Families ◽

Amino Acid Sequences ◽

Group I ◽

Silent Genes ◽

Degrading Bacteria ◽

Group Ii ◽

Almost All

Dehalogenases are key enzymes in the metabolism of halo-organic compounds. This paper describes a systematic approach to the isolation and molecular analysis of two families of bacterial α-halocarboxylic acid (αHA) dehalogenase genes, called group I and group II deh genes. The two families are evolutionarily unrelated and together represent almost all of the αHA deh genes described to date. We report the design and evaluation of degenerate PCR primer pairs for the separate amplification and isolation of group I and II deh genes. Amino acid sequences derived from 10 of 11 group I deh partial gene products of new and previously reported bacterial isolates showed conservation of five residues previously identified as essential for activity. The exception, DehD from a Rhizobium sp., had only two of these five residues. Group II deh gene sequences were amplified from 54 newly isolated strains, and seven of these sequences were cloned and fully characterized. Group II dehalogenases were stereoselective, dechlorinating L- but not D-2-chloropropionic acid, and derived amino acid sequences for all of the genes except dehII°P11 showed conservation of previously identified essential residues. Molecular analysis of the two deh families highlighted four subdivisions in each, which were supported by high bootstrap values in phylogenetic trees and by enzyme structure-function considerations. Group I deh genes included two putative cryptic or silent genes, dehI°PP3 and dehI°17a, produced by different organisms. Group II deh genes included two cryptic genes and an active gene, dehII PP3, that can be switched off and on. All αHA-degrading bacteria so far described were Proteobacteria, a result that may be explained by limitations either in the host range for deh genes or in isolation methods.
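The subdivisions in each deh family were supported by high bootstrap values. In the standard nonparametric bootstrap (Felsenstein 1985), alignment columns are resampled with replacement to create pseudo-replicate alignments. A tree is built from each replicate, and a clade's bootstrap value is the percentage of replicate trees that contain it. The sketch below shows only the resampling step. Tree building is omitted, and the abstract does not state which software the authors used.

```python
import random

def bootstrap_replicate(alignment, rng):
    """One pseudo-replicate: resample alignment columns with replacement."""
    length = len(alignment[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in cols) for seq in alignment]

def bootstrap_replicates(alignment, n, seed=0):
    """n pseudo-replicate alignments from a seeded, reproducible generator."""
    rng = random.Random(seed)
    return [bootstrap_replicate(alignment, rng) for _ in range(n)]
```

Whole columns are drawn, not individual characters, so each replicate keeps the site patterns shared across sequences. That is what lets replicate trees measure how consistently the data support a clade.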

Phylogenetic and topological analyses of the bovine interferon-induced transmembrane protein (IFITM3)

Acta Veterinaria Hungarica ◽

10.1556/004.2021.00010 ◽

2021 ◽

Author(s):

Yong-Chan Kim ◽

Byung-Hoon Jeong

Keyword(s):

Amino Acids ◽

Amino Acid ◽

Multiple Sequence Alignment ◽

Transmembrane Domain ◽

Transmembrane Protein ◽

Phylogenetic Analyses ◽

Amino Acid Sequences ◽

Multiple Sequence ◽

Interspecific Differences ◽

Distinct Features

Interferon-induced transmembrane protein 3 (IFITM3) plays a pivotal role in the antiviral capacity of several species. However, investigations of the IFITM3 protein in cattle have so far been rare. According to recent studies, interspecific differences in the IFITM3 protein give rise to several unique features relative to primates and birds. Thus, in the present study, we investigated the nucleotide and amino acid sequences of bovine IFITM3 to identify its distinct features. We found that the bovine IFITM3 gene differs significantly in length and homology from that of other species, including primates, rodents, and birds. Phylogenetic analyses indicated that the bovine IFITM3 gene and protein are evolutionarily closer to those of primates than to those of rodents. However, cattle formed an independent clade relative to primates, rodents, and birds. Multiple sequence alignment of the IFITM3 protein indicated that the bovine IFITM3 protein contains 36 bovine-specific amino acids. Notably, TMpred predicted that the bovine IFITM3 protein prefers an inside-to-outside topology for intramembrane domain 1 (IMD1) and an inside-to-outside topology for transmembrane domain 2, whereas the SOSUI system predicted three membrane-embedded domains.
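The abstract reports 36 bovine-specific amino acids in the IFITM3 alignment but does not define the criterion. One plausible definition, assumed here, is an alignment column where the focal species carries a residue that no other sequence has at that column. The sequences in the test are hypothetical fragments, not IFITM3.

```python
def specific_sites(alignment, focal):
    """(column, residue) where the focal sequence's residue occurs in no other sequence.

    Columns where the focal sequence has a gap are skipped.
    """
    focal_seq = alignment[focal]
    others = [s for name, s in alignment.items() if name != focal]
    sites = []
    for col, aa in enumerate(focal_seq):
        if aa == "-":
            continue
        if all(s[col] != aa for s in others):
            sites.append((col, aa))
    return sites
```

Under this definition, the count depends strongly on taxon sampling. Adding a close relative of the focal species can remove many sites that looked species-specific.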

Snake venom disintegrins: novel dimeric disintegrins and structural diversification by disulphide bond engineering

Biochemical Journal ◽

10.1042/bj20021739 ◽

2003 ◽

Vol 372 (3) ◽

pp. 725-734 ◽

Cited By ~ 124

Author(s):

Juan J. CALVETE ◽

M. Paz MORENO-MURCIANO ◽

R. David G. THEAKSTON ◽

Dariusz G. KISIEL ◽

Cezary MARCINKIEWICZ

Keyword(s):

Amino Acid ◽

K562 Cells ◽

Amino Acid Sequences ◽

Disulphide Bond ◽

Vascular Cell Adhesion ◽

Integrin Α5β1 ◽

Multiple Sequence ◽

Vipera Lebetina ◽

Cell Adhesion Molecule 1 ◽

Structural Diversification

We report the isolation and amino acid sequences of six novel dimeric disintegrins from the venoms of Vipera lebetina obtusa (VLO), V. berus (VB), V. ammodytes (VA), Echis ocellatus (EO), and Echis multisquamatus (EMS). Disintegrins VLO4, VB7, VA6, and EO4 displayed the RGD motif and inhibited the adhesion of K562 cells, which express integrin α5β1, to immobilized fibronectin. A second group of dimeric disintegrins (VLO5 and EO5) had MLD and VGD motifs in their subunits and selectively blocked the adhesion of α4β1 integrin to vascular cell adhesion molecule 1. In contrast, disintegrin EMS11 inhibited both α5β1 and α4β1 integrins with almost the same degree of specificity. Comparison of the amino acid sequences of the dimeric disintegrins with those of other disintegrins by multiple sequence alignment and phylogenetic analysis, in conjunction with current biochemical and genetic data, supports the view that the different disintegrin subfamilies evolved from a common ADAM (a disintegrin and metalloproteinase-like) scaffold and that structural diversification occurred through disulphide bond engineering.
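The disintegrins above are grouped by tripeptide motif (RGD versus MLD and VGD), which correlates with the integrins they inhibit. A minimal motif scan is sketched below. The test sequence is a made-up fragment, and a plain string search does not check whether a match lies in the integrin-binding loop.

```python
import re

# Tripeptide motifs named in the abstract.
MOTIFS = ("RGD", "MLD", "VGD")

def find_motifs(seq, motifs=MOTIFS):
    """(motif, 1-based start) for every motif occurrence, sorted by position."""
    hits = []
    for motif in motifs:
        # Zero-width lookahead so overlapping occurrences are also reported.
        for m in re.finditer(f"(?={motif})", seq):
            hits.append((motif, m.start() + 1))
    return sorted(hits, key=lambda h: h[1])
```

For a dimeric disintegrin, each subunit would be scanned separately, since the heterodimers described here carry different motifs in their two subunits.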

DNA barcoding of different Triticum species

Bulletin of the National Research Centre ◽

10.1186/s42269-019-0192-9 ◽

2019 ◽

Vol 43 (1) ◽

Cited By ~ 1

Author(s):

Samira A. Osman ◽

Walaa A. Ramadan

Keyword(s):

Amino Acid ◽

Nucleotide Sequence ◽

Phylogenetic Tree ◽

Dna Barcoding ◽

Critical Role ◽

Dna Barcode ◽

Species Group ◽

Amino Acid Sequences ◽

Pairwise Distance ◽

Multiple Sequence

Background: The genus Triticum L. includes diploid, tetraploid, and hexaploid species. DNA barcoding is a new method for rapidly identifying plant taxa using short DNA sequences. In this investigation, we performed a phylogenetic analysis of 20 different Triticum species based on the partial chloroplast maturase K-encoding gene (matK). Materials and methods: Twenty accessions of diploid, tetraploid, and hexaploid Triticum species were obtained from different countries. Genomic DNA was isolated from young leaves of the studied samples and used as a template for PCR. PCR products were checked by electrophoresis, purified, sequenced, and submitted to the GenBank nucleotide sequence database, and the nucleotide sequences were translated into amino acid sequences. The nucleotide and amino acid sequences were aligned with the ClustalW multiple sequence alignment program, and phylogenetic trees were constructed from both nucleotide and amino acid sequences using two statistical analyses, bootstrapping and pairwise distance. Results: The phylogenetic trees obtained from both nucleotide and amino acid sequences divided the 20 Triticum species into two groups, A and B. Group A represented the diploid Triticum species. Group B was divided into two subgroups, I and II: subgroup I represented the hexaploid Triticum species, and subgroup II the tetraploid species. Conclusion: The matK gene sequence plays a critical role in discriminating closely related Triticum species, so these sequences could be used as a DNA barcode for tracing the evolutionary history of Triticum species.
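The abstract does not name the pairwise distance model. The sketch below assumes the simplest choice, the uncorrected p-distance (the proportion of differing sites). Sites with a gap in either sequence are skipped (pairwise deletion). Sequence names in the test are hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences, skipping gapped sites."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(seqs):
    """Symmetric dict of p-distances keyed by (name, name) for a {name: aligned_seq} mapping."""
    names = list(seqs)
    d = {(n, n): 0.0 for n in names}
    for m, n in combinations(names, 2):
        d[(m, n)] = d[(n, m)] = p_distance(seqs[m], seqs[n])
    return d
```

The p-distance ignores multiple substitutions at the same site, so it underestimates divergence for distant sequences. For closely related Triticum matK sequences this bias is small, though a corrected model would be the more cautious choice.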

Complete Genome Sequence of Tomato Spotted Wilt Virus from Paprika in Korea

International Journal of Phytopathology ◽

10.33687/phytopath.002.03.0378 ◽

2013 ◽

Vol 2 (3) ◽

pp. 121-136 ◽

Cited By ~ 1

Author(s):

Jae-Hyun Kim ◽

Young-Soo Kim ◽

Soo-Won Jang ◽

Yong-Ho Jeon

Keyword(s):

Amino Acid ◽

Tomato Spotted Wilt Virus ◽

Cell Movement ◽

Datura Stramonium ◽

Amino Acid Sequences ◽

Viral Glycoprotein ◽

Multiple Sequence ◽

Genomic Rnas ◽

Tomato Spotted Wilt ◽

Wilt Virus

We isolated tomato spotted wilt virus (TSWV-KP) from a diseased Capsicum annuum var. grossum plant with malformed leaves and necrotic spotted fruits. On Datura stramonium, Nicotiana clevelandii, N. rustica, and N. tabacum cvs., TSWV-KP produced necrosis or necrotic ring spots on inoculated leaves, along with mosaic, vein necrosis, or death on the upper leaves. Ultrastructurally, typical tospovirus particles were observed in the cytoplasm. The virion contained three genomic RNA molecules of approximately 9.0, 4.9, and 3.0 kb. The nucleocapsid (N) protein of the purified virion migrated as a single band of ~29 kDa in SDS-PAGE. The complete nucleotide sequence of the large (L) genome segment of TSWV-KP was determined. Defective forms of L-RNA containing core polymerase regions were observed. L-RNA (8,917 nucleotides) contained a single open reading frame (ORF) in the viral complementary (vc) strand encoding a 330-kDa protein. The L-protein had high identity in the "core-polymerase domain" with the corresponding regions of other tospoviruses. The complete nucleotide sequence of the TSWV-KP medium-sized (M) RNA comprised 4,768 nucleotides and indicated a typical tospovirus with two genes in ambisense arrangement. The vRNA ORF coded for a potential 34.8-kDa cell-to-cell movement protein (NSm), and the vcRNA ORF coded for the 128.0-kDa viral glycoprotein (G1/G2) precursor. Multiple sequence alignment of the M-RNA showed the highest homology to TSWV-BR01. The amino acid sequences of TSWV-KP NSm and G1/G2 exhibited 48.7–85.3% and 34.9–96.2% identity, respectively. The TSWV-KP small (S) RNA comprised 2,991 nucleotides with an ambisense coding strategy. The sequence contained two ORFs: one in the viral sense, encoding a protein with a predicted Mr of 52.4 kDa, and another in the viral complementary sense, encoding the 28.8-kDa viral nucleocapsid protein. The amino acid sequences of the TSWV-KP S-RNA-encoded NSs and N proteins exhibited 35.9–87.9% and 19.9–98.4% identity, respectively.
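In an ambisense segment such as the TSWV M or S RNA, one ORF is read from the viral-sense strand and another from the viral-complementary strand. The sketch below scans both strands for ATG-to-stop ORFs using the standard stop codons. It is a simplified illustration: only the first ATG after each stop is used, and `min_codons` is an arbitrary length filter. The test sequence is a toy construct, not TSWV. Complementary-strand coordinates are given in that strand's own 5'-to-3' numbering.

```python
STOPS = {"TAA", "TAG", "TGA"}

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def orfs(seq, min_codons=1):
    """ATG-to-stop ORFs in the three forward frames as (start, end), 0-based, end exclusive, stop included."""
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    found.append((start, i + 3))
                start = None
    return found

def ambisense_orfs(seq, min_codons=1):
    """ORFs on the given (sense) strand and on its reverse complement; RNA 'U' is read as 'T'."""
    s = seq.upper().replace("U", "T")
    return {"sense": orfs(s, min_codons), "complementary": orfs(revcomp(s), min_codons)}
```

Real tospovirus ORFs are hundreds of codons long, so in practice `min_codons` would be raised well above 1 to suppress short spurious ORFs.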
