Tools for the Recognition of Sorting Signals and the Prediction of Subcellular Localization of Proteins From Their Amino Acid Sequences

2020 ◽  
Vol 11 ◽  
Author(s):  
Kenichiro Imai ◽  
Kenta Nakai

At the time of translation, nascent proteins are thought to be sorted to their final subcellular localization sites on the basis of parts of their amino acid sequences (i.e., sorting or targeting signals). It is therefore of interest to recognize these signals computationally from the amino acid sequence of any given protein and to predict its final subcellular localization from that information, supplemented with additional information (e.g., k-mer frequency). This field has a long history, and many prediction tools have been released. Even in this era of proteomic atlases at the single-cell level, researchers continue to develop new algorithms, aiming, for example, at assessing the impact of disease-causing mutations or cell type-specific alternative splicing. In this article, we overview the entire field and discuss its future direction.

2020 ◽  
Author(s):  
Angela Lopez-del Rio ◽  
Maria Martin ◽  
Alexandre Perera-Lluna ◽  
Rabie Saidi

Background: The use of raw amino acid sequences as input for protein-based deep learning models has gained popularity in recent years. This scheme requires handling proteins of different lengths, while deep learning models require same-shape input. To accomplish this, zeros are usually added to each sequence up to an established common length in a process called zero-padding. However, the effect of different padding strategies on model performance and data structure is not yet known. Results: We analysed the impact of different ways of padding the amino acid sequences in a hierarchical Enzyme Commission number prediction problem. Our results show that padding has an effect on model performance even when convolutional layers are involved. We propose and implement four novel types of padding for amino acid sequences. Conclusions: The present study highlights the relevance of the step of padding the one-hot encoded amino acid sequences when building deep learning-based models for Enzyme Commission number prediction. The fact that this has an effect on model performance should raise awareness of the need to justify the details of this step in future work. The code of this analysis is available at https://github.com/b2slab/padding_benchmark.
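The zero-padding step described in this abstract can be illustrated with a minimal sketch. It is not the authors' code (which is at the repository linked above) and it does not implement their four novel padding types. It only shows the two conventional variants, padding at the end ("post") or at the start ("pre"). The function names, the 20-letter alphabet, and the choice to truncate sequences longer than the target length are all assumptions made for illustration.

```python
# Illustrative sketch only; names and truncation behaviour are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Return one 20-element row per residue; unknown residues give all-zero rows."""
    rows = []
    for aa in seq:
        row = [0] * len(AMINO_ACIDS)
        if aa in AA_INDEX:
            row[AA_INDEX[aa]] = 1
        rows.append(row)
    return rows


def zero_pad(encoded, target_len, mode="post"):
    """Pad an encoded sequence with all-zero rows up to target_len rows.

    Sequences longer than target_len are truncated (an assumption of this sketch).
    mode="post" appends the zero rows, mode="pre" prepends them.
    """
    encoded = encoded[:target_len]
    padding = [[0] * len(AMINO_ACIDS) for _ in range(target_len - len(encoded))]
    if mode == "post":
        return encoded + padding
    if mode == "pre":
        return padding + encoded
    raise ValueError("mode must be 'post' or 'pre'")


# Two sequences of different length end up with the same shape (6 x 20).
batch = [zero_pad(one_hot(s), 6) for s in ["MKT", "ACDEFG"]]
```

Where the zero rows are placed changes which input positions carry signal, which is the kind of effect the study investigates.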


Genome ◽  
1992 ◽  
Vol 35 (2) ◽  
pp. 360-371 ◽  
Author(s):  
Hugh Tyson

Optimum alignment in all pairwise combinations among a group of amino acid sequences generated a distance matrix. These distances were clustered to evaluate relationships among the sequences. The degree of relationship among sequences was also evaluated by calculating specific distances from the distance matrix and examining correlations between patterns of specific distances for pairs of sequences. The sequences examined were a group of 20 amino acid sequences of scorpion toxins originally published and analyzed by M.J. Dufton and H. Rochat in 1984. Alignment gap penalties were constant for all 190 pairwise sequence alignments and were chosen after assessing the impact of changing penalties on resultant distances. The total distances generated by the 190 pairwise sequence alignments were clustered using complete (farthest neighbour) linkage. The square, symmetrical input distance matrix is analogous to diallel cross data where reciprocal and parental values are absent. Diallel analysis methods provided analogues for the distance matrix to genetical specific combining abilities, namely specific distances between all sequence pairs that are independent of the average distances shown by individual sequences. Correlation of specific distance patterns, with transformation to modified z values and a stringent probability level, was used to delineate subgroups of related sequences. These were compared with complete linkage clustering results. Excellent agreement between the two approaches was found. Three originally outlying sequences were placed within the four new subgroups.

Key words: sequence alignment, specific distances, sequence relationships.
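As a rough illustration of the diallel analogy, the sketch below computes specific distances from a symmetric distance matrix. It assumes the standard estimator of specific combining ability for a half diallel without parents or reciprocals (Griffing's method 4). The abstract does not state the exact formula used, so this choice and the function name `specific_distances` are assumptions.

```python
def specific_distances(dist):
    """Specific distance for every pair in a square, symmetric distance matrix.

    Assumes the Griffing method 4 form (no diagonal, no reciprocals):
        s_ij = d_ij - (T_i + T_j) / (n - 2) + 2 * T / ((n - 1) * (n - 2))
    where T_i is the sum of row i without the diagonal and T is the sum
    over all unordered pairs. Diagonal entries of the result are left at 0.
    """
    n = len(dist)
    if n < 3:
        raise ValueError("at least three sequences are needed")
    row_totals = [sum(dist[i][j] for j in range(n) if j != i) for i in range(n)]
    grand_total = sum(row_totals) / 2.0  # each pair counted once
    spec = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                spec[i][j] = (
                    dist[i][j]
                    - (row_totals[i] + row_totals[j]) / (n - 2)
                    + 2.0 * grand_total / ((n - 1) * (n - 2))
                )
    return spec


# Made-up distances for four sequences, for illustration only.
d = [
    [0, 2, 6, 10],
    [2, 0, 5, 9],
    [6, 5, 0, 4],
    [10, 9, 4, 0],
]
s = specific_distances(d)
```

Under this form, each row of specific distances sums to zero, so the values reflect pair-specific deviations that are independent of how distant each sequence is on average, which matches the property described in the abstract.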


2021 ◽  
Vol 12 ◽  
Author(s):  
Peter Lipsky ◽  
Patrick T. Vallano ◽  
Jeffrey Smith ◽  
Walter Owens ◽  
Daniel Snider ◽  
...  

The objective of the current work was to demonstrate the equivalence of Mylan’s glatiramer acetate (GA) to that of the reference product Copaxone® (COP) using the four criteria for active pharmaceutical ingredient sameness as established by the US Food and Drug Administration (FDA). The reaction scheme used to produce Mylan’s glatiramer acetate (MGA) was compared with that of COP, determined from publicly available literature. Comparative analyses of MGA and COP were performed for physicochemical properties such as amino acid composition and molecular weight distributions. Spectroscopic fingerprints were obtained using circular dichroism spectroscopy. Structural signatures for polymerization and depolymerization, including total diethylamine (DEA) content, relative proportions of DEA-adducted amino acids, and N- and C-terminal amino acid sequences, were probed with an array of highly sensitive analytical methods. Biological activity of the products was assessed using validated murine experimental autoimmune encephalomyelitis (EAE) models of multiple sclerosis. MGA is produced using the same fundamental reaction scheme as COP and was shown to have equivalent physicochemical properties and composition. Analyses of multiple structural signatures demonstrated equivalence of MGA and COP with regard to polymerization, depolymerization, and propagational shift. Examination of the impact on prevention and treatment of EAE demonstrated equivalence of MGA and COP with respect to both activity and toxicity, and thereby provided confirmatory evidence of sameness. A rigorous, multi-pronged comparison of MGA and COP produced using an equivalent fundamental reaction scheme demonstrated equivalent physicochemical properties, structural signatures for polymerization and depolymerization, and biological activity as evidenced by comparable effects in EAE.
These studies demonstrate the equivalence of MGA and COP, establishing active ingredient sameness by the US Food and Drug Administration (FDA) criteria for GA, and provide compelling evidence that the FDA-approved generic MGA can be substituted for COP for the treatment of patients with relapsing-remitting MS.


2006 ◽  
Vol 04 (06) ◽  
pp. 1181-1195 ◽  
Author(s):  
JIAN GUO ◽  
XIAN PU ◽  
YUANLIE LIN ◽  
HOWARD LEUNG

Subcellular location is an important functional annotation of proteins. An automatic, reliable and efficient prediction system for protein subcellular localization is necessary for large-scale genome analysis. This paper describes a protein subcellular localization method which extracts features from protein profiles rather than from amino acid sequences. The protein profile represents a protein family, discards the part of the sequence information that is not conserved throughout the family, and is therefore more sensitive than the amino acid sequence. The amino acid compositions of the whole profile and of the N-terminus of the profile are extracted to train and test the probabilistic neural network classifiers. On two benchmark datasets, the overall accuracies of the proposed method reach 89.1% and 68.9%, respectively. The prediction results show that the proposed method performs better than methods based on amino acid sequences. The prediction results of the proposed method are also compared with SubLoc on two redundancy-reduced datasets.
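The feature extraction described here can be sketched as follows. The abstract does not specify how the profile is stored or how long the N-terminal window is, so this sketch assumes a profile given as a list of per-position dictionaries of amino acid frequencies and a hypothetical window of 20 positions. All names are illustrative and the classifier itself is not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def profile_composition(profile, start=0, end=None):
    """Average the per-position frequencies over profile[start:end].

    `profile` is assumed to be a list of dicts mapping residue -> frequency.
    Returns a 20-element composition vector in AMINO_ACIDS order.
    """
    columns = profile[start:end]
    if not columns:
        raise ValueError("empty profile region")
    return [
        sum(col.get(aa, 0.0) for col in columns) / len(columns)
        for aa in AMINO_ACIDS
    ]


def profile_features(profile, n_term=20):
    """Concatenate whole-profile and N-terminal compositions (40 values).

    n_term is a hypothetical window length, not a value from the paper.
    """
    return profile_composition(profile) + profile_composition(profile, 0, n_term)


# Toy three-position profile, for illustration only.
toy_profile = [{"M": 1.0}, {"K": 0.5, "R": 0.5}, {"A": 1.0}]
features = profile_features(toy_profile, n_term=2)
```

The resulting 40-value vector is the kind of input that could then be passed to a classifier such as the probabilistic neural network mentioned in the abstract.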


2019 ◽  
Author(s):  
Jacopo Marchi ◽  
Ezequiel A. Galpern ◽  
Rocio Espada ◽  
Diego U. Ferreiro ◽  
Aleksandra M. Walczak ◽  
...  

Abstract: The coding space of protein sequences is shaped by evolutionary constraints set by requirements of function and stability. We show that the coding space of a given protein family (the total number of sequences in that family) can be estimated using models of maximum entropy trained on multiple sequence alignments of naturally occurring amino acid sequences. We analyzed and calculated the size of three abundant repeat protein families, whose members are large proteins made of many repetitions of conserved portions of ∼30 amino acids. While amino acid conservation at each position of the alignment explains most of the reduction of diversity relative to completely random sequences, we found that correlations between amino acid usage at different positions significantly impact that diversity. We quantified the impact of different types of correlations, functional and evolutionary, on sequence diversity. Analysis of the detailed structure of the coding space of the families revealed a rugged landscape, with many local energy minima of varying sizes with a hierarchical structure, reminiscent of the frustrated energy landscapes of spin glasses in physics. This clustered structure indicates a multiplicity of subtypes within each family, and suggests new strategies for protein design.
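To make the idea of estimating the size of a coding space concrete, the sketch below computes only the simplest baseline: an independent-site estimate in which the number of sequences is the exponential of the summed per-column Shannon entropies. This accounts for conservation at each position but ignores the correlations between positions that the abstract reports as significant, so it is not the authors' model. The function names are assumptions, gaps are treated as ordinary symbols, and no pseudocounts or sequence reweighting are applied.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in nats) of the symbols observed in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def independent_site_family_size(alignment):
    """Estimate the number of sequences as exp(sum of column entropies).

    `alignment` is a list of equal-length aligned sequences. Positions are
    treated as independent, which is a simplification of this sketch.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    total_entropy = sum(
        column_entropy([seq[i] for seq in alignment]) for i in range(length)
    )
    return math.exp(total_entropy)


# Toy alignment: two equally likely residues at each of two positions.
size = independent_site_family_size(["AC", "AD", "GC", "GD"])
```

For comparison, completely random sequences of length L over 20 amino acids would give 20 to the power L, and the gap between that value and the entropy-based estimate reflects the reduction in diversity due to conservation.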


2021 ◽  
Vol 12 ◽  
Author(s):  
Sebastian Fischer ◽  
Frauke Stanke ◽  
Burkhard Tümmler

Sixteen monozygotic cystic fibrosis (CF) twin pairs, of whom 14 pairs were homozygous for the most common p.Phe508del CFTR mutation, were selected from the European Cystic Fibrosis Twin and Sibling Study Cohort. The T cell receptor (TCR) repertoire of the monozygotic twins was examined in peripheral blood by amplicon sequencing of the CDR3 variable region of the β-chain. The recruitment of TCR J and V genes for recombination and selection in the thymus showed a strong genetic influence in the CF twin cohort, as indicated by the shortest Jensen-Shannon distance to the twin individual. Exceptions were the clinically most discordant and/or most severely affected twin pairs, where clonal expansion, probably caused by recurrent pulmonary infections, overshadowed the impact of the identical genomic blueprint. In general, the Simpson clonality was low, indicating that the population of TCRβ clonotypes of the CF twins was dominated by the naïve T-cell repertoire. Intrapair sharing of clonotypes was significantly more frequent among monozygotic CF twins than among pairs of unrelated CF patients. Complete nucleotide sequence identity was observed in about 0.11% of CDR3 sequences, which should partially represent persisting fetal clones derived from the same progenitor T cells. Complete amino acid sequence identity was noted in 0.59% of clonotypes. Of the nearly 40,000 frequent amino acid clonotypes shared by at least two twin siblings, 99.8% were already known within the immuneACCESS database and only 73 had not yet been detected, indicating that the CDR3β repertoire of CF children and adolescents does not carry a disease-specific signature but rather shares public clones with that of the non-CF community. Clonotypes shared within twin pairs and between unrelated CF siblings were highly abundant among healthy non-CF people, less represented in individuals with infectious disease and uncommon in patients with cancer.
This subset of shared CF clonotypes defines CDR3 amino acid sequences that are more common in health than in disease.
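Two of the quantities mentioned in this abstract, Simpson clonality and clonotype sharing, can be illustrated with a short sketch. It assumes the common definition of Simpson clonality as the square root of the sum of squared clonotype frequencies, and it measures sharing simply as the set of CDR3 amino acid sequences present in both repertoires. The abstract does not give the exact definitions used, and the CDR3 strings and counts below are made up.

```python
import math


def simpson_clonality(clone_counts):
    """Square root of the sum of squared clonotype frequencies.

    `clone_counts` maps a clonotype (e.g. a CDR3 amino acid sequence) to its
    read or template count. Values near 0 indicate a diverse repertoire and
    1.0 indicates a single clone (definition assumed for this sketch).
    """
    total = sum(clone_counts.values())
    if total <= 0:
        raise ValueError("repertoire has no counts")
    return math.sqrt(sum((c / total) ** 2 for c in clone_counts.values()))


def shared_clonotypes(repertoire_a, repertoire_b):
    """Clonotypes with identical sequence in both repertoires."""
    return set(repertoire_a) & set(repertoire_b)


# Made-up repertoires for two twins, for illustration only.
twin_a = {"CASSLGQYEQYF": 5, "CASSPGTEAFF": 5}
twin_b = {"CASSLGQYEQYF": 2, "CASSQDRGNTIYF": 8}

clonality_a = simpson_clonality(twin_a)
shared = shared_clonotypes(twin_a, twin_b)
```

A low clonality value, as reported for the CF twins, means that no small group of expanded clones dominates the repertoire.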


2014 ◽  
Vol 8 (05) ◽  
pp. 570-580 ◽  
Author(s):  
Houssam A. Shaib ◽  
Nelly Cochet ◽  
Thierry Ribeiro ◽  
Afif M Abdel Nour ◽  
Georges Nemer ◽  
...  

Introduction: Avian influenza viruses of the H9N2 subtype have been reported to cause human infections. This study demonstrates the impact of nasal viral passaging of avian H9N2 in hamsters on its cross-species pathogenic adaptability and on the variability of the amino acid sequences of the hemagglutinin (HA) and neuraminidase (NA) stalk. Methodology: Three intranasal passagings of avian H9N2 in hamsters (P1, P2, and P3) were accomplished. Morbidity signs and lesions were observed three days post viral inoculation. The HA test was used for presumptive detection of H9N2 virus in the trachea and lungs of the hamsters challenged with the differently passaged viruses. Different primers were used for PCR amplification of the HA1 and NA stalk regions of the differently passaged H9N2 viruses, followed by sequence alignment. Results: The morbidity signs indicated low pathogenicity of the differently passaged H9N2 viruses in hamsters. The frequencies of gross and microscopic lesions in the tracheas and lungs did not differ significantly among hamsters challenged with the differently passaged H9N2 viruses (p > 0.05). There was 100% similarity in the amino acid sequence of the HA gene of most passaged viruses. The amino acid sequence of the neuraminidase in the third passaged H9N2 virus recovered from lungs showed an R46P mutation that might have a role in the pathogenic adaptability of P3 viruses in hamsters’ lungs. Conclusions: The apparent adaptation of avian H9N2 virus to mammalian cells is in agreement with the World Health Organization’s alertness to a possible public health threat from this adaptable virus.


2015 ◽  
Vol 93 (4) ◽  
pp. 381-388 ◽  
Author(s):  
Christopher T. Lohans ◽  
Marco J. van Belkum ◽  
Jing Li ◽  
John C. Vederas

Campylobacter jejuni is one of the major causes of food poisoning, often resulting from the consumption of improperly cooked poultry products. The emergence of C. jejuni strains resistant to conventional antibiotics necessitates the evaluation of other possible treatments or preventative measures to minimize the impact and prevalence of infections. Antimicrobial peptides produced by bacteria have begun to emerge as a potential means of decreasing the levels of C. jejuni in poultry, thereby limiting Campylobacter contamination in associated food products. A number of bacteriocins produced by Gram-positive bacteria have unexpectedly been described as having antimicrobial activity against the Gram-negative C. jejuni. Additionally, some nonribosomal lipopeptides produced by Bacillus and Paenibacillus spp. show efficacy against this pathogen. This review will describe the bacterial antimicrobial peptides reported to be active against C. jejuni, with an emphasis on the characterization of their primary structures. However, for many of these peptides, little is known about their amino acid sequences and structures. Furthermore, there are unusual inconsistencies associated with the reported amino acid sequences for several of the more well-studied bacteriocins. Clarifying the chemical nature of these promising antimicrobial peptides is necessary before their potential utility for livestock protection from C. jejuni can be fully explored. Once these peptides are better characterized, they may prove to be strong candidates for minimizing the impact of Campylobacter on human health.


2017 ◽  
Vol 2017 ◽  
pp. 1-11 ◽  
Author(s):  
Gregory C. Antell ◽  
Will Dampier ◽  
Benjamas Aiamkitsumrit ◽  
Michael R. Nonnemacher ◽  
Vanessa Pirrone ◽  
...  

Vpr is an HIV-1 accessory protein that plays numerous roles during viral replication, some of which are cell type dependent. To test the hypothesis that HIV-1 tropism extends beyond the envelope into the vpr gene, studies were performed to identify the associations between coreceptor usage and Vpr variation in HIV-1-infected patients. Colinear HIV-1 Env-V3 and Vpr amino acid sequences were obtained from the LANL HIV-1 sequence database and from well-suppressed patients in the Drexel/Temple Medicine CNS AIDS Research and Eradication Study (CARES) Cohort. Genotypic classification of Env-V3 sequences as X4 (CXCR4-utilizing) or R5 (CCR5-utilizing) was used to group colinear Vpr sequences. To reveal the sequences associated with a specific coreceptor usage genotype, Vpr amino acid sequences were assessed for amino acid diversity and Jensen-Shannon divergence between the two groups. Five amino acid alphabets were used to comprehensively examine the impact of amino acid substitutions involving side chains with similar physicochemical properties. Positions 36, 37, 41, 89, and 96 of Vpr were characterized by statistically significant divergence across multiple alphabets when X4 and R5 sequence groups were compared. In addition, consensus amino acid switches were found at positions 37 and 41 in comparisons of the R5 and X4 sequence populations. These results suggest an evolutionary link between Vpr and gp120 in HIV-1-infected patients.
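The per-position comparison described above can be sketched as follows: Jensen-Shannon divergence is computed at each alignment position between two groups of sequences, optionally after mapping residues to a reduced alphabet. The grouping in `REDUCE` is a hypothetical example of such an alphabet, not one of the five used in the study. The toy sequences are invented and no significance testing is included.

```python
import math
from collections import Counter

# Hypothetical reduced alphabet grouping residues by side-chain properties.
GROUPS = {
    "AVLIMC": "h",  # hydrophobic
    "FWY": "a",     # aromatic
    "STNQGP": "p",  # polar or small
    "KRH": "+",     # positively charged
    "DE": "-",      # negatively charged
}
REDUCE = {aa: code for members, code in GROUPS.items() for aa in members}


def distribution(residues, reduce=None):
    """Relative frequency of each symbol, after optional alphabet reduction."""
    symbols = [reduce.get(r, r) if reduce else r for r in residues]
    counts = Counter(symbols)
    return {s: c / len(symbols) for s, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a)

    return 0.5 * kl(p) + 0.5 * kl(q)


def per_position_jsd(group_x, group_y, reduce=None):
    """Divergence between two groups of aligned sequences at each position."""
    length = len(group_x[0])
    return [
        js_divergence(
            distribution([s[i] for s in group_x], reduce),
            distribution([s[i] for s in group_y], reduce),
        )
        for i in range(length)
    ]


# Toy groups: position 2 differs only by substitutions among hydrophobic residues.
x4_like = ["AI", "AL"]
r5_like = ["AV", "AV"]
full = per_position_jsd(x4_like, r5_like)
reduced = per_position_jsd(x4_like, r5_like, REDUCE)
```

In this toy case the second position is maximally divergent in the full alphabet but shows no divergence in the reduced one, which illustrates why comparing several alphabets helps separate conservative substitutions from changes in side-chain character.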

