Many protein products from a few loci: assignment of human salivary proline-rich proteins to specific loci.

Genetics, 1988, Vol 120 (1), pp. 255-265
Author(s): K M Lyons, E A Azen, P A Goodman, O Smithies

Earlier studies of protein polymorphisms led to the description of 13 linked loci thought to encode the human salivary proline-rich proteins (PRPs). However, more recent studies at the DNA level have shown that only six genes encode PRPs. The present study was undertaken to reconcile these observations. Nucleotide and decoded amino acid sequences from each of the six genes were compared with the available protein sequence data for PRPs. This analysis allowed assignment of the PmF, PmS and Pe proteins to the PRB1 locus, the G1 protein to the PRB3 locus, the Po protein to the PRB4 locus, the Ps protein to the PRB2 locus, and the CON1 and CON2 proteins to the PRB4 locus. Correlations between insertion/deletion RFLPs and PRP protein phenotypes were observed for the PmF, PmS, G1 and CON2 proteins. Our overall analysis indicates that in many instances several proteins previously considered to be the products of separate loci are actually proteolytic cleavage products of a large precursor encoded by one of the six genes identified at the DNA level. Our analysis also demonstrates that some of the "null" alleles proposed at 11 of the 13 loci in the earlier genetic studies are actually productive alleles with alterations at proteolytic cleavage sites within the relevant precursor protein. When cleavage fails, longer precursor peptides that are not resolved electrophoretically persist, and the smaller PRPs seen when cleavage occurs are absent.
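The cleavage argument can be illustrated with a toy Python sketch. The precursor sequence, the cleavage positions and the `max_resolved` length cutoff are all hypothetical. The cutoff is only a stand-in for "resolved electrophoretically", and none of these values comes from the study. When one cleavage site is altered, a single longer fragment replaces two shorter ones. The shorter products then look "missing" even though the allele is productive.

```python
from typing import List

def cleave(precursor: str, sites: List[int]) -> List[str]:
    """Split a precursor after each 0-based residue index listed in `sites`."""
    cuts = [0] + [s + 1 for s in sorted(sites)] + [len(precursor)]
    # Skip empty pieces (e.g. a site on the last residue or a duplicated site).
    return [precursor[a:b] for a, b in zip(cuts, cuts[1:]) if b > a]

def resolved(products: List[str], max_resolved: int = 5) -> List[str]:
    """Keep only products short enough to count as 'seen' (hypothetical cutoff)."""
    return [p for p in products if len(p) <= max_resolved]

# Hypothetical precursor with cleavage sites after residues 4 and 9.
PRECURSOR = "QPPGKEPPSRQPPGQ"
usual = cleave(PRECURSOR, [4, 9])   # productive allele, both sites intact
variant = cleave(PRECURSOR, [4])    # allele with the second site altered

print(usual)              # ['QPPGK', 'EPPSR', 'QPPGQ']
print(variant)            # ['QPPGK', 'EPPSRQPPGQ']
print(resolved(variant))  # ['QPPGK']: EPPSR and QPPGQ look "null"
```

Both alleles yield the same residues in total. Only the fragmentation differs, which is why a phenotype that looks like a null allele can still come from a productive allele.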

Entropy, 2021, Vol 23 (5), pp. 530
Author(s): Milton Silva, Diogo Pratas, Armando J. Pinho

Recently, the scientific community has witnessed a substantial increase in the generation of protein sequence data, raising challenges of growing importance, namely efficient storage and improved data analysis. For both applications, data compression is a straightforward solution. However, few compressors designed specifically for protein sequences exist in the literature, and these specialized compressors improve the compression ratio only marginally over the best general-purpose compressors. In this paper, we present AC2, a new lossless data compressor for protein (or amino acid) sequences. AC2 uses a neural network to mix experts with a stacked generalization approach, and uses individual cache-hash memory models for the highest context orders. Compared with its predecessor (AC), AC2 achieves gains of 2–9% in reference-free mode and 6–7% in reference-based mode, at the cost of computations three times slower. AC2 also uses about seven times less memory than AC, and its memory requirements do not depend on the input sequence size. As an analysis application, we use AC2 to measure the similarity between each SARS-CoV-2 protein sequence and each viral protein sequence in the whole UniProt database. The results consistently show the highest similarity to the pangolin coronavirus, followed by the bat and human coronaviruses, contributing critical results to a currently controversial subject. AC2 is available for free download under the GPLv3 license.
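Expert mixing can be illustrated with a much simpler Python stand-in than AC2's own network. This sketch is a single-layer, log-linear mixer. It combines per-symbol predictions from several experts in logit space and adjusts its weights by a gradient step on the coding cost, −log p(actual symbol). AC2's stacked generalization, hidden layers and cache-hash models are not reproduced here. The four-letter alphabet, the two expert distributions and the learning rate are illustrative assumptions.

```python
import math
from typing import Dict, List

Dist = Dict[str, float]  # symbol -> probability

def mix(experts: List[Dist], weights: List[float]) -> Dist:
    """Log-linear mix: p(s) is proportional to exp(sum_i w_i * log p_i(s))."""
    symbols = list(experts[0])
    logits = {s: sum(w * math.log(e[s]) for w, e in zip(weights, experts)) for s in symbols}
    top = max(logits.values())  # subtract the max for numerical stability
    norm = sum(math.exp(v - top) for v in logits.values())
    return {s: math.exp(v - top) / norm for s, v in logits.items()}

def update(experts: List[Dist], weights: List[float], mixed: Dist,
           actual: str, lr: float = 0.05) -> List[float]:
    """One gradient-descent step on -log p(actual) with respect to the weights."""
    new_weights = []
    for w, e in zip(weights, experts):
        # d log p(actual) / d w_i = log p_i(actual) - E_mixed[log p_i]
        expected = sum(mixed[s] * math.log(e[s]) for s in mixed)
        new_weights.append(w + lr * (math.log(e[actual]) - expected))
    return new_weights

# Hypothetical experts: one favours 'A', one is uniform.
sharp = {"A": 0.7, "C": 0.1, "D": 0.1, "E": 0.1}
flat = {s: 0.25 for s in "ACDE"}
experts = [sharp, flat]
weights = [0.5, 0.5]

bits = 0.0
for symbol in "A" * 100:          # toy input stream
    p = mix(experts, weights)
    bits += -math.log2(p[symbol])  # ideal arithmetic-coding cost
    weights = update(experts, weights, p, symbol)

print(weights, round(bits, 2))
```

On this stream the weight of the expert that keeps predicting correctly grows. The uniform expert's gradient is zero, so its weight stays put. The mixed probability of 'A' therefore rises and the per-symbol cost falls.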


1980, Vol 187 (1), pp. 65-74
Author(s): D Penny, M D Hendy, L R Foulds

We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127–150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151–166]. The present paper discusses issues that arise in constructing minimal phylogenetic trees from protein-sequence data. Converting the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods first constructed a tree and then either proved that it was minimal or transformed it into a minimal tree. The approach presented here instead builds up a tree progressively, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally, we define a measure of the complexity of the data and illustrate a method for deriving a directed phylogenetic tree from the minimal tree.
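The quantity a minimal tree minimises, its length (total number of character changes), can be computed site by site with Fitch's small-parsimony algorithm. This Python sketch uses that standard method for illustration; it is not necessarily the procedure used in the paper. The four aligned nucleotide sequences are hypothetical. Four taxa have only three unrooted topologies, so exhaustive comparison is enough to find the shortest one here. The paper's taxon-by-taxon construction is not reproduced.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: a taxon name, or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]

def fitch_site(tree: Tree, seqs: Dict[str, str], i: int) -> Tuple[Set[str], int]:
    """Return (candidate state set, number of changes) for site i below `tree`."""
    if isinstance(tree, str):
        return {seqs[tree][i]}, 0
    left, right = tree
    ls, lc = fitch_site(left, seqs, i)
    rs, rc = fitch_site(right, seqs, i)
    common = ls & rs
    if common:
        return common, lc + rc
    return ls | rs, lc + rc + 1  # disjoint child sets force one extra change

def tree_length(tree: Tree, seqs: Dict[str, str]) -> int:
    """Total number of changes over all sites (the parsimony length)."""
    n_sites = len(next(iter(seqs.values())))
    return sum(fitch_site(tree, seqs, i)[1] for i in range(n_sites))

# Hypothetical aligned nucleotide sequences for four taxa.
SEQS = {"a": "AAG", "b": "AAT", "c": "CCG", "d": "CCT"}

# The three unrooted four-taxon topologies, each rooted on its central edge
# (Fitch length does not depend on where the tree is rooted).
TOPOLOGIES = [(("a", "b"), ("c", "d")),
              (("a", "c"), ("b", "d")),
              (("a", "d"), ("b", "c"))]

lengths = [tree_length(t, SEQS) for t in TOPOLOGIES]  # [4, 5, 6]
shortest = min(TOPOLOGIES, key=lambda t: tree_length(t, SEQS))
print(lengths, shortest)
```

Fitch's algorithm works on any alphabet, so the same code would score amino acid characters. Scoring them as nucleotides, as the abstract recommends, would first require a conversion step, which this sketch omits.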

