Principal Components of Genetic Sequences: Correlations and Significance

Author(s):  
V.M. Efimov ◽  
K.V. Efimov ◽  
V.Yu. Kovaleva ◽  
Yu.G. Matushkin

Any numerical series can be decomposed into principal components using singular spectrum analysis. We have recently proposed a new analysis method, PCA-Seq, which allows numerical principal components to be calculated for a sequence of elements of any type. In particular, the sequence may be composed of nucleotide base pairs or amino acid residues. Two questions inevitably arise: how to interpret the obtained principal components and how to assess their reliability. To interpret the principal components of a symbolic sequence, it is reasonable to evaluate their correlations with numerical characteristics of the sequence elements. To assess the significance of correlations between sequences, one should bear in mind that standard significance criteria assume independent observations, an assumption that, as a rule, does not hold for real sequences. The article discusses the use for these purposes of the anchor bootstrap technique, which was also developed previously by the authors. This approach assumes that the objects can be represented by points of a metric space. Taken together, they make up some fixed structure in it, in particular a sequence. The objects are assigned the same random integer weights as in the classical bootstrap. This is sufficient to obtain the bootstrap distribution of the correlation coefficients and assess their significance. The coding sequence of the SLC9A1 gene (synonyms APNH, NHE1, PPP1R143) was taken as an example of using the anchor bootstrap technique in genetic sequence analysis. Significant correlations of the first principal component were revealed with the hydrophobicity/“transmembraneity” of the corresponding fragments of the amino acid sequence, the phenylalanine content in them, and the difference between the T- and A-content in the corresponding nucleotide fragments. A similar pattern was found earlier by other authors for other genes, so it is very likely of a more general nature.
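
The weighting step described in this abstract can be illustrated with a minimal Python sketch. It assumes that the classical bootstrap weights are multinomial counts (how many times each object would be drawn when sampling n objects with replacement) and applies them to a weighted Pearson correlation. The function names, the number of replicates and the use of NumPy are illustrative assumptions, not the authors' implementation, and the metric-space (anchor) part of the technique is not reproduced here.

```python
import numpy as np

def weighted_corr(x, y, w):
    """Pearson correlation of x and y with non-negative observation weights w."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    mx = np.average(x, weights=w)
    my = np.average(y, weights=w)
    cov = np.average((x - mx) * (y - my), weights=w)
    vx = np.average((x - mx) ** 2, weights=w)
    vy = np.average((y - my) ** 2, weights=w)
    return cov / np.sqrt(vx * vy)

def bootstrap_corr(x, y, n_boot=2000, seed=0):
    """Bootstrap distribution of the correlation under random integer weights.

    Assumption: each replicate gives every object an integer weight equal to
    the number of times it would be drawn in a classical bootstrap resample.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    # One row of integer weights per replicate; each row sums to n.
    weights = rng.multinomial(n, np.full(n, 1.0 / n), size=n_boot)
    return np.array([weighted_corr(x, y, w) for w in weights])
```

Under these assumptions, a percentile interval of the returned values, for example `np.percentile(r, [2.5, 97.5])`, that excludes zero would suggest a significant correlation at roughly the 5% level.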

1995 ◽  
Vol 89 (4) ◽  
pp. 405-415 ◽  
Author(s):  
R. L. Sidebotham ◽  
J. H. Baron ◽  
J. Schrager ◽  
J. Spencer ◽  
J. R. Clamp ◽  
...  

1. The content and distribution of carbohydrate was examined in mucus glycopolypeptides from human antral mucosae. 2. The mean amount of carbohydrate per 1000 amino acid residues was found to be similar in glycopolypeptides with A, B or H activity. It was slightly, though significantly, less in glycopolypeptides lacking these determinants, because carbohydrate chains were of a shorter average length than in the A-, B- or H-active preparations. This difference was reflected in the sizes of oligosaccharide-alcohols released from representative glycopolypeptides with alkaline borohydride. 3. Differences between A-, B- or H-active and non-secretor glycopolypeptides in terms of the mean number of carbohydrate chains per 1000 amino acid residues were found to be small, and without significance. 4. The average number of peripheral monosaccharide units per 1000 amino acid residues was greater in A-active than in H-active, and least in non-secretor, glycopolypeptides. This order was reversed for monosaccharide units incorporated into skeletal (core plus backbone) structures. The difference in each case was statistically significant. 5. These findings suggest that the increased risk of peptic ulcer associated with blood group O and non-secretor status is unlikely to be attributable to an inherent deficiency in the protective mucus layer, linked to differences between mucins that are associated with A, B or H activity. Other hypotheses linked to infection with Helicobacter pylori are examined.


2015 ◽  
Vol 36 (6) ◽  
pp. 3909
Author(s):  
Michelle Santos da Silva ◽  
Luciana Shiotsuki ◽  
Raimundo Nonato Braga Lôbo ◽  
Olivardo Facó

A multivariate approach was adopted to evaluate the relationship among traits measured in the performance testing of Morada Nova sheep, verify the efficiency of a ranking method used in these tests and identify the most significant traits for use in future analyses. Data from 150 young rams participating in five editions of the performance tests for the Morada Nova breed were used. Twenty traits were measured in each animal: initial weight (IW), final weight (FW), average daily weight gain (ADG), loin eye area (LEA), scrotal circumference (SC), fat thickness (FT), conformation (C), precocity (Pc), muscularity (M), breed features (BF), legs (L), withers height (WH), chest width (CW), rump height (RH), rump width (RW), rump length (RL), body length (BL), body depth (BD), heart girth (HG) and body condition scoring (BCS). Pearson’s correlation coefficients ranged from –0.10 to 0.93, with the highest correlations between body weight variables and morphometric measurements. The first three principal components explained 72.28% of the total variability among all traits. The variables related to animal size defined the first principal component, whereas those related to visual appraisal and suitability for meat production defined the second and third principal components, respectively. The combination of traits from the principal component analysis showed that the ranking method currently used in the performance testing of Morada Nova sheep is efficient for selecting larger rams with better breed features and higher degrees of specialization for meat production.


1971 ◽  
Vol 49 (9) ◽  
pp. 999-1004 ◽  
Author(s):  
M. C. Shaw ◽  
T. Viswanatha

The physicochemical properties of chymotrypsin-P obtained by the papain activation of chymotrypsinogen have been investigated. The molecular weight of this enzyme as determined by gel filtration technique has been found to be 24 000 ± 1000. The amino acid residues occupying the N-terminal positions and the composition of the B- and C-chains of chymotrypsin-P are identical with those found in α-chymotrypsin. Thus the difference between the two enzymes is restricted to the composition of their A-chains.


2009 ◽  
Vol 390 (3) ◽  
Author(s):  
Takayuki K. Nemoto ◽  
Toshio Ono ◽  
Yu Shimoyama ◽  
Shigenobu Kimura ◽  
Yuko Ohara-Nemoto

Abstract Staphylococcus aureus, Staphylococcus epidermidis, and Staphylococcus warneri secrete glutamyl endopeptidases, designated GluV8, GluSE, and GluSW, respectively. The order of their protease activities is GluSE<GluSW<<GluV8. In the present study, we investigated the mechanism that causes these differences. Expression of chimeric proteins between GluV8 and GluSE revealed that the difference is primarily attributed to amino acid residues 170–195, which define the intrinsic protease activity, and additionally to residues 119–169, which affect the proteolytic sensitivity. Among nine substitutions present in residues 170–195 of the three proteases, the substitutions at positions 185, 188, and 189 were responsible for the changes in their activities, and the combination of W185, V188, and P189, which naturally occurs in GluV8, exerts the highest protease activity. W185 and P189 were indispensable for full activity, but V188 could be replaced by hydrophobic amino acids. These three amino acid residues appear to create a substrate-binding pocket together with the catalytic triad and the N-terminal V1, and therefore define the Km values of the proteases. We also describe a method to produce a chimeric form of GluSE and GluV8 that is resistant to proteolysis, and therefore possesses 4-fold higher activity than the wild-type recombinant GluV8.


2009 ◽  
Vol 191 (17) ◽  
pp. 5553-5562 ◽  
Author(s):  
Dominik Schilling ◽  
Ulrike Gerischer

ABSTRACT In gammaproteobacteria the Hfq protein shows a great variation in size, especially in its C-terminal part. Extremely large Hfq proteins consisting of almost 200 amino acid residues and more are found within the gammaproteobacterial family Moraxellaceae. The difference in size compared to other Hfq proteins is due to a glycine-rich domain near the C-terminal end of the protein. Acinetobacter baylyi, a nonpathogenic soil bacterium and member of the Moraxellaceae, encodes a large 174-amino-acid Hfq homologue containing the unique and repetitive amino acid pattern GGGFGGQ within the glycine-rich domain. Despite the presence of the C-terminal extension, A. baylyi Hfq complemented an Escherichia coli hfq mutant in vivo. By using polyclonal anti-Hfq antibodies, we detected the large A. baylyi Hfq at a size that corresponds to its annotated size, indicating the expression and stability of the full protein. Deletion of the complete A. baylyi hfq open reading frame resulted in severe reduction of growth. In addition, a deletion or overexpression of Hfq was accompanied by the loss of cell chain assembly. The glycine-rich domain was not responsible for growth and cell phenotypes. hfq gene localization in A. baylyi is strictly conserved within the mutL-miaA-hfq operon, and we show that hfq expression starts within the preceding miaA gene or further upstream.


2014 ◽  
Vol 38 (2) ◽  
pp. 372-385 ◽  
Author(s):  
Rodnei Rizzo ◽  
José A. M. Demattê ◽  
Fabrício da Silva Terra

Considering that information from soil reflectance spectra is underutilized in soil classification, this paper aimed to evaluate the relationship between soil physical and chemical properties and their spectra, to identify spectral patterns for soil classes, and to evaluate the use of numerical classification of profiles combined with spectral data for soil classification. We studied 20 soil profiles from the municipality of Piracicaba, State of São Paulo, Brazil, which were morphologically described and classified up to the 3rd category level of the Brazilian Soil Classification System (SiBCS). Subsequently, soil samples were collected from pedogenetic horizons and subjected to soil particle size and chemical analyses. Their Vis-NIR spectra were measured, followed by principal component analysis. Pearson's linear correlation coefficients were determined between the four principal components and the following soil properties: pH, organic matter, P, K, Ca, Mg, Al, CEC, base saturation, and Al saturation. We also carried out interpretation of the first three principal components and their relationships with soil classes defined by SiBCS. In addition, numerical classification of the profiles based on the OSACA algorithm was performed using spectral data as a basis. We determined the Normalized Mutual Information (NMI) and Uncertainty Coefficient (U). These coefficients represent the similarity between the numerical classification and the soil classes from SiBCS. Pearson's correlation coefficients were significant for the principal components when compared to sand, clay, Al content and soil color. Visual analysis of the principal component scores showed differences in the spectral behavior of the soil classes, mainly between Argissolos and the other soils. The NMI and U similarity coefficients showed values of 0.74 and 0.64, respectively, suggesting good similarity between the numerical and SiBCS classes. For example, numerical classification correctly distinguished Argissolos from Latossolos and Nitossolos. This mathematical technique was not able to distinguish Latossolos from Nitossolos Vermelho férricos, but the Cambissolos were well differentiated from other soil classes. The numerical technique proved to be effective and applicable to the soil classification process.
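
As an illustration of the two similarity coefficients named in this abstract, the following Python sketch computes them from two labelings of the same profiles. The abstract does not state which normalization was used, so the choices here are assumptions: NMI is the mutual information divided by the geometric mean of the two entropies, and U is the mutual information divided by the entropy of the first labeling. The function names are hypothetical.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (natural log) of a probability vector, ignoring zeros."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def nmi_and_u(labels_a, labels_b):
    """Return (NMI, U) for two labelings of the same objects.

    Assumed definitions: NMI = I(A;B) / sqrt(H(A) * H(B)), U = I(A;B) / H(A).
    Both labelings must contain at least two distinct classes.
    """
    _, a_idx = np.unique(np.asarray(labels_a).ravel(), return_inverse=True)
    _, b_idx = np.unique(np.asarray(labels_b).ravel(), return_inverse=True)
    # Contingency table of joint class counts.
    table = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(table, (a_idx, b_idx), 1)
    joint = table / table.sum()
    h_a = entropy(joint.sum(axis=1))
    h_b = entropy(joint.sum(axis=0))
    mi = h_a + h_b - entropy(joint.ravel())
    return mi / np.sqrt(h_a * h_b), mi / h_a
```

With these definitions, identical partitions give values of 1 for both coefficients and statistically independent partitions give values near 0.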


1998 ◽  
Vol 77 (4) ◽  
pp. 301-306 ◽  
Author(s):  
Takahiro Seki ◽  
Masabumi Minami ◽  
Chiaki Kimura ◽  
Tomoya Uehara ◽  
Takayuki Nakagawa ◽  
...  

2005 ◽  
Vol 2 (3) ◽  
pp. 207-212
Author(s):  
Wang Yi-Zhen ◽  
Chu Xiao-Na ◽  
Huang Hai-Qing ◽  
Han Fei-Fei ◽  
Liu Jian-Xin

Abstract Genomic RNA was extracted from the subcutaneous adipose tissue of piglets (Duroc×Landrace×Tai-hu) at 1, 14, 28, 42 and 56 days of age, and obese gene (ob) mRNA was amplified using reverse transcriptase-polymerase chain reaction (RT-PCR). A DNA fragment of about 504 bp was obtained and the PCR product was cloned into a pGEM-T vector. The ob gene was isolated and sequenced from the positive clones screened. Sequence analysis suggested that this fragment was a partial sequence of ob cDNA, encoding 167 amino acid residues, which constituted the major part of the mature leptin protein. The gene homology of the fragment obtained in this study compared to the reported ob cDNA sequence in pig adipocytes was 99.405%, and amino acid homology was 98.94%. Based on the ob gene clone, we successfully constructed an optimal semi-quantitative RT-PCR method. Using β-actin as the internal control, we investigated the difference in ob gene expression at different ages of piglets. Results showed that ob mRNA levels increased steadily at postnatal days 1–28 (preweaning), peaked at postnatal day 28, when piglets were weaned, and decreased from day 28 to 56.


2015 ◽  
Vol 32 (6) ◽  
pp. 843-849 ◽  
Author(s):  
Rhys Heffernan ◽  
Abdollah Dehzangi ◽  
James Lyons ◽  
Kuldip Paliwal ◽  
Alok Sharma ◽  
...  

Abstract Motivation: Solvent exposure of amino acid residues of proteins plays an important role in understanding and predicting protein structure, function and interactions. Solvent exposure can be characterized by several measures including solvent accessible surface area (ASA), residue depth (RD) and contact numbers (CN). More recently, an orientation-dependent contact number called half-sphere exposure (HSE) was introduced by separating the contacts within upper and down half spheres defined according to the Cα-Cβ (HSEβ) vector or neighboring Cα-Cα vectors (HSEα). HSEα calculated from protein structures was found to better describe the solvent exposure over ASA, CN and RD in many applications. Thus, a sequence-based prediction is desirable, as most proteins do not have experimentally determined structures. To our best knowledge, there is no method to predict HSEα and only one method to predict HSEβ. Results: This study developed a novel method for predicting both HSEα and HSEβ (SPIDER-HSE) that achieved a consistent performance for 10-fold cross validation and two independent tests. The correlation coefficients between predicted and measured HSEβ (0.73 for upper sphere, 0.69 for down sphere and 0.76 for contact numbers) for the independent test set of 1199 proteins are significantly higher than existing methods. Moreover, predicted HSEα has a higher correlation coefficient (0.46) to the stability change by residue mutants than predicted HSEβ (0.37) and ASA (0.43). The results, together with its easy Cα-atom-based calculation, highlight the potential usefulness of predicted HSEα for protein structure prediction and refinement as well as function prediction. Availability and implementation: The method is available at http://sparks-lab.org. Supplementary information: Supplementary data are available at Bioinformatics online.
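
The definition of HSEβ given in this abstract (contacts split into upper and down half spheres by the Cα-Cβ vector) can be sketched from atomic coordinates as follows. This is a structure-based calculation for illustration only, not the SPIDER-HSE predictor. The 13 Å contact radius, the function name and the convention of counting atoms lying exactly on the dividing plane as "down" are assumptions.

```python
import numpy as np

def hse_beta(ca, cb, radius=13.0):
    """Count Cα contacts in the upper and down half spheres of each residue.

    ca, cb: (n, 3) arrays of Cα and Cβ coordinates in ångströms.
    radius: contact cutoff (assumed value, not taken from the abstract).
    Returns (up, down); their sum is the contact number for each residue.
    """
    ca = np.asarray(ca, dtype=float)
    cb = np.asarray(cb, dtype=float)
    # diff[i, j] is the vector from Cα of residue i to Cα of residue j.
    diff = ca[None, :, :] - ca[:, None, :]
    dist = np.linalg.norm(diff, axis=2)
    neighbor = dist <= radius
    np.fill_diagonal(neighbor, False)  # a residue is not its own contact
    # Sign of the projection onto the Cα-Cβ vector picks the half sphere.
    side = np.einsum("ijk,ik->ij", diff, cb - ca)
    up = np.sum(neighbor & (side > 0), axis=1)
    down = np.sum(neighbor & (side <= 0), axis=1)
    return up, down
```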


2012 ◽  
Vol 2 (3) ◽  
pp. 221-225 ◽  
Author(s):  
A. Ahmad ◽  
S. Quegan

Two methods of cloud masking tuned to tropical conditions have been developed, based on spectral analysis and Principal Components Analysis (PCA) of Moderate Resolution Imaging Spectroradiometer (MODIS) data. In the spectral approach, thresholds were applied to four reflective bands (1, 2, 3, and 4), three thermal bands (29, 31 and 32), the band 2/band 1 ratio, and the difference between band 29 and 31 in order to detect clouds. The PCA approach applied a threshold to the first principal component derived from the seven quantities used for spectral analysis. Cloud detections were compared with the standard MODIS cloud mask, and their accuracy was assessed using reference images and geographical information on the study area.
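
The PCA approach in this abstract, a threshold applied to the first principal component of several per-pixel quantities, might be sketched as below. The standardization step, the function names and the threshold value are assumptions; the abstract does not give them. The sign of a principal component is arbitrary, so in practice the direction of the comparison would need to be checked against reference images.

```python
import numpy as np

def first_pc_scores(features):
    """Scores on the first principal component of standardized features.

    features: (n_pixels, n_quantities) array, e.g. band values and ratios.
    """
    X = np.asarray(features, dtype=float)
    # Standardize each quantity so that bands with large ranges do not dominate.
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[0]

def pca_mask(features, threshold):
    """Boolean mask: True where the first-component score exceeds the threshold."""
    return first_pc_scores(features) > threshold
```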

