Artificial intelligence analysis of FTIR and CD spectroscopic data for predicting and quantifying the length and content of protein secondary structures

2021 ◽  
pp. 1-7
Author(s):  
P.I. Haris ◽  
J.A. Hering

Besides NMR and X-ray crystallography, FTIR and CD spectroscopy are widely considered useful for determining protein secondary structure. These techniques can be used to obtain data in a few minutes using small quantities of protein, which makes them amenable to proteomics research. Here we explore the possibility of using artificial intelligence techniques to simultaneously analyse both FTIR and CD spectroscopic data for an identical set of proteins. Neural network analysis was carried out on normalised regions of FTIR (1700-1600 cm−1) and CD (180-259 nm) spectral data, both with and without boxcar averaging, in order to quantify the average length and percentages of secondary structures. A hybrid genetic algorithm/neural network approach that automatically selects structure-sensitive wavelengths/frequencies was used to quantify the protein secondary structure. Using this algorithm we also successfully identified the region of the CD spectrum that contains the most structure-sensitive information. This region lies between 214 and 251 nm, suggesting that it alone may be sufficient to rapidly determine secondary structure content from CD spectral data. Overall, CD spectroscopic analysis produced better results than FTIR spectroscopy when selected wavelengths were used, although FTIR was better when the entire region (1700-1600 cm−1 for FTIR and 180-259 nm for CD) was subjected to neural network analysis. Applying an Adaptive Neuro-Fuzzy Inference System (ANFIS) with fuzzy subtractive clustering to the spectral data led to a slightly better prediction of the average helix/sheet length for FTIR spectroscopy than for CD. Our findings reveal the potential of artificial intelligence techniques not only for extracting structural information but also for better understanding the relationship between complex spectral data and biologically important information.
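
The abstract mentions normalised spectral regions analysed with and without boxcar averaging but gives no details of either step. The sketch below shows one plausible form of that pre-processing, under stated assumptions: normalisation scales the largest absolute intensity to 1, and boxcar averaging means averaging consecutive, non-overlapping blocks of points. The function names (`crop_region`, `normalise`, `boxcar_average`), the block width of 4 and the synthetic spectrum are all hypothetical and are not taken from the study.

```python
import numpy as np

def crop_region(x, y, lo, hi):
    """Keep only the points whose abscissa (nm or cm^-1) lies in [lo, hi]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = (x >= lo) & (x <= hi)
    return x[mask], y[mask]

def normalise(y):
    """Scale so the largest absolute intensity is 1 (assumed scheme)."""
    peak = np.max(np.abs(y))
    return y / peak if peak > 0 else y

def boxcar_average(y, width):
    """Average consecutive, non-overlapping blocks of `width` points.

    Trailing points that do not fill a complete block are discarded,
    so the output has len(y) // width values.
    """
    n_blocks = len(y) // width
    return np.asarray(y[: n_blocks * width]).reshape(n_blocks, width).mean(axis=1)

# Synthetic CD-like spectrum sampled every 1 nm (placeholder data only).
wavelength = np.arange(180, 260, dtype=float)
ellipticity = np.sin(wavelength / 10.0)

_, region = crop_region(wavelength, ellipticity, 180, 259)
features = boxcar_average(normalise(region), width=4)
print(features.shape)  # (20,)
```

For an FTIR spectrum the same calls would use `crop_region(wavenumber, absorbance, 1600, 1700)`; the boolean mask works whether the wavenumber axis is stored in ascending or descending order. Boxcar averaging trades spectral resolution for fewer, less noisy network inputs, which is why comparing results with and without it is informative.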

2002 ◽  
Vol 16 (2) ◽  
pp. 53-69 ◽  
Author(s):  
Joachim A. Hering ◽  
Peter R. Innocent ◽  
Parvez I. Haris

Lack of reliable methods for accurately estimating protein secondary structure from infrared spectra is a major barrier to the widespread use of infrared spectroscopy in protein secondary structure characterisation. Here we report a method for estimating protein secondary structure from FTIR spectra, based on a multi-layer feed-forward neural network trained with an enhanced “resilient backpropagation” learning algorithm. The method uses, as its reference set, a database of infrared spectra of 18 proteins with known X-ray structures. Our study revealed that providing the neural network with only part of the amide I region, taken from empirically determined structure-sensitive regions, in combination with appropriate pre-processing of the spectral data, produced the best overall results. This led to a standard error of prediction (SEP) of 4.47% for α-helix, 6.16% for β-sheet and 4.61% for turns. Compared with a previous factor analysis study by Lee et al. using the same set of 18 FTIR spectra, the prediction error was reduced by 3.33% for α-helix and by 3.54% for β-sheet, with a minor increase of 0.31% for turns. Overall, our neural network analysis achieved prediction accuracy comparable to, and in most cases better than, most of the previously reported pattern-recognition-based methods, indicating the significant potential of this approach.
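
Resilient backpropagation (Rprop) adapts a separate step size for every weight and uses only the sign of the gradient, not its magnitude, to decide the direction of each update. The sketch below shows the basic Rprop rule without weight backtracking, applied to a toy quadratic rather than a neural network. It is not the "enhanced" variant used in the study, whose details the abstract does not give. The function name `rprop_minimise` is hypothetical, and the constants (growth factor 1.2, shrink factor 0.5, initial step 0.1) are commonly quoted defaults assumed here for illustration.

```python
import numpy as np

def rprop_minimise(grad_fn, w0, n_iter=100, delta0=0.1,
                   eta_plus=1.2, eta_minus=0.5,
                   delta_min=1e-6, delta_max=50.0):
    """Basic Rprop (no weight backtracking): each parameter has its own
    step size, grown while the gradient sign is stable and shrunk when it flips."""
    w = np.asarray(w0, dtype=float).copy()
    delta = np.full_like(w, delta0)   # per-parameter step sizes
    g_prev = np.zeros_like(w)         # gradient from the previous iteration
    for _ in range(n_iter):
        g = np.asarray(grad_fn(w), dtype=float)
        sign_change = g * g_prev
        # Same sign as last time: accelerate (bounded by delta_max).
        delta = np.where(sign_change > 0, np.minimum(delta * eta_plus, delta_max), delta)
        # Sign flipped: a minimum was overshot, so take smaller steps.
        delta = np.where(sign_change < 0, np.maximum(delta * eta_minus, delta_min), delta)
        # Only the sign of the gradient sets the direction of the move.
        w -= np.sign(g) * delta
        g_prev = g
    return w

# Toy problem: minimise sum((w - target)^2), whose gradient is 2 * (w - target).
target = np.array([3.0, -2.0, 0.5])
w_opt = rprop_minimise(lambda w: 2.0 * (w - target), np.zeros(3), n_iter=500)
```

In network training, `grad_fn` would return the full-batch gradient of the error with respect to all weights, flattened into one vector. Rprop is normally run in this batch mode, because the sign of a noisy mini-batch gradient is an unreliable signal for growing or shrinking the step sizes.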


2010 ◽  
Vol 58 (1) ◽  
pp. 72-75 ◽  
Author(s):  
Montaña Cámara ◽  
José S. Torrecilla ◽  
Jorge O. Caceres ◽  
M. Cortes Sánchez Mata ◽  
Virginia Fernández-Ruiz

2004 ◽  
Vol 171 (4S) ◽  
pp. 502-503
Author(s):  
Mohamed A. Gomha ◽  
Khaled Z. Sheir ◽  
Saeed Showky ◽  
Khaled Madbouly ◽  
Emad Elsobky ◽  
...  

2016 ◽  
Vol 34 (2) ◽  
pp. 025-036
Author(s):  
Oleg G. Gorshkov ◽  
Irina B. Starchenko ◽  
Andrey S. Sliva ◽  
...  

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Danijela Šantić ◽  
Kasia Piwosz ◽  
Frano Matić ◽  
Ana Vrdoljak Tomaš ◽  
Jasna Arapov ◽  
...  

Bacteria are an active and diverse component of pelagic communities. Identifying the main factors governing microbial diversity and spatial distribution requires advanced mathematical analyses. Here, the bacterial community composition was analysed along a depth profile in the open Adriatic Sea using amplicon sequencing of bacterial 16S rRNA and the Neural gas algorithm. The analysis classified the samples into four best matching units representing heterogeneous patterns of bacterial community composition. The observed parameters were more differentiated by depth than by area, with temperature and salinity identified as important environmental variables. The highest diversity was observed at the deep chlorophyll maximum, while bacterial abundance and production peaked in the upper layers. Most of the identified genera belonged to Proteobacteria, with the uncultured AEGEAN-169 and SAR116 lineages being the dominant Alphaproteobacteria, and OM60 (NOR5) and SAR86 the dominant Gammaproteobacteria. Marine Synechococcus and Cyanobium-related species predominated in the shallow layer, while Prochlorococcus MIT 9313 formed a higher proportion below 50 m depth. Bacteroidota were represented mostly by uncultured lineages (the NS4, NS5 and NS9 marine lineages). In contrast, Actinobacteriota were dominated by the candidate genus Ca. Actinomarina. A large contribution of Nitrospinae was evident at the deepest investigated layer. Our results document that neural network analysis of environmental data may provide novel insight into the factors affecting picoplankton in the open sea environment.
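
The abstract names the Neural gas algorithm but not its settings. As an illustration only, the sketch below implements the commonly used online neural gas update: for each sample, all prototypes are ranked by distance and each moves towards the sample with a weight that decays exponentially with its rank, while the step size and neighbourhood range shrink over training. A sample's "best matching unit" is taken here to be its nearest prototype. The names `train_neural_gas` and `assign`, all parameter values, and the synthetic two-dimensional data are hypothetical assumptions, not the study's configuration or data.

```python
import numpy as np

def train_neural_gas(data, n_units=4, n_steps=5000,
                     eps_start=0.5, eps_end=0.01,
                     lam_start=None, lam_end=0.01, seed=0):
    """Online neural gas: every prototype moves towards each sample,
    weighted by exp(-rank / lambda), where rank 0 is the closest prototype."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    if lam_start is None:
        lam_start = n_units / 2.0
    # Initialise prototypes on distinct, randomly chosen samples.
    protos = data[rng.choice(len(data), size=n_units, replace=False)].copy()
    for t in range(n_steps):
        frac = t / n_steps
        # Exponentially decaying step size and neighbourhood range.
        eps = eps_start * (eps_end / eps_start) ** frac
        lam = lam_start * (lam_end / lam_start) ** frac
        x = data[rng.integers(len(data))]
        dists = np.linalg.norm(protos - x, axis=1)
        ranks = np.argsort(np.argsort(dists))  # 0 = closest prototype
        h = np.exp(-ranks / lam)
        protos += eps * h[:, None] * (x - protos)
    return protos

def assign(data, protos):
    """Index of the nearest prototype (best matching unit) for every sample."""
    data = np.asarray(data, dtype=float)
    d = np.linalg.norm(data[:, None, :] - protos[None, :, :], axis=2)
    return np.argmin(d, axis=1)

# Synthetic stand-in for per-sample variables: four well-separated groups.
rng = np.random.default_rng(1)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
samples = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in centres])

prototypes = train_neural_gas(samples, n_units=4)
labels = assign(samples, prototypes)
```

Because every prototype, not just the winner, is updated early in training, neural gas is much less sensitive to the initial prototype positions than plain online k-means; as the neighbourhood range shrinks, the update reduces to moving only the winning prototype.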

