A Computational Method to Predict Effects of Residue Mutations on the Catalytic Efficiency of Hydrolases

Catalysts ◽  
2021 ◽  
Vol 11 (2) ◽  
pp. 286
Author(s):  
Yun Li ◽  
Kun Song ◽  
Jian Zhang ◽  
Shaoyong Lu

With scientific and technological advances, a growing body of research has focused on engineering enzymes with enhanced efficiency and activity. Computer-based enzyme modification complements time-consuming and labor-intensive experimental methods and plays a significant role in this effort. In this study, for the first time, we collected and manually curated a data set of hydrolase mutations, including structural information on enzyme-substrate complexes, mutated sites, and kcat/Km values obtained from in vitro assays. We further constructed a classification model using the random forest algorithm to predict the effects of residue mutations on the catalytic efficiency (increase or decrease) of hydrolases. The method performed well on a blind test set, with an area under the receiver operating characteristic curve of 0.86 and a Matthews correlation coefficient of 0.659. Our results demonstrate that computational mutagenesis can guide enzyme modification, which may expedite the design of engineered hydrolases.
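The two metrics reported above can be computed from a classifier's outputs with a few lines of code. The following is a minimal sketch in plain Python, not the authors' evaluation code; the function names and the toy labels and scores in the usage note are assumptions made for illustration.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def roc_auc(labels, scores):
    """Area under the ROC curve via its rank interpretation.

    AUC equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one;
    ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, with hypothetical labels `[1, 1, 0, 0]` (1 = efficiency increases) and scores `[0.9, 0.4, 0.6, 0.1]`, three of the four positive-negative pairs are ranked correctly, so `roc_auc` returns 0.75.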

Author(s):  
Jorge Fernandez-de-Cossio-Diaz ◽  
Guido Uguzzoni ◽  
Andrea Pagnani

The recent technological advances underlying the screening of large combinatorial libraries in high-throughput mutational scans deepen our understanding of adaptive protein evolution and boost its applications in protein design. Nevertheless, the large number of possible genotypes requires suitable computational methods for data analysis, the prediction of mutational effects and the generation of optimized sequences. We describe a computational method that, trained on sequencing samples from multiple rounds of a screening experiment, provides a model of the genotype-fitness relationship. We tested the method on five large-scale mutational scans, yielding accurate predictions of the mutational effects on fitness. The inferred fitness landscape is robust to experimental and sampling noise and exhibits high generalization power in terms of broader sequence space exploration and higher fitness variant predictions. We investigate the role of epistasis and show that the inferred model provides structural information about the 3D contacts in the molecular fold.


2020 ◽  
Vol 38 (1) ◽  
pp. 318-328 ◽  
Author(s):  
Jorge Fernandez-de-Cossio-Diaz ◽  
Guido Uguzzoni ◽  
Andrea Pagnani

Abstract The recent technological advances underlying the screening of large combinatorial libraries in high-throughput mutational scans deepen our understanding of adaptive protein evolution and boost its applications in protein design. Nevertheless, the large number of possible genotypes requires suitable computational methods for data analysis, the prediction of mutational effects, and the generation of optimized sequences. We describe a computational method that, trained on sequencing samples from multiple rounds of a screening experiment, provides a model of the genotype–fitness relationship. We tested the method on five large-scale mutational scans, yielding accurate predictions of the mutational effects on fitness. The inferred fitness landscape is robust to experimental and sampling noise and exhibits high generalization power in terms of broader sequence space exploration and higher fitness variant predictions. We investigate the role of epistasis and show that the inferred model provides structural information about the 3D contacts in the molecular fold.


Author(s):  
Weiping Liu ◽  
John W. Sedat ◽  
David A. Agard

Any real-world object is three-dimensional. The principle of tomography, which reconstructs the 3-D structure of an object from its 2-D projections at different view angles, has found application in many disciplines. Electron microscopic (EM) tomography of non-ordered structures (e.g., subcellular structures in biology and non-crystalline structures in materials science) has been practiced sporadically over the last twenty years or so. Vital as 3-D structural information is, and although no alternative 3-D imaging technique competes in its high resolution range, the technique to date remains the kingdom of a brave few. Its tedious tasks have prevented it from becoming a routine tool. One key to promoting its popularity is automation: data collection has been automated in our lab and can routinely yield a data set of over 100 projections in a matter of a few hours. Now the image processing part is also automated. Such automation makes the job easier, faster and better.


2020 ◽  
Vol 27 (4) ◽  
pp. 329-336 ◽  
Author(s):  
Lei Xu ◽  
Guangmin Liang ◽  
Baowen Chen ◽  
Xu Tan ◽  
Huaikun Xiang ◽  
...  

Background: Cell lytic enzymes are highly evolved proteins that can destroy the cell structure and kill bacteria. Unlike antibiotics, cell lytic enzymes do not cause serious drug resistance in pathogenic bacteria, whereas the drug resistance associated with antibiotics is becoming more serious. Cell lytic enzymes are therefore a good choice for treating bacterial infections, and the study of cell wall lytic enzymes aims at finding an efficient way to cure them. Cell lytic enzymes include endolysins and autolysins, which differ in the purpose for which the cell wall is broken. Identifying the type of a cell lytic enzyme is meaningful for the study of cell wall enzymes. Objective: Our aim is to predict the type of cell lytic enzyme. Because cell lytic enzymes help kill bacteria, studying their type is meaningful. However, determining the type by experimental methods is time-consuming. We therefore propose an efficient computational method for predicting the type of cell lytic enzyme. Method: We propose a computational method for the prediction of endolysins and autolysins. First, a data set containing 27 endolysins and 41 autolysins is built. Each protein is then represented by its tripeptide composition. Features with a larger confidence degree are selected. Finally, a support vector machine classifier is trained on the labeled vectors, and the learned classifier is used to predict the type of cell lytic enzyme. Results: The experimental results show that the proposed method attains an overall accuracy of 97.06% when 44 features are selected. Compared with Ding's method, our method improves the overall accuracy by nearly 4.5% ((97.06 − 92.9)/92.9). The performance of the proposed method is stable when the number of selected features is between 40 and 70. The overall accuracy of the tripeptide optimal feature set is 94.12%, and the overall accuracy of Chou's amphiphilic PseAAC method is 76.2%. The experimental results thus also demonstrate that using the tripeptide optimal feature set improves the overall accuracy by nearly 18%. Conclusion: The paper proposes an efficient method for identifying endolysins and autolysins, using a support vector machine to predict the type of cell lytic enzyme. The experimental results show that the overall accuracy of the proposed method is 94.12%, which is better than some existing methods. In conclusion, the selected 44 features can improve the overall accuracy of identifying the type of cell lytic enzyme. The support vector machine performs better than other classifiers when using the selected feature set on the benchmark data set.
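The tripeptide composition representation mentioned in the Method section can be illustrated with a short sketch. This is an assumed, generic formulation (the frequency of each of the 20³ = 8000 possible tripeptides over a sliding window), not the authors' code; the names `AMINO_ACIDS` and `tripeptide_composition` are hypothetical, and the feature selection and SVM training steps are not shown.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 8000 possible tripeptides, in a fixed order, with a lookup index.
TRIPEPTIDES = ["".join(t) for t in product(AMINO_ACIDS, repeat=3)]
INDEX = {tri: i for i, tri in enumerate(TRIPEPTIDES)}


def tripeptide_composition(sequence):
    """Return an 8000-dimensional vector of tripeptide frequencies.

    Overlapping windows of length 3 are counted; windows containing a
    non-standard residue (e.g. 'X') are skipped. Frequencies sum to 1
    whenever at least one valid window exists.
    """
    seq = sequence.upper()
    counts = [0] * len(TRIPEPTIDES)
    total = 0
    for i in range(len(seq) - 2):
        idx = INDEX.get(seq[i:i + 3])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRIPEPTIDES)
    return [c / total for c in counts]
```

For the toy sequence `"ACDACD"` the windows are ACD, CDA, DAC and ACD, so the entry for ACD is 0.5 and the entries for CDA and DAC are 0.25 each. A feature selection step would then keep only a small subset of these 8000 dimensions.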


2020 ◽  
Vol 44 (8) ◽  
pp. 851-860
Author(s):  
Joy Eliaerts ◽  
Natalie Meert ◽  
Pierre Dardenne ◽  
Vincent Baeten ◽  
Juan-Antonio Fernandez Pierna ◽  
...  

Abstract Spectroscopic techniques combined with chemometrics are a promising tool for analysis of seized drug powders. In this study, the performance of three spectroscopic techniques [Mid-InfraRed (MIR), Raman and Near-InfraRed (NIR)] was compared. In total, 364 seized powders were analyzed, consisting of 276 cocaine powders (with concentrations ranging from 4 to 99 w%) and 88 powders without cocaine. A classification model (using Support Vector Machines [SVM] discriminant analysis) and a quantification model (using SVM regression) were constructed with each spectral dataset in order to discriminate cocaine powders from other powders and quantify cocaine in powders classified as cocaine positive. The performances of the models were compared with gas chromatography coupled with mass spectrometry (GC–MS) and gas chromatography with flame-ionization detection (GC–FID). Different evaluation criteria were used: number of false negatives (FNs), number of false positives (FPs), accuracy, root mean square error of cross-validation (RMSECV) and determination coefficients (R²). Ten colored powders were excluded from the classification data set due to fluorescence background observed in Raman spectra. For the classification, the best accuracy (99.7%) was obtained with MIR spectra. With Raman and NIR spectra, the accuracy was 99.5% and 98.9%, respectively. For the quantification, the best results were obtained with NIR spectra. The cocaine content was determined with an RMSECV of 3.79% and an R² of 0.97. The performance of MIR and Raman in predicting cocaine concentrations was lower than that of NIR, with RMSECV of 6.76% and 6.79%, respectively, and both with an R² of 0.90. The three spectroscopic techniques can be applied for both classification and quantification of cocaine, but some differences in performance were detected. The best classification was obtained with MIR spectra. For quantification, however, the RMSECV of MIR and Raman was twice as high in comparison with NIR. Spectroscopic techniques combined with chemometrics can reduce the workload for confirmation analysis (e.g., chromatography based) and therefore save time and resources.
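The two quantification criteria named above, RMSECV and R², can be sketched as follows. This is an illustrative Python sketch, not the authors' chemometrics pipeline: it assumes the predictions passed in are the held-out predictions collected during cross-validation, and it uses the common definition R² = 1 − SS_res/SS_tot, which may differ from the definition used in the study.

```python
import math


def rmse(reference, predicted):
    """Root mean square error between reference and predicted values.

    When `predicted` holds the out-of-fold predictions from
    cross-validation, this quantity is the RMSECV.
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(reference, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    if ss_tot == 0:
        raise ValueError("reference values have zero variance")
    return 1.0 - ss_res / ss_tot
```

With hypothetical reference concentrations `[10, 20, 30, 40]` (w%) and predictions `[12, 18, 33, 39]`, the squared errors sum to 18, giving an RMSE of about 2.12 and an R² of 0.964.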


2020 ◽  
pp. 1-14
Author(s):  
Esraa Hassan ◽  
Noha A. Hikal ◽  
Samir Elmuogy

Coronavirus disease (COVID-19) is now considered one of the most critical pandemics on Earth, owing to its ability to spread rapidly among humans as well as animals. COVID-19 is expected to break out around the world, and around 70% of the world's population might be infected with COVID-19 in the coming years. An accurate and efficient diagnostic tool is therefore highly required, and providing one is the main objective of our study. Manual classification has mainly been used to detect different diseases, but it takes too much time and is prone to human error. Automatic image classification reduces doctors' diagnostic time, which could save lives. We propose an automatic classification architecture based on a deep neural network with transfer learning, called the Worried Deep Neural Network (WDNN) model. Comparative analysis using three pre-trained models (InceptionV3, ResNet50, and VGG19) reveals that the proposed WDNN model performs better in terms of various performance metrics. Because COVID-19 data are scarce, data augmentation was used to increase the number of images in the positive class, and normalization was then used to give all images the same size. Experiments were run on a COVID-19 data set collected from different cases, with a total of 2623 (1573 training, 524 validation, 524 test). Our proposed model achieved 99.046, 98.684, 99.119, and 98.90 in terms of accuracy, precision, recall, and F-score, respectively. The results are compared with both traditional machine learning methods and those using Convolutional Neural Networks (CNNs). The results demonstrate that our classification model can serve as an alternative to the current diagnostic tool.


Cancers ◽  
2021 ◽  
Vol 13 (9) ◽  
pp. 2111
Author(s):  
Bo-Wei Zhao ◽  
Zhu-Hong You ◽  
Lun Hu ◽  
Zhen-Hao Guo ◽  
Lei Wang ◽  
...  

Identification of drug-target interactions (DTIs) is a significant step in the drug discovery or repositioning process. Compared with the time-consuming and labor-intensive in vivo experimental methods, the computational models can provide high-quality DTI candidates in an instant. In this study, we propose a novel method called LGDTI to predict DTIs based on large-scale graph representation learning. LGDTI can capture the local and global structural information of the graph. Specifically, the first-order neighbor information of nodes can be aggregated by the graph convolutional network (GCN); on the other hand, the high-order neighbor information of nodes can be learned by the graph embedding method called DeepWalk. Finally, the two kinds of feature are fed into the random forest classifier to train and predict potential DTIs. The results show that our method obtained area under the receiver operating characteristic curve (AUROC) of 0.9455 and area under the precision-recall curve (AUPR) of 0.9491 under 5-fold cross-validation. Moreover, we compare the presented method with some existing state-of-the-art methods. These results imply that LGDTI can efficiently and robustly capture undiscovered DTIs. Moreover, the proposed model is expected to bring new inspiration and provide novel perspectives to relevant researchers.
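The DeepWalk step described above starts by sampling truncated random walks over the graph; the walks are then treated like sentences for a skip-gram embedding model. The sketch below covers only the walk sampling, in plain Python, and is not the LGDTI implementation. The function name, parameters, and the toy drug-target graph in the usage note are assumptions made for illustration.

```python
import random


def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Sample uniform random walks in the style of DeepWalk.

    adjacency: dict mapping each node to a list of its neighbors.
    Each pass visits every node once, in shuffled order, as a walk start.
    A walk stops early only if it reaches a node without neighbors.
    """
    rng = random.Random(seed)  # seeded for reproducible sampling
    walks = []
    for _ in range(walks_per_node):
        starts = list(adjacency)
        rng.shuffle(starts)
        for start in starts:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks
```

A usage example on a hypothetical bipartite graph: `random_walks({"drug_A": ["target_1"], "drug_B": ["target_1", "target_2"], "target_1": ["drug_A", "drug_B"], "target_2": ["drug_B"]}, walk_length=5, walks_per_node=2)` returns eight walks of five nodes each. Because walks reach nodes several hops away, embeddings trained on them can encode higher-order neighborhood information.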


2021 ◽  
Vol 30 (1) ◽  
pp. 893-902
Author(s):  
Ke Xu

Abstract A portrait recognition system can play an important role in emergency evacuation in mass emergencies. This paper designed a portrait recognition system, analyzed the overall structure of the system and the method of image preprocessing, and used the Single Shot MultiBox Detector (SSD) algorithm for portrait detection. It also designed an improved algorithm combining principal component analysis (PCA) with linear discriminant analysis (LDA) for portrait recognition and tested the system by applying it in a shopping mall to collect and monitor the portrait and establish a data set. The results showed that the missing detection rate and false detection rate of the SSD algorithm were 0.78 and 2.89%, respectively, which were lower than those of the AdaBoost algorithm. Comparisons with PCA, LDA, and PCA + LDA algorithms demonstrated that the recognition rate of the improved PCA + LDA algorithm was the highest, which was 95.8%, the area under the receiver operating characteristic curve was the largest, and the recognition time was the shortest, which was 465 ms. The experimental results show that the improved PCA + LDA algorithm is reliable in portrait recognition and can be used for emergency evacuation in mass emergencies.


Author(s):  
Hiroyuki Kurosu ◽  
Yukiharu Todo ◽  
Ryutaro Yamada ◽  
Kaoru Minowa ◽  
Tomohiko Tsuruta ◽  
...  

Abstract Objective: The aim of this study was to find a clinical marker for identifying refractory cancer cachexia. Methods: We analyzed computed tomography imaging data, which included the third lumbar vertebra, from 94 patients who died of uterine cervix or corpus malignancy. The time between the date of examination and date of death was the most important attribute for this study, and the computed tomography images were classified into >3 months before death and ≤3 months before death. Psoas muscle mass index was defined as the left–right sum of the psoas muscle areas (cm²) at the level of third lumbar vertebra, divided by height squared (m²). Results: A data set of 94 computed tomography images was obtained at baseline hospital visit, and a data set of 603 images was obtained at other times. One hundred (16.6%) of the 603 non-baseline images were scanned ≤3 months before death. Mean psoas muscle mass index change rates at >3 months before death and ≤3 months before death were −1.3 and −20.1%, respectively (P < 0.001). Receiver operating characteristic curve analysis yielded a cutoff value of −13.0%. The area under the curve reached a moderate accuracy level (0.777, 95% confidence interval 0.715–0.838). When we used the cutoff value to predict death within 3 months, sensitivity and specificity were 74.0 and 82.1%, respectively. Conclusions: Measuring change in psoas muscle mass index might be useful for predicting cancer mortality within 3 months. It could become a potential tool for identifying refractory cancer cachexia.
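The index definition and the cutoff-based prediction above can be made concrete with a small sketch. It is illustrative only and rests on assumptions not stated in the abstract: the change rate is taken as the percentage change relative to the baseline index, and a scan is called positive when its change rate is at or below the cutoff. All function names and the toy values in the usage note are hypothetical.

```python
def psoas_muscle_mass_index(left_area_cm2, right_area_cm2, height_m):
    """Left-right sum of psoas muscle areas (cm^2) divided by height squared (m^2)."""
    return (left_area_cm2 + right_area_cm2) / height_m ** 2


def percent_change(baseline, followup):
    """Change rate relative to baseline, in percent (assumed definition)."""
    return (followup - baseline) / baseline * 100.0


def sensitivity_specificity(change_rates, died_within_3_months, cutoff=-13.0):
    """Sensitivity and specificity of the rule 'change rate <= cutoff'.

    The direction and inclusiveness of the threshold are assumptions.
    """
    tp = fn = tn = fp = 0
    for rate, died in zip(change_rates, died_within_3_months):
        predicted = rate <= cutoff
        if died and predicted:
            tp += 1
        elif died:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)
```

For a hypothetical patient 1.60 m tall with psoas areas of 8.0 cm² on each side, the index is 16.0 / 2.56 = 6.25 cm²/m²; a later value of 5.0 corresponds to a change rate of −20%, which falls below the −13.0% cutoff.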


2020 ◽  
Vol 2020 ◽  
pp. 1-6
Author(s):  
Jian-ye Yuan ◽  
Xin-yuan Nan ◽  
Cheng-rong Li ◽  
Le-le Sun

Considering the urgency of garbage classification, this paper designs a 23-layer convolutional neural network (CNN) model, with an emphasis on real-time garbage classification, to address the low accuracy of garbage classification and recycling and the difficulty of manual recycling. First, depthwise separable convolution was used to reduce the number of parameters (Params) of the model. Then, an attention mechanism was used to improve the accuracy of the garbage classification model. Finally, model fine-tuning was used to further improve its performance. In addition, we compared the model with classic image classification models, including AlexNet, VGG16, and ResNet18, and lightweight classification models, including MobileNetV2 and ShuffleNetV2, and found that the model, GAF_dense, has a higher accuracy rate and fewer Params and FLOPs. To further check the performance of the model, we tested it on the CIFAR-10 data set and found that the accuracy rates of GAF_dense are 0.018 and 0.03 higher than those of ResNet18 and ShuffleNetV2, respectively. On the ImageNet data set, the accuracy rates of GAF_dense are 0.225 and 0.146 higher than those of ResNet18 and ShuffleNetV2, respectively. Therefore, the garbage classification model proposed in this paper is suitable for garbage classification and other classification tasks to protect the ecological environment, and can be applied to classification tasks in areas such as environmental science, children's education, and environmental protection.
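Why depthwise separable convolution reduces the parameter count can be seen from simple arithmetic. The sketch below compares the weights of a standard convolution with those of a depthwise convolution followed by a 1×1 pointwise convolution; biases are ignored, and the layer sizes in the example are assumptions chosen for illustration, not taken from GAF_dense.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard 2-D convolution (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size, in_channels, out_channels):
    """Weights in a depthwise convolution plus a 1x1 pointwise convolution.

    The depthwise stage applies one kernel_size x kernel_size filter per
    input channel; the pointwise stage mixes channels with 1x1 filters.
    """
    depthwise = kernel_size * kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 64 input channels, 128 output channels.
standard = standard_conv_params(3, 64, 128)         # 73728
separable = depthwise_separable_params(3, 64, 128)  # 8768
print(standard, separable, round(standard / separable, 1))
```

For this layer the separable form needs roughly 8.4 times fewer weights, which is the kind of saving that makes lightweight, real-time models feasible.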

