A novel sequence-based predictor for identifying and characterizing thermophilic proteins using estimated propensity scores of dipeptides

Abstract. Owing to their ability to maintain a thermodynamically stable fold at extremely high temperatures, thermophilic proteins (TPPs) play a critical role in basic research and in a variety of applications in the food industry. As a result, computational models that rapidly and accurately identify novel TPPs from large numbers of uncharacterized protein sequences are desirable. Although computational models for characterizing thermophilic proteins already exist, their performance and interpretability remain unsatisfactory. We present a novel sequence-based thermophilic protein predictor, termed SCMTPP, for improving model predictability and interpretability. First, an up-to-date and high-quality dataset consisting of 1853 TPPs and 3233 non-TPPs was compiled from published literature. Second, the SCMTPP predictor was created by combining the scoring card method (SCM) with estimated propensity scores of g-gap dipeptides. Benchmarking experiments revealed that SCMTPP had a cross-validation accuracy of 0.883, which was comparable to that of a support vector machine-based predictor (0.906–0.910) and 2–17% higher than that of commonly used machine learning models. Furthermore, SCMTPP outperformed the state-of-the-art approach (ThermoPred) on the independent test dataset, with accuracy and MCC of 0.865 and 0.731, respectively. Finally, the SCMTPP-derived propensity scores were used to elucidate the critical physicochemical properties for protein thermostability enhancement. Comparative results showed that SCMTPP was effective for identifying and characterizing TPPs in terms of interpretability and generalizability. We have implemented the proposed predictor as a user-friendly online web server at http://pmlabstack.pythonanywhere.com/SCMTPP to allow easy access to the model. SCMTPP is expected to be a powerful tool for facilitating community-wide efforts to identify TPPs on a large scale and for guiding experimental characterization of TPPs.
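
The idea of scoring a sequence with dipeptide propensities can be illustrated with a minimal sketch. It assumes that a sequence score is the composition-weighted sum of per-dipeptide propensity scores and that a threshold separates the two classes. The function names, the toy propensity values, the threshold, and the treatment of unlisted dipeptides as zero are all hypothetical. They are not the scores or settings estimated by SCMTPP.

```python
from collections import Counter

def g_gap_dipeptide_composition(sequence, gap=0):
    """Return the relative frequency of each g-gap dipeptide in a sequence.

    A g-gap dipeptide pairs residue i with residue i + gap + 1, so gap=0
    gives ordinary adjacent dipeptides.
    """
    step = gap + 1
    pairs = [sequence[i] + sequence[i + step] for i in range(len(sequence) - step)]
    if not pairs:
        return {}
    counts = Counter(pairs)
    return {pair: n / len(pairs) for pair, n in counts.items()}

def scoring_card_score(sequence, propensity, gap=0):
    """Composition-weighted sum of propensity scores (unlisted pairs count as 0)."""
    composition = g_gap_dipeptide_composition(sequence, gap)
    return sum(freq * propensity.get(pair, 0.0) for pair, freq in composition.items())

# Hypothetical values for illustration only; not taken from the paper.
toy_propensity = {"KE": 900.0, "EK": 850.0, "QS": 200.0}
toy_threshold = 400.0

score = scoring_card_score("KEKEKE", toy_propensity)
is_predicted_thermophilic = score > toy_threshold
```

Because every dipeptide carries one visible score, a prediction can be traced back to the dipeptides that contributed most to it, which is the kind of interpretability the abstract emphasizes.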

A Review on the Meandering of Wind Turbine Wakes

Energies ◽

10.3390/en12244725 ◽

2019 ◽

Vol 12 (24) ◽

pp. 4725 ◽

Cited By ~ 3

Author(s):

Xiaolei Yang ◽

Fotis Sotiropoulos

Keyword(s):

Wind Turbine ◽

Large Scale ◽

Computational Models ◽

Critical Role ◽

Low Frequency ◽

Wind Farms ◽

Future Research ◽

Work Related ◽

Eddy Simulation ◽

Future Research Directions

Meandering describes the large-scale, low-frequency motions of wind turbine wakes, which can determine wake recovery rates, affect the loads exerted on turbine structures, and play a critical role in the design and optimal control of wind farms. This paper presents a comprehensive review of previous work related to wake meandering. Emphasis is placed on the origin and characteristics of wake meandering and on computational models, including both dynamic wake meandering models and large-eddy simulation approaches. Future research directions in the field are also discussed.

Nursing professionals’ attitudes toward use of physical restraints in Styrian nursing homes Austria

Pflege ◽

10.1024/1012-5302/a000649 ◽

2019 ◽

Vol 32 (1) ◽

pp. 57-63

Author(s):

Hannes Mayerl ◽

Tanja Trummer ◽

Erwin Stolz ◽

Éva Rásky ◽

Wolfgang Freidl

Keyword(s):

Nursing Homes ◽

Large Scale ◽

Critical Role ◽

Physical Restraint ◽

Care Practice ◽

Physical Restraints ◽

Convenience Sample ◽

Restraint Use ◽

Positive Attitudes ◽

Nursing Professionals

Abstract. Background: Given that nursing staff play a critical role in the decision regarding use of physical restraints, research has examined nursing professionals’ attitudes toward this practice. Aim: Since nursing professionals’ views on physical restraint use have not yet been examined in Austria, we aimed to explore nursing professionals’ attitudes concerning use of physical restraints in nursing homes of Styria (Austria). Method: Data were collected from a convenience sample of nursing professionals (N = 355) within 19 Styrian nursing homes, based on a cross-sectional study design. Attitudes toward the practice of restraint use were assessed by means of the German version of the Maastricht Attitude Questionnaire. Results: The overall results showed rather positive attitudes toward the use of physical restraints, yet the findings regarding the sub-dimensions of the questionnaire were mixed. Although nursing professionals tended to deny “good reasons” for using physical restraints, they evaluated the consequences of physical restraint use rather positively and considered restraint use an appropriate health care practice. Nursing professionals’ views regarding the consequences of using specific physical restraints further showed that belts were considered the most restricting and discomforting devices. Conclusions: Overall, Austrian nursing professionals seemed to hold more positive attitudes toward the use of physical restraints than counterparts in other Western European countries. Future nationwide large-scale surveys will be needed to confirm our findings.

DeepSSPred: A Deep Learning Based Sulfenylation site predictor via a novel n-segmented optimize federated feature encoder

Protein and Peptide Letters ◽

10.2174/0929866527666201202103411 ◽

2020 ◽

Vol 27 ◽

Author(s):

Zaheer Ullah Khan ◽

Dechang Pi

Keyword(s):

Large Scale ◽

Computational Models ◽

Research Work ◽

Training Data ◽

Training Dataset ◽

Validation Dataset ◽

Cytokine Signaling ◽

Minority Class ◽

Independent Dataset ◽

Feature Encoding

Background: S-sulfenylation (S-sulphenylation, or sulfenic acid modification) of proteins is a special kind of post-translational modification that plays an important role in various physiological and pathological processes such as cytokine signaling, transcriptional regulation, and apoptosis. Given this significance, and to complement existing wet-lab methods, several computational models have been developed for predicting sulfenylation cysteine sites. However, the performance of these models was not satisfactory due to inefficient feature schemes, severe imbalance issues, and the lack of an intelligent learning engine. Objective: In this study, our motivation is to establish a strong and novel computational predictor for discriminating between sulfenylation and non-sulfenylation sites. Methods: We report an innovative bioinformatics feature encoding tool, named DeepSSPred, in which the encoded features are obtained via an n-segmented hybrid feature, and the resampling technique called synthetic minority oversampling is then employed to cope with the severe imbalance between SC-sites (minority class) and non-SC sites (majority class). A state-of-the-art 2D convolutional neural network (2D-CNN) was employed with a rigorous 10-fold jackknife cross-validation technique for model validation and authentication. Results: The proposed framework, with a strong discrete presentation of the feature space, a machine learning engine, and an unbiased presentation of the underlying training data, yielded an excellent model that outperforms all existing established studies. The proposed approach is 6% higher in MCC than the best existing method. On an independent dataset, the best existing study failed to provide sufficient details. Compared with the second-best method, the model obtained an increase of 7.5% in accuracy (ACC), 1.22% in Sn, 12.91% in Sp, and 13.12% in MCC on the training data, and of 12.13% in ACC, 27.25% in Sn, 2.25% in Sp, and 30.37% in MCC on an independent dataset. These empirical analyses show the superior performance of the proposed model over existing studies on both the training and independent datasets. Conclusion: In this research, we have developed a novel sequence-based automated predictor for SC-sites, called DeepSSPred. The empirical simulation outcomes on a training dataset and an independent validation dataset have revealed the efficacy of the proposed theoretical model. The good performance of DeepSSPred is due to several factors, such as novel discriminative feature encoding schemes, the SMOTE technique, and careful construction of the prediction model through the tuned 2D-CNN classifier. We believe that our research work will provide potential insight into further prediction of S-sulfenylation characteristics and functionalities. Thus, we hope that our predictor will be significantly helpful for large-scale discrimination of unknown SC-sites in particular and for designing new pharmaceutical drugs in general.
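
The synthetic minority oversampling step mentioned above can be sketched in a few lines. This is a minimal illustration of the general SMOTE idea: each new point is interpolated between a minority sample and one of its nearest minority neighbours. The function name, the parameters, and the toy feature vectors are assumptions made for illustration. This is not the DeepSSPred pipeline or its feature encoding.

```python
import random

def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    For each new point, pick a random minority sample, pick one of its k
    nearest minority neighbours, and take a random point on the line segment
    joining them. Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Toy 2-D feature vectors standing in for encoded minority-class (SC-site) samples.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
synthetic = smote_oversample(minority, n_new=5)
```

Resampling of this kind is normally applied to the training folds only, so that synthetic points do not leak into the data used for evaluation.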

Towards Computational Models of Identifying Protein Ubiquitination Sites

Current Drug Targets ◽

10.2174/1389450119666180924150202 ◽

2019 ◽

Vol 20 (5) ◽

pp. 565-578 ◽

Cited By ~ 1

Author(s):

Lidong Wang ◽

Ruijun Zhang

Keyword(s):

Computational Methods ◽

Computational Models ◽

Feature Representation ◽

Biological Sequence ◽

Post Translational Modification ◽

Test Dataset ◽

Protein Ubiquitination ◽

Protein Functions ◽

Independent Test Dataset ◽

Benchmark Datasets

Ubiquitination is an important post-translational modification (PTM) process for the regulation of protein functions, which is associated with cancer, cardiovascular and other diseases. Recent initiatives have focused on the detection of potential ubiquitination sites with the aid of physicochemical test approaches in conjunction with the application of computational methods. The identification of ubiquitination sites using laboratory tests is especially susceptible to the temporality and reversibility of the ubiquitination processes, and is also costly and time-consuming. It has been demonstrated that computational methods are effective in extracting potential rules or inferences from biological sequence collections. Up to the present, the computational strategy has been one of the critical research approaches that have been applied for the identification of ubiquitination sites, and currently, there are numerous state-of-the-art computational methods that have been developed from machine learning and statistical analysis to undertake such work. In the present study, the construction of benchmark datasets is summarized, together with feature representation methods, feature selection approaches and the classifiers involved in several previous publications. In an attempt to explore pertinent development trends for the identification of ubiquitination sites, an independent test dataset was constructed and the predicting results obtained from five prediction tools are reported here, together with some related discussions.

Development and validation of a deep learning system to screen vision-threatening conditions in high myopia using optical coherence tomography images

British Journal of Ophthalmology ◽

10.1136/bjophthalmol-2020-317825 ◽

2020 ◽

pp. bjophthalmol-2020-317825

Author(s):

Yonghao Li ◽

Weibo Feng ◽

Xiujuan Zhao ◽

Bingqian Liu ◽

Yan Zhang ◽

...

Keyword(s):

Optical Coherence Tomography ◽

Deep Learning ◽

High Myopia ◽

Large Scale ◽

Learning System ◽

Youden Index ◽

Optical Coherence ◽

Test Dataset ◽

Independent Test ◽

Independent Test Dataset

Background/aims: To apply deep learning technology to develop an artificial intelligence (AI) system that can identify vision-threatening conditions in high myopia patients based on optical coherence tomography (OCT) macular images. Methods: In this cross-sectional, prospective study, a total of 5505 qualified OCT macular images obtained from 1048 high myopia patients admitted to Zhongshan Ophthalmic Centre (ZOC) from 2012 to 2017 were selected for the development of the AI system. The independent test dataset included 412 images obtained from 91 high myopia patients recruited at ZOC from January 2019 to May 2019. We adopted the InceptionResnetV2 architecture to train four independent convolutional neural network (CNN) models to identify the following four vision-threatening conditions in high myopia: retinoschisis, macular hole, retinal detachment and pathological myopic choroidal neovascularisation. Focal Loss was used to address class imbalance, and optimal operating thresholds were determined according to the Youden Index. Results: In the independent test dataset, the areas under the receiver operating characteristic curves were high for all conditions (0.961 to 0.999). Our AI system achieved sensitivities equal to or even better than those of retina specialists as well as high specificities (greater than 90%). Moreover, our AI system provided a transparent and interpretable diagnosis with heatmaps. Conclusions: We used OCT macular images for the development of CNN models to identify vision-threatening conditions in high myopia patients. Our models achieved reliable sensitivities and high specificities, comparable to those of retina specialists and may be applied for large-scale high myopia screening and patient follow-up.
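
Choosing an operating threshold with the Youden Index can be sketched as follows. The Youden Index is J = sensitivity + specificity - 1, and the chosen threshold is the one that maximizes J. The function name and the toy scores and labels are assumptions for illustration. They are not data or code from the study, and the sketch assumes that both classes are present.

```python
def youden_optimal_threshold(scores, labels):
    """Return the (threshold, J) pair maximizing J = sensitivity + specificity - 1.

    A sample is predicted positive when its score is >= threshold.
    Labels are 1 (condition present) or 0 (condition absent).
    """
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        j = sensitivity + specificity - 1
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j

# Toy model outputs and ground-truth labels, for illustration only.
toy_scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.9]
toy_labels = [0, 0, 1, 0, 1, 1, 1]
threshold, j_value = youden_optimal_threshold(toy_scores, toy_labels)
```

With four separate models, as described in the abstract, a threshold of this kind would be chosen independently for each condition.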

A Novel Method to Predict Drug-Target Interactions Based on Large-Scale Graph Representation Learning

Cancers ◽

10.3390/cancers13092111 ◽

2021 ◽

Vol 13 (9) ◽

pp. 2111

Author(s):

Bo-Wei Zhao ◽

Zhu-Hong You ◽

Lun Hu ◽

Zhen-Hao Guo ◽

Lei Wang ◽

...

Keyword(s):

Drug Target ◽

Large Scale ◽

Computational Models ◽

Structural Information ◽

Characteristic Curve ◽

Representation Learning ◽

Graph Representation ◽

Convolutional Network ◽

Novel Method

Identification of drug-target interactions (DTIs) is a significant step in the drug discovery or repositioning process. Compared with time-consuming and labor-intensive in vivo experimental methods, computational models can provide high-quality DTI candidates in an instant. In this study, we propose a novel method called LGDTI to predict DTIs based on large-scale graph representation learning. LGDTI can capture both the local and the global structural information of the graph. Specifically, the first-order neighbor information of nodes is aggregated by a graph convolutional network (GCN), whereas the high-order neighbor information of nodes is learned by the graph embedding method DeepWalk. Finally, the two kinds of features are fed into a random forest classifier to train and predict potential DTIs. The results show that our method obtained an area under the receiver operating characteristic curve (AUROC) of 0.9455 and an area under the precision-recall curve (AUPR) of 0.9491 under 5-fold cross-validation. Moreover, we compare the presented method with some existing state-of-the-art methods. These results imply that LGDTI can efficiently and robustly capture undiscovered DTIs. The proposed model is also expected to bring new inspiration and provide novel perspectives to relevant researchers.
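
DeepWalk learns node embeddings from truncated random walks, which is how it reaches beyond first-order neighbors. The walk-generation step alone can be sketched as below. The function name, the parameters, and the toy drug-target graph are hypothetical. The embedding training that DeepWalk performs on these walks (a skip-gram model) and the rest of the LGDTI pipeline are not shown.

```python
import random

def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Generate truncated random walks over a graph given as an adjacency dict."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in adjacency:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks

# Hypothetical toy graph: edges are known drug-target interactions.
toy_graph = {
    "drug_a": ["target_x", "target_y"],
    "drug_b": ["target_y"],
    "target_x": ["drug_a"],
    "target_y": ["drug_a", "drug_b"],
}
walks = random_walks(toy_graph, walk_length=5, walks_per_node=2)
```

Each walk plays the role of a sentence whose words are node identifiers, so nodes that co-occur within walks end up with similar embeddings even when they are not directly connected.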

STEM-20. A CANCER STEM CELL-SELECTIVE APOPTOSIS-INDUCING SMALL MOLECULE FOR THE TREATMENT OF GBM

Neuro-Oncology ◽

10.1093/neuonc/noaa215.837 ◽

2020 ◽

Vol 22 (Supplement_2) ◽

pp. ii200-ii200

Author(s):

Stephen Skirboll ◽

Natasha Lucki ◽

Genaro Villa ◽

Naja Vergani ◽

Michael Bollong ◽

...

Keyword(s):

Stem Cells ◽

Cancer Stem Cells ◽

Small Molecules ◽

Small Molecule ◽

Large Scale ◽

Critical Role ◽

Tumor Formation ◽

Cell Type ◽

Caspase 1 ◽

Activation Assay

Abstract. INTRODUCTION: Glioblastoma multiforme (GBM) is the most aggressive form of primary brain cancer. A subpopulation of multipotent cells termed GBM cancer stem cells (CSCs) plays a critical role in tumor initiation and maintenance, drug resistance, and recurrence following surgery. New therapeutic strategies for the treatment of GBM have recently focused on targeting CSCs. Here we have used an unbiased large-scale screening approach to identify drug-like small molecules that induce apoptosis in GBM CSCs in a cell type-selective manner. METHODS: A luciferase-based survival assay of patient-derived GBM CSC lines was established to perform a large-scale screen of approximately one million drug-like small molecules with the goal of identifying novel compounds that are selectively toxic to chemoresistant GBM CSCs. Compounds found to kill GBM CSC lines as compared to control cell types were further characterized. A caspase activation assay was used to evaluate the mechanism of induced cell death. A xenograft animal model using patient-derived GBM CSCs was employed to test the leading candidate for suppression of in vivo tumor formation. RESULTS: We identified a small molecule, termed RIPGBM, from the cell-based chemical screen that induces apoptosis in primary patient-derived GBM CSC cultures. The cell type-dependent selectivity of RIPGBM appears to arise at least in part from redox-dependent formation of a proapoptotic derivative, termed cRIPGBM, in GBM CSCs. cRIPGBM induces caspase 1-dependent apoptosis by binding to receptor-interacting protein kinase 2 (RIPK2) and acting as a molecular switch, which reduces the formation of a prosurvival RIPK2/TAK1 complex and increases the formation of a proapoptotic RIPK2/caspase 1 complex. In an intracranial GBM xenograft mouse model, RIPGBM was found to significantly suppress tumor formation. CONCLUSIONS: Our chemical genetics-based approach has identified a small molecule drug candidate and a potential drug target that selectively targets cancer stem cells and provides an approach for the treatment of GBMs.

QUBO formulations for training machine learning models

Scientific Reports ◽

10.1038/s41598-021-89461-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Prasanna Date ◽

Davis Arthur ◽

Lauren Pusey-Nazzaro

Keyword(s):

Machine Learning ◽

Linear Regression ◽

Large Scale ◽

Support Vector ◽

Quantum Computers ◽

NP Hard ◽

Learning Models ◽

Moore’s Law ◽

Machine Learning Models

Abstract. Training machine learning models on classical computers is usually a time- and compute-intensive process. With Moore’s law nearing its inevitable end and an ever-increasing demand for large-scale data analysis using machine learning, we must leverage non-conventional computing paradigms like quantum computing to train machine learning models efficiently. Adiabatic quantum computers can approximately solve NP-hard problems, such as quadratic unconstrained binary optimization (QUBO), faster than classical computers. Since many machine learning problems are also NP-hard, we believe adiabatic quantum computers might be instrumental in training machine learning models efficiently in the post-Moore’s law era. To be solved on adiabatic quantum computers, problems must be formulated as QUBO problems, which is very challenging. In this paper, we formulate as QUBO problems the training of three machine learning models: linear regression, support vector machine (SVM), and balanced k-means clustering. This makes them amenable to training on adiabatic quantum computers. We also analyze the computational complexities of our formulations and compare them to the corresponding state-of-the-art classical approaches. We show that the time and space complexities of our formulations are better than (in the case of SVM and balanced k-means clustering) or equivalent to (in the case of linear regression) those of their classical counterparts.
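
For readers unfamiliar with the problem class, a QUBO instance asks for a binary vector x that minimizes x^T Q x for a given real matrix Q. The sketch below evaluates this objective and solves a tiny instance by exhaustive search on a classical computer. It only illustrates the objective. The function names and the 2x2 matrix are hypothetical, exhaustive search scales exponentially with the number of variables, and none of this reproduces the paper's formulations for linear regression, SVM, or k-means.

```python
from itertools import product

def qubo_energy(Q, x):
    """Evaluate the QUBO objective x^T Q x for a binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def solve_qubo_brute_force(Q):
    """Try all 2^n binary vectors and return the one with the lowest energy."""
    n = len(Q)
    best = min(product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, x))
    return best, qubo_energy(Q, best)

# Hypothetical toy instance: diagonal entries reward setting a bit,
# the off-diagonal entry penalizes setting both bits at once.
toy_Q = [
    [-1, 2],
    [0, -2],
]
best_x, best_energy = solve_qubo_brute_force(toy_Q)
```

Formulating a training problem as a QUBO amounts to choosing Q so that the minimizing binary vector encodes the trained model parameters.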

Complex Network Modelling of Origin–Destination Commuting Flows for the COVID-19 Epidemic Spread Analysis in Italian Lombardy Region

Applied Sciences ◽

10.3390/app11104381 ◽

2021 ◽

Vol 11 (10) ◽

pp. 4381

Author(s):

Angela Lombardi ◽

Nicola Amoroso ◽

Alfonso Monaco ◽

Sabina Tangaro ◽

Roberto Bellotti

Keyword(s):

Urban Areas ◽

Large Scale ◽

Critical Role ◽

Potential Contribution ◽

Epidemic Spread ◽

Network Modelling ◽

Lombardy Region ◽

Commuting Flows ◽

Dynamic Growth ◽

Spread Analysis

Currently the whole world is affected by the COVID-19 disease. Italy was the first country in Europe to be seriously affected, with the first COVID-19 outbreak localized in the Lombardy region. The further spread of cases led to the lockdown of the most affected regions in northern Italy and then of the entire country. In this work we investigated an epidemic spread scenario in the Lombardy region by using the origin–destination matrix with information about the commuting flows among 1450 urban areas within the region. We performed large-scale simulation-based modeling of the epidemic spread over the networks related to three main motivations (work, study and occasional transfers) to quantify the potential contribution of each category of travellers to the spread of the epidemic. Our findings indicate that the three networks are characterised by different weight dynamic growth rates and that the “work” network has a critical role in the diffusion phenomenon, showing the greatest contribution to the epidemic spread.

Large-Scale Landslide Displacement Rate Prediction Based on Multi-Factor Support Vector Regression Machine

Applied Sciences ◽

10.3390/app11041381 ◽

2021 ◽

Vol 11 (4) ◽

pp. 1381

Author(s):

Xiuzhen Li ◽

Shengwei Li

Keyword(s):

Support Vector Regression ◽

Water Level ◽

Large Scale ◽

Displacement Rate ◽

Support Vector ◽

Single Factor ◽

Reservoir Water ◽

Reservoir Water Level ◽

Depth Analysis ◽

Three Factor

Forecasting the development of large-scale landslides is a contentious and complicated issue. In this study, we put forward the use of multi-factor support vector regression machines (SVRMs) for predicting the displacement rate of a large-scale landslide. The relative relationships between the main monitoring factors were analyzed based on the long-term monitoring data of the landslide and the grey correlation analysis theory. We found that the average correlation between landslide displacement and rainfall is 0.894, and the correlation between landslide displacement and reservoir water level is 0.338. Finally, based on an in-depth analysis of the basic characteristics, influencing factors, and development of landslides, three main factors (i.e., the displacement rate, reservoir water level, and rainfall) were selected to build single-factor, two-factor, and three-factor SVRM models. The key parameters of the models were determined using a grid-search method, and the models showed high accuracies. The two-factor SVRM model (displacement rate and rainfall) has the highest accuracy, with the smallest root-mean-square error (RMSE) of 0.00614; it is followed by the three-factor and single-factor SVRM models, the latter of which has the lowest prediction accuracy, with the largest RMSE of 0.01644.
