iSuc-ChiDT: A Computational Method for Identifying Succinylation Sites using Statistical Difference Table Encoding and the Chi-Square Decision Table Classifier

Author(s):  
Ying Zeng ◽  
Yuan Chen ◽  
Zheming Yuan

Abstract Background: Lysine succinylation is a type of protein post-translational modification that is widely involved in cell differentiation, cell metabolism and other important physiological activities. To study the molecular mechanism of succinylation in depth, succinylation sites need to be accurately identified, and because experimental approaches are costly and time-consuming, there is great demand for reliable computational methods. Feature extraction is a key step in building succinylation site prediction models, and the development of effective new features improves predictive accuracy. Because the number of false succinylation sites far exceeds that of true sites, traditional classifiers perform poorly, and designing a classifier that effectively handles highly imbalanced datasets has always been a challenge. Results: We propose a new computational method, iSuc-ChiDT, to identify succinylation sites in proteins. In iSuc-ChiDT, chi-square statistical difference table encoding is developed to extract positional features; it has the highest predictive accuracy and the fewest features compared with binary encoding and physicochemical property encoding. The chi-square decision table (ChiDT) classifier is designed to implement imbalanced pattern classification. With a training set of 4,748:50,551 (true:false sites), independent tests showed that ChiDT significantly outperformed traditional classifiers (including random forest, artificial neural network and relaxed variable kernel density estimator) in predictive accuracy while taking only 17 s. Using an independent testing set of experimentally identified succinylation sites, iSuc-ChiDT achieved a sensitivity of 70.47%, a specificity of 66.27%, a Matthews correlation coefficient of 0.205, and a global accuracy index Q9 of 0.683, showing a significant improvement in sensitivity and overall accuracy compared to PSuccE, Success, SuccinSite and other existing succinylation site predictors. Conclusions: iSuc-ChiDT shows great promise in predicting succinylation sites and is expected to facilitate further experimental investigation of protein succinylation.
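The evaluation measures quoted above can be computed from a confusion matrix. The following is a minimal sketch, not the authors' code: the function name `confusion_metrics` and the counts are hypothetical and chosen only to show how a heavily imbalanced test set can produce a modest Matthews correlation coefficient even when sensitivity and specificity are both moderately high.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true sites recovered
    specificity = tn / (tn + fp)  # fraction of false sites rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts (NOT the paper's test set): 100 true sites, 1000 false sites.
sn, sp, mcc = confusion_metrics(tp=70, fn=30, tn=660, fp=340)
print(f"Sn={sn:.3f} Sp={sp:.3f} MCC={mcc:.3f}")
```

With a 1:10 class ratio, the many false positives pull the MCC down, which is why imbalance-aware measures are reported alongside sensitivity and specificity.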

2020 ◽  
Vol 26 (33) ◽  
pp. 4195-4205
Author(s):  
Xiaoyu Ding ◽  
Chen Cui ◽  
Dingyan Wang ◽  
Jihui Zhao ◽  
Mingyue Zheng ◽  
...  

Background: Enhancing a compound’s biological activity is the central task of lead optimization in small-molecule drug discovery. However, performing many iterative rounds of compound synthesis and bioactivity testing is laborious. To address this issue, there is high demand for high-quality in silico bioactivity prediction approaches that prioritize more active compound derivatives and reduce trial and error. Methods: Two kinds of bioactivity prediction models based on a large-scale structure-activity relationship (SAR) database were constructed. The first is based on the similarity of substituents and realized by matched molecular pair analysis, including SA, SA_BR, SR, and SR_BR. The second is based on SAR transferability and realized by matched molecular series analysis, including Single MMS pair, Full MMS series, and Multi single MMS pairs. Moreover, we defined the application domain of the models using a distance-based threshold. Results: Among the seven individual models, the Multi single MMS pairs bioactivity prediction model showed the best performance (R² = 0.828, MAE = 0.406, RMSE = 0.591), and the baseline model (SA) produced the lowest prediction accuracy (R² = 0.798, MAE = 0.446, RMSE = 0.637). The predictive accuracy could be further improved by consensus modeling (R² = 0.842, MAE = 0.397 and RMSE = 0.563). Conclusion: An accurate prediction model for bioactivity was built with a consensus method, which was superior to all individual models. Our model should be a valuable tool for lead optimization.
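The three error measures reported above, and the idea of a consensus prediction, can be sketched as follows. This is an illustration under stated assumptions: R² is taken to be the coefficient of determination, the consensus is taken to be a plain per-compound mean of the individual model outputs (the abstract does not say how the consensus was formed), and all activity values are made up.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R^2, MAE, RMSE) for paired observed and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(ss_res / n)
    return r2, mae, rmse


def consensus(predictions):
    """Average several models' predictions compound by compound (assumed scheme)."""
    return [sum(vals) / len(vals) for vals in zip(*predictions)]


# Hypothetical activity values for four compounds and two models.
observed = [5.0, 6.0, 7.0, 8.0]
model_a = [5.4, 6.2, 7.6, 8.4]  # tends to overshoot
model_b = [4.8, 5.6, 6.8, 7.4]  # tends to undershoot
combined = consensus([model_a, model_b])

mae_a = regression_metrics(observed, model_a)[1]
mae_b = regression_metrics(observed, model_b)[1]
r2_c, mae_c, rmse_c = regression_metrics(observed, combined)
print(f"MAE a={mae_a:.3f} b={mae_b:.3f} consensus={mae_c:.3f}")
```

In this toy case the two models err in opposite directions, so averaging cancels part of the error; real gains depend on how correlated the individual models' errors are.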


2020 ◽  
Vol 36 (Supplement_2) ◽  
pp. i787-i794
Author(s):  
Gian Marco Messa ◽  
Francesco Napolitano ◽  
Sarah H. Elsea ◽  
Diego di Bernardo ◽  
Xin Gao

Abstract Motivation: Untargeted metabolomic approaches hold great promise as a diagnostic tool for inborn errors of metabolism (IEMs) in the near future. However, the complexity of the data involved makes their application difficult and time-consuming. Computational approaches, such as metabolic network simulations and machine learning, could significantly help to exploit metabolomic data to aid the diagnostic process. While the former suffers from limited predictive accuracy, the latter is normally able to generalize only to IEMs for which sufficient data are available. Here, we propose a hybrid approach that exploits the best of both worlds by building a mapping between simulated and real metabolic data through a novel method based on Siamese neural networks (SNN). Results: The proposed SNN model is able to perform disease prioritization for the metabolic profiles of IEM patients even for diseases that it was not trained to identify. To the best of our knowledge, this has not been attempted before. The developed model significantly outperforms a baseline model that relies on metabolic simulations only. The prioritization performance demonstrates the feasibility of the method, suggesting that the integration of metabolic models and data could significantly aid the IEM diagnosis process in the near future. Availability and implementation: Metabolic datasets used in this study are publicly available from the cited sources. The original data produced in this study, including the trained models and the simulated metabolic profiles, are also publicly available (Messa et al., 2020).


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Menelaos Pavlou ◽  
Gareth Ambler ◽  
Rumana Z. Omar

Abstract Background: Clustered data arise in research when patients are clustered within larger units. Generalised Estimating Equations (GEE) and Generalised Linear Mixed Models (GLMM) can be used to provide marginal and cluster-specific inference and predictions, respectively. Methods: Confounding by Cluster (CBC) and Informative Cluster Size (ICS) are two complications that may arise when modelling clustered data. CBC can arise when the distribution of a predictor variable (termed ‘exposure’) varies between clusters, causing confounding of the exposure-outcome relationship. ICS means that the cluster size, conditional on covariates, is not independent of the outcome. In both situations, standard GEE and GLMM may provide biased or misleading inference, and modifications have been proposed. However, both CBC and ICS are routinely overlooked in the context of risk prediction, and their impact on the predictive ability of the models has been little explored. We study the effect of CBC and ICS on the predictive ability of risk models for binary outcomes when GEE and GLMM are used. We examine whether two simple approaches to handling CBC and ICS, which involve adjusting for the cluster mean of the exposure and the cluster size, respectively, can improve the accuracy of predictions. Results: Both CBC and ICS can be viewed as violations of the assumptions of the standard GLMM; the random effects are correlated with the exposure for CBC and with the cluster size for ICS. Based on these principles, we simulated data subject to CBC/ICS. The simulation studies suggested that the predictive ability of models derived using standard GLMM and GEE while ignoring CBC/ICS was affected. Marginal predictions were found to be mis-calibrated. Adjusting for the cluster mean of the exposure or the cluster size improved calibration, discrimination and the overall predictive accuracy of marginal predictions by explaining part of the between-cluster variability. The presence of CBC/ICS did not affect the accuracy of conditional predictions. We illustrate these concepts using real data from a multicentre study with potential CBC. Conclusion: Ignoring CBC and ICS when developing prediction models for clustered data can affect the accuracy of marginal predictions. Adjusting for the cluster mean of the exposure or the cluster size can improve the predictive accuracy of marginal predictions.
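The two adjustments described above amount to adding cluster-level covariates before fitting the risk model. A minimal sketch of that data-preparation step is shown below; the field names (`cluster`, `exposure`, `cluster_mean_exposure`, `cluster_size`) and the example records are hypothetical, and the GEE or GLMM fit itself is deliberately left out.

```python
from collections import defaultdict


def add_cluster_covariates(rows):
    """Attach the cluster mean of the exposure and the cluster size to each record.

    rows: list of dicts with (hypothetical) keys 'cluster' and 'exposure'.
    Returns new dicts; the input records are not modified.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row["cluster"]] += row["exposure"]
        counts[row["cluster"]] += 1
    return [
        dict(
            row,
            cluster_mean_exposure=totals[row["cluster"]] / counts[row["cluster"]],
            cluster_size=counts[row["cluster"]],
        )
        for row in rows
    ]


# Hypothetical patients in two centres.
patients = [
    {"cluster": "A", "exposure": 1.0},
    {"cluster": "A", "exposure": 3.0},
    {"cluster": "B", "exposure": 4.0},
]
augmented = add_cluster_covariates(patients)
print(augmented)
```

The augmented records can then be passed to whichever GEE or GLMM implementation is in use, with the two new columns entered as additional predictors.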


2021 ◽  
Author(s):  
Madiha M Abas ◽  
Shukir Saleem Hasan

Abstract Background and objectives: Colostrum is the first breast milk produced by the mother. It contains high concentrations of nutrients and antibodies. Methods: A comparative, cross-sectional study was conducted in different areas of Erbil Governorate from 2 January to the end of May 2019. A non-probability sample of 400 mothers who delivered their babies by normal vaginal delivery in hospitals was recruited. The researchers constructed a dedicated tool and collected data through direct face-to-face interviews. Data were entered into a computer and analyzed with SPSS version 23 using frequencies, the chi-square test, and the two-tailed t-test. Results: Incorrect knowledge was observed among mothers in Erbil city and Koy-Sanjaq city, with better information among Shaqlawa mothers. Colostrum feeding practices were poor among all mothers, and a statistically significant association was found between mothers’ knowledge and their practices. Regarding practices, statistically significant differences were found between Erbil city and Koy-Sanjaq city and between Shaqlawa city and Koy-Sanjaq city, with no statistically significant difference between Erbil and Shaqlawa mothers. Regarding knowledge, there were statistically significant differences between mothers in Erbil and Koy-Sanjaq and between mothers in Erbil and Shaqlawa, with no statistically significant difference between Koy-Sanjaq and Shaqlawa mothers. Conclusions: Mothers in Shaqlawa city had better knowledge, and all three districts had poor practices regarding colostrum feeding.


2020 ◽  
Vol 24 ◽  
pp. 435-453
Author(s):  
Mickael Albertus

The raking-ratio method is a statistical and computational method which adjusts the empirical measure to match the true probability of the sets of a finite partition. The asymptotic behavior of the raking-ratio empirical process indexed by a class of functions is studied when the auxiliary information is given by estimates. These estimates are assumed to result from learning the probabilities of the sets of the partitions from another sample that is larger than the statistician's sample, as in two-stage sampling surveys. Under a metric entropy hypothesis and conditions on the size of the information source sample, the strong approximation of this process and, in particular, its weak convergence are established. Under these conditions, the asymptotic behavior of the new process is the same as that of the classical raking-ratio empirical process. Some possible statistical applications of these results are also given, such as the strengthening of the Z-test and the chi-square goodness-of-fit test.
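A single raking step for one partition can be sketched as follows. This illustrates only the basic reweighting idea, not the paper's process: the function name `rake_once` and the data are hypothetical, the cell probabilities are taken as known, and every cell with positive target probability is assumed to contain at least one sample point.

```python
from collections import Counter


def rake_once(labels, target_probs):
    """Reweight sample points so each partition cell carries its target probability.

    labels: the partition cell of each sample point.
    target_probs: dict mapping each cell to its (known or estimated) probability.
    Each point in cell A gets weight P(A) / (number of sample points in A).
    """
    counts = Counter(labels)
    return [target_probs[cell] / counts[cell] for cell in labels]


# Hypothetical sample: cell "a" is over-represented relative to its target of 0.5.
cells = ["a", "a", "a", "b"]
weights = rake_once(cells, {"a": 0.5, "b": 0.5})
mass_a = sum(w for w, c in zip(weights, cells) if c == "a")
print(weights, mass_a)
```

With several partitions, the same step is applied to each partition in turn and repeated; the weights after one pass generally no longer match the earlier partitions exactly, which is why the procedure is iterative.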


2021 ◽  
Vol 14 (3) ◽  
pp. 1707
Author(s):  
Heitor Carvalho Lacerda ◽  
André Luiz Lopes De Faria ◽  
Humberto Paiva Fonseca ◽  
Marco Antônio Saraiva Silva ◽  
Wesley Oliveira Soares ◽  
...  

Study of Susceptibility to Sheet Erosion in a Watershed in Zona da Mata, Minas Gerais, Brazil. ABSTRACT The study of susceptibility to sheet erosion is relevant in the mesoregion of the Zona da Mata of Minas Gerais, given the predominance of pasture cover, the significant degradation of the soil and the stagnation of the agricultural sector. The objective of this study was to understand which geodynamic variables are important in predicting sheet erosion processes and to identify the best of eight predictive models through multicriteria comparisons, making it possible to understand the phenomenon in a watershed in the mesoregion. Scores were assigned by two methods, Literature (L) and Field Reality (RC), in which a portion (60%) of the mapped sheet erosion processes was used to weight the scores of the variable classes by their area. The variables were integrated through weighting tests and through total and partial integration. The resulting models were evaluated using descriptive statistics (box plots), different categorization methods (Manual, Natural Breaks and Geometrical Interval) and the ROC curve with AUC efficiency calculation (using 40% of the mapped erosions). The results showed that lack of moisture is an important factor in the occurrence of sheet erosion processes, whereas the morphometric variables were not important for the prediction. Models based on RC (72.41% average AUC) achieved considerably greater efficiency than those based on L (65.41% average AUC). In contrast, comparing the integration of all geodynamic variables with that of only the most important ones, and weighted with unweighted integration, showed no considerable statistical difference. The most efficient model obtained an AUC of 76.3%, which is considered good, and was consistent with the reality of the studied area. Key words: Geotechnologies; Comparison of Risk Models; Multicriteria Analysis
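The AUC used above to compare models can be read as the probability that a randomly chosen eroded location receives a higher susceptibility score than a randomly chosen non-eroded one. The sketch below computes it by direct pairwise comparison; the function name and the scores are hypothetical and are not taken from the study.

```python
def auc_from_scores(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical susceptibility scores for held-out validation cells.
eroded = [0.9, 0.8, 0.6, 0.4]
not_eroded = [0.7, 0.5, 0.3, 0.2]
auc = auc_from_scores(eroded, not_eroded)
print(f"AUC = {auc:.4f}")
```

The pairwise loop is quadratic in the number of cells, which is fine for an illustration; rank-based formulas give the same value more efficiently on large rasters.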


2021 ◽  
Vol 129 ◽  
pp. 03031
Author(s):  
Maria Truchlikova

Research background: Predicting and assessing financial health should be one of the most important activities for every business, especially in the context of a turbulent business environment and global economy. The financial sustainability of family businesses has a direct and significant influence on the development and growth of the economy, because they still represent the backbone of the economy and play an important role in national economies worldwide. Purpose of the article: In this article, we used financial distress and bankruptcy prediction models to assess the financial status of family businesses in the agricultural sector. The aim of the paper is to compare models developed using three different methods, to identify the model with the highest predictive accuracy for financial distress, and to assess financial health. Methods: The data were obtained from the Finstat database. To assess the financial health of the selected family businesses, three bankruptcy models were used: Chrastinova’s CH-Index, Gurcik’s G-Index (both defined for Slovak agricultural enterprises) and the Altman Z-score. Findings & Value added: This article summarizes existing models and compares the results of assessing the financial health of family businesses using three different models.
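Of the three models named above, the Altman Z-score is the most widely documented. The sketch below assumes the original 1968 formulation for publicly traded manufacturing firms; the abstract does not state which Altman variant the authors applied, and the ratio values are invented for illustration.

```python
def altman_z(x1, x2, x3, x4, x5):
    """Original (1968) Altman Z-score; the variant used in the paper is not stated.

    x1: working capital / total assets
    x2: retained earnings / total assets
    x3: earnings before interest and taxes / total assets
    x4: market value of equity / total liabilities
    x5: sales / total assets
    """
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def altman_zone(z):
    """Classify a score using the conventional cut-offs of the original model."""
    if z > 2.99:
        return "safe"
    if z < 1.81:
        return "distress"
    return "grey"


# Hypothetical ratios for a single firm.
z = altman_z(x1=0.2, x2=0.3, x3=0.1, x4=1.0, x5=1.5)
zone = altman_zone(z)
print(f"Z = {z:.2f} ({zone})")
```

For privately held firms, later revisions of the model use different coefficients and cut-offs, so the numbers here should not be carried over without checking which version applies.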


Author(s):  
Yazan Alnsour ◽  
Rassule Hadidi ◽  
Neetu Singh

Predictive analytics can be used to anticipate the risks associated with some patients, and prediction models can be employed to alert physicians and allow timely proactive interventions. Recently, health care providers have been using different types of tools with prediction capabilities. Sepsis is one of the leading causes of in-hospital death in the United States and worldwide. In this study, the authors used a large medical dataset to develop and present a model that predicts in-hospital mortality among Sepsis patients. The predictive model was developed using a dataset of more than one million records of hospitalized patients. The independent predictors of in-hospital mortality were identified using the chi-square automatic interaction detector. The authors found that adding hospital attributes to the predictive model increased the accuracy from 82.08% to 85.3% and the area under the curve from 0.69 to 0.84, which is favorable compared to using only patients' attributes. The authors discuss the practical and research contributions of using a predictive model that incorporates both patient and hospital attributes in identifying high-risk patients.


2013 ◽  
Vol 07 (S 01) ◽  
pp. S099-S104 ◽  
Author(s):  
Sezer Demirbuga ◽  
Oznur Tuncay ◽  
Kenan Cantekin ◽  
Muhammed Cayabatmaz ◽  
Asiye Nur Dincer ◽  
...  

ABSTRACT Objectives: The objective of this study is to evaluate the frequency and distribution of early tooth loss and endodontic treatment needs of permanent first molars in a Turkish pediatric population. Materials and Methods: A total of 7,895 panoramic radiographs taken for routine dental examination at the Department of Oral Maxillofacial Radiology between 2008 and 2012 were investigated. Two independent specialists evaluated early tooth loss and endodontic treatment needs of permanent first molars using panoramic radiography and patient anamnesis forms. The teeth were classified according to the following data: (a) missing teeth, (b) teeth requiring extraction, (c) endodontically treated teeth (ETT), (d) teeth requiring endodontic therapy. The data were also classified according to four factors: age group (6-12 and 13-16), gender (boy and girl), jaw (mandible and maxilla) and side (right and left). A chi-square test was used for statistical analyses. Results: A total of 19,488 and 12,092 teeth were evaluated in the child group and the adolescent group, respectively. All data were higher in adolescents than in children (p < 0.001). For the gender factor, only ETT was higher in girls than in boys (p < 0.001). For the jaw factor, all data were higher (p < 0.001) in the mandible than in the maxilla. For the side factor, no statistical difference existed between right and left. Conclusions: Early tooth loss and endodontic treatment needs of permanent first molars varied according to age group and jaw. When the results were compared according to the side and gender factors, no statistical difference was found (p > 0.05), except for ETT between the gender groups.
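A comparison of two groups on a yes/no outcome, such as those reported above, can be tested with a chi-square test on a 2x2 table. The sketch below uses the Pearson statistic without continuity correction and the closed-form p-value for one degree of freedom; the counts are hypothetical and are not the study's data.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value). With one degree of freedom the survival
    function of the chi-square distribution is erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: treated vs. untreated teeth in two groups of 1000 teeth each.
chi2, p = chi_square_2x2(a=60, b=940, c=30, d=970)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

Statistical packages often apply a continuity correction to 2x2 tables by default, so their output can differ slightly from this uncorrected statistic.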


2013 ◽  
Vol 3 (2) ◽  
pp. 32-35 ◽  
Author(s):  
Apeksha Mainali

Introduction: Intra-oral and extra-oral tissues are at risk of damage during orthodontic treatment, most commonly oral ulcerations. Clinicians should assess and monitor every aspect of patient’s treatment procedure to achieve an uneventful and successful final result. Objective: To evaluate occurrence of oral ulcerations in patients undergoing orthodontic treatment. To evaluate the most common type of ulceration and to assess the management of such ulcerations by the orthodontists. Materials & Method: A questionnaire-based study was used among Nepalese and international orthodontists. Data were analyzed statistically using descriptive analysis and Chi-square test, p<0.05 was considered to be significant with a confidence interval of 95%. Result: Most common oral ulceration encountered during orthodontic treatment was traumatic ulceration which was managed by symptomatic measures. There was a statistically significant difference in the method of education to the patients among national and international orthodontists. Conclusion: Careful use of instruments, careful fitting and adjustment of the appliances should be done to avoid oral ulcerations during orthodontic treatment. Topical medicines can be used for management of such ulcers. Nepalese orthodontists should focus on using audio-visual aids for patient education as it has great promise in enhancing patient understanding and in prompting behavioral change.  

