Estimating the S-N Curve by Machine Learning Random Forest Method

2021 ◽  
Vol 70 (12) ◽  
pp. 876-880
Author(s):  
Nobuo NAGASHIMA ◽  
Masa HAYAKAWA ◽  
Hiroyuku MASUDA ◽  
Kotobu NAGAI
Data ◽  
2018 ◽  
Vol 4 (1) ◽  
pp. 5 ◽  
Author(s):  
Lyudmyla Kirichenko ◽  
Tamara Radivilova ◽  
Vitalii Bulakh

The article presents a novel method of fractal time series classification by meta-algorithms based on decision trees. The classification objects are fractal time series. For modeling, binomial stochastic cascade processes are chosen. Each class that was singled out unites model time series with the same fractal properties. Numerical experiments demonstrate that the best results are obtained by the random forest method with regression trees. A comparative analysis of the classification approaches based on the random forest method and traditional estimation of the degree of self-similarity is performed. The results show the advantage of machine learning methods over traditional time series evaluation. The results were used for detecting distributed denial-of-service (DDoS) attacks and demonstrated a high probability of detection.
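As a rough illustration of the kind of model series mentioned above, the sketch below generates a conservative binomial multiplicative cascade: a unit mass is split in two repeatedly, with one half receiving a random weight W and the other 1 - W. The symmetric beta distribution for W, the `alpha` parameter, and the function name are assumptions made for this example; the abstract does not specify them.

```python
import random


def binomial_cascade(levels, rng, alpha=2.0):
    """Return 2**levels cell masses of a conservative binomial cascade.

    Assumption for illustration: weights are drawn from Beta(alpha, alpha).
    """
    masses = [1.0]
    for _ in range(levels):
        next_masses = []
        for mass in masses:
            w = rng.betavariate(alpha, alpha)  # random weight in (0, 1)
            next_masses.append(mass * w)          # left half gets W
            next_masses.append(mass * (1.0 - w))  # right half gets 1 - W
        masses = next_masses
    return masses


# Hypothetical usage: a model series of 2**10 = 1024 points.
series = binomial_cascade(10, random.Random(42))
```

Because each split hands out W and 1 - W, the total mass is preserved at every level. Changing `alpha` changes how uneven the splits tend to be, which is one way to produce classes of series with different fractal properties.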


2021 ◽  
Author(s):  
Motohisa Yamamoto ◽  
Masanori Nojima ◽  
Ryuta Kamekura ◽  
Akiko Kuribara-Souta ◽  
Masaaki Uehara ◽  
...  

Abstract Introduction: To eliminate the disparity and maldistribution of physicians and medical specialty services, the development of diagnostic support for rare diseases using artificial intelligence is being promoted. Immunoglobulin G4 (IgG4)-related disease (IgG4-RD) is a rare disorder often requiring special knowledge and experience to diagnose. In this study, we investigated the possibility of differential diagnosis of IgG4-RD based on basic patient characteristics and blood test findings using machine learning. Methods: Six hundred and two patients with IgG4-RD and 212 patients with non-IgG4-RD that needed to be differentiated who visited the participating institutions were included in the study. Ten percent of the subjects were randomly excluded as a validation sample. Among the remaining cases, 80% were used as training samples, and the remaining 20% were used as test samples. Finally, validation was performed on the validation sample. The analysis was performed using a decision tree and a random forest model. Furthermore, a comparison was made between conditions with and without the serum IgG4 concentration. Accuracy was evaluated using the area under the receiver-operating characteristic (AUROC) curve. Results: In diagnosing IgG4-RD, AUROC curve values of the decision tree and the random forest method were 0.905 and 0.970, respectively, when serum IgG4 levels were included in the analysis. Excluding serum IgG4 levels, the AUROC curve value of the analysis by the random forest method was 0.919. Conclusion: Based on machine learning in a multicenter collaboration, with or without serum IgG4 data, basic patient characteristics and blood test findings alone were sufficient to differentiate IgG4-RD from non-IgG4-RD.
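The sampling scheme in the Methods (10% held out for validation, then an 80/20 split of the remainder) can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function name, the fixed seed, the rounding rule, and the use of integer patient IDs are assumptions.

```python
import random


def three_way_split(items, holdout_frac=0.10, train_frac=0.80, seed=0):
    """Shuffle, hold out a validation sample, then split the rest into train/test."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * holdout_frac)
    validation, rest = shuffled[:n_val], shuffled[n_val:]
    n_train = round(len(rest) * train_frac)
    return rest[:n_train], rest[n_train:], validation


# 602 IgG4-RD patients plus 212 non-IgG4-RD patients, as stated in the abstract.
patient_ids = list(range(602 + 212))
train, test, validation = three_way_split(patient_ids)
print(len(train), len(test), len(validation))  # 586 147 81
```

With 814 patients and simple rounding, this gives 81 validation cases, 586 training cases, and 147 test cases. The exact counts in the study may differ depending on how the authors rounded or stratified.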


2018 ◽  
Vol 3 ◽  
pp. 131 ◽  
Author(s):  
Mathupanee Oonsivilai ◽  
Yin Mo ◽  
Nantasit Luangasanatip ◽  
Yoel Lubell ◽  
Thyl Miliya ◽  
...  

Background: Early and appropriate empiric antibiotic treatment of patients suspected of having sepsis is associated with reduced mortality. The increasing prevalence of antimicrobial resistance reduces the efficacy of empiric therapy guidelines derived from population data. This problem is particularly severe for children in developing country settings. We hypothesized that by applying machine learning approaches to readily collected patient data, it would be possible to obtain individualized predictions for targeted empiric antibiotic choices. Methods and Findings: We analysed blood culture data collected from a 100-bed children's hospital in North-West Cambodia between February 2013 and January 2016. Clinical, demographic and living condition information was captured with 35 independent variables. We applied a suite of machine learning algorithms to these variables to predict Gram stains and whether bacterial pathogens could be treated with common empiric antibiotic regimens: i) ampicillin and gentamicin; ii) ceftriaxone; iii) none of the above. 243 patients with bloodstream infections were available for analysis. We found that the random forest method had the best predictive performance overall as assessed by the area under the receiver operating characteristic curve (AUC). The random forest method gave an AUC of 0.80 (95%CI 0.66-0.94) for predicting susceptibility to ceftriaxone, 0.74 (0.59-0.89) for susceptibility to ampicillin and gentamicin, 0.85 (0.70-1.00) for susceptibility to neither, and 0.71 (0.57-0.86) for Gram stain result. The most important variables for predicting susceptibility were time from admission to blood culture, patient age, hospital versus community-acquired infection, and age-adjusted weight score.
Conclusions: Applying machine learning algorithms to patient data that are readily available even in resource-limited hospital settings can provide highly informative predictions on antibiotic susceptibilities to guide appropriate empiric antibiotic therapy. When used as a decision support tool, such approaches have the potential to improve the targeting of empiric therapy, improve patient outcomes, and reduce the burden of antimicrobial resistance.
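For readers unfamiliar with the AUC used above, the sketch below computes it directly from its pairwise definition: the fraction of positive-negative pairs in which the positive case receives the higher score, with ties counted as one half. The labels and scores are made-up toy values, not data from the study.

```python
def auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 1 = susceptible, 0 = not susceptible.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
toy_auc = auc(toy_labels, toy_scores)  # 8 of 9 pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.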


Blood ◽  
2016 ◽  
Vol 128 (22) ◽  
pp. 3749-3749
Author(s):  
Youngil Koh ◽  
SuYeon Lee ◽  
Hong-Seok Yun ◽  
Sung-Soo Yoon ◽  
Inho Kim ◽  
...  

Abstract Introduction: ADAMTS13 activity level is crucial for differentiating thrombotic microangiopathies (TMAs). However, ADAMTS13 testing is not readily available on site in many parts of the world. Hence, we developed an innovative algorithm that allows differentiation of thrombotic thrombocytopenic purpura (TTP) from other TMAs based on laboratory results other than ADAMTS13 using machine learning. Methods: Two hundred eight adult patients with either TTP (N=64) or TMA other than TTP (N=144) (ADAMTS13 cutoff level of 10%) were classified using three machine learning techniques (decision tree, random forest, and neural network), using a set of 19 easily measured clinical variables such as fever, Hb, and ALT. No single clinical variable was strongly correlated with TTP (absolute values of the correlation coefficients were lower than 0.5), so we applied machine learning algorithms. First, we divided the patient data into three parts: training, test, and validation sets. We then applied the three machine learning techniques. Principal component analysis was also performed. Results: As single variables, platelet count, BUN, and total bilirubin were the three most important predictors for differentiating TTP from other TMAs, with an accuracy of 82%. The random forest method increased accuracy to 85%, with precision and recall of 0.828 and 0.832, respectively. Without optimization, the neural network did not outperform the random forest method. Conclusion: Machine learning technology seems promising in differentiating TTP from other TMAs if the ADAMTS13 value is not available. These algorithms could support the physician in tailoring the management of TMA. Figures: Correlation coefficient in our study; Scheme of Random Forest method used in our study. Disclosures: Lee: Samsung SDS: Employment. Yun: Samsung SDS: Employment.
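The accuracy, precision, and recall figures reported above are standard confusion-matrix statistics. The sketch below shows how they are computed; the label vectors are hypothetical toy values, since the abstract does not report the underlying counts.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical toy labels: 1 = TTP, 0 = other TMA.
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 1, 0, 0, 0, 0]
toy_metrics = classification_metrics(toy_true, toy_pred)
```

In this toy case there are 2 true positives, 1 false positive, 1 false negative, and 4 true negatives, so accuracy is 0.75 and both precision and recall are 2/3.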


2019 ◽  
Vol 2019 ◽  
pp. 1-8 ◽  
Author(s):  
Susan Thapa ◽  
Lori A. Fischbach ◽  
Robert Delongchamp ◽  
Mohammed F. Faramawi ◽  
Mohammed S. Orloff

Background. Gastric cancer is the fourth most common cancer and the third most common cause of cancer deaths worldwide. Morbidity and mortality from gastric cancer may be decreased by identifying those at high risk for progression in the gastric precancerous process so that they can be monitored over time for early detection and implementation of preventive strategies. Method. Using machine learning, we developed prediction models for gastric precancerous progression in a population from a developing country with a high rate of gastric cancer who underwent gastroscopies for dyspeptic symptoms. After imputing missing values for completeness, we divided the data into a training set and a validation (test) set. Using the training set, we used the random forest method to rank potential predictors based on their predictive importance. Using predictors identified by the random forest method, we conducted best subset linear regressions with the leave-one-out cross-validation approach to select predictors for overall progression and progression to dysplasia or cancer. We validated the models in the test set using leave-one-out cross-validation. Results. We observed for all models that complete intestinal metaplasia and incomplete intestinal metaplasia were the strongest predictors for further progression in the precancerous process. We also observed that a diagnosis of no gastritis, superficial gastritis, or antral diffuse gastritis at baseline was a predictor of no progression in the gastric precancerous process. The sensitivities and specificities were 86% and 79% for the general model and 100% and 82% for the location-specific model, respectively. Conclusion. We developed prediction models to identify gastroscopy patients that are more likely to progress in the gastric precancerous process, among whom routine follow-up gastroscopies can be targeted to prevent gastric cancer. Future external validation is needed.
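The leave-one-out cross-validation step mentioned in the Method can be illustrated with a deliberately reduced sketch: each candidate predictor is scored by fitting a one-variable least-squares line on all but one observation and measuring the squared error on the held-out observation. The predictor names and values are hypothetical, and the study's best-subset search over several predictors is reduced here to comparing single predictors.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loocv_mse(xs, ys):
    """Mean squared prediction error under leave-one-out cross-validation."""
    errors = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append((ys[i] - (slope * xs[i] + intercept)) ** 2)
    return sum(errors) / len(errors)


# Hypothetical toy data, not from the study.
outcome = [1.0, 3.0, 5.0, 7.0, 9.0]
candidates = {
    "predictor_a": [0.0, 1.0, 2.0, 3.0, 4.0],  # exactly linear in the outcome
    "predictor_b": [2.0, 0.0, 3.0, 1.0, 4.0],
}
scores = {name: loocv_mse(xs, outcome) for name, xs in candidates.items()}
best = min(scores, key=scores.get)
```

The predictor with the lowest cross-validated error is selected. A full best-subset search would repeat this scoring for every combination of candidate predictors.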


Author(s):  
Jun Pei ◽  
Zheng Zheng ◽  
Hyunji Kim ◽  
Lin Song ◽  
Sarah Walworth ◽  
...  

An accurate scoring function is expected to correctly select the most stable structure from a set of pose candidates. One can hypothesize that a scoring function’s ability to identify the most stable structure might be improved by emphasizing the most relevant atom pairwise interactions. However, it is hard to evaluate the relative importance of each atom pair using traditional means. With the introduction of machine learning methods, it has become possible to determine the relative importance of each atom pair present in a scoring function. In this work, we use the Random Forest (RF) method to refine a pair potential developed by our laboratory (GARF6) by identifying relevant atom pairs that optimize the performance of the potential on our given task. Our goal is to construct a machine learning (ML) model that can accurately differentiate the native ligand binding pose from candidate poses using a potential refined by RF optimization. We successfully constructed RF models on an unbalanced data set with the ‘comparison’ concept, and the resultant RF models were tested on CASF-2013. In a comparison of the performance of our RF models against 29 scoring functions, we found our models outperformed the other scoring functions in predicting the native pose. In addition, we used two artificially designed potential models to address the importance of the GARF potential in the RF models: (1) a scrambled probability function set, which was obtained by mixing up atom pairs and probability functions in GARF, and (2) a uniform probability function set, which shares the same peak positions with GARF but has fixed peak heights. The accuracy comparison of RF models based on the scrambled, uniform, and original GARF potentials clearly showed that the peak positions in the GARF potential are important while the well depths are not.


2020 ◽  
Vol 27 (6) ◽  
pp. 37-55
Author(s):  
E. V. Zarova ◽  
E. I. Dubravskaya

The topic of quantitative research on informal employment has a consistently high relevance both in the Russian Federation and in other countries due to its high dependence on cyclicality and crisis stages in economic dynamics of countries with any level of economic development. Developing effective government policy measures to overcome the negative impact of informal employment requires special attention in theoretical and applied research to assessing the factors and conditions of informal employment in the Russian Federation, including at the regional level. Such effects of informal employment as a shortfall in taxes, potential losses in production efficiency, and negative social consequences are a concern for the authorities at the federal and regional levels. Development of quantitative indicators to determine the level of informal employment in the regions, taking into account their specifics in the general spatial and economic system of Russia, is necessary to overcome these negative effects. The article proposes and tests methods for solving the problem of assessing the impact of hierarchical relationships on macroeconomic factors at the regional level of informal employment in constituent entities of the Russian Federation. The majority of works on informal employment are based on basic statistical methods of spatial-dynamic analysis, as well as on the now «traditional» methods of cluster and correlation-regression analysis. Without diminishing the merits of these methods, it should be noted that they are somewhat limited in identifying hidden structural connections and interdependencies in such a complex multidimensional phenomenon as informal employment.
In order to substantiate the possibility of overcoming these limitations, the article proposes indicators of regional statistics that directly and indirectly characterize informal employment and also presents the possibilities of using the «random forest» method to identify groups of constituent entities of the Russian Federation that have similar macroeconomic factors of informal employment. The novelty of this method in terms of research objectives is that it allows one to assess the impact of macroeconomic indicators of regional development on the level of informal employment, taking into account the implicit hierarchical relationships of factor indicators that are not predetermined by the initial hypotheses. Based on a generalization of the studies presented in the literature, as well as the authors’ statistical calculations using Rosstat data, the authors concluded that macroeconomic parameters of regional development and systemic relationships of macroeconomic indicators are highly important in explaining the differentiation of the level of informal employment across the constituent entities of the Russian Federation.


2020 ◽  
Vol 27 (3) ◽  
pp. 178-186 ◽  
Author(s):  
Ganesan Pugalenthi ◽  
Varadharaju Nithya ◽  
Kuo-Chen Chou ◽  
Govindaraju Archunan

Background: N-Glycosylation is one of the most important post-translational mechanisms in eukaryotes. N-glycosylation predominantly occurs in N-X-[S/T] sequon where X is any amino acid other than proline. However, not all N-X-[S/T] sequons in proteins are glycosylated. Therefore, accurate prediction of N-glycosylation sites is essential to understand N-glycosylation mechanism. Objective: In this article, our motivation is to develop a computational method to predict N-glycosylation sites in eukaryotic protein sequences. Methods: In this article, we report a random forest method, Nglyc, to predict N-glycosylation site from protein sequence, using 315 sequence features. The method was trained using a dataset of 600 N-glycosylation sites and 600 non-glycosylation sites and tested on the dataset containing 295 N-glycosylation sites and 253 non-glycosylation sites. Nglyc prediction was compared with NetNGlyc, EnsembleGly and GPP methods. Further, the performance of Nglyc was evaluated using human and mouse N-glycosylation sites. Results: Nglyc method achieved an overall training accuracy of 0.8033 with all 315 features. Performance comparison with NetNGlyc, EnsembleGly and GPP methods shows that Nglyc performs better than the other methods with high sensitivity and specificity rate. Conclusion: Our method achieved an overall accuracy of 0.8248 with 0.8305 sensitivity and 0.8182 specificity. Comparison study shows that our method performs better than the other methods. Applicability and success of our method was further evaluated using human and mouse N-glycosylation sites. Nglyc method is freely available at https://github.com/bioinformaticsML/ Ngly.
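The N-X-[S/T] sequon rule described in the Background can be expressed as a short pattern search. The sketch below only locates candidate sequons; it is not the Nglyc predictor, and the function name and example sequence are invented for illustration. It also assumes the input uses one-letter amino acid codes, because the pattern treats any character other than P as a valid X.

```python
import re

# N, then any residue except proline, then S or T.
# The lookahead lets overlapping sequons (e.g. "NNSS") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return 0-based positions of the asparagine in each N-X-[S/T] sequon."""
    return [match.start() for match in SEQUON.finditer(sequence.upper())]


# Hypothetical peptide: NGS matches, NPT is excluded (X is proline),
# and the overlapping NNS and NSS both match.
positions = find_sequons("MNGSANPTNNSSK")
print(positions)  # [1, 8, 9]
```

As the abstract notes, not every sequon found this way is actually glycosylated, which is why a classifier is trained on top of such candidate sites.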


2020 ◽  
Vol 15 (2) ◽  
pp. 121-134 ◽  
Author(s):  
Eunmi Kwon ◽  
Myeongji Cho ◽  
Hayeon Kim ◽  
Hyeon S. Son

Background: The host tropism determinants of influenza virus, which cause changes in the host range and increase the likelihood of interaction with specific hosts, are critical for understanding the infection and propagation of the virus in diverse host species. Methods: Six types of protein sequences of influenza viral strains isolated from three classes of hosts (avian, human, and swine) were obtained. Random forest, naïve Bayes classification, and k-nearest neighbor algorithms were used for host classification. The Java language was used for sequence analysis programming and identifying host-specific position markers. Results: A machine learning technique was explored to derive the physicochemical properties of amino acids used in host classification and prediction. HA protein was found to play the most important role in determining host tropism of the influenza virus, and the random forest method yielded the highest accuracy in host prediction. Conserved amino acids that exhibited host-specific differences were also selected and verified, and they were found to be useful position markers for host classification. Finally, ANOVA analysis and post-hoc testing revealed that the physicochemical properties of amino acids, comprising protein sequences combined with position markers, differed significantly among hosts. Conclusion: The host tropism determinants and position markers described in this study can be used in related research to classify, identify, and predict the hosts of influenza viruses that are currently susceptible or likely to be infected in the future.


2018 ◽  
Author(s):  
Liyan Pan ◽  
Guangjian Liu ◽  
Xiaojian Mao ◽  
Huixian Li ◽  
Jiexin Zhang ◽  
...  

BACKGROUND Central precocious puberty (CPP) in girls seriously affects their physical and mental development in childhood. The method of diagnosis, the gonadotropin-releasing hormone (GnRH)-stimulation test or GnRH analogue (GnRHa)-stimulation test, is expensive and makes patients uncomfortable due to the need for repeated blood sampling. OBJECTIVE We aimed to combine multiple CPP-related features and construct machine learning models to predict response to the GnRHa-stimulation test. METHODS In this retrospective study, we analyzed clinical and laboratory data of 1757 girls who underwent a GnRHa test in order to develop XGBoost and random forest classifiers for prediction of response to the GnRHa test. The local interpretable model-agnostic explanations (LIME) algorithm was used with the black-box classifiers to increase their interpretability. We measured sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) of the models. RESULTS Both the XGBoost and random forest models achieved good performance in distinguishing between positive and negative responses, with the AUC ranging from 0.88 to 0.90, sensitivity ranging from 77.91% to 77.94%, and specificity ranging from 84.32% to 87.66%. Basal serum luteinizing hormone, follicle-stimulating hormone, and insulin-like growth factor-I levels were found to be the three most important factors. In the interpretable models of LIME, the abovementioned variables made high contributions to the prediction probability. CONCLUSIONS The prediction models we developed can help diagnose CPP and may be used as a prescreening tool before the GnRHa-stimulation test.

