Regression, Classification and Ensemble Machine Learning Approaches to Forecasting Clinical Outcomes in Ischemic Stroke

Author(s):  
Ahmedul Kabir ◽  
Carolina Ruiz ◽  
Sergio A. Alvarez ◽  
Majaz Moonis
Author(s):  
Anthony D. McDonald ◽  
Thomas K. Ferris ◽  
Tyler A. Wiener

Objective: The objective of this study was to analyze a set of driver performance and physiological data using advanced machine learning approaches, including feature generation, to determine the best-performing algorithms for detecting driver distraction and predicting the source of distraction. Background: Distracted driving is a causal factor in many vehicle crashes, often resulting in injuries and deaths. As mobile devices and in-vehicle information systems become more prevalent, the ability to detect and mitigate driver distraction becomes more important. Method: This study trained 21 algorithms to identify when drivers were distracted by secondary cognitive and texting tasks. The algorithms included physiological and driving behavioral input processed with a comprehensive feature generation package, Time Series Feature Extraction based on Scalable Hypothesis tests. Results: Results showed that a Random Forest algorithm, trained using only driving behavior measures and excluding driver physiological data, was the highest-performing algorithm for accurately classifying driver distraction. The most important input measures identified were lane offset, speed, and steering, whereas the most important feature types were standard deviation, quantiles, and nonlinear transforms. Conclusion: This work suggests that distraction detection algorithms may be improved by considering ensemble machine learning algorithms that are trained with driving behavior measures and nonstandard features. In addition, the study presents several new indicators of distraction derived from speed and steering measures. Application: Future development of distraction mitigation systems should focus on driver behavior–based algorithms that use complex feature generation techniques.
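
To make the feature types named above concrete, the following is a minimal sketch of window-level feature generation using only the Python standard library. It is not the feature generation package used in the study; the function name `window_features`, the window size, and the lane-offset values are assumptions made purely for illustration.

```python
import statistics

def window_features(signal, window_size):
    """Summarise each non-overlapping window by its standard deviation and quartiles."""
    features = []
    for start in range(0, len(signal) - window_size + 1, window_size):
        window = signal[start:start + window_size]
        # "inclusive" treats the window as the full population of samples
        q25, median, q75 = statistics.quantiles(window, n=4, method="inclusive")
        features.append({
            "std": statistics.pstdev(window),
            "q25": q25,
            "median": median,
            "q75": q75,
        })
    return features

# Hypothetical lane-offset trace (metres): a steady window, then an erratic one
lane_offset = [0.0, 0.1, -0.1, 0.0, 0.5, -0.6, 0.7, -0.4]
feats = window_features(lane_offset, window_size=4)
```

Each dictionary in `feats` would form one row of the input to a classifier such as a Random Forest; in this toy trace the second window has the larger standard deviation.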


Author(s):  
Rowland W. Pettit ◽  
Robert Fullem ◽  
Chao Cheng ◽  
Christopher I. Amos

AI is a broad concept, grouping initiatives that use a computer to perform tasks that would usually require a human to complete. AI methods are well suited to predict clinical outcomes. In practice, AI methods can be thought of as functions that learn the outcomes accompanying standardized input data to produce accurate outcome predictions when trialed with new data. Current methods for cleaning, creating, accessing, extracting, augmenting, and representing data for training AI clinical prediction models are well defined. The use of AI to predict clinical outcomes is a dynamic and rapidly evolving arena, with new methods and applications emerging. Extraction or accession of electronic health care records and combining these with patient genetic data is an area of present attention, with tremendous potential for future growth. Machine learning approaches, including decision tree methods of Random Forest and XGBoost, and deep learning techniques including deep multi-layer and recurrent neural networks, afford unique capabilities to accurately create predictions from high dimensional, multimodal data. Furthermore, AI methods are increasing our ability to accurately predict clinical outcomes that previously were difficult to model, including time-dependent and multi-class outcomes. Barriers to robust AI-based clinical outcome model deployment include changing AI product development interfaces, the specificity of regulation requirements, and limitations in ensuring model interpretability, generalizability, and adaptability over time.


Genes ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 137
Author(s):  
Supatcha Lertampaiporn ◽  
Tayvich Vorapreeda ◽  
Apiradee Hongsthong ◽  
Chinae Thammarongtham

Antimicrobial peptides (AMPs) are natural peptides possessing antimicrobial activities. These peptides are important components of the innate immune system. They are found in various organisms. AMP screening and identification by experimental techniques are laborious and time-consuming tasks. Alternatively, computational methods based on machine learning have been developed to screen potential AMP candidates prior to experimental verification. Although various AMP prediction programs are available, there is still a need for improvement to reduce false positives (FPs) and to increase the predictive accuracy. In this work, several well-known single and ensemble machine learning approaches have been explored and evaluated based on balanced training datasets and two large testing datasets. We have demonstrated that the developed program with various predictive models has high performance in differentiating between AMPs and non-AMPs. Thus, we describe the development of a program for the prediction and recognition of AMPs using MaxProbVote, which is an ensemble model. Moreover, to increase prediction efficiency, the ensemble model was integrated with a new hybrid feature based on logistic regression. The ensemble model integrated with the hybrid feature can effectively increase the prediction sensitivity of the developed program called Ensemble-AMPPred, resulting in overall improvements in terms of both sensitivity and specificity compared to those of currently available programs.
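
The abstract does not define MaxProbVote. One plausible reading of a maximum-probability vote is that the ensemble returns the label backed by the single most confident base model. The sketch below implements that reading only; the function name, the labels, and the probabilities are hypothetical and do not come from Ensemble-AMPPred.

```python
def max_prob_vote(model_probabilities):
    """Return the (label, probability) pair with the highest probability
    across all base models. This is an assumed reading of a max-probability vote."""
    best_label, best_prob = None, -1.0
    for probs in model_probabilities:
        for label, p in probs.items():
            if p > best_prob:
                best_label, best_prob = label, p
    return best_label, best_prob

# Hypothetical class probabilities from three base models for one peptide
predictions = [
    {"AMP": 0.6, "non-AMP": 0.4},
    {"AMP": 0.1, "non-AMP": 0.9},
    {"AMP": 0.7, "non-AMP": 0.3},
]
label, confidence = max_prob_vote(predictions)
```

In this example a simple majority vote would pick "AMP" (two models out of three), whereas the maximum-probability rule follows the one highly confident model and picks "non-AMP".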


2020 ◽  
Author(s):  
Yulan Liang ◽  
Amin Gharipour ◽  
Erik Kelemen ◽  
Arpad Kelemen

Background: The identification of important proteins is critical for medical diagnosis and prognosis in common diseases. Diverse sets of computational tools have been developed for omics data reduction and protein selection. However, standard statistical models with single feature selection carry a multiple-testing burden and have low power with the limited samples available. Furthermore, high correlations among proteins with high redundancy and moderate effects often lead to unstable selections and cause reproducibility issues. Ensemble feature selection in machine learning may identify a stable set of disease biomarkers that could improve the prediction performance of subsequent classification models and thereby simplify their interpretation. In this study, we developed a three-stage homogeneous ensemble feature selection approach for both identifying proteins and improving prediction accuracy. This approach was implemented and applied to ovarian cancer proteogenomics data sets with two outcomes: 1) binary putative homologous recombination deficiency, positive or negative; and 2) multiple mRNA classes (differentiated, proliferative, immunoreactive, mesenchymal, and unknown). We conducted and compared various machine learning approaches with homogeneous ensemble feature selection, including random forest, support vector machine, and neural network, for predicting both binary and multiple class outcomes. Various performance criteria, including sensitivity, specificity, and kappa statistics, were used to assess prediction consistency and accuracy.
Results: With the proposed three-stage homogeneous ensemble feature selection approach, prediction accuracy can be improved with limited samples by continuously reducing errors and redundancy; for example, Treebag provided 83% prediction accuracy (85% sensitivity and 81% specificity) for binary ovarian outcomes. For mRNA multi-class classification, our approach provided even better accuracy with increased sample size.
Conclusions: Despite the different prediction accuracies of the various models, the proposed homogeneous ensemble feature selection identified consistent sets of top-ranked important markers out of 9606 proteins linked to the binary disease and multiple mRNA class outcomes.
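
The general idea of homogeneous ensemble feature selection, applying the same selector to many resamples and keeping the features that are chosen consistently, can be sketched as follows. This is not the authors' three-stage procedure: the scoring rule (absolute gap between class means), the bootstrap scheme, the function names, and the data are all assumptions chosen to keep the example small.

```python
import random
from collections import Counter

def mean_difference_scores(rows, labels):
    """Score each feature by the absolute gap between the two class means."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    return [
        abs(sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg))
        for j in range(len(rows[0]))
    ]

def ensemble_select(rows, labels, top_k=1, n_rounds=50, seed=0):
    """Count how often each feature lands in the top_k across bootstrap resamples."""
    rng = random.Random(seed)
    counts = Counter()
    n = len(rows)
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        sample_labels = [labels[i] for i in idx]
        if len(set(sample_labels)) < 2:
            continue  # skip resamples that contain only one class
        scores = mean_difference_scores([rows[i] for i in idx], sample_labels)
        ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
        counts.update(ranked[:top_k])
    return counts

# Hypothetical data: feature 0 separates the classes, features 1 and 2 are noise
rows = [
    [5.1, 0.2, 0.9], [4.9, 0.8, 0.1], [5.2, 0.5, 0.4], [5.0, 0.3, 0.7],
    [0.1, 0.6, 0.3], [-0.2, 0.1, 0.8], [0.0, 0.9, 0.5], [0.2, 0.4, 0.2],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
stability = ensemble_select(rows, labels)
```

Features with high counts in `stability` are the ones selected consistently across resamples, which is the stability property the abstract is concerned with.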


2021 ◽  
Vol 12 (1) ◽  
pp. 60
Author(s):  
Samuel Ndichu ◽  
Sangwook Kim ◽  
Seiichi Ozawa ◽  
Tao Ban ◽  
Takeshi Takahashi ◽  
...  

Attacks using Uniform Resource Locators (URLs) and their JavaScript (JS) code content to perpetrate malicious activities on the Internet are rampant and continuously evolving. Methods such as blocklisting, client honeypots, domain reputation inspection, and heuristic and signature-based systems are used to detect these malicious activities. Recently, machine learning approaches have been proposed; however, challenges still exist. First, blocklist systems are easily evaded by new URLs and JS code content, obfuscation, fast-flux, cloaking, and URL shortening. Second, heuristic and signature-based systems do not generalize well to zero-day attacks. Third, the Domain Name System allows cybercriminals to easily migrate their malicious servers to hide their Internet protocol addresses behind domain names. Finally, crafting fully representative features is challenging, even for domain experts. This study proposes a feature selection and classification approach for malicious JS code content using Shapley additive explanations and tree ensemble methods. The JS code features are obtained from the Abstract Syntax Tree form of the JS code, sample JS attack codes, and association rule mining. The malicious and benign JS code datasets obtained from Hynek Petrak and the Majestic Million Service were used for performance evaluation. We compared the performance of the proposed method to those of other feature selection methods in the task of malicious JS code content detection. With a recall of 0.9989, our experimental results show that the proposed approach is a better prediction model.
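
Shapley additive explanations build on Shapley values, which attribute a model's output to individual features by averaging each feature's marginal contribution over all feature subsets. The sketch below computes exact Shapley values by brute-force enumeration for a toy scoring function and keeps the features whose attribution exceeds a threshold. It does not use the SHAP library or a tree ensemble; the feature names, the scoring function, and the threshold are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley value of each feature for a set-valued scoring function."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi

def toy_score(subset):
    """Hypothetical detection score when only the features in `subset` are used."""
    score = 0.0
    if "eval_calls" in subset:
        score += 0.5
    if "unescape_calls" in subset:
        score += 0.2
    if "eval_calls" in subset and "unescape_calls" in subset:
        score += 0.2  # interaction: the two features are more useful together
    return score  # "comment_count" never changes the score

features = ["eval_calls", "unescape_calls", "comment_count"]
phi = shapley_values(features, toy_score)
selected = [f for f in features if phi[f] > 0.05]
```

The interaction term is split evenly between the two features that produce it, and the uninformative feature receives zero attribution, so it is dropped by the threshold. Brute-force enumeration grows exponentially with the number of features, which is why practical tools rely on model-specific approximations or shortcuts.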

