Deducing Optimal Classification Algorithm for Heterogeneous Fabric

Author(s):  
Omar Alfarisi ◽  
Zeyar Aung ◽  
Mohamed Sassi

Choosing the optimal machine learning algorithm for a given task is not an easy decision. To help future researchers, this paper identifies the best performer among several leading algorithms. We built a synthetic data set and ran supervised machine learning with five different algorithms. For heterogeneous rock fabric, Random Forest proved the most appropriate of the algorithms tested.
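The workflow in this abstract (build a synthetic data set, train several supervised classifiers, compare them) can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not list the five algorithms or describe the data, so the two-cluster data set and the two simple classifiers below are hypothetical stand-ins, not the authors' setup.

```python
import random

def make_synthetic(n_per_class=100, seed=0):
    """Two Gaussian clusters in 2-D (a hypothetical stand-in data set)."""
    rng = random.Random(seed)
    data = []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (4.0, 4.0)]):
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    rng.shuffle(data)
    return data

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_centroid(train):
    """Return a predictor that picks the class whose mean point is closest."""
    centroids = {}
    for label in {y for _, y in train}:
        pts = [x for x, y in train if y == label]
        centroids[label] = tuple(sum(c) / len(pts) for c in zip(*pts))
    return lambda x: min(centroids, key=lambda lab: sq_dist(x, centroids[lab]))

def one_nearest_neighbor(train):
    """Return a predictor that copies the label of the closest training point."""
    return lambda x: min(train, key=lambda item: sq_dist(x, item[0]))[1]

def cross_val_accuracy(fit, data, k=5):
    """Mean held-out accuracy over k folds."""
    scores = []
    for i in range(k):
        test = data[i::k]
        train = [item for j, item in enumerate(data) if j % k != i]
        predict = fit(train)
        scores.append(sum(predict(x) == y for x, y in test) / len(test))
    return sum(scores) / k

data = make_synthetic()
results = {
    "nearest_centroid": cross_val_accuracy(nearest_centroid, data),
    "one_nearest_neighbor": cross_val_accuracy(one_nearest_neighbor, data),
}
best = max(results, key=results.get)
print(results, "best:", best)
```

Scoring every candidate on the same held-out folds keeps the comparison fair. Extending the comparison only requires adding another function that takes training data and returns a predictor.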

2021 ◽  
Author(s):  
Omar Alfarisi ◽  
Zeyar Aung ◽  
Mohamed Sassi

Choosing the optimal machine learning algorithm for a given task is not an easy decision. To help future researchers, this paper identifies the best performer among several leading algorithms. We built a synthetic data set and ran supervised machine learning with five different algorithms. For heterogeneity, Random Forest proved the best of the algorithms tested.


2021 ◽  
Vol 8 (3) ◽  
pp. 209-221
Author(s):  
Li-Li Wei ◽  
Yue-Shuai Pan ◽  
Yan Zhang ◽  
Kai Chen ◽  
Hao-Yu Wang ◽  
...  

Abstract. Objective: To study the application of a machine learning algorithm for predicting gestational diabetes mellitus (GDM) in early pregnancy. Methods: Indicators related to GDM were identified through a literature review and expert discussion. Pregnant women who attended medical institutions for an antenatal examination from November 2017 to August 2018 were selected, and the collected indicators were analyzed retrospectively. Using Python, the indicators were classified and modeled with a random forest regression algorithm, and the performance of the prediction model was evaluated. Results: We obtained 4806 analyzable samples from 1625 pregnant women. Of these, 3265 samples containing all 67 indicators formed data set F1, and all 4806 samples with the 38 indicators they had in common formed data set F2. A random forest model was trained on each of F1 and F2. The F1 model had an overall predictive accuracy of 93.10%, an area under the receiver operating characteristic curve (AUC) of 0.66, and a predictive accuracy for GDM-positive cases of 37.10%. The corresponding values for the F2 model were 88.70%, 0.87, and 79.44%, so the F2 model performed better than the F1 model. To explore the impact of the sacrificed indicators on GDM prediction, data set F3 was built from the 3265 samples of F1 restricted to the 38 indicators of F2. After training, the F3 model had an overall predictive accuracy of 91.60%, an AUC of 0.58, and a predictive accuracy for positive cases of 15.85%. Conclusions: A model for predicting GDM from several input variables (e.g., physical examination, past history, personal history, family history, and laboratory indicators) was established using a random forest regression algorithm. The trained prediction model performed well and is valuable as a reference for predicting GDM in women at an early stage of pregnancy. In addition, the proportions of negative and positive cases in the sample data set must meet certain requirements when the random forest algorithm is applied to the early prediction of GDM.
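The gap between high overall accuracy and low accuracy on positive cases, as reported for F1 and F3, is typical of imbalanced data. The sketch below shows the arithmetic. Its counts are hypothetical and not taken from the study, and it assumes that "predictive accuracy of positive cases" means the share of truly positive cases predicted positive, which the abstract does not define.

```python
def summarize(tp, fn, fp, tn):
    """Overall accuracy and positive-case accuracy from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Share of truly positive cases that the model caught (assumed definition).
    positive_accuracy = tp / (tp + fn)
    return accuracy, positive_accuracy

# Hypothetical counts: 1000 samples, of which only 50 (5%) are positive.
acc, pos_acc = summarize(tp=10, fn=40, fp=20, tn=930)
print(f"overall accuracy:       {acc:.3f}")
print(f"positive-case accuracy: {pos_acc:.3f}")
```

With so few positives, a model can be right on most samples simply by favoring the negative class, which is why the class proportions in the training data matter.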


2007 ◽  
Vol 25 (18_suppl) ◽  
pp. 15107-15107
Author(s):  
R. V. Iyer ◽  
B. Tennant ◽  
M. Ruiz ◽  
T. Szyperski ◽  
D. Trump ◽  
...  

15107 Background: HCC is a common and rapidly fatal cancer. Current screening tools are inadequate for identifying potentially curable cases. Our aim was to determine whether H-NMR can distinguish HCC from controls in the woodchuck (WC) model of hepatitis-related HCC. Methods: Eastern WCs were bred and inoculated at birth with dilute sera from WCs that are chronic carriers of Woodchuck Hepatitis B Virus (WHV). This resulted in chronic hepatitis in ∼60% of animals, and all carriers developed HCC by 24–36 months. Serum was obtained from 10 chronic WHV carriers with HCC (group 1), 5 WHV carriers with no HCC (group 2), and 15 matched non-infected controls (group 3). 45 µL of serum was diluted with 5 µL of D2O containing 27 mM formic acid + 0.9% saline. Spectra were collected on a 600 MHz INOVA spectrometer using a CapNMR flow probe with a 10 µL flow cell at 298 K, without knowledge of group assignments. The resulting 1D spectra were processed using Nuts from AcornNMR. Results: Principal component analysis and supervised PLS-DA were performed using Simca P+ from Umetrics. Despite general separation of the groups, the Q2 value of this model was relatively low (0.20). We trained a Support Vector Machine (SVM), a supervised machine-learning algorithm, to identify the groups. Evaluated with 10-fold cross-validation on the data set, the algorithm achieved a Kappa value of 0.43. It learned to identify HCC [0.765 ROC, 0.8 sensitivity, and 0.727 positive predictive value (PPV)] and controls (0.75 ROC, 0.69 sensitivity, and 0.73 PPV) but not the WHV carrier group, likely because of the small number of animals. In a second analysis of 10 HCC and 15 controls, PLS-DA showed clear separation using three components (Q2 = 0.5). The corresponding SVM model showed a Kappa value of 0.52 and ROC values of 0.767 for both classes.
Conclusions: Our preliminary results indicate that H-NMR spectra alone can be used to distinguish HCC from healthy controls using the machine-learning algorithm for classification. Further validation in a larger cohort of woodchucks is ongoing and confirmation of these preliminary findings would support investigation of this technique as a screening tool in patients at risk for developing HCC. No significant financial relationships to disclose.
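The Kappa, sensitivity, and PPV figures quoted above can all be derived from a confusion matrix. The sketch below shows the standard formulas on a hypothetical two-class matrix. The counts are illustrative only and do not reconstruct the study's results.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows = true, columns = predicted)."""
    n = sum(sum(row) for row in matrix)
    size = len(matrix)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(size)) / n
    # Chance agreement: product of row and column totals for each class.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / (n * n)
    return (observed - expected) / (1 - expected)

def sensitivity_and_ppv(matrix, cls):
    """Sensitivity (recall) and positive predictive value for one class."""
    true_positive = matrix[cls][cls]
    actual = sum(matrix[cls])                    # all samples truly in cls
    predicted = sum(row[cls] for row in matrix)  # all samples predicted as cls
    return true_positive / actual, true_positive / predicted

# Hypothetical counts: class 0 = disease, class 1 = control.
matrix = [[40, 10],
          [5, 45]]
kappa = cohen_kappa(matrix)
sens, ppv = sensitivity_and_ppv(matrix, 0)
print(f"kappa={kappa:.3f} sensitivity={sens:.3f} ppv={ppv:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, so it is more informative than accuracy when group sizes differ, as they do in the study.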


Large volumes of data are now stored across many fields, a phenomenon known as big data. In healthcare, big data includes the clinical records of every patient in huge quantities, maintained as Electronic Health Records (EHR). More than 80% of clinical data is unstructured and stored in hundreds of forms. The challenge in storing and analyzing such data is handling large data sets efficiently and scalably. The Hadoop MapReduce framework can store and process any kind of data quickly. It is not only a storage system but also a platform for data processing, and it is scalable and fault-tolerant. Prediction on the data sets is handled by a machine learning algorithm. This work focuses on the Extreme Learning Machine (ELM) algorithm and seeks an optimized solution for disease risk prediction by combining ELM with a Cuckoo Search optimization-based Support Vector Machine (CS-SVM). The proposed work also considers the scalability and accuracy of big data models; the proposed algorithm handles the computing workload well and achieves good results in both veracity and efficiency.


2020 ◽  
Vol 10 (1) ◽  
pp. 1-11
Author(s):  
Arvind Shrivastava ◽  
Nitin Kumar ◽  
Kuldeep Kumar ◽  
Sanjeev Gupta

The paper applies Random Forest, a popular machine learning classification algorithm, to predict bankruptcy (distress) for Indian firms. Random Forest orders firms according to their propensity to default, or their likelihood of becoming distressed. It is also useful for explaining the association between a firm's tendency to fail and its features. The results are analyzed vis-à-vis Tree Net, a cutting-edge data mining tool known to provide satisfactory estimation results; both in-sample and out-of-sample estimations were performed to compare the two. The analysis covers an exhaustive data set comprising companies from varied sectors. Tree Net is found to provide consistently better classification and predictive performance than Random Forest, a result that industry analysts and researchers alike may use for predictive purposes.
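One common way a random forest orders items by propensity is to score each item by the fraction of trees that vote for the positive class. The sketch below illustrates that idea only. The five-tree votes and firm names are hypothetical, and the abstract does not say how the paper derives its ordering.

```python
def distress_scores(tree_votes):
    """Map each firm to the fraction of trees that voted 'distressed' (1)."""
    return {firm: sum(votes) / len(votes) for firm, votes in tree_votes.items()}

# Hypothetical votes from a five-tree forest; firm names are placeholders.
tree_votes = {
    "firm_a": [1, 1, 0, 1, 1],
    "firm_b": [0, 0, 0, 1, 0],
    "firm_c": [1, 0, 1, 0, 1],
}

scores = distress_scores(tree_votes)
# Highest vote fraction first: the firms the forest considers most at risk.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

A vote fraction gives a graded score instead of a bare yes/no label, which is what makes an ordering of firms possible.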

