Distinguishing Focal Cortical Dysplasia From Glioneuronal Tumors in Patients With Epilepsy by Machine Learning

Purpose: We are aiming to build a supervised machine learning-based classifier, in order to preoperatively distinguish focal cortical dysplasia (FCD) from glioneuronal tumors (GNTs) in patients with epilepsy.Methods: This retrospective study was comprised of 96 patients who underwent epilepsy surgery, with the final neuropathologic diagnosis of either an FCD or GNTs. Seven classical machine learning algorithms (i.e., Random Forest, SVM, Decision Tree, Logistic Regression, XGBoost, LightGBM, and CatBoost) were employed and trained by our dataset to get the classification model. Ten features [i.e., Gender, Past history, Age at seizure onset, Course of disease, Seizure type, Seizure frequency, Scalp EEG biomarkers, MRI features, Lesion location, Number of antiepileptic drug (AEDs)] were analyzed in our study.Results: We enrolled 56 patients with FCD and 40 patients with GNTs, which included 29 with gangliogliomas (GGs) and 11 with dysembryoplasic neuroepithelial tumors (DNTs). Our study demonstrated that the Random Forest-based machine learning model offered the best predictive performance on distinguishing the diagnosis of FCD from GNTs, with an F1-score of 0.9180 and AUC value of 0.9340. Furthermore, the most discriminative factor between FCD and GNTs was the feature “age at seizure onset” with the Chi-square value of 1,213.0, suggesting that patients who had a younger age at seizure onset were more likely to be diagnosed as FCD.Conclusion: The Random Forest-based machine learning classifier can accurately differentiate FCD from GNTs in patients with epilepsy before surgery. This might lead to improved clinician confidence in appropriate surgical planning and treatment outcomes.

Download Full-text

Towards Near-Real-Time Intrusion Detection for IoT Devices using Supervised Learning and Apache Spark

Electronics ◽

10.3390/electronics9030444 ◽

2020 ◽

Vol 9 (3) ◽

pp. 444 ◽

Cited By ~ 1

Author(s):

Valerio Morfino ◽

Salvatore Rampone

Keyword(s):

Machine Learning ◽

Random Forest ◽

Learning Algorithms ◽

Hybrid Approach ◽

Cyber Attacks ◽

Machine Learning Algorithms ◽

Apache Spark ◽

Identification Accuracy ◽

Supervised Machine Learning ◽

Iot Devices

In the fields of Internet of Things (IoT) infrastructures, attack and anomaly detection are rising concerns. With the increased use of IoT infrastructure in every domain, threats and attacks in these infrastructures are also growing proportionally. In this paper the performances of several machine learning algorithms in identifying cyber-attacks (namely SYN-DOS attacks) to IoT systems are compared both in terms of application performances, and in training/application times. We use supervised machine learning algorithms included in the MLlib library of Apache Spark, a fast and general engine for big data processing. We show the implementation details and the performance of those algorithms on public datasets using a training set of up to 2 million instances. We adopt a Cloud environment, emphasizing the importance of the scalability and of the elasticity of use. Results show that all the Spark algorithms used result in a very good identification accuracy (>99%). Overall, one of them, Random Forest, achieves an accuracy of 1. We also report a very short training time (23.22 sec for Decision Tree with 2 million rows). The experiments also show a very low application time (0.13 sec for over than 600,000 instances for Random Forest) using Apache Spark in the Cloud. Furthermore, the explicit model generated by Random Forest is very easy-to-implement using high- or low-level programming languages. In light of the results obtained, both in terms of computation times and identification performance, a hybrid approach for the detection of SYN-DOS cyber-attacks on IoT devices is proposed: the application of an explicit Random Forest model, implemented directly on the IoT device, along with a second level analysis (training) performed in the Cloud.

Download Full-text

Detecting Real-Time Fall of Elderly People Using Machine Learning

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.39635 ◽

2021 ◽

Vol 9 (12) ◽

pp. 1913-1918

Author(s):

Prathima P

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Elderly People ◽

Fall Detection ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector ◽

False Alarms ◽

Severe Injuries

Abstract: Fall is a significant national health issue for the elderly people, generally resulting in severe injuries when the person lies down on the floor over an extended period without any aid after experiencing a great fall. Thus, elders need to be cared very attentively. A supervised-machine learning based fall detection approach with accelerometer, gyroscope is devised. The system can detect falls by grouping different actions as fall or non-fall events and the care taker is alerted immediately as soon as the person falls. The public dataset SisFall with efficient class of features is used to identify fall. The Random Forest (RF) and Support Vector Machine (SVM) machine learning algorithms are employed to detect falls with lesser false alarms. The SVM algorithm obtain a highest accuracy of 99.23% than RF algorithm. Keywords: Fall detection, Machine learning, Supervised classification, Sisfall, Activities of daily living, Wearable sensors, Random Forest, Support Vector Machine

Download Full-text

A Self-Supervised Machine Learning Approach for Objective Live Cell Segmentation and Analysis

10.21203/rs.3.rs-147010/v1 ◽

2021 ◽

Author(s):

Marc Raphael ◽

Michael Robitaille ◽

Jeff Byers ◽

Joseph Christodoulides

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Learning Algorithms ◽

Label Cell ◽

Live Cell ◽

Machine Learning Algorithms ◽

Classification Model ◽

Supervised Machine Learning ◽

Cell Segmentation ◽

Data Set

Abstract Machine learning algorithms hold the promise of greatly improving live cell image analysis by way of (1) analyzing far more imagery than can be achieved by more traditional manual approaches and (2) by eliminating the subjective nature of researchers and diagnosticians selecting the cells or cell features to be included in the analyzed data set. Currently, however, even the most sophisticated model based or machine learning algorithms require user supervision, meaning the subjectivity problem is not removed but rather incorporated into the algorithm’s initial training steps and then repeatedly applied to the imagery. To address this roadblock, we have developed a self-supervised machine learning algorithm that recursively trains itself directly from the live cell imagery data, thus providing objective segmentation and quantification. The approach incorporates an optical flow algorithm component to self-label cell and background pixels for training, followed by the extraction of additional feature vectors for the automated generation of a cell/background classification model. Because it is self-trained, the software has no user-adjustable parameters and does not require curated training imagery. The algorithm was applied to automatically segment cells from their background for a variety of cell types and five commonly used imaging modalities - fluorescence, phase contrast, differential interference contrast (DIC), transmitted light and interference reflection microscopy (IRM). The approach is broadly applicable in that it enables completely automated cell segmentation for long-term live cell phenotyping applications, regardless of the input imagery’s optical modality, magnification or cell type.

Download Full-text

A Self-Supervised Machine Learning Approach for Objective Live Cell Segmentation and Analysis

10.1101/2021.01.07.425773 ◽

2021 ◽

Author(s):

Michael C. Robitaille ◽

Jeff M. Byers ◽

Joseph A. Christodoulides ◽

Marc P. Raphael

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Learning Algorithms ◽

Label Cell ◽

Live Cell ◽

Machine Learning Algorithms ◽

Classification Model ◽

Supervised Machine Learning ◽

Cell Segmentation ◽

Data Set

Machine learning algorithms hold the promise of greatly improving live cell image analysis by way of (1) analyzing far more imagery than can be achieved by more traditional manual approaches and (2) by eliminating the subjective nature of researchers and diagnosticians selecting the cells or cell features to be included in the analyzed data set. Currently, however, even the most sophisticated model based or machine learning algorithms require user supervision, meaning the subjectivity problem is not removed but rather incorporated into the algorithm's initial training steps and then repeatedly applied to the imagery. To address this roadblock, we have developed a self-supervised machine learning algorithm that recursively trains itself directly from the live cell imagery data, thus providing objective segmentation and quantification. The approach incorporates an optical flow algorithm component to self-label cell and background pixels for training, followed by the extraction of additional feature vectors for the automated generation of a cell/background classification model. Because it is self-trained, the software has no user-adjustable parameters and does not require curated training imagery. The algorithm was applied to automatically segment cells from their background for a variety of cell types and five commonly used imaging modalities - fluorescence, phase contrast, differential interference contrast (DIC), transmitted light and interference reflection microscopy (IRM). The approach is broadly applicable in that it enables completely automated cell segmentation for long-term live cell phenotyping applications, regardless of the input imagery's optical modality, magnification or cell type.

Download Full-text

Machine learning in the diagnosis of Myocardial Infarction with Non-Obstructive Coronary Arteries

European Heart Journal ◽

10.1093/eurheartj/ehab724.3067 ◽

2021 ◽

Vol 42 (Supplement_1) ◽

Author(s):

M J Espinosa Pascual ◽

P Vaquero Martinez ◽

V Vaquero Martinez ◽

J Lopez Pais ◽

B Izquierdo Coronel ◽

...

Keyword(s):

Machine Learning ◽

Myocardial Infarction ◽

Support Vector Machine ◽

Logistic Regression ◽

Random Forest ◽

Obstructive Coronary Artery Disease ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Classification Model ◽

Support Vector

Abstract Introduction Out of all patients admitted with Myocardial Infarction, 10 to 15% have Myocardial Infarction with Non-Obstructive Coronaries Arteries (MINOCA). Classification algorithms based on deep learning substantially exceed traditional diagnostic algorithms. Therefore, numerous machine learning models have been proposed as useful tools for the detection of various pathologies, but to date no study has proposed a diagnostic algorithm for MINOCA. Purpose The aim of this study was to estimate the diagnostic accuracy of several automated learning algorithms (Support-Vector Machine [SVM], Random Forest [RF] and Logistic Regression [LR]) to discriminate between people suffering from MINOCA from those with Myocardial Infarction with Obstructive Coronary Artery Disease (MICAD) at the time of admission and before performing a coronary angiography, whether invasive or not. Methods A Diagnostic Test Evaluation study was carried out applying the proposed algorithms to a database constituted by 553 consecutive patients admitted to our Hospital with Myocardial Infarction. According to the definitions of 2016 ESC Position Paper on MINOCA, patients were classified into two groups: MICAD and MINOCA. Out of the total 553 patients, 214 were discarded due to the lack of complete data. The set of machine learning algorithms was trained on 244 patients (training sample: 75%) and tested on 80 patients (test sample: 25%). A total of 64 variables were available for each patient, including demographic, clinical and laboratorial features before the angiographic procedure. Finally, the diagnostic precision of each architecture was taken. Results The most accurate classification model was the Random Forest algorithm (Specificity [Sp] 0.88, Sensitivity [Se] 0.57, Negative Predictive Value [NPV] 0.93, Area Under the Curve [AUC] 0.85 [CI 0.83–0.88]) followed by the standard Logistic Regression (Sp 0.76, Se 0.57, NPV 0.92 AUC 0.74 and Support-Vector Machine (Sp 0.84, Se 0.38, NPV 0.90, AUC 0.78) (see graph). The variables that contributed the most in order to discriminate a MINOCA from a MICAD were the traditional cardiovascular risk factors, biomarkers of myocardial injury, hemoglobin and gender. Results were similar when the 19 patients with Takotsubo syndrome were excluded from the analysis. Conclusion A prediction system for diagnosing MINOCA before performing coronary angiographies was developed using machine learning algorithms. Results show higher accuracy of diagnosing MINOCA than conventional statistical methods. This study supports the potential of machine learning algorithms in clinical cardiology. However, further studies are required in order to validate our results. FUNDunding Acknowledgement Type of funding sources: None. ROC curves of different algorithms

Download Full-text

Deducing of Optimal Machine Learning Algorithms for Heterogeneity

10.36227/techrxiv.17162147 ◽

2021 ◽

Author(s):

Omar Alfarisi ◽

Zeyar Aung ◽

Mohamed Sassi

Keyword(s):

Machine Learning ◽

Random Forest ◽

Learning Algorithm ◽

Learning Algorithms ◽

Synthetic Data ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Machine Learning Algorithm ◽

Data Set ◽

Optimal Machine

For defining the optimal machine learning algorithm, the decision was not easy for which we shall choose. To help future researchers, we describe in this paper the optimal among the best of the algorithms. We built a synthetic data set and performed the supervised machine learning runs for five different algorithms. For heterogeneity, we identified Random Forest, among others, to be the best algorithm.

Download Full-text

Machine Learning Can Assign Geologic Basin to Produced Water Samples Using Major Ion Geochemistry

Natural Resources Research ◽

10.1007/s11053-021-09949-8 ◽

2021 ◽

Author(s):

Jenna L. Shelton ◽

Aaron M. Jubb ◽

Samuel W. Saxe ◽

Emil D. Attanasi ◽

Alexei V. Milkov ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Produced Water ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Validation Dataset ◽

Ion Chemistry ◽

Major Ion Chemistry ◽

Major Ion ◽

Produced Waters

AbstractUnderstanding the geochemistry of waters produced during petroleum extraction is essential to informing the best treatment and reuse options, which can potentially be optimized for a given geologic basin. Here, we used the US Geological Survey’s National Produced Waters Geochemical Database (PWGD) to determine if major ion chemistry could be used to classify accurately a produced water sample to a given geologic basin based on similarities to a given training dataset. Two datasets were derived from the PWGD: one with seven features but more samples (PWGD7), and another with nine features but fewer samples (PWGD9). The seven-feature dataset, prior to randomly generating a training and testing (i.e., validation) dataset, had 58,541 samples, 20 basins, and was classified based on total dissolved solids (TDS), bicarbonate (HCO3), Ca, Na, Cl, Mg, and sulfate (SO4). The nine-feature dataset, prior to randomly splitting into a training and testing (i.e., validation) dataset, contained 33,271 samples, 19 basins, and was classified based on TDS, HCO3, Ca, Na, Cl, Mg, SO4, pH, and specific gravity. Three supervised machine learning algorithms—Random Forest, k-Nearest Neighbors, and Naïve Bayes—were used to develop multi-class classification models to predict a basin of origin for produced waters using major ion chemistry. After training, the models were tested on three different datasets: Validation7, Validation9, and one based on data absent from the PWGD. Prediction accuracies across the models ranged from 23.5 to 73.5% when tested on the two PWGD-based datasets. A model using the Random Forest algorithm predicted most accurately compared to all other models tested. The models generally predicted basin of origin more accurately on the PWGD7-based dataset than on the PWGD9-based dataset. An additional dataset, which contained data not in the PWGD, was used to test the most accurate model; results suggest that some basins may lack geochemical diversity or may not be well described, while others may be geochemically diverse or are well described. A compelling result of this work is that a produced water basin of origin can be determined using major ions alone and, therefore, deep basinal fluid compositions may not be as variable within a given basin as previously thought. Applications include predicting the geochemistry of produced fluid prior to drilling at different intervals and assigning historical produced water data to a producing basin.

Download Full-text

Leveraging Machine Learning Algorithms For Zero-Day Ransomware Attack

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f8694.088619 ◽

2019 ◽

Vol 8 (6) ◽

pp. 4104-4107

Keyword(s):

Machine Learning ◽

Random Forest ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector ◽

K Nearest Neighbor ◽

Supervised Learning Algorithms ◽

Microsoft Windows

Current global huge cyber protection attacks resulting from Infected Encryption ransomware structures over all international locations and businesses with millions of greenbacks lost in paying compulsion abundance. This type of malware encrypts consumer files, extracts consumer files, and charges higher ransoms to be paid for decryption of keys. An attacker could use different types of ransomware approach to steal a victim's files. Some of ransomware attacks like Scareware, Mobile ransomware, WannaCry, CryptoLocker, Zero-Day ransomware attack etc. A zero-day vulnerability is a software program security flaw this is regarded to the software seller however doesn’t have patch in vicinity to restore a flaw. Despite the fact that machine learning algorithms are already used to find encryption Ransomware. This is based on the analysis of a large number of PE file data Samples (benign software and ransomware utility) makes use of supervised machine learning algorithms for ascertain Zero-day attacks. This work was done on a Microsoft Windows operating system (the most attacked os through encryption ransomware) and estimated it. We have used four Supervised learning Algorithms, Random Forest Classifier , K-Nearest Neighbor, Support Vector Machine and Logistic Regression. Tests using machine learning algorithms evaluate almost null false positives with a 99.5% accuracy with a random forest algorithm.

Download Full-text

Attribute-oriented Classification with Variable Importance using Random Forest Model

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1297.0782s319 ◽

2019 ◽

Vol 8 (2S3) ◽

pp. 1630-1635

Keyword(s):

Machine Learning ◽

Random Forest ◽

Learning Algorithm ◽

Large Data ◽

Variable Importance ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Importance Measure ◽

Variable Importance Measure

In the present century, various classification issues are raised with large data and most commonly used machine learning algorithms are failed in the classification process to get accurate results. Datamining techniques like ensemble, which is made up of individual classifiers for the classification process and to generate the new data as well. Random forest is one of the ensemble supervised machine learning technique and essentially used in numerous machine learning applications such as the classification of text and image data. It is popular since it collects more relevant features such as variable importance measure, Out-of-bag error etc. For the viable learning and classification of random forest, it is required to reduce the number of decision trees (Pruning) in the random forest. In this paper, we have presented systematic overview of random forest algorithm along with its application areas. In addition, we presented a brief review of machine learning algorithm proposed in the recent years. Animal classification is considered as an important problem and most of the recent studies are classifying the animals by taking the image dataset. But, very less work has been done on attribute-oriented animal classification and poses many challenges in the process of extracting the accurate features. We have taken a real-time dataset from the Kaggle to classify the animal by collecting the more relevant features with the help of variable importance measure metric and compared with the other popular machine learning models.

Download Full-text

Streamlining Quality Review of Mass Spectrometry Data in the Clinical Laboratory by Use of Machine Learning

Archives of Pathology & Laboratory Medicine ◽

10.5858/arpa.2018-0238-oa ◽

2019 ◽

Vol 143 (8) ◽

pp. 990-998 ◽

Cited By ~ 2

Author(s):

Min Yu ◽

Lindsay A. L. Bazydlo ◽

David E. Bruns ◽

James H. Harrison

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Turnaround Time ◽

Machine Learning Algorithms ◽

Classification Model ◽

Supervised Machine Learning ◽

Training Dataset ◽

Support Vector ◽

Test Dataset ◽

Manual Review

Context.— Turnaround time and productivity of clinical mass spectrometric (MS) testing are hampered by time-consuming manual review of the analytical quality of MS data before release of patient results. Objective.— To determine whether a classification model created by using standard machine learning algorithms can verify analytically acceptable MS results and thereby reduce manual review requirements. Design.— We obtained retrospective data from gas chromatography–MS analyses of 11-nor-9-carboxy-delta-9-tetrahydrocannabinol (THC-COOH) in 1267 urine samples. The data for each sample had been labeled previously as either analytically unacceptable or acceptable by manual review. The dataset was randomly split into training and test sets (848 and 419 samples, respectively), maintaining equal proportions of acceptable (90%) and unacceptable (10%) results in each set. We used stratified 10-fold cross-validation in assessing the abilities of 6 supervised machine learning algorithms to distinguish unacceptable from acceptable assay results in the training dataset. The classifier with the highest recall was used to build a final model, and its performance was evaluated against the test dataset. Results.— In comparison testing of the 6 classifiers, a model based on the Support Vector Machines algorithm yielded the highest recall and acceptable precision. After optimization, this model correctly identified all unacceptable results in the test dataset (100% recall) with a precision of 81%. Conclusions.— Automated data review identified all analytically unacceptable assays in the test dataset, while reducing the manual review requirement by about 87%. This automation strategy can focus manual review only on assays likely to be problematic, allowing improved throughput and turnaround time without reducing quality.

Download Full-text