Preliminary study on the application of renal ultrasonography radiomics in the classification of glomerulopathy

2021
Vol 21 (1)
Author(s):
Lijie Zhang
Zhengguang Chen
Lei Feng
Liwei Guo
Dong Liu
...  

Abstract Background The aim of this study was to investigate the potential use of renal ultrasonography radiomics features in the histologic classification of glomerulopathy. Methods A total of 623 renal ultrasound images from 46 membranous nephropathy (MN) and 22 IgA nephropathy patients were collected. The cases and images were divided into a training group (51 cases with 470 images) and a test group (17 cases with 153 images). A total of 180 features were designed and extracted from the renal parenchyma in the ultrasound images. Least absolute shrinkage and selection operator (LASSO) logistic regression was then applied to these normalized radiomics features to select the features with the highest correlations. Four machine learning classifiers, namely logistic regression, a support vector machine (SVM), a random forest, and a K-nearest neighbour classifier, were deployed for the classification of MN and IgA nephropathy. The results were then assessed according to accuracy and receiver operating characteristic (ROC) curves. Results Patients with MN were older than patients with IgA nephropathy. MN primarily manifested in patients as nephrotic syndrome, whereas IgA nephropathy presented mainly as nephritic syndrome. Analysis of the classification performance of the four classifiers for IgA nephropathy and MN revealed that the random forest achieved the highest area under the ROC curve (AUC) (0.7639) and the highest specificity (0.8750), whereas logistic regression attained the highest accuracy (0.7647) and the highest sensitivity (0.8889). Conclusions Quantitative radiomics imaging features extracted from digital renal ultrasound can distinguish IgA nephropathy from MN. Radiomics analysis, a non-invasive method, is helpful for the histological classification of glomerulopathy.
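As a companion to the pipeline described above, the sketch below shows how LASSO-based feature selection followed by a comparison of the four classifiers might look in scikit-learn. The feature matrix, labels, regularization strength, and train/test split are placeholder assumptions, not the study's data or settings.

```python
# Hypothetical sketch: LASSO-based feature selection on normalized radiomics
# features, then a comparison of four classifiers. X, y, and all hyperparameters
# are placeholders, not the study's data or settings.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
X = rng.random((623, 180))          # placeholder 180-dimensional radiomics features
y = rng.integers(0, 2, 623)         # placeholder labels: 0 = IgA nephropathy, 1 = MN

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

# L1-penalized (LASSO) logistic regression keeps only the most informative features
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X_tr, y_tr)
selected = np.flatnonzero(lasso.coef_.ravel())
if selected.size == 0:              # guard: fall back to all features if everything shrank to zero
    selected = np.arange(X_tr.shape[1])

classifiers = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm": SVC(probability=True),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "knn": KNeighborsClassifier(),
}
for name, clf in classifiers.items():
    clf.fit(X_tr[:, selected], y_tr)
    proba = clf.predict_proba(X_te[:, selected])[:, 1]
    print(name,
          "accuracy", round(accuracy_score(y_te, clf.predict(X_te[:, selected])), 3),
          "AUC", round(roc_auc_score(y_te, proba), 3))
```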

2020
Vol 10 (1)
Author(s):
Camilla Nero
Francesca Ciccarone
Luca Boldrini
Jacopo Lenkowicz
Ida Paris
...  

Abstract Radiogenomics is a specific application of radiomics in which imaging features are linked to genomic profiles. We aim to develop a radiogenomics model based on ovarian ultrasound (US) images for predicting germline BRCA1/2 gene status in women with healthy ovaries. From January 2013 to December 2017, a total of 255 patients referred for germline BRCA1/2 testing, with pelvic US documenting normal ovaries, were retrospectively included. Feature selection for univariate analysis was carried out via correlation analysis. Multivariable analysis for classification of germline BRCA1/2 status was then carried out via logistic regression, a support vector machine, an ensemble of decision trees, and automated machine learning pipelines. Data were split into a training (75%) and a testing (25%) set. The four strategies achieved similar accuracy on the testing set (from 0.54 for logistic regression to 0.64 for the automated machine learning pipeline). Data from one of the tested US machines showed generally higher performance, particularly with the automated machine learning pipeline (testing-set specificity 0.87, negative predictive value 0.73, and accuracy 0.72; training-set accuracy 0.79). The study shows that a radiogenomics model based on machine learning techniques is feasible and potentially useful for predicting germline BRCA1/2 status in women with healthy ovaries.
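For readers who want to see the general shape of the multivariable step, the following is a minimal sketch assuming a 75/25 split and three of the model families mentioned above (logistic regression, SVM, and a tree ensemble); the automated machine learning pipeline and the per-US-machine analysis are omitted, and all data shapes and labels are assumptions.

```python
# Hypothetical sketch of the 75/25 split and three of the model families mentioned
# above. Data, features, and hyperparameters are placeholders; the automated ML
# pipeline and the per-US-machine analysis are not reproduced.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

rng = np.random.default_rng(42)
X = rng.random((255, 30))            # placeholder ovarian US radiomics features
y = rng.integers(0, 2, 255)          # placeholder labels: 1 = germline BRCA1/2 carrier

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=42)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm": SVC(),
    "tree_ensemble": GradientBoostingClassifier(random_state=42),
}
for name, model in models.items():
    y_pred = model.fit(X_tr, y_tr).predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, y_pred).ravel()
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    print(f"{name}: accuracy={accuracy_score(y_te, y_pred):.2f} "
          f"specificity={specificity:.2f} NPV={npv:.2f}")
```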


2016
Vol 51 (20)
pp. 2853-2862
Author(s):  
Serkan Ballı

The aim of this study is to diagnose and classify the failure modes of sandwich composite plates joined with two serial fasteners using data mining techniques. The composite material used in the study was manufactured from glass fiber reinforced layers and aluminum sheets. Results from a previous experimental study on sandwich composite plates mechanically fastened with two serial pins or bolts were used for the classification of failure modes. The experimental data from that study comprise different geometrical parameters for applied preload moments of 0 (pinned) and 2, 3, 4, and 5 Nm (bolted). In this study, data mining methods were applied using these geometrical parameters and the pinned/bolted joint configurations: three geometrical parameters and 100 test records were used for classification with the support vector machine, Naive Bayes, K-nearest neighbors, logistic regression, and random forest methods. In the experiments, the random forest method achieved better results than the others and was appropriate for diagnosing and classifying the failure modes. The performance of all the data mining methods used is discussed in terms of accuracy and error ratios.
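A hedged sketch of how the five classifiers could be compared on three geometrical parameters plus the preload moment is given below; the actual experimental features, failure-mode labels, and validation scheme are not reproduced here and are stand-ins only.

```python
# Illustrative sketch of the failure-mode classification setup: three geometrical
# parameters plus the applied preload moment as inputs, and the failure mode as
# the class label. All values and labels below are placeholders.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(1.0, 5.0, 100),       # geometrical parameter 1 (placeholder)
    rng.uniform(1.0, 5.0, 100),       # geometrical parameter 2 (placeholder)
    rng.uniform(1.0, 5.0, 100),       # geometrical parameter 3 (placeholder)
    rng.choice([0, 2, 3, 4, 5], 100), # preload moment in Nm (0 = pinned, rest bolted)
])
y = rng.integers(0, 3, 100)           # placeholder failure-mode labels

for name, clf in {
    "SVM": SVC(),
    "NaiveBayes": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(n_estimators=200, random_state=0),
}.items():
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```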


Sensors
2021
Vol 21 (21)
pp. 7417
Author(s):  
Alex J. Hope
Utkarsh Vashisth
Matthew J. Parker
Andreas B. Ralston
Joshua M. Roper
...  

Concussion injuries remain a significant public health challenge. An unmet clinical need remains for tools that allow related physiological impairments and longer-term health risks to be identified earlier, better quantified, and more easily monitored over time. We address this challenge by combining a head-mounted wearable inertial motion unit (IMU)-based physiological vibration acceleration (“phybrata”) sensor and several candidate machine learning (ML) models. The performance of this solution is assessed for both binary classification of concussion patients and multiclass predictions of specific concussion-related neurophysiological impairments. Results are compared with previously reported approaches to ML-based concussion diagnostics. Using phybrata data from a previously reported concussion study population, four different machine learning models (Support Vector Machine, Random Forest Classifier, Extreme Gradient Boosting, and Convolutional Neural Network) are first investigated for binary classification of the test population as healthy vs. concussion (Use Case 1). Results are compared for two different data preprocessing pipelines, Time-Series Averaging (TSA) and Non-Time-Series Feature Extraction (NTS). Next, the three best-performing NTS models are compared in terms of their multiclass prediction performance for specific concussion-related impairments: vestibular, neurological, or both (Use Case 2). For Use Case 1, the NTS model approach outperformed the TSA approach, with the two best algorithms achieving an F1 score of 0.94. For Use Case 2, the NTS Random Forest model achieved the best performance in the testing set, with an F1 score of 0.90, and identified a wider range of relevant phybrata signal features that contributed to impairment classification compared with manual feature inspection and statistical data analysis. The overall classification performance achieved in the present work exceeds previously reported approaches to ML-based concussion diagnostics using other data sources and ML models. This study also demonstrates the first combination of a wearable IMU-based sensor and ML model that enables both binary classification of concussion patients and multiclass predictions of specific concussion-related neurophysiological impairments.
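The sketch below illustrates the general non-time-series (NTS) feature-extraction idea for Use Case 1 (healthy vs. concussion) with one of the four model types; the feature set, data shapes, and labels are placeholder assumptions rather than the phybrata study data, which are not public here.

```python
# Hedged sketch of NTS-style feature extraction plus binary classification.
# The summary features, recording lengths, and labels are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

def nts_features(signal: np.ndarray) -> np.ndarray:
    """Collapse one accelerometer time series into scalar summary features."""
    return np.array([
        signal.mean(), signal.std(), np.abs(signal).max(),
        np.percentile(signal, 75) - np.percentile(signal, 25),  # interquartile range
        np.mean(signal ** 2),                                    # mean signal power
    ])

rng = np.random.default_rng(1)
recordings = rng.normal(size=(200, 2000))   # placeholder vibration-acceleration recordings
labels = rng.integers(0, 2, 200)            # placeholder labels: 0 = healthy, 1 = concussion

X = np.vstack([nts_features(r) for r in recordings])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, random_state=1)

clf = RandomForestClassifier(n_estimators=300, random_state=1).fit(X_tr, y_tr)
print("F1 score:", round(f1_score(y_te, clf.predict(X_te)), 3))
```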


2021
Author(s):
Chen Bai
Yu-Peng Chen
Adam Wolach
Lisa Anthony
Mamoun Mardini

BACKGROUND Frequent spontaneous facial self-touches, predominantly during outbreaks, have the theoretical potential to be a mechanism of contracting and transmitting diseases. Despite the recent advent of vaccines, behavioral approaches remain an integral part of reducing the spread of COVID-19 and other respiratory illnesses. Real-time biofeedback of face touching can potentially mitigate the spread of respiratory diseases. The gap addressed in this study is the lack of an on-demand platform that utilizes motion data from smartwatches to accurately detect face touching. OBJECTIVE The aim of this study was to leverage the functionality and widespread adoption of smartwatches to develop a smartwatch application that identifies motion signatures accurately mapped to face touching. METHODS Participants (n=10, 50% women, aged 20-83 years) performed 10 physical activities, classified into face touching (FT) and non-face touching (NFT) categories, in a standardized laboratory setting. We developed a smartwatch application on the Samsung Galaxy Watch to collect raw accelerometer data from participants. Data features were then extracted from consecutive non-overlapping windows ranging from 2 to 16 seconds. We examined the performance of four machine learning methods, logistic regression, support vector machine, decision trees, and random forest, on face-touching movement recognition (FT vs NFT) and individual activity recognition (IAR). RESULTS Machine learning models were accurate in recognizing face-touching categories; logistic regression achieved the best performance across all metrics (Accuracy: 0.93 +/- 0.08, Recall: 0.89 +/- 0.16, Precision: 0.93 +/- 0.08, F1-score: 0.90 +/- 0.11, AUC: 0.95 +/- 0.07) at a window size of 5 seconds. IAR models resulted in lower performance; the random forest classifier achieved the best performance across all metrics (Accuracy: 0.70 +/- 0.14, Recall: 0.70 +/- 0.14, Precision: 0.70 +/- 0.16, F1-score: 0.67 +/- 0.15) at a window size of 9 seconds. CONCLUSIONS Wearable devices, powered with machine learning, are effective in detecting face touches. This is highly significant during respiratory infection outbreaks, as it has great potential to discourage people from touching their faces and mitigate the possibility of transmitting COVID-19 and future respiratory diseases.
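A minimal sketch of the windowing-plus-classification approach is shown below, assuming a 50 Hz sampling rate and the 5-second window reported as best for FT vs NFT; the feature set and data are illustrative, not the study's.

```python
# Minimal sketch: raw tri-axial accelerometer samples are cut into non-overlapping
# windows, per-window summary features are extracted, and a logistic regression
# model separates FT from NFT windows. Sampling rate, window length, features,
# and labels are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

SAMPLING_HZ = 50          # assumed smartwatch accelerometer sampling rate
WINDOW_SECONDS = 5        # best-performing FT/NFT window size reported above

def window_features(acc_xyz: np.ndarray) -> np.ndarray:
    """Split an (n_samples, 3) accelerometer stream into non-overlapping windows
    and compute per-axis mean and standard deviation for each window."""
    win = SAMPLING_HZ * WINDOW_SECONDS
    n_windows = len(acc_xyz) // win
    feats = []
    for i in range(n_windows):
        chunk = acc_xyz[i * win:(i + 1) * win]
        feats.append(np.concatenate([chunk.mean(axis=0), chunk.std(axis=0)]))
    return np.array(feats)

rng = np.random.default_rng(7)
stream = rng.normal(size=(SAMPLING_HZ * 60 * 10, 3))   # 10 minutes of placeholder data
X = window_features(stream)
y = rng.integers(0, 2, len(X))                         # placeholder labels: 1 = FT, 0 = NFT

print("Mean CV accuracy:",
      cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean())
```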


mBio
2020
Vol 11 (3)
Author(s):
Begüm D. Topçuoğlu
Nicholas A. Lesniak
Mack T. Ruffin
Jenna Wiens
Patrick D. Schloss

ABSTRACT Machine learning (ML) modeling of the human microbiome has the potential to identify microbial biomarkers and aid in the diagnosis of many diseases, such as inflammatory bowel disease, diabetes, and colorectal cancer. Progress has been made toward developing ML models that predict health outcomes using bacterial abundances, but inconsistent adoption of training and evaluation methods calls the validity of these models into question. Furthermore, many researchers appear to favor increased model complexity over interpretability. To overcome these challenges, we trained seven models that used fecal 16S rRNA sequence data to predict the presence of colonic screen relevant neoplasias (SRNs) (n = 490 patients, 261 controls and 229 cases). We developed a reusable open-source pipeline to train, validate, and interpret ML models. To show the effect of model selection, we assessed the predictive performance, interpretability, and training time of L2-regularized logistic regression, L1- and L2-regularized support vector machines (SVM) with linear and radial basis function kernels, a decision tree, random forest, and gradient boosted trees (XGBoost). The random forest model performed best at detecting SRNs, with an area under the receiver operating characteristic curve (AUROC) of 0.695 (interquartile range [IQR], 0.651 to 0.739), but was slow to train (83.2 h) and not inherently interpretable. Despite its simplicity, L2-regularized logistic regression followed random forest in predictive performance with an AUROC of 0.680 (IQR, 0.625 to 0.735), trained faster (12 min), and was inherently interpretable. Our analysis highlights the importance of choosing an ML approach based on the goal of the study, as the choice will inform expectations of performance and interpretability. IMPORTANCE Diagnosing diseases using machine learning (ML) is rapidly being adopted in microbiome studies. However, the estimated performance associated with these models is likely overoptimistic. Moreover, there is a trend toward using black box models without a discussion of the difficulty of interpreting such models when trying to identify microbial biomarkers of disease. This work represents a step toward developing more-reproducible ML practices in applying ML to microbiome research. We implement a rigorous pipeline and emphasize the importance of selecting ML models that reflect the goal of the study. These concepts are not particular to the study of human health but can also be applied to environmental microbiology studies.
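The authors' reusable pipeline is a separate open-source workflow; purely as an illustration of the model-comparison idea (L2-regularized logistic regression versus random forest scored by AUROC), a scikit-learn sketch with placeholder OTU abundance data might look like this:

```python
# Hedged re-expression of the interpretable-vs-complex model comparison.
# The abundance matrix and labels are placeholders, not the study's 16S data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.random((490, 1000))           # placeholder relative abundances of 16S rRNA OTUs
y = rng.integers(0, 2, 490)           # placeholder labels: 1 = screen-relevant neoplasia, 0 = control

models = {
    "L2 logistic regression": LogisticRegression(penalty="l2", C=1.0, max_iter=2000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=3),
}
for name, model in models.items():
    auroc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUROC {auroc:.3f}")
```

A coefficient inspection on the logistic regression model (its fitted weights) is what makes it inherently interpretable, whereas the random forest requires post hoc feature-importance analysis.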


Author(s):  
Shweta Dabetwar
Stephen Ekwaro-Osire
João Paulo Dias

Abstract Composite materials have tremendous and ever-increasing applications in complex engineering systems; thus, it is important to develop non-destructive and efficient condition monitoring methods to improve damage prediction, thereby avoiding catastrophic failures and reducing standby time. Non-destructive condition monitoring techniques, when combined with machine learning applications, can contribute towards these improvements. Thus, the research question taken into consideration for this paper is "Can machine learning techniques provide efficient damage classification of composite materials to improve condition monitoring using features extracted from acousto-ultrasonic measurements?" To answer this question, acousto-ultrasonic signals in Carbon Fiber Reinforced Polymer (CFRP) composites for distinct damage levels were taken from the NASA Ames prognostics data repository. Statistical condition indicators of the signals were used as features to train and test four traditional machine learning algorithms: K-nearest neighbors, support vector machine, decision tree, and random forest. Their performance was compared and discussed. Results showed the highest accuracy for random forest, with a strong dependency on the feature extraction/selection techniques employed. By combining data analysis from acousto-ultrasonic measurements in composite materials with machine learning tools, this work contributes to the development of intelligent damage classification algorithms that can be applied to advanced online diagnostics and health management strategies of composite materials operating under more complex working conditions.
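As an illustration of the described workflow, the sketch below computes a few common statistical condition indicators per signal and compares the four classifiers; the indicator choices, data shapes, and damage labels are assumptions, and the NASA Ames dataset is not bundled here.

```python
# Hedged sketch: statistical condition indicators per acousto-ultrasonic signal,
# then a comparison of the four classifiers named above. All data are placeholders.
import numpy as np
from scipy.stats import kurtosis, skew
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def condition_indicators(sig: np.ndarray) -> np.ndarray:
    """Common statistical condition indicators for one ultrasonic waveform."""
    rms = np.sqrt(np.mean(sig ** 2))
    return np.array([
        rms, sig.std(),
        np.abs(sig).max() / rms,   # crest factor
        kurtosis(sig), skew(sig),
    ])

rng = np.random.default_rng(5)
signals = rng.normal(size=(120, 4096))     # placeholder acousto-ultrasonic waveforms
damage_level = rng.integers(0, 4, 120)     # placeholder damage-level labels

X = np.vstack([condition_indicators(s) for s in signals])
for name, clf in {
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "DecisionTree": DecisionTreeClassifier(random_state=5),
    "RandomForest": RandomForestClassifier(n_estimators=300, random_state=5),
}.items():
    print(name, round(cross_val_score(clf, X, damage_level, cv=5).mean(), 3))
```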


2007
Vol 3
pp. 117693510700300
Author(s):
Changyu Shen
Timothy E. Breen
Lacey E. Dobrolecki
C. Max Schmidt
George W. Sledge
...  

Introduction As an alternative to DNA microarrays, mass spectrometry based analysis of proteomic patterns has shown great potential in cancer diagnosis. The ultimate application of this technique in clinical settings relies on the advancement of the technology itself and the maturity of the computational tools used to analyze the data. A number of computational algorithms constructed on different principles are available for the classification of disease status based on proteomic patterns. Nevertheless, few studies have addressed the differences in the performance of these approaches. In this report, we describe a comparative case study on the classification accuracy of hepatocellular carcinoma based on the serum proteomic pattern generated from a Surface Enhanced Laser Desorption/Ionization (SELDI) mass spectrometer. Methods Nine supervised classification algorithms were implemented in R software and compared for classification accuracy. Results We found that the support vector machine with a radial basis function kernel is preferable as a tool for classification of hepatocellular carcinoma using features in SELDI mass spectra. Among the rest of the methods, random forest and prediction analysis of microarrays have better performance. A permutation-based technique reveals that the support vector machine with a radial basis function kernel seems intrinsically superior in learning from the training data, since it has a lower prediction error than the others when there is essentially no differential signal. On the other hand, the performance of random forest and prediction analysis of microarrays relies on their capability of capturing signals with substantial differentiation between groups. Conclusions Our finding is similar to that of a previous study in which classification methods based on Matrix Assisted Laser Desorption/Ionization (MALDI) mass spectrometry were compared for prediction accuracy in ovarian cancer. The support vector machine, random forest, and prediction analysis of microarrays provide better prediction accuracy for hepatocellular carcinoma using SELDI proteomic data than the six other approaches.
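To make the permutation idea concrete, the following hedged sketch scores an RBF-kernel SVM and a random forest on placeholder spectral features, then re-scores them after the class labels are permuted so that essentially no differential signal remains; the original analysis compared nine algorithms in R, which is not reproduced here.

```python
# Hedged sketch of the comparison and the label-permutation check.
# SELDI-like peak intensities and labels are placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(11)
X = rng.random((150, 300))            # placeholder SELDI m/z peak intensities
y = rng.integers(0, 2, 150)           # placeholder labels: 1 = hepatocellular carcinoma, 0 = control

models = {
    "rbf_svm": SVC(kernel="rbf"),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=11),
}
y_permuted = rng.permutation(y)       # shuffling labels removes any real class signal

for name, model in models.items():
    real = cross_val_score(model, X, y, cv=5).mean()
    null = cross_val_score(model, X, y_permuted, cv=5).mean()
    print(f"{name}: accuracy {real:.2f} vs. permuted-label baseline {null:.2f}")
```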


2021
Vol 6 (2)
pp. 120-129
Author(s):
Nadhif Ikbar Wibowo
Tri Andika Maulana
Hamzah Muhammad
Nur Aini Rakhmawati

Public responses posted on Twitter in reaction to the Tokopedia data leak incident were used as a data set to compare the performance of three different classifiers, trained using supervised learning, at classifying the sentiment of the text. Each tweet was classified as positive, negative, or neutral. This study compares the performance of Random Forest, Support Vector Machine, and Logistic Regression classifiers. Data were scraped automatically and used to evaluate the models; the SVM-based model achieved the highest F1-score (0.503583), making SVM the best-performing classifier.
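A minimal sketch of such a three-classifier comparison with TF-IDF features and macro-averaged F1 is shown below; the tweets, labels, and preprocessing are placeholders, not the study's data.

```python
# Hedged sketch: TF-IDF text features fed to the three classifiers compared above,
# scored with macro-averaged F1 on a held-out split. Tweets and labels are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

tweets = ["data leak is scary", "tokopedia handled it well", "no opinion on this"] * 50
labels = ["negative", "positive", "neutral"] * 50   # placeholder sentiment labels

X_tr, X_te, y_tr, y_te = train_test_split(tweets, labels, test_size=0.25, random_state=0)

for name, clf in {
    "RandomForest": RandomForestClassifier(random_state=0),
    "SVM": LinearSVC(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}.items():
    pipe = make_pipeline(TfidfVectorizer(), clf).fit(X_tr, y_tr)
    print(name, round(f1_score(y_te, pipe.predict(X_te), average="macro"), 3))
```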

