Detecting Screams From Home Audio Recordings to Identify Tantrums: Exploratory Study Using Transfer Machine Learning (Preprint)

BACKGROUND: Qualitative self- or parent-reports used in assessing children’s behavioral disorders are often inconvenient to collect and can be misleading due to missing information, rater biases, and limited validity. A data-driven approach to quantify behavioral disorders could alleviate these concerns. This study proposes a machine learning approach to identify screams in voice recordings that avoids the need to gather large amounts of clinical data for model training. OBJECTIVE: The goal of this study is to evaluate if a machine learning model trained only on publicly available audio data sets could be used to detect screaming sounds in audio streams captured in an at-home setting. METHODS: Two sets of audio samples were prepared to evaluate the model: a subset of the publicly available AudioSet data set and a set of audio data extracted from the TV show Supernanny, which was chosen for its similarity to clinical data. Scream events were manually annotated for the Supernanny data, and existing annotations were refined for the AudioSet data. Audio feature extraction was performed with a convolutional neural network pretrained on AudioSet. A gradient-boosted tree model was trained and cross-validated for scream classification on the AudioSet data and then validated independently on the Supernanny audio. RESULTS: On the held-out AudioSet clips, the model achieved a receiver operating characteristic (ROC)–area under the curve (AUC) of 0.86. The same model applied to three full episodes of Supernanny audio achieved an ROC-AUC of 0.95 and an average precision (positive predictive value) of 42% despite screams only making up 1.3% (n=92/7166 seconds) of the total run time. CONCLUSIONS: These results suggest that a scream-detection model trained with publicly available data could be valuable for monitoring clinical recordings and identifying tantrums as opposed to depending on collecting costly privacy-protected clinical data for model training.


Detecting Screams From Home Audio Recordings to Identify Tantrums: Exploratory Study Using Transfer Machine Learning

JMIR Formative Research ◽

10.2196/18279 ◽

2020 ◽

Vol 4 (6) ◽

pp. e18279 ◽

Cited By ~ 1

Author(s):

Rebecca O'Donovan ◽

Emre Sezgin ◽

Sven Bambach ◽

Eric Butter ◽

Simon Lin

Keyword(s):

Machine Learning ◽

Behavioral Disorders ◽

Clinical Data ◽

Area Under The Curve ◽

Tree Model ◽

Data Set ◽

Home Setting ◽

Detection Model ◽

Audio Data ◽

Model Training

Background: Qualitative self- or parent-reports used in assessing children’s behavioral disorders are often inconvenient to collect and can be misleading due to missing information, rater biases, and limited validity. A data-driven approach to quantify behavioral disorders could alleviate these concerns. This study proposes a machine learning approach to identify screams in voice recordings that avoids the need to gather large amounts of clinical data for model training. Objective: The goal of this study is to evaluate if a machine learning model trained only on publicly available audio data sets could be used to detect screaming sounds in audio streams captured in an at-home setting. Methods: Two sets of audio samples were prepared to evaluate the model: a subset of the publicly available AudioSet data set and a set of audio data extracted from the TV show Supernanny, which was chosen for its similarity to clinical data. Scream events were manually annotated for the Supernanny data, and existing annotations were refined for the AudioSet data. Audio feature extraction was performed with a convolutional neural network pretrained on AudioSet. A gradient-boosted tree model was trained and cross-validated for scream classification on the AudioSet data and then validated independently on the Supernanny audio. Results: On the held-out AudioSet clips, the model achieved a receiver operating characteristic (ROC)–area under the curve (AUC) of 0.86. The same model applied to three full episodes of Supernanny audio achieved an ROC-AUC of 0.95 and an average precision (positive predictive value) of 42% despite screams only making up 1.3% (n=92/7166 seconds) of the total run time. Conclusions: These results suggest that a scream-detection model trained with publicly available data could be valuable for monitoring clinical recordings and identifying tantrums as opposed to depending on collecting costly privacy-protected clinical data for model training.
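The two metrics quoted in this abstract, ROC-AUC and average precision, behave very differently under class imbalance. A ranker with no skill still scores about 0.5 on ROC-AUC, but its average precision falls to roughly the positive-class prevalence (here 92/7166, about 1.3%). The following is a minimal sketch of both metrics in plain Python. The per-second scores are synthetic Gaussian draws chosen only for illustration; they are an assumption and are not the study's model outputs, and the function names are hypothetical.

```python
import random


def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision measured at each true positive, ranked by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Class balance taken from the abstract: 92 scream seconds out of 7166.
TOTAL_SECONDS, SCREAM_SECONDS = 7166, 92
prevalence = SCREAM_SECONDS / TOTAL_SECONDS

# Synthetic scores (assumption): scream seconds tend to score higher than the rest.
rng = random.Random(0)
labels = [1] * SCREAM_SECONDS + [0] * (TOTAL_SECONDS - SCREAM_SECONDS)
scores = [rng.gauss(2.0, 1.0) if y == 1 else rng.gauss(0.0, 1.0) for y in labels]

print(f"prevalence (chance-level average precision): {prevalence:.1%}")
print(f"ROC-AUC on synthetic scores: {roc_auc(labels, scores):.3f}")
print(f"average precision on synthetic scores: {average_precision(labels, scores):.3f}")
```

Read this way, an average precision of 42% against a 1.3% base rate is far above chance level, even though it looks modest next to the ROC-AUC of 0.95.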


Big Data for Health Care Analytics using Extreme Machine Learning Based on Map Reduce

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.c5808.029320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 2758-2762

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Storage ◽

Clinical Data ◽

Disease Risk ◽

Learning Algorithm ◽

Information Storage ◽

Support Vector ◽

Machine Learning Algorithm ◽

Data Set

Large volumes of data are now stored across many fields; this is commonly called big data. In health care, big data comprises the clinical records of every patient, held in large quantities and maintained in Electronic Health Records (EHR). More than 80% of clinical data is unstructured and stored in hundreds of forms. The main challenge in storing and analyzing such data is handling large data sets efficiently and scalably. The Hadoop MapReduce framework stores and processes many kinds of data quickly. It is not only a storage system but also a platform for both information storage and processing, and it is scalable and fault-tolerant. Prediction on the data sets is handled by a machine learning algorithm. This work focuses on the Extreme Machine Learning algorithm (ELM), which is combined with a Cuckoo Search optimization-based Support Vector Machine (CS-SVM) to optimize disease risk prediction. The proposed work also considers the scalability and accuracy of big data models; the proposed algorithm handles the computing workload well and achieves good results in both veracity and efficiency.


Analysis of EEG signals using Machine Learning for the Detection and Diagnosis of Epilepsy

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a1685.1010120 ◽

2020 ◽

Vol 10 (1) ◽

pp. 89-93

Keyword(s):

Machine Learning ◽

Epileptic Seizures ◽

Severe Damage ◽

Tree Model ◽

Data Set ◽

Feature Extraction Method ◽

Precautionary Measures ◽

Detection And Diagnosis ◽

Electroencephalogram Eeg ◽

Mean Variance

Electroencephalogram (EEG) is one of the most commonly used tools for epilepsy detection. In this paper we present two methods for the diagnosis of epilepsy using machine learning techniques. EEG waveforms have five different frequency bands, of which only two, the theta and gamma bands, carry epileptic seizure information. Our model determines statistical features such as mean, variance, maximum, minimum, kurtosis, and skewness from the raw data set. This reduces the mathematical complexity and time consumption of the feature extraction method. It then uses a logistic regression model and a decision tree model to classify whether a person is epileptic or not. After the machine learning models were implemented, parameters such as accuracy, sensitivity, and recall were computed, and the results are analyzed in detail in this paper. Epileptic seizures cause severe damage to the brain, which affects the health of a person. Our key objective in this paper is to help in the early prediction and detection of epilepsy so that preventive interventions can be provided and precautionary measures taken to prevent the patient from suffering any severe damage.
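The six statistical features named in this abstract can be computed directly from a window of raw samples. The sketch below is one plausible reading, not the authors' code: it assumes population (biased) moments and excess kurtosis, neither of which the abstract specifies, and the function name `statistical_features` is hypothetical.

```python
def statistical_features(window):
    """Return mean, variance, max, min, skewness, and excess kurtosis of one signal window."""
    n = len(window)
    mean = sum(window) / n
    # Central moments, using the population (divide-by-n) convention (assumption).
    m2 = sum((x - mean) ** 2 for x in window) / n
    m3 = sum((x - mean) ** 3 for x in window) / n
    m4 = sum((x - mean) ** 4 for x in window) / n
    # A constant window has zero variance; report 0.0 rather than dividing by zero.
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    kurtosis = m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0  # excess kurtosis (assumption)
    return {
        "mean": mean,
        "variance": m2,
        "maximum": max(window),
        "minimum": min(window),
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


# Toy window standing in for EEG samples (made-up values).
features = statistical_features([1.0, 2.0, 3.0, 4.0, 5.0])
print(features)
```

Each window then becomes a six-element feature vector, which is the kind of compact input a logistic regression or decision tree classifier can consume.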


Machine Learning Data Imputation and Classification in a Multicohort Hypertension Clinical Study

Bioinformatics and Biology Insights ◽

10.4137/bbi.s29473 ◽

2015 ◽

Vol 9s3 ◽

pp. BBI.S29473 ◽

Cited By ~ 8

Author(s):

William Seffens ◽

Chad Evans ◽

Keyword(s):

Machine Learning ◽

Clinical Study ◽

Translational Research ◽

Clinical Data ◽

Rare Variants ◽

Genome Wide Association Studies ◽

Data Imputation ◽

Data Set ◽

Genome Wide ◽

Phenotype Data

Health-care initiatives are pushing the development and utilization of clinical data for medical discovery and translational research studies. Machine learning tools implemented for Big Data have been applied to detect patterns in complex diseases. This study focuses on hypertension and examines phenotype data across a major clinical study called Minority Health Genomics and Translational Research Repository Database, composed of self-reported African American (AA) participants combined with related cohorts. Prior genome-wide association studies for hypertension in AAs presumed that an increase of disease burden in susceptible populations is due to rare variants. However, genomic analyses of hypertension, even those designed to focus on rare variants, have yielded marginal genome-wide results over many studies. Machine learning and other nonparametric statistical methods have recently been shown to uncover relationships in complex phenotypes, genotypes, and clinical data. We trained neural networks with phenotype data for missing-data imputation to increase the usable size of a clinical data set. Validity was established by showing performance effects using the expanded data set for the association of phenotype variables with case/control status of patients. Data mining classification tools were used to generate association rules.


Defining The Best-Fit Machine Learning Classifier Prediction Model For Diagnosis of Heart Disease

10.21203/rs.3.rs-1152876/v1 ◽

2021 ◽

Author(s):

Debarati Dey Roy ◽

Debashis De

Keyword(s):

Machine Learning ◽

Heart Failure ◽

Heart Disease ◽

Supervised Learning ◽

Learning Algorithms ◽

Management Service ◽

Tree Model ◽

Data Set ◽

Death Cases ◽

Best Fit

Abstract Cardiovascular disease, also called heart disease, is the primary cause of death worldwide. Over the last few decades, it has been observed that the largest number of deaths is due to heart failure. Heart failure deaths are associated with many risk factors, for example high blood pressure, cholesterol level, and sugar level. Regular and early diagnosis of these factors may therefore reduce the risk of heart failure and support a prompt disease management service. Data mining is a commonly used technique for processing such large volumes of medical data, and it helps researchers in the health care domain. Several machine learning algorithms are used to analyze these data and help design the best-fit model for early detection of heart disease. This research paper examines various attributes related to heart malfunction and builds the best-fit model using supervised learning algorithms such as various trees (fine tree, medium tree, etc.), Gaussian Naïve Bayes, Coarse KNN, and Medium Gaussian SVM. We used a data set from Kaggle.com comprising 732 instances with 5 attributes. All 5 attributes are considered for testing and for finding the best-fit model for predicting heart disease. We also compare the various classification models based on supervised learning algorithms. Based on performance and accuracy, we choose the ‘Medium Tree’ model as the best-fit model, since it obtains the maximum accuracy. The confusion matrix for each model is calculated and analyzed.


Predicting Short-Term Survival after Gross Total or Near Total Resection in Glioblastomas by Machine Learning-Based Radiomic Analysis of Preoperative MRI

Cancers ◽

10.3390/cancers13205047 ◽

2021 ◽

Vol 13 (20) ◽

pp. 5047

Author(s):

Santiago Cepeda ◽

Angel Pérez-Nuñez ◽

Sergio García-García ◽

Daniel García-Pérez ◽

Ignacio Arrese ◽

...

Keyword(s):

Machine Learning ◽

Test Data ◽

Tumor Resection ◽

Area Under The Curve ◽

Risk Groups ◽

Feature Reduction ◽

Short Term ◽

Data Set ◽

Term Survival ◽

Short Term Survival

Radiomics, in combination with artificial intelligence, has emerged as a powerful tool for the development of predictive models in neuro-oncology. Our study aims to find an answer to a clinically relevant question: is there a radiomic profile that can identify glioblastoma (GBM) patients with short-term survival after complete tumor resection? A retrospective study of GBM patients who underwent surgery was conducted in two institutions between January 2019 and January 2020, along with cases from public databases. Cases with gross total or near total tumor resection were included. Preoperative structural multiparametric magnetic resonance imaging (mpMRI) sequences were pre-processed, and a total of 15,720 radiomic features were extracted. After feature reduction, machine learning-based classifiers were used to predict early mortality (<6 months). Additionally, a survival analysis was performed using the random survival forest (RSF) algorithm. A total of 203 patients were enrolled in this study. In the classification task, the naive Bayes classifier obtained the best results in the test data set, with an area under the curve (AUC) of 0.769 and classification accuracy of 80%. The RSF model allowed the stratification of patients into low- and high-risk groups. In the test data set, this model obtained values of C-Index = 0.61, IBS = 0.123 and integrated AUC at six months of 0.761. In this study, we developed a reliable predictive model of short-term survival in GBM by applying open-source and user-friendly computational means. These new tools will assist clinicians in adapting our therapeutic approach considering individual patient characteristics.
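The C-Index reported for the survival model measures how often the model ranks two patients in the right order of survival. The sketch below shows the common pairwise definition (Harrell's concordance index) in plain Python. It is illustrative only: the abstract does not state which estimator was used, the function name is hypothetical, and the toy data are made up.

```python
def concordance_index(times, events, risks):
    """Fraction of comparable patient pairs in which the higher predicted risk died first.

    times:  observed follow-up time for each patient
    events: 1 if death was observed, 0 if the patient was censored
    risks:  predicted risk score (higher means shorter expected survival)
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count half
    return concordant / comparable


# Toy cohort (made-up values): four patients, the third one censored.
c_index = concordance_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risks=[0.9, 0.7, 0.8, 0.1],
)
print(f"C-index: {c_index:.2f}")
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for the reported C-Index of 0.61.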


An Integration of Cardiovascular Event Data and Machine Learning Models for Cardiac Arrest Predictions

International Journal of Health Sciences and Pharmacy ◽

10.47992/ijhsp.2581.6411.0061 ◽

2021 ◽

pp. 55-71

Author(s):

Krishna Prasad K ◽

Aithal P. S. ◽

Navin N. Bappalige ◽

Soumya S

Keyword(s):

Machine Learning ◽

Cardiac Arrest ◽

Area Under The Curve ◽

Computer Applications ◽

Data Sets ◽

Cardiovascular Risks ◽

Data Set ◽

Average Area ◽

Learning Classifier ◽

Tree Classifier

Purpose: Predicting and then preventing cardiac arrest in an ICU patient is highly challenging even for the most skilled professional. The data collected for a patient in the ICU are huge, and selecting the portion of data needed to prevent cardiac arrest within a short time is decisive; analysing and predicting from such large data requires an effective system. An effective integration of computer applications and cardiovascular data is necessary to predict cardiovascular risks. Machine learning is a suitable technique for managing patients with cardiac arrest. Methodology: In this work we collected and merged three data sets: the Cleveland Dataset of US patients with 303 records, the Statlog Dataset of UK patients with 270 records, and the Hungarian dataset of Hungary, Switzerland with 617 records. The combined data form the most comprehensive data set, consisting of 11 common features with 1190 records. Findings/Results: The feature extraction phase extracts 7 features that contribute to the event. The extracted features are used to train the selected machine learning classifier models, and the results are then evaluated using test data before final results are drawn. The Extra Tree Classifier has the highest average area under the curve (AUC), at 0.957. Originality: The originality lies in the analysis of this combined data set using machine learning classifier models, in which the Extra Tree Classifier gives the highest average area under the curve (AUC) of 0.957. Paper Type: Experimental Research Keywords: Cardiac, Machine Learning, Random Forest, XBOOST, ROC AUC, ST Slope.


A machine learning based prediction model of anti-PD-1 therapy response using noninvasive clinical information and blood markers of lung cancer patients.

Journal of Clinical Oncology ◽

10.1200/jco.2019.37.15_suppl.e14138 ◽

2019 ◽

Vol 37 (15_suppl) ◽

pp. e14138-e14138

Author(s):

Beung-Chul AHN ◽

Kyoung Ho Pyo ◽

Dongmin Jung ◽

Chun-Feng Xin ◽

Chang Gon Kim ◽

...

Keyword(s):

Machine Learning ◽

Flow Cytometry ◽

Supervised Learning ◽

Clinical Data ◽

Ridge Regression ◽

Predictive Score ◽

Support Vector ◽

Data Set ◽

Test Set ◽

Flow Cytometry Data

e14138 Background: Immune checkpoint inhibitors have become breakthrough therapy for various types of cancers. However, with an overall response rate of around 20% in clinical trials, accurate prediction of aPD-1 response for individual patients has not been established. The presence of PD-L1 expression or tumor infiltrating lymphocytes may be used as indicators of response but are limited. We developed models using machine learning methods to predict the aPD-1 response. Methods: A total of 126 advanced NSCLC patients treated with aPD-1 were enrolled. Their clinical characteristics, treatment outcomes, and adverse events were collected. The full clinical data set (n = 126), consisting of 15 variables, was divided into two subsets, a discovery set (n = 63) and a test set (n = 63). Thirteen supervised learning algorithms, including support vector machine and regularized regression (lasso, ridge, elastic net), were applied to the discovery set for model development and to the test set for validation. Each model was evaluated according to the ROC curve and a cross-validation method. The same methods were applied to the subset that had additional flow cytometry data (n = 40). Results: The median age was 64 and 69.8% were male. Adenocarcinoma was predominant (69.8%) and twenty patients (15.1%) were driver mutation positive. The clinical data set (n = 126) demonstrated that Ridge regression (AUC: 0.79) was the best model for prediction. Of 15 clinical variables, tumor burden, age, ECOG PS, and PD-L1 were the most important based on the random forest algorithm. When we merged the clinical and flow cytometry data, the Ridge regression model (AUC: 0.82) showed better performance compared to using clinical data only. Among 52 variables of the merged set, the most important immune markers were as follows: CD3+CD8+CD25+/Teff-CD28, CD3+CD8+CD25-/Teff-Ki-67, and CD3+CD8+CD25+/Teff-NY-ESO/Teff-PD-1, which indicate an activated tumor specific T cell subset. Conclusions: Our machine learning based model is beneficial for predicting aPD-1 responses. After further validation in an independent patient cohort, the supervised learning based non-invasive predictive score can be established to predict aPD-1 response.


Superiority of Supervised Machine Learning on Reading Chest X-Rays in Intensive Care Units

Frontiers in Medicine ◽

10.3389/fmed.2021.676277 ◽

2021 ◽

Vol 8 ◽

Author(s):

Kumiko Tanaka ◽

Taka-aki Nakada ◽

Nozomi Takahashi ◽

Takahiro Dozono ◽

Yuichiro Yoshimura ◽

...

Keyword(s):

Machine Learning ◽

Intensive Care ◽

Pleural Effusion ◽

Diagnostic Accuracy ◽

Intensive Care Units ◽

Area Under The Curve ◽

Chest Radiographs ◽

Data Set ◽

Portable Chest ◽

Icu Physicians

Purpose: Portable chest radiographs are diagnostically indispensable in intensive care units (ICU). This study aimed to determine if the proposed machine learning technique increased in accuracy as the number of radiograph readings increased and if it was accurate in a clinical setting. Methods: Two independent data sets of portable chest radiographs (n = 380, a single Japanese hospital; n = 1,720, the National Institutes of Health [NIH] ChestX-ray8 dataset) were analyzed. Each data set was divided into training data and study data. Images were classified as atelectasis, pleural effusion, pneumonia, or no emergency. DenseNet-121, a pre-trained deep convolutional neural network, was used, and ensemble learning was performed on the best-performing algorithms. Diagnostic accuracy and processing time were compared to those of ICU physicians. Results: In the single Japanese hospital data, the area under the curve (AUC) of diagnostic accuracy was 0.768. The AUC of diagnostic accuracy significantly improved as the number of radiograph readings increased from 25 to 100% in the NIH data set. The AUC was higher than 0.9 for all categories toward the end of training with a large sample size. The time to complete 53 radiographs by machine learning was 70 times faster than the time taken by ICU physicians (9.66 s vs. 12 min). The diagnostic accuracy was higher by machine learning than by ICU physicians in most categories (atelectasis, AUC 0.744 vs. 0.555, P < 0.05; pleural effusion, 0.856 vs. 0.706, P < 0.01; pneumonia, 0.720 vs. 0.744, P = 0.88; no emergency, 0.751 vs. 0.698, P = 0.47). Conclusions: We developed an automatic detection system for portable chest radiographs in the ICU setting; its performance was superior to that of ICU physicians, and it was considerably faster.
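The timing comparison in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above (53 radiographs, 9.66 s, 12 min) and treats the 12-minute figure as exact, which is an assumption since it is probably rounded; the variable names are illustrative.

```python
# Figures quoted in the abstract.
N_RADIOGRAPHS = 53
MACHINE_SECONDS = 9.66
PHYSICIAN_SECONDS = 12 * 60  # 12 minutes, assumed exact

# Ratio of total reading times and the implied time per radiograph.
speedup = PHYSICIAN_SECONDS / MACHINE_SECONDS
machine_per_image = MACHINE_SECONDS / N_RADIOGRAPHS
physician_per_image = PHYSICIAN_SECONDS / N_RADIOGRAPHS

print(f"speedup: {speedup:.1f}x")
print(f"machine: {machine_per_image:.2f} s per radiograph")
print(f"physicians: {physician_per_image:.1f} s per radiograph")
```

The ratio comes out at about 74.5, consistent with the rounded "70 times faster" in the abstract.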


Machine Learning Models and Neural Network Techniques for Predicting Uddanam CKD

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1792.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 2550-2563

Keyword(s):

Neural Network ◽

Machine Learning ◽

Statistical Analysis ◽

Early Detection ◽

Clinical Data ◽

Process Model ◽

Research Work ◽

Process Time ◽

Data Set ◽

The World

Chronic kidney disease (CKD) is one of the most widespread diseases in the world. For reasons that remain unclear, the disease is more prevalent in some areas, such as Sri Lanka, Nicaragua, and Uddanam (India), where it causes thousands of deaths. Prevention using statistical analysis and early detection of CKD using Machine Learning (ML) and Neural Networks (NNs) are now important topics. In this research work, we collected data from Uddanam (a coastal area of Srikakulam district, A.P., India) on patients’ clinical data, lifestyles (habits and culture), and environmental conditions (water, land, etc.) from 2016 to 2019. In this paper, we apply statistical analysis, Machine Learning (ML), and Neural Networks to the clinical data set of Uddanam CKD for prevention and early detection of CKD. According to the statistical analysis, CKD in the Uddanam area can be prevented. According to the ML analysis, the Naive Bayes model is the best: the process model is constructed within 0.06 seconds and prediction accuracy is 99.9%. In the analysis of NNs, the Artificial Neural Network (ANN) with a 9-neuron hidden layer (HL) is more accurate than all other models, achieving 100% accuracy in predicting CKD with a process time of 0.02 seconds.
