Resource-efficient fast prediction in healthcare data analytics: A pruned Random Forest regression approach

Abstract. In predictive healthcare data analytics, high accuracy is paramount, as low accuracy can lead to misdiagnosis, which is known to cause serious health consequences or death. Fast prediction is also an important desideratum, particularly for machines and mobile devices with limited memory and processing power. For real-time healthcare analytics applications, particularly those that run on mobile devices, both traits (high accuracy and fast prediction) are highly desirable. In this paper, we propose an ensemble regression technique based on CLUB-DRF, a pruned Random Forest that possesses both traits. An experimental study on three medical data sets covering three different diseases demonstrates the speed and accuracy of the method.
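
To illustrate what a "pruned Random Forest" means here, the sketch below clusters trees by their prediction vectors on a validation set and keeps only the most accurate tree from each cluster, which is the general idea behind cluster-based pruning such as CLUB-DRF. It is a simplified, hypothetical sketch rather than the authors' implementation: k-means with farthest-point initialisation, squared error as the selection criterion, and the names `prune_forest` and `predict` are all assumptions, and the per-tree predictions are taken as given instead of coming from a trained forest.

```python
from statistics import mean


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Add the vector farthest from its nearest existing centroid.
        centroids.append(max(vectors, key=lambda v: min(squared_distance(v, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: squared_distance(v, centroids[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels


def prune_forest(tree_predictions, targets, k):
    """Return sorted indices of one representative tree per cluster.

    tree_predictions[t][i] is tree t's prediction for validation sample i
    (hypothetical input). Trees are clustered on these prediction vectors and,
    within each cluster, the tree with the lowest sum of squared errors
    (equivalently, the lowest RMSE) is kept.
    """
    labels = kmeans(tree_predictions, k)
    kept = []
    for j in range(k):
        members = [t for t, lab in enumerate(labels) if lab == j]
        if members:
            kept.append(min(members, key=lambda t: squared_distance(tree_predictions[t], targets)))
    return sorted(kept)


def predict(tree_predictions, kept):
    """Regression output of the pruned ensemble: average of the kept trees."""
    n_samples = len(tree_predictions[0])
    return [mean(tree_predictions[t][i] for t in kept) for i in range(n_samples)]
```

Prediction then requires evaluating only k trees instead of every tree in the full forest, which is where a speed and memory saving on constrained devices would come from.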

A novel framework for designing a multi-DoF prosthetic wrist control using machine learning

Scientific Reports ◽

10.1038/s41598-021-94449-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Chinmay P. Swami ◽

Nicholas Lenhard ◽

Jiyeon Kang

Keyword(s):

Machine Learning ◽

Random Forest ◽

Upper Limb ◽

Daily Living ◽

Machine Learning Algorithms ◽

Data Sets ◽

Random Forest Regression ◽

Prosthetic Devices ◽

Upper Limb Function ◽

The Neural Network

Abstract. Prosthetic arms can significantly increase the upper limb function of individuals with upper limb loss; however, despite the development of various multi-DoF prosthetic arms, the rate of prosthesis abandonment is still high. One of the major challenges is to design a multi-DoF controller that has high precision, robustness, and intuitiveness for daily use. The present study demonstrates a novel framework for developing a controller that leverages machine learning algorithms and movement synergies to implement natural control of a 2-DoF prosthetic wrist for activities of daily living (ADL). Data were collected from ten individuals performing ADL tasks while wearing a wrist brace that emulated the absence of wrist function. Using these data, a neural network classifies the movement, and random forest regression then computes the desired velocity of the prosthetic wrist. The models were trained and tested on ADLs, and their robustness was assessed using cross-validation and holdout data sets. The proposed framework demonstrated high accuracy (an F1 score of 99% for the classifier and a Pearson's correlation of 0.98 for the regression). Additionally, the interpretable nature of random forest regression was used to verify the targeted movement synergies. The present work provides a novel and effective framework for developing intuitive control of multi-DoF prosthetic devices.
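
The two reported metrics can be computed as in the plain-Python sketch below. It is not the authors' evaluation code: the abstract does not state how per-class F1 scores were averaged, so the macro average here is an assumption, and movement labels such as "flex" and "ext" used to exercise it are hypothetical.

```python
from math import sqrt


def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (averaging scheme is an assumption)."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_score(y_true, y_pred, c) for c in classes) / len(classes)


def pearson_r(x, y):
    """Pearson correlation between predicted and measured values (e.g. wrist velocity)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```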

Chronic Kidney Disease for Collaborative Healthcare Data Analytics using Random Forest Classification Algorithms

2021 International Conference on Computer Communication and Informatics (ICCCI) ◽

10.1109/iccci50826.2021.9402574 ◽

2021 ◽

Author(s):

V. Shanmugarajeshwari ◽

M. Ilayaraja

Keyword(s):

Chronic Kidney Disease ◽

Random Forest ◽

Kidney Disease ◽

Data Analytics ◽

Classification Algorithms ◽

Random Forest Classification ◽

Healthcare Data ◽

Forest Classification

Challenges and Trends in Clinical Data Analytics

Issue 4 - Journal of Science and Technology ◽

10.46243/jst.2020.v5.i4.pp348-360 ◽

2020 ◽

pp. 348-360

Keyword(s):

Health Care ◽

Data Analysis ◽

Clinical Data ◽

Data Analytics ◽

Health Care Systems ◽

Data Sets ◽

Handwritten Documents ◽

Healthcare Data ◽

Clinical Data Analysis ◽

Care Systems

Today's technological advancements have helped researchers collect and organize various forms of healthcare data. Data is an integral part of healthcare analytics. Drug discovery through clinical data analytics is an important breakthrough area for computational approaches in healthcare systems. Healthcare analysis also provides better value for money. Healthcare data management is very challenging, as 80% of the data is unstructured: it includes handwritten documents, images, and computer-generated clinical reports such as MRI, ECG, and CT scans. The paper aims to summarize the work carried out by scientists and researchers in healthcare domains; more precisely, it focuses on clinical data analysis for the period 2013 to 2019. The surveyed work is organized with respect to the data sets, techniques and methods used, tools adopted, and key findings in clinical data analysis. The overall objective is to identify the current challenges, trends, and gaps in clinical data analysis. The work is based on a bibliometric survey and summarizes the key findings in a novel way.

RANDOM FOREST REGRESSION FOR THE ESTIMATION OF LEAF AREA INDEX OF OKRA CROP USING GROUND BASED BISTATIC SCATTEROMETER

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-5-719-2018 ◽

2018 ◽

Vol XLII-5 ◽

pp. 719-725

Author(s):

S. A. Yadav ◽

R. Prasad ◽

A. K. Vishwakarma ◽

V. P. Yadav

Keyword(s):

Regression Analysis ◽

Random Forest ◽

Leaf Area Index ◽

Leaf Area ◽

Growth Stages ◽

Data Sets ◽

Angle Of Incidence ◽

Random Forest Regression ◽

Area Index ◽

Scattering Coefficients

Abstract. The specular bistatic scattering mechanism of the okra crop was analyzed using a dual-polarized, ground-based bistatic scatterometer system at X, C, and L bands in the specular direction with azimuthal angle θ = 0°. An outdoor okra crop bed of area 10 × 10 m² was specially prepared for the estimation of leaf area index (LAI) at HH and VV polarizations over incidence angles from 20° to 60° in steps of 10°. Regression analysis was performed between the bistatic specular scattering coefficients and the crop biophysical parameter at X, C, and L bands for HH and VV polarizations at different angles of incidence to determine the optimum parameters of the bistatic scatterometer system. The linear regression analysis showed high correlation at a 40° angle of incidence for all bands and polarizations for the okra crop. The computed scattering coefficients and measured LAI of the okra crop for the seven growth stages at 40° incidence were interpolated into 61 data sets. The data sets were divided into training, validation, and testing subsets for developing and evaluating the random forest (RF) regression model for LAI estimation. The LAI values estimated by the RF regression model were closest to the observed values at X band with VV polarization, with a coefficient of determination of R² = 0.928 and a low root mean square error (RMSE = 0.260 m²/m²), compared with C and L bands.
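
The two figures of merit quoted above can be computed as in this sketch. It assumes R² is the coefficient of determination 1 - SS_res/SS_tot; some studies instead report the squared Pearson correlation between observed and estimated values, and the abstract does not say which definition was used.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean square error, in the units of the target (here m²/m² for LAI)."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot
```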

BIG DATA ANALYSIS IN HEALTH CARE DOMAIN: A SYSTEMATIC REVIEW

International Journal of Engineering Technologies and Management Research ◽

10.29121/ijetmr.v5.i2.2018.605 ◽

2020 ◽

Vol 5 (2) ◽

pp. 1-8

Author(s):

Abhishek Bajpai ◽

Dr. Sanjiv Sharma

Keyword(s):

Health Care ◽

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Healthcare Data ◽

Challenges And Opportunities ◽

Business Profit

As the volume of data produced in our society increases day by day, the exploration of big data in healthcare is growing at an unprecedented rate. Nowadays, big data is a popular buzzword in many areas. This paper attempts to establish that the healthcare industry, too, is stepping into the big data pool to take advantage of its various advanced tools and technologies. It reviews research carried out in the healthcare realm using big data approaches and methodologies. Big data methodologies can be applied to healthcare data analytics (characterized by the 4 V's) to support better decisions that increase business profit and customer affection, to gain a better understanding of market behaviours and trends, and to provide e-health services using Digital Imaging and Communications in Medicine (DICOM). Big data techniques such as MapReduce and machine learning can be applied to develop systems for the early diagnosis of disease, e.g. the analysis of chronic diseases such as heart disease, diabetes, and stroke. The analysis is performed using the big data analytics framework Hadoop, which is used to process large data sets. Further, the paper presents various big data tools, challenges, opportunities, and hurdles, followed by the conclusion.
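
MapReduce, mentioned above as a big data technique, splits a job into a map phase that emits key/value pairs and a reduce phase that aggregates the values for each key. The sketch below simulates that flow in plain Python by counting diagnoses across hypothetical patient records (the `diagnoses` field is an assumption). It does not use the Hadoop API, which runs the same phases distributed across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit (diagnosis, 1) for every diagnosis listed on one patient record."""
    return [(diagnosis, 1) for diagnosis in record["diagnoses"]]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Aggregate all values emitted for one key."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce sequentially on one machine."""
    intermediate = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
```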

Random Forest Regression models for Lactation and Successful Insemination in Holstein Friesian cows

10.1101/2020.11.17.386318 ◽

2020 ◽

Author(s):

Lillian Oluoch ◽

László Stachó ◽

László Viharos ◽

Andor Viharos ◽

Edit Mikó

Keyword(s):

Random Forest ◽

Milk Production ◽

Production Control ◽

Large Data ◽

Data Sets ◽

Random Forest Regression ◽

Data Set ◽

Explanatory Variables ◽

Holstein Friesian ◽

Reliable Model

Abstract. To overcome well-known difficulties in establishing reliable models based on large data sets, the Random Forest Regression (RFR) method is applied to study the economical breeding and milk production of dairy cows. Positive experiences with RFR in various application areas support the view that RFR can achieve reliable model predictions for the industrial production of any product, providing a useful basis for decisions. In this study, a data set spanning ten years and covering about eighty thousand cows was analysed by means of RFR. A ranking of production control parameters is obtained, and the most important explanatory variables are found by computing the variances of the target variable on the sets created during the training phases of the RFR. Predictions are made for milk production and for the conception of calves with high accuracy on the given data, and simulations are used to investigate prediction accuracy. This paper is primarily concerned with the mathematical aspects of a forthcoming work focused on the agricultural viewpoints. As for future mathematical research, the results will be compared with models based on factor analysis and linear regression.
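
The variance-based ranking described above rests on variance reduction: a split is valuable if the target variance inside the resulting subsets is much lower than in the parent set. The sketch below scores each explanatory variable by the best single-split variance reduction it achieves. This is a simplified proxy and an assumption, not the authors' procedure; RFR-based importance typically accumulates such reductions over every split in every tree, and the feature names used to exercise it are hypothetical.

```python
from statistics import pvariance


def variance_reduction(xs, ys, threshold):
    """Parent variance minus size-weighted child variance for the split x <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty explains nothing
    weighted = (len(left) * pvariance(left) + len(right) * pvariance(right)) / len(ys)
    return pvariance(ys) - weighted


def best_split_gain(xs, ys):
    """Largest variance reduction over all candidate thresholds of one variable."""
    return max(variance_reduction(xs, ys, t) for t in sorted(set(xs)))


def rank_features(columns, ys):
    """Order variable names from most to least explanatory under this proxy."""
    gains = {name: best_split_gain(xs, ys) for name, xs in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)
```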

A computer vision approach to identifying the manufacturer and model of anterior cervical spinal hardware

Journal of Neurosurgery Spine ◽

10.3171/2019.6.spine19463 ◽

2019 ◽

Vol 31 (6) ◽

pp. 844-850 ◽

Cited By ~ 1

Author(s):

Kevin T. Huang ◽

Michael A. Silva ◽

Alfred P. See ◽

Kyle C. Wu ◽

Troy Gallerani ◽

...

Keyword(s):

Computer Vision ◽

Feature Detection ◽

High Accuracy ◽

Detection Accuracy ◽

Data Sets ◽

Visual Words ◽

Fusion Systems ◽

Kaze Feature ◽

Applications Of Machine Learning ◽

Cervical Plating

OBJECTIVE: Recent advances in computer vision have revolutionized many aspects of society but have yet to find significant penetrance in neurosurgery. One proposed use for this technology is to aid in the identification of implanted spinal hardware. In revision operations, knowing the manufacturer and model of previously implanted fusion systems upfront can facilitate a faster and safer procedure, but this information is frequently unavailable or incomplete. The authors present one approach for the automated, high-accuracy classification of anterior cervical hardware fusion systems using computer vision.

METHODS: Patient records were searched for those who underwent anterior-posterior (AP) cervical radiography following anterior cervical discectomy and fusion (ACDF) at the authors' institution over a 10-year period (2008–2018). These images were then cropped and windowed to include just the cervical plating system. Images were then labeled with the appropriate manufacturer and system according to the operative record. A computer vision classifier was then constructed using the bag-of-visual-words technique and KAZE feature detection. Accuracy and validity were tested using an 80%/20% training/testing pseudorandom split over 100 iterations.

RESULTS: A total of 321 images were isolated, containing 9 different ACDF systems from 5 different companies. The correct system was identified as the top choice in 91.5% ± 3.8% of the cases and as one of the top 2 or top 3 choices in 97.1% ± 2.0% and 98.4% ± 1.3% of the cases, respectively. Performance persisted despite the inclusion of variable sizes of hardware (i.e., 1-level, 2-level, and 3-level plates). Stratification by the size of hardware did not improve performance.

CONCLUSIONS: A computer vision algorithm was trained to classify at least 9 different types of anterior cervical fusion systems using relatively sparse data sets and was demonstrated to perform with high accuracy. This represents one of many potential clinical applications of machine learning and computer vision in neurosurgical practice.
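
The evaluation protocol above (top-k accuracy over repeated 80%/20% pseudorandom splits) can be sketched as follows. The classifier itself (bag of visual words with KAZE features) is not reproduced: `score_rows` is a hypothetical stand-in for its per-system scores, and reporting the sample standard deviation as the ± value is an assumption.

```python
import random
from statistics import mean, stdev


def top_k_accuracy(score_rows, true_labels, k):
    """Fraction of cases whose true system is among the k highest-scoring systems."""
    hits = 0
    for scores, truth in zip(score_rows, true_labels):
        ranked = sorted(scores, key=scores.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


def repeated_split_indices(n, train_fraction=0.8, iterations=100, seed=0):
    """Yield (train, test) index lists for repeated pseudorandom splits."""
    rng = random.Random(seed)
    cut = int(round(n * train_fraction))
    for _ in range(iterations):
        order = list(range(n))
        rng.shuffle(order)
        yield order[:cut], order[cut:]


def summarise(accuracies):
    """Mean and sample standard deviation across iterations."""
    return mean(accuracies), stdev(accuracies)
```

With 321 images, each split in this sketch holds 257 training and 64 test images.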

User Power Behavior Similarity Clustering Based on Unsupervised Extreme Learning Machine Algorithm

Recent Advances in Electrical & Electronic Engineering (Formerly Recent Patents on Electrical & Electronic Engineering) ◽

10.2174/2352096512666191004130655 ◽

2020 ◽

Vol 13 (5) ◽

pp. 641-649

Author(s):

Yuancheng Li ◽

Yaqi Cui ◽

Xiaolong Zhang

Keyword(s):

Extreme Learning Machine ◽

Clustering Algorithm ◽

Characteristic Curve ◽

Clustering Algorithms ◽

Data Sets ◽

Residential Areas ◽

Processing Power ◽

Learning Machine ◽

Advanced Metering ◽

Matlab Programming

Background: Advanced Metering Infrastructure (AMI) for the smart grid is growing rapidly, resulting in exponential growth of the data collected and transmitted by metering devices. Clustering this data can give the electricity company a better understanding of users' personalized and differentiated needs. Objective: Existing clustering algorithms for processing such data generally suffer from problems such as insufficient data utilization, high computational complexity, and low accuracy of behavior recognition. Methods: To improve clustering accuracy, this paper proposes a new clustering method based on users' electrical behavior. Starting from an analysis of user load characteristics, user electricity data samples were constructed. Daily load characteristic curves were extracted using an improved extreme learning machine clustering algorithm and effective index criteria. Clustering analysis was then carried out for users in industrial, commercial, and residential areas. The improved algorithm, called the Unsupervised Extreme Learning Machine (US-ELM), extends and improves the original Extreme Learning Machine (ELM) so that it performs unsupervised clustering. Results: Experiments on four different data sets, implemented in MATLAB, compared US-ELM with other commonly used clustering algorithms. The results show that US-ELM achieves higher accuracy in processing power data. Conclusion: The unsupervised ELM algorithm can greatly reduce time consumption and improve clustering effectiveness.
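
As a rough illustration of the ELM ingredient, the sketch below passes min-max normalised daily load curves through a random, untrained sigmoid hidden layer, which is the defining feature of an ELM, and then clusters the hidden-layer outputs with k-means. This is not US-ELM: to our understanding, US-ELM additionally learns a manifold-regularised embedding from a graph Laplacian before clustering, a step omitted here. The layer size, seed, and function names are assumptions.

```python
import math
import random


def normalise(curve):
    """Min-max scale one daily load curve to [0, 1] so clustering compares shape, not size."""
    lo, hi = min(curve), max(curve)
    span = (hi - lo) or 1.0  # flat curves map to all zeros
    return [(x - lo) / span for x in curve]


def elm_hidden_layer(vectors, hidden_units=8, seed=0):
    """ELM-style feature map: random sigmoid hidden layer whose weights are never trained."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(hidden_units)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(hidden_units)]
    return [
        [1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, v)) + b)))
         for row, b in zip(weights, biases)]
        for v in vectors
    ]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_curves(curves, k, iterations=20):
    """k-means (farthest-point init) on ELM hidden-layer features of normalised curves."""
    feats = elm_hidden_layer([normalise(c) for c in curves])
    centroids = [feats[0]]
    while len(centroids) < k:
        centroids.append(max(feats, key=lambda f: min(squared_distance(f, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: squared_distance(f, centroids[j])) for f in feats]
        for j in range(k):
            members = [f for f, lab in zip(feats, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

The cluster centroids, mapped back to the normalised curves of their members, would play the role of the typical daily load profiles for each user group.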
