Chi-square-based Scoring Function for Categorization of MEDLINE Citations

Summary Objectives: Text categorization has been used in biomedical informatics for identifying documents containing relevant topics of interest. We developed a simple method that uses a chi-square-based scoring function to determine the likelihood of MEDLINE® citations containing genetic relevant topic. Methods: Our procedure requires construction of a genetic and a nongenetic domain document corpus. We used MeSH® descriptors assigned to MEDLINE citations for this categorization task. We compared frequencies of MeSH descriptors between two corpora applying chi-square test. A MeSH descriptor was considered to be a positive indicator if its relative observed frequency in the genetic domain corpus was greater than its relative observed frequency in the nongenetic domain corpus. The output of the proposed method is a list of scores for all the citations, with the highest score given to those citations containing MeSH descriptors typical for the genetic domain. Results: Validation was done on a set of 734 manually annotated MEDLINE citations. It achieved predictive accuracy of 0.87 with 0.69 recall and 0.64 precision. We evaluated the method by comparing it to three machine-learning algorithms (support vector machines, decision trees, naïve Bayes). Although the differences were not statistically significantly different, results showed that our chi-square scoring performs as good as compared machine-learning algorithms. Conclusions: We suggest that the chi-square scoring is an effective solution to help categorize MEDLINE citations. The algorithm is implemented in the BITOLA literature-based discovery support system as a preprocessor for gene symbol disambiguation process.

Download Full-text

IgA Nephropathy Prediction in Children with Machine Learning Algorithms

Future Internet ◽

10.3390/fi12120230 ◽

2020 ◽

Vol 12 (12) ◽

pp. 230

Author(s):

Ping Zhang ◽

Rongqin Wang ◽

Nianfeng Shi

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Immunoglobulin A ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

K Nearest Neighbor ◽

Chi Square ◽

Extreme Gradient Boosting

Immunoglobulin A nephropathy (IgAN) is the most common primary glomerular disease all over the world and it is a major cause of renal failure. IgAN prediction in children with machine learning algorithms has been rarely studied. We retrospectively analyzed the electronic medical records from the Nanjing Eastern War Zone Hospital, chose eXtreme Gradient Boosting (XGBoost), random forest (RF), CatBoost, support vector machines (SVM), k-nearest neighbor (KNN), and extreme learning machine (ELM) models in order to predict the probability that the patient would not reach or reach end-stage renal disease (ESRD) within five years, used the chi-square test to select the most relevant 16 features as the input of the model, and designed a decision-making system (DMS) of IgAN prediction in children that is based on XGBoost and Django framework. The receiver operating characteristic (ROC) curve was used in order to evaluate the performance of the models and XGBoost had the best performance by comparison. The AUC value, accuracy, precision, recall, and f1-score of XGBoost were 85.11%, 78.60%, 75.96%, 76.70%, and 76.33%, respectively. The XGBoost model is useful for physicians and pediatric patients in providing predictions regarding IgAN. As an advantage, a DMS can be designed based on the XGBoost model to assist a physician to effectively treat IgAN in children for preventing deterioration.

Download Full-text

Predicting rural patients� use of eHealth through supervised machine learning algorithms: A study on Portable Health Clinic in Bangladesh (Preprint)

10.2196/preprints.10761 ◽

2018 ◽

Author(s):

Nazmul Hossain ◽

Fumihiko Yokota ◽

Akira Fukuda ◽

Ashir Ahmed

Keyword(s):

Neural Network ◽

Machine Learning ◽

Support Vector Machine ◽

Logistic Regression ◽

Predictive Accuracy ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector ◽

Rural Patients

BACKGROUND Predictive analytics through machine learning has been extensively using across industries including eHealth and mHealth for analyzing patient’s health data, predicting diseases, enhancing the productivity of technology or devices used for providing healthcare services and so on. However, not enough studies were conducted to predict the usage of eHealth by rural patients in developing countries. OBJECTIVE The objective of this study is to predict rural patients’ use of eHealth through supervised machine learning algorithms and propose the best-fitted model after evaluating their performances in terms of predictive accuracy. METHODS Data were collected between June and July 2016 through a field survey with structured questionnaire form 292 randomly selected rural patients in a remote North-Western sub-district of Bangladesh. Four supervised machine learning algorithms namely logistic regression, boosted decision tree, support vector machine, and artificial neural network were chosen for this experiment. A ‘correlation-based feature selection’ technique was applied to include the most relevant but not redundant features into the model. A 10-fold cross-validation technique was applied to reduce bias and over-fitting of the data. RESULTS Logistic regression outperformed other three algorithms with 85.9% predictive accuracy, 86.4% precision, 90.5% recall, 88.1% F-score, and AUC of 91.5% followed by neural network, decision tree and support vector machine with the accuracy rate of 84.2%, 82.9 %, and 80.4% respectively. CONCLUSIONS The findings of this study are expected to be helpful for eHealth practitioners in selecting appropriate areas to serve and dealing with both under-capacity and over-capacity by predicting the patients’ response in advance with a certain level of accuracy and precision.

Download Full-text

Machine Learning Models for Analysis of Vital Signs Dynamics: A Case for Sepsis Onset Prediction

Journal of Healthcare Engineering ◽

10.1155/2019/5930379 ◽

2019 ◽

Vol 2019 ◽

pp. 1-11 ◽

Cited By ~ 1

Author(s):

Eli Bloch ◽

Tammy Rotem ◽

Jonathan Cohen ◽

Pierre Singer ◽

Yehudit Aperstein

Keyword(s):

Machine Learning ◽

Predictive Accuracy ◽

Learning Algorithms ◽

Vital Signs ◽

Area Under The Curve ◽

Machine Learning Algorithms ◽

Support Vector ◽

Icu Stay ◽

Novel Approach ◽

High Level

Objective. Achieving accurate prediction of sepsis detection moment based on bedside monitor data in the intensive care unit (ICU). A good clinical outcome is more probable when onset is suspected and treated on time, thus early insight of sepsis onset may save lives and reduce costs. Methodology. We present a novel approach for feature extraction, which focuses on the hypothesis that unstable patients are more prone to develop sepsis during ICU stay. These features are used in machine learning algorithms to provide a prediction of a patient’s likelihood to develop sepsis during ICU stay, hours before it is diagnosed. Results. Five machine learning algorithms were implemented using R software packages. The algorithms were trained and tested with a set of 4 features which represent the variability in vital signs. These algorithms aimed to calculate a patient’s probability to become septic within the next 4 hours, based on recordings from the last 8 hours. The best area under the curve (AUC) was achieved with Support Vector Machine (SVM) with radial basis function, which was 88.38%. Conclusions. The high level of predictive accuracy along with the simplicity and availability of input variables present great potential if applied in ICUs. Variability of a patient’s vital signs proves to be a good indicator of one’s chance to become septic during ICU stay.

Download Full-text

Text categorization Performance examination Using Machine Learning Algorithms

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/981/2/022044 ◽

2020 ◽

Vol 981 ◽

pp. 022044

Author(s):

Bonthala Prabhanjan Yadav ◽

Sukhaveerji Ghate ◽

A Harshavardhan ◽

G Jhansi ◽

Komuravelly Sudheer Kumar ◽

...

Keyword(s):

Machine Learning ◽

Text Categorization ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Performance Examination

Download Full-text

Hierarchical Tactile Sensation Integration from Prosthetic Fingertips Enables Multi-Texture Surface Recognition

Sensors ◽

10.3390/s21134324 ◽

2021 ◽

Vol 21 (13) ◽

pp. 4324

Author(s):

Moaed A. Abd ◽

Rudy Paul ◽

Aparna Aravelli ◽

Ou Bai ◽

Leonel Lagos ◽

...

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Sliding Contact ◽

Tactile Sensation ◽

Support Vector ◽

Textured Surfaces ◽

K Nearest Neighbor ◽

Tactile Sensors ◽

Time Frequency

Multifunctional flexible tactile sensors could be useful to improve the control of prosthetic hands. To that end, highly stretchable liquid metal tactile sensors (LMS) were designed, manufactured via photolithography, and incorporated into the fingertips of a prosthetic hand. Three novel contributions were made with the LMS. First, individual fingertips were used to distinguish between different speeds of sliding contact with different surfaces. Second, differences in surface textures were reliably detected during sliding contact. Third, the capacity for hierarchical tactile sensor integration was demonstrated by using four LMS signals simultaneously to distinguish between ten complex multi-textured surfaces. Four different machine learning algorithms were compared for their successful classification capabilities: K-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), and neural network (NN). The time-frequency features of the LMSs were extracted to train and test the machine learning algorithms. The NN generally performed the best at the speed and texture detection with a single finger and had a 99.2 ± 0.8% accuracy to distinguish between ten different multi-textured surfaces using four LMSs from four fingers simultaneously. The capability for hierarchical multi-finger tactile sensation integration could be useful to provide a higher level of intelligence for artificial hands.

Download Full-text

Remote sensing inversion of water quality in coastal sea area based on machine learning: a case study of Shenzhen bay, China

10.5194/egusphere-egu21-1972 ◽

2021 ◽

Author(s):

Xiaotong Zhu ◽

Jinhui Jeanne Huang

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Water Quality ◽

Predictive Accuracy ◽

Water Environment ◽

Quality Parameters ◽

Machine Learning Algorithms ◽

Dynamic Monitoring ◽

Support Vector ◽

Seawater Quality

Remote sensing monitoring has the characteristics of wide monitoring range, celerity, low cost for long-term dynamic monitoring of water environment. With the flourish of artificial intelligence, machine learning has enabled remote sensing inversion of seawater quality to achieve higher prediction accuracy. However, due to the physicochemical property of the water quality parameters, the performance of algorithms differs a lot. In order to improve the predictive accuracy of seawater quality parameters, we proposed a technical framework to identify the optimal machine learning algorithms using Sentinel-2 satellite and in-situ seawater sample data. In the study, we select three algorithms, i.e. support vector regression (SVR), XGBoost and deep learning (DL), and four seawater quality parameters, i.e. dissolved oxygen (DO), total dissolved solids (TDS), turbidity(TUR) and chlorophyll-a (Chla). The results show that SVR is a more precise algorithm to inverse DO (R2 = 0.81). XGBoost has the best accuracy for Chla and Tur inversion (R2 = 0.75 and 0.78 respectively) while DL performs better in TDS (R2 =0.789). Overall, this research provides a theoretical support for high precision remote sensing inversion of offshore seawater quality parameters based on machine learning.

Download Full-text

Machine Learning Models for Finger Bend Evaluation using Implemented Low cost Flex Sensor

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.35742 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 3605-3611

Author(s):

Pratyush Kaware

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Low Cost ◽

Learning Algorithms ◽

Cost Effective ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Models ◽

Machine Learning Models

In this paper a cost-effective sensor has been implemented to read finger bend signals, by attaching the sensor to a finger, so as to classify them based on the degree of bent as well as the joint about which the finger was being bent. This was done by testing with various machine learning algorithms to get the most accurate and consistent classifier. Finally, we found that Support Vector Machine was the best algorithm suited to classify our data, using we were able predict live state of a finger, i.e., the degree of bent and the joints involved. The live voltage values from the sensor were transmitted using a NodeMCU micro-controller which were converted to digital and uploaded on a database for analysis.

Download Full-text

Heart disease prediction using machine learning techniques : a survey

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.8.10557 ◽

2018 ◽

Vol 7 (2.8) ◽

pp. 684 ◽

Cited By ~ 12

Author(s):

V V. Ramalingam ◽

Ayantan Dandapath ◽

M Karthik Raja

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Complex Data ◽

Learning Techniques ◽

Vector Machines ◽

Supervised Learning Algorithms ◽

Life Threatening

Heart related diseases or Cardiovascular Diseases (CVDs) are the main reason for a huge number of death in the world over the last few decades and has emerged as the most life-threatening disease, not only in India but in the whole world. So, there is a need of reliable, accurate and feasible system to diagnose such diseases in time for proper treatment. Machine Learning algorithms and techniques have been applied to various medical datasets to automate the analysis of large and complex data. Many researchers, in recent times, have been using several machine learning techniques to help the health care industry and the professionals in the diagnosis of heart related diseases. This paper presents a survey of various models based on such algorithms and techniques andanalyze their performance. Models based on supervised learning algorithms such as Support Vector Machines (SVM), K-Nearest Neighbour (KNN), NaïveBayes, Decision Trees (DT), Random Forest (RF) and ensemble models are found very popular among the researchers.

Download Full-text

Feature Selection and Comparison of Machine Learning Algorithms in Classification of Grazing and Rumination Behaviour in Sheep

Sensors ◽

10.3390/s18103532 ◽

2018 ◽

Vol 18 (10) ◽

pp. 3532 ◽

Cited By ~ 16

Author(s):

Nicola Mansbridge ◽

Jurgen Mitsch ◽

Nicola Bollard ◽

Keith Ellis ◽

Giuliana Miguel-Pacheco ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Time Budget ◽

Learning Algorithms ◽

Eating Behaviour ◽

Machine Learning Algorithms ◽

Support Vector ◽

Optimum Number ◽

Eating Behaviours ◽

Adaptive Boosting

Grazing and ruminating are the most important behaviours for ruminants, as they spend most of their daily time budget performing these. Continuous surveillance of eating behaviour is an important means for monitoring ruminant health, productivity and welfare. However, surveillance performed by human operators is prone to human variance, time-consuming and costly, especially on animals kept at pasture or free-ranging. The use of sensors to automatically acquire data, and software to classify and identify behaviours, offers significant potential in addressing such issues. In this work, data collected from sheep by means of an accelerometer/gyroscope sensor attached to the ear and collar, sampled at 16 Hz, were used to develop classifiers for grazing and ruminating behaviour using various machine learning algorithms: random forest (RF), support vector machine (SVM), k nearest neighbour (kNN) and adaptive boosting (Adaboost). Multiple features extracted from the signals were ranked on their importance for classification. Several performance indicators were considered when comparing classifiers as a function of algorithm used, sensor localisation and number of used features. Random forest yielded the highest overall accuracies: 92% for collar and 91% for ear. Gyroscope-based features were shown to have the greatest relative importance for eating behaviours. The optimum number of feature characteristics to be incorporated into the model was 39, from both ear and collar data. The findings suggest that one can successfully classify eating behaviours in sheep with very high accuracy; this could be used to develop a device for automatic monitoring of feed intake in the sheep sector to monitor health and welfare.

Download Full-text

A Comparative Analysis of Machine Learning Algorithms Modeled from Machine Vision-Based Lettuce Growth Stage Classification in Smart Aquaponics

International Journal of Environmental Science and Development ◽

10.18178/ijesd.2020.11.9.1288 ◽

2020 ◽

Vol 11 (9) ◽

pp. 442-449 ◽

Cited By ~ 1

Author(s):

Sandy C. Lauguico ◽

◽

Ronnie S. Concepcion II ◽

Jonnel D. Alejandrino ◽

Rogelio Ruzcko Tobias ◽

...

Keyword(s):

Machine Learning ◽

Comparative Analysis ◽

Machine Vision ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Urban Farming ◽

K Nearest Neighbor ◽

Lettuce Growth

The arising problem on food scarcity drives the innovation of urban farming. One of the methods in urban farming is the smart aquaponics. However, for a smart aquaponics to yield crops successfully, it needs intensive monitoring, control, and automation. An efficient way of implementing this is the utilization of vision systems and machine learning algorithms to optimize the capabilities of the farming technique. To realize this, a comparative analysis of three machine learning estimators: Logistic Regression (LR), K-Nearest Neighbor (KNN), and Linear Support Vector Machine (L-SVM) was conducted. This was done by modeling each algorithm from the machine vision-feature extracted images of lettuce which were raised in a smart aquaponics setup. Each of the model was optimized to increase cross and hold-out validations. The results showed that KNN having the tuned hyperparameters of n_neighbors=24, weights='distance', algorithm='auto', leaf_size = 10 was the most effective model for the given dataset, yielding a cross-validation mean accuracy of 87.06% and a classification accuracy of 91.67%.

Download Full-text