Prediction of Short-Distance Aerial Movement of Phakopsora pachyrhizi Urediniospores Using Machine Learning

Dispersal of urediniospores by wind is the primary means of spread for Phakopsora pachyrhizi, the cause of soybean rust. Our research focused on the short-distance movement of urediniospores from within the soybean canopy and up to 61 m from field-grown rust-infected soybean plants. Environmental variables were used to develop and compare models including the least absolute shrinkage and selection operator regression, zero-inflated Poisson/regular Poisson regression, random forest, and neural network to describe deposition of urediniospores collected in passive and active traps. All four models identified distance of trap from source, humidity, temperature, wind direction, and wind speed as the five most important variables influencing short-distance movement of urediniospores. The random forest model provided the best predictions, explaining 76.1 and 86.8% of the total variation in the passive- and active-trap datasets, respectively. The prediction accuracy based on the correlation coefficient (r) between predicted values and the true values were 0.83 (P < 0.0001) and 0.94 (P < 0.0001) for the passive and active trap datasets, respectively. Overall, multiple machine learning techniques identified the most important variables to make the most accurate predictions of movement of P. pachyrhizi urediniospores short-distance.

Download Full-text

Learning from Imbalanced Educational Data Using Ensemble Machine Learning Algorithms

Webology ◽

10.14704/web/v18si01/web18053 ◽

2021 ◽

Vol 18 (Special Issue 01) ◽

pp. 183-195

Author(s):

Thingbaijam Lenin ◽

N. Chandrasekaran

Keyword(s):

Machine Learning ◽

Random Forest ◽

Missing Values ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Adaptive Boosting ◽

Stochastic Gradient Boosting ◽

Ensemble Machine Learning ◽

Learning Techniques ◽

Student’S Performance

Student’s academic performance is one of the most important parameters for evaluating the standard of any institute. It has become a paramount importance for any institute to identify the student at risk of underperforming or failing or even drop out from the course. Machine Learning techniques may be used to develop a model for predicting student’s performance as early as at the time of admission. The task however is challenging as the educational data required to explore for modelling are usually imbalanced. We explore ensemble machine learning techniques namely bagging algorithm like random forest (rf) and boosting algorithms like adaptive boosting (adaboost), stochastic gradient boosting (gbm), extreme gradient boosting (xgbTree) in an attempt to develop a model for predicting the student’s performance of a private university at Meghalaya using three categories of data namely demographic, prior academic record, personality. The collected data are found to be highly imbalanced and also consists of missing values. We employ k-nearest neighbor (knn) data imputation technique to tackle the missing values. The models are developed on the imputed data with 10 fold cross validation technique and are evaluated using precision, specificity, recall, kappa metrics. As the data are imbalanced, we avoid using accuracy as the metrics of evaluating the model and instead use balanced accuracy and F-score. We compare the ensemble technique with single classifier C4.5. The best result is provided by random forest and adaboost with F-score of 66.67%, balanced accuracy of 75%, and accuracy of 96.94%.

Download Full-text

Heart Disease Prediction using Machine Learning Techniques

International Journal of Scientific Research in Science and Technology ◽

10.32628/ijsrst2183218 ◽

2021 ◽

pp. 42-47

Author(s):

Ramesh Ponnala ◽

K. Sai Sowjanya

Keyword(s):

Machine Learning ◽

Heart Disease ◽

Random Forest ◽

Linear Model ◽

Machine Learning Techniques ◽

Disease Prediction ◽

Huge Amount ◽

Healthcare Enterprise ◽

Learning Techniques ◽

Accuracy Level

Prediction of Cardiovascular ailment is an important task inside the vicinity of clinical facts evaluation. Machine learning knowledge of has been proven to be effective in helping in making selections and predicting from the huge amount of facts produced by using the healthcare enterprise. on this paper, we advocate a unique technique that pursuits via finding good sized functions by means of applying ML strategies ensuing in improving the accuracy inside the prediction of heart ailment. The severity of the heart disease is classified primarily based on diverse methods like KNN, choice timber and so on. The prediction version is added with special combos of capabilities and several known classification techniques. We produce a stronger performance level with an accuracy level of a 100% through the prediction version for heart ailment with the Hybrid Random forest area with a linear model (HRFLM).

Download Full-text

Human Activity Recognition using Machine Learning

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.35694 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 3553-3556

Author(s):

Chaudhari Shraddha

Keyword(s):

Machine Learning ◽

Random Forest ◽

Activity Recognition ◽

Human Activity ◽

Recognition System ◽

Human Activity Recognition ◽

Machine Learning Techniques ◽

K Nearest Neighbors ◽

Random Forest Classification ◽

Medical Health Care

Activity recognition in humans is one of the active challenges that find its application in numerous fields such as, medical health care, military, manufacturing, assistive techniques and gaming. Due to the advancements in technologies the usage of smartphones in human lives has become inevitable. The sensors in the smartphones help us to measure the essential vital parameters. These measured parameters enable us to monitor the activities of humans, which we call as human activity recognition. We have applied machine learning techniques on a publicly available dataset. K-Nearest Neighbors and Random Forest classification algorithms are applied. In this paper, we have designed and implemented an automatic human activity recognition system that independently recognizes the actions of the humans. This system is able to recognize the activities such as Laying, Sitting, Standing, Walking, Walking downstairs and Walking upstairs. The results obtained show that, the KNN and Random Forest Algorithms gives 90.22% and 92.70% respectively of overall accuracy in detecting the activities.

Download Full-text

Preliminary Screening of COVID-19 Infection Employing Machine Learning Techniques From Simple Blood Profile

International Journal of Quantitative Structure-Property Relationships ◽

10.4018/ijqspr.2021070103 ◽

2021 ◽

Vol 6 (3) ◽

pp. 35-47

Author(s):

Anirudh Reddy Cingireddy ◽

Robin Ghosh ◽

Supratik Kar ◽

Venkata Melapu ◽

Sravanthi Joginipeli ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Naive Bayes ◽

Albert Einstein ◽

Naïve Bayes ◽

Machine Learning Techniques ◽

Support Vector ◽

Blood Profile ◽

Molecular Tests ◽

Large Populations

Frequent testing of the entire population would help to identify individuals with active COVID-19 and allow us to identify concealed carriers. Molecular tests, antigen tests, and antibody tests are being widely used to confirm COVID-19 in the population. Molecular tests such as the real-time reverse transcription-polymerase chain reaction (rRT-PCR) test will take a minimum of 3 hours to a maximum of 4 days for the results. The authors suggest using machine learning and data mining tools to filter large populations at a preliminary level to overcome this issue. The ML tools could reduce the testing population size by 20 to 30%. In this study, they have used a subset of features from full blood profile which are drawn from patients at Israelita Albert Einstein hospital located in Brazil. They used classification models, namely KNN, logistic regression, XGBooting, naive Bayes, decision tree, random forest, support vector machine, and multilayer perceptron with k-fold cross-validation, to validate the models. Naïve bayes, KNN, and random forest stand out as the most predictive ones with 88% accuracy each.

Download Full-text

Prediction of Fatal and Major Injury of Drivers, Cyclists, and Pedestrians in Collisions

PROMET - Traffic&Transportation ◽

10.7307/ptt.v32i1.3134 ◽

2020 ◽

Vol 32 (1) ◽

pp. 39-53

Author(s):

Dalia Shanshal ◽

Ceni Babaoglu ◽

Ayşe Başar

Keyword(s):

Machine Learning ◽

Random Forest ◽

Injury Severity ◽

Predictive Analytics ◽

Machine Learning Techniques ◽

Lasso Regression ◽

Severe Injuries ◽

Factors Affecting ◽

Spatio Temporal ◽

Using Data

Traffic-related deaths and severe injuries may affect every person on the roads, whether driving, cycling or walking. Toronto, the largest city in Canada and the fourth largest in North America, aims to eliminate traffic-related fatalities and serious injuries on city streets. The aim of this study is to build a prediction model using data analytics and machine learning techniques that learn from past patterns, providing additional data-driven decision support for strategic planning. A detailed exploratory analysis is presented, investigating the relationship between the variables and factors affecting collisions in Toronto. A learning-based model is proposed to predict the fatalities and severe injuries in traffic collisions through a comparison of two predictive models: Lasso Regression and Random Forest. Exploratory data analysis results reveal both spatio-temporal and behavioural patterns such as the prevalence of collisions in intersections, in the spring and summer and aggressive driving and inattentive behaviours in drivers. The prediction results show that the best predictor of injury severity for drivers, cyclists and pedestrians is Random Forest with an accuracy of 0.80, 0.89, and 0.80, respectively. The proposed methods demonstrate the effectiveness of machine learning application to traffic and collision data, both for exploratory and predictive analytics.

Download Full-text

Classification study of solvation free energies of organic molecules using machine learning techniques

RSC Advances ◽

10.1039/c4ra07961b ◽

2014 ◽

Vol 4 (106) ◽

pp. 61624-61630 ◽

Cited By ~ 8

Author(s):

N. S. Hari Narayana Moorthy ◽

Silvia A. Martins ◽

Sergio F. Sousa ◽

Maria J. Ramos ◽

Pedro A. Fernandes

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Organic Molecules ◽

Machine Learning Techniques ◽

Support Vector ◽

Classification Models ◽

Free Energies ◽

Learning Techniques ◽

Solvation Free Energies

Classification models to predict the solvation free energies of organic molecules were developed using decision tree, random forest and support vector machine approaches and with MACCS fingerprints, MOE and PaDEL descriptors.

Download Full-text

P31 Optimising trajectories in computer assisted planning for cranial laser interstitial thermal therapy: a machine learning approach

Journal of Neurology Neurosurgery & Psychiatry ◽

10.1136/jnnp-2019-abn.104 ◽

2019 ◽

Vol 90 (3) ◽

pp. e33.1-e33

Author(s):

K Li ◽

VN Vakharia ◽

R Sparks ◽

LGS França ◽

A McEvoy ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Linear Regression ◽

Hippocampal Sclerosis ◽

Thermal Therapy ◽

Machine Learning Techniques ◽

Target Point ◽

Computer Assisted ◽

Laser Interstitial Thermal Therapy ◽

Entry Points

ObjectivesOptimal trajectory planning for cranial laser interstitial thermal therapy (cLITT) in drug resistant focal mesial temporal lobe epilepsy (MTLE).DesignA composite ablation score of ablated AHC minus ablated PHG volumes were calculated and normalised. Random forest and linear regression were implemented to predict composite ablation scores and determine the optimal entry and target point combinations to maximize this.SubjectsTen patients with hippocampal sclerosis were included.MethodsComputer Assisted Planning (CAP) cLITT trajectories were generated using entry regions that include the inferior occipital gyri (IOG), middle occipital gyri (MOG), inferior temporal gyri (ITG) and middle temporal gyri (MTG). Target points were varied by sequential erosions and transformations of the centroid of the amygdala. In total 760 trajectory combinations were generated per patient and ablation volumes were calculated based on a conservative 15 mm maximum ablation diameter.ResultsLinear regression was superior to random forest predictions. Linear regression indicated that maximal composite ablation scores were associated with entry points that clustered around the junction of the IOG, MOG and MTG. The optimal target point was a translation of the centroid of the amygdala anteriorly and medially.ConclusionsMachine learning techniques accurately predict composite ablation scores with linear regression outperforming the random forest approach. Optimal CAP entry points for cLITT maximize ablation of the AHC and spare the PHG.

Download Full-text

Data efficient Random Forest model for avalanche forecasting

10.5194/nhess-2019-379 ◽

2019 ◽

Author(s):

Manesh Chawla ◽

Amreek Singh

Keyword(s):

Machine Learning ◽

Random Forest ◽

Direct Reduction ◽

Random Forest Model ◽

Machine Learning Techniques ◽

Control Structures ◽

Avalanche Forecasting ◽

Forest Model ◽

Avalanche Flow ◽

Data Efficiency

Abstract. Fast downslope release of snow (avalanche) is a serious hazard to people living in snow bound mountains. Released snow mass can gain sufficient momentum on its down slope path to kill humans, uproot trees and rocks, destroy buildings. Direct reduction of avalanche threat is done by building control structures to add mechanical support to snowpack and reduce or deflect downward avalanche flow. On large terrains it is economically infeasible to use these methods on each high risk site.Therefore predicting and avoiding avalanches is the only feasible method to reduce threat but sufficient snow stability data for accurate forecasting is generally unavailable and difficult to collect. Forecasters infer snow stability from their knowledge of local weather, terrain and sparsely available snowpack observations. This inference process is vulnerable to human bias therefore machine learning models are used to find patterns from past data and generate helpful outputs to minimise and quantify uncertainty in forecasting process. These machine learning techniques require long past records of avalanches which are difficult to obtain. In this paper we propose a data efficient Random Forest model to address this problem. The model can generate a descriptive forecast showing reasoning and patterns which are difficult to observe manually. Our model advances the field by being inexpensive and convenient for operational forecasting due to its data efficiency, ease of automation and ability to describe its decisions.

Download Full-text

Predicting mortality in hemodialysis patients using machine learning analysis

Clinical Kidney Journal ◽

10.1093/ckj/sfaa126 ◽

2020 ◽

Author(s):

Victoria Garcia-Montemayor ◽

Alejandro Martin-Malo ◽

Carlo Barbieri ◽

Francesco Bellocchio ◽

Sagrario Soriano ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Prediction Models ◽

Initial Period ◽

Area Under The Curve ◽

Mortality Prediction ◽

Machine Learning Techniques ◽

Prediction Of Mortality ◽

Mortality Prediction Models

Abstract Background Besides the classic logistic regression analysis, non-parametric methods based on machine learning techniques such as random forest are presently used to generate predictive models. The aim of this study was to evaluate random forest mortality prediction models in haemodialysis patients. Methods Data were acquired from incident haemodialysis patients between 1995 and 2015. Prediction of mortality at 6 months, 1 year and 2 years of haemodialysis was calculated using random forest and the accuracy was compared with logistic regression. Baseline data were constructed with the information obtained during the initial period of regular haemodialysis. Aiming to increase accuracy concerning baseline information of each patient, the period of time used to collect data was set at 30, 60 and 90 days after the first haemodialysis session. Results There were 1571 incident haemodialysis patients included. The mean age was 62.3 years and the average Charlson comorbidity index was 5.99. The mortality prediction models obtained by random forest appear to be adequate in terms of accuracy [area under the curve (AUC) 0.68–0.73] and superior to logistic regression models (ΔAUC 0.007–0.046). Results indicate that both random forest and logistic regression develop mortality prediction models using different variables. Conclusions Random forest is an adequate method, and superior to logistic regression, to generate mortality prediction models in haemodialysis patients.

Download Full-text