Monthly rainfall hind-cast using machine learning algorithms for Coimbatore, Tamil Nadu

V. GEETHALAKSHMI; S. KOKILAVANI; S.P. RAMANATHAN; GA. DHEEBAKARAN; N.K. SATHYAMOORTHY; N. MARAGATHAM

doi:10.54302/mausam.v73i1.5077

MAUSAM ◽

10.54302/mausam.v73i1.5077 ◽

2022 ◽

Vol 73 (1) ◽

pp. 19-26

Author(s):

V. GEETHALAKSHMI ◽

S. KOKILAVANI ◽

S.P. RAMANATHAN ◽

GA. DHEEBAKARAN ◽

N.K. SATHYAMOORTHY ◽

...

Keyword(s):

Machine Learning ◽

Tamil Nadu ◽

Southern Oscillation ◽

Global Climate ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Monthly Rainfall ◽

Research Centre ◽

Gradient Boosting ◽

Forecast Performance

Due to ongoing global climate change, accurate rainfall prediction is critical. This paper presents an approach using four different machine learning algorithms, viz., Decision Tree Regression (DTR), Gradient Boosting (GB), Ada Boost (AB) and Random Forest Regression (RFR), to improve rainfall forecast performance. Hind-casting refers to entering historical events into the model and validating how well the output matches the known results. Historical monthly weather parameters over a period of 42 years (1976 to 2017) were collected from the Agro Climate Research Centre, Tamil Nadu Agricultural University. The global climate drivers, viz., the Southern Oscillation Index and Indian Ocean Dipole indices, were retrieved from the Bureau of Meteorology, Australia. The K-means algorithm was employed for centroid identification (which selects the rows with unique distinguishing features) at 90 per cent of the original data for the 42-year period; by eliminating redundancy in the data, this yielded the rows used as the training set. The results indicated the superiority of RFR over the other algorithms, with a performance of 89.2 per cent. The Coefficient of Determination (R²) between the predicted and observed values was 0.8 for monthly rainfall from 2015 to 2017.
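The coefficient of determination reported above compares predicted and observed monthly rainfall. A minimal sketch of that calculation follows, assuming the common definition R² = 1 - SS_res / SS_tot. The abstract does not say which variant was used, and a squared Pearson correlation would generally give a different number. The function name `r_squared` and the rainfall values are hypothetical illustrations, not data from the paper.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, assuming R^2 = 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared gaps between observation and prediction.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical monthly rainfall totals in mm (illustration only).
observed_mm = [12.0, 45.5, 130.2, 210.8, 60.1, 8.4]
predicted_mm = [20.0, 40.0, 118.0, 190.5, 75.3, 15.0]
print(round(r_squared(observed_mm, predicted_mm), 3))
```

A value of 1 means the predictions reproduce the observations exactly, while 0 means they do no better than always predicting the observed mean.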

Forecasting US movies box office performances in Turkey using machine learning algorithms

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189120 ◽

2020 ◽

Vol 39 (5) ◽

pp. 6579-6590

Author(s):

Sandy Çağlıyor ◽

Başar Öztayşi ◽

Selime Sezgin

Keyword(s):

Machine Learning ◽

Global Economy ◽

Learning Algorithms ◽

Forecast Model ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

High Stakes ◽

Box Office ◽

Industry Forecast ◽

The Impact

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before or at the early stages of its release. Nevertheless, these models are mostly used for predicting domestic performances and the industry still struggles to predict box office performances in overseas markets. In this study, the aim is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. From various sources, a dataset of 1559 movies is constructed. Firstly, independent variables are grouped as pre-release, distributor type, and international distribution based on their characteristics. The number of attendances is discretized into three classes. Four popular machine learning algorithms (artificial neural networks, decision tree regression, gradient boosting trees and random forest) are employed, and the impact of each group is observed by comparing the performance of the models. Then the number of target classes is increased to five and eight, and the results are compared with previously developed models in the literature.
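The discretization step mentioned above can be illustrated with a short sketch. The abstract does not state how the class boundaries were chosen, so equal-frequency binning is an assumption made here purely for illustration. The function name `tercile_labels` and the attendance figures are hypothetical.

```python
def tercile_labels(values):
    """Assign each value a class 0, 1 or 2 by equal-frequency binning.

    Equal-frequency cut points are an assumption; the source does not
    say how attendance was split into three classes.
    """
    if len(values) < 3:
        raise ValueError("need at least three values to form three classes")
    ranked = sorted(values)
    n = len(ranked)
    # Cut points taken at the 1/3 and 2/3 positions of the sorted data.
    low_cut = ranked[n // 3]
    high_cut = ranked[(2 * n) // 3]
    labels = []
    for v in values:
        if v < low_cut:
            labels.append(0)   # low attendance
        elif v < high_cut:
            labels.append(1)   # medium attendance
        else:
            labels.append(2)   # high attendance
    return labels


# Hypothetical attendance counts (illustration only).
attendance = [10, 200, 3000, 40, 500, 6000]
print(tercile_labels(attendance))
```

Raising the number of target classes to five or eight, as the study does later, would follow the same pattern with more cut points.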

Feasibility of Machine Learning Algorithms for Predicting the Deformation of Anodic Titanium Films by Modulating Anodization Processes

Materials ◽

10.3390/ma14051089 ◽

2021 ◽

Vol 14 (5) ◽

pp. 1089

Author(s):

Sung-Hee Kim ◽

Chanyoung Jeong

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Multiclass Classification ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Smart Manufacturing ◽

Gradient Boosting ◽

Experimental Conditions ◽

Learning Techniques ◽

Tio2 Nanostructures

This study aims to demonstrate the feasibility of applying eight machine learning algorithms to predict the classification of the surface characteristics of titanium oxide (TiO2) nanostructures with different anodization processes. We produced a total of 100 samples, and we assessed changes in TiO2 nanostructures’ thicknesses by performing anodization. We successfully grew TiO2 films with different thicknesses by one-step anodization in ethylene glycol containing NH4F and H2O at applied voltage differences ranging from 10 V to 100 V at various anodization durations. We found that the thicknesses of TiO2 nanostructures are dependent on anodization voltages under time differences. Therefore, we tested the feasibility of applying machine learning algorithms to predict the deformation of TiO2. As the characteristics of TiO2 changed based on the different experimental conditions, we classified its surface pore structure into two categories and four groups. For the classification based on granularity, we assessed layer creation, roughness, pore creation, and pore height. We applied eight machine learning techniques to predict classification for binary and multiclass classification. For binary classification, the random forest and gradient boosting algorithms had relatively high performance. However, all eight algorithms had scores higher than 0.93, which signifies high prediction performance in estimating the presence of pores. In contrast, decision tree and three ensemble methods had a relatively higher performance for multiclass classification, with an accuracy rate greater than 0.79. The weakest algorithm used was k-nearest neighbors for both binary and multiclass classifications. We believe that these results show that we can apply machine learning techniques to predict surface quality improvement, leading to smart manufacturing technology to better control color appearance, super-hydrophobicity, super-hydrophilicity or battery efficiency.

Predicting hospitalization following psychiatric crisis care using machine learning

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-020-01361-1 ◽

2020 ◽

Vol 20 (1) ◽

Author(s):

Matthijs Blankers ◽

Louk F. M. van der Post ◽

Jack J. M. Dekker

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Prediction Models ◽

Learning Algorithms ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Ensemble Model ◽

K Nearest Neighbors ◽

Crisis Care

Abstract Background: Accurate prediction models for whether patients on the verge of a psychiatric crisis need hospitalization are lacking and machine learning methods may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate the accuracy of ten machine learning algorithms, including the generalized linear model (GLM/logistic regression) to predict psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact. We also evaluate an ensemble model to optimize the accuracy and we explore individual predictors of hospitalization. Methods: Data from 2084 patients included in the longitudinal Amsterdam Study of Acute Psychiatry with at least one reported psychiatric crisis care contact were included. Target variable for the prediction models was whether the patient was hospitalized in the 12 months following inclusion. The predictive power of 39 variables related to patients’ socio-demographics, clinical characteristics and previous mental health care contacts was evaluated. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared and we also estimated the relative importance of each predictor variable. The best and least performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis and the five best performing algorithms were combined in an ensemble model using stacking. Results: All models performed above chance level. We found Gradient Boosting to be the best performing algorithm (AUC = 0.774) and K-Nearest Neighbors to be the least performing (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was slightly above average among the tested algorithms. In a Net Reclassification Improvement analysis Gradient Boosting outperformed GLM/logistic regression by 2.9% and K-Nearest Neighbors by 11.3%. GLM/logistic regression outperformed K-Nearest Neighbors by 8.7%. Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions: Gradient Boosting led to the highest predictive accuracy and AUC while GLM/logistic regression performed average among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was in most cases modest. The results show that a predictive accuracy similar to the best performing model can be achieved when combining multiple algorithms in an ensemble model.
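The AUC values compared above can be read as the probability that a randomly chosen hospitalized patient receives a higher predicted score than a randomly chosen non-hospitalized patient. A minimal sketch of that pairwise interpretation follows. It is an illustration only and not the authors' evaluation code; the function name `auc` and the toy labels and scores are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = hospitalized, 0 = not hospitalized (hypothetical data).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))  # 3 of 4 positive/negative pairs are ordered correctly: 0.75
```

The double loop is quadratic in the number of cases, which is fine for a sketch; a rank-based formulation gives the same number more efficiently on large samples.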

Bulk Processing of Multi-Temporal Modis Data, Statistical Analyses and Machine Learning Algorithms to Understand Climate Variables in the Indian Himalayan Region

Sensors ◽

10.3390/s21217416 ◽

2021 ◽

Vol 21 (21) ◽

pp. 7416

Author(s):

Mohd Anul Haq ◽

Prashant Baral ◽

Shivaprakash Yaragal ◽

Biswajeet Pradhan

Keyword(s):

Machine Learning ◽

Global Climate ◽

Learning Algorithms ◽

Remotely Sensed ◽

Machine Learning Algorithms ◽

Himalayan Region ◽

Lapse Rate ◽

Climate Data ◽

Modis Data ◽

Bulk Processing

Studies relating to trends of vegetation, snowfall and temperature in the north-western Himalayan region of India are generally focused on specific areas. Therefore, a proper understanding of regional changes in climate parameters over large time periods is generally absent, which increases the complexity of making appropriate conclusions related to climate change-induced effects in the Himalayan region. This study provides a broad overview of changes in patterns of vegetation, snow covers and temperature in Uttarakhand state of India through bulk processing of remotely sensed Moderate Resolution Imaging Spectroradiometer (MODIS) data, meteorological records and simulated global climate data. Additionally, regression using machine learning algorithms such as Support Vectors and Long Short-term Memory (LSTM) network is carried out to check the possibility of predicting these environmental variables. Results from 17 years of data show an increasing trend of snow-covered areas during pre-monsoon and decreasing vegetation covers during monsoon since 2001. Solar radiation and cloud cover largely control the lapse rate variations. Mean MODIS-derived land surface temperature (LST) observations are in close agreement with global climate data. Future studies focused on climate trends and environmental parameters in Uttarakhand could fairly rely upon the remotely sensed measurements and simulated climate data for the region.

Predicting Hospitalization following Psychiatric Crisis Care using Machine Learning

10.21203/rs.2.12338/v1 ◽

2019 ◽

Author(s):

Matthijs Blankers ◽

Louk F. M. van der Post ◽

Jack J. M. Dekker

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Learning Algorithms ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

Predictor Variables ◽

Gradient Boosting ◽

K Nearest Neighbors ◽

Psychiatric Crisis ◽

Crisis Care

Abstract Background: It is difficult to accurately predict whether a patient on the verge of a potential psychiatric crisis will need to be hospitalized. Machine learning may be helpful to improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate and compare the accuracy of ten machine learning algorithms including the commonly used generalized linear model (GLM/logistic regression) to predict psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact, and explore the most important predictor variables of hospitalization. Methods: Data from 2,084 patients with at least one reported psychiatric crisis care contact included in the longitudinal Amsterdam Study of Acute Psychiatry were used. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared. We also estimated the relative importance of each predictor variable. The best and least performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis. Target variable for the prediction models was whether or not the patient was hospitalized in the 12 months following inclusion in the study. The 39 predictor variables were related to patients’ socio-demographics, clinical characteristics and previous mental health care contacts. Results: We found Gradient Boosting to perform the best (AUC=0.774) and K-Nearest Neighbors to perform the least (AUC=0.702). The performance of GLM/logistic regression (AUC=0.76) was above average among the tested algorithms. Gradient Boosting outperformed GLM/logistic regression and K-Nearest Neighbors, and GLM outperformed K-Nearest Neighbors in a Net Reclassification Improvement analysis, although the differences between Gradient Boosting and GLM/logistic regression were small. Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions: Gradient Boosting led to the highest predictive accuracy and AUC while GLM/logistic regression performed average among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was modest. Future studies may consider combining multiple algorithms in an ensemble model for optimal performance and to mitigate the risk of choosing suboptimal performing algorithms.
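The net reclassification improvement (NRI) analysis mentioned above compares two models by counting how many cases move to a better or worse predicted class. A minimal sketch of the two-category form is given below, assuming binary predictions at a fixed threshold; the abstract does not describe the exact NRI variant or thresholds used, so this is an illustration rather than the authors' procedure. The function name `net_reclassification_improvement` and the toy data are hypothetical.

```python
def net_reclassification_improvement(y_true, old_pred, new_pred):
    """Two-category NRI for binary predictions (0 or 1).

    NRI = [P(up | event) - P(down | event)]
        + [P(down | non-event) - P(up | non-event)]
    where "up" means the new model predicts 1 where the old one predicted 0.
    """
    up_event = down_event = up_nonevent = down_nonevent = 0
    n_event = n_nonevent = 0
    for y, old, new in zip(y_true, old_pred, new_pred):
        moved_up = new == 1 and old == 0
        moved_down = new == 0 and old == 1
        if y == 1:
            n_event += 1
            up_event += moved_up
            down_event += moved_down
        else:
            n_nonevent += 1
            up_nonevent += moved_up
            down_nonevent += moved_down
    if n_event == 0 or n_nonevent == 0:
        raise ValueError("need at least one event and one non-event")
    return ((up_event - down_event) / n_event
            + (down_nonevent - up_nonevent) / n_nonevent)


# Toy example (hypothetical): the new model corrects one event and one non-event.
y_true = [1, 1, 0, 0]
old_pred = [0, 1, 1, 0]
new_pred = [1, 1, 0, 0]
print(net_reclassification_improvement(y_true, old_pred, new_pred))
```

A positive value indicates that the new model reclassifies more cases in the correct direction than in the wrong one.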

Data Analytics for Monitoring the Satisfactory Parameters of Airline Passengers using Machine Learning Algorithms in Python

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8677.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 1231-1235

Keyword(s):

Machine Learning ◽

Data Analytics ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Complex Information ◽

Huge Data ◽

Gradient Boosting Machine ◽

Airline Passengers ◽

Effective Representation

Machine learning algorithms can represent results effectively, especially for Big Data, where numerous applications can produce outcomes; Random Forest (RF), Gradient Boosting Machine (GBM) and Decision Tree (DT) algorithms in Python are able to give higher accuracy in classifying the various parameters of airline passengers' satisfaction levels. The complex information of airline passengers provides a large volume of data for interpretation through different parameters of satisfaction. An algorithm has to support classifying these data accurately. Some conventional techniques may provide less precision, with the possibility of information being cancelled or going missing. RF and GBM are therefore used to overcome the unpredictability and improve the exactness of the information provided. The aim of this study is to identify an algorithm suitable for classifying the satisfaction level of airline passengers with data analytics using Python by knowing the output. The optimization and implementation of independent variables by training and testing for accuracy on the Python platform determined the variation between the parameters and also identified RF and GBM as better algorithms in comparison with other classification algorithms.

A Method for Identifying Midlatitude Mesoscale Convective Systems in Radar Mosaics. Part I: Segmentation and Classification

Journal of Applied Meteorology and Climatology ◽

10.1175/jamc-d-17-0293.1 ◽

2018 ◽

Vol 57 (7) ◽

pp. 1575-1598 ◽

Cited By ~ 15

Author(s):

Alex M. Haberlie ◽

Walker S. Ashley

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Probability Of Detection ◽

Machine Learning Algorithms ◽

Mesoscale Convective Systems ◽

Gradient Boosting ◽

Moist Convection ◽

Probabilistic Prediction ◽

Convective Systems ◽

Mesoscale Convective

Abstract This research evaluates the ability of image-processing and select machine-learning algorithms to identify midlatitude mesoscale convective systems (MCSs) in radar-reflectivity images for the conterminous United States. The process used in this study is composed of two parts: segmentation and classification. Segmentation is performed by identifying contiguous or semicontiguous regions of deep, moist convection that are organized on a horizontal scale of at least 100 km. The second part, classification, is performed by first compiling a database of thousands of precipitation clusters and then subjectively assigning each sample one of the following labels: 1) midlatitude MCS, 2) unorganized convective cluster, 3) tropical system, 4) synoptic system, or 5) ground clutter and/or noise. The attributes of each sample, along with their assigned label, are used to train three machine-learning algorithms: random forest, gradient boosting, and “XGBoost.” Results using a testing dataset suggest that the algorithms can distinguish between MCS and non-MCS samples with a high probability of detection and low probability of false detection. Further, the trained algorithm predictions are well calibrated, allowing reliable probabilistic classification. The utility of this two-step procedure is illustrated by generating spatial frequency maps of automatically identified precipitation clusters that are stratified by using various reflectivity and probabilistic prediction thresholds. These results suggest that machine learning can add value by limiting the amount of false-positive (non-MCS) samples that are not removed by segmentation alone.
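The two verification scores named above have standard definitions in terms of a binary confusion matrix: probability of detection (POD) is hits divided by all observed events, and probability of false detection (POFD) is false alarms divided by all observed non-events. The sketch below computes both, treating "MCS" as the positive class. The function name `pod_pofd` and the toy labels are hypothetical and not drawn from the study's dataset.

```python
def pod_pofd(y_true, y_pred):
    """Return (POD, POFD) for binary labels where 1 = MCS and 0 = non-MCS.

    POD  = hits / (hits + misses)
    POFD = false alarms / (false alarms + correct negatives)
    """
    hits = misses = false_alarms = correct_negatives = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            hits += 1
        elif truth == 1 and pred == 0:
            misses += 1
        elif truth == 0 and pred == 1:
            false_alarms += 1
        else:
            correct_negatives += 1
    if hits + misses == 0 or false_alarms + correct_negatives == 0:
        raise ValueError("need at least one event and one non-event")
    pod = hits / (hits + misses)
    pofd = false_alarms / (false_alarms + correct_negatives)
    return pod, pofd


# Toy labels (hypothetical): three MCS samples and five non-MCS samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
pod, pofd = pod_pofd(y_true, y_pred)
print(round(pod, 3), round(pofd, 3))
```

A good classifier in this sense has POD close to 1 and POFD close to 0.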

Techniques for Detecting Malware Traffic: A Comprehensive Approach to Feature Selection and Classification

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.39088 ◽

2021 ◽

Vol 9 (12) ◽

pp. 1-10

Author(s):

Harsha A K

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Learning Algorithms ◽

Malware Detection ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Steady Increase ◽

Extreme Gradient Boosting

Abstract: Since the advent of encryption, there has been a steady increase in malware being transmitted over encrypted networks. Traditional approaches to detect malware like packet content analysis are inefficient in dealing with encrypted data. In the absence of actual packet contents, we can make use of other features like packet size, arrival time, source and destination addresses and other such metadata to detect malware. Such information can be used to train machine learning classifiers in order to classify malicious and benign packets. In this paper, we offer an efficient malware detection approach using classification algorithms in machine learning such as support vector machine, random forest and extreme gradient boosting. We employ an extensive feature selection process to reduce the dimensionality of the chosen dataset. The dataset is then split into training and testing sets. Machine learning algorithms are trained using the training set. These models are then evaluated against the testing set in order to assess their respective performances. We further attempt to tune the hyperparameters of the algorithms, in order to achieve better results. Random forest and extreme gradient boosting algorithms performed exceptionally well in our experiments, resulting in area under the curve values of 0.9928 and 0.9998 respectively. Our work demonstrates that malware traffic can be effectively classified using conventional machine learning algorithms and also shows the importance of dimensionality reduction in such classification problems. Keywords: Malware Detection, Extreme Gradient Boosting, Random Forest, Feature Selection.
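The train/test split described above can be sketched in a few lines. The abstract does not give the split ratio or the shuffling procedure, so the 25 per cent test fraction, the fixed seed and the function name `train_test_split_rows` are assumptions for illustration, as are the toy flow records.

```python
import random


def train_test_split_rows(rows, test_fraction=0.25, seed=0):
    """Shuffle row indices reproducibly and split into (train, test) lists.

    The test fraction and seed are illustrative assumptions, not values
    taken from the paper.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_fraction))
    test_idx = set(indices[:n_test])
    # Keep the original row order inside each partition.
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


# Hypothetical flow records: (packet_size_bytes, inter_arrival_ms, is_malicious).
flows = [(60 + 10 * i, 5.0 + i, i % 2) for i in range(20)]
train_rows, test_rows = train_test_split_rows(flows)
print(len(train_rows), len(test_rows))
```

Models are then fitted on the training rows only, and the held-out rows are used once for the final evaluation so that the reported scores are not inflated by data the model has already seen.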

Machine learning algorithms for predicting undernutrition among under-five children in Ethiopia

Public Health Nutrition ◽

10.1017/s1368980021004262 ◽

2021 ◽

pp. 1-29

Author(s):

Fikrewold H. Bitew ◽

Corey S. Sparks ◽

Samuel H. Nyarko

Keyword(s):

Machine Learning ◽

Linear Models ◽

Learning Algorithms ◽

Public Health Problem ◽

Water Source ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Global Public Health ◽

Prediction Ability ◽

Extreme Gradient Boosting

Abstract Objective: Child undernutrition is a global public health problem with serious implications. In this study, we estimate predictive algorithms for the determinants of childhood stunting by using various machine learning (ML) algorithms. Design: This study draws on data from the Ethiopian Demographic and Health Survey of 2016. Five machine learning algorithms including eXtreme gradient boosting (xgbTree), k-nearest neighbors (K-NN), random forest (RF), neural network (NNet), and the generalized linear models (GLM) were considered to predict the socio-demographic risk factors for undernutrition in Ethiopia. Setting: Households in Ethiopia. Participants: A total of 9,471 children below five years of age. Results: The descriptive results show substantial regional variations in child stunting, wasting, and underweight in Ethiopia. Also, among the five ML algorithms, the xgbTree algorithm shows a better prediction ability than the generalized linear mixed algorithm. The best predicting algorithm (xgbTree) shows diverse important predictors of undernutrition across the three outcomes which include time to water source, anemia history, child age greater than 30 months, small birth size, and maternal underweight, among others. Conclusions: The xgbTree algorithm was a reasonably superior ML algorithm for predicting childhood undernutrition in Ethiopia compared to other ML algorithms considered in this study. The findings support improvement in access to water supply, food security, and fertility regulation among others in the quest to considerably improve childhood nutrition in Ethiopia.

Gene Expression Assay: A New Panel for Early Metastatic Risk Estimation for Breast Cancer

10.21203/rs.3.rs-279461/v1 ◽

2021 ◽

Author(s):

Melih Agraz ◽

Umut Agyuz ◽

E. Celeste Welch ◽

Kaymaz Yasin ◽

Kuyumcu Birol

Keyword(s):

Breast Cancer ◽

Gene Expression ◽

Machine Learning ◽

T Cell ◽

Zinc Finger ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Cytotoxic T Cell ◽

Gradient Boosting ◽

Shh Pathway

Abstract Background: Metastasis is one of the most challenging problems in cancer diagnosis and treatment, as its causes have not yet been well characterized. Prediction of the metastatic status of breast cancer is important in cancer research because it has the potential to save lives. However, the systems biology behind metastasis is complex and driven by a variety of factors beyond those that have already been characterized for various cancer types. Furthermore, prediction of cancer metastasis is a challenging task due to the variation in parameters and conditions specific to individual patients and mutation of the sub-types. Results: In this paper, we apply tree-based machine learning algorithms for gene expression data analysis in the estimation of metastatic potentials within a group of 490 breast cancer patients. Hence, we utilize tree-based machine learning algorithms, decision trees, gradient boosting, and extremely randomized trees to assess the variable importance. Conclusions: We obtained highly accurate values from all three algorithms; the highest accuracy, 0.8901, was observed with the Gradient Boost method. Finally, we were able to determine the 10 most important genetic variables used in the boosted algorithms, as well as their respective importance scores and biological importance. Common important genes for our algorithms are found as CD8, PB1, THP-1. CD8, also known as CD8A is a receptor for the TCR, or T-cell receptor, which facilitates cytotoxic T-cell activity and its association with cancer is defined in the paper. PB1, PBRM1 or polybromo 1 is a tumor suppressor gene. THP-1 or GLI2 is a zinc finger protein referred to as “Glioma-Associated Oncogene Family Zinc Finger 2”. This gene encodes a protein for the zinc finger, which binds DNA and mediates Sonic hedgehog signaling (SHH). Disruptions in the SHH pathway have long been associated with cancer and cellular proliferation.