Improving a Street-Based Geocoding Algorithm Using Machine Learning Techniques

Address matching is a crucial step in geocoding; however, this step forms a bottleneck for geocoding accuracy, as precise input is the biggest challenge for establishing perfect matches. Matches still have to be established despite the inevitability of incorrect address inputs such as misspellings, abbreviations, informal and non-standard names, slangs, or coded terms. Thus, this study suggests an address geocoding system using machine learning to enhance the address matching implemented on street-based addresses. Three different kinds of machine learning methods are tested to find the best method showing the highest accuracy. The performance of address matching using machine learning models is compared to multiple text similarity metrics, which are generally used for the word matching. It was proved that extreme gradient boosting with the optimal hyper-parameters was the best machine learning method with the highest accuracy in the address matching process, and the accuracy of extreme gradient boosting outperformed similarity metrics when using training data or input data. The address matching process using machine learning achieved high accuracy and can be applied to any geocoding systems to precisely convert addresses into geographic coordinates for various research and applications, including car navigation.

Download Full-text

Machine learning techniques to predict daily rainfall amount

Journal Of Big Data ◽

10.1186/s40537-021-00545-4 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Chalachew Muluken Liyew ◽

Haileyesus Amsaya Melese

Keyword(s):

Machine Learning ◽

Pearson Correlation ◽

Daily Rainfall ◽

Learning Model ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Correlation Technique ◽

Learning Techniques ◽

Machine Learning Model ◽

Extreme Gradient Boosting

AbstractPredicting the amount of daily rainfall improves agricultural productivity and secures food and water supply to keep citizens healthy. To predict rainfall, several types of research have been conducted using data mining and machine learning techniques of different countries’ environmental datasets. An erratic rainfall distribution in the country affects the agriculture on which the economy of the country depends on. Wise use of rainfall water should be planned and practiced in the country to minimize the problem of the drought and flood occurred in the country. The main objective of this study is to identify the relevant atmospheric features that cause rainfall and predict the intensity of daily rainfall using machine learning techniques. The Pearson correlation technique was used to select relevant environmental variables which were used as an input for the machine learning model. The dataset was collected from the local meteorological office at Bahir Dar City, Ethiopia to measure the performance of three machine learning techniques (Multivariate Linear Regression, Random Forest, and Extreme Gradient Boost). Root mean squared error and Mean absolute Error methods were used to measure the performance of the machine learning model. The result of the study revealed that the Extreme Gradient Boosting machine learning algorithm performed better than others.

Download Full-text

XGB-RF: A Hybrid Machine Learning Approach for IoT Intrusion Detection

Telecom ◽

10.3390/telecom3010003 ◽

2022 ◽

Vol 3 (1) ◽

pp. 52-69

Author(s):

Jabed Al Faysal ◽

Sk Tahmid Mostafa ◽

Jannatul Sultana Tamanna ◽

Khondoker Mirazul Mumenin ◽

Md. Mashrur Arifin ◽

...

Keyword(s):

Machine Learning ◽

Security And Privacy ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Network Systems ◽

Learning Techniques ◽

Extreme Gradient Boosting ◽

Hybrid Machine ◽

Security Operations ◽

Iot Devices

In the past few years, Internet of Things (IoT) devices have evolved faster and the use of these devices is exceedingly increasing to make our daily activities easier than ever. However, numerous security flaws persist on IoT devices due to the fact that the majority of them lack the memory and computing resources necessary for adequate security operations. As a result, IoT devices are affected by a variety of attacks. A single attack on network systems or devices can lead to significant damages in data security and privacy. However, machine-learning techniques can be applied to detect IoT attacks. In this paper, a hybrid machine learning scheme called XGB-RF is proposed for detecting intrusion attacks. The proposed hybrid method was applied to the N-BaIoT dataset containing hazardous botnet attacks. Random forest (RF) was used for the feature selection and eXtreme Gradient Boosting (XGB) classifier was used to detect different types of attacks on IoT environments. The performance of the proposed XGB-RF scheme is evaluated based on several evaluation metrics and demonstrates that the model successfully detects 99.94% of the attacks. After comparing it with state-of-the-art algorithms, our proposed model has achieved better performance for every metric. As the proposed scheme is capable of detecting botnet attacks effectively, it can significantly contribute to reducing the security concerns associated with IoT systems.

Download Full-text

Intercomparing the robustness of machine learning models in simulation and forecasting of streamflow

Journal of Water and Climate Change ◽

10.2166/wcc.2020.365 ◽

2020 ◽

Author(s):

Parthiban Loganathan ◽

Amit Baburao Mahindrakar

Keyword(s):

Machine Learning ◽

Large Scale ◽

Resource Planning ◽

Machine Learning Techniques ◽

Low Flow ◽

Gradient Boosting ◽

Daily Streamflow ◽

Learning Techniques ◽

Extreme Gradient Boosting ◽

Hydrological Indices

Abstract The intercomparison of streamflow simulation and the prediction of discharge using various renowned machine learning techniques were performed. The daily streamflow discharge model was developed for 35 observation stations located in a large-scale river basin named Cauvery. Various hydrological indices were calculated for observed and predicted discharges for comparing and evaluating the replicability of local hydrological conditions. The model variance and bias observed from the proposed extreme gradient boosting decision tree model were less than 15%, which is compared with other machine learning techniques considered in this study. The model Nash–Sutcliffe efficiency and coefficient of determination values are above 0.7 for both the training and testing phases which demonstrate the effectiveness of model performance. The comparison of monthly observed and model-predicted discharges during the validation period illustrates the model's ability in representing the peaks and fall in high-, medium-, and low-flow zones. The assessment and comparison of hydrological indices between observed and predicted discharges illustrate the model's ability in representing the baseflow, high-spell, and low-spell statistics. Simulating streamflow and predicting discharge are essential for water resource planning and management, especially in large-scale river basins. The proposed machine learning technique demonstrates significant improvement in model efficiency by dropping variance and bias which, in turn, improves the replicability of local-scale hydrology.

Download Full-text

Bankruptcy Prediction Using Machine Learning Techniques

Journal of Risk and Financial Management ◽

10.3390/jrfm15010035 ◽

2022 ◽

Vol 15 (1) ◽

pp. 35

Author(s):

Shekar Shetty ◽

Mohamed Musa ◽

Xavier Brédart

Keyword(s):

Machine Learning ◽

Small And Medium Enterprises ◽

Bankruptcy Prediction ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Learning Techniques ◽

Extreme Gradient Boosting ◽

Global Accuracy ◽

Medium Enterprises

In this study, we apply several advanced machine learning techniques including extreme gradient boosting (XGBoost), support vector machine (SVM), and a deep neural network to predict bankruptcy using easily obtainable financial data of 3728 Belgian Small and Medium Enterprises (SME’s) during the period 2002–2012. Using the above-mentioned machine learning techniques, we predict bankruptcies with a global accuracy of 82–83% using only three easily obtainable financial ratios: the return on assets, the current ratio, and the solvency ratio. While the prediction accuracy is similar to several previous models in the literature, our model is very simple to implement and represents an accurate and user-friendly tool to discriminate between bankrupt and non-bankrupt firms.

Download Full-text

Application of Machine Learning Techniques to Predict the Price of Pre-Owned Cars in Bangladesh

Information ◽

10.3390/info12120514 ◽

2021 ◽

Vol 12 (12) ◽

pp. 514

Author(s):

Fahad Rahman Amik ◽

Akash Lanard ◽

Ahnaf Ismat ◽

Sifat Momen

Keyword(s):

Machine Learning ◽

Web Application ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Good Prediction ◽

Learning Techniques ◽

Extreme Gradient Boosting ◽

Exploratory Data ◽

Forecasting System ◽

Selection Operator

Pre-owned cars (i.e., cars with one or more previous retail owners) are extremely popular in Bangladesh. Customers who plan to purchase a pre-owned car often struggle to find a car within a budget as well as to predict the price of a particular pre-owned car. Currently, Bangladesh lacks online services that can provide assistance to customers purchasing pre-owned cars. A good prediction of prices of pre-owned cars can help customers greatly in making an informed decision about buying a pre-owned car. In this article, we look into this problem and develop a forecasting system (using machine learning techniques) that helps a potential buyer to estimate the price of a pre-owned car he is interested in. A dataset is collected and pre-processed. Exploratory data analysis has been performed. Following that, various machine learning regression algorithms, including linear regression, LASSO (Least Absolute Shrinkage and Selection Operator) regression, decision tree, random forest, and extreme gradient boosting have been applied. After evaluating the performance of each method, the best-performing model (XGBoost) was chosen. This model is capable of properly predicting prices more than 91% of the time. Finally, the model has been deployed as a web application in a local machine so that this can be later made available to end users.

Download Full-text

Machine Learning Methods to Predict Social Media Disaster Rumor Refuters

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph16081452 ◽

2019 ◽

Vol 16 (8) ◽

pp. 1452 ◽

Cited By ~ 4

Author(s):

Shihang Wang ◽

Zongmin Li ◽

Yuhong Wang ◽

Qi Zhang

Keyword(s):

Machine Learning ◽

Language Processing ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Short Text ◽

Learning Techniques ◽

Extreme Gradient Boosting ◽

Effective Decision Support ◽

Short Text Similarity

This research provides a general methodology for distinguishing disaster-related anti-rumor spreaders from a non-ignorant population base, with strong connections in their social circle. Several important influencing factors are examined and illustrated. User information from the most recent posted microblog content of 3793 Sina Weibo users was collected. Natural language processing (NLP) was used for the sentiment and short text similarity analyses, and four machine learning techniques, i.e., logistic regression (LR), support vector machines (SVM), random forest (RF), and extreme gradient boosting (XGBoost) were compared on different rumor refuting microblogs; after which a valid and robust distinguishing XGBoost model was trained and validated to predict who would retweet disaster-related rumor refuting microblogs. Compared with traditional prediction variables that only access user information, the similarity and sentiment analyses of the most recent user microblog contents were found to significantly improve prediction precision and robustness. The number of user microblogs also proved to be a valuable reference for all samples during the prediction process. This prediction methodology could be possibly more useful for WeChat or Facebook as these have relatively stable closed-loop communication channels, which means that rumors are more likely to be refuted by acquaintances. Therefore, the methodology is going to be further optimized and validated on WeChat-like channels in the future. The novel rumor refuting approach presented in this research harnessed NLP for the user microblog content analysis and then used the analysis results of NLP as additional prediction variables to identify the anti-rumor spreaders. Therefore, compared to previous studies, this study presents a new and effective decision support for rumor countermeasures.

Download Full-text

Train Delay Estimation in Indian Railways by Including Weather Factors through Machine Learning Techniques

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813666190912095739 ◽

2019 ◽

Vol 13 ◽

Author(s):

Mohd Arshad ◽

Muqeem Ahmed

Keyword(s):

Machine Learning ◽

Accurate Method ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Weather Factors ◽

Data Set ◽

Training Time ◽

Machine Learning Methods ◽

Learning Techniques ◽

Proposed Model

Railways system is facing one of the biggest challenges to prevent train delay in all over the world. Categorically in India, It is far worst problem other developing countries due to increasing the number of passengers and poor update in previous system. According to the TOI newspaper, around 25.3 million people were used to travel by train in 2006 in India which drastically increased year by year[1]. In the proposed model, we used 4 different machine learning methods (linear regression, Gradient Boosting Regressor(GBR), Decision Tree and Random Forest) which have been compared with different settings to find the most accurate method. To compare different methods, we consider training time and accuracy of the method over the test data set. Trains in India get delayed frequently, and if we can predict this in advance - it would be a great help for the passengers to plan their journey according to their works. The aim of this paper is to present the prediction of Train delay in Indian Railways through machine learning techniques to achieve higher accuracy.

Download Full-text

Comparative Analysis of Machine Learning Techniques Using Predictive Modeling

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813999200904164539 ◽

2020 ◽

Vol 13 ◽

Author(s):

Ritu Khandelwal ◽

Hemlata Goyal ◽

Rajveer Singh Shekhawat

Keyword(s):

Machine Learning ◽

Comparative Analysis ◽

Data Science ◽

Training Data ◽

Machine Learning Techniques ◽

Future Trends ◽

Data Set ◽

Learning Stage ◽

Learning Techniques ◽

Different Types

Introduction: Machine learning is an intelligent technology that works as a bridge between businesses and data science. With the involvement of data science, the business goal focuses on findings to get valuable insights on available data. The large part of Indian Cinema is Bollywood which is a multi-million dollar industry. This paper attempts to predict whether the upcoming Bollywood Movie would be Blockbuster, Superhit, Hit, Average or Flop. For this Machine Learning techniques (classification and prediction) will be applied. To make classifier or prediction model first step is the learning stage in which we need to give the training data set to train the model by applying some technique or algorithm and after that different rules are generated which helps to make a model and predict future trends in different types of organizations. Methods: All the techniques related to classification and Prediction such as Support Vector Machine(SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, Adaboost, and KNN will be applied and try to find out efficient and effective results. All these functionalities can be applied with GUI Based workflows available with various categories such as data, Visualize, Model, and Evaluate. Result: To make classifier or prediction model first step is learning stage in which we need to give the training data set to train the model by applying some technique or algorithm and after that different rules are generated which helps to make a model and predict future trends in different types of organizations Conclusion: This paper focuses on Comparative Analysis that would be performed based on different parameters such as Accuracy, Confusion Matrix to identify the best possible model for predicting the movie Success. By using Advertisement Propaganda, they can plan for the best time to release the movie according to the predicted success rate to gain higher benefits. Discussion: Data Mining is the process of discovering different patterns from large data sets and from that various relationships are also discovered to solve various problems that come in business and helps to predict the forthcoming trends. This Prediction can help Production Houses for Advertisement Propaganda and also they can plan their costs and by assuring these factors they can make the movie more profitable.

Download Full-text

Predictive modeling for peri-implantitis by using machine learning techniques

Scientific Reports ◽

10.1038/s41598-021-90642-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Tomoaki Mameno ◽

Masahiro Wada ◽

Kazunori Nozaki ◽

Toshihito Takahashi ◽

Yoshitaka Tsujioka ◽

...

Keyword(s):

Machine Learning ◽

Demographic Data ◽

Risk Indicators ◽

Machine Learning Techniques ◽

Support Vector ◽

Machine Learning Methods ◽

Complex Interactions ◽

Learning Techniques ◽

Increased Risk ◽

Vector Machines

AbstractThe purpose of this retrospective cohort study was to create a model for predicting the onset of peri-implantitis by using machine learning methods and to clarify interactions between risk indicators. This study evaluated 254 implants, 127 with and 127 without peri-implantitis, from among 1408 implants with at least 4 years in function. Demographic data and parameters known to be risk factors for the development of peri-implantitis were analyzed with three models: logistic regression, support vector machines, and random forests (RF). As the results, RF had the highest performance in predicting the onset of peri-implantitis (AUC: 0.71, accuracy: 0.70, precision: 0.72, recall: 0.66, and f1-score: 0.69). The factor that had the most influence on prediction was implant functional time, followed by oral hygiene. In addition, PCR of more than 50% to 60%, smoking more than 3 cigarettes/day, KMW less than 2 mm, and the presence of less than two occlusal supports tended to be associated with an increased risk of peri-implantitis. Moreover, these risk indicators were not independent and had complex effects on each other. The results of this study suggest that peri-implantitis onset was predicted in 70% of cases, by RF which allows consideration of nonlinear relational data with complex interactions.

Download Full-text

Machine learning models to identify low adherence to influenza vaccination among Korean adults with cardiovascular disease

BMC Cardiovascular Disorders ◽

10.1186/s12872-021-01925-7 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Moojung Kim ◽

Young Jae Kim ◽

Sung Jin Park ◽

Kwang Gi Kim ◽

Pyung Chun Oh ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Influenza Vaccination ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Age Group ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Machine Learning Models

Abstract Background Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially in the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study is to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination Methods Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" to whether they had received seasonal influenza vaccinations in the past 12 months. The classification process was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups. Results The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared; for the ≥ 65 age group, XGB (84.7%) and RF (84.7%) have the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM has the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%). Conclusions The machine leaning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.

Download Full-text