Predicting tile drainage discharge using machine learning algorithms

Abstract. Drainage systems can significantly improve water management in agricultural fields. However, they may transport contaminants originating from fertilizers and pesticides and threaten ecosystems. Determining the quantity of drainage water is an important factor for constructed wetlands and other drainage mitigation techniques. This study was carried out in Denmark, where tile drainage systems are implemented in more than half of the agricultural fields. The first aim of the study was to predict the annual discharge of tile drainage systems using machine-learning methods, which have been highly popular in recent years. The second objective was to assess the importance of the parameters and their impact on the predictions. Data from 53 drainage stations distributed in different regions of Denmark were collected and used for the analysis. The covariates contained 35 parameters including the calculated percolation and geographic variables such as drainage probability, clay content in different depth intervals, and elevation, all extracted from existing national maps. Random Forest and Cubist were selected as predictive models. Both models were trained on the dataset and used to predict yearly drainage discharge. Results highlighted the importance of the cross-validation methods and indicated that both Random Forest and Cubist can perform as predictive models with low complexity and good correlation between predicted and observed discharge. Covariate importance analysis showed that, among all of the predictors used, percolation and elevation have the largest effect on the prediction of tile drainage discharge. This work opens the way to a better understanding of the dynamics of tile drainage discharge and shows that machine-learning techniques can perform as predictive models in this specific context. The developed models can be used for a national mapping of expected tile drain discharge.

Sport analytics for cricket game results using machine learning: An experimental study

Applied Computing and Informatics ◽

10.1016/j.aci.2019.11.006 ◽

2020 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Kumash Kapadia ◽

Hussein Abdel-Jaber ◽

Fadi Thabtah ◽

Wael Hadi

Keyword(s):

Machine Learning ◽

Random Forest ◽

Predictive Models ◽

Information Gain ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Learning Technology ◽

Home Team ◽

Feature Sets ◽

Learning Techniques

The Indian Premier League (IPL) is one of the more popular cricket tournaments in the world: its finances are growing each season, its viewership has increased markedly, and the betting market for the IPL is growing significantly every year. Because cricket is a very dynamic game that changes ball by ball, bettors and bookies are incentivised to bet on match results. This paper investigates machine learning technology to deal with the problem of predicting cricket match results based on historical match data of the IPL. Influential features of the dataset have been identified using filter-based methods including Correlation-based Feature Selection, Information Gain (IG), ReliefF and Wrapper. More importantly, machine learning techniques including Naïve Bayes, Random Forest, K-Nearest Neighbour (KNN) and Model Trees (classification via regression) have been adopted to generate predictive models from distinctive feature sets derived by the filter-based methods. Two feature subsets were formulated, one based on home team advantage and the other based on the toss decision. Selected machine learning techniques were applied to both feature sets to determine a predictive model. Experimental tests show that tree-based models, particularly Random Forest, performed better in terms of accuracy, precision and recall when compared to probabilistic and statistical models. However, on the toss feature subset, none of the considered machine learning algorithms performed well in producing accurate predictive models.
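The abstract compares models by accuracy, precision and recall. As a minimal sketch of how these three metrics are computed for a binary outcome, the following uses made-up labels (not data from the paper); the function name `binary_metrics` and the label encoding are assumptions for illustration only.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when no positives are predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical toy labels: 1 = home team wins, 0 = home team loses.
toy_true = [1, 1, 0, 0, 1]
toy_pred = [1, 0, 0, 1, 1]
toy_accuracy, toy_precision, toy_recall = binary_metrics(toy_true, toy_pred)
```

Here precision is the share of predicted wins that were real wins, and recall is the share of real wins that were predicted.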

Interpolation of Instantaneous Air Temperature Using Geographical and MODIS Derived Variables with Machine Learning Techniques

10.20944/preprints201906.0008.v1 ◽

2019 ◽

Author(s):

Marcos Ruiz-Álvarez ◽

Francisco Alonso-Sarría ◽

Francisco Gomariz-Castillo

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Linear Regression ◽

Air Temperature ◽

Satellite Data ◽

Multivariate Linear Regression ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector

Several methods have been tried to estimate air temperature using satellite imagery. In this paper, the results of two machine learning algorithms, Support Vector Machine and Random Forest, are compared with Multivariate Linear Regression, TVX and Ordinary kriging. Several geographic, remote sensing and time variables are used as predictors. The validation is carried out using four different statistics on a daily basis, allowing the use of ANOVA to compare the results. The main conclusion is that Random Forest with residual kriging produces the best results (R$^2$=0.612 $\pm$ 0.019, NSE=0.578 $\pm$ 0.025, RMSE=1.068 $\pm$ 0.027, PBIAS=-0.172 $\pm$ 0.046), whereas TVX produces the least accurate results. The environmental conditions in the study area are not really suited to TVX; moreover, this method only takes into account satellite data. On the other hand, regression methods (Support Vector Machine, Random Forest and Multivariate Linear Regression) use several parameters that are easily calculated from a Digital Elevation Model, adding very little difficulty to the use of satellite data alone. The most important variables in the Random Forest Model were satellite temperature, potential irradiation and cdayt, a cosine transformation of the Julian day.
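Three of the validation statistics named above (RMSE, NSE and PBIAS) have simple closed forms. The sketch below shows one common way to compute them; the function names and the toy values are assumptions, and the sign convention for PBIAS differs between references (here a positive value means the predictions overestimate on average), so it may not match the convention used in the paper.

```python
from math import sqrt


def rmse(obs, sim):
    """Root mean squared error between observed and simulated values."""
    return sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean of obs."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def pbias(obs, sim):
    """Percent bias; assumed convention: 100 * sum(sim - obs) / sum(obs)."""
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)


# Hypothetical toy values, not data from the study.
toy_obs = [1.0, 2.0, 3.0, 4.0]
toy_sim = [1.0, 2.0, 3.0, 5.0]
toy_scores = (rmse(toy_obs, toy_sim), nse(toy_obs, toy_sim), pbias(toy_obs, toy_sim))
```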

Machine Learning Algorithms For Understanding The Determinants of Under-Five Mortality

10.21203/rs.3.rs-1021040/v1 ◽

2021 ◽

Author(s):

Rakesh Kumar Saroj ◽

Pawan Kumar Yadav ◽

Rajneesh Singh ◽

Obvious Nchimunya Chilyabanyama

Keyword(s):

Machine Learning ◽

Random Forest ◽

Information Gain ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Mortality Data ◽

Mortality Factors ◽

Under Five ◽

Learning Techniques

Background: The death rate of under-five children in India has declined over the last few decades, but a few of the bigger states perform poorly. This is a matter of serious concern for child health as well as social development. Nowadays, machine learning techniques play a crucial role in smart health care systems by capturing hidden factors and patterns in outcomes. In this paper, we used machine learning techniques to predict the important factors of under-five mortality. This study aims to explore the importance of machine learning techniques for predicting under-five mortality and to find the important factors that cause under-five mortality. The data was taken from the National Family Health Survey-IV of Uttar Pradesh. We used four machine learning techniques (decision tree, support vector machine, random forest, and logistic regression) to predict under-five mortality factors and compared the accuracy of each model. We also used information gain to rank the variables that matter for accurate predictions in under-five mortality data. Result: Random Forest (RF) predicts the child mortality factors with the highest accuracy of 97.5%, and the number of living children, births in the last five years, educational level, birth order, total children ever born, currently breastfeeding, and size of child at birth were identified as essential factors for under-five mortality. Conclusion: The study focuses on machine learning techniques to predict and identify important factors for under-five mortality. The random forest model provides an excellent predictive result for estimating the risk factors of under-five mortality. Based on the resulting outcome, policymakers can make policies and plans to reduce under-five mortality.
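Ranking variables by information gain, as mentioned above, means measuring how much knowing a variable reduces the entropy of the outcome. The following is a minimal sketch for categorical variables; the function names and the toy data are assumptions and do not come from the survey.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the entropy remaining after splitting on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: feature_a separates the outcome perfectly, feature_b not at all.
toy_labels = [1, 1, 0, 0]
toy_features = {
    "feature_a": ["x", "x", "y", "y"],
    "feature_b": ["x", "y", "x", "y"],
}
# Rank feature names from most to least informative.
toy_ranking = sorted(
    toy_features,
    key=lambda name: information_gain(toy_features[name], toy_labels),
    reverse=True,
)
```

A variable with a gain close to zero tells the model almost nothing about the outcome, which is why it falls to the bottom of the ranking.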

Machine Learning Modeling of Horizontal Photovoltaics Using Weather and Location Data

Energies ◽

10.3390/en13102570 ◽

2020 ◽

Vol 13 (10) ◽

pp. 2570

Author(s):

Christil Pasion ◽

Torrey Wagner ◽

Clay Koschnick ◽

Steven Schuldt ◽

Jada Williams ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Ambient Temperature ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Solar Panels ◽

Power Prediction ◽

Location Data ◽

Learning Techniques ◽

Prior Literature

Solar energy is a key renewable energy source; however, its intermittent nature and potential for use in distributed systems make power prediction an important aspect of grid integration. This research analyzed a variety of machine learning techniques to predict power output for horizontal solar panels using 14 months of data collected from 12 northern-hemisphere locations. We performed our data collection and analysis in the absence of irradiation data—an approach not commonly found in prior literature. Using latitude, month, hour, ambient temperature, pressure, humidity, wind speed, and cloud ceiling as independent variables, a distributed random forest regression algorithm modeled the combined dataset with an R2 value of 0.94. As a comparative measure, other machine learning algorithms resulted in R2 values of 0.50–0.94. Additionally, the data from each location was modeled separately with R2 values ranging from 0.91 to 0.97, indicating a range of consistency across all sites. Using an input variable permutation approach with the random forest algorithm, we found that the three most important variables for power prediction were ambient temperature, humidity, and cloud ceiling. The analysis showed that machine learning potentially allowed for accurate power prediction while avoiding the challenges associated with modeled irradiation data.
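The input variable permutation approach mentioned above can be sketched as follows: shuffle one input column at a time and record how much the prediction error grows. This is an illustration under stated assumptions, not the authors' implementation: the fixed function `toy_model` stands in for a trained random forest, and the data and all names are hypothetical.

```python
import random


def mse(model, X, y):
    """Mean squared error of a model (a callable on one row) over a dataset."""
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Average increase in MSE after shuffling each column in turn."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between column j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy setup: the target depends only on column 0.
toy_X = [[float(i), float((i * 7) % 5)] for i in range(20)]
toy_y = [2.0 * row[0] for row in toy_X]


def toy_model(row):
    # Stand-in for a fitted model; it ignores column 1 entirely.
    return 2.0 * row[0]


toy_importances = permutation_importance(toy_model, toy_X, toy_y)
```

Because the stand-in model never reads column 1, shuffling that column leaves the error unchanged and its importance is zero, while shuffling column 0 increases the error.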

Improving Heart Disease Prediction Using Random Forest and AdaBoost Algorithms

International Journal of Online and Biomedical Engineering (iJOE) ◽

10.3991/ijoe.v17i11.24781 ◽

2021 ◽

Vol 17 (11) ◽

pp. 60

Author(s):

Halima EL Hamdaoui ◽

Said Boujraf ◽

Nour El Houda Chaoui ◽

Badr Alami ◽

Mustapha Maaroufi

Keyword(s):

Machine Learning ◽

Heart Disease ◽

Decision Support ◽

Random Forest ◽

Clinical Decision Support ◽

Clinical Decision ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Random Forest Algorithm ◽

Adaboost Algorithm

Heart disease is a major cause of death worldwide. Thus, diagnosis and prediction of heart disease remain mandatory. Clinical decision support systems based on machine learning techniques have become the primary tool to assist clinicians and contribute to automated diagnosis. This paper aims to predict heart disease using the Random Forest algorithm enhanced with the boosting algorithm AdaBoost. The model is trained and tested on the University of California Irvine (UCI) Cleveland and Statlog heart disease datasets using the 14 most relevant attributes. The result shows that the Random Forest algorithm combined with the AdaBoost algorithm achieved higher accuracy than the Random Forest algorithm alone (96.16% versus 95.98%). We compare our suggested model to reported machine learning classifiers. Indeed, the obtained result supports the efficiency and validity of our model. Besides, the proposed model achieved high accuracy compared to existing studies in the literature, which confirms that a clinical decision support system based on machine learning algorithms could be used to predict heart disease.

Heart Failure Detection Using Quantum-Enhanced Machine Learning and Traditional Machine Learning Techniques for Internet of Artificially Intelligent Medical Things

Wireless Communications and Mobile Computing ◽

10.1155/2021/1616725 ◽

2021 ◽

Vol 2021 ◽

pp. 1-16

Author(s):

Yogesh Kumar ◽

Apeksha Koul ◽

Pushpendra Singh Sisodia ◽

Jana Shafi ◽

Verma Kavita ◽

...

Keyword(s):

Machine Learning ◽

Heart Failure ◽

Random Forest ◽

Learning Algorithms ◽

Failure Detection ◽

Random Forest Classifier ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Research Progress ◽

Record Management

Quantum-enhanced machine learning plays a vital role in healthcare because of its robust application concerning current research scenarios, the growth of novel medical trials, patient information and record management, procurement of chronic disease detection, and many more. For this reason, the healthcare industry is applying quantum computing to sustain patient-oriented attention to healthcare patrons. The present work summarizes the recent research progress in quantum-enhanced machine learning and its significance in heart failure detection on a dataset of 14 attributes. In this paper, the number of qubits in terms of the features of heart failure data is normalized by using min-max, PCA, and standard scaler, and has been further optimized using the pipelining technique. The current work verifies that quantum-enhanced machine learning algorithms such as quantum random forest (QRF), quantum K nearest neighbour (QKNN), quantum decision tree (QDT), and quantum Gaussian Naïve Bayes (QGNB) are better than traditional machine learning algorithms in heart failure detection. The best accuracy rate is 0.89, which the quantum random forest classifier attained. In addition, the quantum random forest classifier also achieved the best results in F1 score, recall, and precision, with 0.88, 0.93, and 0.89, respectively. The computation time taken by traditional and quantum-enhanced machine learning algorithms has also been compared where the quantum random forest has the least execution time by 150 microseconds. Hence, the work provides a way to quantify the differences between standard and quantum-enhanced machine learning algorithms to select the optimal method for detecting heart failure.
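Two of the normalization steps named above, min-max scaling and standard scaling, are easy to state in code. The sketch below shows the usual definitions for a single feature column; the function names and the toy values are assumptions, and it does not reproduce the paper's quantum pipeline or its PCA step.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to rescale
    return [(v - lo) / (hi - lo) for v in values]


def standard_scale(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical feature column, not taken from the heart failure dataset.
toy_column = [2.0, 4.0, 6.0]
toy_min_max = min_max_scale(toy_column)
toy_standard = standard_scale(toy_column)
```

Min-max scaling bounds every feature to the same range, while standard scaling keeps outliers visible as large absolute values; which one suits a given encoding is a design choice.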

Real Time Efficient Accident Predictor System using Machine Learning Techniques (kNN, RF, LR, DT)

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.d6910.1210220 ◽

2020 ◽

Vol 10 (2) ◽

pp. 108-111

Keyword(s):

Machine Learning ◽

Random Forest ◽

Real Time ◽

Classification Accuracy ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Classification Methods ◽

K Nearest Neighbors ◽

Learning Techniques

A real-time crash predictor system determines the frequency and severity of crashes. Nowadays, machine learning based methods are used to predict the total number of crashes. In this project, the prediction accuracy of machine learning algorithms such as Decision Tree (DT), K-Nearest Neighbors (KNN), Random Forest (RF), and Logistic Regression (LR) is evaluated. The performance of these classification methods is evaluated in terms of accuracy. The dataset for this project is obtained from 49 states of the US and 27 states of India and contains 2.25 million US crash records and 1.16 million Indian crash records, respectively. Results show that the classification accuracy obtained from Random Forest (RF) is 96%, which compares favourably with the other classification methods.

Feature Selection for Breast Cancer Detection using Machine Learning Algorithms

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.i8723.078919 ◽

2019 ◽

Vol 8 (9) ◽

pp. 2080-2083

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Neural Networks ◽

Cancer Detection ◽

Predictive Models ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Cancer Type ◽

Wide Range ◽

Logistic Regression Algorithm

Cancer has been characterised as a heterogeneous disease comprising a wide range of subtypes. The early diagnosis of a cancer type is very important to determine the course of medical treatment required by the patient. The significance of classifying cancerous cells as benign or malignant has driven many research studies in the biomedical and bioinformatics fields. In recent years researchers have been encouraged to use different machine learning (ML) techniques for cancer detection, as well as for prediction of survivability and recurrence. Moreover, ML tools can be used to identify key features in complex datasets and reveal their significance. A variety of these techniques, including Artificial Neural Networks (ANNs), Bayesian Networks (BNs), Random Forest Methods (RVMs) and Decision Trees (DTs), have been widely used in cancer research for the development of predictive models, resulting in effective and accurate decision making. Although it is clear that machine learning techniques can improve our understanding of cancer detection, progression, recurrence and survivability, an appropriate level of accuracy is required before these methods can be considered in everyday clinical practice. The predictive models discussed here are based on different supervised ML techniques and on various input features and data samples. We have used the Naïve Bayes classifier, the Neural Networks method, Decision Tree and the Logistic Regression algorithm to detect the type of breast cancer (benign or malignant) and to select the features that are most relevant for prediction. We have made a comparative study to find the best of these four algorithms for predicting cancer type. With a high level of accuracy, any of these methods can be used to predict the type of breast cancer of any particular patient.

A Novel Approach for Detecting DGA-Based Botnets in DNS Queries Using Machine Learning Techniques

Journal of Computer Networks and Communications ◽

10.1155/2021/4767388 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Ali Soleymani ◽

Fatemeh Arabgol

Keyword(s):

Machine Learning ◽

Random Forest ◽

Text Mining ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Detection Accuracy ◽

Domain Name ◽

Botnet Detection ◽

Learning Techniques

In today’s security landscape, advanced threats are becoming increasingly difficult to detect as the pattern of attacks expands. Classical approaches that rely heavily on static matching, such as blacklisting or regular expression patterns, may lack flexibility or certainty when detecting malicious data in system data. This is where machine learning techniques can show their value and provide new insights and higher detection rates. The behavior of botnets that use domain-flux techniques to hide command and control channels was investigated in this research. The machine learning algorithms and text mining used to analyze the network DNS protocol and identify botnets are also described. For this purpose, extracted and labeled domain name datasets containing healthy and infected DGA botnet data were used. Data preprocessing techniques based on a text-mining approach were applied to explore domain name strings with n-gram analysis and PCA. Performance is improved by extracting statistical features with principal component analysis. The performance of the proposed model has been evaluated using different machine learning classifiers such as decision tree, support vector machine, random forest, and logistic regression. Experimental results show that the random forest algorithm can be used effectively in botnet detection and has the best botnet detection accuracy.
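The n-gram analysis of domain name strings described above starts from counting short character sequences. A minimal sketch follows; the function name, the choice of bigrams, and the decision to look only at the first label of the domain are assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter


def char_ngrams(domain, n=2):
    """Count character n-grams in the first label of a domain name."""
    # Simplifying assumption: analyse only the part before the first dot.
    label = domain.lower().split(".")[0]
    return Counter(label[i:i + n] for i in range(len(label) - n + 1))


# Hypothetical example domain.
toy_counts = char_ngrams("banana.com", n=2)
```

Counts like these can be turned into a fixed-length feature vector per domain, which is the kind of input a dimensionality reduction step such as PCA and a downstream classifier would consume.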

Machine Learning (Neuronal Net, Random Forest, and C5.0 single decision tree) based on pXRF data as a tool to date sediment layers of the Nile Delta

10.5194/egusphere-egu21-15296 ◽

2021 ◽

Author(s):

Martin Seeliger ◽

Marina Altmeyer ◽

Andreas Ginau ◽

Robert Schiestl ◽

Jürgen Wunderlich

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Nile Delta ◽

Sediment Cores ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Learning Approaches ◽

Surrounding Areas ◽

Sediment Layers

This paper presents the application of machine-learning techniques on pXRF data to establish a chronology for sediment cores around Tell Buto (Tell el-Fara´in) in the northwestern Nile Delta. As modern laboratories for dating techniques like OSL or 14C are rare in Egypt and sample export is restricted, we are facing a lack of opportunities to create a robust chronology, which is indispensable in modern Geoarchaeology. Therefore, we present a new approach to transfer archaeological age information gained at the excavation at Buto to corings of the wider Buto area. Sediments of archaeological outcrops and pits with known age are measured using pXRF to create a geochemical “fingerprint” for several historic eras. Afterwards, these “fingerprints” are transferred to corings of the surrounding areas using machine-learning algorithms. This paper presents 1) the application of three different machine-learning approaches (Neuronal Net, Random Forest, and C5.0 decision tree) to check if archaeological age information can be transferred to sediments far off the settlement mounds using pXRF data, 2) the comparison of all approaches and the evaluation if the easily anticipated decision tree and Random Forest show similar results as the “black-box system” Neuronal Net, and finally, 3) a case study that provides the results of Altmeyer et al. (in review) for Kom el-Gir, a further settlement mound little north of Buto, with a chronostratigraphic framework based on this approach. Reference: Altmeyer, M., Seeliger, M., Ginau, A., Schiestl, R. & J. Wunderlich (in review): Reconstruction of former channel systems in the northwestern Nile Delta (Egypt) based on corings and electrical resistivity tomography (ERT). (Submitted to E & G Quaternary Science Journal).
