Prediction of Geopolymer Concrete Compressive Strength Using Novel Machine Learning Algorithms

Polymers ◽  
2021 ◽  
Vol 13 (19) ◽  
pp. 3389
Author(s):  
Ayaz Ahmad ◽  
Waqas Ahmad ◽  
Krisada Chaiyasarn ◽  
Krzysztof Adam Ostrowski ◽  
Fahid Aslam ◽  
...  

The innovation of geopolymer concrete (GPC) plays a vital role not only in reducing environmental threats but also as an exceptional material for sustainable development. Applying supervised machine learning (ML) algorithms to forecast the mechanical properties of concrete likewise helps drive innovation in civil engineering. This study used artificial neural network (ANN), boosting, and AdaBoost ML approaches, implemented in Python, to predict the compressive strength (CS) of high-calcium fly-ash-based GPC. A comparison of the employed techniques reveals that the ensemble ML approaches, boosting and AdaBoost, were more effective than the individual ML technique (ANN). Boosting achieved the highest coefficient of determination (R2 = 0.96) and AdaBoost gave 0.93, while the ANN model was less accurate, with R2 = 0.87. The low errors of the boosting technique, with MAE = 1.69 MPa, MSE = 4.16, and RMSE = 2.04 MPa, indicate its high accuracy. A statistical check of these errors (MAE, MSE, RMSE) and k-fold cross-validation further confirm the high precision of the boosting technique. In addition, a sensitivity analysis was performed to evaluate each input parameter's contribution to the predicted CS of GPC. Better accuracy may be achievable by incorporating other ensemble ML techniques such as bagging and gradient boosting.
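The abstract ranks models by MAE, MSE, RMSE and the coefficient of determination (R2). As a minimal, illustrative sketch of how these four scores are computed from observed and predicted strengths (not the authors' code), in plain Python:

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for paired observations/predictions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n          # mean absolute error
    mse = sum(r * r for r in residuals) / n           # mean squared error
    rmse = math.sqrt(mse)                             # root mean squared error
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)   # total sum of squares
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    r2 = 1 - ss_res / ss_tot                          # coefficient of determination
    return mae, mse, rmse, r2
```

Each error is in the units of the target (MPa here) except MSE, which is in squared units; R2 is dimensionless.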

Materials ◽  
2021 ◽  
Vol 14 (15) ◽  
pp. 4068
Author(s):  
Xu Huang ◽  
Mirna Wasouf ◽  
Jessada Sresakoolchai ◽  
Sakdirat Kaewunruen

Cracks typically develop in concrete due to shrinkage, loading actions, and weather conditions, and may occur at any time in its life span. Autogenous healing concrete is a type of self-healing concrete that automatically heals cracks through physical or chemical reactions in the concrete matrix. It is imperative to investigate the healing performance that autogenous healing concrete possesses, to assess the extent of cracking, and to predict the extent of healing. In self-healing concrete research, testing the healing performance of concrete in a laboratory is costly, and a large number of specimens may be needed to explore reliable concrete designs. This study is thus the world's first to establish six types of machine learning algorithms capable of predicting the healing performance (HP) of self-healing concrete: an artificial neural network (ANN), k-nearest neighbours (kNN), gradient boosting regression (GBR), decision tree regression (DTR), support vector regression (SVR), and a random forest (RF). The parameters of these algorithms are tuned using a grid search algorithm (GSA) and a genetic algorithm (GA). The prediction performance of these algorithms, indicated by the coefficient of determination (R2) and root mean square error (RMSE), is evaluated on the basis of 1417 data sets from the open literature. The results show that GSA-GBR achieves higher prediction performance (R2 = 0.958) and stronger robustness (RMSE = 0.202) than the other five algorithms employed to predict the healing performance of autogenous healing concrete. Reliable prediction of the healing performance and efficient assistance in the design of autogenous healing concrete can therefore be achieved.
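The grid search used to tune GSA-GBR exhaustively scores every combination of candidate hyperparameter values and keeps the best. A minimal sketch, assuming a higher-is-better scoring function (e.g. cross-validated R2) and hypothetical parameter names (`learning_rate`, `n_estimators`) that are not taken from the paper:

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Exhaustively evaluate every parameter combination and keep the best.

    param_grid: dict mapping parameter name -> list of candidate values.
    score_fn:   callable(params_dict) -> score, higher is better.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy scoring surface peaking at learning_rate=0.1, n_estimators=200
toy_score = lambda p: -abs(p["learning_rate"] - 0.1) - abs(p["n_estimators"] - 200) / 1000
best, _ = grid_search(
    {"learning_rate": [0.01, 0.1, 0.5], "n_estimators": [100, 200, 400]},
    toy_score,
)
```

In practice the score function would wrap model training and validation; the genetic algorithm mentioned in the abstract replaces the exhaustive loop with evolutionary search over the same space.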


Genes ◽  
2020 ◽  
Vol 11 (9) ◽  
pp. 985 ◽  
Author(s):  
Thomas Vanhaeren ◽  
Federico Divina ◽  
Miguel García-Torres ◽  
Francisco Gómez-Vela ◽  
Wim Vanhoof ◽  
...  

The role of three-dimensional genome organization as a critical regulator of gene expression has become increasingly clear over the last decade. Most of our understanding of this association comes from the study of long range chromatin interaction maps provided by Chromatin Conformation Capture-based techniques, which have greatly improved in recent years. Since these procedures are experimentally laborious and expensive, in silico prediction has emerged as an alternative strategy to generate virtual maps in cell types and conditions for which experimental data of chromatin interactions is not available. Several methods have been based on predictive models trained on one-dimensional (1D) sequencing features, yielding promising results. However, different approaches vary both in the way they model chromatin interactions and in the machine learning-based strategy they rely on, making it challenging to carry out performance comparison of existing methods. In this study, we use publicly available 1D sequencing signals to model cohesin-mediated chromatin interactions in two human cell lines and evaluate the prediction performance of six popular machine learning algorithms: decision trees, random forests, gradient boosting, support vector machines, multi-layer perceptron and deep learning. Our approach accurately predicts long-range interactions and reveals that gradient boosting significantly outperforms the other five methods, yielding accuracies of about 95%. We show that chromatin features in close genomic proximity to the anchors cover most of the predictive information, as has been previously reported. Moreover, we demonstrate that gradient boosting models trained with different subsets of chromatin features, unlike the other methods tested, are able to produce accurate predictions. In this regard, and besides architectural proteins, transcription factors are shown to be highly informative. 
Our study provides a framework for the systematic prediction of long-range chromatin interactions, identifies gradient boosting as the best suited algorithm for this task and highlights cell-type specific binding of transcription factors at the anchors as important determinants of chromatin wiring mediated by cohesin.
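The stagewise logic behind gradient boosting, which this study finds to outperform the other five methods, is that each new weak learner fits the residuals left by the ensemble so far. The following toy regression version, using decision stumps on a single feature, is purely illustrative and is not the authors' pipeline:

```python
def fit_stump(x, residuals):
    """Best single-split stump on a 1-D feature, minimising squared error."""
    best = None
    for threshold in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def gradient_boost(x, y, n_rounds=50, lr=0.3):
    """Stagewise boosting: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: base + lr * sum(s(xi) for s in stumps)

# Toy data: a step function the ensemble should recover
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]
model = gradient_boost(x, y)
```

Production implementations add regularisation, multi-feature trees, and shrinkage schedules, but the residual-fitting loop is the core idea.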


Network security is an important aspect of communication-related activities. In recent times, the advent of more sophisticated technologies has changed the way information is shared with everyone in any part of the world. Concurrently, these advancements are misused to intentionally compromise end-user devices and steal personal information. The number of attacks on targeted devices is increasing over time. Even though the security mechanisms used to defend the network are enhanced and updated periodically, intruders develop new, advanced methods to penetrate systems. To counter these threats, effective strategies must be applied to enhance the security measures in the network. In this paper, a machine learning-based approach is proposed to identify the patterns of different categories of attacks made in the past. The KDD Cup 1999 dataset is used to develop this predictive model. The bat optimization algorithm identifies the optimal parameter subset, and supervised machine learning algorithms are trained on the data to make predictions. The performance of the system is evaluated through metrics such as accuracy and precision. Of the four classification algorithms used, the gradient boosting model outperformed the benchmarked algorithms, demonstrating its value for data classification based on the accuracy obtained.
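The evaluation metrics mentioned (accuracy, precision, and so on) reduce to counts over the confusion matrix. A minimal illustration in plain Python, not tied to the paper's actual model:

```python
def classification_report(y_true, y_pred, positive=1):
    """Accuracy, precision and recall from predicted vs. actual labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)                      # fraction correct overall
    precision = tp / (tp + fp) if tp + fp else 0.0        # of flagged attacks, how many real
    recall = tp / (tp + fn) if tp + fn else 0.0           # of real attacks, how many flagged
    return accuracy, precision, recall
```

For intrusion detection, precision and recall matter more than raw accuracy, because attack classes in KDD Cup 1999 are heavily imbalanced.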


2021 ◽  
Author(s):  
Hossein Sahour ◽  
Vahid Gholami ◽  
Javad Torkman ◽  
Mehdi Vazifedan ◽  
Sirwe Saeedi

Abstract Monitoring the temporal variation of streamflow is necessary for many water resources management plans, yet such efforts are constrained by the absence or paucity of data for many rivers around the world. Using a permanent river in the north of Iran as a test site, a machine learning framework was proposed to model streamflow data in the three periods of growing seasons based on tree-ring and vessel features of the Zelkova carpinifolia species. First, full-disc samples were taken from 30 trees near the river, and the samples went through preprocessing, cross-dating, standardization, and time series analysis. Two machine learning algorithms, random forest (RF) and extreme gradient boosting (XGB), were used to model the relationships between dendrochronology variables (tree-ring and vessel features in the three periods of growing seasons) and the corresponding streamflow rates. The performance of each model was evaluated using statistical coefficients: the coefficient of determination (R-squared), Nash-Sutcliffe efficiency (NSE), and normalized root-mean-square error (NRMSE). Findings demonstrate that the XGB model merits consideration for streamflow modeling given its enhanced performance (R-squared: 0.87; NSE: 0.81; NRMSE: 0.43) over the RF model (R-squared: 0.82; NSE: 0.71; NRMSE: 0.52). Further, the results showed that the models perform better for normal and low flows than for extremely high flows. Finally, the tested models were used to reconstruct the temporal streamflow of past decades (1970–1981).
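The NSE and NRMSE scores used above have simple closed forms. A sketch in plain Python, assuming NRMSE is the RMSE normalised by the standard deviation of the observations (one common convention; the paper does not state which normalisation it uses):

```python
import math

def nash_sutcliffe(observed, simulated):
    """NSE = 1 - SS_res / SS_tot; 1 is a perfect fit, 0 means no better
    than predicting the mean of the observations."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

def nrmse(observed, simulated):
    """RMSE normalised by the standard deviation of the observations."""
    n = len(observed)
    rmse = math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)
    mean_obs = sum(observed) / n
    std_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in observed) / n)
    return rmse / std_obs
```

Under this convention, predicting the observation mean everywhere gives NSE = 0 and NRMSE = 1, which is why NSE > 0.7 and NRMSE < 0.5, as reported here, indicate useful skill.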


2020 ◽  
pp. 1-26
Author(s):  
Joshua Eykens ◽  
Raf Guns ◽  
Tim C.E. Engels

We compare two supervised machine learning algorithms—Multinomial Naïve Bayes and Gradient Boosting—to classify social science articles using textual data. The high level of granularity of the classification scheme used and the possibility that multiple categories may be assigned to a document make this task challenging. To collect the training data, we query three discipline-specific thesauri to retrieve articles corresponding to specialties in the classification. The resulting dataset consists of 113,909 records and covers 245 specialties, aggregated into 31 subdisciplines from three disciplines. Experts were consulted to validate the thesauri-based classification. The resulting multi-label dataset is used to train the machine learning algorithms in different configurations. We deploy a multi-label classifier chain model, allowing an arbitrary number of categories to be assigned to each document. The best results are obtained with Gradient Boosting. Because the approach does not rely on citation data, it can be applied in settings where such information is not available. We conclude that fine-grained, text-based classification of social science publications at a subdisciplinary level is a hard task, for humans and machines alike. A combination of human expertise and machine learning is suggested as the way forward to improve the classification of social science documents.
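A classifier chain trains one binary classifier per label and feeds the earlier labels to later classifiers as extra features, which is what lets an arbitrary number of categories be assigned to each document. A toy sketch with a hypothetical threshold-based base learner (not the Naïve Bayes or Gradient Boosting models used in the paper):

```python
class SumThreshold:
    """Toy base learner: predicts 1 when the feature sum clears the midpoint
    between the mean feature sums of the two training classes."""
    def fit(self, X, y):
        pos = [sum(x) for x, yi in zip(X, y) if yi == 1]
        neg = [sum(x) for x, yi in zip(X, y) if yi == 0]
        self.t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    def predict(self, x):
        return 1 if sum(x) >= self.t else 0

class ClassifierChain:
    """One binary classifier per label; classifier j is trained on the
    features plus the first j true labels, and at prediction time sees the
    previously predicted labels instead."""
    def __init__(self, make_classifier, n_labels):
        self.models = [make_classifier() for _ in range(n_labels)]

    def fit(self, X, Y):
        for j, model in enumerate(self.models):
            augmented = [x + y[:j] for x, y in zip(X, Y)]
            model.fit(augmented, [y[j] for y in Y])
        return self

    def predict(self, X):
        preds = []
        for x in X:
            labels = []
            for model in self.models:
                labels.append(model.predict(x + labels))
            preds.append(labels)
        return preds
```

The chain captures label dependencies that independent per-label classifiers miss, at the cost of making the result sensitive to label ordering.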


2020 ◽  
Author(s):  
Thomas Vanhaeren ◽  
Federico Divina ◽  
Miguel García-Torres ◽  
Francisco Gómez-Vela ◽  
Wim Vanhoof ◽  
...  

Abstract The role of three-dimensional genome organization as a critical regulator of gene expression has become increasingly clear over the last decade. Most of our understanding of this association comes from the study of long range chromatin interaction maps provided by Chromatin Conformation Capture-based techniques, which have greatly improved in recent years. Since these procedures are experimentally laborious and expensive, in silico prediction has emerged as an alternative strategy to generate virtual maps in cell types and conditions for which experimental data of chromatin interactions is not available. Several methods have been based on predictive models trained on one-dimensional (1D) sequencing features, yielding promising results. However, different approaches vary both in the way they model chromatin interactions and in the machine learning-based strategy they rely on, making it challenging to carry out performance comparison of existing methods. In this study, we use publicly available 1D sequencing signals to model chromatin interactions in two human cell lines and evaluate the prediction performance of five popular machine learning algorithms: decision trees, random forests, gradient boosting, support vector machines and multi-layer perceptron. Our approach accurately predicts long-range interactions and reveals that gradient boosting significantly outperforms the other four algorithms, yielding accuracies of ~95%. We show that chromatin features in close genomic proximity to the anchors cover most of the predictive information. Moreover, we demonstrate that gradient boosting models trained with different subsets of chromatin features, unlike the other methods tested, are able to produce accurate predictions. In this regard, and besides architectural proteins, transcription factors are shown to be highly informative.
Our study provides a framework for the systematic prediction of long-range chromatin interactions, identifies gradient boosting as the best suited algorithm for this task and highlights cell-type specific binding of transcription factors at the anchors as important determinants of chromatin wiring.


2021 ◽  
Vol 2021 ◽  
pp. 1-19
Author(s):  
Niaz Muhammad Shahani ◽  
Muhammad Kamran ◽  
Xigui Zheng ◽  
Cancan Liu ◽  
Xiaowei Guo

The uniaxial compressive strength (UCS) of rock is one of the essential data in engineering planning and design. Correctly testing the UCS of rock to ensure its accuracy and authenticity is a prerequisite for a sound design of any rock engineering project. The UCS of rock has a broad range of applications in mining, geotechnical, petroleum, geomechanics, and other fields of engineering. Gradient boosting machine learning algorithms have rarely been applied to UCS prediction, yet have performed well in the relevant literature. In this study, four gradient boosting machine learning algorithms, namely gradient boosted regression (GBR), CatBoost, light gradient boosting machine (LightGBM), and extreme gradient boosting (XGBoost), were developed to predict the UCS (in MPa) of soft sedimentary rocks of Block-IX at Thar Coalfield, Pakistan, using four input variables: wet density (ρw) in g/cm3, moisture in %, dry density (ρd) in g/cm3, and Brazilian tensile strength (BTS) in MPa. A 106-point dataset was then split identically for each algorithm into 70% for the training phase and 30% for the testing phase. According to the results, the XGBoost algorithm outperformed GBR, CatBoost, and LightGBM, with a coefficient of determination (R2) = 0.99, mean absolute error (MAE) = 0.00062, mean square error (MSE) = 0.0000006, and root mean square error (RMSE) = 0.00079 in the training phase, and R2 = 0.99, MAE = 0.00054, MSE = 0.0000005, and RMSE = 0.00069 in the testing phase. The sensitivity analysis showed that BTS and ρw are positively correlated, and moisture and ρd are negatively correlated, with the UCS. In this study, the XGBoost algorithm thus proved the most accurate of the four investigated algorithms for UCS prediction of the soft sedimentary rocks of Block-IX at Thar Coalfield, Pakistan.
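The sensitivity analysis reports the sign of each input's correlation with UCS. Pearson's r is one standard way to quantify such a correlation (the paper does not specify its exact method), sketched here in plain Python:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples:
    +1 perfectly positive, -1 perfectly negative, 0 uncorrelated."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Applied to each input column against the UCS column, a positive r would correspond to the reported behaviour of BTS and ρw, and a negative r to that of moisture and ρd.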


2019 ◽  
Vol 11 (23) ◽  
pp. 2847 ◽  
Author(s):  
Yezhe Wang ◽  
Bo Jiang ◽  
Shunlin Liang ◽  
Dongdong Wang ◽  
Tao He ◽  
...  

Surface shortwave net radiation (SSNR) flux is essential for determining the radiation energy balance between the atmosphere and the Earth's surface. Satellite-derived intermediate SSNR data are strongly needed to bridge the gap between existing coarse-resolution SSNR products and point-based measurements. In this study, four different machine learning (ML) algorithms were tested to estimate the SSNR from Landsat Thematic Mapper (TM)/Enhanced Thematic Mapper Plus (ETM+) top-of-atmosphere (TOA) reflectance and other ancillary information (i.e., clearness index, water vapor) at instantaneous and daily scales under all-sky conditions. The four ML algorithms are multivariate adaptive regression splines (MARS), backpropagation neural network (BPNN), support vector regression (SVR), and gradient boosting regression tree (GBRT). Collected in-situ measurements were used to train the global model (using all data) and the conditional models (in which the data were divided into subsets and the models were fitted separately). The validation results indicated that the GBRT-based global model (GGM) performs best at both the instantaneous and daily scales. For example, the GGM based on the TM data yielded coefficient of determination (R2) values of 0.88 and 0.94, average root mean square errors (RMSE) of 73.23 W·m-2 (15.09%) and 18.76 W·m-2 (11.2%), and biases of 0.64 W·m-2 and -1.74 W·m-2 for instantaneous and daily SSNR, respectively. Compared to the Global LAnd Surface Satellite (GLASS) daily SSNR product, the daily TM-SSNR showed a very similar spatial distribution but with more detail. Further analysis also demonstrated the robustness of the GGM across land cover types, elevations, general atmospheric conditions, and seasons.


2021 ◽  
Vol 20 (1) ◽  
Author(s):  
Domingos S. M. Andrade ◽  
Luigi Maciel Ribeiro ◽  
Agnaldo J. Lopes ◽  
Jorge L. M. Amaral ◽  
Pedro L. Melo

Abstract Introduction The use of machine learning (ML) methods could improve the diagnosis of respiratory changes in systemic sclerosis (SSc). This paper evaluates the performance of several ML algorithms combined with respiratory oscillometry analysis to aid in the diagnosis of respiratory changes in SSc. We also identify the best configuration for this task. Methods Oscillometric and spirometric exams were performed in 82 individuals, including controls (n = 30) and patients with systemic sclerosis with normal (n = 22) and abnormal (n = 30) spirometry. Multiple instance classifiers and different supervised machine learning techniques were investigated, including k-Nearest Neighbors (KNN), Random Forests (RF), AdaBoost with decision trees (ADAB), and Extreme Gradient Boosting (XGB). Results and discussion The first experiment of this study showed that the best oscillometric parameter (BOP) was dynamic compliance, which provided moderate accuracy (AUC = 0.77) in the scenario of the control group versus patients with sclerosis and normal spirometry (CGvsPSNS). In the scenario of the control group versus patients with sclerosis and altered spirometry (CGvsPSAS), the BOP obtained high accuracy (AUC = 0.94). In the second experiment, the ML techniques were used. In CGvsPSNS, KNN achieved the best result (AUC = 0.90), significantly improving accuracy in comparison with the BOP (p < 0.01), while in CGvsPSAS, RF obtained the best results (AUC = 0.97), also significantly improving the diagnostic accuracy (p < 0.05). In the third, fourth, fifth, and sixth experiments, different feature selection techniques allowed us to identify the best oscillometric parameters, resulting in a small increase in diagnostic accuracy in CGvsPSNS (respectively, 0.87, 0.86, 0.82, and 0.84), while in CGvsPSAS the best classifier's performance remained the same (AUC = 0.97).
Conclusions Oscillometric principles combined with machine learning algorithms provide a new method for diagnosing respiratory changes in patients with systemic sclerosis. The present study's findings provide evidence that this combination may help in the early diagnosis of respiratory changes in these patients.
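The AUC values reported throughout can be computed without any ML library via the rank (Mann-Whitney) formulation: AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as half. A minimal sketch:

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise rank comparison.

    y_true: 0/1 labels; scores: classifier scores, higher = more positive.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 means the scores carry no discriminative information, which is why values such as 0.90-0.97 above indicate strong diagnostic separation.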


2020 ◽  
Author(s):  
Ghazal Farhani ◽  
Robert J. Sica ◽  
Mark Joseph Daley

Abstract. While it is relatively straightforward to automate the processing of lidar signals, it is more difficult to choose periods of "good" measurements to process. Groups use various ad hoc procedures, involving either very simple criteria (e.g. signal-to-noise ratio) or more complex ones (e.g. Wing et al., 2018), to perform a task that is easy to train humans to do but is time consuming. Here, we use machine learning techniques to train the machine to sort the measurements before processing. The presented method is generic and can be applied to most lidars. We test the techniques using measurements from the Purple Crow Lidar (PCL) system located in London, Canada. The PCL has over 200,000 raw scans in Rayleigh and Raman channels available for classification. We classify raw (level-0) lidar measurements as "clear" sky scans with strong lidar returns, "bad" scans, and scans which are significantly influenced by clouds or aerosol loads. We examined different supervised machine learning algorithms, including random forest, support vector machine, and gradient boosted trees, all of which can successfully classify scans. The algorithms were trained using about 1500 scans for each PCL channel, selected randomly from different nights of measurements in different years. The success rate of identification for all the channels is above 95%. We also used the t-distributed Stochastic Neighbor Embedding (t-SNE) method, an unsupervised algorithm, to cluster our lidar scans. Because t-SNE is a data-driven method in which no labelling of the training set is needed, it is an attractive algorithm for finding anomalies in lidar scans. The method has been tested on several nights of measurements from the PCL. The t-SNE can successfully cluster the PCL data scans into meaningful categories. To demonstrate the use of the technique, we have used the algorithm to identify stratospheric aerosol layers due to wildfires.
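The "very simple" signal-to-noise screening that the abstract contrasts with its ML approach might look like the following hypothetical sketch; the bin ranges, threshold, and Poisson noise model are all assumptions for illustration, not taken from the paper:

```python
import math

def snr_flag(profile, signal_bins, background_bins, threshold=10.0):
    """Flag a raw photocount profile as 'clear' when the background-subtracted
    signal exceeds `threshold` times the background noise level, taking the
    noise as the Poisson standard deviation of the background counts."""
    signal = sum(profile[i] for i in signal_bins) / len(signal_bins)
    background = sum(profile[i] for i in background_bins) / len(background_bins)
    noise = math.sqrt(background) if background > 0 else 1.0
    return "clear" if (signal - background) / noise >= threshold else "bad"
```

A fixed rule like this captures only one failure mode, which is the abstract's motivation for replacing it with classifiers trained on labelled scans.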

