On-Ground Distributed COVID-19 Variant Intelligent Data Analytics for a Regional Territory

The onset of the COVID-19 pandemic and its subsequent transmission among communities have made the entire human population extremely vulnerable. Because the virus is so contagious, even the most powerful economies in the world are struggling with inadequate resources. As the number of cases continues to rise and the healthcare industry is overwhelmed by the growing needs of the infected population, there is a need to estimate the potential future number of cases using prediction methods. This paper leverages data-driven estimation methods, namely linear regression (LR), random forest (RF), and extreme gradient boosting (XGBoost). All three algorithms are trained on COVID-19 data for Pakistan from 24 February to 31 December 2020 at daily resolution. Essentially, this paper postulates that, using the values of daily new positive cases, medical swabs, and daily deaths, it is possible to predict the progression of the COVID-19 pandemic and demonstrate future trends. Linear regression tends to oversimplify real-world relationships and neglect practical challenges, which is often cited as its primary disadvantage. In this paper, we therefore also use an enhanced random forest algorithm, a supervised learning algorithm that can be used for classification as well as regression. It works well across a wide range of data, is very flexible, and achieves high accuracy. For higher accuracy, we have also applied the XGBoost algorithm to the dataset. XGBoost is a relatively recent machine learning algorithm that yields highly accurate prediction models and is observed to perform well in short-term prediction.
This paper discusses various factors such as total COVID-19 cases, new cases per day, total COVID-19-related deaths, new deaths due to COVID-19, the total number of recoveries, the number of daily recoveries, and swabs through the proposed technique. This paper presents an innovative approach that assists health officials in Pakistan with their decision-making processes.
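
As a rough illustration of the simplest of the three methods named in this abstract, the sketch below fits an ordinary least-squares line to a short daily series and extrapolates one day ahead. The numbers are made up for illustration and are not taken from the Pakistan dataset; the function name `fit_line` and the single-feature setup are assumptions, not the authors' implementation.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for one predictor: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalised; the 1/n cancels in the ratio).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy series: day index and daily new positive cases.
days = [0, 1, 2, 3, 4]
new_cases = [12, 14, 19, 21, 24]

slope, intercept = fit_line(days, new_cases)
next_day_estimate = slope * 5 + intercept  # extrapolate to day 5
print(slope, intercept, next_day_estimate)
```

A straight line cannot follow the waves of an epidemic curve, which is consistent with the abstract's motivation for also trying tree-based ensembles.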

Prediction of West Nile Virus using Ensemble Classifiers

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a9810.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 3744-3749

Keyword(s):

West Nile Virus ◽

Random Forest ◽

Learning Algorithm ◽

Traditional Approach ◽

The United States ◽

Gradient Boosting ◽

Ensemble Classifiers ◽

Human Beings ◽

West Nile ◽

Extreme Gradient Boosting

West Nile Virus (WNV) is a mosquito-borne disease that infects human beings through mosquito bites. The disease is considered a serious threat to society, especially in the United States, where it is frequently found in localities with water bodies. The traditional approach is to collect mosquito traps from a locality and check whether the mosquitoes are infected with the virus. If the virus is found, that locality is sprayed with pesticides. This process, however, is very time-consuming and requires considerable financial support. Machine learning methods can provide an efficient way to predict the presence of the virus in a locality using data related to the location and weather. This paper uses a dataset available on Kaggle that includes information about the traps found in the locality and about the locality's weather. The dataset is imbalanced, so the Synthetic Minority Over-sampling Technique (SMOTE), an upsampling method, is used to balance it. Ensemble learning classifiers, namely random forest, gradient boosting, and Extreme Gradient Boosting (XGB), are used. The performance of the ensemble classifiers is compared with that of the best supervised learning algorithm, SVM. Among the models, XGB gave the highest F1 score of 92.93, performing marginally better than random forest (92.78) and SVM (91.16).
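
The oversampling step mentioned in this abstract can be sketched as follows. This is a minimal, assumption-laden illustration of the core SMOTE idea (a synthetic point is placed on the segment between a minority sample and one of its nearest minority neighbours), not the authors' pipeline. The function name `smote_sketch`, the parameter `k`, and the toy feature vectors are hypothetical.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Made-up minority-class feature vectors (not from the Kaggle dataset).
minority_points = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sketch(minority_points, n_new=6)
print(new_points)
```

In practice, oversampling is usually applied to the training split only, so that synthetic points do not leak into the evaluation data.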

Stock Portfolio Prediction by Multi-Target Decision Support

iSys - Brazilian Journal of Information Systems ◽

10.5753/isys.2019.381 ◽

2019 ◽

Vol 12 (1) ◽

pp. 05-27

Author(s):

Everton Jose Santana ◽

João Augusto Provin Ribeiro Da silva ◽

Saulo Martiello Mastelini ◽

Sylvio Barbon Jr

Keyword(s):

Decision Support ◽

Random Forest ◽

Stock Market ◽

Prediction Models ◽

Deep Structure ◽

Predictive Performance ◽

Gradient Boosting ◽

Support Vector ◽

Learning Approaches ◽

Extreme Gradient Boosting

Investing in the stock market is a complex process due to its high volatility, caused by factors such as exchange rates, political events, inflation, and the market history. To support investors' decisions, the prediction of future stock prices and economic metrics is valuable. Under the hypothesis that there is a relation among investment performance indicators, the goal of this paper was to explore multi-target regression (MTR) methods for estimating 6 different indicators and to find the method with the best predictive performance for an automated prediction tool for decision support. The experiments were based on 4 datasets, corresponding to 4 different time periods, each composed of 63 combinations of weights of stock-picking concepts, simulated in the US stock market. We compared traditional machine learning approaches with seven state-of-the-art MTR solutions: Stacked Single Target, Ensemble of Regressor Chains, Deep Structure for Tracking Asynchronous Regressor Stacking, Deep Regressor Stacking, Multi-output Tree Chaining, Multi-target Augment Stacking, and Multi-output Random Forest (MORF). With the exception of MORF, traditional approaches and the MTR methods were evaluated with Extreme Gradient Boosting, Random Forest, and Support Vector Machine regressors. By means of extensive experimental evaluation, our results showed that the most recent MTR solutions can achieve suitable predictive performance, improving all the scenarios (14.70% in the best one, considering all target variables and periods). In this sense, MTR is a proper strategy for building a stock market decision support system based on prediction models.

PM2.5 Estimation using Supervised Learning Models

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f9912.038620 ◽

2020 ◽

Vol 8 (6) ◽

pp. 4771-4776

Keyword(s):

Random Forest ◽

Supervised Learning ◽

Human Life ◽

Quality Data ◽

Gradient Boosting ◽

Learning Models ◽

High Concentration ◽

Data Prediction ◽

Extreme Gradient Boosting ◽

Accuracy Of Prediction

The present era of urbanization, mechanization, and globalization has brought increasingly severe air pollution problems. PM2.5 is particulate matter suspended in the air with a diameter below 2.5 μm. At high concentrations it leads to health issues such as lung cancer, cardiovascular disease, and respiratory disease. In this work, supervised learning models, namely XGBoost (Extreme Gradient Boosting), MLR (Multiple Linear Regression), RF (Random Forest), and MLP (Multilayer Perceptron), are built to estimate PM2.5 concentration. Air quality data for Changping in Beijing are used for the analysis. The prediction accuracy of the four approaches is measured by comparing observed and predicted PM2.5 values using three evaluation metrics. The results reveal that the Random Forest algorithm outperforms the other data mining strategies on the considered data. Predicting PM2.5 concentrations will help governing bodies warn people who are at the greatest risk and take appropriate measures to reduce PM2.5 levels in the air and their impact on human life.

Evaluation of Three Different Machine Learning Methods for Object-Based Artificial Terrace Mapping—A Case Study of the Loess Plateau, China

Remote Sensing ◽

10.3390/rs13051021 ◽

2021 ◽

Vol 13 (5) ◽

pp. 1021

Author(s):

Hu Ding ◽

Jiaming Na ◽

Shangjing Jiang ◽

Jie Zhu ◽

Kai Liu ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Loess Plateau ◽

Water Conservation ◽

Nearest Neighbor ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

The Loess Plateau ◽

Object Based ◽

Extreme Gradient Boosting

Artificial terraces are of great importance for agricultural production and soil and water conservation. Automatic high-accuracy mapping of artificial terraces is the basis of monitoring and related studies. Previous research achieved artificial terrace mapping based on high-resolution digital elevation models (DEMs) or imagery. Because contextual information is important for terrace mapping, object-based image analysis (OBIA) combined with machine learning (ML) technologies is widely used. However, the selection of an appropriate classifier is of great importance for the terrace mapping task. In this study, the performance of an integrated framework using OBIA and ML for terrace mapping was tested. A catchment, Zhifanggou, in the Loess Plateau, China, was used as the study area. First, optimized image segmentation was conducted. Then, features from the DEMs and imagery were extracted, and the correlations between the features were analyzed and ranked for classification. Finally, three commonly used ML classifiers, namely extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (KNN), were used for terrace mapping. The comparison with the ground truth, as delineated by field survey, indicated that random forest performed best, with a 95.60% overall accuracy (followed by 94.16% and 92.33% for XGBoost and KNN, respectively). The influence of class imbalance and feature selection is discussed. This work provides a credible framework for mapping artificial terraces.

Development and validation of a difficult laryngoscopy prediction model using machine learning of neck circumference and thyromental height

BMC Anesthesiology ◽

10.1186/s12871-021-01343-4 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Jong Ho Kim ◽

Haewon Kim ◽

Ji Su Jang ◽

Sung Mi Hwang ◽

So Young Lim ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Confidence Interval ◽

Neck Circumference ◽

Difficult Laryngoscopy ◽

Gradient Boosting ◽

Test Set ◽

Equal Distribution ◽

Light Gradient ◽

Extreme Gradient Boosting

Background: Predicting a difficult airway is challenging in patients with limited airway evaluation. The aim of this study is to develop and validate a model that predicts difficult laryngoscopy by machine learning of neck circumference and thyromental height as predictors that can be used even for patients with limited airway evaluation. Methods: Variables for prediction of difficult laryngoscopy included age, sex, height, weight, body mass index, neck circumference, and thyromental distance. Difficult laryngoscopy was defined as Grade 3 and 4 by the Cormack-Lehane classification. The preanesthesia and anesthesia data of 1677 patients who had undergone general anesthesia at a single center were collected. The data set was randomly stratified into a training set (80%) and a test set (20%), with equal distribution of difficult laryngoscopy. The training data sets were trained with five algorithms (logistic regression, multilayer perceptron, random forest, extreme gradient boosting, and light gradient boosting machine). The prediction models were validated through a test set. Results: The model's performance using random forest was best (area under receiver operating characteristic curve = 0.79 [95% confidence interval: 0.72–0.86], area under precision-recall curve = 0.32 [95% confidence interval: 0.27–0.37]). Conclusions: Machine learning can predict difficult laryngoscopy through a combination of several predictors including neck circumference and thyromental height. The performance of the model can be improved with more data, a new variable, and a combination of models.
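
The stratified 80/20 split described in the Methods can be sketched in a few lines. This is an illustrative sketch only: the function name `stratified_split`, the seed, and the toy label vector are assumptions, and the class ratio below is invented rather than taken from the 1677-patient cohort.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so that each class keeps roughly the
    same share in both subsets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random assignment within each class
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Made-up labels: 1 = difficult laryngoscopy, 0 = not difficult.
labels = [1] * 20 + [0] * 80
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting per class keeps the share of the rarer outcome the same in the training and test sets, which is what "equal distribution" refers to in the abstract.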

Utilizing Data-Driven Models to Predict Brittleness in Tuscaloosa Marine Shale: A Machine Learning Approach

10.2118/208628-stu ◽

2021 ◽

Author(s):

Jamal Ahmadov

Keyword(s):

Machine Learning ◽

Random Forest ◽

Brittleness Index ◽

Estimation Methods ◽

Gradient Boosting ◽

Average Error ◽

Support Vector ◽

Marine Shale ◽

Effective Manner ◽

Selection Of

The Tuscaloosa Marine Shale (TMS) formation is a clay- and liquid-rich emerging shale play across central Louisiana and southwest Mississippi with recoverable resources of 1.5 billion barrels of oil and 4.6 trillion cubic feet of gas. The formation poses numerous challenges due to its high average clay content (50 wt%) and rapidly changing mineralogy, making the selection of fracturing candidates a difficult task. While brittleness plays an important role in screening potential intervals for hydraulic fracturing, typical brittleness estimation methods require geomechanical and mineralogical properties from costly laboratory tests. Machine Learning (ML) can be employed to generate synthetic brittleness logs and may therefore serve as an inexpensive and fast alternative to the current techniques. In this paper, we propose the use of machine learning to predict the brittleness index of Tuscaloosa Marine Shale from conventional well logs. We trained ML models on a dataset containing conventional and brittleness index logs from 8 wells. The latter were estimated either from geomechanical logs or log-derived mineralogy. Moreover, to ensure mechanical data reliability, dynamic-to-static conversion ratios were applied to Young's modulus and Poisson's ratio. The predictor features included neutron porosity, density, and compressional slowness logs to account for the petrophysical and mineralogical character of TMS. The brittleness index was predicted using algorithms such as Linear, Ridge and Lasso Regression, K-Nearest Neighbors, Support Vector Machine (SVM), Decision Tree, Random Forest, AdaBoost, and Gradient Boosting. Models were shortlisted based on the Root Mean Square Error (RMSE) value and fine-tuned using the Grid Search method with a specific set of hyperparameters for each model.
Overall, Gradient Boosting and Random Forest outperformed other algorithms and showed an average error reduction of 5%, a normalized RMSE of 0.06, and an R-squared value of 0.89. Gradient Boosting was chosen to evaluate the test set and successfully predicted the brittleness index with a normalized RMSE of 0.07 and an R-squared value of 0.83. This paper presents the practical use of machine learning to evaluate brittleness in a cost- and time-effective manner and can further provide valuable insights into the optimization of completion in TMS. The proposed ML model can be used as a tool for initial screening of fracturing candidates and selection of fracturing intervals in other clay-rich and heterogeneous shale formations.

Algorithmic and data modeling: Will algorithmic modeling improve predictions of traits evaluated on ordinal scales?

10.1101/2020.10.07.329466 ◽

2020 ◽

Author(s):

Zhanyou Xu ◽

Andreomar Kurek ◽

Steven B. Cannon ◽

Williams D. Beavis

Keyword(s):

Support Vector Machine ◽

Random Forest ◽

Ridge Regression ◽

Genomic Prediction ◽

Ordinal Data ◽

Prediction Models ◽

Characteristic Curve ◽

Gradient Boosting ◽

Support Vector ◽

Data Types

Selection of markers linked to alleles at quantitative trait loci (QTL) for tolerance to Iron Deficiency Chlorosis (IDC) has not been successful. Genomic selection has been advocated for continuous numeric traits such as yield and plant height. For ordinal data types such as IDC, genomic prediction models have not been systematically compared. The objectives of the research reported in this manuscript were to evaluate the most commonly used genomic prediction method, ridge regression, and its equivalent logistic ridge regression method, against algorithmic modeling methods including random forest, gradient boosting, support vector machine, K-nearest neighbors, Naïve Bayes, and artificial neural network, using the usual comparator metric of prediction accuracy. In addition, we compared the methods using metrics of greater importance for decisions about selecting and culling lines for use in variety development and genetic improvement projects. These metrics include specificity, sensitivity, precision, decision accuracy, and area under the receiver operating characteristic curve. We found that Support Vector Machine provided the best specificity for culling IDC susceptible lines, while Random Forest GP models provided the best combined set of decision metrics for retaining IDC tolerant and culling IDC susceptible lines.
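
Three of the decision metrics listed in this abstract follow directly from a binary confusion matrix, as the sketch below shows. It assumes the ordinal IDC scores have already been collapsed into two classes (here 1 = tolerant, 0 = susceptible), which is an illustrative simplification; the labels are invented, and "decision accuracy" and the ROC area are left out because the abstract does not define how they were computed.

```python
def decision_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, and precision from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return {
        "sensitivity": tp / (tp + fn),  # share of true positives recovered
        "specificity": tn / (tn + fp),  # share of true negatives recovered
        "precision": tp / (tp + fp),    # share of predicted positives that are correct
    }


# Made-up labels: 1 = tolerant line, 0 = susceptible line.
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted_labels = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

metrics = decision_metrics(observed, predicted_labels)
print(metrics)
```

With tolerant coded as the positive class, specificity measures how reliably susceptible lines are flagged for culling, and sensitivity measures how reliably tolerant lines are retained.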

Random forest and extreme gradient boosting algorithms for streamflow modeling using vessel features and tree-rings

Environmental Earth Sciences ◽

10.1007/s12665-021-10054-5 ◽

2021 ◽

Vol 80 (22) ◽

Author(s):

Hossein Sahour ◽

Vahid Gholami ◽

Javad Torkaman ◽

Mehdi Vazifedan ◽

Sirwe Saeedi

Keyword(s):

Random Forest ◽

Tree Rings ◽

Gradient Boosting ◽

Extreme Gradient Boosting ◽

Boosting Algorithms ◽

Streamflow Modeling

Machine learning as a successful approach for predicting complex spatio–temporal patterns in animal species abundance

Animal Biodiversity and Conservation ◽

10.32800/abc.2021.44.0289 ◽

2021 ◽

pp. 289-301

Author(s):

B. Martín ◽

J. González–Arias ◽

J. A. Vicente–Vírseda

Keyword(s):

Machine Learning ◽

Random Forest ◽

Animal Species ◽

Temporal Patterns ◽

Additive Models ◽

Gradient Boosting ◽

Support Vector ◽

Stochastic Gradient Boosting ◽

Extreme Gradient Boosting ◽

Spatio Temporal

Our aim was to identify an optimal analytical approach for accurately predicting complex spatio–temporal patterns in animal species distribution. We compared the performance of eight modelling techniques (generalized additive models, regression trees, bagged CART, k–nearest neighbors, stochastic gradient boosting, support vector machines, neural network, and random forest, an enhanced form of bootstrap). We also performed extreme gradient boosting, an enhanced form of gradient boosting, to predict spatial patterns in abundance of migrating Balearic shearwaters based on data gathered within eBird. Derived from open–source datasets, proxies of frontal systems and ocean productivity domains that have been previously used to characterize the oceanographic habitats of seabirds were quantified, and then used as predictors in the models. The random forest model showed the best performance according to the parameters assessed (RMSE value and R2). The correlation between observed and predicted abundance with this model was also considerably high. This study shows that the combination of machine learning techniques and massive data provided by open data sources is a useful approach for identifying the long–term spatial–temporal distribution of species at regional spatial scales.

Modeling and analysis of COVID-19 new deaths using tree-based ensemble

10.36227/techrxiv.16566012.v1 ◽

2021 ◽

Author(s):

Ibrahim Abaker Targio Hashem ◽

Raja Sher Afgun Usmani ◽

Asad Ali Shah ◽

Abdulwahab Ali Almazroi ◽

Muhammad Bilal

Keyword(s):

Infectious Disease ◽

United States ◽

Random Forest ◽

Economic Activity ◽

The United States ◽

Gradient Boosting ◽

Health Crisis ◽

Modeling And Analysis ◽

Extreme Gradient Boosting ◽

The World

The COVID-19 pandemic has emerged as the world's most serious health crisis, affecting millions of people all over the world. The majority of nations have imposed nationwide curfews and reduced economic activity to combat the spread of this infectious disease. Governments are monitoring the situation and making critical decisions based on the daily number of new cases and deaths reported. Therefore, this study aims to predict the daily new deaths using four tree-based ensemble models, i.e., Gradient Tree Boosting (GB), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Voting Regressor (VR), for the three most affected countries, which are the United States, Brazil, and India. The results showed that VR outperformed other models in predicting daily new deaths for all three countries. The predictions of daily new deaths made using VR for Brazil and India are very close to the actual new deaths, whereas the prediction of daily new deaths for the United States still needs to be improved.
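
A voting regressor, in its usual form, combines base regressors by averaging their predictions, optionally with weights. The sketch below shows only that combination step. The three base "models" are hypothetical stand-in functions rather than trained GB, RF, or XGBoost models, and the abstract does not state which implementation or weighting the authors used.

```python
def voting_regressor(base_models, weights=None):
    """Return a predict function that averages the base models' outputs."""
    if weights is None:
        weights = [1.0] * len(base_models)  # plain (unweighted) average
    total = sum(weights)

    def predict(x):
        return sum(w * m(x) for w, m in zip(weights, base_models)) / total

    return predict


# Hypothetical stand-ins for three fitted base regressors.
def model_a(x):
    return 2.0 * x + 1.0


def model_b(x):
    return 1.5 * x + 3.0


def model_c(x):
    return 2.5 * x


ensemble = voting_regressor([model_a, model_b, model_c])
forecast = ensemble(10.0)
print(forecast)
```

Averaging tends to cancel errors that the base models do not share, which is one plausible reason an ensemble of ensembles can outperform its members.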
