Regulation Modelling and Analysis Using Machine Learning During the Covid-19 Pandemic in Russia

Due to the specific circumstances related to the COVID-19 pandemic, many countries have enforced emergency measures such as self-isolation and restriction of movement and assembly, which are also directly affecting the functioning of their respective public health and judicial systems. The goal of this study is to identify the efficiency of the criminal sanctions in Russia that were introduced in the beginning of COVID-19 outbreak using machine learning methods. We have developed a regression model for the fine handed out, using random forest regression and XGBoost regression, and calculated the features importance parameters. We have developed classification models for the remission of the penalty and for setting a sentence using a gradient boosting classifier.

Download Full-text

Automated invasive cervical cancer disease detection at early stage through suitable machine learning model

SN Applied Sciences ◽

10.1007/s42452-021-04786-z ◽

2021 ◽

Vol 3 (10) ◽

Author(s):

Sohely Jahan ◽

M. D. Saimun Islam ◽

Linta Islam ◽

Tamanna Yesmin Rashme ◽

Ayesha Aziz Prova ◽

...

Keyword(s):

Machine Learning ◽

Cervical Cancer ◽

Random Forest ◽

Invasive Cervical Cancer ◽

Research Work ◽

Gradient Boosting ◽

Classification Models ◽

Cancer Disease ◽

Suggested Approach ◽

Machine Learning Classification

AbstractCervical cancer is a common cancer that affects women all over the world. This is the fourth leading cause of death among women and has no symptoms in its early stages. At the cervix, cervical cancer cells develop slowly. If it can be detected early, this cancer can be successfully treated. Health professionals are now facing a major challenge in detecting such cancer until it spreads rapidly. This study applied various machine learning classification methods to predict cervical cancer using risk factors. The main aim of this research work is to be described of the performance variation of eight most classifications algorithm to detect cervical cancer disease based on the selection of various top features sets from the dataset. Multilayer Perceptron (MLP), Random Forest and k-Nearest Neighbor, Decision Tree, Logistic Regression, SVC, Gradient Boosting, AdaBoost are examples of machine learning classification algorithms that have been used to predict cervical cancer and help in early diagnosis. A variety of approaches are used to avoid missing values in the dataset. To choose the various best features, a combination of feature selection techniques such as Chi-square, SelectBest and Random Forest was used. The performance of those classifications is evaluated using the accuracy, recall, precision and f1-score parameters. On a variety of top feature sets, MLP outperformed other classification models. The majority of classification models, on the other hand, claim to have the highest accuracy on the top 25 features in dataset splitting ratio (70:30). For each model, the percentage of correctly classified instances has been presented and all of the results are then discussed. Medical professionals will be able to use the suggested approach to perform research on cervical cancer.

Download Full-text

Evaluation of Three Different Machine Learning Methods for Object-Based Artificial Terrace Mapping—A Case Study of the Loess Plateau, China

Remote Sensing ◽

10.3390/rs13051021 ◽

2021 ◽

Vol 13 (5) ◽

pp. 1021

Author(s):

Hu Ding ◽

Jiaming Na ◽

Shangjing Jiang ◽

Jie Zhu ◽

Kai Liu ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Loess Plateau ◽

Water Conservation ◽

Nearest Neighbor ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

The Loess Plateau ◽

Object Based ◽

Extreme Gradient Boosting

Artificial terraces are of great importance for agricultural production and soil and water conservation. Automatic high-accuracy mapping of artificial terraces is the basis of monitoring and related studies. Previous research achieved artificial terrace mapping based on high-resolution digital elevation models (DEMs) or imagery. As a result of the importance of the contextual information for terrace mapping, object-based image analysis (OBIA) combined with machine learning (ML) technologies are widely used. However, the selection of an appropriate classifier is of great importance for the terrace mapping task. In this study, the performance of an integrated framework using OBIA and ML for terrace mapping was tested. A catchment, Zhifanggou, in the Loess Plateau, China, was used as the study area. First, optimized image segmentation was conducted. Then, features from the DEMs and imagery were extracted, and the correlations between the features were analyzed and ranked for classification. Finally, three different commonly-used ML classifiers, namely, extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (KNN), were used for terrace mapping. The comparison with the ground truth, as delineated by field survey, indicated that random forest performed best, with a 95.60% overall accuracy (followed by 94.16% and 92.33% for XGBoost and KNN, respectively). The influence of class imbalance and feature selection is discussed. This work provides a credible framework for mapping artificial terraces.

Download Full-text

A novel framework for designing a multi-DoF prosthetic wrist control using machine learning

Scientific Reports ◽

10.1038/s41598-021-94449-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Chinmay P. Swami ◽

Nicholas Lenhard ◽

Jiyeon Kang

Keyword(s):

Machine Learning ◽

Random Forest ◽

Upper Limb ◽

Daily Living ◽

Machine Learning Algorithms ◽

Data Sets ◽

Random Forest Regression ◽

Prosthetic Devices ◽

Upper Limb Function ◽

The Neural Network

AbstractProsthetic arms can significantly increase the upper limb function of individuals with upper limb loss, however despite the development of various multi-DoF prosthetic arms the rate of prosthesis abandonment is still high. One of the major challenges is to design a multi-DoF controller that has high precision, robustness, and intuitiveness for daily use. The present study demonstrates a novel framework for developing a controller leveraging machine learning algorithms and movement synergies to implement natural control of a 2-DoF prosthetic wrist for activities of daily living (ADL). The data was collected during ADL tasks of ten individuals with a wrist brace emulating the absence of wrist function. Using this data, the neural network classifies the movement and then random forest regression computes the desired velocity of the prosthetic wrist. The models were trained/tested with ADLs where their robustness was tested using cross-validation and holdout data sets. The proposed framework demonstrated high accuracy (F-1 score of 99% for the classifier and Pearson’s correlation of 0.98 for the regression). Additionally, the interpretable nature of random forest regression was used to verify the targeted movement synergies. The present work provides a novel and effective framework to develop an intuitive control for multi-DoF prosthetic devices.

Download Full-text

Development and validation of a difficult laryngoscopy prediction model using machine learning of neck circumference and thyromental height

BMC Anesthesiology ◽

10.1186/s12871-021-01343-4 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Jong Ho Kim ◽

Haewon Kim ◽

Ji Su Jang ◽

Sung Mi Hwang ◽

So Young Lim ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Confidence Interval ◽

Neck Circumference ◽

Difficult Laryngoscopy ◽

Gradient Boosting ◽

Test Set ◽

Equal Distribution ◽

Light Gradient ◽

Extreme Gradient Boosting

Abstract Background Predicting difficult airway is challengeable in patients with limited airway evaluation. The aim of this study is to develop and validate a model that predicts difficult laryngoscopy by machine learning of neck circumference and thyromental height as predictors that can be used even for patients with limited airway evaluation. Methods Variables for prediction of difficulty laryngoscopy included age, sex, height, weight, body mass index, neck circumference, and thyromental distance. Difficult laryngoscopy was defined as Grade 3 and 4 by the Cormack-Lehane classification. The preanesthesia and anesthesia data of 1677 patients who had undergone general anesthesia at a single center were collected. The data set was randomly stratified into a training set (80%) and a test set (20%), with equal distribution of difficulty laryngoscopy. The training data sets were trained with five algorithms (logistic regression, multilayer perceptron, random forest, extreme gradient boosting, and light gradient boosting machine). The prediction models were validated through a test set. Results The model’s performance using random forest was best (area under receiver operating characteristic curve = 0.79 [95% confidence interval: 0.72–0.86], area under precision-recall curve = 0.32 [95% confidence interval: 0.27–0.37]). Conclusions Machine learning can predict difficult laryngoscopy through a combination of several predictors including neck circumference and thyromental height. The performance of the model can be improved with more data, a new variable and combination of models.

Download Full-text

Application of random forest regression to the calculation of gas-phase chemistry within the GEOS-Chem chemistry model v10

Geoscientific Model Development ◽

10.5194/gmd-12-1209-2019 ◽

2019 ◽

Vol 12 (3) ◽

pp. 1209-1225 ◽

Cited By ~ 15

Author(s):

Christoph A. Keller ◽

Mat J. Evans

Keyword(s):

Machine Learning ◽

Random Forest ◽

Gas Phase ◽

Atmospheric Chemistry ◽

Random Forest Regression ◽

Data Set ◽

Gas Phase Chemistry ◽

Chemical Conditions ◽

Phase Chemistry ◽

The Impact

Abstract. Atmospheric chemistry models are a central tool to study the impact of chemical constituents on the environment, vegetation and human health. These models are numerically intense, and previous attempts to reduce the numerical cost of chemistry solvers have not delivered transformative change. We show here the potential of a machine learning (in this case random forest regression) replacement for the gas-phase chemistry in atmospheric chemistry transport models. Our training data consist of 1 month (July 2013) of output of chemical conditions together with the model physical state, produced from the GEOS-Chem chemistry model v10. From this data set we train random forest regression models to predict the concentration of each transported species after the integrator, based on the physical and chemical conditions before the integrator. The choice of prediction type has a strong impact on the skill of the regression model. We find best results from predicting the change in concentration for long-lived species and the absolute concentration for short-lived species. We also find improvements from a simple implementation of chemical families (NOx = NO + NO2). We then implement the trained random forest predictors back into GEOS-Chem to replace the numerical integrator. The machine-learning-driven GEOS-Chem model compares well to the standard simulation. For ozone (O3), errors from using the random forests (compared to the reference simulation) grow slowly and after 5 days the normalized mean bias (NMB), root mean square error (RMSE) and R2 are 4.2 %, 35 % and 0.9, respectively; after 30 days the errors increase to 13 %, 67 % and 0.75, respectively. The biases become largest in remote areas such as the tropical Pacific where errors in the chemistry can accumulate with little balancing influence from emissions or deposition. Over polluted regions the model error is less than 10 % and has significant fidelity in following the time series of the full model. Modelled NOx shows similar features, with the most significant errors occurring in remote locations far from recent emissions. For other species such as inorganic bromine species and short-lived nitrogen species, errors become large, with NMB, RMSE and R2 reaching >2100 % >400 % and <0.1, respectively. This proof-of-concept implementation takes 1.8 times more time than the direct integration of the differential equations, but optimization and software engineering should allow substantial increases in speed. We discuss potential improvements in the implementation, some of its advantages from both a software and hardware perspective, its limitations, and its applicability to operational air quality activities.

Download Full-text

Utilizing Data-Driven Models to Predict Brittleness in Tuscaloosa Marine Shale: A Machine Learning Approach

10.2118/208628-stu ◽

2021 ◽

Author(s):

Jamal Ahmadov

Keyword(s):

Machine Learning ◽

Random Forest ◽

Brittleness Index ◽

Estimation Methods ◽

Gradient Boosting ◽

Average Error ◽

Support Vector ◽

Marine Shale ◽

Effective Manner ◽

Selection Of

Abstract The Tuscaloosa Marine Shale (TMS) formation is a clay- and liquid-rich emerging shale play across central Louisiana and southwest Mississippi with recoverable resources of 1.5 billion barrels of oil and 4.6 trillion cubic feet of gas. The formation poses numerous challenges due to its high average clay content (50 wt%) and rapidly changing mineralogy, making the selection of fracturing candidates a difficult task. While brittleness plays an important role in screening potential intervals for hydraulic fracturing, typical brittleness estimation methods require the use of geomechanical and mineralogical properties from costly laboratory tests. Machine Learning (ML) can be employed to generate synthetic brittleness logs and therefore, may serve as an inexpensive and fast alternative to the current techniques. In this paper, we propose the use of machine learning to predict the brittleness index of Tuscaloosa Marine Shale from conventional well logs. We trained ML models on a dataset containing conventional and brittleness index logs from 8 wells. The latter were estimated either from geomechanical logs or log-derived mineralogy. Moreover, to ensure mechanical data reliability, dynamic-to-static conversion ratios were applied to Young's modulus and Poisson's ratio. The predictor features included neutron porosity, density and compressional slowness logs to account for the petrophysical and mineralogical character of TMS. The brittleness index was predicted using algorithms such as Linear, Ridge and Lasso Regression, K-Nearest Neighbors, Support Vector Machine (SVM), Decision Tree, Random Forest, AdaBoost and Gradient Boosting. Models were shortlisted based on the Root Mean Square Error (RMSE) value and fine-tuned using the Grid Search method with a specific set of hyperparameters for each model. Overall, Gradient Boosting and Random Forest outperformed other algorithms and showed an average error reduction of 5 %, a normalized RMSE of 0.06 and a R-squared value of 0.89. The Gradient Boosting was chosen to evaluate the test set and successfully predicted the brittleness index with a normalized RMSE of 0.07 and R-squared value of 0.83. This paper presents the practical use of machine learning to evaluate brittleness in a cost and time effective manner and can further provide valuable insights into the optimization of completion in TMS. The proposed ML model can be used as a tool for initial screening of fracturing candidates and selection of fracturing intervals in other clay-rich and heterogeneous shale formations.

Download Full-text

Machine Learning Predictive Models for Pile Drivability: An Evaluation of Random Forest Regression and Multivariate Adaptive Regression Splines

Information Technology in Geo-Engineering - Springer Series in Geomechanics and Geoengineering ◽

10.1007/978-3-030-32029-4_21 ◽

2019 ◽

pp. 243-255

Author(s):

Wengang Zhang ◽

Chongzhi Wu

Keyword(s):

Machine Learning ◽

Random Forest ◽

Predictive Models ◽

Multivariate Adaptive Regression Splines ◽

Regression Splines ◽

Random Forest Regression ◽

Adaptive Regression ◽

Adaptive Regression Splines

Download Full-text

Learning from Imbalanced Educational Data Using Ensemble Machine Learning Algorithms

Webology ◽

10.14704/web/v18si01/web18053 ◽

2021 ◽

Vol 18 (Special Issue 01) ◽

pp. 183-195

Author(s):

Thingbaijam Lenin ◽

N. Chandrasekaran

Keyword(s):

Machine Learning ◽

Random Forest ◽

Missing Values ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Adaptive Boosting ◽

Stochastic Gradient Boosting ◽

Ensemble Machine Learning ◽

Learning Techniques ◽

Student’S Performance

Student’s academic performance is one of the most important parameters for evaluating the standard of any institute. It has become a paramount importance for any institute to identify the student at risk of underperforming or failing or even drop out from the course. Machine Learning techniques may be used to develop a model for predicting student’s performance as early as at the time of admission. The task however is challenging as the educational data required to explore for modelling are usually imbalanced. We explore ensemble machine learning techniques namely bagging algorithm like random forest (rf) and boosting algorithms like adaptive boosting (adaboost), stochastic gradient boosting (gbm), extreme gradient boosting (xgbTree) in an attempt to develop a model for predicting the student’s performance of a private university at Meghalaya using three categories of data namely demographic, prior academic record, personality. The collected data are found to be highly imbalanced and also consists of missing values. We employ k-nearest neighbor (knn) data imputation technique to tackle the missing values. The models are developed on the imputed data with 10 fold cross validation technique and are evaluated using precision, specificity, recall, kappa metrics. As the data are imbalanced, we avoid using accuracy as the metrics of evaluating the model and instead use balanced accuracy and F-score. We compare the ensemble technique with single classifier C4.5. The best result is provided by random forest and adaboost with F-score of 66.67%, balanced accuracy of 75%, and accuracy of 96.94%.

Download Full-text

Penerapan Klasifikasi Kueri untuk Meningkatkan Efektivitas Mesin Pencari

Seminar Nasional Official Statistics ◽

10.34123/semnasoffstat.v2021i1.914 ◽

2021 ◽

Vol 2021 (1) ◽

pp. 1012-1018

Author(s):

Handy Geraldy ◽

Lutfi Rahmatuti Maghfiroh

Keyword(s):

Machine Learning ◽

Random Forest ◽

Naive Bayes ◽

Naïve Bayes ◽

Gradient Boosting

Dalam menjalankan peran sebagai penyedia data, Badan Pusat Statistik (BPS) memberikan layanan akses data BPS bagi masyarakat. Salah satu layanan tersebut adalah fitur pencarian di website BPS. Namun, layanan pencarian yang diberikan belum memenuhi harapan konsumen. Untuk memenuhi harapan konsumen, salah satu upaya yang dapat dilakukan adalah meningkatkan efektivitas pencarian agar lebih relevan dengan maksud pengguna. Oleh karena itu, penelitian ini bertujuan untuk membangun fungsi klasifikasi kueri pada mesin pencari dan menguji apakah fungsi tersebut dapat meningkatkan efektivitas pencarian. Fungsi klasifikasi kueri dibangun menggunakan model machine learning. Kami membandingkan lima algoritma yaitu SVM, Random Forest, Gradient Boosting, KNN, dan Naive Bayes. Dari lima algoritma tersebut, model terbaik diperoleh pada algoritma SVM. Kemudian, fungsi tersebut diimplementasikan pada mesin pencari yang diukur efektivitasnya berdasarkan nilai precision dan recall. Hasilnya, fungsi klasifikasi kueri dapat mempersempit hasil pencarian pada kueri tertentu, sehingga meningkatkan nilai precision. Namun, fungsi klasifikasi kueri tidak memengaruhi nilai recall.

Download Full-text

Machine learning as a successful approach for predicting complex spatio–temporal patterns in animal species abundance

Animal Biodiversity and Conservation ◽

10.32800/abc.2021.44.0289 ◽

2021 ◽

pp. 289-301

Author(s):

B. Martín ◽

J. González–Arias ◽

J. A. Vicente–Vírseda

Keyword(s):

Machine Learning ◽

Random Forest ◽

Animal Species ◽

Temporal Patterns ◽

Additive Models ◽

Gradient Boosting ◽

Support Vector ◽

Stochastic Gradient Boosting ◽

Extreme Gradient Boosting ◽

Spatio Temporal

Our aim was to identify an optimal analytical approach for accurately predicting complex spatio–temporal patterns in animal species distribution. We compared the performance of eight modelling techniques (generalized additive models, regression trees, bagged CART, k–nearest neighbors, stochastic gradient boosting, support vector machines, neural network, and random forest –enhanced form of bootstrap. We also performed extreme gradient boosting –an enhanced form of radiant boosting– to predict spatial patterns in abundance of migrating Balearic shearwaters based on data gathered within eBird. Derived from open–source datasets, proxies of frontal systems and ocean productivity domains that have been previously used to characterize the oceanographic habitats of seabirds were quantified, and then used as predictors in the models. The random forest model showed the best performance according to the parameters assessed (RMSE value and R2). The correlation between observed and predicted abundance with this model was also considerably high. This study shows that the combination of machine learning techniques and massive data provided by open data sources is a useful approach for identifying the long–term spatial–temporal distribution of species at regional spatial scales.

Download Full-text