Obtaining predicted values of the demographic process using machine learning methods

Research and analysis of demographic processes play an important role in many areas. For this study, population size and key factors for 1994 to 2019 were taken from the statistical website of the Republic of Kazakhstan. The demographic indicators were population size, fertility, mortality, divorce, and migration. The standard-of-living factors were the number of unemployed and the average monthly salary, while the medical factors were the number of hospital organizations, the number of hospital beds, and the number of doctors of all specialties. In the course of regression analysis, correlations were computed and multicollinear factors were identified. We used four different machine learning models from the Scikit-Learn library to generate population estimates. Regression models were evaluated using the quality score. As a result, the linear regression and random forest models performed well.
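
The multicollinearity screening mentioned in this abstract can be illustrated with a short sketch. It assumes a pairwise Pearson correlation check with a cutoff of 0.9; the abstract does not state which criterion the authors used, and the function names, threshold, and toy values below are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def collinear_pairs(factors, threshold=0.9):
    """Return factor pairs whose absolute correlation exceeds the threshold.

    The 0.9 default is an illustrative assumption, not a value from the paper.
    """
    flagged = []
    for (name_a, a), (name_b, b) in combinations(factors.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Made-up toy values, NOT the Kazakhstan statistics used in the paper.
factors = {
    "hospital_beds": [10.0, 12.0, 14.0, 16.0, 18.0],
    "doctors": [5.0, 6.1, 6.9, 8.0, 9.1],
    "unemployed": [7.0, 3.0, 8.0, 2.0, 6.0],
}
print(collinear_pairs(factors))
```

In this toy data only the hospital beds and doctors series move together, so one of the two would be a candidate for removal before fitting a linear regression.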

Data-Driven Wildfire Risk Prediction in Northern California

Atmosphere ◽

10.3390/atmos12010109 ◽

2021 ◽

Vol 12 (1) ◽

pp. 109

Author(s):

Ashima Malik ◽

Megha Rajam Rao ◽

Nandini Puppala ◽

Prathusha Koouri ◽

Venkata Anil Kumar Thota ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Learning Curves ◽

Data Driven ◽

Northern California ◽

Combined Model ◽

Wildfire Risk ◽

Study Results ◽

Forest Models ◽

Random Forest Models

Over the years, rampant wildfires have plagued the state of California, causing economic and environmental loss. In 2018, wildfires cost nearly 800 million dollars in economic loss and claimed more than 100 lives in California. Over 1.6 million acres of land burned, causing extensive environmental damage. Although researchers have recently introduced machine learning models and algorithms for predicting wildfire risk, these studies focused on specific perspectives and were restricted to a limited number of data parameters. In this paper, we propose two data-driven machine learning approaches based on random forest models to predict the wildfire risk in areas near Monticello and Winters, California. This study demonstrates how the models were developed and applied with comprehensive data parameters, such as powerlines, terrain, and vegetation, from different perspectives, which improved the spatial and temporal accuracy in predicting the risk of wildfire, including fire ignition. The combined model uses the spatial and temporal parameters as a single combined dataset to train and predict the fire risk, whereas the ensemble model was fed separate parameters that were later stacked to work as a single model. Our experiment shows that the combined model produced better results, in terms of accuracy, than the ensemble of random forest models trained on separate spatial data. The models were validated with Receiver Operating Characteristic (ROC) curves, learning curves, and evaluation metrics such as accuracy, confusion matrices, and classification reports. The study achieved a cutting-edge accuracy of 92% in predicting wildfire risk, including ignition, by utilizing regional spatial and temporal data along with standard data parameters in Northern California.

Applicability of an Automated Model and Parameter Selection in the Prediction of Screening-Level PTSD in Danish Soldiers Following Deployment: Development Study of Transferable Predictive Models Using Automated Machine Learning (Preprint)

10.2196/preprints.17119 ◽

2019 ◽

Author(s):

Karen-Inge Karstoft ◽

Ioannis Tsamardinos ◽

Kasper Eskelund ◽

Søren Bo Andersen ◽

Lars Ravnborg Nissen

Keyword(s):

Machine Learning ◽

Operating Characteristic ◽

Linear Models ◽

Prediction Models ◽

Characteristic Curve ◽

Ptsd Symptoms ◽

Forest Models ◽

Random Forest Models ◽

Automated Machine Learning ◽

Military Rank

BACKGROUND: Posttraumatic stress disorder (PTSD) is a relatively common consequence of deployment to war zones. Early postdeployment screening aimed at identifying those at risk for PTSD in the years following deployment could help deliver interventions to those in need, but such screening has so far proved unsuccessful. OBJECTIVE: This study aimed to test the applicability of automated model selection and the ability of automated machine learning prediction models to transfer across cohorts and predict screening-level PTSD 2.5 years and 6.5 years after deployment. METHODS: Automated machine learning was applied to data routinely collected 6-8 months after return from deployment from 3 different cohorts of Danish soldiers deployed to Afghanistan in 2009 (cohort 1, N=287 or N=261 depending on the timing of the outcome assessment), 2010 (cohort 2, N=352), and 2013 (cohort 3, N=232). RESULTS: Models transferred well between cohorts. For screening-level PTSD 2.5 and 6.5 years after deployment, random forest models provided the highest accuracy as measured by area under the receiver operating characteristic curve (AUC): 2.5 years, AUC=0.77, 95% CI 0.71-0.83; 6.5 years, AUC=0.78, 95% CI 0.73-0.83. Linear models performed equally well. Military rank, hyperarousal symptoms, and total level of PTSD symptoms were highly predictive. CONCLUSIONS: Automated machine learning provided validated models that can be readily implemented in future deployment cohorts in the Danish Defense with the aim of targeting postdeployment support interventions to those at highest risk for developing PTSD, provided the cohorts are deployed on similar missions.
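
The AUC values reported in this abstract can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the labels and scores are hypothetical and have no connection to the Danish cohort data, and the function name is an assumption for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = screening-level PTSD) and model scores.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70, 0.50, 0.45]
print(roc_auc(labels, scores))  # 13 of 16 positive/negative pairs are ordered correctly
```

The quadratic pairwise loop is chosen for clarity; a rank-based computation gives the same value and scales better to large cohorts.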

For Honor, for Toxicity

Proceedings of the ACM on Human-Computer Interaction ◽

10.1145/3474680 ◽

2021 ◽

Vol 5 (CHI PLAY) ◽

pp. 1-29

Author(s):

Alessandro Canossa ◽

Dmitry Salimov ◽

Ahmad Azadvar ◽

Casper Harteveld ◽

Georgios Yannakakis

Keyword(s):

Machine Learning ◽

Random Forest ◽

Random Forests ◽

Initial Study ◽

Unfair Advantage ◽

Offensive Behavior ◽

Forest Models ◽

Random Forest Models ◽

Action Type ◽

Degree Of Severity

Is it possible to detect toxicity in games just by observing in-game behavior? If so, what are the behavioral factors that will help machine learning to discover the unknown relationship between gameplay and toxic behavior? In this initial study, we examine whether it is possible to predict toxicity in the MOBA game For Honor by observing in-game behavior for players that have been labeled as toxic (i.e., players that have been sanctioned by Ubisoft community managers). We test our hypothesis of detecting toxicity through gameplay with a dataset of almost 1,800 sanctioned players, comparing these sanctioned players with unsanctioned players. Sanctioned players are defined by their toxic action type (offensive behavior vs. unfair advantage) and degree of severity (warned vs. banned). Our findings, based on supervised learning with random forests, suggest that it is not only possible to behaviorally distinguish sanctioned from unsanctioned players based on selected features of gameplay; it is also possible to predict both the sanction severity (warned vs. banned) and the sanction type (offensive behavior vs. unfair advantage). In particular, all random forest models predict toxicity, its severity, and type, with an accuracy of at least 82%, on average, on unseen players. This research shows that observing in-game behavior can support the work of community managers in moderating and possibly containing the burden of toxic behavior.

Prediction of Gas Turbine Trip: a Novel Methodology Based on Random Forest Models

10.1115/gt2021-58916 ◽

2021 ◽

Author(s):

Enzo Losi ◽

Mauro Venturini ◽

Lucrezia Manservigi ◽

Giuseppe Fabio Ceschini ◽

Giovanni Bechini ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Gas Turbine ◽

Gas Turbines ◽

Remaining Useful Life ◽

Training Data ◽

The Novel ◽

Novel Approach ◽

Forest Models ◽

Random Forest Models

A gas turbine trip is an unplanned shutdown, of which the most relevant consequences are business interruption and a reduction of equipment remaining useful life. Thus, understanding the underlying causes of gas turbine trip would allow predicting its occurrence in order to maximize gas turbine profitability and improve its availability. In the ever-competitive Oil & Gas sector, data mining and machine learning are increasingly being employed to support a deeper insight and improved operation of gas turbines. Among the various machine learning tools, Random Forests are an ensemble learning method consisting of an aggregation of decision tree classifiers. This paper presents a novel methodology aimed at exploiting information embedded in the data and develops Random Forest models aimed at predicting gas turbine trip based on information gathered during a timeframe of historical data acquired from multiple sensors. The novel approach exploits time series segmentation to increase the amount of training data, thus reducing overfitting. First, data are transformed according to a feature engineering methodology developed in a separate work by the same authors. Then, Random Forest models are trained and tested on unseen observations to demonstrate the benefits of the novel approach. The superiority of the novel approach is proved by considering two real-world case studies involving field data taken during three years of operation of two fleets of Siemens gas turbines located in different regions. The novel methodology allows values of Precision, Recall and Accuracy in the range 75–85%, thus demonstrating the industrial feasibility of the predictive methodology.
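
The idea of using time series segmentation to increase the number of training examples can be sketched as follows. The abstract does not describe the authors' segmentation scheme, so this is a generic sliding-window illustration; the function name, window length, step, and sensor readings are all assumptions.

```python
def segment_series(values, window, step):
    """Split one series into fixed-length, possibly overlapping segments."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        values[start:start + window]
        for start in range(0, len(values) - window + 1, step)
    ]


# Hypothetical sensor readings recorded before a single trip event.
readings = [float(i) for i in range(10)]

# One series of 10 points becomes several training segments of 4 points each.
segments = segment_series(readings, window=4, step=2)
for segment in segments:
    print(segment)
```

Because overlapping segments from the same recording are strongly correlated, a sketch like this would normally keep all segments of one recording on the same side of the train/test split.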

Globalization of Migration Processes: on the Threshold of the New Migration Reality

SHS Web of Conferences ◽

10.1051/shsconf/20185001225 ◽

2018 ◽

Vol 50 ◽

pp. 01225

Author(s):

Rakhmon Ulmasov ◽

Nurali Kurbanov

Keyword(s):

Social Unit ◽

Underdeveloped Countries ◽

Global Issues ◽

Political Conflicts ◽

Sharp Growth ◽

Demographic Processes ◽

Important Idea ◽

The Republic ◽

And Migration ◽

National Governments

In this paper, the authors address the global issues of migration on the threshold of a new migration reality. Migration is considered to be one of the most actively developing global issues at present, as more and more people are crossing the borders of their states for one reason or another. The authors provide a comprehensive analysis of these issues, in particular the accelerating processes of global warming, expanding social and political conflicts, economic crises, and migration collapse. All of these issues together indicate a very important idea: in all of its processes related to life, particularly in the area of migration, the world has reached the threshold of a New Reality. The authors have indicated with absolute accuracy those issues that need the most focused attention from national governments and international institutions. Thus, there is an obvious fact that is paradoxical for many countries, especially the European ones: despite the complex socioeconomic situation, limited natural resources, rising unemployment, declining income, and increasing forced migration, there is a sharp growth in population in the Republic of Tajikistan. Such demographic processes are a hallmark of predominantly poor and underdeveloped countries, where having many children is often the only factor of a family’s survival as a social unit.

FP-ADMET: a compendium of fingerprint-based ADMET prediction models

Journal of Cheminformatics ◽

10.1186/s13321-021-00557-5 ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Vishwesh Venkatraman

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Shortest Paths ◽

Predictive Ability ◽

Molecular Fingerprint ◽

Atom Pairs ◽

Forest Models ◽

Random Forest Models ◽

Local Path ◽

Binary Fingerprints

Motivation: The absorption, distribution, metabolism, excretion, and toxicity (ADMET) of drugs play a key role in determining which among the potential candidates are to be prioritized. In silico approaches based on machine learning methods are becoming increasingly popular, but are nonetheless limited by the availability of data. With a view to making both data and models available to the scientific community, we have developed FP-ADMET, a repository of molecular fingerprint-based predictive models for ADMET properties. Summary: In this article, we have examined the efficacy of fingerprint-based machine learning models for a large number of ADMET-related properties. The predictive ability of a set of 20 different binary fingerprints (based on substructure keys, atom pairs, local path environments, as well as custom fingerprints such as all-shortest paths) for over 50 ADMET and ADMET-related endpoints has been evaluated as part of the study. We find that for a majority of the properties, fingerprint-based random forest models yield comparable or better performance compared with traditional 2D/3D molecular descriptors. Availability: The models are made available as part of open access software that can be downloaded from https://gitlab.com/vishsoft/fpadmet.

Assessment and mapping of demographic potential of urbanized territories of the Baikal-Mongol region

IOP Conference Series Earth and Environmental Science ◽

10.1088/1755-1315/885/1/012028 ◽

2021 ◽

Vol 885 (1) ◽

pp. 012028

Author(s):

N V Vorobyev ◽

A N Vorobyev

Keyword(s):

Urban Areas ◽

Irkutsk Region ◽

Quantitative Characteristics ◽

Working Age ◽

Population Structures ◽

Demographic Processes ◽

The Republic ◽

And Migration ◽

Population Demographic ◽

Demographic Potential

This article provides an assessment of the demographic potential of the Baikal-Mongolian region, which unites the adjacent territories of the two countries. The cores of the research site are the urbanized territories of Irkutsk, Ulan-Ude and Ulan-Bator, and its communication lines are the railways and highways connecting the main centres. The demographic potential is characterized by the level of, and possibilities for, the development of demographic processes and population structures, and it is mainly described using numerous quantitative characteristics of the territory's population. The authors limited themselves to using quantitative characteristics of the demographic potential according to statistical data for 2019–2020 within the territories of the municipal districts and urban districts of the Irkutsk region, the Republic of Buryatia and aimags of Mongolia. Data on density and proportion of urban population reflect the size of the main urban areas. Data on demographic processes reflect the characteristics of the natural and migration movement of the population. Demographic structures are represented by the age structure and the demographic load of the working-age population, which is minimal throughout Mongolia and in the suburbs of Russian regional centres. Generalizing characteristics of demographic potential are calculated from the average sum of individual indicators.

Predicting performance in 4 x 200-m freestyle swimming relay events

PLoS ONE ◽

10.1371/journal.pone.0254538 ◽

2021 ◽

Vol 16 (7) ◽

pp. e0254538

Author(s):

Paul Pao-Yen Wu ◽

Toktam Babaei ◽

Michael O’Shea ◽

Kerrie Mengersen ◽

Christopher Drovandi ◽

...

Keyword(s):

Machine Learning ◽

Support Staff ◽

Individual Event ◽

Team Members ◽

Team Selection ◽

Predicting Performance ◽

Gold Silver ◽

Forest Models ◽

Random Forest Models ◽

The Individual

Aim: The aim was to predict and understand variations in swimmer performance between individual and relay events, and develop a predictive model for the 4 x 200-m swimming freestyle relay event to help inform team selection and strategy. Data and methods: Race data for 716 relay finals (4 x 200-m freestyle) from 14 international competitions between 2010 and 2018 were analysed. Individual 200-m freestyle season best time for the same year was located for each swimmer. Linear regression and machine learning were applied to 4 x 200-m swimming freestyle relay events. Results: Compared to the individual event, the lowest ranked swimmer in the team (−0.62 s, CI = [−0.94, −0.30]) and American swimmers (−0.48 s [−0.89, −0.08]) typically swam faster 200-m times in relay events. Random forest models predicted gold, silver, bronze and non-medal with 100%, up to 41%, up to 63%, and 93% sensitivity, respectively. Discussion: Team finishing position was strongly associated with the differential time to the fastest team (mean decrease in Gini (MDG) when this variable was omitted = 31.3), world rankings of team members (average ranking MDG of 18.9), and the order of swimmers (MDG = 6.9). Differential times are based on the sum of individual swimmers’ season’s best times and, along with world rankings, reflect team strength. In contrast, the order of swimmers reflects strategy. This type of analysis could assist coaches and support staff in selecting swimmers and team orders for relay events to enhance the likelihood of success.
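
The per-class sensitivity figures quoted in this abstract correspond to the share of true cases of each outcome that a classifier labels correctly. A minimal sketch of that calculation is given below; the outcomes and predictions are invented for illustration and are unrelated to the relay data, and the function name is an assumption.

```python
def per_class_sensitivity(true_labels, predicted_labels):
    """Sensitivity (recall) per class: correct predictions / true cases."""
    totals = {}
    hits = {}
    for truth, prediction in zip(true_labels, predicted_labels):
        totals[truth] = totals.get(truth, 0) + 1
        if truth == prediction:
            hits[truth] = hits.get(truth, 0) + 1
    return {label: hits.get(label, 0) / totals[label] for label in totals}


# Invented finishing outcomes and model predictions for eight teams.
truth = ["gold", "silver", "bronze", "none", "none", "none", "silver", "bronze"]
preds = ["gold", "bronze", "bronze", "none", "none", "silver", "silver", "none"]

for label, value in per_class_sensitivity(truth, preds).items():
    print(label, round(value, 2))
```

Reporting sensitivity per class, rather than a single accuracy figure, shows when a model separates the extremes well but confuses neighbouring outcomes such as silver and bronze.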

Predicting the animal hosts of coronaviruses from compositional biases of spike protein and whole genome sequences through machine learning

10.1101/2020.11.02.350439 ◽

2020 ◽

Author(s):

Liam Brierley ◽

Anna Fowler

Keyword(s):

Machine Learning ◽

Random Forest ◽

Machine Learning Algorithms ◽

Spike Protein ◽

Animal Origin ◽

Whole Genome ◽

Genome Sequences ◽

Genome Composition ◽

Forest Models ◽

Random Forest Models

The COVID-19 pandemic has demonstrated the serious potential for novel zoonotic coronaviruses to emerge and cause major outbreaks. The immediate animal origin of the causative virus, SARS-CoV-2, remains unknown, and identifying such origins is a notoriously challenging task in emerging disease investigations. Coevolution with hosts leads to specific evolutionary signatures within viral genomes that can inform likely animal origins. We obtained a set of 650 spike protein and 511 whole genome nucleotide sequences from 225 and 187 viruses belonging to the family Coronaviridae, respectively. We then trained random forest models independently on genome composition biases of spike protein and whole genome sequences, including dinucleotide and codon usage biases, in order to predict animal host (of nine possible categories, including human). In hold-one-out cross-validation, predictive accuracy on unseen coronaviruses consistently reached ∼73%, indicating evolutionary signal in spike proteins to be just as informative as whole genome sequences. However, different composition biases were informative in each case. Applying optimised random forest models to classify human sequences of MERS-CoV and SARS-CoV revealed evolutionary signatures consistent with their recognised intermediate hosts (camelids, carnivores), while human sequences of SARS-CoV-2 were predicted as having bat hosts (suborder Yinpterochiroptera), supporting bats as the suspected origins of the current pandemic. In addition to phylogeny, variation in genome composition can act as an informative approach to predict emerging virus traits as soon as sequences are available. More widely, this work demonstrates the potential in combining genetic resources with machine learning algorithms to address long-standing challenges in emerging infectious diseases.
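
One way to turn a nucleotide sequence into composition features of the kind this abstract refers to is the observed-to-expected dinucleotide ratio. The abstract does not give the authors' exact definition of a dinucleotide bias, so the sketch below is a common formulation chosen as an assumption; the function name and the toy sequence are hypothetical.

```python
from collections import Counter
from itertools import product


def dinucleotide_bias(sequence):
    """Observed/expected frequency ratio for each of the 16 dinucleotides.

    Expected frequency is the product of the two single-base frequencies.
    A ratio above 1 means the pair is over-represented, below 1 under-represented.
    """
    seq = sequence.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two bases")
    base_counts = Counter(seq)
    pair_counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_bases = len(seq)
    n_pairs = len(seq) - 1
    bias = {}
    for first, second in product("ACGT", repeat=2):
        expected = (base_counts[first] / n_bases) * (base_counts[second] / n_bases)
        observed = pair_counts[first + second] / n_pairs
        bias[first + second] = observed / expected if expected > 0 else 0.0
    return bias


# Toy sequence for illustration only, not a real coronavirus sequence.
features = dinucleotide_bias("ACGTACGTACGT")
print(round(features["AC"], 3), features["AA"])
```

Each sequence then yields a fixed-length vector of 16 values, which is a convenient input format for a random forest classifier.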

The Level of Education and Cultural Capital as Migration Factors (Based on People Living in the Republic of Bashkortostan)

Sociologicheskaja nauka i social naja praktika ◽

10.19181/snsp.2019.7.4.6805 ◽

2019 ◽

Vol 7 (4) ◽

pp. 119-127

Author(s):

Marsel S. Turakaev

Keyword(s):

Cultural Capital ◽

Local Population ◽

Key Factors ◽

Life Long Learning ◽

Republic Of Bashkortostan ◽

Education Status ◽

Migration Factors ◽

The Republic ◽

And Migration ◽

The One

This paper highlights the link between education status and cultural capital, on the one hand, and migration likelihood, on the other hand. The conclusions are based on two surveys of respondents living in the Republic of Bashkortostan. The desire to continue one’s education and the dissatisfaction with one’s current education level are among the key factors that drive the local population to change their permanent place of residence. Cultural capital, in turn, is viewed through the prism of skills and knowledge, everyday leisure practices, and values. For instance, the greater a group’s computer and Internet literacy, the higher the share of people who are ready to permanently move to a different place. Those locals who express their readiness to migrate are generally younger; they tend to spend their free time online, acquire self-taught skills, pursue creative hobbies and sports, etc. They value life-long learning, independence, initiative, and a desire for change, and plan their future a long way in advance. By contrast, respondents who do not plan to migrate tend to prefer traditional pastimes and practices: gardening, watching television, reading newspapers, going to church or mosque, performing home repairs, etc.
