Machine Learning Reveals a Significant Shift in Water Regime Types Due to Projected Climate Change

A water regime type is a cumulative representation of seasonal runoff variability, expressed in textual, qualitative, or quantitative form for a particular period. Assessing changes in water regime types is highly important for local communities and water management authorities, because it raises their awareness and informs adaptation strategies. In this study, we trained a machine learning model, the Random Forest classifier, to predict water regime types in northwest Russia from monthly climatological hydrographs derived for a historical period (1979–1991). Evaluation showed high model performance, with an accuracy of 91.6%. The trained Random Forest model was then used to predict water regime types from runoff projections for the end of the 21st century (2087–2099), forced by four General Circulation Models (GCMs) under three Representative Concentration Pathway (RCP) scenarios. The results indicate that projected climate change will substantially modify water regime types, in two primary directions. First, summer and winter flows tend to become less stable. Second, spring flood characteristics shift: while spring flooding is expected to remain the dominant phase of the water regime, the flood peak is expected to occur earlier and with a lower magnitude. The projected changes in water regime types are more pronounced under the more aggressive RCP scenarios.
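The two directions of change can be made concrete by comparing the peak timing and relative peak magnitude of two climatological hydrographs. The sketch below is a minimal illustration, not the study's method (which used a Random Forest classifier on the hydrographs): it averages hypothetical monthly runoff into a 12-month climatology and reports the peak month and the peak-to-annual-mean ratio. All runoff values and function names are made up for illustration.

```python
from statistics import mean

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def climatological_hydrograph(yearly_series):
    """Average several years of 12-month runoff into one monthly climatology."""
    return [mean(year[m] for year in yearly_series) for m in range(12)]


def peak_characteristics(hydrograph):
    """Return (month of peak runoff, peak runoff divided by the annual mean)."""
    annual_mean = mean(hydrograph)
    peak = max(range(12), key=lambda m: hydrograph[m])
    return MONTHS[peak], hydrograph[peak] / annual_mean


# Hypothetical monthly runoff (mm) for two years per period; NOT study data.
historical = [
    [5, 4, 5, 30, 60, 20, 10, 8, 9, 10, 8, 6],
    [6, 5, 6, 34, 56, 18, 12, 9, 8, 11, 9, 6],
]
projected = [
    [8, 9, 14, 40, 28, 14, 9, 8, 10, 12, 11, 9],
    [9, 10, 16, 36, 30, 13, 10, 9, 9, 13, 10, 9],
]

for label, series in [("historical", historical), ("projected", projected)]:
    month, ratio = peak_characteristics(climatological_hydrograph(series))
    print(f"{label}: peak in {month}, {ratio:.2f} x annual mean")
```

With these toy inputs, the historical climatology peaks in May at about 3.92 times the annual mean, while the projected one peaks in April at about 2.64 times: an earlier and lower spring flood peak.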


Objective classification of changes in water regime types of the Russian Plain rivers utilizing machine learning approaches

10.5194/egusphere-egu2020-11553

2020

Author(s): Alexander Ivanov, Timophey Samsonov, Natalia Frolova, Maria Kireeva, Elena Povalishnikova

Keyword(s): Machine Learning, Water Regime, Cosine Similarity, Regime Type, Russian Plain, Regime Types, Objective Classification, Seasonal Runoff, Monthly Runoff

Hydrological regime classification of Russian Plain rivers has always been done manually, through subjective analysis of various characteristics of seasonal runoff. The last update to this classification was made in the early 1990s. In this work, we attempt to use different machine learning methods for objective classification. Both clustering (DBSCAN, K-Means) and classification (XGBoost) methods were used to establish (1) whether the established runoff types can be inferred from the data with a supervised approach and (2) whether similar clusters can be inferred from the data with an unsupervised approach. The dataset consisted of monthly runoff data for 237 rivers of the Russian Plain from 1945 to 2016. In a first attempt, the dataset was divided into two periods, 1945–1977 and 1978–2016, to detect changes in river water regimes due to climate change. The monthly data were transformed into the following features: annual and seasonal runoff, runoff levels for different seasons, minimum and maximum monthly runoff, the ratios of minimum and maximum runoff to the annual mean, and others. Supervised classification with XGBoost identified water regime types for the 1945–1977 period with 90% accuracy, and this classifier detected shifts in water regime types for southern rivers of the Russian Plain in the Don region. The DBSCAN clustering algorithm identified 6 major clusters corresponding to existing water regime types: the Kola Peninsula, the north-east of the Russian Plain and the Polar Urals, Central Russia, Southern Russia, the arid south-east, and the foothills and, separately, the higher altitudes of the Caucasus. Nonetheless, a better approach was sought because the continuous nature of the data caused the clusters to overlap. The cosine similarity metric was used as an alternative way to separate river runoff types, this time for each year. This yearly resolution also makes it possible to build a timeline of water regime changes over the course of 70 years.
Using this cosine-similarity-based separation as an objective ground truth, we plan to redo the earlier classification and clustering and to establish an automated way to classify changes in water regime over time. The following conclusions can be drawn:

1. It is possible to train an accurate classifier on the established water regime types and apply it to detect changes in water regime types over time.
2. Applying the classifier to different time periods reveals a shift toward the "southern" type of water regime in the central area of the Russian Plain.
3. Despite the highly continuous nature of the data, the cosine similarity metric appears able to separate water regime types into zones corresponding to the established ones.

The study was supported by the Russian Science Foundation (grant No. 19-77-10032) for methods and by the Russian Foundation for Basic Research (grant No. 18-05-60021) for analyses in the Arctic region.
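The per-year cosine-similarity separation described above can be sketched as a nearest-reference assignment: each year's 12-month hydrograph goes to the reference regime type whose hydrograph points in the most similar direction. This is a minimal sketch under assumptions; the reference hydrographs, type names, and the nearest-reference rule are hypothetical, since the abstract does not say how types were derived from the similarity values.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length runoff vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def assign_regime(year_hydrograph, reference_types):
    """Return the name of the reference type most similar to this year's hydrograph."""
    return max(reference_types,
               key=lambda name: cosine_similarity(year_hydrograph, reference_types[name]))


# Hypothetical reference hydrographs (monthly runoff, Jan..Dec).
reference_types = {
    "spring-flood": [5, 4, 5, 30, 60, 20, 10, 8, 9, 10, 8, 6],
    "rain-fed winter": [40, 38, 30, 20, 10, 5, 4, 4, 6, 12, 25, 35],
}

# One hypothetical year with a pronounced May peak.
year = [6, 5, 7, 35, 55, 18, 11, 9, 9, 11, 9, 7]
print(assign_regime(year, reference_types))  # spring-flood
```

Because cosine similarity depends only on the shape of the hydrograph and not on its magnitude, a wet year and a dry year with the same seasonal distribution receive the same type; whether that is desirable depends on whether runoff volume should be part of the regime definition.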


Do CMIP5 models hint at a warmer and wetter India in the 21st century?

Journal of Water and Climate Change

10.2166/wcc.2015.126

2015

Vol 7 (2)

pp. 280-295

Cited By ~ 5

Author(s): Rajib Maity, Ankit Aggarwal, Kironmala Chanda

Keyword(s): General Circulation, Winter Season, Coupled Model, General Circulation Models, CMIP5 Models, Historical Period, The North, North Eastern, Spatio Temporal, Projected Changes

This study diagnoses the spatio-temporal variation of three major hydroclimatic variables (temperature, precipitation, and evaporation) estimated from four general circulation models participating in the fifth phase of the Coupled Model Intercomparison Project (CMIP5). Changes in climate regime across India are analyzed for the historical scenario (1850–2005) and the RCP8.5 scenario (2006–2100). The study provides a relative assessment of projected changes in climatic patterns over different zones of India, broadly divided into Southern, Eastern, Western, Central, North-Eastern, and Himalayan regions. Monthly data were obtained for both scenarios, and all data were regridded to a common resolution. All models show stronger warming in the future than in the historical period. The North-Eastern, Northern, and Himalayan regions are likely to be severely affected. Although inconsistencies were observed among the models, most of them project an increase in precipitation in the future, with a major increase in southern cities. The Himalayan belt is expected to receive heavy rainfall in the summer season, with little change in the winter season. Most regions are not expected to experience changes in evaporation in the pre-monsoon months, but substantial changes are expected in some regions during the monsoon and post-monsoon months.
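The kind of comparison reported here, a future-minus-historical change per model together with how many models agree on its sign, can be sketched as follows. This is a hypothetical illustration: the model names and precipitation values are invented, regridding to a common resolution is assumed to have been done already, and the study's exact diagnostics are not given in the abstract.

```python
from statistics import mean


def projected_change(historical, future):
    """Difference between the future-period mean and the historical-period mean."""
    return mean(future) - mean(historical)


def summarize_ensemble(models):
    """Multi-model mean change, number of models projecting an increase, per-model changes."""
    changes = {name: projected_change(h, f) for name, (h, f) in models.items()}
    n_increase = sum(1 for c in changes.values() if c > 0)
    return mean(changes.values()), n_increase, changes


# Hypothetical regional-mean annual precipitation (mm) from four models,
# assumed already regridded to a common grid: (historical years, future years).
models = {
    "model_A": ([1100, 1150, 1120], [1200, 1260, 1230]),
    "model_B": ([1080, 1060, 1100], [1150, 1140, 1170]),
    "model_C": ([1200, 1180, 1190], [1170, 1160, 1180]),
    "model_D": ([1050, 1070, 1060], [1110, 1100, 1120]),
}

ensemble_mean, n_up, per_model = summarize_ensemble(models)
print(f"mean change: {ensemble_mean:.1f} mm, models projecting increase: {n_up}/4")
```

With these invented numbers, three of the four models project an increase and the multi-model mean change is +52.5 mm, even though model_C projects a decrease; reporting sign agreement alongside the mean keeps such disagreement visible.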


Random Forest Refinement of Pairwise Potentials for Protein-ligand Decoy Detection

10.26434/chemrxiv.8047820.v1

2019

Cited By ~ 1

Author(s): Jun Pei, Zheng Zheng, Hyunji Kim, Lin Song, Sarah Walworth, ...

Keyword(s): Machine Learning, Random Forest, Probability Function, Pair Potential, Scoring Function, Stable Structure, Scoring Functions, Atom Pair, Data Set, Atom Pairs

An accurate scoring function is expected to correctly select the most stable structure from a set of pose candidates. One can hypothesize that a scoring function's ability to identify the most stable structure might be improved by emphasizing the most relevant pairwise atomic interactions. However, it is hard to evaluate the relative importance of each atom pair by traditional means. With the introduction of machine learning methods, it has become possible to determine the relative importance of each atom pair present in a scoring function. In this work, we use the Random Forest (RF) method to refine a pair potential developed by our laboratory (GARF6) by identifying the relevant atom pairs that optimize the potential's performance on the given task. Our goal is to construct a machine learning (ML) model that can accurately distinguish the native ligand binding pose from candidate poses using a potential refined by RF optimization. We successfully constructed RF models on an unbalanced data set using the 'comparison' concept, and the resulting RF models were tested on CASF-2013 [5]. Compared against 29 scoring functions, our models outperformed the other scoring functions in predicting the native pose. In addition, we used two artificially designed potential models to assess the importance of the GARF potential in the RF models: (1) a scrambled probability function set, obtained by mixing up the pairing of atom pairs and probability functions in GARF, and (2) a uniform probability function set, whose functions share the same peak positions as GARF but have fixed peak heights. Comparing the accuracy of RF models based on the scrambled, uniform, and original GARF potentials clearly showed that the peak positions in the GARF potential are important while the well depths are not.
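The abstract does not explain how the 'comparison' concept works. One plausible reading, and it is only an assumption here, is that each training sample compares two poses: the features are per-atom-pair-type energy terms, and a sample is the difference between the native pose's features and a decoy's, emitted in both orders so the classes stay balanced however many decoys there are. The toy quadratic potentials and all names below are hypothetical and are not GARF6; the resulting samples could then be fed to any classifier, such as a Random Forest.

```python
from collections import defaultdict


def pair_type_features(contacts, potential, pair_types):
    """Sum the pair potential separately for each atom-pair type.

    contacts:  list of (pair_type, distance) tuples for one pose.
    potential: dict mapping pair_type -> function(distance) -> energy.
    Returns one energy term per entry of pair_types, in that order.
    """
    totals = defaultdict(float)
    for pair_type, distance in contacts:
        totals[pair_type] += potential[pair_type](distance)
    return [totals[t] for t in pair_types]


def comparison_samples(native, decoys):
    """Turn one native pose and its decoys into balanced pairwise samples.

    Each decoy gives (native - decoy, label 1) and (decoy - native, label 0).
    """
    samples = []
    for decoy in decoys:
        samples.append(([n - d for n, d in zip(native, decoy)], 1))
        samples.append(([d - n for n, d in zip(native, decoy)], 0))
    return samples


# Hypothetical toy potentials with wells at 3.0 angstroms (not GARF6).
potential = {
    "C-O": lambda r: (r - 3.0) ** 2 - 1.0,
    "N-O": lambda r: 2.0 * (r - 3.0) ** 2 - 1.5,
}
pair_types = ["C-O", "N-O"]

native = pair_type_features([("C-O", 3.0), ("N-O", 3.0)], potential, pair_types)
decoy = pair_type_features([("C-O", 4.0), ("N-O", 3.5)], potential, pair_types)
for features, label in comparison_samples(native, [decoy]):
    print(label, features)
```

Emitting both orderings is what makes the set balanced: with one native and many decoys per complex, labelling poses directly would give one positive against many negatives.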


A Study on Host Tropism Determinants of Influenza Virus Using Machine Learning

Current Bioinformatics

10.2174/1574893614666191104160927

2020

Vol 15 (2)

pp. 121-134

Cited By ~ 2

Author(s): Eunmi Kwon, Myeongji Cho, Hayeon Kim, Hyeon S. Son

Keyword(s): Machine Learning, Amino Acids, Influenza Virus, Random Forest, Physicochemical Properties, Protein Sequences, Influenza Viruses, Host Tropism, Post Hoc, HA Protein

Background: The host tropism determinants of influenza virus, which cause changes in the host range and increase the likelihood of interaction with specific hosts, are critical for understanding the infection and propagation of the virus in diverse host species. Methods: Six types of protein sequences of influenza viral strains isolated from three classes of hosts (avian, human, and swine) were obtained. Random forest, naïve Bayes, and k-nearest neighbor algorithms were used for host classification. Sequence analysis and the identification of host-specific position markers were programmed in Java. Results: A machine learning technique was explored to derive the physicochemical properties of amino acids used in host classification and prediction. The HA protein was found to play the most important role in determining the host tropism of influenza virus, and the random forest method yielded the highest accuracy in host prediction. Conserved amino acids that exhibited host-specific differences were also selected and verified, and they were found to be useful position markers for host classification. Finally, ANOVA and post-hoc testing revealed that the physicochemical properties of the amino acids making up the protein sequences, combined with position markers, differed significantly among hosts. Conclusion: The host tropism determinants and position markers described in this study can be used in related research to classify, identify, and predict the hosts of influenza viruses, both those currently susceptible and those likely to be infected in the future.
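The abstract does not list which physicochemical properties were used. As a stand-in, the sketch below uses the Kyte-Doolittle hydropathy index (an assumption, swapped in for illustration) to turn each sequence into a mean hydropathy value, then computes a one-way ANOVA F statistic across host classes, the kind of between-host comparison the abstract describes. The peptide fragments are made up and are not real HA sequences; post-hoc tests and p-values are omitted.

```python
# Kyte-Doolittle hydropathy index, used here as one example physicochemical property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence):
    """Average hydropathy of a protein sequence, skipping unknown residue codes."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    return sum(values) / len(values)


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA over a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical short peptide fragments per host class (not real HA sequences).
fragments = {
    "avian": ["AILKV", "GILKV", "AIVKL"],
    "human": ["DEKQS", "NEKQT", "DEKNS"],
    "swine": ["AEKQV", "GEKLS", "AEKQL"],
}
groups = [[mean_hydropathy(s) for s in seqs] for seqs in fragments.values()]
print(f"F = {one_way_anova_f(groups):.2f}")
```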


Development of Prediction Models Using Machine Learning Algorithms for Girls with Suspected Central Precocious Puberty: Retrospective Study (Preprint)

10.2196/preprints.11728

2018

Author(s): Liyan Pan, Guangjian Liu, Xiaojian Mao, Huixian Li, Jiexin Zhang, ...

Keyword(s): Machine Learning, Retrospective Study, Random Forest, Precocious Puberty, Prediction Models, Central Precocious Puberty, Machine Learning Algorithms, Stimulation Test, GnRH Analogue, Prediction Probability

BACKGROUND Central precocious puberty (CPP) in girls seriously affects their physical and mental development in childhood. The diagnostic method, the gonadotropin-releasing hormone (GnRH)-stimulation test or the GnRH analogue (GnRHa)-stimulation test, is expensive and uncomfortable for patients because it requires repeated blood sampling. OBJECTIVE We aimed to combine multiple CPP-related features and construct machine learning models to predict the response to the GnRHa-stimulation test. METHODS In this retrospective study, we analyzed clinical and laboratory data of 1757 girls who underwent a GnRHa test to develop XGBoost and random forest classifiers that predict the response to the GnRHa test. The local interpretable model-agnostic explanations (LIME) algorithm was applied to the black-box classifiers to increase their interpretability. We measured the sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) of the models. RESULTS Both the XGBoost and random forest models performed well in distinguishing between positive and negative responses, with AUC ranging from 0.88 to 0.90, sensitivity from 77.91% to 77.94%, and specificity from 84.32% to 87.66%. Basal serum luteinizing hormone, follicle-stimulating hormone, and insulin-like growth factor-I levels were the three most important factors. In the LIME explanations, these variables contributed strongly to the prediction probability. CONCLUSIONS The prediction models we developed can help diagnose CPP and may be used as a prescreening tool before the GnRHa-stimulation test.
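The three reported metrics can be computed directly from labels and model scores. A minimal sketch, assuming binary labels (1 = positive GnRHa response), a hypothetical 0.5 decision threshold, and invented scores; AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half.

```python
def confusion_rates(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 is positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = positive GnRHa response) and model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical threshold

sensitivity, specificity = confusion_rates(y_true, y_pred)
print(sensitivity, specificity, roc_auc(y_true, scores))  # 0.75 0.75 0.9375
```

Sensitivity and specificity depend on the chosen threshold, while AUC summarizes ranking quality over all thresholds, which is why the two kinds of figures can move independently.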


Document Preprocessing with TF-IDF to Improve the Polarity Classification Performance of Unstructured Sentiment Analysis

Kinetik Game Technology Information System Computer Network Computing Electronics and Control

10.22219/kinetik.v5i3.1066

2020

pp. 235-242

Author(s): Farrikh Alzami, Erika Devi Udayanti, Dwi Puji Prabowo, Rama Aria Megantara

Keyword(s): Machine Learning, Feature Extraction, Random Forest, Sentiment Analysis, Classification Performance, Document Preparation, Learning Models, Polarity Classification, Negative Sentiment, Machine Learning Models

Sentiment analysis in the form of polarity classification is important in everyday life: knowing whether a document expresses positive or negative sentiment helps people choose and make decisions. Sentiment analysis is usually done manually, so an automatic sentiment classification process is needed. However, few studies discuss which feature extraction methods and learning models are suitable for unstructured sentiment analysis, as in the case of Amazon food reviews. This research explores feature extraction methods, namely bag of words, TF-IDF, Word2Vec, and a combination of TF-IDF and Word2Vec, with several machine learning models (Random Forest, SVM, KNN, and Naïve Bayes) to find combinations of feature extraction and learning models that add variety to polarity sentiment analysis. With document preparation that removes HTML tags, punctuation, and special characters and applies Snowball stemming, TF-IDF features with SVM proved suitable for polarity classification in unstructured sentiment analysis of Amazon food reviews, achieving a performance of 87.3%.
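TF-IDF weights a term by how often it occurs in a document (tf) and how rare it is across documents (idf). The sketch below pairs that weighting with the preprocessing steps the abstract mentions (removing HTML tags, punctuation, and special characters); Snowball stemming is left out because it needs an external library such as NLTK. It is a minimal illustration with made-up reviews, using the unsmoothed idf = ln(N / df).

```python
import math
import re
from collections import Counter


def preprocess(text):
    """Lower-case, strip HTML tags, replace punctuation/special characters, tokenize."""
    text = re.sub(r"<[^>]+>", " ", text.lower())
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return text.split()


def tf_idf(documents):
    """Per-document {term: tf * idf}, with tf = count / doc length, idf = ln(N / df)."""
    tokenized = [preprocess(d) for d in documents]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        if not tokens:  # empty document after preprocessing
            weights.append({})
            continue
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up reviews, not from the Amazon food review dataset.
reviews = [
    "Great <b>taste</b>!",
    "Terrible taste.",
    "Great price, great taste",
]
for weights in tf_idf(reviews):
    print({term: round(w, 3) for term, w in weights.items()})
```

A term that appears in every document, such as "taste" here, gets weight 0. Library implementations differ in detail; scikit-learn's `TfidfVectorizer`, for example, uses a smoothed idf and L2-normalizes each document vector by default, so its numbers will not match this sketch.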


Transformer Oil Quality Assessment Using Random Forest with Feature Engineering

Energies

10.3390/en14071809

2021

Vol 14 (7)

pp. 1809

Author(s): Mohammed El Amine Senoussaoui, Mostefa Brahami, Issouf Fofana

Keyword(s): Machine Learning, Random Forest, Oil Quality, Principal Component, Condition Assessment, Classification Performance, Transformer Oil, Classification Model, Insulation Degradation, Transformer Oils

Machine learning is widely used as a panacea in many engineering applications, including the condition assessment of power transformers. Most statistics attribute transformer failure mainly to insulation degradation. Thus, a new, simple, and effective machine learning approach was proposed to monitor the condition of transformer oils based on several aging indicators. The proposed approach was used to compare the performance of two machine learning classifiers: the J48 decision tree and random forest. The service-aged transformer oils were classified into four groups: oils that can be kept in service, oils that should be reconditioned or filtered, oils that should be reclaimed, and oils that must be discarded. Of the two algorithms, random forest performed better, achieving high accuracy with only a small amount of data. This good performance was due not only to the algorithm itself but also to the data preprocessing approach. Before being fed to the classification model, the available data were transformed using the simple k-means method. The resulting data were then filtered through correlation-based feature selection (CFsSubset). The selected features were transformed again by principal component analysis and passed through the CFsSubset filter once more. This transformation and filtering improved the classification performance of both algorithms, especially random forest. Another advantage of the proposed method is that it reduces the number of datasets required for the condition assessment of transformer oils, which is valuable for transformer condition monitoring.
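Correlation-based feature selection scores a whole subset of features rather than each feature alone: a subset is good if its features correlate with the class but not with each other. The sketch below implements Hall's merit heuristic with a greedy forward search, assuming numeric features, a numeric condition score, and absolute Pearson correlation; the feature names and values are hypothetical, and tools such as Weka's CfsSubsetEval may measure correlation and search the subset space differently.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric columns."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cfs_merit(features, target):
    """Hall's CFS merit: k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(features)
    r_cf = sum(abs(pearson(f, target)) for f in features) / k
    pairs = list(combinations(features, 2))
    r_ff = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def forward_select(columns, target):
    """Greedy forward search: keep adding the feature that raises merit the most."""
    selected, best = [], 0.0
    while len(selected) < len(columns):
        candidates = [name for name in columns if name not in selected]
        merit, name = max(
            (cfs_merit([columns[s] for s in selected + [n]], target), n)
            for n in candidates
        )
        if merit <= best:  # no candidate improves the subset
            break
        selected.append(name)
        best = merit
    return selected, best


# Hypothetical aging indicators and a numeric condition score (not study data).
target = [1, 2, 3, 4]
columns = {
    "acid_number": [1, 2, 3, 4],
    "noise": [1, -1, -1, 1],
}
print(forward_select(columns, target))
```

Adding the uncorrelated "noise" column would lower the merit from about 1.0 to about 0.71, so the search stops after "acid_number".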


Knee Muscle Force Estimating Model Using Machine Learning Approach

The Computer Journal

10.1093/comjnl/bxaa160

2020

Author(s): Anurag Sohane, Ravinder Agarwal

Keyword(s): Machine Learning, Random Forest, Muscle Force, Vastus Lateralis, Input Parameter, Research Work, Cost Effective, Coefficient of Determination, Muscle Forces, Knee Muscle

Various simulation-type tools and conventional algorithms are used to determine human knee muscle forces during dynamic movement. These may all be suitable for clinical use, but they have drawbacks such as long computation times, muscle redundancy, and less cost-effective solutions. Recently, there has been interest in developing supervised learning-based prediction models for this computationally demanding process. The present work develops cost-effective and efficient machine learning (ML) models to predict knee muscle force for clinical interventions from input parameters such as height, mass, and angle. A dataset of 500 human musculoskeletal samples was used to train and test four different ML models for predicting knee muscle force. The dataset was obtained from the AnyBody modeling software using AnyPyTools, with a human musculoskeletal model performing a squatting movement during inverse dynamic analysis. The results show that the random forest model outperforms the other selected models (neural network, generalized linear model, and decision tree) in terms of mean squared error (MSE), coefficient of determination (R²), and correlation (r). On the test dataset, the MSE between predicted and actual muscle forces from the random forest model was 19.92, 9.06, 5.97, and 5.46 for the biceps femoris, rectus femoris, vastus medialis, and vastus lateralis, respectively; the corresponding correlations were 0.94, 0.92, 0.92, and 0.94, and the R² values were 0.88, 0.84, 0.84, and 0.89.
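The three regression metrics named in the abstract are easy to confuse, in particular R² and the square of Pearson's r. A minimal sketch with toy numbers (not the study's muscle forces) computing all three:

```python
import math


def regression_metrics(actual, predicted):
    """Return (MSE, coefficient of determination R^2, Pearson correlation r)."""
    n = len(actual)
    residual_ss = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    mse = residual_ss / n
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    total_ss = sum((a - mean_a) ** 2 for a in actual)
    r2 = 1 - residual_ss / total_ss
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    pred_ss = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(total_ss * pred_ss)
    return mse, r2, r


# Toy actual vs. predicted values; not values from the study.
actual = [1, 2, 3, 4]
predicted = [1, 2, 3, 5]
mse, r2, r = regression_metrics(actual, predicted)
print(f"MSE={mse:.3f}  R2={r2:.3f}  r={r:.3f}  r^2={r * r:.3f}")
```

With these inputs it prints MSE=0.250, R2=0.800, r=0.983 and r^2=0.966: a single biased prediction lowers R² more than r², because r ignores systematic offsets and scaling errors in the predictions while R² does not.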


Evaluation of Three Different Machine Learning Methods for Object-Based Artificial Terrace Mapping—A Case Study of the Loess Plateau, China

Remote Sensing

10.3390/rs13051021

2021

Vol 13 (5)

pp. 1021

Author(s): Hu Ding, Jiaming Na, Shangjing Jiang, Jie Zhu, Kai Liu, ...

Keyword(s): Machine Learning, Random Forest, Loess Plateau, Water Conservation, Nearest Neighbor, Gradient Boosting, K Nearest Neighbor, The Loess Plateau, Object Based, Extreme Gradient Boosting

Artificial terraces are of great importance for agricultural production and for soil and water conservation. Automatic, high-accuracy mapping of artificial terraces is the basis for monitoring and related studies. Previous research mapped artificial terraces from high-resolution digital elevation models (DEMs) or imagery. Because contextual information is important for terrace mapping, object-based image analysis (OBIA) combined with machine learning (ML) is widely used. However, selecting an appropriate classifier is crucial for the terrace mapping task. In this study, the performance of an integrated framework using OBIA and ML for terrace mapping was tested, with the Zhifanggou catchment in the Loess Plateau, China, as the study area. First, optimized image segmentation was conducted. Then, features were extracted from the DEMs and imagery, and the correlations between the features were analyzed and ranked for classification. Finally, three commonly used ML classifiers, namely extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (KNN), were used for terrace mapping. Comparison with the ground truth delineated by field survey showed that random forest performed best, with 95.60% overall accuracy (followed by 94.16% for XGBoost and 92.33% for KNN). The influence of class imbalance and feature selection is discussed. This work provides a credible framework for mapping artificial terraces.
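Overall accuracy alone can hide class imbalance, which the abstract says is discussed. A minimal sketch, with hypothetical object labels, computing overall accuracy together with per-class producer's accuracy (the share of reference objects of a class that were mapped correctly) and user's accuracy (the share of objects mapped as a class that are correct):

```python
def accuracy_report(y_true, y_pred, classes):
    """Overall accuracy plus per-class producer's (recall) and user's (precision) accuracy."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    report = {"overall": correct / len(y_true)}
    for c in classes:
        reference = sum(1 for t in y_true if t == c)
        mapped = sum(1 for p in y_pred if p == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        report[c] = {
            "producer": hits / reference if reference else float("nan"),
            "user": hits / mapped if mapped else float("nan"),
        }
    return report


# Hypothetical imbalanced case: 9 non-terrace objects, 1 terrace object,
# and a classifier that labels everything "other".
y_true = ["other"] * 9 + ["terrace"]
y_pred = ["other"] * 10
print(accuracy_report(y_true, y_pred, ["other", "terrace"]))
```

Here overall accuracy is 0.9 even though no terrace object is found (terrace producer's accuracy 0.0, user's accuracy undefined), which is why per-class figures are usually reported alongside overall accuracy.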


Integrating Machine/Deep Learning Methods and Filtering Techniques for Reliable Mineral Phase Segmentation of 3D X-ray Computed Tomography Images

Energies

10.3390/en14154595

2021

Vol 14 (15)

pp. 4595

Author(s): Parisa Asadi, Lauren E. Beckingham

Keyword(s): Machine Learning, Deep Learning, Random Forest, CT Images, CT Imaging, Learning Method, Learning Methods, X-ray, Machine Learning Methods, Filtering Techniques

X-ray CT imaging provides a 3D view of a sample and is a powerful tool for investigating the internal features of porous rock. Reliable phase segmentation of these images is essential but, as with any other digital rock imaging technique, is time-consuming, labor-intensive, and subjective. Combining 3D X-ray CT imaging with machine learning methods that can simultaneously consider several extracted features in addition to color attenuation is a promising and powerful approach to reliable phase segmentation. Machine learning-based phase segmentation of X-ray CT images enables faster data collection and interpretation than traditional methods. This study investigates the performance of several filtering techniques with three machine learning methods and one deep learning method to assess their potential for reliable feature extraction and pixel-level phase segmentation of X-ray CT images. Features were first extracted from the images using well-known filters and from the second convolutional layer of the pre-trained VGG16 architecture. Then, K-means clustering, Random Forest, and Feed Forward Artificial Neural Network methods, as well as a modified U-Net model, were applied to the extracted input features. The models' performances were then compared to determine the influence of the machine learning method and the input features on reliable phase segmentation. The results showed that considering more feature dimensions is promising and that all classification algorithms achieve high accuracy, ranging from 0.87 to 0.94. Feature-based Random Forest showed the best performance among the machine learning models, with an accuracy of 0.88 for Mancos and 0.94 for Marcellus. The U-Net model trained with a linear combination of focal and Dice loss also performed well, with accuracies of 0.91 and 0.93 for Mancos and Marcellus, respectively.
In general, considering more features provided promising and reliable segmentation results that are valuable for analyzing the composition of dense samples, such as shales, which are significant unconventional reservoirs in oil recovery.
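The abstract mentions a U-Net trained with a linear combination of focal and Dice loss. The sketch below shows a binary, per-pixel version of that combination on flattened probability maps. The weights (`w_focal`, `w_dice`), `alpha=0.25`, and `gamma=2.0` are assumptions (common defaults in the focal-loss literature), not the study's settings, and a multi-phase segmentation would need a per-class variant.

```python
import math


def dice_loss(probs, targets, eps=1e-6):
    """1 - soft Dice coefficient between predicted probabilities and 0/1 targets."""
    intersection = sum(p * g for p, g in zip(probs, targets))
    return 1 - (2 * intersection + eps) / (sum(probs) + sum(targets) + eps)


def focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    total = 0.0
    for p, g in zip(probs, targets):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        p_t = p if g == 1 else 1 - p
        a_t = alpha if g == 1 else 1 - alpha
        total += -a_t * (1 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def combined_loss(probs, targets, w_focal=1.0, w_dice=1.0):
    """Linear combination of focal and Dice loss (weights are hypothetical)."""
    return w_focal * focal_loss(probs, targets) + w_dice * dice_loss(probs, targets)


# Hypothetical flattened 4-pixel mask and two candidate predictions.
targets = [1, 0, 1, 0]
for probs in ([0.9, 0.1, 0.9, 0.1], [0.6, 0.4, 0.6, 0.4]):
    print(probs, round(combined_loss(probs, targets), 4))
```

The focal term down-weights pixels that are already classified confidently, while the Dice term scores overlap for the whole mask, so it is less sensitive to a rare phase occupying few pixels; combining them is one way to get both behaviours.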
