Water Demand Prediction Using Machine Learning Methods: A Case Study of the Beijing–Tianjin–Hebei Region in China

Predicting water demand helps decision-makers allocate regional water resources efficiently, thereby preventing water waste and shortages. This study aims to predict water demand in the Beijing–Tianjin–Hebei region of North China. Explanatory variables associated with economy, community, water use, and resource availability were identified. Eleven statistical and machine learning models were built using data covering the 2004–2019 period. Interpolation and extrapolation scenarios were conducted to find the most suitable predictive model. The results suggest that the gradient boosting decision tree (GBDT) model demonstrates the best prediction performance in both scenarios. The model was further tested on three other regions in China, which validated its robustness. Water demand predictions for 2020–2021 were also provided. The results show that the identified explanatory variables were effective in water demand prediction. The machine learning models outperformed the statistical models, with the ensemble models being superior to the single-predictor models. The best predictive model can also be applied to other regions to help forecast water demand and support sustainable water resource management.
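The two evaluation scenarios can be illustrated with a small sketch. The abstract does not define them, so this assumes that interpolation means holding out years inside the 2004–2019 range and that extrapolation means holding out the most recent years. The function names and the number of held-out years are hypothetical.

```python
import random

YEARS = list(range(2004, 2020))  # the 2004-2019 study period

def interpolation_split(years, n_test, seed=0):
    """Hold out randomly chosen interior years (assumed meaning of 'interpolation')."""
    rng = random.Random(seed)
    interior = years[1:-1]  # first and last year always stay in training
    test = sorted(rng.sample(interior, n_test))
    train = [y for y in years if y not in test]
    return train, test

def extrapolation_split(years, n_test):
    """Hold out the most recent years (assumed meaning of 'extrapolation')."""
    return years[:-n_test], years[-n_test:]

interp_train, interp_test = interpolation_split(YEARS, n_test=3)
extra_train, extra_test = extrapolation_split(YEARS, n_test=3)
print("interpolation test years:", interp_test)
print("extrapolation test years:", extra_test)
```

Under these assumptions, a model that scores well only on the first split may be fitting the trend between known years without being able to project it forward, which is why both splits are worth checking.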


Probabilistic urban water demand forecasting using wavelet-based machine learning models

Journal of Hydrology ◽

10.1016/j.jhydrol.2021.126358 ◽

2021 ◽

pp. 126358

Author(s):

Mostafa Rezaali ◽

John Quilty ◽

Abdolreza Karimi

Keyword(s):

Machine Learning ◽

Water Demand ◽

Demand Forecasting ◽

Urban Water ◽

Learning Models ◽

Water Demand Forecasting ◽

Urban Water Demand ◽

Machine Learning Models


Machine learning models to identify low adherence to influenza vaccination among Korean adults with cardiovascular disease

BMC Cardiovascular Disorders ◽

10.1186/s12872-021-01925-7 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Moojung Kim ◽

Young Jae Kim ◽

Sung Jin Park ◽

Kwang Gi Kim ◽

Pyung Chun Oh ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Influenza Vaccination ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Age Group ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Machine Learning Models

Background: Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially in the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study is to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination. Methods: Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" to whether they had received seasonal influenza vaccinations in the past 12 months. The classification process was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups. Results: The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared. For the ≥ 65 age group, XGB (84.7%) and RF (84.7%) had the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM had the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%). Conclusions: The machine learning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.


O-203 Application of machine learning to predict aneuploidy and mosaicism in embryos from in vitro fertilization (IVF) cycles

Human Reproduction ◽

10.1093/humrep/deab128.014 ◽

2021 ◽

Vol 36 (Supplement_1) ◽

Author(s):

J A Ortiz ◽

R Morales ◽

B Lledo ◽

E Garcia-Hernandez ◽

A Cascales ◽

...

Keyword(s):

Machine Learning ◽

Predictive Model ◽

Predictive Models ◽

Maternal Age ◽

The Other ◽

Predictor Variables ◽

Learning Models ◽

Male Factor ◽

Factors Associated ◽

Machine Learning Models

Study question: Is it possible to predict the likelihood of an IVF embryo being aneuploid and/or mosaic using a machine learning algorithm? Summary answer: There are paternal, maternal, embryonic and IVF-cycle factors that are associated with embryonic chromosomal status and that can be used as predictors in machine learning models. What is known already: The factors associated with embryonic aneuploidy have been extensively studied. Mostly maternal age, and to a lesser extent male factor and ovarian stimulation, have been related to the occurrence of chromosomal alterations in the embryo. On the other hand, the main factors that may increase the incidence of embryo mosaicism have not yet been established. The models obtained using classical statistical methods to predict embryonic aneuploidy and mosaicism are not highly reliable. As an alternative to traditional methods, different machine and deep learning algorithms are being used to generate predictive models in different areas of medicine, including human reproduction. Study design, size, duration: The study design is observational and retrospective. A total of 4654 embryos from 1558 PGT-A cycles were included (January 2017 to December 2020). The trophectoderm biopsies on D5, D6 or D7 blastocysts were analysed by NGS. Embryos with ≤25% aneuploid cells were considered euploid, those with between 25% and 50% were classified as mosaic, and those with >50% as aneuploid. The variables of the PGT-A were recorded in a database from which predictive models of embryonic aneuploidy and mosaicism were developed. Participants/materials, setting, methods: The main indications for PGT-A were advanced maternal age, abnormal sperm FISH and recurrent miscarriage or implantation failure. Embryo analyses were performed using Veriseq-NGS (Illumina). The software used to carry out all the analyses was R (RStudio). The library used to implement the different algorithms was caret. In the machine learning models, 22 predictor variables were introduced, which can be classified into 4 categories: maternal, paternal, embryonic and those specific to the IVF cycle. Main results and the role of chance: The different couple, embryo and stimulation cycle variables were recorded in a database (22 predictor variables). Two different predictive models were built, one for aneuploidy and the other for mosaicism. The outcome variable was multi-class, since it included the segmental and whole-chromosome alteration categories. The data frame was first preprocessed and the different classes to be predicted were balanced. 80% of the data were used for training the model and 20% were reserved for further testing. The classification algorithms applied include multinomial regression, neural networks, support vector machines, neighborhood-based methods, classification trees, gradient boosting, ensemble methods, Bayesian and discriminant analysis-based methods. The algorithms were optimized by minimizing Log_Loss, which measures accuracy while penalizing misclassifications. The best predictive models were achieved with the XG-Boost and random forest algorithms. The AUC of the predictive model for aneuploidy was 80.8% (Log_Loss: 1.028) and for mosaicism 84.1% (Log_Loss: 0.929). The best predictor variables of the models were maternal age, embryo quality, day of biopsy and whether or not the couple had a history of pregnancies with chromosomopathies. The male factor played a relevant role only in the mosaicism model, not in the aneuploidy model. Limitations, reasons for caution: Although the predictive models obtained can be very useful for estimating the probability of achieving euploid embryos in an IVF cycle, increasing the sample size and including additional variables could improve the models and thus increase their predictive capacity. Wider implications of the findings: Machine learning can be a very useful tool in reproductive medicine, since it can help determine the factors associated with embryonic aneuploidies and mosaicism in order to establish a predictive model for both. Identifying couples at risk of embryo aneuploidy/mosaicism could help them benefit from the use of PGT-A. Trial registration number: Not applicable.
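The Log_Loss criterion mentioned in this abstract can be illustrated with a minimal pure-Python sketch of the standard multi-class log loss, the mean negative log-probability assigned to the true class. The class names and probabilities below are invented for illustration and are not the study's data or its R/caret code.

```python
import math

def log_loss(true_labels, predicted_probs, eps=1e-15):
    """Mean negative log-probability that the model assigned to the true class."""
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        # Clip to avoid log(0) when a model assigns zero probability.
        p = min(max(probs[label], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(true_labels)

# Hypothetical three-class example (labels are illustrative only).
y_true = ["euploid", "mosaic", "aneuploid"]
confident = [
    {"euploid": 0.8, "mosaic": 0.1, "aneuploid": 0.1},
    {"euploid": 0.1, "mosaic": 0.8, "aneuploid": 0.1},
    {"euploid": 0.1, "mosaic": 0.1, "aneuploid": 0.8},
]
uniform = [{"euploid": 1 / 3, "mosaic": 1 / 3, "aneuploid": 1 / 3}] * 3

print(log_loss(y_true, confident))  # lower: correct and confident
print(log_loss(y_true, uniform))    # higher: uninformative guesses
```

Unlike plain accuracy, this score changes with how much probability the model places on the correct class, so a confident wrong prediction costs more than a hesitant one.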


Implementing clinical decision support for oncology advanced care planning: A systems engineering framework to optimize the usability and utility of a machine learning predictive model in clinical practice.

Journal of Clinical Oncology ◽

10.1200/jco.2020.39.28_suppl.330 ◽

2021 ◽

Vol 39 (28_suppl) ◽

pp. 330-330

Author(s):

Teja Ganta ◽

Stephanie Lehrman ◽

Rachel Pappalardo ◽

Madalene Crow ◽

Meagan Will ◽

...

Keyword(s):

Machine Learning ◽

High Risk ◽

Predictive Model ◽

Systems Engineering ◽

Care Planning ◽

Learning Models ◽

Predictive Tool ◽

Risk Of Death ◽

The Impact ◽

Machine Learning Models

Background: Machine learning models are well-positioned to transform cancer care delivery by providing oncologists with more accurate or accessible information to augment clinical decisions. Many machine learning projects, however, focus on model accuracy without considering the impact of using the model in real-world settings and rarely carry forward to clinical implementation. We present a human-centered systems engineering approach to address clinical problems with workflow interventions utilizing machine learning algorithms. Methods: We aimed to develop a mortality predictive tool, using a Random Forest algorithm, to identify oncology patients at high risk of death within 30 days in order to move advance care planning (ACP) discussions earlier in the illness trajectory. First, a project sponsor defined the clinical need and requirements of an intervention. The data scientists developed the predictive algorithm using data available in the electronic health record (EHR). A multidisciplinary workgroup was assembled including oncology physicians, advanced practice providers, nurses, social workers, chaplain, clinical informaticists, and data scientists. Meeting bi-monthly, the group utilized human-centered design (HCD) methods to understand clinical workflows and identify points of intervention. The workgroup completed a workflow redesign workshop, a 90-minute facilitated group discussion, to integrate the model into a future-state workflow. An EHR (Epic) analyst built the user interface to support the intervention per the group’s requirements. The workflow was piloted in thoracic oncology and bone marrow transplant with plans to scale to other cancer clinics. Results: Our predictive model performance on test data was acceptable (sensitivity 75%, specificity 75%, F1 score 0.71, AUC 0.82). The workgroup identified a “quality of life coordinator” who: reviews an EHR report of patients scheduled in the upcoming 7 days who have a high risk of 30-day mortality; works with the oncology team to determine ACP clinical appropriateness; documents the need for ACP; identifies potential referrals to supportive oncology, social work, or chaplain; and coordinates the oncology appointment. The oncologist receives a reminder on the day of the patient’s scheduled visit. Conclusions: This workgroup is a viable approach that can be replicated at institutions to address clinical needs and realize the full potential of machine learning models in healthcare. The next steps for this project are to address end-user feedback from the pilot, expand the intervention to other cancer disease groups, and track clinical metrics.
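The reported sensitivity, specificity and F1 score all derive from a confusion matrix. The sketch below shows the standard definitions. The counts are invented for illustration and are not the study's test data, and `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of true high-risk patients flagged
    specificity = tn / (tn + fp)   # share of low-risk patients not flagged
    precision = tp / (tp + fp)     # share of flagged patients truly high risk
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts: 100 high-risk and 200 low-risk patients.
metrics = binary_metrics(tp=75, fp=50, tn=150, fn=25)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, sensitivity and specificity are both 0.75 while F1 is about 0.667, which illustrates that F1 depends on class balance and need not equal the other two figures.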


Predicting Electric Vehicle Charging Station Availability Using Ensemble Machine Learning

Energies ◽

10.3390/en14237834 ◽

2021 ◽

Vol 14 (23) ◽

pp. 7834

Author(s):

Christopher Hecht ◽

Jan Figgener ◽

Dirk Uwe Sauer

Keyword(s):

Machine Learning ◽

Binary Data ◽

Training Data ◽

Gradient Boosting ◽

Traffic Density ◽

Learning Models ◽

Charging Infrastructure ◽

Ensemble Models ◽

Charging Station ◽

Machine Learning Models

Electric vehicles may reduce greenhouse gas emissions from individual mobility. Due to the long charging times, accurate planning is necessary, for which the availability of charging infrastructure must be known. In this paper, we show how the occupation status of charging infrastructure can be predicted for the next day using two machine learning models: a Gradient Boosting Classifier and a Random Forest Classifier. Since both are ensemble models, binary training data (occupied vs. available) can be used to provide a certainty measure for predictions. The prediction may be used to adapt prices in a high-load scenario, predict grid stress, or forecast available power for smart or bidirectional charging. The models were chosen based on an evaluation of 13 different, commonly used machine learning models. We show that it is necessary to know past charging station usage in order to predict future usage. Other features such as traffic density or weather have a limited effect. We show that a Gradient Boosting Classifier achieves 94.8% accuracy and a Matthews correlation coefficient of 0.838, making ensemble models a suitable tool. We further demonstrate how a model trained on binary data can perform non-binary predictions, with categories ranging from “low likelihood” to “high likelihood”.
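Two ideas from this abstract lend themselves to a short sketch: the Matthews correlation coefficient, and turning binary votes from an ensemble into a graded likelihood. The code below uses the standard MCC formula. The vote-fraction approach, the thresholds, and the intermediate "medium likelihood" label are assumptions made for illustration, not the paper's method.

```python
import math

def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def occupancy_likelihood(votes):
    """Fraction of ensemble members voting 'occupied' (1) vs 'available' (0)."""
    return sum(votes) / len(votes)

def likelihood_category(p, low=1 / 3, high=2 / 3):
    """Map a likelihood to a label; thresholds are hypothetical."""
    if p < low:
        return "low likelihood"
    if p < high:
        return "medium likelihood"
    return "high likelihood"

# Hypothetical votes from four ensemble members for one station and hour.
votes = [1, 1, 1, 0]
p = occupancy_likelihood(votes)
print(p, likelihood_category(p))
print(matthews_corrcoef(tp=50, fp=0, tn=50, fn=0))
```

MCC ranges from -1 to 1 and stays informative when one class dominates, which plain accuracy does not.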


Estimation of Chlorophyll-a Concentrations in Small Water Bodies: Comparison of Fused Gaofen-6 and Sentinel-2 Sensors

Remote Sensing ◽

10.3390/rs14010229 ◽

2022 ◽

Vol 14 (1) ◽

pp. 229

Author(s):

Jiarui Shi ◽

Qian Shen ◽

Yue Yao ◽

Junsheng Li ◽

Fu Chen ◽

...

Keyword(s):

Machine Learning ◽

Chlorophyll A ◽

Water Bodies ◽

Gradient Boosting ◽

Learning Models ◽

Small Water ◽

Extreme Gradient Boosting ◽

Machine Learning Models ◽

Sentinel 2 ◽

Small Water Bodies

Chlorophyll-a concentration is one of the most important environmental evaluation indicators in monitoring the water environment. Small water bodies include headwater streams, springs, ditches, flushes, small lakes, and ponds, which represent important freshwater resources. However, the relatively narrow and fragmented nature of small water bodies makes it difficult to monitor chlorophyll-a via medium-resolution remote sensing. In the present study, we first fused Gaofen-6 (a new Chinese satellite) images to obtain 2 m resolution images with 8 bands, which proved to be as good a data source for chlorophyll-a monitoring in small water bodies as Sentinel-2. Further, we compared five semi-empirical and four machine learning models for estimating chlorophyll-a concentrations via simulated reflectance using the fused Gaofen-6 and Sentinel-2 spectral response functions. The results showed that the extreme gradient boosting tree model (one of the machine learning models) is the most accurate. The mean relative error (MRE) was 9.03% and the root-mean-square error (RMSE) was 4.5 mg/m³ for the Sentinel-2 sensor, while for the fused Gaofen-6 image, the MRE was 6.73% and the RMSE was 3.26 mg/m³. Thus, both fused Gaofen-6 and Sentinel-2 could estimate the chlorophyll-a concentrations in small water bodies. The fused Gaofen-6 exhibited a higher spatial resolution, whereas Sentinel-2 exhibited a higher temporal resolution.
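The two error measures quoted here can be computed as in the following sketch. It assumes the common definitions (MRE as the mean of absolute errors divided by the observed values, RMSE as the square root of the mean squared error), since the abstract does not spell them out. The concentration values are invented.

```python
import math

def mean_relative_error(observed, predicted):
    """Mean of |predicted - observed| / observed, as a percentage."""
    ratios = [abs(p - o) / o for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

def root_mean_square_error(observed, predicted):
    """Square root of the mean squared error, in the units of the data."""
    squares = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squares) / len(squares))

# Hypothetical chlorophyll-a concentrations in mg/m^3.
observed = [10.0, 20.0, 40.0]
predicted = [11.0, 18.0, 44.0]

print(f"MRE:  {mean_relative_error(observed, predicted):.2f} %")
print(f"RMSE: {root_mean_square_error(observed, predicted):.2f} mg/m^3")
```

MRE is unitless and weights every sample by its own magnitude, while RMSE keeps the physical units and is dominated by the largest absolute errors, so the two can rank sensors differently.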


Benchmarking of Machine Learning Models to Assist the Prognosis of Tuberculosis

10.20944/preprints202103.0284.v2 ◽

2021 ◽

Author(s):

Maicon Herverton Lino Ferreira da Silva Barros ◽

Geovanne Oliveira Alves ◽

Lubnnia Morais Florêncio Souza ◽

Élisson da Silva Rocha ◽

João Fausto Lorenzato de Oliveira ◽

...

Keyword(s):

Machine Learning ◽

Clinical Symptoms ◽

Treatment Decision ◽

Gradient Boosting ◽

Original Form ◽

Learning Models ◽

Data Set ◽

Risk Of Death ◽

Increased Risk ◽

Machine Learning Models

Tuberculosis (TB) is an airborne infectious disease caused by organisms in the Mycobacterium tuberculosis (Mtb) complex. In many low- and middle-income countries, TB remains a major cause of morbidity and mortality. Once a patient has been diagnosed with TB, it is critical that healthcare workers make the most appropriate treatment decision given the individual conditions of the patient and the likely course of the disease based on medical experience. Depending on the prognosis, delayed or inappropriate treatment can result in unsatisfactory results including the exacerbation of clinical symptoms, poor quality of life, and increased risk of death. This work benchmarks machine learning models to aid TB prognosis using a Brazilian health database of confirmed cases and deaths related to TB in the State of Amazonas. The goal is to predict the probability of death by TB, thus aiding the prognosis of TB and the associated treatment decision-making process. In its original form, the data set comprised 36,228 records and 130 fields but suffered from missing, incomplete, or incorrect data. Following data cleaning and preprocessing, a revised data set was generated comprising 24,015 records and 38 fields, including 22,876 reported cured TB patients and 1,139 deaths by TB. To explore how the data imbalance impacts model performance, two controlled experiments were designed using (1) imbalanced and (2) balanced data sets. The best result for predicting TB mortality is achieved by the Gradient Boosting (GB) model using the balanced data set, and the ensemble model composed of the Random Forest (RF), GB and Multi-layer Perceptron (MLP) models is the best model for predicting the cure class.


Extra Point Under Review: Machine Learning And The NFL Field Goal

Elements ◽

10.6017/eurj.v12i2.9448 ◽

2016 ◽

Vol 12 (2) ◽

Author(s):

James LeDoux

Keyword(s):

Machine Learning ◽

Predictive Model ◽

Learning Models ◽

The World ◽

Extra Point ◽

Machine Learning Models

The new NFL extra point rule first implemented in the 2015 season requires a kicker to attempt his extra point with the ball snapped from the 15-yard line. This attempt stretches an extra point to the equivalent of a 32-yard field goal attempt, 13 yards longer than under the previous rule. Though a 32-yard attempt is still a chip shot to any professional kicker, many NFL analysts were surprised to see the number of extra points that were missed. Should this really have been a surprise, though? Beginning with a replication of a study by Clark et al., this study aims to explore the world of NFL kicking from a statistical perspective, applying econometric and machine learning models to display a deeper perspective on what exactly makes some field goal attempts more difficult than others. Ultimately, the goal is to go beyond the previous research on this topic, providing an improved predictive model of field goal success and a better metric for evaluating placekicker ability.


Active learning for the power factor prediction in diamond-like thermoelectric materials

npj Computational Materials ◽

10.1038/s41524-020-00439-8 ◽

2020 ◽

Vol 6 (1) ◽

Author(s):

Ye Sheng ◽

Yasong Wu ◽

Jiong Yang ◽

Wencong Lu ◽

Pierre Villars ◽

...

Keyword(s):

Machine Learning ◽

Active Learning ◽

Transport Properties ◽

Gradient Boosting ◽

Learning Models ◽

Material Development ◽

Materials Genome ◽

P Type ◽

Type Power ◽

Machine Learning Models

The Materials Genome Initiative requires the crossing of material calculations, machine learning, and experiments to accelerate the material development process. In recent years, data-based methods have been applied to the thermoelectric field, mostly to the transport properties. In this work, we combined data-driven machine learning and first-principles automated calculations into an active learning loop, in order to predict the p-type power factors (PFs) of diamond-like pnictides and chalcogenides. Our active learning loop contains two procedures: (1) based on a high-throughput theoretical database, machine learning methods are employed to select potential candidates, and (2) computational verification of the transport properties is applied to these candidates. The verification data are then added to the database to improve the extrapolation abilities of the machine learning models. Different strategies for selecting candidates were tested. The Gradient Boosting Regression model with the Query by Committee strategy has the highest extrapolation accuracy (Pearson R = 0.95 on untrained systems). Based on the predictions from the machine learning models, binary pnictides, vacancy, and small atom-containing chalcogenides are predicted to have large PFs. The bonding analysis reveals that the alterations of anionic bonding networks due to small atoms are beneficial to the PFs in these compounds.
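The selection step of an active learning loop can be sketched as follows. This uses the textbook form of query by committee, in which the candidates on which committee members disagree most are sent for verification. The abstract does not give the paper's exact selection criterion, so the disagreement measure, the function names, and the toy committee below are all assumptions.

```python
from statistics import pstdev

def query_by_committee(candidates, committee, n_select):
    """Return the candidates with the largest spread in committee predictions."""
    scored = []
    for x in candidates:
        predictions = [model(x) for model in committee]
        scored.append((pstdev(predictions), x))  # disagreement score
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [x for _, x in scored[:n_select]]

# Toy committee: three hypothetical regressors mapping a descriptor to a
# predicted property value. Real members would be trained models.
committee = [
    lambda x: 2.0 * x,
    lambda x: x * x,
    lambda x: x + 1.0,
]
candidates = [0.0, 1.0, 2.0, 3.0]

selected = query_by_committee(candidates, committee, n_select=1)
print(selected)  # the candidate to verify next by first-principles calculation
```

In a full loop, the verified value for each selected candidate would be appended to the training data and the committee retrained before the next round of selection.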


Time-domain Feature and Ensemble Model based Classification of EMG Signals for Hand Gesture Recognition

10.21203/rs.3.rs-605286/v1 ◽

2021 ◽

Author(s):

Debarati Bhattacharjee ◽

Munesh Singh

Keyword(s):

Machine Learning ◽

Time Domain ◽

Electrical Current ◽

Clinical Diagnostics ◽

Gradient Boosting ◽

Learning Models ◽

Absolute Value ◽

Feature Pair ◽

Emg Signal ◽

Machine Learning Models

The electromyography (EMG) signal is the electrical current generated in muscles due to the interchange of ions during their contractions. It has many applications in clinical diagnostics and the biomedical field. This paper experiments with various ensemble algorithms and time-domain features to classify eight types of hand gestures. To train and test the machine learning models, we extracted eight types of time-domain features from the raw EMG signals: integrated EMG (IEMG), variance, mean absolute value (MAV), modified mean absolute value type 1, waveform length, root mean square, average amplitude change, and difference absolute standard deviation value. The ensemble machine learning models are based on stacking, bagging, and gradient boosting. We used four different-sized training sets to evaluate the performance of these classifiers. From the performance evaluation, we identified the XG-Boost (gblinear) classifier with the IEMG feature as the best classifier-feature pair. The proposed classifier-feature pair performed better than the existing extended associative memory-MAV classifier-feature pair, with a classification accuracy of 98.33% and a processing time of 5.67 μs for one vector.
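Several of the time-domain features listed above have simple closed forms. The sketch below implements six of them using definitions that are common in the EMG literature. The abstract does not state the exact formulas used in the paper, so these are assumptions, and the sample window is invented.

```python
import math

def emg_time_domain_features(window):
    """Compute common time-domain features for one window of EMG samples."""
    n = len(window)
    diffs = [b - a for a, b in zip(window, window[1:])]
    waveform_length = sum(abs(d) for d in diffs)
    return {
        "iemg": sum(abs(x) for x in window),             # integrated EMG
        "mav": sum(abs(x) for x in window) / n,          # mean absolute value
        "wl": waveform_length,                           # waveform length
        "rms": math.sqrt(sum(x * x for x in window) / n),
        "aac": waveform_length / n,                      # average amplitude change
        # difference absolute standard deviation value
        "dasdv": math.sqrt(sum(d * d for d in diffs) / (n - 1)),
    }

# Hypothetical four-sample window of raw EMG amplitudes.
features = emg_time_domain_features([1.0, -2.0, 3.0, -4.0])
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```

Each feature reduces a window of samples to a single number, so one window yields a short feature vector that a classifier can consume directly.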
