Development and Validation of Machine-Learning Clear-Sky Detection Method Using 1-Min Irradiance Data and Sky Imagers at a Polluted Suburban Site, Xianghe

Forecasting Of Covid-19 Cases Using Machine Learning Approach

Current Respiratory Medicine Reviews ◽

10.2174/1573398x17666210129131009 ◽

2021 ◽

Vol 17 ◽

Author(s):

Sachin Kumar ◽

Karan Veer

Keyword(s):

Machine Learning ◽

Regression Model ◽

Model Performance ◽

Real Data ◽

Absolute Error ◽

Viral Disease ◽

Support Vector ◽

Family Welfare ◽

Accuracy Score ◽

Learning Approaches

Aims: The objective of this research is to predict COVID-19 cases in India using machine learning approaches. Background: COVID-19, a respiratory disease caused by a member of the coronavirus family, led to a worldwide pandemic in 2020. The virus was first detected in Wuhan, China, in December 2019 and took less than three months to spread across the globe. Objective: In this paper, we propose a regression model based on the support vector machine (SVM) to forecast the number of deaths, the number of recovered cases, and the total confirmed cases for the next 30 days. Method: For prediction, the data were collected from GitHub and India's Ministry of Health and Family Welfare for the period from March 14, 2020, to December 3, 2020. The model was implemented in Python 3.6 in Anaconda to forecast corona trends until September 21, 2020. The proposed methodology predicts values using an SVM-based regression model with polynomial, linear, and rbf kernels. The dataset was divided into train and test datasets with 40% and 60% test sizes and verified against real data. Model performance is evaluated using mean square error, mean absolute error, and percentage accuracy. Results and Conclusion: The results show that the polynomial model obtained an accuracy score above 95%, the linear model scored above 90%, and the rbf model scored above 85% in predicting cumulative deaths, confirmed cases, and recovered cases.
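As a rough illustration of the evaluation described in this abstract, the sketch below holds out the tail of a series as a test set and computes mean square error and mean absolute error. The function names, the toy numbers, the chronological split, and the naive baseline forecast are all assumptions made for illustration; none of them come from the paper, and no SVM is fitted here.

```python
def train_test_split_tail(values, test_size):
    """Keep the first part for training and the last `test_size` fraction for testing."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0 and 1")
    n_test = max(1, round(len(values) * test_size))
    return values[:-n_test], values[-n_test:]


def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    """Average of absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy cumulative counts (made up for illustration, not real case data).
series = [10, 12, 15, 19, 24, 30, 37, 45, 54, 64]
train, test = train_test_split_tail(series, test_size=0.4)

# Hypothetical naive baseline: repeat the last training value for every test step.
baseline = [train[-1]] * len(test)
mse = mean_squared_error(test, baseline)
mae = mean_absolute_error(test, baseline)
```

Any fitted regressor could replace the naive baseline; the two metric functions only need equal-length sequences of actual and predicted values.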

Investigation of Influential Factors of Predicting Individuals' Use and Non-use of Fitness and Diet Apps on Smartphones: Application of the Machine Learning Algorithm (XGBoost)

American Journal of Health Behavior ◽

10.5993/ajhb.45.1.9 ◽

2021 ◽

Vol 45 (1) ◽

pp. 111-124

Author(s):

Jaehee Cho ◽

Sehwan Kim ◽

Gwangjin Jeong ◽

Chonghye Kim ◽

Ja-Kyoung Seo

Keyword(s):

Machine Learning ◽

Social Support ◽

Social Influence ◽

Health Management ◽

Learning Algorithm ◽

Media Use ◽

Influential Factors ◽

Machine Learning Algorithm ◽

Accuracy Score ◽

Diverse Groups

Objectives: In this study, we aimed to identify the influential factors that determine individuals' use and non-use of fitness and diet apps on smartphones. To this end, we focused on diverse groups of predictors that would significantly affect people's use and non-use of these apps. Methods: Overall, we considered 105 factors as potential predictors and included them in further analyses using a machine learning algorithm, XGBoost. The main reason for selecting this particular algorithm was that it is known as one of the most accurate and popular algorithms for predicting consumer behaviors. Results: We found that the accuracy score of those factors for predicting people's use and non-use of fitness and diet apps was approximately 71.3%. In particular, the most influential predictors were mainly related to social influence, media use, overeating, social support, health management, and attitudes toward exercise. Conclusion: These findings help scholars and practitioners develop more practical strategies for implementing fitness and diet apps.

Scalable Approach to High Coverages on Oxides via Iterative Training of a Machine-Learning Algorithm

10.26434/chemrxiv.10288514 ◽

2019 ◽

Author(s):

Andrew Medford ◽

Shengchun Yang ◽

Fuzhu Liu

Keyword(s):

Machine Learning ◽

Chemical Potential ◽

Learning Algorithm ◽

Absolute Error ◽

Low Energy ◽

Training Data ◽

High Coverage ◽

Metal Compounds ◽

Adsorption Energies ◽

The Stability

Understanding the interaction of multiple types of adsorbate molecules on solid surfaces is crucial to establishing the stability of catalysts under various chemical environments. Computational studies on the high coverage and mixed coverages of reaction intermediates are still challenging, especially for transition-metal compounds. In this work, we present a framework to predict differential adsorption energies and identify low-energy structures under high- and mixed-adsorbate coverages on oxide materials. The approach uses Gaussian process machine-learning models with quantified uncertainty in conjunction with an iterative training algorithm to actively identify the training set. The framework is demonstrated for the mixed adsorption of CHx, NHx and OHx species on the oxygen vacancy and pristine rutile TiO2(110) surface sites. The results indicate that the proposed algorithm is highly efficient at identifying the most valuable training data, and is able to predict differential adsorption energies with a mean absolute error of ~0.3 eV based on <25% of the total DFT data. The algorithm is also used to identify 76% of the low-energy structures based on <30% of the total DFT data, enabling construction of surface phase diagrams that account for high and mixed coverage as a function of the chemical potential of C, H, O, and N. Furthermore, the computational scaling indicates the algorithm scales nearly linearly (N^1.12) as the number of adsorbates increases. This framework can be directly extended to metals, metal oxides, and other materials, providing a practical route toward the investigation of the behavior of catalysts under high-coverage conditions.
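A scaling exponent such as the one quoted in this abstract is commonly estimated as the slope of a straight-line fit in log-log space. The sketch below shows that calculation under stated assumptions: the function name is hypothetical, and the cost values are synthetic, generated from an exact power law rather than taken from the paper's timings.

```python
import math


def fit_power_law_exponent(sizes, costs):
    """Least-squares slope of log(cost) against log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(c) for c in costs]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    variance = sum((x - x_mean) ** 2 for x in xs)
    return covariance / variance


# Synthetic costs generated from cost = 2.0 * N**1.12 (assumed, not measured data).
adsorbate_counts = [2, 4, 8, 16, 32]
synthetic_costs = [2.0 * n ** 1.12 for n in adsorbate_counts]
exponent = fit_power_law_exponent(adsorbate_counts, synthetic_costs)
```

Because the synthetic data follow the power law exactly, the fitted slope recovers the generating exponent up to floating-point error; real timings would scatter around the fitted line.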

Power Prediction of Combined Cycle Power Plant (CCPP) Using Machine Learning Algorithm-Based Paradigm

Wireless Communications and Mobile Computing ◽

10.1155/2021/9966395 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Raheel Siddiqui ◽

Hafeez Anwar ◽

Farman Ullah ◽

Rehmat Ullah ◽

Muhammad Abdul Rehman ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Power Plant ◽

Learning Algorithm ◽

Absolute Error ◽

Combined Cycle ◽

Machine Learning Algorithms ◽

Power Prediction ◽

Boosted Regression Tree ◽

Combined Cycle Power Plant

Power prediction is important not only for the smooth and economic operation of a combined cycle power plant (CCPP) but also to avoid technical issues such as power outages. In this work, we propose to use machine learning algorithms to predict the hourly electrical power generated by a CCPP. For this, the generated power is considered a function of four fundamental parameters: relative humidity, atmospheric pressure, ambient temperature, and exhaust vacuum. The measurements of these parameters and the corresponding output power are used to train and test the machine learning models. The dataset for the proposed research was gathered over a period of six years and taken from a standard, publicly available machine learning repository. The machine learning algorithms used are K-nearest neighbors (KNN), gradient-boosted regression tree (GBRT), linear regression (LR), artificial neural network (ANN), and deep neural network (DNN). We report state-of-the-art performance, where GBRT outperforms not only the other algorithms used but also all previous methods on the given CCPP dataset. It achieves the minimum values of root mean square error (RMSE) of 2.58 and absolute error (AE) of 1.85.
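To make the setup in this abstract concrete, here is a minimal sketch of K-nearest-neighbors regression, one of the algorithms it lists, applied to rows of four input parameters. The function name, the feature order (relative humidity, atmospheric pressure, ambient temperature, exhaust vacuum), and every number are assumptions for illustration; the features are left unscaled for brevity, which a real pipeline would likely change.

```python
import math


def knn_predict(train_features, train_targets, query, k=3):
    """Average the targets of the k training rows closest to `query`."""
    distances = sorted(
        (math.dist(row, query), target)
        for row, target in zip(train_features, train_targets)
    )
    nearest = distances[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Made-up rows: (relative humidity, pressure, ambient temperature, exhaust vacuum).
train_features = [
    (80.0, 1010.0, 10.0, 40.0),
    (75.0, 1012.0, 12.0, 42.0),
    (60.0, 1008.0, 25.0, 60.0),
    (50.0, 1005.0, 30.0, 70.0),
]
# Made-up output power values paired with the rows above.
train_targets = [480.0, 476.0, 445.0, 432.0]

query = (78.0, 1011.0, 11.0, 41.0)
predicted_power = knn_predict(train_features, train_targets, query, k=2)
```

With k=2 the prediction is the mean of the two closest rows' targets, so the choice of k and of distance scaling directly shapes the result.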

Determination of Water Depth in Ports Using Satellite Data Based on Machine Learning Algorithms

Energies ◽

10.3390/en14092486 ◽

2021 ◽

Vol 14 (9) ◽

pp. 2486

Author(s):

Vanesa Mateo-Pérez ◽

Marina Corral-Bobadilla ◽

Francisco Ortega-Fernández ◽

Vicente Rodríguez-Montequín

Keyword(s):

Machine Learning ◽

Water Depth ◽

Satellite Data ◽

Learning Algorithm ◽

Mean Absolute Error ◽

Absolute Error ◽

Machine Learning Algorithms ◽

Support Vector ◽

Single Beam

One of the fundamental maintenance tasks of ports is periodic dredging. This is necessary to guarantee a minimum draft that enables ships to access ports safely. Bathymetric surveys determine the need for dredging and permit an analysis of the behavior of the port bottom over time, in order to achieve adequate water depth. Satellite data processing is increasingly used to predict environmental parameters. Based on satellite data and using different machine learning techniques, this study sought to estimate the seabed depth in ports, taking into account the fact that port areas are strongly anthropized. The algorithms used were Support Vector Machine (SVM), Random Forest (RF), and Multivariate Adaptive Regression Splines (MARS). The study was carried out in the ports of Candás and Luarca in the Principality of Asturias. To validate the results, data were acquired in situ using single-beam sonar. The results show that this type of methodology can be used to estimate coastal bathymetry. However, when deciding which system was best, priority was given to simplicity and robustness. The results of the SVM and RF algorithms outperform those of MARS. RF performs better in Candás with a mean absolute error (MAE) of 0.27 cm, whereas SVM performs better in Luarca with an MAE of 0.37 cm. This approach is suggested as a simpler, more cost-effective, coarse-resolution alternative to single-beam sonar, which is labor-intensive and polluting, for estimating the depth of turbid water in ports.

The Photoswitch Dataset: A Molecular Machine Learning Benchmark for the Advancement of Synthetic Chemistry

10.26434/chemrxiv.12609899 ◽

2020 ◽

Author(s):

Aditya Thawani ◽

Ryan-Rhys Griffiths ◽

Arian Jamasb ◽

Anthony Bourached ◽

Penelope Jones ◽

...

Keyword(s):

Machine Learning ◽

Density Functional ◽

Learning Algorithm ◽

Model Performance ◽

Molecular Machine ◽

Superior Performance ◽

Synthetic Chemistry ◽

Energy Applications ◽

Synthetic Chemist ◽

Quantum Mechanical Approach

The space of synthesizable molecules is greater than $10^{60}$, meaning only a vanishingly small fraction of these molecules have ever been realized in the lab. In order to prioritize which regions of this space to explore next, synthetic chemists need access to accurate molecular property predictions. While great advances in molecular machine learning have been made, there is a dearth of benchmarks featuring properties that are useful for the synthetic chemist. Focussing directly on the needs of the synthetic chemist, we introduce the Photoswitch Dataset, a new benchmark for molecular machine learning where improvements in model performance can be immediately observed in the throughput of promising molecules synthesized in the lab. Photoswitches are a versatile class of molecule for medical and renewable energy applications where a molecule's efficacy is governed by its electronic transition wavelengths. We demonstrate superior performance in predicting these wavelengths compared to both time-dependent density functional theory (TD-DFT), the incumbent first principles quantum mechanical approach, as well as a panel of human experts. Our baseline models are currently being deployed in the lab as part of the decision process for candidate synthesis. It is our hope that this benchmark can drive real discoveries in photoswitch chemistry and that future benchmarks can be introduced to pivot learning algorithm development to benefit more expansive areas of synthetic chemistry.

Prediction of Heart Stroke using A Novel Framework – PySpark

International Journal of Preventive Medicine and Health ◽

10.35940/ijpmh.b1002.051221 ◽

2021 ◽

Vol 1 (2) ◽

pp. 1-4

Author(s):

Chitluri Sai Harish B ◽

G gnana krishna vamsi ◽

G jaya phani akhil ◽

J n v hari sravan ◽

V mounika chowdary

Keyword(s):

Machine Learning ◽

Heart Disease ◽

Random Forest ◽

Learning Algorithm ◽

Heart Diseases ◽

Classification Algorithms ◽

Machine Learning Algorithm ◽

Accuracy Score ◽

Random Forest Algorithm ◽

The World

Heart diseases are among the most challenging problems faced by health care sectors all over the world. These diseases are very common nowadays. With the growing number of deaths caused by heart disease, there is a need to build a system that predicts heart ailments accurately. The work in this paper focuses on finding the best machine learning algorithm for the identification of heart diseases. Our study compares the precision of three well-known classification algorithms, Decision Tree, Naïve Bayes, and Random Forest, for the prediction of heart disease using a dataset provided by Kaggle. We used various characteristics that relate closely to heart disease to find the better algorithm for prediction. The results of this study indicate that the Random Forest algorithm is the most efficient algorithm for the prediction of heart disease, with an accuracy score of 97.17%.

A Novel Approach to Enhance the Generalization Capability of the Hourly Solar Diffuse Horizontal Irradiance Models on Diverse Climates

Energies ◽

10.3390/en13184868 ◽

2020 ◽

Vol 13 (18) ◽

pp. 4868

Author(s):

Raghuram Kalyanam ◽

Sabine Hoffmann

Keyword(s):

Machine Learning ◽

Diffuse Radiation ◽

Absolute Error ◽

Training Data ◽

Learning Approaches ◽

Simulation Tools ◽

Energy Applications ◽

Novel Approach ◽

Machine Learning Model ◽

Different Climates

Solar radiation data are essential for the development of many solar energy applications, ranging from thermal collectors to building simulation tools, but their availability is limited, especially for the diffuse radiation component. Several studies aim to predict this value, but very few cover the generalizability of such models across varying climates. Our study investigates how well these models generalize and also shows how to enhance their generalizability on different climates. Since machine learning approaches are known to generalize well, we apply them to understand how well they perform on climates different from those they were originally trained on. Therefore, we trained them on datasets from the U.S. and tested them on several European climates. The machine learning model developed for U.S. climates not only showed a low mean absolute error (MAE) of 23 W/m², but also generalized very well on European climates, with MAE in the range of 20 to 27 W/m². Further investigation into the factors influencing generalizability revealed that careful selection of the training data can improve the results significantly.

The Photoswitch Dataset: A Molecular Machine Learning Benchmark for the Advancement of Synthetic Chemistry

10.26434/chemrxiv.12609899.v1 ◽

2020 ◽

Cited By ~ 1

Author(s):

Aditya Thawani ◽

Ryan-Rhys Griffiths ◽

Arian Jamasb ◽

Anthony Bourached ◽

Penelope Jones ◽

...

Keyword(s):

Machine Learning ◽

Density Functional ◽

Learning Algorithm ◽

Model Performance ◽

Molecular Machine ◽

Superior Performance ◽

Synthetic Chemistry ◽

Energy Applications ◽

Synthetic Chemist ◽

Quantum Mechanical Approach

The space of synthesizable molecules is greater than $10^{60}$, meaning only a vanishingly small fraction of these molecules have ever been realized in the lab. In order to prioritize which regions of this space to explore next, synthetic chemists need access to accurate molecular property predictions. While great advances in molecular machine learning have been made, there is a dearth of benchmarks featuring properties that are useful for the synthetic chemist. Focussing directly on the needs of the synthetic chemist, we introduce the Photoswitch Dataset, a new benchmark for molecular machine learning where improvements in model performance can be immediately observed in the throughput of promising molecules synthesized in the lab. Photoswitches are a versatile class of molecule for medical and renewable energy applications where a molecule's efficacy is governed by its electronic transition wavelengths. We demonstrate superior performance in predicting these wavelengths compared to both time-dependent density functional theory (TD-DFT), the incumbent first principles quantum mechanical approach, as well as a panel of human experts. Our baseline models are currently being deployed in the lab as part of the decision process for candidate synthesis. It is our hope that this benchmark can drive real discoveries in photoswitch chemistry and that future benchmarks can be introduced to pivot learning algorithm development to benefit more expansive areas of synthetic chemistry.
