Forecasting System of Computational Time of DFT/TDDFT Calculations under the Multiverse Ansatz via Machine Learning and Cheminformatics

ACS Omega ◽  
2021 ◽  
Vol 6 (3) ◽  
pp. 2001-2024
Author(s):  
Shuo Ma ◽  
Yingjin Ma ◽  
Baohua Zhang ◽  
Yingqi Tian ◽  
Zhong Jin
Water ◽  
2021 ◽  
Vol 13 (12) ◽  
pp. 1612
Author(s):  
Susanna Dazzi ◽  
Renato Vacondio ◽  
Paolo Mignosa

Real-time river flood forecasting models can be useful for issuing flood alerts and reducing or preventing inundations. To this end, machine-learning (ML) methods are becoming increasingly popular thanks to their low computational requirements and to their reliance on observed data only. This work aimed to evaluate the ML models’ capability of predicting flood stages at a critical gauge station, using mainly upstream stage observations, though downstream levels should also be included to consider backwater, if present. The case study selected for this analysis was the lower stretch of the Parma River (Italy), and the forecast horizon was extended up to 9 h. The performances of three ML algorithms, namely Support Vector Regression (SVR), MultiLayer Perceptron (MLP), and Long Short-term Memory (LSTM), were compared herein in terms of accuracy and computational time. Up to 6 h ahead, all models provided sufficiently accurate predictions for practical purposes (e.g., Root Mean Square Error < 15 cm, and Nash-Sutcliffe Efficiency coefficient > 0.99), while peak levels were poorly predicted for longer lead times. Moreover, the results suggest that the LSTM model, despite requiring the longest training time, is the most robust and accurate in predicting peak values, and it should be preferred for setting up an operational forecasting system.
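As a rough illustration of the comparison described above (not the authors' code or data), the sketch below trains SVR and MLP regressors on lagged upstream stage values and scores them with RMSE and the Nash-Sutcliffe efficiency; the series, lag depth and lead time are synthetic placeholders, and the LSTM variant would be built analogously with a recurrent network library.

```python
# Minimal sketch: multi-hour-ahead stage forecasting from lagged upstream observations.
import numpy as np
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
stage_up = np.cumsum(rng.normal(0, 1, 2000))                  # synthetic upstream stage series
stage_dn = np.roll(stage_up, 6) + rng.normal(0, 0.1, 2000)    # target gauge, roughly 6 h behind

lags, lead = 12, 6                                            # 12 lagged inputs, 6 h lead time
X = np.column_stack([stage_up[i:len(stage_up) - lags - lead + i] for i in range(lags)])
y = stage_dn[lags + lead - 1:len(stage_dn) - 1]

split = int(0.8 * len(y))
Xtr, Xte, ytr, yte = X[:split], X[split:], y[:split], y[split:]

def nse(obs, sim):
    """Nash-Sutcliffe efficiency coefficient."""
    return 1 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

for name, model in [("SVR", SVR(C=10.0)),
                    ("MLP", MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0))]:
    pred = model.fit(Xtr, ytr).predict(Xte)
    rmse = mean_squared_error(yte, pred) ** 0.5
    print(f"{name}: RMSE={rmse:.3f}, NSE={nse(yte, pred):.3f}")
```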


2019 ◽  
Vol 9 (6) ◽  
pp. 1048 ◽  
Author(s):  
Huy Tran ◽  
Cheolkeun Ha

Recently, indoor positioning systems have attracted a great deal of research attention, as they have a variety of applications in the fields of science and industry. In this study, we propose an innovative and easily implemented solution for indoor positioning. The solution is based on an indoor visible light positioning system and dual-function machine learning (ML) algorithms. Our solution increases positioning accuracy under the negative effect of multipath reflections and decreases the computational time for ML algorithms. Initially, we perform a noise reduction process to eliminate low-intensity reflective signals and minimize noise. Then, we divide the floor of the room into two separate areas using the ML classification function. This significantly reduces the computational time and partially improves the positioning accuracy of our system. Finally, the regression function of those ML algorithms is applied to predict the location of the optical receiver. By using extensive computer simulations, we have demonstrated that the execution time required by certain dual-function algorithms to determine indoor positioning is decreased after area division and noise reduction have been applied. In the best case, the proposed solution took 78.26% less time and provided a 52.55% improvement in positioning accuracy.
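A minimal sketch of the dual-function idea (a classifier picks the sub-area, then a per-area regressor estimates the coordinates) is given below; the received-signal features, room geometry and Random Forest models are illustrative assumptions, not the paper's algorithms.

```python
# Minimal sketch: area classification followed by per-area position regression.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(1)
rss = rng.uniform(0, 1, (4000, 4))                          # received signal strengths from 4 LEDs
pos = rss[:, :2] * 5 + rng.normal(0, 0.05, (4000, 2))       # hypothetical x, y in a 5 m x 5 m room
area = (pos[:, 0] > 2.5).astype(int)                        # two sub-areas split along x

# Dual function 1: classify which area the receiver is in
clf = RandomForestClassifier(random_state=1).fit(rss[:3000], area[:3000])
# Dual function 2: one position regressor per area
regs = {a: RandomForestRegressor(random_state=1).fit(rss[:3000][area[:3000] == a],
                                                     pos[:3000][area[:3000] == a])
        for a in (0, 1)}

pred_area = clf.predict(rss[3000:])
pred_pos = np.zeros((1000, 2))
for a in (0, 1):
    mask = pred_area == a
    if mask.any():
        pred_pos[mask] = regs[a].predict(rss[3000:][mask])

err = np.linalg.norm(pred_pos - pos[3000:], axis=1)
print(f"mean positioning error: {err.mean():.3f} m")
```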


2021 ◽  
Author(s):  
Javad Iskandarov ◽  
George Fanourgakis ◽  
Waleed Alameri ◽  
George Froudakis ◽  
Georgios Karanikolos

Abstract Conventional foam modelling techniques require tuning of many parameters and long computational times in order to provide accurate predictions. There is therefore a need for alternative methodologies for efficient and reliable prediction of foam performance. Foam behaviour is sensitive to a range of operational conditions and reservoir parameters. This research aims to apply machine learning (ML) algorithms to experimental data in order to correlate the important affecting parameters with foam rheology. In this way, optimum operational conditions for CO2 foam enhanced oil recovery (EOR) can be determined. To achieve that, five different ML algorithms were applied to experimental rheology data from various experimental studies. It was concluded that the Gradient Boosting (GB) algorithm could successfully fit the training data and give the most accurate predictions for unknown cases.
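A minimal sketch of the Gradient Boosting workflow the abstract refers to is shown below; the feature names and rheology data are synthetic placeholders, not the experimental datasets used in the study.

```python
# Minimal sketch: Gradient Boosting regression on tabular foam-rheology-style data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(2)
# illustrative columns: foam quality, shear rate, pressure, temperature, salinity
X = rng.uniform(0, 1, (500, 5))
y = 100 * X[:, 0] / (1 + X[:, 1]) + 5 * X[:, 2] + rng.normal(0, 1, 500)   # toy apparent viscosity

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=2)
gb = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05, random_state=2)
print("R^2 on unseen cases:", round(r2_score(yte, gb.fit(Xtr, ytr).predict(Xte)), 3))
```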


2020 ◽  
Vol 12 (20) ◽  
pp. 3420 ◽  
Author(s):  
Alexandra I. Khalyasmaa ◽  
Stanislav A. Eroshenko ◽  
Valeriy A. Tashchilin ◽  
Hariprakash Ramachandran ◽  
Teja Piepur Chakravarthi ◽  
...  

This article highlights the industry experience of developing and practically implementing a short-term photovoltaic forecasting system based on machine learning methods for a real industry-scale photovoltaic power plant in a Russian power system using remote data acquisition. One of the goals of the study is to improve photovoltaic power plant generation forecasting accuracy based on open-source meteorological data, as provided in regular weather forecasts. In order to improve the robustness of the system in terms of forecasting accuracy, we introduce a newly derived feature, a factor obtained through a feature engineering procedure that characterizes the relationship between photovoltaic power plant energy production and solar irradiation on a horizontal surface, thus taking into account impacts of both atmospheric and electrical nature. The article scrutinizes the application of different machine learning algorithms, including Random Forest regression, Gradient Boosting regression, Linear Regression and Decision Tree regression, to the remotely obtained data. As a result of applying these approaches together with hyperparameter tuning and pipelining of the algorithms, the optimal structure, parameters and application sphere of the different regressors were identified for various testing samples. The mathematical model developed within the framework of the study provided robust photovoltaic energy forecasting results, with mean accuracy over 92% for mostly sunny sample days and over 83% for mostly cloudy days with different types of precipitation.
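The sketch below illustrates, under assumed weather features and a toy target, how such regressors can be pipelined and tuned with a grid search; it is not the plant's operational code.

```python
# Minimal sketch: pipelining and hyperparameter tuning of several regressors
# on placeholder weather-forecast features.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, (1000, 4))     # e.g. irradiance forecast, cloud cover, temperature, humidity
y = 5 * X[:, 0] * (1 - 0.5 * X[:, 1]) + rng.normal(0, 0.1, 1000)          # toy PV output

candidates = {
    "rf":  (RandomForestRegressor(random_state=3), {"model__n_estimators": [100, 300]}),
    "gbr": (GradientBoostingRegressor(random_state=3), {"model__learning_rate": [0.05, 0.1]}),
    "lin": (LinearRegression(), {}),
    "dt":  (DecisionTreeRegressor(random_state=3), {"model__max_depth": [3, 6, None]}),
}
for name, (model, grid) in candidates.items():
    pipe = Pipeline([("scale", StandardScaler()), ("model", model)])
    search = GridSearchCV(pipe, grid, cv=5, scoring="neg_mean_absolute_error").fit(X, y)
    print(name, round(-search.best_score_, 4), search.best_params_)
```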


2018 ◽  
Vol 7 (4.10) ◽  
pp. 190 ◽  
Author(s):  
A. Joshuva ◽  
V. Sugumaran

This study aims to identify whether wind turbine blades are in good or faulty condition and, if faulty, to determine which fault condition the blades are subjected to. The problem identification is carried out by a machine learning approach using statistical features of vibration signals. In this study, a three-bladed wind turbine was chosen, and faults such as blade cracks, hub-blade loose connection, blade bend, pitch angle twist and blade erosion were considered. The study is carried out in three phases, namely feature extraction, feature selection and feature classification. In phase 1, the required statistical features are extracted from the vibration signals, which were obtained from the wind turbine through an accelerometer. In phase 2, the most dominant and relevant features are selected from the extracted features using the J48 decision tree algorithm. In phase 3, the selected features are classified using machine learning classifiers, namely K-star (KS), locally weighted learning (LWL), nearest neighbour (NN), k-nearest neighbours (kNN), instance-based K-nearest using log and Gaussian weight kernels (IBKLG) and the lazy Bayesian rules classifier (LBRC). The results were compared with respect to the classification accuracy and the computational time of the classifiers.
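A minimal sketch of the three phases is given below, using sklearn stand-ins (a CART decision tree in place of WEKA's J48 for feature ranking, and k-NN as one of the lazy classifiers); the vibration windows and fault labels are synthetic.

```python
# Minimal sketch: statistical feature extraction, tree-based feature selection,
# and lazy classification of vibration windows.
import numpy as np
from scipy import stats
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
signals = rng.normal(0, 1, (600, 1024)) + rng.uniform(0, 3, (600, 1))  # 600 vibration windows
labels = rng.integers(0, 6, 600)          # good condition + 5 fault conditions (toy labels)

# Phase 1: statistical feature extraction per window
feats = np.column_stack([signals.mean(1), signals.std(1), stats.skew(signals, 1),
                         stats.kurtosis(signals, 1), signals.min(1), signals.max(1)])

# Phase 2: feature selection via decision-tree importance (J48 stand-in)
tree = DecisionTreeClassifier(random_state=4).fit(feats, labels)
selected = np.argsort(tree.feature_importances_)[::-1][:3]

# Phase 3: lazy classification on the selected features
knn = KNeighborsClassifier(n_neighbors=5)
print("CV accuracy:", cross_val_score(knn, feats[:, selected], labels, cv=5).mean())
```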


2020 ◽  
Vol 9 (10) ◽  
pp. 580 ◽  
Author(s):  
Maria Antonia Brovelli ◽  
Yaru Sun ◽  
Vasil Yordanov

Deforestation causes diverse and profound consequences for the environment and species. Its direct or indirect effects can be related to climate change, biodiversity loss, soil erosion, floods, landslides, etc. For such a significant process, timely and continuous monitoring of forest dynamics is important in order to keep up with existing policies and develop new mitigation measures. The present work aimed at mapping and monitoring forest change from 2000 to 2019 and at simulating the future forest development of a rainforest region located in the Pará state, Brazil. The land cover dynamics were mapped at five-year intervals based on a supervised classification model deployed on the cloud processing platform Google Earth Engine. Besides the benefit of reduced computational time, the service is coupled with a vast data catalogue providing access to global products such as multispectral images from the Landsat 5, 7 and 8 and Sentinel-2 missions. The validation procedures were done through photointerpretation of high-resolution panchromatic images obtained from CBERS (China–Brazil Earth Resources Satellite). The more than satisfactory results showed that deforestation rates peaked in the period 2000–2006, decreased significantly and stabilized in 2006–2015, and then increased slightly until 2019. Based on the derived trends, the forest dynamics were simulated for the period 2019–2028, estimating a decrease in the deforestation rate. These results demonstrate that such a fusion of satellite observations, machine learning and cloud processing benefits the analysis of forest dynamics and can provide useful information for the development of forest policies.
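A minimal sketch of a supervised land-cover classification in the Google Earth Engine Python API is shown below; the collection ID, band names, study rectangle and training asset are assumptions for illustration, not the authors' script.

```python
# Minimal sketch: classifying a Landsat 8 annual composite with a Random Forest in
# Google Earth Engine. `training_points` is a hypothetical labelled FeatureCollection.
import ee
ee.Initialize()

roi = ee.Geometry.Rectangle([-55.0, -7.0, -54.0, -6.0])       # placeholder area in Pará
composite = (ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")
             .filterBounds(roi)
             .filterDate("2019-01-01", "2019-12-31")
             .median()
             .select(["SR_B2", "SR_B3", "SR_B4", "SR_B5", "SR_B6", "SR_B7"]))

training_points = ee.FeatureCollection("users/example/forest_training")  # assumed asset
samples = composite.sampleRegions(collection=training_points, properties=["class"], scale=30)
classifier = ee.Classifier.smileRandomForest(100).train(samples, "class",
                                                        composite.bandNames())
classified = composite.classify(classifier)
```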


Plant Methods ◽  
2019 ◽  
Vol 15 (1) ◽  
Author(s):  
Adnan Zahid ◽  
Hasan T. Abbas ◽  
Aifeng Ren ◽  
Ahmed Zoha ◽  
Hadi Heidari ◽  
...  

Abstract Background The demand for effective use of water resources has increased because of ongoing global climate transformations in the agricultural science sector. Cost-effective and timely distribution of the appropriate amount of water is vital not only to maintain the healthy status of plant leaves but also to drive crop productivity and achieve economic benefits. In this regard, terahertz (THz) technology can be a more reliable and progressive technique due to its distinctive features. This paper presents a novel, non-invasive machine learning (ML) driven approach using terahertz waves with a swissto12 material characterization kit (MCK) in the frequency range of 0.75 to 1.1 THz in real-life digital agriculture interventions, aiming to develop a feasible and viable technique for the precise estimation of water content (WC) in plant leaves over 4 days. For this purpose, multi-domain features were extracted from the measurement observations in the frequency, time and time–frequency domains and fed to three different machine learning algorithms, namely support vector machine (SVM), K-nearest neighbour (KNN) and decision tree (D-Tree). Results The results demonstrated that SVM outperformed the other classifiers under tenfold and leave-one-observation-out cross-validation for the classification of different days, with overall accuracies of 98.8%, 97.15% and 96.82% for coffee, pea shoot and baby spinach leaves, respectively. In addition, using the SFS technique, the coffee leaf showed significant improvements of 15%, 11.9% and 6.5% in computational time for SVM, KNN and D-Tree. For pea shoot, improvements of 21.28%, 10.01% and 8.53% in operating time were observed for the SVM, KNN and D-Tree classifiers, respectively. Lastly, the baby spinach leaf exhibited a further improvement of 21.28% in SVM, 10.01% in KNN and 8.53% in D-Tree in overall classifier operating time. These classifier improvements came with significant gains in classification accuracy, indicating a more precise quantification of WC in leaves. Conclusion Thus, the proposed method incorporating ML with terahertz waves can be beneficial for the precise estimation of WC in leaves and can provide valuable recommendations and insights for growers to take proactive actions in relation to plant health monitoring.
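The sketch below illustrates the classification-plus-feature-selection step on synthetic features, interpreting SFS as sequential forward selection and using sklearn implementations of SVM, KNN and D-Tree; it is not the measured THz data or the paper's code.

```python
# Minimal sketch: classifying measurement days from multi-domain features,
# before and after sequential forward feature selection.
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
X = rng.normal(0, 1, (200, 40))        # 40 frequency/time/time-frequency features (toy)
y = rng.integers(0, 4, 200)            # day 1..4 labels (toy)

for name, clf in [("SVM", SVC()), ("KNN", KNeighborsClassifier()),
                  ("D-Tree", DecisionTreeClassifier(random_state=5))]:
    base = cross_val_score(clf, X, y, cv=10).mean()
    sfs = SequentialFeatureSelector(clf, n_features_to_select=10, direction="forward").fit(X, y)
    reduced = cross_val_score(clf, sfs.transform(X), y, cv=10).mean()
    print(f"{name}: 10-fold accuracy {base:.2f} -> {reduced:.2f} with 10 selected features")
```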


2021 ◽  
Vol 1 (1) ◽  
pp. 199-218
Author(s):  
Mostofa Ahsan ◽  
Rahul Gomes ◽  
Md. Minhaz Chowdhury ◽  
Kendall E. Nygard

Machine learning algorithms are becoming very efficient in intrusion detection systems thanks to their real-time response and adaptive learning process. A robust machine learning model can be deployed for anomaly detection by using a comprehensive dataset with multiple attack types. Nowadays, datasets contain many attributes, and such high dimensionality poses a significant challenge to information extraction in terms of time and space complexity. Moreover, having so many attributes may hinder the creation of a decision boundary due to noise in the dataset. Large-scale data with redundant or insignificant features increase the computational time and often decrease the goodness of fit, which is a critical issue in cybersecurity. In this research, we propose and implement an efficient feature selection algorithm to filter out insignificant variables. Our proposed Dynamic Feature Selector (DFS) uses statistical analysis and feature importance tests to reduce model complexity and improve prediction accuracy. To evaluate DFS, we conducted experiments on two datasets used for cybersecurity research, namely Network Security Laboratory (NSL-KDD) and University of New South Wales (UNSW-NB15). In the meta-learning stage, four algorithms were compared for accuracy estimation, namely Bidirectional Long Short-Term Memory (Bi-LSTM), Gated Recurrent Units, Random Forest and a proposed Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM) hybrid. For NSL-KDD, the experiments revealed an increase in accuracy from 99.54% to 99.64% while reducing the number of one-hot encoded features from 123 to 50. For UNSW-NB15, we observed an increase in accuracy from 90.98% to 92.46% while reducing the feature size from 196 to 47. The proposed approach is thus able to achieve higher accuracy while significantly lowering the number of features required for processing.
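A minimal sketch of the general idea behind such a selector (a statistical filter followed by feature-importance ranking) is shown below; the actual Dynamic Feature Selector is more elaborate, and the one-hot features here are simulated.

```python
# Minimal sketch: statistical filtering plus importance-based feature reduction
# before feeding a downstream intrusion-detection model.
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(6)
X = rng.integers(0, 2, (5000, 123)).astype(float)   # stand-in for 123 one-hot encoded features
y = rng.integers(0, 2, 5000)                        # normal vs. attack (toy labels)

# Step 1: statistical analysis - drop near-constant columns
X_var = VarianceThreshold(threshold=0.01).fit_transform(X)

# Step 2: keep the 50 most important features according to a Random Forest
rf = RandomForestClassifier(n_estimators=100, random_state=6).fit(X_var, y)
top50 = np.argsort(rf.feature_importances_)[::-1][:50]
X_reduced = X_var[:, top50]                         # this reduced set would feed the deep models
print(X.shape, "->", X_reduced.shape)
```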


2020 ◽  
Author(s):  
Hossein Foroozand ◽  
Steven V. Weijs

Machine learning is a fast-growing branch of data-driven modelling, and its main objective is to use computational methods to become more accurate in predicting outcomes without being explicitly programmed. In this field, a way to improve model predictions is to use a large collection of models (called an ensemble) instead of a single one. Each model is then trained on slightly different samples of the original data, and their predictions are averaged. This is called bootstrap aggregating, or bagging, and is widely applied. A recurring question in previous works is how to choose the ensemble size of training data sets for tuning the weights in machine learning. The computational cost of ensemble-based methods scales with the size of the ensemble, but excessively reducing the ensemble size comes at the cost of reduced predictive performance. The choice of ensemble size has often been determined by the size of the input data and the available computational power, which can become a limiting factor for larger datasets and the training of complex models. In this research, our hypothesis is that if an ensemble of artificial neural network (ANN) models, or any other machine learning technique, uses only the most informative ensemble members for training rather than all bootstrapped ensemble members, it can reduce the computational time substantially without negatively affecting the simulation performance.
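A minimal sketch of the bagging baseline discussed above (training ANN members on bootstrap resamples and averaging their predictions) follows; the data and ensemble size are placeholders, and the proposed refinement would retain only the most informative members instead of all of them.

```python
# Minimal sketch: bootstrap aggregating (bagging) with ANN ensemble members.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(7)
X = rng.uniform(-1, 1, (500, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.05, 500)

def bagged_prediction(X, y, X_new, n_members=20):
    preds = []
    for k in range(n_members):
        idx = rng.integers(0, len(y), len(y))            # bootstrap resample of the training data
        member = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                              random_state=k).fit(X[idx], y[idx])
        preds.append(member.predict(X_new))
    return np.mean(preds, axis=0)                        # average the member predictions

print(bagged_prediction(X, y, X[:5]).round(3))
```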


2019 ◽  
Vol 111 ◽  
pp. 05019
Author(s):  
Brian de Keijzer ◽  
Pol de Visser ◽  
Víctor García Romillo ◽  
Víctor Gómez Muñoz ◽  
Daan Boesten ◽  
...  

Machine learning models have proven to be reliable methods for forecasting energy use in commercial and office buildings. However, little research has been done on energy forecasting in dwellings, mainly due to the difficulty of obtaining household-level data while keeping the privacy of inhabitants in mind. Insight into energy consumption in the near future can help balance the grid and reveal how consumption can be reduced. In collaboration with OPSCHALER, a measurement campaign on the influence of housing characteristics on energy costs and comfort, several machine learning models were compared on forecasting performance and the computational time needed. Nine months of data containing the mean gas consumption of 52 dwellings at a one-hour resolution were used for this research. The first 6 months were used for training, whereas the last 3 months were used to evaluate the models. The results showed that the Deep Neural Network (DNN) performed best, with a 50.1% Mean Absolute Percentage Error (MAPE) at a one-hour resolution. At daily and weekly resolutions, the Multivariate Linear Regression (MVLR) outperformed the other models, with 20.1% and 17.0% MAPE, respectively. The models were programmed in Python.
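As a rough illustration (synthetic consumption series, not the OPSCHALER data), the sketch below shows the multivariate linear regression baseline and the MAPE metric with a 6-month train / 3-month test split.

```python
# Minimal sketch: MVLR baseline for hourly gas-consumption forecasting, scored with MAPE.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(8)
hours = 9 * 30 * 24                                      # roughly 9 months of hourly data
t = np.arange(hours)
gas = 1 + 0.5 * np.cos(2 * np.pi * t / 24) + rng.normal(0, 0.1, hours)   # toy gas use

# lagged consumption and a daily-cycle term as predictors
X = np.column_stack([np.roll(gas, 1), np.roll(gas, 24), np.sin(2 * np.pi * t / 24)])[24:]
y = gas[24:]
split = 6 * 30 * 24 - 24                                 # first ~6 months for training

mvlr = LinearRegression().fit(X[:split], y[:split])
pred = mvlr.predict(X[split:])
mape = np.mean(np.abs((y[split:] - pred) / y[split:])) * 100
print(f"MAPE: {mape:.1f} %")
```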

