Short-Term Power Prediction of Building Integrated Photovoltaic (BIPV) System Based on Machine Learning Algorithms

One of the biggest challenges is towards ensuring large-scale integration of photovoltaic systems into buildings. This work is aimed at presenting a building integrated photovoltaic system power prediction concerning the building’s various orientations based on the machine learning data science tools. The proposed prediction methodology comprises a data quality stage, machine learning algorithm, weather clustering assessment, and an accuracy assessment. The results showed that the application of linear regression coefficients to the forecast outputs of the developed photovoltaic power generation neural network improved the PV power generation’s forecast output. The final model resulted from accurate forecasts, exhibiting a root mean square error of 4.42% in NN, 16.86% in QSVM, and 8.76% in TREE. The results are presented with the building facade and roof application such as flat roof, south façade, east façade, and west façade.

Download Full-text

Large-Scale Machine Learning Algorithms for Biomedical Data Science

Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics - BCB '19 ◽

10.1145/3307339.3342130 ◽

2019 ◽

Author(s):

Heng Huang

Keyword(s):

Machine Learning ◽

Large Scale ◽

Data Science ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Biomedical Data

Download Full-text

Power Prediction of Combined Cycle Power Plant (CCPP) Using Machine Learning Algorithm-Based Paradigm

Wireless Communications and Mobile Computing ◽

10.1155/2021/9966395 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Raheel Siddiqui ◽

Hafeez Anwar ◽

Farman Ullah ◽

Rehmat Ullah ◽

Muhammad Abdul Rehman ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Power Plant ◽

Learning Algorithm ◽

Absolute Error ◽

Combined Cycle ◽

Machine Learning Algorithms ◽

Power Prediction ◽

Boosted Regression Tree ◽

Combined Cycle Power Plant

Power prediction is important not only for the smooth and economic operation of a combined cycle power plant (CCPP) but also to avoid technical issues such as power outages. In this work, we propose to utilize machine learning algorithms to predict the hourly-based electrical power generated by a CCPP. For this, the generated power is considered a function of four fundamental parameters which are relative humidity, atmospheric pressure, ambient temperature, and exhaust vacuum. The measurements of these parameters and their yielded output power are used to train and test the machine learning models. The dataset for the proposed research is gathered over a period of six years and taken from a standard and publicly available machine learning repository. The utilized machine algorithms are K -nearest neighbors (KNN), gradient-boosted regression tree (GBRT), linear regression (LR), artificial neural network (ANN), and deep neural network (DNN). We report state-of-the-art performance where GBRT outperforms not only the utilized algorithms but also all the previous methods on the given CCPP dataset. It achieves the minimum values of root mean square error (RMSE) of 2.58 and absolute error (AE) of 1.85.

Download Full-text

Exploring the Efficiency of Various Supervised Machine Learning Techniques to Predict the Heart Disease using Risk Factors

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a1063.1191s19 ◽

2019 ◽

Vol 9 (1S) ◽

pp. 309-312

Keyword(s):

Machine Learning ◽

Health Care ◽

Heart Disease ◽

Major Part ◽

Data Science ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Data Set

Data Science in healthcare is a innovative and capable for industry implementing the data science applications. Data analytics is recent science in to discover the medical data set to explore and discover the disease. It’s a beginning attempt to identify the disease with the help of large amount of medical dataset. Using this data science methodology, it makes the user to find their disease without the help of health care centres. Healthcare and data science are often linked through finances as the industry attempts to reduce its expenses with the help of large amounts of data. Data science and medicine are rapidly developing, and it is important that they advance together. Health care information is very effective in the society. In a human life day to day heart disease had increased. Based on the heart disease to monitor different factors in human body to analyse and prevent the heart disease. To classify the factors using the machine learning algorithms and to predict the disease is major part. Major part of involves machine level based supervised learning algorithm such as SVM, Naviebayes, Decision Trees and Random forest.

Download Full-text

Earthquake Prediction using Machine Learning Algorithm

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.e9110.018620 ◽

2020 ◽

Vol 8 (6) ◽

pp. 4684-4688

Keyword(s):

Machine Learning ◽

Structural Damage ◽

Data Science ◽

Learning Algorithm ◽

Economic Loss ◽

Machine Learning Algorithms ◽

Training Data ◽

Support Vector ◽

Science Data ◽

Data Set

Per the statistics received from BBC, data varies for every earthquake occurred till date. Approximately, up to thousands are dead, about 50,000 are injured, around 1-3 Million are dislocated, while a significant amount go missing and homeless. Almost 100% structural damage is experienced. It also affects the economic loss, varying from 10 to 16 million dollars. A magnitude corresponding to 5 and above is classified as deadliest. The most life-threatening earthquake occurred till date took place in Indonesia where about 3 million were dead, 1-2 million were injured and the structural damage accounted to 100%. Hence, the consequences of earthquake are devastating and are not limited to loss and damage of living as well as nonliving, but it also causes significant amount of change-from surrounding and lifestyle to economic. Every such parameter desiderates into forecasting earthquake. A couple of minutes’ notice and individuals can act to shield themselves from damage and demise; can decrease harm and monetary misfortunes, and property, characteristic assets can be secured. In current scenario, an accurate forecaster is designed and developed, a system that will forecast the catastrophe. It focuses on detecting early signs of earthquake by using machine learning algorithms. System is entitled to basic steps of developing learning systems along with life cycle of data science. Data-sets for Indian sub-continental along with rest of the World are collected from government sources. Pre-processing of data is followed by construction of stacking model that combines Random Forest and Support Vector Machine Algorithms. Algorithms develop this mathematical model reliant on “training data-set”. Model looks for pattern that leads to catastrophe and adapt to it in its building, so as to settle on choices and forecasts without being expressly customized to play out the task. After forecast, we broadcast the message to government officials and across various platforms. The focus of information to obtain is keenly represented by the 3 factors – Time, Locality and Magnitude.

Download Full-text

A Novel Machine Learning Assisted Upscaling Workflow for Simulating the Waterflooding Process

10.2118/205595-ms ◽

2021 ◽

Author(s):

Yanji Wang ◽

Hangyu Li ◽

Jianchun Xu ◽

Ling Fan ◽

Xiaopu Wang ◽

...

Keyword(s):

Machine Learning ◽

Large Scale ◽

Learning Algorithm ◽

Learning Algorithms ◽

Flow Simulation ◽

Machine Learning Algorithms ◽

Scale Model ◽

Two Phase ◽

Flow Problems ◽

Similar Accuracy

Abstract Conventional flow-based two-phase upscaling for simulating the waterflooding process requires the calculations of upscaled two-phase parameters for each coarse interface or block. The whole procedure can be greatly time-consuming especially for large-scale reservoir models. To address this problem, flow-based two-phase upscaling techniques are combined with machine learning algorithms, in which the flow-based two-phase upscaling is needed only for a small fraction of coarse interfaces (or blocks), while the upscaled two-phase parameters for the rest of the coarse interfaces (or blocks) are directly provided by the machine learning algorithms instead of performing upscaling computation on each coarse interfaces (or blocks). The new two-phase upscaling workflow was tested for generic (left to right) flow problems using a 2D large-scale model. We observed similar accuracy for results using the machine learning assisted workflow compared with the results using full flow-based upscaling. And significant speedup (nearly 70) is achieved. The workflow developed in this work is one of the pioneering work in combining machine learning algorithm with the time-consuming flow-based two-phase upscaling method. It is a valuable addition to the existing multiscale techniques for subsurface flow simulation.

Download Full-text

Data Science in Healthcare- Current Challenges and Opportunities

10.20944/preprints202103.0425.v1 ◽

2021 ◽

Author(s):

Pankaj Khurana ◽

Rajeev Varshney

Keyword(s):

Machine Learning ◽

Large Scale ◽

Data Science ◽

Healthcare Providers ◽

Life Sciences ◽

Machine Learning Algorithms ◽

Successful Implementation ◽

Challenges And Opportunities ◽

Analytical Approaches ◽

Implementation Factors

The rise in the volume, variety and complexity of data in healthcare has made it as a fertile-bed for Artificial intelligence (AI) and Machine Learning (ML). Several types of AI are already being employed by healthcare providers and life sciences companies. The review summarises a classical machine learning cycle, different machine learning algorithms; different data analytical approaches and successful implementation in haematology. Although there are many instances where AI has been found to be great tool that can augment the clinician’s ability to provide better health outcomes, implementation factors need to be put in place to ascertain large-scale acceptance and popularity.

Download Full-text

Using Machine Learning to Predict Heart Disease

WSEAS TRANSACTIONS ON BIOLOGY AND BIOMEDICINE ◽

10.37394/23208.2022.19.1 ◽

2022 ◽

Vol 19 ◽

pp. 1-9

Author(s):

Nikhil Bora ◽

Sreedevi Gutta ◽

Ahmad Hadaegh

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Heart Disease ◽

Random Forest ◽

Data Science ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbor

Heart Disease has become one of the most leading cause of the death on the planet and it has become most life-threatening disease. The early prediction of the heart disease will help in reducing death rate. Predicting Heart Disease has become one of the most difficult challenges in the medical sector in recent years. As per recent statistics, about one person dies from heart disease every minute. In the realm of healthcare, a massive amount of data was discovered for which the data-science is critical for analyzing this massive amount of data. This paper proposes heart disease prediction using different machine-learning algorithms like logistic regression, naïve bayes, support vector machine, k nearest neighbor (KNN), random forest, extreme gradient boost, etc. These machine learning algorithm techniques we used to predict likelihood of person getting heart disease on the basis of features (such as cholesterol, blood pressure, age, sex, etc. which were extracted from the datasets. In our research we used two separate datasets. The first heart disease dataset we used was collected from very famous UCI machine learning repository which has 303 record instances with 14 different attributes (13 features and one target) and the second dataset that we used was collected from Kaggle website which contained 1190 patient’s record instances with 11 features and one target. This dataset is a combination of 5 popular datasets for heart disease. This study compares the accuracy of various machine learning techniques. In our research, for the first dataset we got the highest accuracy of 92% by Support Vector Machine (SVM). And for the second dataset, Random Forest gave us the highest accuracy of 94.12%. Then, we combined both the datasets which we used in our research for which we got the highest accuracy of 93.31% using Random Forest.

Download Full-text

A Deep Learning Algorithm to Predict Hazardous Drinkers and the Severity of Alcohol-Related Problems Using K-NHANES

Frontiers in Psychiatry ◽

10.3389/fpsyt.2021.684406 ◽

2021 ◽

Vol 12 ◽

Author(s):

Suk-Young Kim ◽

Taesung Park ◽

Kwonyoung Kim ◽

Jihoon Oh ◽

Yoonjae Park ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Large Scale ◽

Learning Algorithm ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Deep Learning Algorithm ◽

Conventional Machine ◽

Large Scale Survey ◽

Alcohol Related Problems

Purpose: The number of patients with alcohol-related problems is steadily increasing. A large-scale survey of alcohol-related problems has been conducted. However, studies that predict hazardous drinkers and identify which factors contribute to the prediction are limited. Thus, the purpose of this study was to predict hazardous drinkers and the severity of alcohol-related problems of patients using a deep learning algorithm based on a large-scale survey data.Materials and Methods: Datasets of National Health and Nutrition Examination Survey of South Korea (K-NHANES), a nationally representative survey for the entire South Korean population, were used to train deep learning and conventional machine learning algorithms. Datasets from 69,187 and 45,672 participants were used to predict hazardous drinkers and the severity of alcohol-related problems, respectively. Based on the degree of contribution of each variable to deep learning, it was possible to determine which variable contributed significantly to the prediction of hazardous drinkers.Results: Deep learning showed the higher performance than conventional machine learning algorithms. It predicted hazardous drinkers with an AUC (Area under the receiver operating characteristic curve) of 0.870 (Logistic regression: 0.858, Linear SVM: 0.849, Random forest classifier: 0.810, K-nearest neighbors: 0.740). Among 325 variables for predicting hazardous drinkers, energy intake was a factor showing the greatest contribution to the prediction, followed by carbohydrate intake. Participants were classified into Zone I, Zone II, Zone III, and Zone IV based on the degree of alcohol-related problems, showing AUCs of 0.881, 0.774, 0.853, and 0.879, respectively.Conclusion: Hazardous drinking groups could be effectively predicted and individuals could be classified according to the degree of alcohol-related problems using a deep learning algorithm. This algorithm could be used to screen people who need treatment for alcohol-related problems among the general population or hospital visitors.

Download Full-text

A Knowledge-Oriented Recommendation System for Machine Learning Algorithm Finding and Data Processing

International Journal of Embedded and Real-Time Communication Systems ◽

10.4018/ijertcs.2019100102 ◽

2019 ◽

Vol 10 (4) ◽

pp. 20-38

Author(s):

Man Tianxing ◽

Ildar Raisovich Baimuratov ◽

Natalia Alexandrovna Zhukova

Keyword(s):

Machine Learning ◽

Data Processing ◽

Data Science ◽

Recommendation System ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Machine Learning Algorithm ◽

Science Field ◽

Information Theoretic ◽

Computer Professional

With the development of the Big Data, data analysis technology has been actively developed, and now it is used in various subject fields. More and more non-computer professional researchers use machine learning algorithms in their work. Unfortunately, datasets can be messy and knowledge cannot be directly extracted, which is why they need preprocessing. Because of the diversity of the algorithms, it is difficult for researchers to find the most suitable algorithm. Most of them choose algorithms through their intuition. The result is often unsatisfactory. Therefore, this article proposes a recommendation system for data processing. This system consists of an ontology subsystem and an estimation subsystem. Ontology technology is used to represent machine learning algorithm taxonomy, and information-theoretic based criteria are used to form recommendations. This system helps users to apply data processing algorithms without specific knowledge from the data science field.

Download Full-text

Random Forest Algorithm to Investigate the Case of Acute Coronary Syndrome

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v5i2.3000 ◽

2021 ◽

Vol 5 (2) ◽

pp. 369-378

Author(s):

Eka Pandu Cynthia ◽

M. Afif Rizky A. ◽

Alwis Nazir ◽

Fadhilah Syafria

Keyword(s):

Machine Learning ◽

Acute Coronary Syndrome ◽

Random Forest ◽

Data Science ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Random Forest Algorithm ◽

Coronary Syndrome ◽

Use Of Data

This paper explains the use of the Random Forest Algorithm to investigate the Case of Acute Coronary Syndrome (ACS). The objectives of this study are to review the evaluation of the use of data science techniques and machine learning algorithms in creating a model that can classify whether or not cases of acute coronary syndrome occur. The research method used in this study refers to the IBM Foundational Methodology for Data Science, include: i) inventorying dataset about ACS, ii) preprocessing for the data into four sub-processes, i.e. requirements, collection, understanding, and preparation, iii) determination of RFA, i.e. the "n" of the tree which will form a forest and forming trees from the random forest that has been created, and iv) determination of the model evaluation and result in analysis based on Python programming language. Based on the experiments that the learning have been conducted using a random forest machine-learning algorithm with an n-estimator value of 100 and each tree's depth (max depth) with a value of 4, learning scenarios of 70:30, 80:20, and 90:10 on 444 cases of acute coronary syndrome data. The results show that the 70:30 scenario model has the best results, with an accuracy value of 83.45%, a precision value of 85%, and a recall value of 92.4%. Conclusions obtained from the experiment results were evaluated with various statistical metrics (accuracy, precision, and recall) in each learning scenario on 444 cases of acute coronary syndrome data with a cross-validation value of 10 fold.

Download Full-text