An ensemble based on neural networks with random weights for online data stream regression

Abstract: Most information sources in the current technological world generate data sequentially and rapidly, in the form of data streams. The evolving nature of processes often causes changes in the data distribution, known as concept drift, which is difficult to detect and causes loss of accuracy in supervised learning algorithms. As a consequence, online machine learning algorithms that can update actively in response to possible changes in the data distribution are required. Although many strategies have been developed to tackle this problem, most of them are designed for classification problems. Therefore, in the domain of regression problems, there is a need for accurate algorithms with dynamic updating mechanisms that can operate in a computational time compatible with today’s demanding market. In this article, the authors propose a new bagging ensemble approach based on neural networks with random weights for online data stream regression. The proposed method improves prediction accuracy and reduces the required computational time compared to a recent algorithm for online data stream regression from the literature. The experiments are carried out using four synthetic datasets to evaluate the algorithm’s response to concept drift, along with four benchmark datasets from different industries. The results indicate improved prediction accuracy, effective handling of concept drift, and much faster updating times compared to the existing approach. Additionally, the use of design of experiments as an effective tool for hyperparameter tuning is demonstrated.
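To make the idea of a bagging ensemble of random-weight networks for stream regression more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes each base learner keeps fixed random hidden weights and learns only its output weights by recursive least squares with a forgetting factor, and it assumes the classic Poisson(1) resampling scheme for online bagging. The class names, the forgetting factor, and the network sizes are all hypothetical choices made for illustration.

```python
import numpy as np


class RandomWeightNet:
    """Single-hidden-layer network (illustrative): hidden weights are random
    and fixed; only the output weights are learned, here by recursive least
    squares (RLS) with a forgetting factor (an assumption of this sketch)."""

    def __init__(self, n_inputs, n_hidden, rng, forgetting=0.99):
        self.W = rng.normal(size=(n_inputs, n_hidden))  # fixed random input weights
        self.b = rng.normal(size=n_hidden)              # fixed random biases
        self.beta = np.zeros(n_hidden)                  # learned output weights
        self.P = np.eye(n_hidden) * 1e3                 # RLS inverse-covariance estimate
        self.lam = forgetting                           # < 1 discounts old samples

    def _hidden(self, x):
        return np.tanh(np.asarray(x, dtype=float) @ self.W + self.b)

    def predict(self, x):
        return float(self._hidden(x) @ self.beta)

    def update(self, x, y):
        # One RLS step on a single (x, y) pair.
        h = self._hidden(x)
        Ph = self.P @ h
        gain = Ph / (self.lam + h @ Ph)
        self.beta += gain * (y - h @ self.beta)
        self.P = (self.P - np.outer(gain, Ph)) / self.lam


class OnlineBaggingRegressor:
    """Averages several RandomWeightNet learners; each incoming sample is
    shown to each learner k ~ Poisson(1) times (online bagging)."""

    def __init__(self, n_inputs, n_models=10, n_hidden=20, seed=0):
        self.rng = np.random.default_rng(seed)
        self.models = [
            RandomWeightNet(n_inputs, n_hidden, self.rng) for _ in range(n_models)
        ]

    def predict(self, x):
        return float(np.mean([m.predict(x) for m in self.models]))

    def update(self, x, y):
        for m in self.models:
            for _ in range(self.rng.poisson(1.0)):
                m.update(x, y)
```

In a prequential (test-then-train) loop, one would call `predict(x)` on each arriving sample, record the error, and then call `update(x, y)`. The forgetting factor is what lets this sketch track a changed target function after a drift; the method in the article may handle drift differently.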


Anomaly Detection Technique for Intrusion Detection in SDN Environment using Continuous Data Stream Machine Learning Algorithms

2021 IEEE International Systems Conference (SysCon) ◽

10.1109/syscon48628.2021.9447092 ◽

2021 ◽

Author(s):

Admilson de Ribamar Lima Ribeiro ◽

Reneilson Yves Carvalho Santos ◽

Anderson Clayton Alves Nascimento

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Anomaly Detection ◽

Data Stream ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Detection Technique ◽

Continuous Data


Learning from Ontology Streams with Semantic Concept Drift

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/133 ◽

2017 ◽

Cited By ~ 7

Author(s):

Jiaoyan Chen ◽

Freddy Lecue ◽

Jeff Z. Pan ◽

Huajun Chen

Keyword(s):

Semantic Web ◽

Data Stream ◽

Concept Drift ◽

Data Distribution ◽

Accurate Prediction ◽

Knowledge Structures ◽

Semantic Concept ◽

Web Data ◽

Semantic Inference

Data stream learning has been widely studied for extracting knowledge structures from continuous and rapid data records. In the Semantic Web, data is interpreted in ontologies and its ordered sequence is represented as an ontology stream. Our work exploits the semantics of such streams to tackle the problem of concept drift, i.e., unexpected changes in data distribution that cause most models to become less accurate as time passes. To this end we revisited (i) semantic inference in the context of supervised stream learning, and (ii) models with semantic embeddings. The experiments show accurate prediction with data from Dublin and Beijing.


Audit Fraud Data Prediction Using Machine Learning Algorithms

Algorithms for Intelligent Systems - Proceedings of International Conference on Communication and Computational Technologies ◽

10.1007/978-981-15-5077-5_38 ◽

2020 ◽

pp. 413-419

Author(s):

Ankita Sharma ◽

Amit Sinhal ◽

Manish Tiwari ◽

Mayank Patel

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Data Prediction


A Comparative Study of Machine Learning Algorithms for Financial Data Prediction

2018 International Symposium on Advanced Electrical and Communication Technologies (ISAECT) ◽

10.1109/isaect.2018.8618774 ◽

2018 ◽

Author(s):

Bencharef Omar ◽

Bousbaa Zineb ◽

Aida Cortes Jofre ◽

Daniel Gonzalez Cortes

Keyword(s):

Machine Learning ◽

Comparative Study ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Financial Data ◽

Data Prediction


Incremental Learning on Non-stationary Data Stream using Ensemble Approach

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v6i4.10255 ◽

2016 ◽

Vol 6 (4) ◽

pp. 1811 ◽

Cited By ~ 1

Author(s):

Meenakshi Anurag Thalor ◽

Shrishailapa Patil

Keyword(s):

Machine Learning ◽

Incremental Learning ◽

Data Stream ◽

Recommendation System ◽

Concept Drift ◽

Joint Probability ◽

Machine Learning Algorithms ◽

Training Data ◽

Joint Probability Distribution ◽

Changes Over Time

Incremental learning on non-stationary distributions has been shown to be a very challenging problem in machine learning and data mining, because the joint probability distribution between the data and classes changes over time. Many real-time problems suffer from concept drift as they change over time. One example is an advertisement recommendation system, in which customers’ behavior may change depending on the season of the year, on inflation, and on newly available products. An extra challenge arises when the classes to be learned are not represented equally in the training data, i.e., the classes are imbalanced, as most machine learning algorithms work well only when the training data is balanced. The objective of this paper is to develop an ensemble-based classification algorithm for non-stationary data streams (ENSDS) with a focus on two-class problems. In addition, we present an exhaustive comparison of the proposed algorithm with state-of-the-art classification approaches using different evaluation measures such as recall, F-measure, and G-mean.
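The evaluation measures named in this abstract have standard definitions for two-class problems, and a short sketch shows why they are preferred over plain accuracy on imbalanced data. The function name `binary_metrics` and the example labels are hypothetical; G-mean is taken here as the geometric mean of recall (sensitivity) and specificity, which is its usual two-class definition.

```python
import math


def binary_metrics(y_true, y_pred, positive=1):
    """Recall, precision, F-measure, and G-mean for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    g_mean = math.sqrt(recall * specificity)
    return {
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
        "g_mean": g_mean,
    }


# A classifier that always predicts the majority class (0) is 60% accurate
# on these labels, yet its recall, F-measure, and G-mean are all zero.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
majority_only = binary_metrics(labels, [0] * 10)
```

Because G-mean multiplies the per-class rates, it collapses to zero whenever one class is entirely missed, which is the failure mode that accuracy hides on imbalanced streams.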


Comparison of Machine Learning Algorithms to Increase Prediction Accuracy of COPD Domain

Enhanced Quality of Life and Smart Living - Lecture Notes in Computer Science ◽

10.1007/978-3-319-66188-9_22 ◽

2017 ◽

pp. 247-254 ◽

Cited By ~ 1

Author(s):

Lokman Saleh ◽

Hamid Mcheick ◽

Hicham Ajami ◽

Hafedh Mili ◽

Joumana Dargham

Keyword(s):

Machine Learning ◽

Prediction Accuracy ◽

Learning Algorithms ◽

Machine Learning Algorithms


The Right Time for Crowd Communication during Campaigns for Sustainable Success of Crowdfunding: Evidence from Kickstarter Platform

Sustainability ◽

10.3390/su12187642 ◽

2020 ◽

Vol 12 (18) ◽

pp. 7642 ◽

Cited By ~ 1

Author(s):

Michael J. Ryoba ◽

Shaojian Qu ◽

Ying Ji ◽

Deqiang Qu

Keyword(s):

Machine Learning ◽

At Risk ◽

Prediction Accuracy ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Success Prediction ◽

Combined Effects ◽

Baseline Model ◽

Predicting Success ◽

The Right

Only a small percentage of crowdfunding projects succeed in securing funds, which puts the sustainability of crowdfunding platforms at risk. Researchers have examined the influence of phased aspects of communication, drawn from updates and comments, on the success of crowdfunding campaigns, but in most cases they have focused on the combined effects of the aspects. This paper investigated how various combinations of phased communication aspects from updates and comments contribute to campaign success; the best combination can help creators manage campaigns successfully by focusing on the important communication aspects. Metaheuristic and machine learning algorithms were used to search for and evaluate the best combination of phased communication aspects for predicting success using a Kickstarter dataset. The study found that the number of updates in phase one, the polarity of comments in phase two, the readability of updates and polarity of comments in phase three, and the polarity of comments in phase five are the most important communication aspects in predicting campaign success. Moreover, the success prediction accuracy with the aspects identified after phasing is higher than that of the baseline model without phasing. Our findings can help crowdfunding actors focus on the important communication aspects, leading to an improved likelihood of success.


CHANGE SEMANTIC CONSTRAINED ONLINE DATA CLEANING METHOD FOR REAL-TIME OBSERVATIONAL DATA STREAM

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xli-b2-177-2016 ◽

2016 ◽

Vol XLI-B2 ◽

pp. 177-183

Author(s):

Yulin Ding ◽

Hui Lin ◽

Rongrong Li

Keyword(s):

Real Time ◽

Observational Data ◽

Data Streams ◽

Data Stream ◽

Data Cleaning ◽

Data Distribution ◽

Online Data ◽

Filter Parameter ◽

Changing Patterns ◽

Cleaning Methods

Recent breakthroughs in sensor networks have made it possible to collect and assemble increasing amounts of real-time observational data by observing dynamic phenomena at previously impossible time and space scales. Real-time observational data streams present potentially profound opportunities for real-time applications in disaster mitigation and emergency response, by providing accurate and timely estimates of the environment’s status. However, the data are always subject to inevitable anomalies (including errors and anomalous changes/events) caused by various effects produced by the environment they are monitoring. The “big but dirty” real-time observational data streams can rarely achieve their full potential in downstream real-time models or applications due to the low data quality. Therefore, timely and meaningful online data cleaning is a necessary prerequisite step to ensure the quality, reliability, and timeliness of the real-time observational data. In general, a straightforward streaming data cleaning approach is to define various types of models/classifiers representing the normal behavior of sensor data streams and then declare any deviation from this model as abnormal or erroneous data. The effectiveness of these models is affected by dynamic changes in the deployed environments. Due to the changing nature of the complicated process being observed, real-time observational data is characterized by diversity and dynamics, which are typical characteristics of Big (Geo) Data. These dynamics and diversity are reflected not only in the data values, but also in the complicated changing patterns of the data distributions. This means the pattern of the real-time observational data distribution is not stationary or static but changing and dynamic. Once the data pattern has changed, it is necessary to adapt the model over time to cope with the changing patterns of real-time data streams. Otherwise, the model will not fit the subsequent observational data streams, which may lead to large estimation errors. In order to achieve the best generalization error, it is an important challenge for the data cleaning methodology to be able to characterize the behavior of data stream distributions and adaptively update a model to include new information and remove old information. However, the complicated data changing property invalidates traditional data cleaning methods, which rely on the assumption of a stationary data distribution, and drives the need for more dynamic and adaptive online data cleaning methods. To overcome these shortcomings, this paper presents a change semantics constrained online filtering method for real-time observational data. Based on the principle that the filter parameter should vary in accordance with the data change patterns, this paper embeds a semantic description that quantitatively depicts the change patterns in the data distribution, so that the filter parameter adapts automatically. Real-time observational water level data streams from different precipitation scenarios are selected for testing. Experimental results show that this method yields more accurate and reliable water level information, which is a prerequisite for scientific and prompt flood assessment and decision-making.
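The principle that a filter parameter should vary with the data change pattern can be illustrated with a small sketch. This is not the paper's method: the abstract does not specify the filter or the semantic description, so the code below assumes a simple exponential smoothing filter whose smoothing factor is driven by the median of recent differences. The function name `adaptive_filter` and all parameter values are hypothetical.

```python
from collections import deque
from statistics import median


def adaptive_filter(readings, window=5, alpha_min=0.1, alpha_max=0.9, scale=1.0):
    """Exponential smoothing with a data-driven smoothing factor.

    The median of the last `window` first differences acts as a crude
    "change pattern" descriptor: it is near zero for a flat signal with an
    isolated spike, and close to the slope for a sustained rise or fall.
    """
    if not readings:
        return []
    estimate = float(readings[0])
    previous = float(readings[0])
    cleaned = [estimate]
    diffs = deque(maxlen=window)
    for r in readings[1:]:
        diffs.append(r - previous)
        previous = r
        trend = abs(median(diffs))
        # Map the trend magnitude into [alpha_min, alpha_max):
        # sustained change -> larger alpha -> the filter follows the data.
        alpha = alpha_min + (alpha_max - alpha_min) * trend / (trend + scale)
        estimate += alpha * (r - estimate)
        cleaned.append(estimate)
    return cleaned


# An isolated spike in a flat water-level series is strongly damped,
# while a steadily rising series is tracked with little lag.
spiky = adaptive_filter([10] * 5 + [50] + [10] * 5)
rising_input = [10 + 2 * i for i in range(20)]
rising = adaptive_filter(rising_input)
```

A fixed small smoothing factor would damp the spike equally well but would lag far behind the rising series, while a fixed large factor would do the opposite; letting the factor depend on the observed change pattern is what resolves that trade-off in this toy setting.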
