Meta_LASH Tree: Bagging at Meta Level Using LASSO Regression Hoeffding Tree for Streaming Data

Author(s):  
D.Christy Sujatha ◽  
J.Gnana Jayanthi
2020 ◽  
Vol 24 (04) ◽  
pp. 3022-3033
Author(s):  
Christy Sujatha D ◽  
Gnana Jayanthi Dr.J

2020 ◽  
Vol 11 (1) ◽  
pp. 15-26
Author(s):  
Jay Gandhi ◽  
Vaibhav Gandhi

Data stream mining has become an active research topic, with growing interest in data discovery methods. Several applications rely on stream data processing, such as device networks and electronic networks. Our approach, AhtNODE (Adaptive Hoeffding Tree based NOvel class DEtection), detects novel classes in the presence of concept drift in streaming data. It addresses three challenges of streaming data: infinite length, concept drift, and concept evolution. The approach automatically detects a novel class whenever one arrives in the data stream. It is a multi-class approach that distinguishes novel classes from existing classes. The authors apply the Adaptive Hoeffding Tree (AHT) as the classification model, which also handles concept drift. Previous approaches used an ensemble model to handle concept drift. In AHT, classification is done in a single pass. The experimental results demonstrate the effectiveness of AhtNODE compared to an existing ensemble classifier in terms of classification accuracy, speed, and memory use.


2017 ◽  
Vol 60 ◽  
pp. 1031-1055 ◽  
Author(s):  
Rocco De Rosa ◽  
Nicolò Cesa-Bianchi

Decision tree classifiers are a widely used tool in data stream mining. The use of confidence intervals to estimate the gain associated with each split leads to very effective methods, like the popular Hoeffding tree algorithm. From a statistical viewpoint, the analysis of decision tree classifiers in a streaming setting requires knowing when enough new information has been collected to justify splitting a leaf. Although some of the issues in the statistical analysis of Hoeffding trees have already been clarified, a general and rigorous study of confidence intervals for splitting criteria is missing. We fill this gap by deriving accurate confidence intervals to estimate the splitting gain in decision tree learning with respect to three criteria: entropy, Gini index, and a third index proposed by Kearns and Mansour. We also extend our confidence analysis to a selective sampling setting, in which the decision tree learner adaptively decides which labels to query in the stream. We provide theoretical guarantees bounding the probability that the decision tree learned via our selective sampling strategy suboptimally classifies the next example in the stream. Experiments on real and synthetic data in a streaming setting show that our trees are indeed more accurate than trees with the same number of leaves generated by state-of-the-art techniques. In addition, our active learning module empirically uses fewer labels without significantly hurting performance.
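
As a rough illustration of the split test mentioned in the abstract above, the sketch below uses the classical Hoeffding bound from the standard Hoeffding tree algorithm. It is not the refined confidence intervals this paper derives for entropy, the Gini index, or the Kearns-Mansour index. The function names, parameters, and example numbers are assumptions chosen for illustration.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Classical Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range (R) is the range of the split criterion, delta is the
    allowed failure probability, n is the number of examples seen at the leaf.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split when the observed gap between the two best candidate splits
    exceeds the bound, so the best split is likely the true best."""
    epsilon = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > epsilon


# Hypothetical numbers: 1000 examples at a leaf, criterion range 1.0.
clear_winner = should_split(0.30, 0.15, value_range=1.0, delta=1e-7, n=1000)
close_call = should_split(0.30, 0.25, value_range=1.0, delta=1e-7, n=1000)
```

The bound shrinks as more examples reach the leaf, so a gap that is too small to justify a split now may become sufficient later in the stream.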


Author(s):  
Hannah Lee

This paper attempts to show how system theory could provide critical insight into the transdisciplinary field of library and information sciences (LIS). It begins with a discussion of the categorization of LIS as an academic and professional field (or rather, the lack of evidence on the subject) and of what exactly is meant by system theory, drawing upon the general system theory established by Ludwig von Bertalanffy. The main discussion focuses on the inadequacies of current meta-level discussions of LIS and the benefits of applying general system theory to LIS (particularly when considering the exponential rapidity with which information travels).


Author(s):  
Yu.V. Andreyev ◽  
L.V. Kuzmin ◽  
M.G. Popov ◽  
A.I. Ryshov ◽  
...  

2019 ◽  
Vol 23 (1) ◽  
pp. 346-357
Author(s):  
Vithya G ◽  
Naren J ◽  
Varun V

Author(s):  
Vijay Kumar Dwivedi ◽  
Manoj Madhava Gore

Background: Stock price prediction is a challenging task. Social, economic, political, and various other factors cause frequent, abrupt changes in stock prices. This article proposes a historical-data-based ensemble system to predict the closing stock price with higher accuracy and consistency than existing stock price prediction systems. Objective: The primary objective of this article is to predict the closing price of a stock for the next trading day more accurately and consistently than existing stock price prediction methods. Method: The proposed system combines various machine-learning-based prediction models using the least absolute shrinkage and selection operator (LASSO) regression regularization technique to improve accuracy over any single base prediction model. Results: Analysis of the results for all eleven stocks (listed under the Information Technology sector on the Bombay Stock Exchange, India) reveals that the proposed system performs best (on all defined metrics) for the training and test datasets comprising all the stocks considered. Conclusion: The proposed ensemble model consistently predicts stock prices with higher accuracy than existing prediction methods.
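
The abstract above does not spell out how LASSO combines the base models, so the following is only a minimal sketch of one common reading: a LASSO regression fitted on the base models' predictions, where the L1 penalty can drive the weight of an unhelpful base model to exactly zero. The coordinate-descent solver, the function names, and the toy numbers are all assumptions for illustration, not the authors' implementation or data.

```python
def soft_threshold(z: float, gamma: float) -> float:
    """Soft-thresholding operator used by LASSO coordinate descent."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iters=200):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept).

    X is a list of rows; each column holds one base model's predictions.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iters):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
    return w


# Hypothetical toy data: column 0 is an accurate base model, column 1 is noise.
base_predictions = [[1.0, 0.5], [2.0, -0.3], [3.0, 0.2], [4.0, -0.4]]
targets = [1.0, 2.0, 3.0, 4.0]
weights = lasso_coordinate_descent(base_predictions, targets, alpha=0.1)
```

In this toy setup the accurate base model keeps a weight slightly below 1 (shrunk by the penalty), while the noisy one is dropped entirely.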


2019 ◽  
Vol 9 (12) ◽  
pp. 2560 ◽  
Author(s):  
Yunkon Kim ◽  
Eui-Nam Huh

This paper explores data caching as a key factor of edge computing. State-of-the-art research on data caching at edge nodes mainly considers reactive and proactive caching, and machine-learning-based caching, which could be a heavy task for edge nodes. However, edge nodes usually have fewer computing resources than cloud datacenters because they are geographically distributed away from the administrator. Therefore, a caching algorithm should be lightweight to save computing resources on edge nodes. In addition, data caching should be agile because it has to support high-quality services on edge nodes. Accordingly, this paper proposes a lightweight, agile caching algorithm, EDCrammer (Efficient Data Crammer), which performs agile operations to control the caching rate for streaming data by using an enhanced PID (Proportional-Integral-Differential) controller. Experimental results using this lightweight, agile caching algorithm show its significant value in each scenario. In four common scenarios, the desired cache utilization was reached in 1.1 s on average and then maintained within a 4–7% deviation. The cache hit ratio is about 96%, and the optimal cache capacity is around 1.5 MB. Thus, EDCrammer can help distribute streaming data traffic to the edge nodes, mitigate the uplink load on the central cloud, and ultimately provide users with high-quality video services. We also hope that EDCrammer can improve overall service quality in 5G environments, Augmented Reality/Virtual Reality (AR/VR), Intelligent Transportation Systems (ITS), the Internet of Things (IoT), etc.
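
To make the control idea in the abstract above concrete, here is a minimal sketch of a textbook discrete PID controller steering cache utilization toward a target. It is not EDCrammer's enhanced PID controller, whose details the abstract does not give. The gains, the target of 0.7, and the toy cache model are hypothetical.

```python
class PIDController:
    """Textbook discrete PID controller (not the paper's enhanced variant)."""

    def __init__(self, kp: float, ki: float, kd: float, setpoint: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, measurement: float, dt: float) -> float:
        error = self.setpoint - measurement
        self.integral += error * dt
        # No derivative term on the first call, since there is no previous error.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(steps: int = 200, dt: float = 0.1) -> float:
    """Toy model: the controller output is the net caching rate, and cache
    utilization (0..1) simply integrates that rate over time."""
    pid = PIDController(kp=0.8, ki=0.5, kd=0.0, setpoint=0.7)  # hypothetical gains
    utilization = 0.0
    for _ in range(steps):
        caching_rate = pid.update(utilization, dt)
        utilization += caching_rate * dt
        utilization = min(max(utilization, 0.0), 1.0)  # cache cannot exceed capacity
    return utilization


final_utilization = simulate()
```

With these assumed gains the simulated utilization settles near the 0.7 target; a real deployment would need tuning against measured traffic.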


Sensors ◽  
2021 ◽  
Vol 21 (7) ◽  
pp. 2298
Author(s):  
Pablo Cano Marchal ◽  
Chiara Sanmartin ◽  
Silvia Satorres Martínez ◽  
Juan Gómez Ortega ◽  
Fabio Mencarelli ◽  
...  

The organoleptic profile of a Virgin Olive Oil is a key quality parameter that is currently obtained by human sensory panels. The development of an instrumental technique capable of providing information about this profile quickly and online is of great interest. This work employed a general-purpose e-nose, in lab conditions, to predict the level of fruity aroma and the presence of defects in Virgin Olive Oils. The raw data provided by the e-nose were used to extract a set of features that fed a regressor to predict the level of fruity aroma and a classifier to detect the presence of defects. The results obtained were a mean validation error of 0.5 units for the prediction of fruity aroma using lasso regression, and 88% accuracy for defect detection using logistic regression. Finally, the identification of two out of ten specific sensors of the e-nose that can provide successful results paves the way to the design of low-cost, application-specific electronic noses.


Author(s):  
A. Lenardic ◽  
J. Seales

The term habitable is used to describe planets that can harbour life. Debate exists as to the specific conditions that allow for habitability, but the use of the term as a planetary variable has become ubiquitous. This paper poses a meta-level question: What type of variable is habitability? Is it akin to temperature, in that it is something that characterizes a planet, or is it something that flows through a planet, akin to heat? That is, is habitability a state or a process variable? Forthcoming observations can be used to discriminate between these end-member hypotheses. Each has different implications for the factors that lead to differences between planets (e.g. the differences between Earth and Venus). Observational tests can proceed independently of any new modelling of planetary habitability. However, the viability of habitability as a process can influence future modelling. We discuss a specific modelling framework based on anticipating observations that can discriminate between different views of habitability.

