A Review of Classification and Novel Class Detection Technique of Data Streams

Manish Rai; Rekha Pandit

doi:10.24297/ijct.v3i2c.2891

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽

10.24297/ijct.v3i2c.2891 ◽

2012 ◽

Vol 3 (2) ◽

pp. 314-316

Author(s):

Manish Rai ◽

Rekha Pandit

Keyword(s):

Machine Learning ◽

Data Streams ◽

Concept Drift ◽

Data Classification ◽

Classification Model ◽

Infinite Length ◽

Stream Data ◽

Machine Learning Technique ◽

Feature Evaluation ◽

Learning Technique

Stream data classification suffers from the problems of infinite length, concept evaluation, feature evaluation and data drift. Labeling a data stream is more challenging than labeling static data because of several unique properties of data streams. Data streams are assumed to have infinite length, which makes it impractical to store and use all the historical data for training. Earlier multi-pass machine learning techniques therefore cannot be applied directly to data streams. Data streams also exhibit concept drift, which occurs when the underlying concept of the data changes over time. To address concept drift, a classification model must continuously adapt itself to the most recent concept. Various authors reduce these problems using machine learning approaches and feature optimization techniques. In this paper we present various methods for reducing the problems that occur in stream data classification. We also discuss a machine learning technique for the feature evaluation process used in the generation of novel classes.
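As a rough illustration of what "adapting to the most recent concept" can mean in practice, the sketch below trains on a bounded window of recent examples only, so that older examples are forgotten as the stream moves on. The class name, the single numeric feature and the nearest-class-mean rule are assumptions made for illustration; they are not taken from the reviewed paper.

```python
from collections import deque


class SlidingWindowClassifier:
    """Nearest-class-mean classifier over the most recent examples only.

    Hypothetical sketch: one numeric feature per example, bounded memory.
    """

    def __init__(self, window_size=100):
        # A deque with maxlen drops the oldest example automatically,
        # so memory stays bounded even on an unbounded stream.
        self.window = deque(maxlen=window_size)

    def learn_one(self, x, label):
        """Store one labeled example; the model is rebuilt at prediction time."""
        self.window.append((x, label))

    def predict_one(self, x):
        """Return the label whose windowed mean is closest to x, or None."""
        if not self.window:
            return None
        sums, counts = {}, {}
        for value, label in self.window:
            sums[label] = sums.get(label, 0.0) + value
            counts[label] = counts.get(label, 0) + 1
        return min(sums, key=lambda lab: abs(x - sums[lab] / counts[lab]))
```

Because the window is bounded, a change in the relationship between feature and label is picked up once enough new examples have displaced the old ones; the window size trades stability against reaction speed.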


A study secure multi authentication based data classification model in cloud based system

International Journal of Advances in Applied Sciences ◽

10.11591/ijaas.v9.i3.pp240-254 ◽

2020 ◽

Vol 9 (3) ◽

pp. 240-254

Author(s):

Sakshi Kaushal ◽

Bala Buksh

Keyword(s):

Machine Learning ◽

Cloud Computing ◽

Data Classification ◽

Classification Model ◽

Sensitive Data ◽

Learning Technique ◽

Mathematical Algorithms ◽

Encryption Algorithms ◽

Cloud Applications ◽

Technology Resources

Cloud computing is one of the most popular terms among enterprises and in the news. The concept has become practical because of fast internet bandwidth and advanced cooperation technology. Resources on the cloud can be accessed through the internet without self-built infrastructure. Cloud computing must effectively manage security in cloud applications. Data classification is a machine learning technique used to predict the class of unclassified data. Data mining uses different tools to discover unknown, valid patterns and relationships in a dataset. These tools are mathematical algorithms, statistical models and Machine Learning (ML) algorithms. In this paper, the authors use an improved Bayesian technique to classify the data and encrypt the sensitive data using hybrid steganography. The encrypted and non-encrypted sensitive data are sent to the cloud environment, and the parameters are evaluated with different encryption algorithms.


Novel Class Detection with Concept Drift in Data Stream - AhtNODE

International Journal of Distributed Systems and Technologies ◽

10.4018/ijdst.2020010102 ◽

2020 ◽

Vol 11 (1) ◽

pp. 15-26

Author(s):

Jay Gandhi ◽

Vaibhav Gandhi

Keyword(s):

Data Stream ◽

Concept Drift ◽

Ensemble Classifier ◽

Streaming Data ◽

Classification Model ◽

Infinite Length ◽

The Novel ◽

Stream Data ◽

Hoeffding Tree ◽

Discovery Method

Data stream mining has become an interesting research topic, and there is growing interest in it as a data discovery method. Several applications rely on stream data processing, such as device networks and electronic networks. Our approach, AhtNODE (Adaptive Hoeffding Tree based NOvel class DEtection), detects novel classes in the presence of concept drift in streaming data. It addresses three challenges of streaming data: infinite length, concept drift, and concept evolution. This approach automatically detects a novel class whenever it arrives in the data stream. It is a multi-class approach that distinguishes the novel class from existing classes. The authors apply the Adaptive Hoeffding Tree (AHT) as the classification model, which also handles concept drift. Previous approaches used an ensemble model to handle concept drift. In AHT, classification is done in a single pass. The experimental results demonstrate the effectiveness of AhtNODE compared with existing ensemble classifiers in terms of classification accuracy, speed and memory use.
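The abstract does not spell out how a Hoeffding tree decides when to split, so the following is only a sketch of the standard Hoeffding bound that such trees generally rely on, not AhtNODE's own code. The function names and the example gain values are assumptions. With value range R, confidence parameter δ and n observed examples, the bound is ε = sqrt(R² ln(1/δ) / (2n)).

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Return epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    With probability 1 - delta, the true mean of a variable with range R
    lies within epsilon of the mean observed over n independent samples.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split once the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)
```

Since ε shrinks as n grows, a leaf that has seen more examples needs a smaller gap between the two best attributes before it commits to a split, which is what allows the tree to be built in a single pass.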


Breast cancer prediction using an optimal machine learning technique for next generation sequences

Concurrent Engineering ◽

10.1177/1063293x21991808 ◽

2021 ◽

pp. 1063293X2199180

Author(s):

Babymol Kurian ◽

VL Jyothi

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Decision Tree ◽

Classification Model ◽

Supervised Machine Learning ◽

Support Vector ◽

Next Generation ◽

Machine Learning Technique ◽

Cancer Prediction ◽

Learning Technique

The application of artificial intelligence to cancer prediction and detection using Next Generation Sequencing (NGS) is highly valued in the current medical field. Next generation sequences were extracted from the NCBI (National Centre for Biotechnology Information) gene repository. Sequences of normal Homo sapiens (Class 1), BRCA1 (Class 2) and BRCA2 (Class 3) were extracted for Machine Learning (ML) purposes. The total volume of datasets extracted for the process was 1580, under four categories of 50, 100, 150 and 200 sequences. The breast cancer prediction process was carried out in three major steps: feature extraction, machine learning classification and performance evaluation. The features were extracted with sequences as input. Ten features of DNA sequences, such as ORF (Open Reading Frame) count, individual nucleobase average count of A, T, C, G, AT and GC-content, AT/GC composition, G-quadruplex occurrence and MR (Mutation Rate), were extracted from the three types of sequences for the classification process. The sequence type was also included as a target variable in the feature set, with values 0, 1 and 2 for classes 1, 2 and 3 respectively. Nine supervised machine learning techniques, namely LR (Logistic Regression statistical model), LDA (Linear Discriminant Analysis model), k-NN (k-nearest neighbours algorithm), DT (Decision Tree technique), NB (Naive Bayes classifier), SVM (Support-Vector Machine algorithm), RF (Random Forest learning algorithm), AdaBoost (AB) and Gradient Boosting (GB), were employed on the four categories of datasets. Of all the supervised models, the decision tree technique performed best, with a maximum classification accuracy of 94.03%. Classification model performance was evaluated using precision, recall, F1-score and support values, wherein the F1-score was most similar to the classification accuracy.
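A few of the composition features named above can be sketched directly from a raw sequence string. The function name, the dictionary keys and the reading of "AT/GC composition" as a simple ratio are assumptions for illustration; the paper's exact feature definitions are not given in the abstract.

```python
def nucleobase_features(sequence):
    """Compute simple composition features for a DNA sequence.

    Hypothetical sketch: per-base frequencies, AT- and GC-content,
    and an AT/GC ratio (assumed definition).
    """
    seq = sequence.upper()
    length = len(seq)
    if length == 0:
        raise ValueError("sequence must not be empty")

    counts = {base: seq.count(base) for base in "ATCG"}
    at = counts["A"] + counts["T"]
    gc = counts["G"] + counts["C"]

    features = {base: counts[base] / length for base in "ATCG"}
    features["AT_content"] = at / length
    features["GC_content"] = gc / length
    # Guard against sequences that contain no G or C at all.
    features["AT_GC_ratio"] = at / gc if gc else float("inf")
    return features
```

Features of this kind would form one row of the feature set per sequence, with the class label (0, 1 or 2) appended as the target variable.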


Postsurgery Classification of Best-Corrected Visual Acuity Changes Based on Pterygium Characteristics Using the Machine Learning Technique

The Scientific World JOURNAL ◽

10.1155/2021/6211006 ◽

2021 ◽

Vol 2021 ◽

pp. 1-7

Author(s):

Fatin Nabihah Jais ◽

Mohd Zulfaezal Che Azemin ◽

Mohd Radzi Hilmi ◽

Mohd Izzuddin Mohd Tamrin ◽

Khairidzan Mohd Kamal

Keyword(s):

Machine Learning ◽

Visual Acuity ◽

Visual Impairment ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Classification Model ◽

Machine Learning Technique ◽

Corrected Visual Acuity ◽

Learning Technique ◽

Best Corrected Visual Acuity

Introduction. Early detection of visual symptoms in pterygium patients is crucial as the progression of the disease can cause visual disruption and contribute to visual impairment. Best-corrected visual acuity (BCVA) and corneal astigmatism influence the degree of visual impairment due to direct invasion of fibrovascular tissue into the cornea. However, different characteristics of pterygium have been used to evaluate the severity of visual impairment, including fleshiness, size, length, and redness. The innovation of machine learning technology in visual science may contribute to developing a highly accurate predictive analytics model of BCVA outcomes in postsurgery pterygium patients. Aim. To produce an accurate model of BCVA changes after pterygium surgery according to its morphological characteristics by using the machine learning technique. Methodology. A retrospective secondary dataset of 93 samples of pterygium patients with different pterygium attributes was imported into four different machine learning algorithms in RapidMiner software to predict the improvement of BCVA after pterygium surgery. Results. The performance of the four machine learning techniques was evaluated, and the support vector machine (SVM) model had the highest average accuracy (94.44% ± 5.86%), specificity (100%), and sensitivity (92.14% ± 8.33%). Conclusion. Machine learning algorithms can produce a highly accurate postsurgery classification model of BCVA changes using pterygium characteristics.
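For readers unfamiliar with the three reported measures, the sketch below shows how accuracy, sensitivity and specificity are conventionally derived from binary confusion-matrix counts. The function name and the counts in the usage note are hypothetical and are not the study's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity and specificity from confusion counts.

    tp/fn: positive cases classified correctly/incorrectly.
    tn/fp: negative cases classified correctly/incorrectly.
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / (positives + negatives),
        "sensitivity": tp / positives,   # true positive rate
        "specificity": tn / negatives,   # true negative rate
    }
```

For example, `binary_metrics(tp=8, fp=1, tn=9, fn=2)` describes a made-up test fold of 20 cases; in a cross-validated study such values would be averaged over folds, which is where a "mean ± deviation" figure comes from.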


Stream Classification Algorithm Based on Decision Tree

Mobile Information Systems ◽

10.1155/2021/3103053 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Jinlin Guo ◽

Haoran Wang ◽

Xinwei Li ◽

Li Zhang

Keyword(s):

Decision Tree ◽

Concept Drift ◽

Data Classification ◽

Classification Algorithm ◽

Current Data ◽

Classification Model ◽

Stream Data ◽

Integration Algorithm ◽

Stream Classification ◽

Model Classification

Due to the rise of many fields such as e-commerce platforms, a large amount of stream data has emerged. The incomplete labeling problem and the concept drift problem of these data pose a huge challenge to existing stream data classification methods. To address this, a dynamic stream data classification algorithm is proposed. For the incomplete labeling problem, the method introduces randomization and an iterative strategy based on the very fast decision tree (VFDT) algorithm to design an iterative integration algorithm; the algorithm uses the previous model's classification result as the next model's input and implements a voting mechanism for classifying new data. At the same time, a window mechanism is used to store data and calculate the data distribution characteristics in the window; the calculated result is then combined with the predicted amount of data to adjust the size of the sliding window. Experiments show the superiority of the algorithm in classification accuracy. The aim of the study is to compare different algorithms to evaluate whether the classification model adapts to the current data environment.
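The voting step mentioned above can be pictured with a minimal majority-vote sketch. It assumes each ensemble member is simply a callable that maps an example to a label; these stand-in callables are hypothetical and are not VFDT trees, and the paper's iterative integration and window-resizing logic is not reproduced here.

```python
from collections import Counter


def majority_vote(models, x):
    """Return the label predicted by the largest number of models.

    `models` is a non-empty sequence of callables (hypothetical stand-ins
    for trained classifiers). Ties go to the label that was seen first.
    """
    if not models:
        raise ValueError("at least one model is required")
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Three toy threshold rules acting as ensemble members.
toy_models = [
    lambda x: "pos" if x > 0 else "neg",
    lambda x: "pos" if x > 1 else "neg",
    lambda x: "pos" if x > 2 else "neg",
]
```

Calling `majority_vote(toy_models, 1.5)` combines the three toy rules into a single prediction; an odd number of members avoids ties in the two-class case.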


The classification model of social functional remission in patients with schizophrenia treated with Paliperidone palmitate using machine learning technique

10.26226/morressier.5785edccd462b80296c98f4c ◽

2016 ◽

Author(s):

Hisanori Kobayashi

Keyword(s):

Machine Learning ◽

Classification Model ◽

Paliperidone Palmitate ◽

Machine Learning Technique ◽

Learning Technique


Improvement of LiDAR data classification algorithm using the machine learning technique

Polarization Science and Remote Sensing IX ◽

10.1117/12.2525415 ◽

2019 ◽

Author(s):

Md. Ali Haider ◽

Songxin Tan

Keyword(s):

Machine Learning ◽

Data Classification ◽

Classification Algorithm ◽

Lidar Data ◽

Machine Learning Technique ◽

Learning Technique


The classification model of social functional remission in patients with schizophrenia treated with paliperidone palmitate using machine learning technique

European Neuropsychopharmacology ◽

10.1016/s0924-977x(16)31524-3 ◽

2016 ◽

Vol 26 ◽

pp. S504-S505

Author(s):

H. Kobayashi ◽

T. Ohnishi ◽

R. Nakagawa ◽

K. Yoshizawa

Keyword(s):

Machine Learning ◽

Classification Model ◽

Paliperidone Palmitate ◽

Machine Learning Technique ◽

Learning Technique


What Should Investors Care About? Mutual Fund Ratings by Analysts vs. Machine Learning Technique

SSRN Electronic Journal ◽

10.2139/ssrn.3702749 ◽

2020 ◽

Author(s):

Si Cheng ◽

Ruichang Lu ◽

Xiaojun Zhang

Keyword(s):

Machine Learning ◽

Mutual Fund ◽

Machine Learning Technique ◽

Learning Technique


The Development of a Quantitative Precipitation Forecast Correction Technique Based on Machine Learning for Hydrological Applications

Atmosphere ◽

10.3390/atmos11010111 ◽

2020 ◽

Vol 11 (1) ◽

pp. 111 ◽

Cited By ~ 2

Author(s):

Chul-Min Ko ◽

Yeong Yun Jeong ◽

Young-Mi Lee ◽

Byung-Sik Kim

Keyword(s):

Machine Learning ◽

Heavy Rainfall ◽

Extreme Rainfall ◽

Machine Learning Techniques ◽

Precipitation Forecast ◽

Machine Learning Technique ◽

Rainfall Forecast ◽

Quantitative Precipitation Forecast ◽

Correction Technique ◽

Learning Technique

This study aimed to enhance the accuracy of extreme rainfall forecast, using a machine learning technique for forecasting hydrological impact. In this study, machine learning with XGBoost technique was applied for correcting the quantitative precipitation forecast (QPF) provided by the Korea Meteorological Administration (KMA) to develop a hydrological quantitative precipitation forecast (HQPF) for flood inundation modeling. The performance of machine learning techniques for HQPF production was evaluated with a focus on two cases: one for heavy rainfall events in Seoul and the other for heavy rainfall accompanied by Typhoon Kong-rey (1825). This study calculated the well-known statistical metrics to compare the error derived from QPF-based rainfall and HQPF-based rainfall against the observational data from the four sites. For the heavy rainfall case in Seoul, the mean absolute errors (MAE) of the four sites, i.e., Nowon, Jungnang, Dobong, and Gangnam, were 18.6 mm/3 h, 19.4 mm/3 h, 48.7 mm/3 h, and 19.1 mm/3 h for QPF and 13.6 mm/3 h, 14.2 mm/3 h, 33.3 mm/3 h, and 12.0 mm/3 h for HQPF, respectively. These results clearly indicate that the machine learning technique is able to improve the forecasting performance for localized rainfall. In addition, the HQPF-based rainfall shows better performance in capturing the peak rainfall amount and spatial pattern. Therefore, it is considered that the HQPF can be helpful to improve the accuracy of intense rainfall forecast, which is subsequently beneficial for forecasting floods and their hydrological impacts.
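The comparison above rests on the mean absolute error. As a small sketch, the code below defines MAE in the usual way and then computes the relative error reduction per site from the MAE values quoted in the abstract. The function and variable names are assumptions; only the eight MAE figures come from the source.

```python
def mean_absolute_error(forecast, observed):
    """Average of |forecast - observed| over paired values."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(forecast)


def relative_reduction(before, after):
    """Fraction by which an error measure decreased."""
    return (before - after) / before


# MAE values in mm/3 h for the Seoul heavy rainfall case, as quoted above.
qpf_mae = {"Nowon": 18.6, "Jungnang": 19.4, "Dobong": 48.7, "Gangnam": 19.1}
hqpf_mae = {"Nowon": 13.6, "Jungnang": 14.2, "Dobong": 33.3, "Gangnam": 12.0}

reductions = {
    site: relative_reduction(qpf_mae[site], hqpf_mae[site]) for site in qpf_mae
}
```

On these figures the corrected forecast lowers the MAE at every site, by roughly 27% to 37% depending on the location.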
