Scalable density based spatial clustering with integrated one-class SVM for noise reduction

Extracting information from data is one of the key necessities of data analysis. The unsupervised nature of data leads to complex computational methods for analysis. This paper presents NRDBSCAN, a modified variant of DBSCAN: a density-based spatial clustering technique integrated with a one-class SVM, a machine learning technique used here for noise reduction. Analysis of DBSCAN shows that it depends heavily on accurate thresholds, in the absence of which it yields suboptimal results. However, accurate threshold settings are unattainable. Noise is one of the major side effects of this threshold gap. The proposed work reduces noise by integrating a machine learning classifier into the operational structure of DBSCAN. Further, the proposed technique is parallelized using the Spark architecture, thereby increasing its scalability and its ability to handle large amounts of data. Experiments and comparisons with similar techniques indicate high scalability and high homogeneity in the clustering process.
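The threshold sensitivity described in the abstract can be illustrated with a minimal sketch of plain DBSCAN. This is not the authors' NRDBSCAN: it omits the one-class SVM step and the Spark parallelization, and the toy points, the function names (region_query, dbscan) and the parameter values are assumptions chosen only for illustration. It shows how the number of points labeled as noise changes with the distance threshold eps.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1
UNVISITED = None


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, with -1 marking noise."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise point becomes a border point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


# Hypothetical toy data: two tight groups, one loosely attached point, one outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2.5, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (5, 5)]

# Count noise points for three illustrative values of eps.
noise_by_eps = {eps: dbscan(points, eps, min_pts=3).count(NOISE)
                for eps in (0.5, 1.0, 1.5)}
print(noise_by_eps)
```

On this toy data the noise count falls from 10 to 2 to 1 as eps grows from 0.5 to 1.0 to 1.5, which is the kind of threshold dependence the abstract attributes to DBSCAN.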

Density Based Clustering with Integrated One-Class SVM for Noise Reduction

International Journal of Informatics and Communication Technology (IJ-ICT), 2017, Vol 6 (3), pp. 199
DOI: 10.11591/ijict.v6i3.pp199-208

Author(s): K. Nafees Ahmed, T. Abdul Razak

Keyword(s): Machine Learning, Support Vector Machine, Data Analysis, Noise Reduction, Spatial Clustering, Support Vector, Machine Learning Technique, Learning Classifier, Density Based Clustering, Learning Technique

Extracting information from data is one of the key necessities of data analysis. The unsupervised nature of data leads to complex computational methods for analysis. This paper presents Noise Reduced DBSCAN (NRDBSCAN), a modified variant of DBSCAN: a density-based spatial clustering technique integrated with a one-class Support Vector Machine (SVM), a machine learning technique used here for noise reduction. Analysis of DBSCAN shows that it depends heavily on accurate thresholds, in the absence of which it yields suboptimal results. However, accurate threshold settings are unattainable. Noise is one of the major side effects of this threshold gap. The proposed work reduces noise by integrating a machine learning classifier into the operational structure of DBSCAN. The experimental results indicate high homogeneity in the clustering process.

Die ontwikkeling van ’n woordafbreker en kompositumanaliseerder vir Afrikaans

Literator, 2008, Vol 29 (1), pp. 21-42
DOI: 10.4102/lit.v29i1.99
Cited By ~ 1

Author(s): S. Pilon, M.J. Puttkammer, G.B. Van Huyssteen

Keyword(s): Machine Learning, Training Data, Practical Implementation, Manual Annotation, Machine Learning Technique, Rule Based, The Core, Learning Classifier, Learning Technique, Rule Based Approach

The development of a hyphenator and compound analyser for Afrikaans

The development of two core-technologies for Afrikaans, viz. a hyphenator and a compound analyser, is described in this article. As no annotated Afrikaans data existed prior to this project to serve as training data for a machine learning classifier, the core-technologies in question are first developed using a rule-based approach. The rule-based hyphenator and compound analyser are evaluated: the hyphenator obtains an f-score of 90.84%, while the compound analyser only reaches an f-score of 78.20%. Since these results are somewhat disappointing and/or insufficient for practical implementation, it was decided that a machine learning technique (memory-based learning) would be used instead. Training data for each of the two core-technologies is then developed using “TurboAnnotate”, an interface designed to improve the accuracy and speed of manual annotation. The hyphenator developed using machine learning has been trained with 39 943 words and reaches an f-score of 98.11%, while the f-score of the compound analyser is 90.57% after being trained with 77 589 annotated words. It is concluded that machine learning (specifically memory-based learning) seems an appropriate approach for developing core-technologies for Afrikaans.
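For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an f-score is commonly computed from precision and recall. The abstract does not state which variant was used; the balanced F1 form (beta = 1) is an assumption here, and the precision and recall values in the example are hypothetical, not taken from the article.

```python
def f_score(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical values, for illustration only.
example = f_score(precision=0.9, recall=0.8)
print(f"{example:.4f}")
```

Because the harmonic mean is pulled toward the lower of the two inputs, a system cannot reach a high f-score by maximizing precision or recall alone.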

Market Data Analysis by Using Support Vector Machine Learning Technique

Proceedings of International Conference on Computational Intelligence and Data Engineering - Lecture Notes on Data Engineering and Communications Technologies, 2019, pp. 19-27
DOI: 10.1007/978-981-13-6459-4_3
Cited By ~ 1

Author(s): Raghavendra Reddy, Gopal K. Shyam

Keyword(s): Machine Learning, Support Vector Machine, Data Analysis, Support Vector, Machine Learning Technique, Market Data, Learning Technique

A Machine Learning Technique for Drill Core Hyperspectral Data Analysis

2018 9th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), 2018
DOI: 10.1109/whispers.2018.8747022
Cited By ~ 1

Author(s): Cecilia Contreras, Mahdi Khodadadzadeh, Laura Tusa, Pedram Ghamisi, Richard Gloaguen

Keyword(s): Machine Learning, Data Analysis, Hyperspectral Data, Drill Core, Machine Learning Technique, Learning Technique

What Should Investors Care About? Mutual Fund Ratings by Analysts vs. Machine Learning Technique

SSRN Electronic Journal, 2020
DOI: 10.2139/ssrn.3702749

Author(s): Si Cheng, Ruichang Lu, Xiaojun Zhang

Keyword(s): Machine Learning, Mutual Fund, Machine Learning Technique, Learning Technique

The Development of a Quantitative Precipitation Forecast Correction Technique Based on Machine Learning for Hydrological Applications

Atmosphere, 2020, Vol 11 (1), pp. 111
DOI: 10.3390/atmos11010111
Cited By ~ 2

Author(s): Chul-Min Ko, Yeong Yun Jeong, Young-Mi Lee, Byung-Sik Kim

Keyword(s): Machine Learning, Heavy Rainfall, Extreme Rainfall, Machine Learning Techniques, Precipitation Forecast, Machine Learning Technique, Rainfall Forecast, Quantitative Precipitation Forecast, Correction Technique, Learning Technique

This study aimed to enhance the accuracy of extreme rainfall forecasts, using a machine learning technique for forecasting hydrological impact. Machine learning with the XGBoost technique was applied to correct the quantitative precipitation forecast (QPF) provided by the Korea Meteorological Administration (KMA) and so develop a hydrological quantitative precipitation forecast (HQPF) for flood inundation modeling. The performance of machine learning techniques for HQPF production was evaluated with a focus on two cases: one for heavy rainfall events in Seoul and the other for heavy rainfall accompanied by Typhoon Kong-rey (1825). This study calculated well-known statistical metrics to compare the error of QPF-based rainfall and HQPF-based rainfall against the observational data from four sites. For the heavy rainfall case in Seoul, the mean absolute errors (MAE) at the four sites, i.e., Nowon, Jungnang, Dobong, and Gangnam, were 18.6 mm/3 h, 19.4 mm/3 h, 48.7 mm/3 h, and 19.1 mm/3 h for QPF and 13.6 mm/3 h, 14.2 mm/3 h, 33.3 mm/3 h, and 12.0 mm/3 h for HQPF, respectively. These results clearly indicate that the machine learning technique is able to improve the forecasting performance for localized rainfall. In addition, the HQPF-based rainfall shows better performance in capturing the peak rainfall amount and spatial pattern. Therefore, the HQPF is considered helpful for improving the accuracy of intense rainfall forecasts, which in turn benefits the forecasting of floods and their hydrological impacts.
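The size of the improvement reported for the Seoul case can be made explicit with a short calculation. The sketch below uses only the per-site MAE values quoted in the abstract; the function and variable names are illustrative assumptions, and the relative reduction is a derived figure, not one reported by the authors.

```python
def mean_absolute_error(forecast, observed):
    """MAE: average absolute difference between forecast and observation."""
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(forecast)


# MAE values in mm/3 h as quoted in the abstract (Seoul heavy rainfall case).
qpf_mae = {"Nowon": 18.6, "Jungnang": 19.4, "Dobong": 48.7, "Gangnam": 19.1}
hqpf_mae = {"Nowon": 13.6, "Jungnang": 14.2, "Dobong": 33.3, "Gangnam": 12.0}

# Relative MAE reduction of HQPF over QPF, in percent, per site.
reduction_pct = {
    site: (qpf_mae[site] - hqpf_mae[site]) / qpf_mae[site] * 100
    for site in qpf_mae
}

for site, pct in reduction_pct.items():
    print(f"{site}: {qpf_mae[site]} -> {hqpf_mae[site]} mm/3 h ({pct:.1f}% lower)")
```

Under this calculation the MAE drops by roughly 27% to 37% across the four sites, with the largest relative gain at Gangnam and the largest absolute gain at Dobong.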
