Poverty Modeling in Java Using Bayesian Spatial Probit with the Integrated Nested Laplace Approximation (INLA) Approach

Poverty is a complex, multidimensional problem and is therefore a development priority. Applications of poverty modeling to discrete data remain few, as do applications of the Bayesian paradigm. The Bayes method is a parameter estimation method that combines prior information with sample information, so that it can provide predictions with higher accuracy than classical methods. Bayesian inference using the INLA approach provides faster computation than MCMC and makes it possible to use large data sets. This study aims to model poverty in Java using the Bayesian Spatial Probit with the INLA approach and three weighting matrices: K-Nearest Neighbor (KNN), Inverse Distance, and Exponential Distance. The results show that the best model for poverty analysis in Java is the Bayesian SAR Probit INLA with the KNN weighting matrix, which produced the highest classification accuracy, with a specificity of 85.45%, a sensitivity of 93.75%, and an accuracy of 89.92%.
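The three weighting matrices named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definitions used in the study, so the forms below (binary KNN links, weights 1/d, weights exp(-alpha * d), each row-standardized) are common textbook choices assumed here for illustration; the function names and the `alpha` parameter are hypothetical.

```python
import numpy as np


def pairwise_distances(coords):
    """Euclidean distance between every pair of locations."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def row_standardize(w):
    """Scale each row to sum to 1; rows with no neighbors stay all zero."""
    sums = w.sum(axis=1, keepdims=True)
    return np.divide(w, sums, out=np.zeros_like(w), where=sums > 0)


def knn_weights(coords, k):
    """Weight 1 for each of the k nearest neighbors, 0 otherwise (assumed form)."""
    d = pairwise_distances(coords)
    np.fill_diagonal(d, np.inf)  # a location is never its own neighbor
    nearest = np.argsort(d, axis=1)[:, :k]
    w = np.zeros(d.shape)
    rows = np.repeat(np.arange(d.shape[0]), k)
    w[rows, nearest.ravel()] = 1.0
    return row_standardize(w)


def inverse_distance_weights(coords):
    """Weight 1/d for distinct locations (assumed form)."""
    d = pairwise_distances(coords)
    w = np.zeros(d.shape)
    mask = d > 0
    w[mask] = 1.0 / d[mask]
    return row_standardize(w)


def exponential_distance_weights(coords, alpha=1.0):
    """Weight exp(-alpha * d); alpha is a hypothetical decay parameter."""
    d = pairwise_distances(coords)
    w = np.exp(-alpha * d)
    np.fill_diagonal(w, 0.0)
    return row_standardize(w)
```

With centroid coordinates for each district stacked in an array, any of the three functions returns a square matrix whose rows sum to 1 and whose diagonal is zero, which is the usual input shape for a spatial autoregressive (SAR) specification.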

ClusterTree: Integration of cluster representation and nearest-neighbor search for large data sets with high dimensions

IEEE Transactions on Knowledge and Data Engineering ◽

10.1109/tkde.2003.1232281 ◽

2003 ◽

Vol 15 (5) ◽

pp. 1316-1337 ◽

Cited By ~ 24

Author(s):

Dantong Yu ◽

Aidong Zhang

Keyword(s):

Nearest Neighbor ◽

Large Data ◽

Nearest Neighbor Search ◽

Large Data Sets ◽

Data Sets ◽

High Dimensions ◽

Neighbor Search ◽

Cluster Representation

Scalable Non-Parametric Methods for Large Data Sets

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch260 ◽

2011 ◽

pp. 1708-1713

Author(s):

V. Suresh Babu ◽

P. Viswanath ◽

Narasimha M. Murty

Keyword(s):

Nearest Neighbor ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Parametric Methods ◽

Clustering Method ◽

Data Set ◽

Computational Burden ◽

Set Size ◽

Non Parametric

Non-parametric methods such as the nearest neighbor classifier (NNC) and Parzen-window density estimation (Duda, Hart & Stork, 2000) are more general than parametric methods because they make no assumptions about the form of the probability distribution. These methods, either explicitly or implicitly, estimate the probability density at a given point in a feature space by counting the number of points that fall in a small region around that point. Popular classifiers that use this approach are the NNC and its variants, such as the k-nearest neighbor classifier (k-NNC) (Duda, Hart & Stork, 2000); DBSCAN is a popular density-based clustering method (Han & Kamber, 2001) that uses the same approach. These methods perform well in practice, especially with larger data sets. The asymptotic error rate of NNC is less than twice the Bayes error (Cover & Hart, 1967), and DBSCAN can find arbitrarily shaped clusters while also detecting noisy outliers (Ester, Kriegel & Xu, 1996). The most prominent difficulty in applying non-parametric methods to large data sets is their computational burden. The space and classification time complexities of NNC and k-NNC are O(n), where n is the training set size, and the time complexity of DBSCAN is O(n^2), so these methods do not scale to large data sets. Some remedies to reduce this burden are as follows. (1) Reduce the training set size with editing techniques that eliminate training patterns which are redundant in some sense (Dasarathy, 1991). For example, the condensed NNC (Hart, 1968) is of this type. (2) Use only a few selected prototypes from the data set. For example, the leaders-subleaders method and the l-DBSCAN method are of this type (Vijaya, Murthy & Subramanian, 2004; Viswanath & Rajwala, 2006). These two remedies can reduce the computational burden, but they can also degrade the performance of the method.
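The O(n) classification cost mentioned above is easy to see in a brute-force k-NNC, sketched below under simple assumptions (Euclidean distance, majority vote). The function name `knn_classify` is illustrative and does not come from the chapter.

```python
from collections import Counter

import numpy as np


def knn_classify(train_x, train_y, query, k=3):
    """Brute-force k-NNC: one distance per training pattern, so O(n) per query."""
    train_x = np.asarray(train_x, dtype=float)
    query = np.asarray(query, dtype=float)
    # Every training pattern is visited once; this is the O(n) term.
    dists = np.linalg.norm(train_x - query, axis=1)
    # Indices of the k smallest distances (requires k <= n).
    nearest = np.argpartition(dists, k - 1)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

Because the whole training set must be stored and scanned for each query, both memory and query time grow linearly with n, which is what the editing and prototype-selection remedies try to avoid.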
Using enriched prototypes can improve performance, as in Asharaf & Murthy (2003), where the prototypes are derived using adaptive rough fuzzy set theory, and in Suresh Babu & Viswanath (2007), where the prototypes are used along with their relative weights. Using a few selected prototypes can reduce the computational burden. Prototypes can be derived with a clustering method such as the leaders method (Spath, 1980) or the k-means method (Jain, Dubes, & Chen, 1987), which finds a partition of the data set in which each block (cluster) is represented by a prototype called a leader, centroid, etc. But these prototypes cannot be used to estimate the probability density, since the density information present in the data set is lost while deriving them. The chapter proposes a modified leader clustering method, called the counted-leader method, which derives the leaders while preserving the crucial density information in the form of a count that can be used to estimate densities. The chapter presents a fast and efficient nearest-prototype classifier called the counted k-nearest leader classifier (ck-NLC), which is on par with the conventional k-NNC but considerably faster. The chapter also presents a density-based clustering method called l-DBSCAN, which is shown to be a faster and scalable version of DBSCAN (Viswanath & Rajwala, 2006). Formally, under some assumptions, it is shown that the number of leaders is upper-bounded by a constant that is independent of the data set size and of the distribution from which the data set is drawn.
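The idea of keeping a count with each leader can be sketched as follows. The leader scan follows the usual single-pass leaders method with a distance threshold `tau`; the classifier is only an assumed illustration in which the k nearest leaders vote with their counts as weights, since the abstract does not spell out the ck-NLC decision rule. All names here are hypothetical.

```python
import numpy as np


def counted_leaders(points, tau):
    """Single scan: a point joins the first leader within tau, else becomes a leader."""
    leaders, counts = [], []
    for x in np.asarray(points, dtype=float):
        for i, leader in enumerate(leaders):
            if np.linalg.norm(x - leader) <= tau:
                counts[i] += 1  # the count keeps the density information
                break
        else:
            leaders.append(x)
            counts.append(1)
    return np.array(leaders), np.array(counts)


def fit_leaders(points, labels, tau):
    """Derive counted leaders separately for each class."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    all_leaders, all_counts, leader_labels = [], [], []
    for cls in np.unique(labels):
        leaders, counts = counted_leaders(points[labels == cls], tau)
        all_leaders.append(leaders)
        all_counts.append(counts)
        leader_labels.extend([cls] * len(leaders))
    return np.vstack(all_leaders), np.concatenate(all_counts), np.array(leader_labels)


def predict(query, leaders, counts, leader_labels, k=1):
    """Assumed rule: the k nearest leaders vote, each weighted by its count."""
    dists = np.linalg.norm(leaders - np.asarray(query, dtype=float), axis=1)
    votes = {}
    for i in np.argsort(dists)[:k]:
        votes[leader_labels[i]] = votes.get(leader_labels[i], 0) + counts[i]
    return max(votes, key=votes.get)
```

The counts always sum to the number of training patterns, and a query is compared only against the leaders, so the cost per query depends on the number of leaders and not on n.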

Fast Nearest Neighbor Condensation for Large Data Sets Classification

IEEE Transactions on Knowledge and Data Engineering ◽

10.1109/tkde.2007.190645 ◽

2007 ◽

Vol 19 (11) ◽

pp. 1450-1464 ◽

Cited By ~ 112

Author(s):

Fabrizio Angiulli

Keyword(s):

Nearest Neighbor ◽

Large Data ◽

Large Data Sets ◽

Data Sets

Nearest Neighbor-Based Clustering Algorithm for Large Data Sets

Advances in Intelligent Systems and Computing - Advances in Computer Communication and Computational Sciences ◽

10.1007/978-981-13-0344-9_6 ◽

2018 ◽

pp. 73-84

Author(s):

Yadav Pankaj Kumar ◽

Sriniwas Pandey ◽

Mamata Samal ◽

Mohanty Sraban Kumar

Keyword(s):

Clustering Algorithm ◽

Nearest Neighbor ◽

Large Data ◽

Large Data Sets ◽

Data Sets

Clustering Based on a Novel Density Estimation Method

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.748.590 ◽

2013 ◽

Vol 748 ◽

pp. 590-594

Author(s):

Li Liao ◽

Yong Gang Lu ◽

Xu Rong Chen

Keyword(s):

Density Estimation ◽

Nearest Neighbor ◽

Mean Shift ◽

Estimation Method ◽

Synthetic Data ◽

Real Data ◽

Data Sets ◽

Clustering Methods ◽

K Nearest Neighbor ◽

Data Set

We propose a novel density estimation method that uses both the k-nearest neighbor (KNN) graph and the potential field of the data points to capture the local and global data distribution information, respectively. Clustering is performed based on the computed density values. A forest of trees is built using each data point as a tree node, and the clusters are formed according to the trees in the forest. The new clustering method is evaluated by comparing it with three popular clustering methods: K-means++, Mean Shift, and DBSCAN. Experiments on two synthetic data sets and one real data set show that our approach can effectively improve the clustering results.
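For orientation, the classical k-nearest-neighbor density estimate can be sketched as below. This is the standard textbook estimator, not the authors' combined KNN-graph and potential-field method, whose details the abstract does not give; the function name is illustrative.

```python
import math

import numpy as np


def knn_density(points, k):
    """Classical k-NN estimate: k / (n * volume of the ball reaching the k-th neighbor)."""
    points = np.asarray(points, dtype=float)
    n, dim = points.shape
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # After sorting, column 0 is the point itself (distance 0), column k the k-th neighbor.
    r_k = np.sort(dist, axis=1)[:, k]
    unit_ball = math.pi ** (dim / 2) / math.gamma(dim / 2 + 1)
    # Duplicate points can give r_k == 0 and hence an infinite estimate.
    return k / (n * unit_ball * r_k ** dim)
```

Points inside a tight group get a small radius and a high density value, while isolated points get a large radius and a low value, which is the local information a density-based clustering step can build on.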

Distributed Nearest Neighbor-Based Condensation of Very Large Data Sets

IEEE Transactions on Knowledge and Data Engineering ◽

10.1109/tkde.2007.190665 ◽

2007 ◽

Vol 19 (12) ◽

pp. 1593-1606 ◽

Cited By ~ 22

Author(s):

F. Angiulli ◽

G. Folino

Keyword(s):

Nearest Neighbor ◽

Large Data ◽

Large Data Sets ◽

Data Sets

Wrapper Feature Selection based on Genetic Algorithm for Recognizing Objects from Satellite Imagery

Journal of Information Technology Research ◽

10.4018/jitr.2015070101 ◽

2015 ◽

Vol 8 (3) ◽

pp. 1-20 ◽

Cited By ~ 1

Author(s):

Nabil M. Hewahi ◽

Eyad A. Alashqar

Keyword(s):

Feature Selection ◽

Satellite Imagery ◽

Nearest Neighbor ◽

Large Data ◽

Research Area ◽

Small Subset ◽

Data Sets ◽

K Nearest Neighbor ◽

Spatial Features ◽

Maintenance Cycle

Object recognition is a research area that aims to associate objects with categories or classes. Recognizing object-specific geospatial features, such as roads, buildings, and rivers, from high-resolution satellite imagery is a time-consuming and expensive problem in the maintenance cycle of a Geographic Information System (GIS). Feature selection is the task of selecting a small subset of the original features that achieves maximum classification accuracy while reducing data dimensionality. Such a subset has important benefits: it reduces the computational complexity of learning algorithms, saves time, improves accuracy, and can offer insight to the people working in the problem domain. This makes feature selection an indispensable step in classification. In this work, the authors propose a new approach that combines Genetic Algorithms (GA) with a Correlation Ranking Filter (CRF) wrapper to eliminate unimportant features and obtain a better feature set that yields better results with various classifiers, such as Neural Networks (NN), K-nearest neighbor (KNN), and decision trees. The approach uses GA as an optimization algorithm to search the space of all possible subsets of the object geospatial feature set for the purpose of recognition. The GA is wrapped around three classifier algorithms, namely a neural network, k-nearest neighbor, and the J48 decision tree, as the subset-evaluating mechanism. The GA-ANN, GA-KNN, and GA-J48 methods are implemented using the WEKA software on a data set containing 38 features extracted from satellite images using the ENVI software. The proposed wrapper approach incorporates the CRF for spatial features to remove unimportant features. Results suggest that GA-based neural classifiers, combined with CRF for spatial features, are robust and effective in finding optimal subsets of features from large data sets.
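A wrapper of this kind can be sketched in a few lines. The version below is a simplified illustration and not the authors' WEKA setup: it assumes a 1-nearest-neighbor classifier scored by leave-one-out accuracy as the fitness function, tournament selection, one-point crossover, bit-flip mutation, and elitism, and it omits the CRF step. All names and parameter values are hypothetical.

```python
import numpy as np


def loo_1nn_accuracy(x, y, mask):
    """Fitness: leave-one-out accuracy of 1-NN using only the masked features."""
    if not mask.any():
        return 0.0
    xs = x[:, mask]
    d = np.linalg.norm(xs[:, None, :] - xs[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample may not be its own neighbor
    return float(np.mean(y[d.argmin(axis=1)] == y))


def ga_select(x, y, pop_size=20, generations=15, p_mut=0.1, seed=0):
    """GA over boolean feature masks; needs at least two features."""
    x, y = np.asarray(x, dtype=float), np.asarray(y)
    rng = np.random.default_rng(seed)
    n_feat = x.shape[1]
    pop = rng.random((pop_size, n_feat)) < 0.5
    pop[0] = True  # include the full feature set as a baseline individual
    for _ in range(generations):
        scores = np.array([loo_1nn_accuracy(x, y, m) for m in pop])
        pop = pop[np.argsort(scores)[::-1]]  # best individual first
        children = [pop[0].copy()]  # elitism: the best mask always survives
        while len(children) < pop_size:
            # Tournament of two: the lower index is the fitter individual.
            a, b = (pop[min(rng.integers(0, pop_size, 2))] for _ in range(2))
            cut = rng.integers(1, n_feat)  # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            children.append(child ^ (rng.random(n_feat) < p_mut))  # mutation
        pop = np.array(children)
    scores = np.array([loo_1nn_accuracy(x, y, m) for m in pop])
    best = int(scores.argmax())
    return pop[best], float(scores[best])
```

Because the full feature set is placed in the initial population and the best mask is carried over each generation, the returned score is never worse than using all features under the same fitness function.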

An example of spectrum imaging used for comparison of EELS quantitative analysis techniques on Al-Li

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s042482010008794x ◽

1991 ◽

Vol 49 ◽

pp. 726-727

Author(s):

John A. Hunt

Keyword(s):

Quantitative Analysis ◽

Large Data ◽

Difference Spectrum ◽

Large Data Sets ◽

Foil Thickness ◽

Data Sets ◽

Analysis Techniques ◽

Spectrum Imaging ◽

Normal Spectrum ◽

Electron Energy Loss

Spectrum-imaging is a useful technique for comparing different processing methods on very large data sets that are identical for each method. This paper compares methods of electron energy-loss spectroscopy (EELS) quantitative analysis on the Al-Li system. The spectrum-image analyzed here was obtained from an Al-10at%Li foil aged to produce δ' precipitates that can span the foil thickness. Two 1024-channel EELS spectra offset in energy by 1 eV were recorded and stored at each pixel of the 80x80 spectrum-image (25 Mbytes). An energy range of 39-89 eV (20 channels/eV) is represented. During processing, the spectra are either subtracted to create an artifact-corrected difference spectrum, or the energy offset is numerically removed and the spectra are added to create a normal spectrum. The spectrum-images are processed into 2D floating-point images using methods and software described in [1].
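The two processing paths can be sketched with array operations. The sketch assumes that the second spectrum is shifted up in energy by 1 eV, so that at 20 channels/eV its channel i lines up with channel i + 20 of the first; the direction of the shift and the 2 bytes per channel used in the size check are assumptions, and the function names are illustrative.

```python
import numpy as np

CHANNELS_PER_EV = 20
OFFSET_CHANNELS = 1 * CHANNELS_PER_EV  # 1 eV offset expressed in channels


def difference_spectrum(first, second):
    """Channel-by-channel subtraction of the two offset spectra."""
    return np.asarray(first, dtype=float) - np.asarray(second, dtype=float)


def normal_spectrum(first, second, offset=OFFSET_CHANNELS):
    """Remove the energy offset, then add the spectra over their overlap.

    Assumes second[i] covers the same energy as first[i + offset].
    Works on a single spectrum or on a whole (rows, cols, channels) image.
    """
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    return first[..., offset:] + second[..., :-offset]


def spectrum_image_mib(width=80, height=80, spectra_per_pixel=2,
                       channels=1024, bytes_per_channel=2):
    """Storage estimate; bytes_per_channel=2 is an assumption."""
    total = width * height * spectra_per_pixel * channels * bytes_per_channel
    return total / 2 ** 20
```

With these numbers the storage estimate comes to 25 MiB, which is consistent with the 25 Mbytes quoted in the abstract if each channel is stored in 2 bytes.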

Cluster analysis for large data sets: applications to individual aerosol particles from the mid-pacific

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100132078 ◽

1992 ◽

Vol 50 (2) ◽

pp. 1488-1489

Author(s):

Thomas W. Shattuck ◽

James R. Anderson ◽

Neil W. Tindale ◽

Peter R. Buseck

Keyword(s):

Cluster Analysis ◽

Chemical Reactivity ◽

Large Data ◽

Large Data Sets ◽

Particle Analysis ◽

Data Sets ◽

Halogen Chemistry ◽

Complete Study ◽

Components Analysis ◽

Automated Scanning

Individual particle analysis involves the study of tens of thousands of particles using automated scanning electron microscopy and elemental analysis by energy-dispersive X-ray emission spectroscopy (EDS). EDS produces large data sets that must be analyzed using multivariate statistical techniques. A complete study uses cluster analysis, discriminant analysis, and factor or principal components analysis (PCA). The three techniques are used in the study of particles sampled during the FeLine cruise to the mid-Pacific Ocean in the summer of 1990. The mid-Pacific aerosol provides information on long-range particle transport, iron deposition, sea salt ageing, and halogen chemistry. Aerosol particle data sets present a number of difficulties for pattern recognition using cluster analysis. There is a great disparity in the number of observations per cluster and in the range of the variables in each cluster. The variables are not normally distributed, they are subject to considerable experimental error, and many values are zero because of finite detection limits. Many of the clusters show considerable overlap because of natural variability, agglomeration, and chemical reactivity.
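As a rough illustration of the PCA step, the sketch below standardizes each variable before the decomposition, a common choice when variable ranges differ widely. It is a generic implementation via singular value decomposition, assumed for illustration and not the authors' software; the function name is hypothetical.

```python
import numpy as np


def pca(data, n_components=2):
    """PCA of standardized variables via SVD; rows are particles, columns variables."""
    data = np.asarray(data, dtype=float)
    std = data.std(axis=0)
    std[std == 0] = 1.0  # constant columns (e.g. all below detection) stay at zero
    z = (data - data.mean(axis=0)) / std
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)  # fraction of variance per component
    components = vt[:n_components]
    scores = z @ components.T
    return scores, components, explained
```

The returned scores can be passed to a clustering routine, and the explained-variance fractions indicate how many components are worth keeping.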

Faculty Opinions recommendation of Detecting novel associations in large data sets.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.13805958.793484294 ◽

2014 ◽

Author(s):

Daniel Lee

Keyword(s):

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Novel Associations
