An Efficient Approach of Extracting Frequent Itemsets from Large Data Using HDFS Framework

Author(s):
Prajakta G. Kulkarni
S. R. Khonde
Author(s):
Weigang Huo
Xingjie Feng
Zhiyuan Zhang

Keeping the generated fuzzy frequent itemsets up to date and discovering new fuzzy frequent itemsets are challenging problems in dynamic databases. In this paper, the classical H-struct structure is extended to mine fuzzy frequent itemsets. The extended H-mine algorithm can use any t-norm operator to calculate the support of a fuzzy itemset. Two FP-tree-based structures, the Initial-FP-tree and the New-FP-tree, are built to maintain the fuzzy frequent itemsets in the original database and in the newly inserted transactions, respectively. Incremental mining of fuzzy frequent itemsets is achieved by breadth-first traversal of the Initial-FP-tree and the New-FP-tree. All of the fuzzy frequent itemsets in the updated database can then be obtained by traversing the Initial-FP-tree. Experiments on real datasets show that the proposed approach runs faster than the batch extended H-mine algorithm. Compared with the existing algorithm for incremental mining of fuzzy frequent itemsets, the proposed approach is superior in execution time, and its memory cost is lower when the minimum support threshold is low.
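The abstract states that the extended H-mine algorithm can use any t-norm to compute the support of a fuzzy itemset. The sketch below illustrates only that support computation, assuming the common definition of fuzzy support as the sum over all transactions of the t-norm of the itemset's membership degrees, divided by the number of transactions. The function names are hypothetical, and the sketch does not reproduce the H-struct, the Initial-FP-tree, or the New-FP-tree. It also shows why stored fuzzy counts help incremental mining: the count over the updated database is the stored count plus the count over the inserted transactions alone.

```python
from typing import Callable, Dict, Iterable, List, Sequence

# A fuzzy transaction maps each item (e.g. a fuzzy region such as "price.low")
# to its membership degree in [0, 1]; absent items have membership 0.
FuzzyTransaction = Dict[str, float]
TNorm = Callable[[float, float], float]


def t_min(a: float, b: float) -> float:
    """Minimum (Gödel) t-norm."""
    return min(a, b)


def t_product(a: float, b: float) -> float:
    """Product (algebraic) t-norm."""
    return a * b


def t_lukasiewicz(a: float, b: float) -> float:
    """Lukasiewicz t-norm."""
    return max(0.0, a + b - 1.0)


def fuzzy_count(itemset: Sequence[str],
                transactions: Iterable[FuzzyTransaction],
                tnorm: TNorm) -> float:
    """Sum over transactions of the t-norm of the itemset's membership degrees."""
    total = 0.0
    for t in transactions:
        degree = 1.0  # 1 is the identity element of every t-norm
        for item in itemset:
            degree = tnorm(degree, t.get(item, 0.0))
        total += degree
    return total


def fuzzy_support(itemset: Sequence[str],
                  transactions: List[FuzzyTransaction],
                  tnorm: TNorm) -> float:
    """Fuzzy count divided by the number of transactions."""
    if not transactions:
        return 0.0
    return fuzzy_count(itemset, transactions, tnorm) / len(transactions)


def updated_support(count_old: float, n_old: int,
                    count_new: float, n_new: int) -> float:
    """Support in the updated database, computed from the stored count of the
    original database plus the count over the inserted transactions only."""
    n = n_old + n_new
    return (count_old + count_new) / n if n else 0.0
```

For example, for the hypothetical transactions `{"a": 0.8, "b": 0.5}`, `{"a": 0.4, "b": 0.9}` and `{"b": 1.0}`, the itemset {a, b} has fuzzy support (0.5 + 0.4 + 0)/3 = 0.3 under the minimum t-norm and (0.4 + 0.36 + 0)/3 ≈ 0.253 under the product t-norm, so the choice of t-norm changes which itemsets pass a given minimum support threshold.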


2005, Vol 149 (1), pp. 17-29
Author(s):
Dieter Typke
Robert A. Nordmeyer
Arthur Jones
Juyoung Lee
Agustin Avila-Sakar
...

Author(s):  
John A. Hunt

Spectrum-imaging is a useful technique for comparing different processing methods on very large data sets that are identical for each method. This paper compares methods of electron energy-loss spectroscopy (EELS) quantitative analysis on the Al-Li system. The spectrum-image analyzed here was obtained from an Al-10 at.% Li foil aged to produce δ' precipitates that can span the foil thickness. Two 1024-channel EELS spectra, offset in energy by 1 eV, were recorded and stored at each pixel of the 80 × 80 spectrum-image (25 Mbytes). An energy range of 39-89 eV (20 channels/eV) is represented. During processing, the spectra are either subtracted to create an artifact-corrected difference spectrum, or the energy offset is numerically removed and the spectra are added to create a normal spectrum. The spectrum-images are processed into 2D floating-point images using the methods and software described in [1].
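The two per-pixel processing routes described above can be sketched as follows. This is a minimal illustration, not the software of [1]: it assumes that channel `i + shift` of the second spectrum records the same energy loss as channel `i` of the first (the direction of the offset is an assumption), that the 1 eV offset equals 20 channels at 20 channels/eV, and that any artifact to be removed is additive and fixed to detector channels. The function names and the integration-window image are hypothetical.

```python
from typing import List, Sequence, Tuple

CHANNELS_PER_EV = 20                  # from the text: 20 channels/eV
OFFSET_EV = 1                         # the two spectra are offset by 1 eV
SHIFT = CHANNELS_PER_EV * OFFSET_EV   # the offset expressed in channels (20)


def difference_spectrum(s1: Sequence[float], s2: Sequence[float]) -> List[float]:
    """Channel-by-channel difference. An additive fixed-pattern artifact tied to
    detector channels appears identically in both spectra and cancels, while
    real spectral features, which move with the energy offset, do not."""
    if len(s1) != len(s2):
        raise ValueError("spectra must have the same number of channels")
    return [a - b for a, b in zip(s1, s2)]


def summed_spectrum(s1: Sequence[float], s2: Sequence[float],
                    shift: int = SHIFT) -> List[float]:
    """Remove the energy offset numerically, then add the two spectra.

    Assumes channel i + shift of s2 records the same energy loss as channel i
    of s1; only the overlapping channels are kept."""
    if len(s1) != len(s2):
        raise ValueError("spectra must have the same number of channels")
    if not 0 <= shift < len(s1):
        raise ValueError("shift must be smaller than the spectrum length")
    return [s1[i] + s2[i + shift] for i in range(len(s1) - shift)]


def window_image(cube: Sequence[Sequence[Tuple[Sequence[float], Sequence[float]]]],
                 lo: int, hi: int, shift: int = SHIFT) -> List[List[float]]:
    """Integrate the summed spectrum over channels [lo, hi) at every pixel,
    giving a 2-D floating-point image; cube[y][x] is an (s1, s2) pair."""
    return [[float(sum(summed_spectrum(s1, s2, shift)[lo:hi])) for s1, s2 in row]
            for row in cube]
```

For the 80 × 80 spectrum-image, `cube` would hold 6400 pixel pairs of 1024-channel spectra; the same loop structure applies to the difference spectrum when artifact correction is wanted.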


Author(s):
Thomas W. Shattuck
James R. Anderson
Neil W. Tindale
Peter R. Buseck

Individual particle analysis involves the study of tens of thousands of particles using automated scanning electron microscopy and elemental analysis by energy-dispersive X-ray emission spectroscopy (EDS). EDS produces large data sets that must be analyzed using multivariate statistical techniques. A complete study uses cluster analysis, discriminant analysis, and factor or principal components analysis (PCA). The three techniques are applied here to particles sampled during the FeLine cruise to the mid-Pacific Ocean in the summer of 1990. The mid-Pacific aerosol provides information on long-range particle transport, iron deposition, sea-salt ageing, and halogen chemistry.

Aerosol particle data sets pose a number of difficulties for pattern recognition by cluster analysis. The number of observations per cluster, and the range of the variables within each cluster, vary greatly. The variables are not normally distributed, they are subject to considerable experimental error, and many values are zero because of finite detection limits. Many of the clusters overlap considerably because of natural variability, agglomeration, and chemical reactivity.
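The abstract names cluster analysis but not a specific clustering algorithm. As a hedged illustration, the sketch below z-scores each elemental variable, which addresses the disparity in variable ranges noted above, and then applies plain k-means (Lloyd's algorithm). k-means is a stand-in chosen for brevity, not necessarily the method used in the study, and it assumes compact, roughly spherical clusters, an assumption the overlapping aerosol clusters described above may violate. The function names and any input data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def standardize(data: Sequence[Sequence[float]]) -> List[Vector]:
    """Z-score each variable so that elements with large count ranges do not
    dominate the Euclidean distance."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((row[j] - means[j]) ** 2 for row in data) / n
        stds.append(math.sqrt(var) or 1.0)  # constant (e.g. all-zero) columns map to 0
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in data]


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(data: Sequence[Sequence[float]], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Lloyd's k-means with deterministic initialisation from the first k points
    (a simplification; k-means++ or multiple restarts are more robust)."""
    centroids = [list(p) for p in data[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in data]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Standardizing first matters here because raw EDS intensities for a major element can be orders of magnitude larger than those of trace elements; without it, the distance would effectively ignore the trace elements.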


Author(s):  
Hakan Ancin

This paper presents methods for detailed, quantitative, automated three-dimensional (3-D) analysis of cell populations in thick tissue sections while preserving the relative 3-D locations of cells. Specifically, the method disambiguates overlapping clusters of cells and accurately measures the volume, 3-D location, and shape parameters of each cell. Finally, the entire population of cells is analyzed to detect patterns and groupings with respect to various combinations of cell properties. All of this is accomplished without subjective bias.

In this method, a laser-scanning confocal light microscope (LSCM) is used to collect optical sections through the entire thickness (100-500 μm) of fluorescently labelled tissue slices. The acquired stack of optical slices is first subjected to axial deblurring using the expectation-maximization (EM) algorithm. The resulting isotropic 3-D image is segmented using a spatially adaptive, Poisson-based image segmentation algorithm with region-dependent smoothing parameters. Extracting the voxels labelled as "foreground" into an active voxel data structure yields a large data reduction.
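For Poisson-distributed image data with a known point-spread function (PSF), the EM deblurring iteration takes the well-known Richardson-Lucy multiplicative form. The sketch below applies it to a single 1-D axial (z) intensity profile, assuming a known, normalized PSF, and then illustrates the active-voxel idea as a sparse list of foreground voxels. It is an illustration under those assumptions, not the paper's 3-D implementation: the function names are hypothetical, and the intensity threshold in `active_voxels` stands in for the Poisson-based segmentation step.

```python
from typing import List, Sequence, Tuple


def convolve_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """1-D convolution with a centred, odd-length kernel and zero padding;
    the output has the same length as the input."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            k = i + half - j  # out[i] = sum_j kernel[j] * signal[i - (j - half)]
            if 0 <= k < n:
                acc += w * signal[k]
        out.append(acc)
    return out


def richardson_lucy_1d(observed: Sequence[float], psf: Sequence[float],
                       iterations: int = 50, eps: float = 1e-12) -> List[float]:
    """EM (Richardson-Lucy) deconvolution of one axial intensity profile,
    assuming Poisson noise and a known, normalised PSF."""
    flipped = list(psf)[::-1]
    mean = sum(observed) / len(observed)
    estimate = [max(mean, eps)] * len(observed)  # flat, strictly positive start
    for _ in range(iterations):
        blurred = convolve_same(estimate, psf)
        ratio = [o / max(b, eps) for o, b in zip(observed, blurred)]
        correction = convolve_same(ratio, flipped)
        estimate = [e * c for e, c in zip(estimate, correction)]  # stays >= 0
    return estimate


def active_voxels(volume: Sequence[Sequence[Sequence[float]]],
                  threshold: float) -> List[Tuple[int, int, int, float]]:
    """Keep only foreground voxels as (z, y, x, value) tuples, replacing the
    dense 3-D array with a sparse list (a simple threshold stands in for the
    segmentation step)."""
    return [(z, y, x, v)
            for z, plane in enumerate(volume)
            for y, row in enumerate(plane)
            for x, v in enumerate(row)
            if v > threshold]
```

The multiplicative update keeps every estimate non-negative, which suits photon-count data, and with a delta PSF it returns the observed profile unchanged. The data reduction from `active_voxels` grows with the fraction of background voxels in the stack.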

