A Parallel Algorithm to Generate Formal Concepts for Large Data

Author(s):  
Huaiguo Fu ◽  
Engelbert Mephu Nguifo
2021 ◽  
Vol 30 (1) ◽  
pp. 479-486
Author(s):  
Lingrui Bu ◽  
Hui Zhang ◽  
Haiyan Xing ◽  
Lijun Wu

Abstract: The efficient processing of large-scale data has important practical value. In this study, a data mining platform based on the Hadoop distributed file system was designed, and the K-means algorithm was improved using the max-min distance idea. On this platform, the algorithm was parallelized with MapReduce. Finally, the data processing performance of the algorithm was analyzed on the Iris data set. The results showed that the parallel algorithm classified more samples correctly than the traditional algorithm; in the single-machine environment, the parallel algorithm ran longer; on large data sets, the traditional algorithm ran out of memory, whereas the parallel algorithm completed the calculation task; and the speedup ratio of the parallel algorithm rose with cluster size and data set size, showing a good parallel effect. The experimental results verify the reliability of the parallel algorithm in big data processing, which contributes to further improving the efficiency of data mining.
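The abstract does not give the details of the improved algorithm, so the following is only a minimal single-machine sketch of what seeding K-means with a max-min distance rule might look like. The function names (`max_min_seeds`, `kmeans`), the choice of the first point as the initial center, and the fixed iteration count are all assumptions made for illustration, not the authors' implementation. In a MapReduce version, the assignment loop is the kind of work a map phase would typically take on, with the averaging done in a reduce phase.

```python
import math


def max_min_seeds(points, k):
    """Pick k initial centers with a max-min distance rule (illustrative)."""
    # Assumption: start from the first point; the abstract does not say
    # how the first center is chosen.
    centers = [points[0]]
    while len(centers) < k:
        # For every point, find the distance to its nearest chosen center,
        # then take the point for which that distance is largest.
        farthest = max(
            points,
            key=lambda p: min(math.dist(p, c) for c in centers),
        )
        centers.append(farthest)
    return centers


def kmeans(points, k, iterations=20):
    """Plain K-means (Lloyd iterations) seeded by max_min_seeds."""
    centers = max_min_seeds(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster;
        # an empty cluster keeps its previous center.
        centers = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (21, 0)]
    final_centers, _ = kmeans(sample, 3)
    print(sorted(final_centers))
```

Spreading the seeds apart in this way is meant to avoid starting with two centers inside the same natural group, which random seeding can do.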


2017 ◽  
Vol 117 ◽  
pp. 46-55 ◽  
Author(s):  
Anders L. Madsen ◽  
Frank Jensen ◽  
Antonio Salmerón ◽  
Helge Langseth ◽  
Thomas D. Nielsen

Author(s):  
John A. Hunt

Spectrum-imaging is a useful technique for comparing different processing methods on very large data sets which are identical for each method. This paper is concerned with comparing methods of electron energy-loss spectroscopy (EELS) quantitative analysis on the Al-Li system. The spectrum-image analyzed here was obtained from an Al-10at%Li foil aged to produce δ' precipitates that can span the foil thickness. Two 1024-channel EELS spectra offset in energy by 1 eV were recorded and stored at each pixel in the 80x80 spectrum-image (25 Mbytes). An energy range of 39-89 eV (20 channels/eV) is represented. During processing, the spectra are either subtracted to create an artifact-corrected difference spectrum, or the energy offset is numerically removed and the spectra are added to create a normal spectrum. The spectrum-images are processed into 2D floating-point images using methods and software described in [1].
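The two processing paths described above can be sketched for a single pixel as follows. This is a hedged illustration, not the software of [1]: the function names, the direction of the energy offset (channel i of the second spectrum is assumed to see the same energy as channel i + 20 of the first), and the 2-byte sample size used in the storage check are all assumptions. The 20-channel shift follows from the stated 1 eV offset at 20 channels/eV.

```python
CHANNELS = 1024
CHANNELS_PER_EV = 20
OFFSET_EV = 1
SHIFT = CHANNELS_PER_EV * OFFSET_EV  # 1 eV offset = 20 channels


def difference_spectrum(first, second):
    """Channel-by-channel subtraction of the two offset spectra."""
    return [a - b for a, b in zip(first, second)]


def normal_spectrum(first, second, shift=SHIFT):
    """Remove the energy offset, then add the two spectra.

    Assumption: channel i of `second` corresponds to the same energy as
    channel i + shift of `first`. Only the overlapping channels are kept.
    """
    overlap = len(first) - shift
    return [first[i + shift] + second[i] for i in range(overlap)]


def spectrum_image_bytes(width=80, height=80, spectra_per_pixel=2,
                         channels=CHANNELS, bytes_per_channel=2):
    """Raw storage size; 2 bytes per channel is an assumed sample size."""
    return width * height * spectra_per_pixel * channels * bytes_per_channel


if __name__ == "__main__":
    # Synthetic pixel: the "true" signal at global channel n is simply n.
    first = list(range(CHANNELS))
    second = [n + SHIFT for n in range(CHANNELS)]
    print(len(normal_spectrum(first, second)))
    print(spectrum_image_bytes() / 2**20, "MiB")
```

Under the assumed 2-byte samples, the storage check gives 25 MiB, which agrees with the 25 Mbytes quoted for the 80x80 spectrum-image.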


Author(s):  
Thomas W. Shattuck ◽  
James R. Anderson ◽  
Neil W. Tindale ◽  
Peter R. Buseck

Individual particle analysis involves the study of tens of thousands of particles using automated scanning electron microscopy and elemental analysis by energy-dispersive, x-ray emission spectroscopy (EDS). EDS produces large data sets that must be analyzed using multi-variate statistical techniques. A complete study uses cluster analysis, discriminant analysis, and factor or principal components analysis (PCA). The three techniques are used in the study of particles sampled during the FeLine cruise to the mid-Pacific ocean in the summer of 1990. The mid-Pacific aerosol provides information on long range particle transport, iron deposition, sea salt ageing, and halogen chemistry. Aerosol particle data sets suffer from a number of difficulties for pattern recognition using cluster analysis. There is a great disparity in the number of observations per cluster and the range of the variables in each cluster. The variables are not normally distributed, they are subject to considerable experimental error, and many values are zero, because of finite detection limits. Many of the clusters show considerable overlap, because of natural variability, agglomeration, and chemical reactivity.


Author(s):  
Hakan Ancin

This paper presents methods for performing detailed, quantitative, automated three-dimensional (3-D) analysis of cell populations in thick tissue sections while preserving the relative 3-D locations of cells. Specifically, the method disambiguates overlapping clusters of cells, and accurately measures the volume, 3-D location, and shape parameters for each cell. Finally, the entire population of cells is analyzed to detect patterns and groupings with respect to various combinations of cell properties. All of the above is accomplished with zero subjective bias. In this method, a laser-scanning confocal light microscope (LSCM) is used to collect optical sections through the entire thickness (100-500 μm) of fluorescently-labelled tissue slices. The acquired stack of optical slices is first subjected to axial deblurring using the expectation maximization (EM) algorithm. The resulting isotropic 3-D image is segmented using a spatially-adaptive Poisson-based image segmentation algorithm with region-dependent smoothing parameters. Extracting the voxels that were labelled as "foreground" into an active voxel data structure results in a large data reduction.
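The abstract does not describe the layout of the active voxel data structure, so the sketch below only illustrates the general idea: keep the foreground voxels, keyed by their 3-D coordinates, and drop the background. The dictionary representation, the function names, and the nested-list volume format are assumptions chosen for illustration.

```python
def active_voxels(labels, intensities):
    """Collect foreground voxels into a sparse {(z, y, x): intensity} map.

    `labels` and `intensities` are nested lists indexed as [z][y][x];
    a truthy label marks a foreground voxel (assumed convention).
    """
    active = {}
    for z, plane in enumerate(labels):
        for y, row in enumerate(plane):
            for x, is_foreground in enumerate(row):
                if is_foreground:
                    active[(z, y, x)] = intensities[z][y][x]
    return active


def reduction_ratio(labels, active):
    """Fraction of the original voxels that remain after extraction."""
    total = sum(len(row) for plane in labels for row in plane)
    return len(active) / total


if __name__ == "__main__":
    labels = [[[0, 1], [0, 0]], [[1, 0], [0, 0]]]
    intensities = [[[5, 7], [1, 2]], [[9, 3], [4, 6]]]
    voxels = active_voxels(labels, intensities)
    print(voxels, reduction_ratio(labels, voxels))
```

Because the coordinates are stored with each voxel, the relative 3-D locations are preserved even though the background is discarded.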


1980 ◽  
Vol 19 (04) ◽  
pp. 187-194
Author(s):  
J.-Ph. Berney ◽  
R. Baud ◽  
J.-R. Scherrer

It is well known that Frame Selection Systems (FSS) have proved both popular and effective in physician-machine and patient-machine dialogue. A formal algorithm for defining a Frame Selection System for handling man-machine dialogue is presented here. It is also shown how natural medical language can be handled using a tree-branching logic. This logic appears to be based upon ordered series of selections which enclose a syntactic structure. The external specifications are discussed with regard to convenience and efficiency. Because all communication between the user and the application programmes is handled only by FSS software, FSS contributes to achieving modularity and, therefore, also maintainability in a transaction-oriented system with a large data base and concurrent accesses.

