Evaluation of a Hierarchical Agglomerative Clustering Method Applied to WIBS Laboratory Data for Improved Discrimination of Biological Particles by Comparing Data Preparation Techniques.

2018 ◽  
Author(s):  
Anonymous
2018 ◽  
Vol 11 (8) ◽  
pp. 4929-4942 ◽  
Author(s):  
Nicole J. Savage ◽  
J. Alex Huffman

Abstract. Hierarchical agglomerative clustering (HAC) analysis has been successfully applied to several sets of ambient data (e.g., Crawford et al., 2015; Robinson et al., 2013) and with respect to standardized particles in the laboratory environment (Ruske et al., 2017, 2018). Here we show for the first time a systematic application of HAC to a comprehensive set of laboratory data collected for many individual particle types using the wideband integrated bioaerosol sensor (WIBS-4A) (Savage et al., 2017). The impact of the ratio of particle concentrations on HAC results was investigated, showing that clustering quality can vary dramatically as a function of ratio. Six strategies for particle preprocessing were also compared, concluding that using raw fluorescence intensity (without normalizing to particle size) and logarithmically transforming data values (scenario B) consistently produced the highest-quality results for the particle types analyzed. A total of 23 one-to-one matchups of individual particle types was investigated. Results showed a cluster misclassification of < 15 % for 12 of 17 numerical experiments using one biological and one nonbiological particle type each. Inputting fluorescence data using a baseline +3σ threshold produced a lower degree of misclassification than inputting either all particles (without a fluorescence threshold) or particles above a baseline +9σ threshold. Lastly, six numerical simulations of mixtures of four to seven components were analyzed using HAC. These results show that 12 %–24 % of fungal clusters were consistently misclassified when a mixture of nonbiological materials was included, whereas bacteria and diesel soot could each be separated with nearly 100 % efficiency.
The study lends significant support to the clustering analysis commonly applied to data from commercial ultraviolet laser/light-induced fluorescence (UV-LIF) instruments used for bioaerosol research across the globe, and it provides practical tools that will improve clustering results in scientific studies across diverse research disciplines.
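The preprocessing and scoring steps named in the abstract (a baseline +3σ fluorescence threshold, a logarithmic transform of raw values, HAC, and a misclassification fraction) can be sketched roughly as follows. This is an illustration on synthetic numbers, not the authors' code: the Ward linkage, the "above threshold in any channel" rule, the four-feature layout (three fluorescence channels plus size), and every function name are assumptions made for the example.

```python
import math
import random
import statistics


def threshold(blank, n_sigma=3.0):
    """Baseline mean plus n_sigma standard deviations of a particle-free signal."""
    return statistics.mean(blank) + n_sigma * statistics.stdev(blank)


def preprocess(particles, thresholds, floor=1e-6):
    """Keep particles above threshold in any channel, then log10-transform."""
    kept = []
    for label, feats in particles:
        # Assumption: the first three features are the fluorescence channels.
        if any(v > t for v, t in zip(feats[:3], thresholds)):
            kept.append((label, [math.log10(max(v, floor)) for v in feats]))
    return kept


def ward_hac(points, n_clusters):
    """Naive Ward-linkage HAC (assumed linkage); returns a cluster id per point."""
    clusters = [([i], list(p)) for i, p in enumerate(points)]  # (members, centroid)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Ward cost: increase in within-cluster sum of squares on merging.
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        n = len(ma) + len(mb)
        merged = [(len(ma) * x + len(mb) * y) / n for x, y in zip(ca, cb)]
        clusters[a] = (ma + mb, merged)
        del clusters[b]
    ids = [0] * len(points)
    for cid, (members, _) in enumerate(clusters):
        for i in members:
            ids[i] = cid
    return ids


def misclassified_fraction(true_labels, cluster_ids):
    """Fraction of particles whose type differs from their cluster's majority type."""
    wrong = 0
    for cid in set(cluster_ids):
        members = [t for t, c in zip(true_labels, cluster_ids) if c == cid]
        majority = max(set(members), key=members.count)
        wrong += sum(1 for t in members if t != majority)
    return wrong / len(true_labels)


rng = random.Random(0)
# Synthetic particle-free background for each of the three channels.
thresholds = [threshold([rng.gauss(10, 2) for _ in range(200)]) for _ in range(3)]


def draw(label, ranges, n):
    return [(label, [rng.uniform(lo, hi) for lo, hi in ranges]) for _ in range(n)]


# Invented feature ranges: (FL1, FL2, FL3, size). Not measured WIBS values.
particles = (
    draw("bio", [(900, 1100), (350, 450), (130, 170), (2.5, 3.5)], 15)
    + draw("nonbio", [(25, 40), (25, 40), (500, 600), (1.0, 1.5)], 15)
    + draw("nonbio", [(5, 9), (5, 9), (5, 9), (1.0, 1.5)], 5)  # non-fluorescent
)
kept = preprocess(particles, thresholds)
true_labels = [label for label, _ in kept]
cluster_ids = ward_hac([feats for _, feats in kept], n_clusters=2)
error = misclassified_fraction(true_labels, cluster_ids)
print(len(kept), error)
```

The two synthetic particle types are deliberately well separated, so the sketch only demonstrates the mechanics. Changing the 15:15 ratio in the `draw` calls or the `n_sigma` value is one way to explore the sensitivity to concentration ratio and threshold that the abstract reports.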


2018 ◽  
Author(s):  
Nicole Savage ◽  
J. Alex Huffman

Abstract. Hierarchical agglomerative clustering (HAC) analysis has been successfully applied to several sets of ambient data (e.g. Crawford et al., 2015; Robinson et al., 2013) and with respect to standardized particles in the laboratory environment (Ruske et al., 2017). Here we show for the first time a systematic application of HAC to a comprehensive set of laboratory data collected using the wideband integrated bioaerosol sensor (WIBS-4A) (Savage et al., 2017). The impact of particle ratio on HAC results was investigated, showing that clustering quality can vary dramatically as a function of ratio. Six strategies for particle pre-processing were also compared, concluding that using raw fluorescence intensity (without normalizing to particle size) and inputting all data in logarithmic bins consistently produced the highest quality results. A total of 23 one-on-one matchups of individual particle types were investigated. Results showed cluster misclassification of


2010 ◽  
Vol 439-440 ◽  
pp. 1306-1311
Author(s):  
Fang Li ◽  
Qun Xiong Zhu

An LSI-based hierarchical agglomerative clustering algorithm is studied. To address the problems of the LSI-based method, an NMF-based hierarchical clustering method is proposed and analyzed. Two ways of implementing the NMF-based method are introduced. Finally, the results of two groups of experiments on the TanCorp document corpus show that the proposed method is effective.
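One plausible arrangement of an NMF-based hierarchical clustering pipeline is sketched below: factor a term-document matrix with nonnegative matrix factorization, then run HAC on the per-document coefficient vectors. The abstract does not describe either of its two implementations, so this should be read as a generic illustration. The multiplicative-update rule, cosine distance, average linkage, the toy matrix `V`, and all function names are assumptions.

```python
import math
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Factor V (terms x documents) as W @ H using multiplicative updates."""
    rng = random.Random(seed)
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in V]
    H = [[rng.random() + 0.1 for _ in V[0]] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(*rows)] for rows in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(*rows)] for rows in zip(W, num, den)]
    return W, H


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return 1.0 - dot / (norm + 1e-12)


def average_hac(vectors, n_clusters):
    """Naive average-linkage HAC; returns one cluster id per vector."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(ca, cb):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in ca for j in cb)
        return total / (len(ca) * len(cb))

    while len(clusters) > n_clusters:
        _, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] += clusters.pop(b)
    labels = [0] * len(vectors)
    for cid, members in enumerate(clusters):
        for i in members:
            labels[i] = cid
    return labels


# Toy term-document counts: documents 0-2 share one vocabulary, 3-5 another.
V = [
    [3, 4, 2, 0, 0, 0],
    [2, 3, 3, 0, 0, 0],
    [4, 2, 3, 0, 0, 1],
    [0, 0, 0, 3, 2, 4],
    [0, 1, 0, 2, 4, 3],
    [0, 0, 0, 4, 3, 2],
]
W, H = nmf(V, rank=2)
doc_vectors = transpose(H)  # one low-dimensional vector per document
labels = average_hac(doc_vectors, n_clusters=2)
print(labels)
```

Clustering in the factor space rather than on raw term counts is the design point here: each document is compared through a few nonnegative topic weights instead of a long, sparse term vector.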


Author(s):  
Nadjla Elong ◽  
Sidi Ahmed Rahal

To support deeper and richer analytic processing of medical datasets, feature selection aims to eliminate redundant and irrelevant features from the data. Although the filter method has been touted as one of the simplest approaches to feature selection, its applications have generally failed to identify and handle embedded similarities among features. In this research, a hybrid feature selection approach that combines the filter method with hierarchical agglomerative clustering is proposed to eliminate irrelevant and redundant features in four medical datasets. A formal evaluation of the proposed approach reveals major improvements in classification accuracy compared with results obtained using filter methods alone and/or more classical feature selection approaches.
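A hybrid of this kind might look like the following minimal sketch, in which a filter score removes irrelevant features and HAC over the surviving features groups redundant ones so that only the best-scoring feature per group is kept. The abstract does not specify the filter, the distance, or the linkage, so the absolute Pearson correlation score, the `1 - |r|` distance, average linkage, both cut-off values, and the toy data are all assumptions chosen for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (non-constant inputs)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def select_features(columns, target, min_relevance=0.3, max_distance=0.1):
    """Filter step, then HAC over features; keep the top feature per cluster."""
    # Filter: relevance is |correlation| with the class label.
    relevance = [abs(pearson(col, target)) for col in columns]
    clusters = [[j] for j, r in enumerate(relevance) if r >= min_relevance]

    def linkage(ca, cb):
        # Average linkage with distance 1 - |r| between feature columns.
        total = sum(1 - abs(pearson(columns[i], columns[j])) for i in ca for j in cb)
        return total / (len(ca) * len(cb))

    # HAC: merge the closest feature clusters until none is closer than the cut.
    while len(clusters) > 1:
        d, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if d > max_distance:
            break
        clusters[a] += clusters.pop(b)
    return sorted(max(c, key=lambda j: relevance[j]) for c in clusters)


# Toy data: 8 samples, binary class label, 4 candidate features.
target = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],  # 0: strongly relevant
    [2.5, 2.0, 1.5, 2.6, 5.5, 6.6, 5.4, 6.5],  # 1: noisy near-copy of feature 0
    [5, 1, 4, 2, 3, 5, 1, 4],                  # 2: unrelated to the label
    [1, 3, 2, 4, 3, 5, 4, 6],                  # 3: moderately relevant, distinct
]
selected = select_features(columns, target)
print(selected)
```

In this toy case the filter alone would keep features 0, 1, and 3; the clustering step is what identifies feature 1 as redundant with feature 0.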

