A Confidence-Based Hierarchical Feature Clustering Algorithm for Text Classification

Author(s):  
Jung-Yi Jiang ◽  
Kai-Tai Yin ◽  
Shie-Jue Lee



2011 ◽  
Vol 23 (3) ◽  
pp. 335-349 ◽  
Author(s):  
Jung-Yi Jiang ◽  
Ren-Jia Liou ◽  
Shie-Jue Lee


2017 ◽  
Vol 2017 ◽  
pp. 1-9 ◽  
Author(s):  
Kan Huang ◽  
Yong Zhang ◽  
Bo Lv ◽  
Yongbiao Shi

Automatically estimating salient objects without any prior knowledge can greatly benefit many computer vision tasks. This paper proposes a novel bottom-up framework for salient object detection that first models the background and then separates salient objects from it. We model the background distribution with a feature clustering algorithm, which allows us to fully exploit the statistical and structural information of the background. A coarse saliency map is then generated according to the background distribution. To make it more discriminative, the coarse saliency map is enhanced by a two-step refinement composed of edge-preserving element-level filtering and upsampling based on geodesic distance. We provide an extensive evaluation and show that the proposed method performs favorably against other outstanding methods on the two most commonly used datasets. Most importantly, the proposed approach is shown to be more effective at highlighting the salient object uniformly and to be robust to background noise.



2019 ◽  
Vol 9 (8) ◽  
pp. 1578 ◽  
Author(s):  
Li ◽  
Yin ◽  
Shi ◽  
Mao ◽  
Shi

One decisive problem in short text classification is the severe dimensional disaster that arises when a statistics-based approach is used to construct vector spaces. Here, a feature reduction method based on two-stage feature clustering (TSFC) is proposed and applied to short text classification. Features are semi-loosely clustered by combining spectral clustering with a graph traversal algorithm. Next, intra-cluster feature screening rules are designed to remove outlier feature words, which improves the effectiveness of the similar feature clusters. Short texts are then classified using the corresponding similar feature clusters instead of the original feature words, so the dimension of the vector space is significantly reduced. Several classifiers are used to evaluate the effectiveness of the method. The results show that the method largely resolves the dimensional disaster and significantly improves the accuracy of short text classification.
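
The idea of replacing feature words with feature clusters can be illustrated with a minimal sketch. It is not the TSFC method itself: the spectral clustering stage and the intra-cluster screening rules are omitted, and only a graph traversal over a thresholded similarity graph is shown. The function names `cluster_features` and `to_cluster_vector`, the similarity scores, and the threshold are all hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict


def cluster_features(similarity, threshold):
    """Assign each word a cluster id (illustrative, not the authors' TSFC).

    Words are nodes; an edge joins two words whose similarity is at least
    `threshold`. Each connected component, found by depth-first traversal,
    becomes one feature cluster.
    """
    graph = defaultdict(set)
    words = set()
    for (a, b), score in similarity.items():
        words.update((a, b))
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)

    cluster_of = {}
    next_id = 0
    for start in sorted(words):  # sorted so that cluster ids are deterministic
        if start in cluster_of:
            continue
        cluster_of[start] = next_id
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in graph[node]:
                if neighbour not in cluster_of:
                    cluster_of[neighbour] = next_id
                    stack.append(neighbour)
        next_id += 1
    return cluster_of


def to_cluster_vector(tokens, cluster_of):
    """Count cluster ids instead of words; words without a cluster are dropped."""
    counts = Counter(cluster_of[t] for t in tokens if t in cluster_of)
    return dict(counts)


# Hypothetical pairwise similarities between feature words.
similarity = {
    ("cheap", "inexpensive"): 0.9,
    ("goal", "score"): 0.8,
    ("cheap", "goal"): 0.1,
    ("inexpensive", "score"): 0.05,
}
clusters = cluster_features(similarity, threshold=0.5)
print(to_cluster_vector(["cheap", "inexpensive", "goal", "unknown"], clusters))
# prints {0: 2, 1: 1}
```

Four feature words collapse into two cluster dimensions here, which is the dimensionality reduction the abstract describes, on a toy scale.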





Author(s):  
R. Srivastava ◽  
Aman Kumar Jain

Objective: Defects in delivered software products not only have financial implications but also damage the reputation of the organisation and waste time and human resources. This paper aims to detect defects in software modules. Methods: Our approach sequentially combines the SMOTE algorithm to deal with the class imbalance problem, the K-means clustering algorithm to obtain a set of key features based on inter-class and intra-class coefficients of correlation, and ensemble modelling to predict defects in software modules. After careful examination, an ensemble framework of XGBoost, Decision Tree and Random Forest is used to predict software defects, owing to the numerous merits of the ensembling approach. Results: We have used five open-source datasets from the NASA Promise Repository for Software Engineering. The results obtained from our approach have been compared with those of the individual algorithms used in the ensemble. A confidence interval for the performance of our approach with respect to the evaluation metrics, namely Accuracy, Precision, Recall, F1 score and AUC score, has also been constructed at a significance level of 0.01. Conclusion: The results have been presented graphically.
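
The SMOTE step named in the abstract can be sketched in a few lines. This is a simplified illustration of the general SMOTE idea (interpolating between a minority-class sample and one of its nearest minority-class neighbours), not the authors' pipeline. The function name `smote`, its parameters, and the sample points are assumptions, and the base sample is drawn at random rather than iterating over every minority sample as the standard algorithm does.

```python
import math
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic minority-class points by interpolation (sketch).

    minority    : list of equal-length tuples of floats (at least two points)
    n_synthetic : number of synthetic points to create
    k           : number of nearest minority neighbours to choose from
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class samples (for example, defective modules).
minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0)]
new_points = smote(minority, n_synthetic=5)
print(len(new_points))  # prints 5
```

Each synthetic point lies on a line segment between two existing minority samples, so oversampling adds no points outside the region the minority class already occupies.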


