Combining Clustering and Voting Scheme to Select Initial Training Set for Active Learning

2014 ◽  
Vol 926-930 ◽  
pp. 3008-3011
Author(s):  
Yan Leng ◽  
Guang Hui Qi ◽  
Xin Yan Xu ◽  
Xiao Peng Wang ◽  
Deng Wang Li

Currently, most researchers select clustering-based algorithms to generate the initial training set for active learning. Because a single clustering is not stable for such algorithms, we propose an initial training set selection algorithm that combines the results of multiple clusterings to select samples. Specifically, after each clustering, it delimits several representative regions. If a sample falls into its corresponding representative region, the algorithm casts a vote for it, marking it as a potential representative sample. Finally, after several clusterings, the samples with the most votes are selected. Experimental results show that our algorithm can efficiently select informative samples and gives the classifier a more stable performance.
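The voting idea in this abstract can be illustrated with a small sketch. The abstract does not say which clustering algorithm is used or how a representative region is delimited, so the following are assumptions made only for illustration: plain k-means with random restarts, and a region defined as the fraction `region_frac` of each cluster's members closest to its centroid. The function names are hypothetical.

```python
import math
import random
from collections import Counter


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means on a list of tuples; returns (centroids, labels)."""
    centroids = rng.sample(points, k)

    def assign():
        return [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assign()


def vote_for_initial_set(points, k, n_runs=10, region_frac=0.3, n_select=None, seed=0):
    """Run k-means n_runs times and vote for samples near their centroid.

    Assumption: the 'representative region' of a cluster is the region_frac
    share of its members that lie closest to the centroid.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_runs):
        centroids, labels = kmeans(points, k, rng)
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            members.sort(key=lambda i: math.dist(points[i], centroids[c]))
            keep = max(1, math.ceil(region_frac * len(members))) if members else 0
            for i in members[:keep]:
                votes[i] += 1
    n_select = k if n_select is None else n_select
    # Most votes first; ties are broken by sample index.
    ranked = sorted(range(len(points)), key=lambda i: (-votes[i], i))
    return ranked[:n_select], votes
```

Because the vote count only ranks samples, the top-voted samples may all come from one dense cluster; a practical variant would probably also enforce some spread across clusters.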

1995 ◽  
Vol 3 (4) ◽  
pp. 279-292 ◽  
Author(s):  
I. T. Cousins ◽  
M. T. D. Cronin ◽  
J. C. Dearden ◽  
C. D. Watts

2021 ◽  
Vol 13 (11) ◽  
pp. 2234
Author(s):  
Xin Luo ◽  
Huaqiang Du ◽  
Guomo Zhou ◽  
Xuejian Li ◽  
Fangjie Mao ◽  
...  

An informative training set is necessary for ensuring the robust performance of the classification of very-high-resolution remote sensing (VHRRS) images, but labeling work is often difficult, expensive, and time-consuming. This makes active learning (AL) an important part of an image analysis framework. AL aims to efficiently build a representative and efficient library of training samples that are most informative for the underlying classification task, thereby minimizing the cost of obtaining labeled data. Based on ranked batch-mode active learning (RBMAL), this paper proposes a novel combined query strategy of spectral information divergence lowest confidence uncertainty sampling (SIDLC), called RBSIDLC. The base classifier of random forest (RF) is initialized by using a small initial training set, and each unlabeled sample is analyzed to obtain the classification uncertainty score. A spectral information divergence (SID) function is then used to calculate the similarity score, and according to the final score, the unlabeled samples are ranked in descending order. The most “valuable” samples are selected according to the ranked lists and then labeled by the analyst/expert (also called the oracle). Finally, these samples are added to the training set, and the RF is retrained for the next iteration. The whole procedure is iteratively implemented until a stopping criterion is met. The results indicate that RBSIDLC achieves high-precision extraction of urban land use information based on VHRRS; the accuracy of extraction for each land-use type is greater than 90%, and the overall accuracy (OA) is greater than 96%. After the SID replaces the Euclidean distance in the RBMAL algorithm, the RBSIDLC method greatly reduces the misclassification rate among different land types. Therefore, the similarity function based on SID performs better than that based on the Euclidean distance.
In addition, the OA of RF classification is greater than 90%, suggesting that it is feasible to use RF to estimate the uncertainty score. Compared with the three single query strategies of other AL methods, sample labeling with the SIDLC combined query strategy yields a lower cost and higher quality, thus effectively reducing the misclassification rate of different land use types. For example, compared with the Batch_Based_Entropy (BBE) algorithm, RBSIDLC improves the precision of barren land extraction by 37% and that of vegetation by 14%. The 25 characteristics of different land use types screened by RF cross-validation (RFCV) combined with the permutation method exhibit an excellent separation degree, and the results provide the basis for VHRRS information extraction in urban land use settings based on RBSIDLC.
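A minimal sketch of the two ingredients named in this abstract follows: spectral information divergence (the symmetric Kullback-Leibler divergence between spectra normalized to sum to one) and lowest-confidence uncertainty (one minus the largest class probability). How the paper combines them into a final score is not stated in the abstract, so the weighting `alpha`, the squashing `d / (1 + d)`, and the function `ranked_batch` are assumptions for illustration only. Class probabilities are passed in directly rather than produced by a random forest.

```python
import math


def sid(x, y, eps=1e-12):
    """Spectral information divergence between two non-negative spectra."""
    sx, sy = sum(x), sum(y)
    p = [v / sx for v in x]
    q = [v / sy for v in y]
    # KL(p||q) + KL(q||p) simplifies to sum((p - q) * log(p / q)).
    return sum((a - b) * math.log((a + eps) / (b + eps)) for a, b in zip(p, q))


def least_confidence(probs):
    """Uncertainty of a prediction: 1 minus the top class probability."""
    return 1.0 - max(probs)


def ranked_batch(unlabeled, probs, labeled, batch_size, alpha=0.5):
    """Pick a batch one sample at a time, re-scoring after every pick.

    Assumed score: alpha * dissimilarity + (1 - alpha) * uncertainty, where
    dissimilarity is the smallest SID to any labeled or already-picked
    spectrum, squashed into [0, 1). `labeled` must not be empty.
    """
    reference = list(labeled)
    remaining = list(range(len(unlabeled)))
    batch = []
    while remaining and len(batch) < batch_size:
        def score(i):
            d = min(sid(unlabeled[i], r) for r in reference)
            return alpha * d / (1.0 + d) + (1.0 - alpha) * least_confidence(probs[i])

        best = max(remaining, key=score)
        batch.append(best)
        remaining.remove(best)
        reference.append(unlabeled[best])  # later picks are pushed away from this one
    return batch
```

Adding each picked spectrum to the reference set is what makes the batch "ranked": a second sample that nearly duplicates the first pick loses its dissimilarity credit.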


Author(s):  
Chung-Hsien Wu ◽  
Hung-Yu Su ◽  
Chao-Hong Liu

This chapter presents an efficient approach to personalized pronunciation assessment of Taiwanese-accented English. The main goal of this study is to detect frequently occurring mispronunciation patterns of Taiwanese-accented English instead of scoring English pronunciations directly. The proposed assessment helps quickly discover a student's personalized mispronunciations, so English teachers can spend more time on teaching or rectifying students’ pronunciations. In this approach, an unsupervised model adaptation method is performed on the universal acoustic models to recognize the speech of a specific speaker with mispronunciations and a Taiwanese accent. A dynamic sentence selection algorithm, considering the mutual information of the related mispronunciations, is proposed to choose a sentence containing the most undetected mispronunciations in order to quickly extract personalized mispronunciations. The experimental results show that the proposed unsupervised adaptation approach obtains an accuracy improvement of about 2.1% on the recognition of Taiwanese-accented English speech.


Author(s):  
Changdong Xu ◽  
Xin Geng

Hierarchical classification is a challenging problem where the class labels are organized in a predefined hierarchy. One primary challenge in hierarchical classification is the small training set issue of the local module. The local classifiers in the previous hierarchical classification approaches are prone to over-fitting, which becomes a major bottleneck of hierarchical classification. Fortunately, the labels in the local module are correlated, and the siblings of the true label can provide additional supervision information for the instance. This paper proposes a novel method to deal with the small training set issue. The key idea of the method is to represent the correlation among the labels by the label distribution. It generates a label distribution that contains the supervision information of each label for the given instance, and then learns a mapping from the instance to the label distribution. Experimental results on several hierarchical classification datasets show that our method significantly outperforms other state-of-the-art hierarchical classification approaches.
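To make the notion of a label distribution concrete, here is a small sketch. The abstract does not describe how the distribution is generated, so this example simply assumes that a fixed share `sibling_mass` of the probability is spread uniformly over the siblings of the true label; the paper's actual construction may be quite different. A model trained on such targets would typically minimize a divergence such as the KL divergence shown here. All names are hypothetical.

```python
import math


def sibling_label_distribution(true_label, siblings, sibling_mass=0.2):
    """Soft target: most mass on the true label, the rest shared by siblings.

    Assumption: uniform sharing among siblings, chosen only for illustration.
    """
    others = [s for s in siblings if s != true_label]
    if not others:
        return {true_label: 1.0}
    dist = {true_label: 1.0 - sibling_mass}
    for s in others:
        dist[s] = sibling_mass / len(others)
    return dist


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) over the labels present in target."""
    return sum(
        t * math.log(t / max(predicted.get(label, 0.0), eps))
        for label, t in target.items()
        if t > 0
    )
```

Compared with a one-hot target, the soft target lets each training instance carry some information about its sibling classes, which is the extra supervision the abstract refers to.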


2009 ◽  
Vol 123 (1) ◽  
pp. 79-89 ◽  
Author(s):  
Tamo Nakamura ◽  
Anthony A. Wright ◽  
Jeffrey S. Katz ◽  
Kent D. Bodily ◽  
Bradley R. Sturz

Electronics ◽  
2018 ◽  
Vol 7 (10) ◽  
pp. 258 ◽  
Author(s):  
Abdus Hassan ◽  
Umar Afzaal ◽  
Tooba Arifeen ◽  
Jeong Lee

Recently, concurrent error detection enabled through invariant relationships between different wires in a circuit has been proposed. Because there are many such implications in a circuit, selection strategies have been developed to select the most valuable implications for inclusion in the checker hardware such that a sufficiently high probability of error detection ($P_{detection}$) is achieved. These algorithms, however, due to their heuristic nature cannot guarantee a lossless $P_{detection}$. In this paper, we develop a new input-aware implication selection algorithm with the help of ATPG which minimizes loss on $P_{detection}$. In our algorithm, the detectability of errors for each candidate implication is carefully evaluated using error prone vectors. The evaluation results are then utilized to select the most efficient candidates for achieving optimal $P_{detection}$. The experimental results on 15 representative combinatorial benchmark circuits from the MCNC benchmarks suite show that the implications selected from our algorithm achieve better $P_{detection}$ in comparison to the state of the art. The proposed method also offers better performance, up to 41.10%, in terms of the proposed impact-level metric, which is the ratio of achieved $P_{detection}$ to the implication count.
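The impact-level metric defined at the end of this abstract (achieved detection probability divided by the number of selected implications) can be shown with a toy example. The selection procedure below is not the paper's ATPG-based algorithm; it is a plain greedy coverage heuristic, assumed here only to produce a selection to score. The input format (a mapping from each candidate implication to the set of error ids it flags) and the function name are also hypothetical.

```python
def select_implications(detects, total_errors, target=1.0):
    """Greedily pick implications until the target detection rate is reached.

    detects: dict mapping an implication name to the set of error ids it flags.
    Returns (chosen names, P_detection, impact level).
    """
    if not detects or total_errors <= 0:
        return [], 0.0, 0.0
    covered, chosen = set(), []
    while len(covered) / total_errors < target:
        # Candidate that flags the most errors not yet covered.
        best = max(detects, key=lambda name: len(detects[name] - covered))
        gain = detects[best] - covered
        if not gain:  # no candidate adds coverage; stop early
            break
        chosen.append(best)
        covered |= gain
    p_detection = len(covered) / total_errors
    impact_level = p_detection / len(chosen) if chosen else 0.0
    return chosen, p_detection, impact_level
```

With candidates `{"a": {1, 2, 3}, "b": {3, 4}, "c": {1}}` and five possible errors, the sketch picks `a` and then `b`, detects four of the five errors, and reports an impact level of 0.8 / 2 = 0.4.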

