Variance reduction trends on ‘boosted’ classifiers

2004 ◽  
Vol 8 (3) ◽  
pp. 141-154
Author(s):  
Virginia Wheway

Ensemble classification techniques such as bagging (Breiman, 1996a), boosting (Freund & Schapire, 1997) and arcing algorithms (Breiman, 1997) have received much attention in the recent literature. Such techniques have been shown to reduce classification error on unseen cases. Even when the ensemble is trained well beyond zero training-set error, it continues to improve classification on unseen cases. Despite many studies and conjectures, the reasons behind this improved performance, and an understanding of the underlying probabilistic structures, remain open and challenging problems. More recently, diagnostics such as edge and margin (Breiman, 1997; Freund & Schapire, 1997; Schapire et al., 1998) have been used to explain the improvements made when ensemble classifiers are built. This paper presents some interesting results from an empirical study performed on a set of representative datasets using the decision tree learner C4.5 (Quinlan, 1993). An exponential-like decay in the variance of the edge is observed as the number of boosting trials is increased; that is, boosting appears to ‘homogenise’ the edge. Some initial theory is presented which indicates that a lack of correlation between the errors of individual classifiers is a key factor in this variance reduction.
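
The variance-decay effect described above can be illustrated with a small simulation (simulated independent voters, not the paper's C4.5 experiments): treating the edge of a sample as the weighted fraction of ensemble members that misclassify it, its variance across samples shrinks as classifiers are added, provided their errors are weakly correlated.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 200, 50                                   # samples, boosting trials
y = rng.choice([-1, 1], size=n)                  # true labels
# weak learners that are independently right ~70% of the time
votes = np.where(rng.random((T, n)) < 0.7, y, -y)

def edge(votes, y, alpha):
    """Weighted fraction of incorrect votes for each sample."""
    wrong = (votes != y).astype(float)           # shape (T, n)
    return alpha @ wrong                         # shape (n,)

for t in (5, 20, 50):
    e = edge(votes[:t], y, np.ones(t) / t)
    print(t, round(e.var(), 4))                  # variance decays with t
```

With independent errors the variance falls roughly like 1/t, consistent with the exponential-like decay the study observes empirically; correlated errors would slow this decay.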

Author(s):  
Antonio Giovannetti ◽  
Gianluca Susi ◽  
Paola Casti ◽  
Arianna Mencattini ◽  
Sandra Pusil ◽  
...  

In this paper, we present the novel Deep-MEG approach, in which image-based representations of magnetoencephalography (MEG) data are combined with ensemble classifiers based on deep convolutional neural networks. For the purpose of predicting the early signs of Alzheimer’s disease (AD), functional connectivity (FC) measures between the brain bio-magnetic signals originating from spatially separated brain regions are used as MEG data representations for the analysis. After stacking the FC indicators relative to different frequency bands into multiple images, a deep transfer learning model is used to extract different sets of deep features and to derive improved classification ensembles. The proposed Deep-MEG architectures were tested on a set of resting-state MEG recordings and their corresponding magnetic resonance imaging scans from a longitudinal study involving 87 subjects. Accuracy values of 89% and 87% were obtained, respectively, for the early prediction of AD conversion in a sample of 54 mild cognitive impairment subjects and in a sample of 87 subjects including 33 healthy controls. These results indicate that the proposed Deep-MEG approach is a powerful tool for detecting early alterations in the spectral–temporal connectivity profiles and in their spatial relationships.
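
The input construction described above can be sketched as follows (the region count, band names and correlation-based FC here are illustrative assumptions, not taken from the paper): per-band connectivity matrices are stacked into one multi-channel image suitable for a convolutional backbone.

```python
import numpy as np

rng = np.random.default_rng(1)
n_regions = 64
bands = ["delta", "theta", "alpha", "beta", "gamma"]

def fc_matrix(signals):
    """Correlation-based FC between regions (rows = regions)."""
    return np.corrcoef(signals)

# one simulated band-filtered recording per band: (regions, timepoints)
recordings = {b: rng.standard_normal((n_regions, 500)) for b in bands}
image = np.stack([fc_matrix(recordings[b]) for b in bands], axis=-1)
print(image.shape)                               # (64, 64, 5)
```

Each channel of the resulting image encodes one frequency band's connectivity pattern, so a convolutional model can learn spatial relationships across bands jointly.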


2016 ◽  
Vol 4 (43) ◽  
pp. 16982-16991 ◽  
Author(s):  
Chao Li ◽  
Tongfei Shi ◽  
Hideya Yoshitake ◽  
Hongyu Wang

The interactions between silicon particles and polymeric binders are a key factor during the course of manufacturing high-capacity Si anodes for lithium-ion batteries.


2020 ◽  
Vol 6 (4) ◽  
pp. 121
Author(s):  
Myung Sub Lim ◽  
Choo Yeon Kim ◽  
Jae Wook Yoo

Whether to adopt a strategy similar to or different from those of other firms in the same industry is a fundamental question for firms that want to build a competitive advantage. Recent literature, such as new institutional theory and the optimal distinctiveness perspective, has emphasized the configuration of competing forces that make firms simultaneously similar, by conforming to industry norms, and different, by implementing innovation, leading to high performance. The primary rationale is that firms can exploit their high status of conformity as a stock of capital to differentiate themselves when required. Based on this rationale, we conducted research to test hypotheses on optimal distinctiveness in the strategies of manufacturing firms in Korea. The results show that Korean firms achieve higher performance when they pursue both high conformity and high innovation. They also suggest that firms in highly volatile industries have difficulty managing the optimal distinctiveness of strategic conformity combined with innovation.


Author(s):  
Chien-Lin Huang ◽  
Jia-Ching Wang ◽  
Bin Ma

This paper presents ensemble-based speaker recognition using unsupervised data selection. Ensemble learning is a type of machine learning that combines several weak learners to achieve better performance than a single learner. A speech utterance is divided into several subsets based on its acoustic characteristics using unsupervised data selection methods. The ensemble classifiers are then trained with these non-overlapping subsets of speech data to improve the recognition accuracy. This approach has two advantages. First, without any auxiliary information, we use ensemble classifiers based on unsupervised data selection to exploit the different acoustic characteristics of speech data. Second, in ensemble classifiers, we apply the divide-and-conquer strategy to avoid local optimization in the training of a single classifier. Our experiments on the 2010 and 2008 NIST Speaker Recognition Evaluation datasets show that using ensemble classifiers yields a significant performance gain.
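
A hedged sketch of the divide-and-conquer idea (toy data and nearest-mean models, nothing like the paper's NIST systems): k-means provides the unsupervised data selection, one simple classifier is trained per subset, and the subset models are combined by averaging their scores.

```python
import numpy as np

rng = np.random.default_rng(2)

def kmeans(X, k, iters=20):
    """Plain Lloyd's algorithm; returns a cluster label per row."""
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        lab = np.argmin(((X[:, None] - C) ** 2).sum(-1), axis=1)
        C = np.array([X[lab == j].mean(0) if np.any(lab == j) else C[j]
                      for j in range(k)])
    return lab

# toy 2-speaker "frames" drawn around speaker-specific means
X = np.vstack([rng.normal(0.0, 1.0, (300, 8)),
               rng.normal(2.0, 1.0, (300, 8))])
y = np.array([0] * 300 + [1] * 300)

lab = kmeans(X, k=3)                       # unsupervised data selection
models = []                                # one nearest-mean model per subset
for j in range(3):
    Xi, yi = X[lab == j], y[lab == j]
    models.append(np.array([Xi[yi == c].mean(0) if np.any(yi == c)
                            else X[y == c].mean(0) for c in (0, 1)]))

def predict(x):
    # ensemble: average negative squared distance to each speaker model
    scores = np.mean([-((m - x) ** 2).sum(-1) for m in models], axis=0)
    return int(np.argmax(scores))

acc = np.mean([predict(x) == t for x, t in zip(X, y)])
print(round(acc, 2))
```

Each subset model only sees acoustically similar frames, which is the mechanism the paper uses to sidestep a single classifier's local optimum.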


2012 ◽  
Vol 622-623 ◽  
pp. 1691-1695 ◽  
Author(s):  
Goh Mei Ling ◽  
David Yoon Kin Tong ◽  
Elsadig Musa Ahmed

Malaysia generates 0.8 kg waste per capita per day. Despite the recycling previous programmeslaunched, the national recycling rate was as low as 5%. Households’ involvement is expected to be the key factor to the success of recycling. Therefore, empirical study is needed to examineon the behavioural determinants of households’ recycling behaviour. The paper aims to extend the Theory of Planned Behaviour in predicting the households’ recycling behaviour. The paper will provide useful information and guidelines to the respective authorities in designingstrategies to encourage higher participation from households in the recycling programs.


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Jiangbo Zou ◽  
Xiaokang Fu ◽  
Lingling Guo ◽  
Chunhua Ju ◽  
Jingjing Chen

Ensemble classifiers improve classification accuracy by combining the decisions made by their component classifiers. There are basically two steps in creating an ensemble classifier: generating the base classifiers, and combining them so as to maximize overall accuracy. One of the major problems in creating ensemble classifiers is balancing the classification accuracy and the diversity of the component classifiers. In this paper, we propose an ensemble classifier generating algorithm to improve the accuracy of ensemble classification and to maximize the diversity of the component classifiers. In this algorithm, information entropy is introduced to measure the diversity of component classifiers, and a cyclic iterative optimization selection tactic is applied to select component classifiers from the base classifiers, in which the number of component classifiers is dynamically adjusted to minimize system cost. We demonstrate that our method achieves significantly lower memory cost with higher classification accuracy than existing ensemble methods.
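
One common entropy-style diversity measure can be sketched as follows (an assumed form for illustration, not necessarily the paper's exact formula): for each sample, take the entropy of the vote distribution over classes, so identical classifiers score zero and disagreeing classifiers score high.

```python
import numpy as np

def vote_entropy(preds):
    """Mean per-sample entropy of the class-vote distribution.
    preds: (n_classifiers, n_samples) integer class labels."""
    H = np.zeros(preds.shape[1])
    for c in np.unique(preds):
        p = (preds == c).mean(axis=0)            # vote share for class c
        H += np.where(p > 0, -p * np.log2(np.where(p > 0, p, 1.0)), 0.0)
    return H.mean()                              # average over samples

agree = np.array([[0, 1, 1, 0]] * 3)             # identical classifiers
split = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1]])                 # disagreeing classifiers
print(vote_entropy(agree), round(vote_entropy(split), 3))  # 0.0 0.918
```

A selection loop in the spirit of the paper would then trade this diversity score off against each candidate classifier's accuracy when deciding which base classifiers to keep.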


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Jyoti Godara ◽  
Isha Batra ◽  
Rajni Aron ◽  
Mohammad Shabaz

Cognitive science is a field that focuses on analyzing the human brain with the help of data mining (DM). Databases are utilized to gather and store large volumes of data, from which reliable information is extracted using various measures. This research work addresses the detection of sarcasm in text data. It introduces a scheme to detect sarcasm based on the PCA algorithm, the K-means algorithm, and ensemble classification. Four ensemble classifiers are designed with the objective of detecting sarcasm. The first ensemble classification algorithm (SKD) is the combination of SVM, KNN, and a decision tree. In the second ensemble classifier (SLD), SVM, logistic regression, and decision tree classifiers are combined for sarcasm detection. In the third ensemble model (MLD), MLP, logistic regression, and a decision tree are combined, and the last one (SLM) is the combination of MLP, logistic regression, and SVM. The proposed model is implemented in Python and tested on five datasets of different sizes, and the performance of the models is evaluated on various metrics.
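
The SKD-style combination can be sketched in a few lines with scikit-learn (assumed wiring with default hyperparameters and synthetic stand-in features; a real pipeline would feed PCA-reduced text features): SVM, k-NN and a decision tree joined by majority vote.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# synthetic stand-in for vectorized text data
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

skd = VotingClassifier(
    estimators=[("svm", SVC()),
                ("knn", KNeighborsClassifier()),
                ("dt", DecisionTreeClassifier(random_state=0))],
    voting="hard")                       # majority vote of the three
skd.fit(Xtr, ytr)
print(round(skd.score(Xte, yte), 2))
```

The other three ensembles (SLD, MLD, SLM) differ only in which three estimators are listed.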


2018 ◽  
Vol 11 (3) ◽  
pp. 1
Author(s):  
Wei Xu

With the professionalization of Chinese football, the Chinese football industry has become a new economic topic. Teams like Guangzhou Evergrande, representatives of the ‘money football’ policy in China, are popular. The China Football Association Super League (CSL) can be considered an emerging field of great investment value. As such, a team’s operational efficiency should be a key factor for managers and investors. Based on the input-oriented Data Envelopment Analysis (DEA) model, this study analyzes the operational efficiencies of teams in the CSL. The empirical study yields three key findings. First, the teams using the heavy-investment mode were not efficient in the 2012, 2013 and 2014 seasons. Second, Beijing Guoan’s efficiency declined in the 2015 season due to its limited investment. Third, in order to achieve good results in the league in the future, increasing investment should be an inevitable choice.
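
The input-oriented DEA model can be illustrated with the standard CCR envelopment program (a textbook formulation; the paper's actual input and output variables for CSL teams are not reproduced here). Each decision-making unit's efficiency score is the optimum of a small linear program, solvable with SciPy.

```python
import numpy as np
from scipy.optimize import linprog

def dea_efficiency(X, Y, j0):
    """Input-oriented CCR efficiency of DMU j0.
    X: (n, m) inputs, Y: (n, s) outputs.
    Solves: min theta  s.t.  X^T lam <= theta * x0,  Y^T lam >= y0,  lam >= 0."""
    n, m = X.shape
    s = Y.shape[1]
    c = np.r_[1.0, np.zeros(n)]                       # minimize theta
    A_in = np.hstack([-X[j0].reshape(m, 1), X.T])     # inputs <= theta * x0
    A_out = np.hstack([np.zeros((s, 1)), -Y.T])       # outputs >= y0
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.r_[np.zeros(m), -Y[j0]]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (n + 1))
    return res.fun

# hypothetical toy league: spending (input) vs points (output)
X = np.array([[1.0], [2.0], [2.0]])
Y = np.array([[2.0], [4.0], [2.0]])
for j in range(3):
    print(j, round(dea_efficiency(X, Y, j), 3))       # team 2 is dominated
```

A score of 1 marks an efficient team; the dominated team scores 0.5, meaning it could produce the same output with half the input.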


Sensors ◽  
2020 ◽  
Vol 20 (23) ◽  
pp. 6718
Author(s):  
Wei Feng ◽  
Yinghui Quan ◽  
Gabriel Dauphin

Real-world datasets are often contaminated with label noise; labeling is not a clear-cut process, and reliable methods tend to be expensive or time-consuming. Depending on the learning technique used, such label noise is potentially harmful: it can require a larger training set, make the trained model more complex and more prone to overfitting, and yield less accurate predictions. This work proposes a cleaning technique called the ensemble method based on the noise detection metric (ENDM). From the corrupted training set, an ensemble classifier is first learned and used to derive four metrics assessing the likelihood that a sample is mislabeled. For each metric, three thresholds are set to maximize the classification performance on a corrupted validation dataset when using three different ensemble classifiers, namely bagging, AdaBoost and k-nearest neighbors (k-NN). These thresholds are used to identify and then either remove or correct the corrupted samples. The effectiveness of the ENDM is demonstrated on the classification of 15 public datasets. A comparative analysis is conducted against the majority-vote method and the consensus-vote method, two popular homogeneous-ensemble-based label noise filters.
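
A minimal sketch of the detection step (one assumed metric and a fixed threshold, not the paper's four metrics with tuned thresholds): flag a training sample as likely mislabeled when a bagging ensemble assigns low probability to the sample's recorded label.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
noisy = rng.choice(len(y), 50, replace=False)     # corrupt 10% of labels
y_noisy = y.copy()
y_noisy[noisy] ^= 1

bag = BaggingClassifier(DecisionTreeClassifier(max_depth=3),
                        n_estimators=50, random_state=0).fit(X, y_noisy)
# metric: ensemble probability assigned to each sample's recorded label
p_label = bag.predict_proba(X)[np.arange(len(y)), y_noisy]
flagged = p_label < 0.5                           # ENDM would tune this cut
print(int(flagged.sum()), "samples flagged as possibly mislabeled")
```

Flagged samples would then be removed or have their labels corrected before retraining, as in the ENDM pipeline.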


Algorithms ◽  
2019 ◽  
Vol 12 (12) ◽  
pp. 249 ◽  
Author(s):  
Annabella Astorino ◽  
Antonio Fuduli ◽  
Giovanni Giallombardo ◽  
Giovanna Miglionico

A multiple instance learning problem consists of categorizing objects, each represented as a set (bag) of points. Unlike the supervised classification paradigm, where each point of the training set is labeled, labels are only associated with bags, while the labels of the points inside the bags are unknown. We focus on the binary classification case, where the objective is to discriminate between positive and negative bags using a separating surface. Adopting a support vector machine setting at the training level, the problem of minimizing the classification-error function can be formulated as a nonconvex nonsmooth unconstrained program. We propose a difference-of-convex (DC) decomposition of the nonconvex function, which we tackle using an appropriate nonsmooth DC algorithm. Numerical results on benchmark data sets are reported.
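
The DC algorithm (DCA) idea can be shown on a toy one-dimensional objective (a generic example, not the paper's MIL model): write the nonconvex f(x) = (x² − 1)² as g(x) − h(x) with g(x) = x⁴ + 1 and h(x) = 2x² both convex, then repeatedly minimize g minus the linearization of h at the current iterate.

```python
def dca(x0, iters=60):
    """DCA iterations for f(x) = (x^2 - 1)^2 = (x^4 + 1) - 2x^2."""
    x = x0
    for _ in range(iters):
        # subproblem: argmin_x g(x) - h'(x_k)*x, i.e. 4x^3 = 4x_k,
        # whose solution is the real cube root of x_k
        x = abs(x) ** (1 / 3) * (1 if x >= 0 else -1)
    return x

print(round(dca(0.5), 4), round(dca(-0.5), 4))    # stationary points at ±1
```

Each subproblem is convex and cheap, and the iterates converge to a stationary point of f, which is the same mechanism the nonsmooth DCA exploits on the MIL objective.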

