majority voting
Recently Published Documents





2022 ◽  
Vol 40 (3) ◽  
pp. 1-29
Yashar Moshfeghi ◽  
Alvaro Francisco Huertas-Rosero

In this article, we propose an approach to improve quality in crowdsourcing (CS) tasks using Task Completion Time (TCT) as a source of information about the reliability of workers in a game-theoretical competitive scenario. Our approach is based on the hypothesis that some workers are more risk-inclined and tend to gamble with their use of time when made to compete with other workers. This hypothesis is supported by our previous simulation study. We test our approach using relevance assessments (relevant or non-relevant) made by crowdsourced workers for 35 topics from the TREC-8 collection, in both a competitive (referred to as “Game”) and a non-competitive (referred to as “Base”) scenario. We find that competition changes the distributions of TCT, making them sensitive to the quality (i.e., wrong or right) and outcome (i.e., relevant or non-relevant) of the assessments. We also test an optimal function of TCT as the weights in a weighted majority voting scheme. From probabilistic considerations, we derive a theoretical upper bound for the weighted majority performance of cohorts of 2, 3, 4, and 5 workers, which we use as a criterion to evaluate the performance of our weighting scheme. We find that our approach performs remarkably well, significantly closing the gap between the accuracy of the obtained relevance judgements and the upper bound. Since our approach relies on TCT, which is available in any CS task, we believe it is cost-effective and can therefore be applied for quality assurance in crowdsourcing of micro-tasks.
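The weighted majority voting scheme described above can be sketched as follows, assuming each worker's vote counts with a weight computed from that worker's TCT. The abstract does not give the form of the optimal TCT function, so `weight_fn` is a hypothetical placeholder supplied by the caller, and `weighted_majority` is an illustrative name rather than the authors' code.

```python
from collections import defaultdict
from collections.abc import Callable, Hashable, Sequence


def weighted_majority(labels: Sequence[Hashable],
                      tcts: Sequence[float],
                      weight_fn: Callable[[float], float]) -> Hashable:
    """Return the label whose supporting workers have the largest total weight.

    Each worker's vote counts weight_fn(tct), where tct is that worker's task
    completion time. weight_fn is a hypothetical placeholder, not the optimal
    TCT function from the study.
    """
    if len(labels) != len(tcts):
        raise ValueError("labels and tcts must have the same length")
    totals: defaultdict[Hashable, float] = defaultdict(float)
    for label, tct in zip(labels, tcts):
        totals[label] += weight_fn(tct)
    # max() returns the first label reached on ties (insertion order).
    return max(totals, key=totals.__getitem__)
```

With `weight_fn=lambda t: 1.0` this reduces to plain (unweighted) majority voting, which makes it easy to compare a candidate TCT weighting against the unweighted baseline on the same cohort.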

Ramsha Saeed ◽  
Hammad Afzal ◽  
Haider Abbas ◽  
Maheen Fatima

Increased connectivity has contributed greatly to rapid access to information and reliable communication. However, uncontrolled information dissemination has also resulted in the spread of fake news. Fake news might be spread by a group of people or organizations to serve ulterior motives such as political or financial gain, or to damage a country’s public image. Given the importance of timely detection of fake news, the research area has attracted researchers from all over the world. Most work on detecting fake news focuses on the English language. However, automated detection of fake news is important regardless of the language used to spread false information. Recognizing the importance of boosting research on fake news detection for low-resource languages, this work proposes a novel semantically enriched technique to effectively detect fake news in Urdu, a low-resource language. A model based on deep contextual semantics learned by a convolutional neural network is proposed. The features learned by the convolutional neural network are combined with other n-gram-based features and fed to a conventional majority voting ensemble classifier with three base learners: Adaptive Boosting, Gradient Boosting, and Multi-Layer Perceptron. Experiments are performed with different models, and the results show that enriching the traditional ensemble learner with deep contextual semantics along with other standard features gives the best results and outperforms the state-of-the-art Urdu fake news detection model.
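The feature-combination and voting steps described above can be sketched as below. This is a minimal illustration under stated assumptions: the dense vector `cnn_vec` is assumed to come from an already trained CNN (not shown), the n-gram vocabulary is hypothetical, and `hard_vote` receives one predicted label from each of the three base learners.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams; works on Urdu script as on any str."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def combine_features(cnn_vec: Sequence[float], ngrams: Counter,
                     vocab: Sequence[str]) -> list[float]:
    """Concatenate a dense (CNN-derived) vector with n-gram counts over a fixed vocabulary."""
    # Counter returns 0 for n-grams absent from the text.
    return list(cnn_vec) + [float(ngrams[g]) for g in vocab]


def hard_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Return the label predicted by most base learners (first-seen label wins ties)."""
    return Counter(predictions).most_common(1)[0][0]
```

In a full pipeline the three predictions would come from AdaBoost, Gradient Boosting, and MLP models trained on the combined vectors; scikit-learn's `VotingClassifier` with `voting="hard"` packages the same rule.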

Electronics ◽  
2022 ◽  
Vol 11 (2) ◽  
pp. 228
Ahmad B. Hassanat ◽  
Ahmad S. Tarawneh ◽  
Samer Subhi Abed ◽  
Ghada Awad Altarawneh ◽  
Malek Alrashidi ◽  

Since most classifiers are biased toward the dominant class, class imbalance is a challenging problem in machine learning. The most popular approaches to this problem are oversampling minority examples and undersampling majority examples. Oversampling may increase the probability of overfitting, whereas undersampling eliminates examples that may be crucial to the learning process. We present a linear-time resampling method based on random data partitioning and a majority voting rule that addresses both concerns: an imbalanced dataset is partitioned into a number of small subdatasets, each of which must be class-balanced. A separate classifier is then trained on each subdataset, and the final classification result is obtained by applying the majority voting rule to the outputs of all the trained models. We compared the performance of the proposed method with some of the best-known oversampling and undersampling methods, using a range of classifiers, on 33 class-imbalanced benchmark datasets. Classifiers trained on the data generated by the proposed method produced results comparable to those of most of the resampling methods tested, with the exception of SMOTEFUNA, an oversampling method that increases the probability of overfitting. The proposed method produced results comparable to those of the Easy Ensemble (EE) undersampling method. We therefore advocate using either EE or our method to address the challenge of learning from class-imbalanced datasets.
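A minimal sketch of partition-then-vote resampling follows. The abstract does not say how minority examples are shared among subsets, so this sketch assumes each shuffled majority-class chunk is paired with all minority examples, and it drops majority leftovers that do not fill a whole chunk. The nearest-centroid learner is a toy stand-in for the classifiers used in the study; all function names are hypothetical.

```python
import random
from collections import Counter

Point = tuple[float, ...]
Labeled = tuple[Point, int]  # (features, label); 0 = majority, 1 = minority


def balanced_subsets(majority: list[Point], minority: list[Point],
                     seed: int = 0) -> list[list[Labeled]]:
    """Shuffle the majority class, cut it into chunks the size of the minority
    class, and pair every chunk with all minority examples (an assumption).
    Leftover majority examples that do not fill a whole chunk are dropped."""
    shuffled = majority[:]
    random.Random(seed).shuffle(shuffled)
    k = len(minority)
    return [[(p, 0) for p in shuffled[i:i + k]] + [(p, 1) for p in minority]
            for i in range(0, len(shuffled) - k + 1, k)]


def train_nearest_centroid(subset: list[Labeled]) -> dict[int, Point]:
    """Toy base learner: one centroid per class."""
    groups: dict[int, list[Point]] = {}
    for p, y in subset:
        groups.setdefault(y, []).append(p)
    return {y: tuple(sum(c) / len(ps) for c in zip(*ps)) for y, ps in groups.items()}


def predict_one(model: dict[int, Point], x: Point) -> int:
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, model[y])))


def ensemble_predict(models: list[dict[int, Point]], x: Point) -> int:
    """Majority vote over the per-subset models (first label counted wins ties)."""
    return Counter(predict_one(m, x) for m in models).most_common(1)[0][0]
```

Each pass over the data (shuffle, slicing, training) is linear in the number of examples, which is consistent with the linear-time claim, although the exact partitioning scheme in the paper may differ.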

2022 ◽  
Vol 14 (2) ◽  
pp. 330
Sejung Jung ◽  
Kirim Lee ◽  
Won Hee Lee

High-rise buildings (HRBs), a modern and visually distinctive form of land use, continue to increase due to urbanization. Large-scale monitoring of HRBs is therefore very important for urban planning and environmental protection. This paper performed object-based HRB detection using high-resolution satellite images and a digital map. Images of three study areas were acquired from KOMPSAT-3A, KOMPSAT-3, and WorldView-3, and object-based HRB detection was performed using the direction of relief displacement in each satellite image. Object-based multiresolution segmentation images focusing on HRBs were generated for each satellite image and then combined, through majority voting, with pixel-based building detection results obtained from the MBI (morphological building index) to derive object-based building detection results. Next, to remove objects misdetected as HRBs, the direction from each HRB in the polygon layer of the digital map to the corresponding HRB in the object-based building detection result was calculated. This direction, computed from the centroid coordinates of each building object, was found to converge to the azimuth angle of the satellite image, and objects outside the error range were removed from the object-based HRB results. The HRBs in the satellite images were defined as reference data, and the performance of the proposed method was analyzed against them. In addition, to evaluate the efficiency of the proposed technique, it was compared with object-based HRB detection using shadows, and the proposed method was confirmed to provide relatively good performance.
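The two steps above, segment-level majority voting and the direction check, might look roughly like this. The sketch assumes a segment-ID raster and a binary pixel-level building mask of the same shape, image coordinates given as (row, col) with north up, and a strict-majority rule (ties count as non-building). The 15° tolerance stands in for the paper's unspecified error range and is hypothetical.

```python
import math
from collections import defaultdict


def object_majority(segments, building):
    """Label each segment 1 if a strict majority of its pixels are building."""
    counts = defaultdict(lambda: [0, 0])  # segment id -> [building pixels, total pixels]
    for seg_row, bld_row in zip(segments, building):
        for seg_id, is_bld in zip(seg_row, bld_row):
            counts[seg_id][0] += is_bld
            counts[seg_id][1] += 1
    return {s: int(2 * b > t) for s, (b, t) in counts.items()}


def azimuth_deg(from_rc, to_rc):
    """Azimuth in degrees clockwise from north between two (row, col) centroids,
    assuming north is up and rows increase southward."""
    d_row = to_rc[0] - from_rc[0]
    d_col = to_rc[1] - from_rc[1]
    return math.degrees(math.atan2(d_col, -d_row)) % 360.0


def keep_object(map_centroid, detected_centroid, sat_azimuth, tol=15.0):
    """Keep a detection if its offset direction is within tol degrees of sat_azimuth."""
    diff = abs(azimuth_deg(map_centroid, detected_centroid) - sat_azimuth) % 360.0
    return min(diff, 360.0 - diff) <= tol  # handle wrap-around at 0/360
```

The wrap-around handling matters: an offset at 354° is only 6° away from an azimuth of 0°.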

2022 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Deepti Sisodia ◽  
Dilip Singh Sisodia

Purpose: The problem of choosing the most useful features from hundreds of features in time-series user-click data arises in online advertising when classifying fraudulent publishers. Selecting feature subsets is a key issue in such classification tasks. In practice, filter approaches are commonly used; however, they neglect the correlations among features. Conversely, wrapper approaches could not be applied because of their complexity. Moreover, existing feature selection methods could not handle such data, which is one of the major causes of instability in feature selection. Design/methodology/approach: To overcome these issues, a majority voting-based hybrid feature selection method, namely feature distillation and accumulated selection (FDAS), is proposed to find the optimal subset of relevant features for analyzing publishers' fraudulent conduct. FDAS works in two phases: (1) feature distillation, where significant features from standard filter and wrapper feature selection methods are obtained using majority voting; and (2) accumulated selection, where an accumulated evaluation of relevant feature subsets is enumerated to search for an optimal feature subset using effective machine learning (ML) models. Findings: Empirical results show improved classification performance with the proposed features in terms of average precision, recall, F1-score, and AUC in publisher identification and classification. Originality/value: FDAS is evaluated on the FDMA2012 user-click data and nine other benchmark datasets to gauge its generalization, first with the original features, second with relevant feature subsets selected by feature selection (FS) methods, and third with the optimal feature subset obtained by the proposed approach. An ANOVA significance test is conducted to demonstrate significant differences between independent features.
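The two FDAS phases can be approximated as follows. This is a sketch under assumptions, not the published algorithm: each feature selection method is represented by the set of feature names it selected, "majority" is taken to mean more than half of the methods, and accumulated selection is interpreted as scoring each prefix of the vote-ranked list with a caller-supplied `score` function (a hypothetical stand-in for training and validating an ML model).

```python
from collections import Counter
from collections.abc import Callable, Sequence


def distill(selections: Sequence[set[str]]) -> list[str]:
    """Phase 1: keep features chosen by a strict majority of the selection
    methods, ordered by vote count (ties broken alphabetically)."""
    votes = Counter(f for s in selections for f in s)
    quorum = len(selections) / 2
    kept = [f for f, v in votes.items() if v > quorum]
    return sorted(kept, key=lambda f: (-votes[f], f))


def accumulated_selection(ranked: Sequence[str],
                          score: Callable[[list[str]], float]) -> list[str]:
    """Phase 2: score each growing prefix of the ranked list, return the best one."""
    best: list[str] = []
    best_score = float("-inf")
    for k in range(1, len(ranked) + 1):
        candidate = list(ranked[:k])
        s = score(candidate)
        if s > best_score:  # strict: the shorter subset wins on equal scores
            best, best_score = candidate, s
    return best
```

Preferring the shorter subset on ties is a design choice made here for parsimony; the abstract does not specify a tie rule.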

2022 ◽  
Vol 2022 ◽  
pp. 1-8
Mustafa Ghaderzadeh ◽  
Azamossadat Hosseini ◽  
Farkhondeh Asadi ◽  
Hassan Abolghasemi ◽  
Davood Bashash ◽  

Introduction. Acute lymphoblastic leukemia (ALL) is the most common type of leukemia, a deadly white blood cell disease that impacts the human bone marrow. Detecting ALL in its early stages has always been complex and difficult. Peripheral blood smear (PBS) examination, a common method applied at the outset of ALL diagnosis, is a time-consuming and tedious process that largely depends on the specialist’s experience. Materials and Methods. Herein, a fast, efficient, and comprehensive deep learning (DL) model was proposed by implementing eight well-known convolutional neural network (CNN) models for feature extraction on all images and for classification of B-ALL lymphoblasts and normal cells. After their performance was evaluated, the four best-performing CNN models were selected to compose an ensemble classifier that combines the capabilities of each pretrained model. Results. Because the nuclei of cancerous and normal cells are very similar, the individual CNN models had low sensitivity and performed poorly in distinguishing these two classes. The proposed model therefore combined the CNN models using majority voting. The resulting model achieved a sensitivity of 99.4%, specificity of 96.7%, AUC of 98.3%, and accuracy of 98.5%. Conclusion. In classifying cancerous blood cells from normal cells, the proposed method can achieve high accuracy without operator intervention in cell feature determination. It can thus be recommended as a tool for analyzing blood samples in digital laboratory equipment to assist laboratory specialists.
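Majority voting over four binary classifiers, as in the ensemble above, can end in a 2-2 tie, so some tie-breaking rule is needed. The abstract does not state one; this sketch assumes each CNN outputs a probability for the lymphoblast class and breaks ties with the mean probability. The function name and the 0.5 threshold are illustrative assumptions.

```python
from collections.abc import Sequence


def majority_with_tiebreak(probs: Sequence[float], threshold: float = 0.5) -> int:
    """Hard majority vote over per-model probabilities of the positive class
    (1 = lymphoblast, 0 = normal). An even number of models can tie; the tie
    is broken by the mean probability (an assumption, not the paper's rule)."""
    positive = sum(p >= threshold for p in probs)
    negative = len(probs) - positive
    if positive != negative:
        return int(positive > negative)
    return int(sum(probs) / len(probs) >= threshold)
```

An alternative design is to use an odd number of voters, which avoids ties entirely in the binary case.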

Ziquan Zhu ◽  
Siyuan Lu ◽  
Shui-Hua Wang ◽  
Juan Manuel Górriz ◽  
Yu-Dong Zhang

Aims: Most blood diseases, such as chronic anemia, leukemia (commonly known as blood cancer), and hematopoietic dysfunction, are caused by environmental pollution, substandard decoration materials, radiation exposure, and long-term use of certain drugs. Classifying blood cell images is therefore imperative. Most cell classification relies on manual features with machine learning classifiers, or on deep convolutional neural network models. However, manual feature extraction is very tedious, and the results are usually unsatisfactory. Deep convolutional neural networks, on the other hand, are usually composed of many layers, each with many parameters, so each network needs a lot of time to produce results. Another problem is that medical datasets are relatively small, which may lead to overfitting. Methods: To address these problems, we propose seven models for the automatic classification of blood cells: BCARENet, BCR5RENet, BCMV2RENet, BCRRNet, BCRENet, BCRSNet, and BCNet, of which BCNet performs best. The backbone of our method is ResNet-18, pre-trained on ImageNet. To improve performance, we replace the last four layers of the pre-trained, transferred ResNet-18 with three randomized neural networks (RNNs): RVFL, ELM, and SNN. The final output of BCNet is obtained by majority voting over the predictions of the three randomized neural networks. We use four multi-class metrics to evaluate our model. Results: The accuracy, average precision, average F1-score, and average recall are 96.78%, 97.07%, 96.78%, and 96.77%, respectively. Conclusion: We compare our model with state-of-the-art methods, and the proposed BCNet model performs much better than them.
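The four evaluation metrics reported above can be computed as sketched below. The abstract does not say how "average" precision, recall, and F1-score are averaged; this sketch assumes macro averaging (an unweighted mean over classes), and a class with no predicted or no true samples contributes 0 to the corresponding score. `macro_metrics` is a hypothetical helper name.

```python
from collections.abc import Hashable, Sequence


def macro_metrics(y_true: Sequence[Hashable],
                  y_pred: Sequence[Hashable]) -> dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1-score."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred), key=str)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precs, recs, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precs.append(prec)
        recs.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {"accuracy": accuracy, "precision": sum(precs) / n,
            "recall": sum(recs) / n, "f1": sum(f1s) / n}
```

Under macro averaging, accuracy and average recall generally differ, and they coincide when classes are balanced and errors are spread evenly, which is one way to sanity-check reported figures.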

2022 ◽  
pp. 20-41
Rubeena Vohra ◽  
Kailash Chandra Tiwari

The goal of this chapter is to demonstrate the classification of natural and man-made objects from multisensor remote sensing data. Spectral and spatial features play an important role in extracting information about natural and man-made objects. Classification accuracy may be enhanced by a fusion technique applied to a feature knowledge database. A significantly different approach has been devised that uses both spatial and spectral features from multisensor data, and the classified results are enhanced by a majority voting fusion technique. The authors conclude with an extensive discussion at each level and envisage the potential use of multisensor data for object-based land cover classification.
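Majority voting fusion of classification results is often applied pixel by pixel across several label maps of the same scene. The sketch below assumes equally sized 2-D maps, for example one from spectral features, one from spatial features, and one from another sensor; the class names in the usage are hypothetical.

```python
from collections import Counter


def fuse_maps(*maps):
    """Pixel-wise majority vote over equally sized 2-D label maps.
    Ties go to the label that appears first in argument order."""
    rows, cols = len(maps[0]), len(maps[0][0])
    if any(len(m) != rows or any(len(r) != cols for r in m) for m in maps):
        raise ValueError("all maps must have the same shape")
    return [[Counter(m[i][j] for m in maps).most_common(1)[0][0]
             for j in range(cols)] for i in range(rows)]
```

Because the tie rule follows argument order, passing the most trusted map first makes it the default when the classifiers disagree evenly.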

2022 ◽  
Vol 16 (1) ◽  
pp. 0-0

In this work, homogeneous ensemble techniques, namely bagging and boosting, were employed for intrusion detection to identify intrusive activities in a network by monitoring network traffic. Model diversity was also enhanced by considering numerous algorithms, which increased the detection rate. Several classifiers (SVM, KNN, RF, ETC, and MLP) were used with the bagging approach; likewise, tree-based classifiers were employed for boosting. The proposed model was tested on the NSL-KDD dataset, which was first preprocessed. The ten most significant features were then identified using decision tree and recursive feature elimination methods. Furthermore, the dataset was divided into five subsets, each of which was used for training, and the final results were obtained by majority voting. Experimental results showed that the model was effective at detecting intrusive activities. Bagged ETC and boosted RF outperformed all the other classifiers, with accuracies of 99.123% and 99.309%, respectively.
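The "five subsets, one model each, majority vote" step can be sketched as follows, assuming the subsets are disjoint (the abstract does not say whether they overlap) and using a decision stump as a toy stand-in for the SVM, KNN, RF, ETC, and MLP base learners. All names are hypothetical.

```python
import random
from collections import Counter
from collections.abc import Callable

Row = tuple[tuple[float, ...], int]  # (features, label): 0 = normal, 1 = intrusion


def train_stump(rows: list[Row]) -> Callable[[tuple[float, ...]], int]:
    """Toy base learner: the single-feature threshold rule that gets the most
    training rows right."""
    best = (-1, 0, 0.0, False)  # (correct, feature, threshold, flipped)
    for f in range(len(rows[0][0])):
        for thr in sorted({x[f] for x, _ in rows}):
            for flip in (False, True):
                correct = sum(((x[f] > thr) != flip) == bool(y) for x, y in rows)
                if correct > best[0]:
                    best = (correct, f, thr, flip)
    _, feat, thr_best, flip_best = best
    return lambda x: int((x[feat] > thr_best) != flip_best)


def subset_ensemble(rows: list[Row], k: int = 5, seed: int = 0):
    """Shuffle, deal rows into k disjoint subsets, and train one stump per subset."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    return [train_stump(shuffled[i::k]) for i in range(k)]


def predict(models, x: tuple[float, ...]) -> int:
    """Majority vote of the per-subset models (an odd k avoids binary ties)."""
    return Counter(m(x) for m in models).most_common(1)[0][0]
```

Using an odd number of subsets, as with the five used in the study, guarantees a strict majority for a binary normal/intrusion decision.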

Zonghao Yuan ◽  
Zengqiang Ma ◽  
Li Xin ◽  
Dayong Gao ◽  
Fu Zhipeng

Fault diagnosis of rolling bearings is key to maintaining and repairing modern rotating machinery. Rolling bearings usually work under non-stationary conditions with time-varying loads and speeds. Existing diagnosis methods based only on vibration signals cannot adapt to the rotational speed, and their accuracy drops noticeably when the load changes. A method is proposed that fuses multimodal sensor signals to incorporate speed information. First, features are extracted from raw vibration signals and instantaneous rotating speed signals and fused by 1D-CNN-based networks. Second, to improve the robustness of the model when the load changes, a majority voting mechanism is proposed for the diagnosis stage. Finally, multiple variable-speed samples of four bearings under three loads are obtained to evaluate the performance of the proposed method by analyzing the loss function, accuracy, and F1 score under different variable-speed samples. Empirically, the proposed method achieves higher diagnostic accuracy and better speed adaptability than algorithms based only on vibration signals. Moreover, a couple of ablation studies are conducted to investigate the inner mechanism of the proposed speed-adaptive network.
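The abstract does not describe how the voting mechanism is applied in the diagnosis stage. One common pattern, assumed here, is to split a sample into fixed-size windows, classify each window, and report the majority label. `classify` is a hypothetical stand-in for the fused 1D-CNN, and the window size and step are illustrative.

```python
from collections import Counter
from collections.abc import Callable, Hashable, Sequence


def windows(signal: Sequence[float], size: int, step: int) -> list[Sequence[float]]:
    """Split a 1-D signal into fixed-size windows (trailing remainder dropped)."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


def diagnose(signal: Sequence[float],
             classify: Callable[[Sequence[float]], Hashable],
             size: int = 4, step: int = 4) -> Hashable:
    """Classify each window independently and return the majority label."""
    votes = Counter(classify(w) for w in windows(signal, size, step))
    if not votes:
        raise ValueError("signal shorter than one window")
    return votes.most_common(1)[0][0]
```

Voting over many windows lets a few windows misclassified under a changed load be outvoted, which is one plausible reading of the robustness argument; with multimodal input, `classify` would take aligned vibration and speed windows instead of a single sequence.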
