A Two-Phase Approach for Semi-Supervised Feature Selection

Algorithms ◽  
2020 ◽  
Vol 13 (9) ◽  
pp. 215
Author(s):  
Amit Saxena ◽  
Shreya Pare ◽  
Mahendra Singh Meena ◽  
Deepak Gupta ◽  
Akshansh Gupta ◽  
...  

This paper proposes a novel approach for selecting a subset of features in semi-supervised datasets, where only some of the patterns are labeled. The process runs in two phases. In Phase-I, the dataset is divided into two parts: the first contains the labeled patterns and the second the unlabeled patterns. A small number of features is identified using well-known maximum-relevance (computed on the first part) and minimum-redundancy (computed on the whole dataset) feature selection approaches based on the correlation coefficient. From this identified set, the subset of features that produces a high classification accuracy on the labeled patterns, using any supervised classifier, is selected for later processing. In Phase-II, the patterns of the first and second parts are clustered separately into as many clusters as the dataset has classes. Each cluster of the first part is assigned the class held by the majority of its patterns, whose labels are already given. Cluster centroids of the two parts are then paired: the centroid of the second part nearest to a centroid of the first part is paired with it. Because the class of the first-part centroid is known, the same class can be assigned to the paired second-part cluster, whose class is unknown. If the actual classes of the patterns in the second part are known, they can be used to test the classification accuracy on that part. The proposed two-phase approach performs well in terms of classification accuracy and number of features selected on the given benchmark datasets.
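
The Phase-II labeling step described in the abstract above can be illustrated with a minimal Python sketch. It assumes the clustering has already been done and that Euclidean distance is used; the function names (`centroid`, `majority_label`, `assign_classes`) are hypothetical, and the sketch simply gives each unlabeled cluster the class of its nearest labeled centroid, which may differ from the exact pairing rule used by the authors.

```python
from collections import Counter
from math import dist


def centroid(points):
    """Mean of a non-empty list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def majority_label(labels):
    """Most common class label among the labeled patterns of one cluster."""
    return Counter(labels).most_common(1)[0][0]


def assign_classes(labeled_clusters, unlabeled_clusters):
    """Give each unlabeled cluster the class of the nearest labeled centroid.

    labeled_clusters: list of (points, labels) pairs, one per cluster.
    unlabeled_clusters: list of point lists, one per cluster.
    Assumption: Euclidean distance between centroids.
    """
    labeled = [(centroid(pts), majority_label(lbls))
               for pts, lbls in labeled_clusters]
    assigned = []
    for pts in unlabeled_clusters:
        c = centroid(pts)
        _, label = min(labeled, key=lambda item: dist(item[0], c))
        assigned.append(label)
    return assigned
```

If the true classes of the unlabeled patterns happen to be available, comparing them with the assigned cluster classes gives the accuracy check mentioned at the end of the abstract.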

Author(s):  
Vishu Madaan ◽  
Aditya Roy ◽  
Charu Gupta ◽  
Prateek Agrawal ◽  
Anand Sharma ◽  
...  

The COVID-19 pandemic (caused by SARS-CoV-2) has spread across the entire world. COVID-19 is a contagious disease that spreads easily from one person to another through direct contact, and is classified by experts into five categories: asymptomatic, mild, moderate, severe, and critical. As of 5 December 2020, more than 66 million people had been infected worldwide, with more than 22 million active patients, and the rate is accelerating. More than 1.5 million patients (approximately 2.5% of total reported cases) across the world have lost their lives. In many places, COVID-19 detection relies on reverse transcription polymerase chain reaction (RT-PCR) tests, which may take longer than 48 h. This is one major reason for its severity and rapid spread. In this paper we propose a two-phase X-ray image classification method called XCOVNet for early COVID-19 detection using a convolutional neural network model. XCOVNet detects COVID-19 infections in patients' chest X-ray images in two phases. The first phase pre-processes a dataset of 392 chest X-ray images, of which half are COVID-19 positive and half are negative. The second phase trains and tunes the neural network model to achieve 98.44% accuracy in patient classification.


Author(s):  
Amit Saxena ◽  
John Wang

This paper presents a two-phase scheme that selects a reduced number of features from a dataset using a Genetic Algorithm (GA) and tests the classification accuracy (CA) of the dataset with the reduced feature set. In the first phase, an unsupervised approach is applied to select a subset of features: GA stochastically selects a reduced number of features with Sammon Error as the fitness function, yielding different subsets of features. In the second phase, each reduced feature set is used to test the CA of the dataset, which is validated using the supervised k-nearest neighbor (k-nn) algorithm. The novelty of the proposed scheme is that each reduced feature set obtained in the first phase is investigated for CA using k-nn classification with different Minkowski metrics, i.e., non-Euclidean norms instead of the conventional Euclidean norm (L2). Final results are presented with extensive simulations on seven real data sets and one synthetic data set. The investigation reveals that taking different norms produces better CA and hence offers scope for better feature subset selection.
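
To make the role of the Minkowski metric in the second phase concrete, here is a small, hedged Python sketch of k-nn classification with a configurable norm order `p` (`p = 2` is the Euclidean norm, `p = 1` the Manhattan norm). The function names and the default `k = 3` are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(train_X, train_y, query, k=3, p=2.0):
    """Predict the class of `query` by majority vote among its k nearest
    training patterns under the Minkowski metric of order p."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: minkowski(pair[0], query, p))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Running the same classifier with several values of `p` on each reduced feature set, and comparing the resulting accuracies, mirrors the kind of comparison the abstract describes.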


Author(s):  
Divya Jain ◽  
Vijendra Singh

A two-phase diagnostic framework based on hybrid classification for the diagnosis of chronic disease is proposed. In the first phase, feature selection via ReliefF method and feature extraction via PCA method are incorporated. In the second phase, efficient optimization of SVM parameters via grid search method is performed. The proposed hybrid classification approach is then tested with seven popular chronic disease datasets using a cross-validation method. Experiments are then conducted to evaluate the presented classification method vis-à-vis four other existing classifiers that are applied on the same chronic disease datasets. Results show that the presented approach reduces approximately 40% of the extraneous and surplus features with substantial reduction in the execution time for mining all datasets, achieving the highest classification accuracy of 98.5%. It is concluded that with the presented approach, excellent classification accuracy is achieved for each chronic disease dataset while irrelevant and redundant features may be eliminated, thereby substantially reducing the diagnostic complexity and resulting computational time.


1995 ◽  
Vol 117 (4) ◽  
pp. 483-493 ◽  
Author(s):  
Graeme W. Milton ◽  
Andrej V. Cherkaev

It is shown that any given positive definite fourth order tensor satisfying the usual symmetries of elasticity tensors can be realized as the effective elasticity tensor of a two-phase composite comprised of a sufficiently compliant isotropic phase and a sufficiently rigid isotropic phase configured in a suitable microstructure. The building blocks for constructing this composite are what we call extremal materials. These are composites of the two phases which are extremely stiff to a set of arbitrary given stresses and, at the same time, are extremely compliant to any orthogonal stress. An appropriately chosen subset of the extremal materials is layered together to form the composite with elasticity tensor matching the given tensor.


2015 ◽  
Vol 2015 ◽  
pp. 1-9
Author(s):  
Silvia Gaona ◽  
David Romero

Censuses in Mexico are taken by the National Institute of Statistics and Geography (INEGI). In this paper a Two-Phase Approach (TPA) to optimize the routes of INEGI’s census takers is presented. For each pollster, in the first phase, a route is produced by means of the Simulated Annealing (SA) heuristic, which attempts to minimize the travel distance subject to particular constraints. Whenever the route is unrealizable, it is made realizable in the second phase by constructing a visibility graph for each obstacle and applying Dijkstra’s algorithm to determine the shortest path in this graph. A tuning methodology based on the irace package was used to determine the parameter values for TPA on a subset of 150 instances provided by INEGI. The practical effectiveness of TPA was assessed on another subset of 1962 instances, comparing its performance with that of the in-use heuristic (INEGIH). The results show that TPA clearly outperforms INEGIH. The average improvement is 47.11%.
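
The second-phase shortest-path step can be sketched as follows. This is a generic Dijkstra implementation in Python, assuming the visibility graph has already been built and is given as an adjacency dictionary with non-negative edge lengths; the graph representation and the function name `dijkstra` are assumptions for illustration, not the authors' code.

```python
import heapq


def dijkstra(graph, source, target):
    """Shortest path in a graph given as {node: [(neighbor, weight), ...]}.

    Returns (distance, path); (inf, []) if target is unreachable.
    Assumes non-negative edge weights.
    """
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        if node == target:
            break
        for nbr, w in graph.get(node, []):
            nd = d + w
            if nd < best.get(nbr, float("inf")):
                best[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in best:
        return float("inf"), []
    # Walk predecessor links back from target to source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return best[target], path
```

In a visibility graph, the nodes would be the route endpoints plus the obstacle's vertices, and the edge weights the straight-line distances between mutually visible nodes.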


Author(s):  
Gregory H. Teichert ◽  
Quentin T. Aten ◽  
Melanie Easter ◽  
Sandra Burnett ◽  
Larry L. Howell ◽  
...  

This paper introduces a metamorphic erectable cell restraint (MECR) to provide cell restraint in genetic research. A micro-electromechanical systems (MEMS) metamorphic mechanism with two phases of motion was designed to grasp individual embryos about their midplane. The first phase of motion lifts a compliant gripper approximately 40 μm (about half the diameter of an embryo). The gripper then closes in the second phase to grasp the embryo. The metamorphic mechanism includes compliant mechanism components, which are analyzed here. A microscale prototype was fabricated from polysilicon and used to demonstrate the mechanism’s two-phase motion.


1991 ◽  
Vol 113 (2) ◽  
pp. 228-235 ◽  
Author(s):  
S. E. Jones ◽  
P. P. Gillis ◽  
J. C. Foster ◽  
L. L. Wilson

In this paper, a simple theoretical analysis of an old problem is presented. The analysis is more complete than earlier versions, but retains the mathematical simplicity of the earlier versions. The major thrust is to separate the material response into two phases. The first phase is dominated by strain rate effects and has a variable plastic wave speed. The second phase is dominated by strain hardening effects and has a constant plastic wave speed. Estimates for dynamic yield stress, strain, strain rate, and plastic wave speed during both phases are given. Comparisons with several experiments on OFHC copper are included.


2021 ◽  
pp. 1-20
Author(s):  
Jian Yuan ◽  
Zhongyu Wei ◽  
Yixu Gao ◽  
Wei Chen ◽  
Jun Song ◽  
...  

In this paper we present the results of the Interactive Argument-Pair Extraction in Judgement Document Competition held by both the Chinese AI and Law challenge (CAIL) and the Chinese National Social Media Processing Conference (SMP), and introduce the related dataset, SMP-CAIL2020-Argmine. The task challenged participants to choose the correct argument among five candidates proposed by the defense to refute or acknowledge the given argument made by the plaintiff, given the full context recorded in the judgement documents of both parties. We received entries from 63 competing teams, 38 of which scored higher than the provided baseline model (BERT) in the first phase and entered the second phase. The best performing system in the two phases achieved accuracy of 0.856 and 0.905 respectively. We also summarize the participating systems, highlighting their commonalities and innovations. The SMP-CAIL2020-Argmine dataset and baseline models have already been released.


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-18 ◽  
Author(s):  
Jessica L. Chapman ◽  
Lu Lu ◽  
Christine M. Anderson-Cook

An important aspect of good management of inventory for many single-use populations or stockpiles is to develop an informed consumption strategy to use a collection of single-use units, with varied reliability as a function of age, during scheduled operations. We present a two-phase approach to balance multiple objectives for a consumption strategy to ensure good performance on the average reliability, consistency of unit reliability over time, and least uncertainty of the reliability estimates. In the first phase, a representative subset of units is selected to explore the impact of using units at different time points on reliability performance and to identify beneficial consumption patterns using a nondominated sorting genetic algorithm based on multiple objectives. In the second phase, the results from the first phase are projected back to the full stockpile as a starting point for determining best consumption strategies that emphasize the priorities of the manager. The method can be generalized to other criteria of interest and management optimization strategies. The method is illustrated with an example that shares characteristics with some munition stockpiles and demonstrates the substantial advantages of the two-phase approach on the quality of solutions and efficiency of finding them.
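
The multi-objective selection in the first phase rests on the idea of nondominance. The Python sketch below shows only that core test, under the assumption that every objective is to be maximized; it is not the full nondominated sorting genetic algorithm, and the names `dominates` and `pareto_front` are hypothetical.

```python
def dominates(a, b):
    """True if solution a is at least as good as b on every objective
    and strictly better on at least one (all objectives maximized)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]
```

For a consumption strategy, each tuple could hold scores such as average reliability and consistency over time; objectives to be minimized, such as estimate uncertainty, would need to be negated first under this sketch's maximization assumption.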


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Heng-Yang Lu ◽  
Yi Zhang ◽  
Yuntao Du

Purpose: Topic models have been widely applied to discover important information from a vast amount of unstructured data. Traditional long-text topic models such as Latent Dirichlet Allocation may suffer from the sparsity problem when dealing with short texts, which mostly come from the Web. These models also suffer from a readability problem when displaying the discovered topics. The purpose of this paper is to propose a novel model called the Sense Unit based Phrase Topic Model (SenU-PTM) that addresses both the sparsity and readability problems.

Design/methodology/approach: SenU-PTM is a novel phrase-based short-text topic model under a two-phase framework. The first phase introduces a phrase-generation algorithm that exploits word embeddings to generate phrases from the original corpus. The second phase introduces a new concept of sense unit, which consists of a set of semantically similar tokens for modeling topics with token vectors generated in the first phase. Finally, SenU-PTM infers topics based on the above two phases.

Findings: Experimental results on two real-world and publicly available datasets show the effectiveness of SenU-PTM from the perspectives of topical quality and document characterization. They reveal that modeling topics on sense units can solve the sparsity of short texts and improve the readability of topics at the same time.

Originality/value: The originality of SenU-PTM lies in the new procedure of modeling topics on the proposed sense units with word embeddings for short-text topic discovery.

