Improving the classification performance with group lasso-based ranking method in high dimensional correlated data

2020 ◽  
Vol 19 (03) ◽  
pp. 2040009
Author(s):  
Abhijeet R Patil ◽  
Bong-Jin Choi ◽  
Sangjin Kim

The high-throughput correlated DNA methylation (DNAmeth) dataset is generated from the Illumina Infinium Human Methylation 27 (IIHM 27K) BeadChip assay. In the DNAmeth data, there are several CpG sites for every gene, and these grouped CpG sites are highly correlated. Most current filtering-based ranking (FBR) methods do not consider the group correlation structures. Obtaining significant features with FBR methods and applying them to classifiers to attain the best classification accuracy in highly correlated DNAmeth data is a challenging task. In this research, we introduce a resampling of group least absolute shrinkage and selection operator (glasso) FBR method that can ignore the unrelated features in the data while considering the group correlation among the features. Various classifiers, such as random forests (RF), Naive Bayes (NB), and support vector machines (SVM), combined with the significant CpGs obtained from the proposed resampling of group lasso-based ranking (RGLR) method, helped to boost the classification accuracy. Through simulated and experimental prostate DNAmeth data, we showed that higher accuracy, sensitivity, specificity, and geometric mean are achieved by ignoring the unimportant CpG sites through the RGLR method.
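
To illustrate why a group lasso penalty can discard a whole group of correlated CpG sites at once, the sketch below shows block soft-thresholding, the standard proximal step associated with the group lasso penalty. It is a minimal illustration and not the authors' RGLR procedure; the function name `group_soft_threshold` and the example numbers are assumptions made for this sketch.

```python
import math


def group_soft_threshold(beta_g, threshold):
    """Shrink one group's coefficient vector toward zero as a block.

    If the Euclidean norm of the group is at most `threshold`, the whole
    group is set to zero; otherwise every coefficient is scaled by the
    same factor (1 - threshold / norm).
    """
    norm = math.sqrt(sum(b * b for b in beta_g))
    if norm <= threshold:
        return [0.0] * len(beta_g)
    scale = 1.0 - threshold / norm
    return [scale * b for b in beta_g]


# Hypothetical coefficient groups (not taken from the paper).
strong_group = group_soft_threshold([3.0, 4.0], 1.0)  # norm 5: kept and shrunk
weak_group = group_soft_threshold([0.3, 0.4], 1.0)    # norm 0.5: zeroed as a block
```

Because the scaling factor is shared by all members of a group, features in the same group enter or leave the model together, which is the behaviour the abstract relies on for grouped CpG sites.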

2018 ◽  
Vol 21 (62) ◽  
pp. 1
Author(s):  
Jorge E. Camargo ◽  
Vladimir Vargas-Calderon ◽  
Nelson Vargas ◽  
Liliana Calderón-Benavides

With the purpose of classifying text based on its sentiment polarity (positive or negative), we proposed an extension of a 68,000-tweet corpus through the inclusion of word definitions from a dictionary of the Real Academia Española de la Lengua (RAE). A set of 28,000 combinations of 6 Word2Vec and support vector machine parameters was considered in order to evaluate how positively the inclusion of the RAE's dictionary definitions would affect classification performance. We found that such a corpus extension significantly improves the classification accuracy. Therefore, we conclude that the inclusion of the RAE's dictionary increases the semantic relations learned by Word2Vec, allowing a better classification accuracy.


Author(s):  
M. Ustuner ◽  
F. B. Sanli ◽  
S. Abdikan ◽  
M. T. Esetlili ◽  
G. Bilgin

Crops change dynamically and are time-critical during the growing season; therefore, multitemporal earth observation data are needed for spatio-temporal monitoring of crops. This study evaluates the impact of classical roll-invariant polarimetric features such as entropy (H), anisotropy (A), mean alpha angle (ᾱ) and total scattering power (SPAN) on crop classification from multitemporal polarimetric SAR data. For this purpose, five different data sets were generated as follows: (1) Hᾱ, (2) HᾱSpan, (3) HᾱA, (4) HᾱASpan and (5) coherency [T] matrix. A time series of four PolSAR (Radarsat-2) acquisitions, dated 13 June, 1 July, 31 July and 24 August 2016, was obtained for the test site located in Konya, Turkey. The test site is covered with crops (maize, potato, summer wheat, sunflower, and alfalfa). For the classification of the data sets, three different models were used: Support Vector Machines (SVMs), Random Forests (RFs) and Naive Bayes (NB). The experimental results highlight that HᾱASpan (91.43% for SVM, 92.25% for RF and 90.55% for NB) outperformed all other data sets in terms of classification performance, which explicitly proves the significant contribution of SPAN for the discrimination of crops. The highest classification accuracy, 92.25%, was obtained by RF with HᾱASpan, while the lowest, 66.99%, was obtained by NB with Hᾱ. This experimental study suggests that roll-invariant polarimetric features can be considered powerful polarimetric components for crop classification. In addition, the findings prove the added benefits of PolSAR data investigation by means of crop classification.
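
As a rough illustration of the roll-invariant features named above, the sketch below computes entropy (H), anisotropy (A) and SPAN from the three eigenvalues of a coherency matrix, using the standard eigenvalue-based definitions. The mean alpha angle is left out because it also needs the eigenvectors. The function name `h_a_span` and the sample eigenvalues are assumptions for illustration, not values from the study.

```python
import math


def h_a_span(eigenvalues):
    """Return (entropy, anisotropy, span) from three non-negative eigenvalues."""
    lam = sorted(eigenvalues, reverse=True)  # lambda1 >= lambda2 >= lambda3
    span = sum(lam)
    if span == 0:
        return 0.0, 0.0, 0.0
    probs = [v / span for v in lam]
    # Entropy uses a base-3 logarithm so that H lies in [0, 1].
    entropy = -sum(p * math.log(p, 3) for p in probs if p > 0)
    denom = lam[1] + lam[2]
    anisotropy = (lam[1] - lam[2]) / denom if denom > 0 else 0.0
    return entropy, anisotropy, span


# Hypothetical eigenvalues: equal scattering mechanisms vs. one dominant one.
random_scatter = h_a_span([1.0, 1.0, 1.0])
mixed_scatter = h_a_span([4.0, 3.0, 1.0])
```

Equal eigenvalues give the maximum entropy of 1 and zero anisotropy, while SPAN is simply the eigenvalue sum, which is why it carries total-power information that H, A and ᾱ do not.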


2017 ◽  
Vol 7 (1.3) ◽  
pp. 191 ◽  
Author(s):  
Ravindra B.V ◽  
N Sriraam ◽  
M Geetha

Chronic kidney disease (CKD) refers to the failure of renal function, which leads to the deposition of wastes, electrolytes and other fluids in the body. It is very important to recognize the symptoms associated with CKD, and pathological blood and urine tests indicate the key attributes. It is a well-known fact that patients have to undergo dialysis due to renal failure. The severity level of the disease can be predicted as well as classified using appropriate computer-aided quantitative tools. This study discusses the classification of chronic kidney disease and non-chronic kidney disease (NCKD) using support vector machine (SVM) neural networks. The simulation study makes use of the UCI repository CKD dataset with n=400. In order to train on the attributes of kidney dialysis, four cases were considered by including the nominal and numerical values. A radial basis kernel function was employed to train the SVM. The performance of the proposed scheme is evaluated in terms of sensitivity, specificity and classification accuracy. Results reveal that an overall classification accuracy of 94.44% was obtained by combining 6 attributes. It can be concluded that the SVM-based approach is a potential candidate for the classification of CKD and NCKD.
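
For readers unfamiliar with the radial basis function (RBF) kernel used with SVMs, here is a minimal sketch of the kernel value between two attribute vectors. The `gamma` value and the sample vectors are assumptions chosen for illustration; the abstract does not report the kernel parameters.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and y)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Hypothetical, already-scaled attribute vectors (not from the UCI CKD data).
same = rbf_kernel([1.0, 2.0], [1.0, 2.0])
apart = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)
```

The kernel equals 1 for identical vectors and decays toward 0 as they move apart, so attributes on very different scales would normally be normalized before training.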


2020 ◽  
Vol 2020 ◽  
pp. 1-9
Author(s):  
Xin Wang ◽  
Yue Yang ◽  
Mingsong Chen ◽  
Qin Wang ◽  
Qin Qin ◽  
...  

To address the low classification accuracy of imbalanced datasets, an oversampling algorithm, AGNES-SMOTE (Agglomerative Nesting-Synthetic Minority Oversampling Technique), based on hierarchical clustering and improved SMOTE, is proposed. Its key procedures include hierarchically clustering majority samples and minority samples, respectively; dividing minority subclusters on the basis of the obtained majority subclusters; selecting “seed samples” based on the sampling weight and probability distribution of each minority subcluster; and restricting the generation of new samples to a certain area by the centroid method in the sampling process. The combination of AGNES-SMOTE and SVM (Support Vector Machine) is presented to deal with the classification of imbalanced datasets. Experiments on UCI datasets are conducted to compare the performance of different algorithms mentioned in the literature. Experimental results indicate that AGNES-SMOTE excels in synthesizing new samples and improves SVM classification performance on imbalanced datasets.
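
The sketch below shows only the basic SMOTE interpolation step, in which a synthetic minority sample is placed on the line segment between a seed sample and one of its minority-class neighbors. It does not implement the clustering, seed weighting or centroid restriction of AGNES-SMOTE; the function name `smote_point` and the sample coordinates are assumptions for illustration.

```python
import random


def smote_point(sample, neighbor, rng):
    """Create one synthetic point between `sample` and `neighbor`."""
    gap = rng.random()  # uniform in [0, 1)
    return [s + gap * (n - s) for s, n in zip(sample, neighbor)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
seed_sample = [1.0, 2.0]   # hypothetical minority sample
near_sample = [3.0, 6.0]   # hypothetical minority neighbor
synthetic = smote_point(seed_sample, near_sample, rng)
```

Since the same `gap` is applied to every coordinate, the synthetic point always lies on the segment joining the two samples, which is the property the improved variants constrain further.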


Author(s):  
V. Ratna Bhargavi ◽  
V. Rajesh

In this paper, a hybrid approach to fundus image classification for diabetic retinopathy (DR) lesions is proposed. Laplacian eigenmaps (LE), a nonlinear dimensionality reduction (NDR) technique, is applied to a high-dimensional scale invariant feature transform (SIFT) representation of the fundus image for lesion classification. The applied NDR technique gives a low-dimensional intrinsic feature vector for lesion classification in fundus images. Publicly available databases are used to demonstrate the implemented strategy. The performance of the applied technique is evaluated in terms of sensitivity, specificity and accuracy using a support vector classifier. Compared to other feature vectors, the implemented LE-based feature vector yielded better classification performance. The accuracy obtained is 96.6% for SIFT-LE-SVM.


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5234
Author(s):  
Chi Qin Lai ◽  
Haidi Ibrahim ◽  
Aini Ismafairus Abd Hamid ◽  
Jafri Malin Abdullah

Traumatic brain injury (TBI) is one of the common injuries when the human head receives an impact due to an accident or fall and is one of the most frequently submitted insurance claims. However, it is often misused when individuals attempt an insurance fraud claim by providing false medical conditions. Therefore, there is a need for an instant brain condition classification system. This study presents a novel classification architecture that can classify non-severe TBI patients and healthy subjects employing resting-state electroencephalogram (EEG) as the input, solving the immobility issue of the computed tomography (CT) scan and magnetic resonance imaging (MRI). The proposed architecture makes use of long short-term memory (LSTM) and an error-correcting output coding support vector machine (ECOC-SVM) to perform multiclass classification. The pre-processed EEG time series are supplied to the network at each time step, where important information from the previous time step is remembered by the LSTM cell. Activations from the LSTM cell are used to train an ECOC-SVM. The temporal advantages of the EEG were amplified, and the architecture achieved a classification accuracy of 100%. The proposed method was compared to existing works in the literature, and it is shown that the proposed method is superior in terms of classification accuracy, sensitivity, specificity, and precision.
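
The decoding step of error-correcting output coding can be sketched as follows: each class is assigned a binary codeword, each binary classifier predicts one bit, and the predicted bit string is mapped to the class whose codeword is nearest in Hamming distance. The codebook and class labels below are hypothetical, and the binary learners (SVMs trained on LSTM activations in the study) are not modelled here.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(bits, codebook):
    """Return the label whose codeword is closest to the predicted bits."""
    return min(codebook, key=lambda label: hamming(bits, codebook[label]))


# Hypothetical codebook with pairwise Hamming distance of at least 3,
# so a single wrong bit can still be decoded correctly.
CODEBOOK = {
    "class_a": (0, 0, 1, 1, 0),
    "class_b": (1, 0, 0, 1, 1),
    "class_c": (0, 1, 0, 0, 1),
}

decoded = ecoc_decode((0, 0, 1, 1, 1), CODEBOOK)  # one bit off from class_a
```

The redundancy in the codewords is what lets the ensemble tolerate an error from an individual binary classifier.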


Sensors ◽  
2020 ◽  
Vol 20 (9) ◽  
pp. 2592
Author(s):  
Xuemin Cheng ◽  
Yong Ren ◽  
Kaichang Cheng ◽  
Jie Cao ◽  
Qun Hao

In this study, we propose a method for training convolutional neural networks so that they identify and classify images with higher accuracy. By combining the Cartesian and polar coordinate systems when describing the images, the method of recognition and classification for plankton images is discussed. The optimized classification and recognition networks are constructed. They are available for in situ plankton images, exploiting the advantages of both coordinate systems in the network training process. The two types of feature vectors, which come from the different image coordinate descriptions, are fused and used as the input to conventional machine learning models for classification, with support vector machines (SVMs) selected as the classifiers. The accuracy of the proposed model was markedly higher than that of the initial classical convolutional neural networks when using the in situ plankton image data, with the increases in classification accuracy and recall rate being 5.3% and 5.1%, respectively. In addition, the proposed training method can improve the classification performance considerably when used on the public CIFAR-10 dataset.


Actuators ◽  
2021 ◽  
Vol 10 (7) ◽  
pp. 152
Author(s):  
Yu-Tsung Hsiao ◽  
Chia-Fen Tsai ◽  
Chien-Te Wu ◽  
Thanh-Tung Trinh ◽  
Chun-Ying Lee ◽  
...  

Classification between individuals with mild cognitive impairment (MCI) and healthy controls (HC) based on electroencephalography (EEG) has been considered a challenging task to be addressed for the purpose of early detection of MCI. In this study, we proposed a novel EEG feature, the kernel eigen-relative-power (KERP) feature, for achieving high classification accuracy of MCI versus HC. First, we introduced the relative powers (RPs) between pairs of electrodes across 21 different subbands of 2-Hz width as the features, which have not yet been used in previous MCI-HC classification studies. Next, Fisher’s class separability criterion was applied to determine the best electrode pairs (five electrodes) as well as the frequency subbands for extracting the most sensitive RP features. The kernel principal component analysis (kernel PCA) algorithm was further performed to extract a few more discriminating nonlinear principal components from the optimal RPs, and these components form a KERP feature vector. Experiments carried out on 51 participants (24 MCI and 27 HC) show that the newly introduced subband RP feature achieved superior classification performance to commonly used spectral power features, including the band power, the single-electrode relative power, and the RP based on the conventional frequency bands. A high leave-one-participant-out cross-validation (LOPO-CV) classification accuracy of 86.27% was achieved by the RP feature, using a simple linear discriminant analysis (LDA) classifier. Moreover, with the same classifier, the proposed KERP further improved the accuracy to 88.24%. Finally, cascading the KERP feature to a nonlinear classifier, the support vector machine (SVM), yields a high MCI-HC classification accuracy of 90.20% (sensitivity = 87.50% and specificity = 92.59%). The proposed method demonstrated high accuracy and high usability (only five electrodes are required) and therefore has great potential for further development into an EEG-based computer-aided diagnosis system that can be applied for the early detection of MCI.
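
With leave-one-participant-out cross-validation on 51 participants, every reported rate is a whole number of participants divided by a group size, so the percentages can be checked with simple arithmetic. The sketch below does this for the SVM result. The counts of correctly classified participants (21 of 24 MCI, 25 of 27 HC) are not stated in the abstract; they are assumptions inferred here because they reproduce the reported percentages.

```python
def rate(correct, total):
    """Percentage of correct decisions, rounded to two decimals."""
    return round(100.0 * correct / total, 2)


N_MCI, N_HC = 24, 27   # group sizes stated in the abstract
TP, TN = 21, 25        # assumed counts, inferred from the reported rates

sensitivity = rate(TP, N_MCI)               # correctly detected MCI
specificity = rate(TN, N_HC)                # correctly detected HC
accuracy = rate(TP + TN, N_MCI + N_HC)      # all correct decisions

# The LDA accuracies are likewise consistent with 44 and 45 correct out of 51.
lda_rp = rate(44, 51)
lda_kerp = rate(45, 51)
```

Under these assumed counts, each step between the three reported accuracies corresponds to a single additional participant being classified correctly.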


2020 ◽  
Vol 10 (20) ◽  
pp. 7379
Author(s):  
Iosif Mporas ◽  
Isidoros Perikos ◽  
Vasilios Kelefouras ◽  
Michael Paraskevas

In this article, we present a framework for automatic detection of logging activity in forests using audio recordings. The framework was evaluated in terms of logging detection classification performance and various widely used classification methods and algorithms were tested. Experimental setups, using different ratios of sound-to-noise values, were followed and the best classification accuracy was reported by the support vector machine algorithm. In addition, a postprocessing scheme on decision level was applied that provided an improvement in the performance of more than 1%, mainly in cases of low ratios of sound-to-noise. Finally, we evaluated a late-stage fusion method, combining the postprocessed recognition results of the three top-performing classifiers, and the experimental results showed a further improvement of approximately 2%, in terms of absolute improvement, with logging sound recognition accuracy reaching 94.42% when the ratio of sound-to-noise was equal to 20 dB.
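
The abstract does not say which rule is used to fuse the three top-performing classifiers. As one common possibility, the sketch below applies a per-frame majority vote to the label sequences produced by three classifiers. The voting rule, the function names and the example labels are all assumptions for illustration and should not be read as the authors' fusion method.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label among one frame's predictions."""
    return Counter(predictions).most_common(1)[0][0]


def fuse(per_classifier_labels):
    """Fuse equal-length label sequences, one sequence per classifier."""
    return [majority_vote(frame) for frame in zip(*per_classifier_labels)]


# Hypothetical decisions of three classifiers on three audio frames.
fused = fuse([
    ["logging", "noise", "logging"],
    ["logging", "logging", "noise"],
    ["noise", "noise", "noise"],
])
```

With three classifiers and two classes a tie cannot occur, which keeps the rule unambiguous for this binary detection setting.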


2020 ◽  
Vol 4 (4) ◽  
pp. 649-660
Author(s):  
M. Yunus ◽  
Asep Saefuddin ◽  
Agus M Soleh

One of the rainfall prediction techniques is Statistical Downscaling (SDS) modeling. SDS modeling is an application of modeling with covariates that are generally numerous and not independent. The problem encountered is ill-conditioned data, i.e., multicollinearity and high correlation between variables. Highly correlated data cause the linear regression coefficient estimators to have a large variance. This research builds statistical downscaling models using the lasso and group lasso for the prediction of rainfall. Covariate grouping scenarios are defined based on adjacent areas, high correlation between covariates, correlation between covariates and the response, and the addition of dummy variables. Scenario six (in which the covariates that are positively correlated with the response are divided into 3 groups, 1 individual, and the covariates that are negatively correlated with the response are divided into 2 groups, 1 individual) is better than the other scenarios in linear modeling without a dummy. Linear modeling with a dummy is better than without a dummy for both techniques. In linear modeling with a dummy, the group lasso technique can be preferred for SDS modeling, because the differences in the RMSEP value and the correlation coefficient value are significant.
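
The RMSEP used to compare the models is the root mean squared error of prediction, which can be sketched as below. The function name `rmsep` and the sample rainfall values are assumptions for illustration and are not data from the study.

```python
import math


def rmsep(observed, predicted):
    """Root mean squared error of prediction over paired values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical monthly rainfall values (mm) and model predictions.
error = rmsep([100.0, 120.0, 80.0, 90.0], [102.0, 118.0, 82.0, 88.0])
```

A lower RMSEP on held-out data, together with a higher correlation between observed and predicted rainfall, is the kind of evidence the abstract uses to rank the lasso and group lasso models.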

