Feature Extraction and Analysis for Lung Nodule Classification using Random Forest

Author(s):  
Nada S. El-Askary ◽  
Mohammed A.-M. Salem ◽  
Mohamed I. Roushdy
2019 ◽  
Vol 13 (2) ◽  
pp. 136-141 ◽  
Author(s):  
Abhisek Sethy ◽  
Prashanta Kumar Patra ◽  
Deepak Ranjan Nayak

Background: In the past decades, handwritten character recognition has received considerable attention from researchers across the globe because of its wide range of applications in daily life. The literature shows that few studies address handwritten Indian scripts, and Odia is one of the scripts that remains understudied. We reviewed some of the patents relating to handwritten character recognition. Methods: This paper develops an automatic recognition system for offline handwritten Odia characters. The character images are preprocessed before feature extraction. For feature extraction, the gray level co-occurrence matrix (GLCM) is first computed from all the sub-bands of the two-dimensional discrete wavelet transform (2D DWT); feature descriptors such as energy, entropy, correlation, homogeneity, and contrast are then calculated from the GLCMs and together form the primary feature vector. To further reduce the feature space and generate more relevant features, principal component analysis (PCA) is employed. Random forest (RF) and K-nearest neighbor (K-NN) have several salient properties that make them a common choice in pattern classification tasks, so both are applied separately in this study to classify the character images. Results: All experiments were performed on a system running 64-bit Windows 8 with an Intel(R) i7-4770 CPU @ 3.40 GHz. Simulations were conducted in MATLAB 2014a on a standard database, the NIT Rourkela Odia Database. Conclusion: The proposed system has been validated on a standard database. Simulation results based on 10-fold cross-validation demonstrate that the proposed system achieves better accuracy than existing methods while requiring the fewest features. The recognition rates using the RF and K-NN classifiers are 94.6% and 96.4%, respectively.
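The GLCM descriptor step named in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code: the function names `glcm` and `glcm_descriptors`, the single horizontal offset, the symmetric counting, and the toy 4-level image are all assumptions made for illustration. The wavelet decomposition and PCA stages are omitted. The descriptor formulas follow one common convention; other toolboxes define energy or homogeneity slightly differently.

```python
import math

def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric gray-level co-occurrence matrix for one pixel offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def glcm_descriptors(p):
    """Energy, entropy, contrast, homogeneity and correlation of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    var_i = sum((i - mu_i) ** 2 * v for i, _, v in cells)
    var_j = sum((j - mu_j) ** 2 * v for _, j, v in cells)
    denom = math.sqrt(var_i * var_j)
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return {
        "energy": sum(v * v for _, _, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        # Correlation is undefined for a constant image; 0.0 is returned by convention here.
        "correlation": cov / denom if denom > 0 else 0.0,
    }

# Toy image with 4 gray levels (hypothetical data, not taken from any dataset).
toy_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
features = glcm_descriptors(glcm(toy_image, levels=4))
```

In a pipeline like the one described, a descriptor dictionary of this kind would be computed per sub-band and the values concatenated into one feature vector before dimensionality reduction.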


Author(s):  
Farrikh Alzami ◽  
Erika Devi Udayanti ◽  
Dwi Puji Prabowo ◽  
Rama Aria Megantara

Sentiment analysis in terms of polarity classification is very important in everyday life: knowing the polarity lets people find out whether a given document carries positive or negative sentiment, which helps them choose and make decisions. Sentiment analysis is usually done manually, so an automatic sentiment classification process is needed. However, few studies discuss which feature extraction methods and learning models suit unstructured sentiment analysis for the Amazon food review case. This research explores several feature extraction methods (Bag of Words, TF-IDF, Word2Vec, and a combination of TF-IDF and Word2Vec) together with several machine learning models (Random Forest, SVM, KNN, and Naïve Bayes) to find a combination of feature extraction and learning model that can broaden the options for polarity sentiment analysis. With document preprocessing that handles HTML tags, punctuation, and special characters and applies Snowball stemming, TF-IDF combined with SVM proves suitable for polarity classification in unstructured sentiment analysis of Amazon food reviews, with a performance of 87.3 percent.
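As a rough illustration of the TF-IDF weighting mentioned above, the following Python sketch computes one common variant (relative term frequency multiplied by log(N/df)). The function name `tfidf` and the three tiny review snippets are hypothetical, and the exact variant, tokenization, and stemming used in the study are not stated in the abstract.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights

# Hypothetical, already-tokenized snippets (not from the Amazon data set).
reviews = [
    "great taste great price".split(),
    "bad taste".split(),
    "great snack".split(),
]
vectors = tfidf(reviews)
```

Terms that occur in every document receive a weight of zero under this variant, which is why rare but repeated words tend to dominate the vectors passed to a classifier such as an SVM.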


Author(s):  
Amrita Naik ◽  
Damodar Reddy Edla

Lung cancer is the most common cancer throughout the world, and identifying malignant tumors at an early stage is needed for diagnosis and treatment of the patient, thus avoiding progression to a later stage. In recent times, deep learning architectures such as CNNs have shown promising results in effectively identifying malignant tumors in CT scans. In this paper, we combine CNN features with texture features such as Haralick and gray level run length matrix features to gain the benefits of both the high-level and the spatial features extracted from the lung nodules and so improve classification accuracy. These features are then classified using an SVM classifier instead of a softmax classifier in order to reduce overfitting. Our model was validated on the LUNA dataset and achieved an accuracy of 93.53%, a sensitivity of 86.62%, a specificity of 96.55%, and a positive predictive value of 94.02%.
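The four reported metrics all derive from the counts of a binary confusion matrix. The Python sketch below shows the standard definitions; the function name `binary_metrics` and the example counts are made up for illustration and are not the counts behind the figures reported above.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from binary confusion-matrix counts (malignant = positive)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value (precision)
    }

# Illustrative counts only: 100 malignant and 100 benign nodules.
metrics = binary_metrics(tp=80, fp=10, tn=90, fn=20)
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are imbalanced, since accuracy alone can hide a weak detection rate for the malignant class.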


2020 ◽  
Vol 20 (S12) ◽  
Author(s):  
Juan C. Mier ◽  
Yejin Kim ◽  
Xiaoqian Jiang ◽  
Guo-Qiang Zhang ◽  
Samden Lhatoo

Background: Awareness of Sudden Unexpected Death in Epilepsy (SUDEP) has increased considerably over the last two decades, and it is acknowledged as a serious problem in epilepsy. However, the scientific community remains unclear on the cause and on possible biomarkers that can distinguish potentially fatal seizures from non-fatal seizures. The duration of postictal generalized EEG suppression (PGES) is a promising candidate to aid in identifying SUDEP risk. The length of time a patient experiences PGES after a seizure may be used to infer the patient's risk of SUDEP later in life. The problem then becomes identifying the duration, or marking the end, of PGES (Tomson et al. in Lancet Neurol 7(11):1021–1031, 2008; Nashef in Epilepsia 38:6–8, 1997). Methods: This work addresses the problem of marking the end of PGES in EEG data recorded from patients during a clinically supervised seizure. It proposes a sensitivity analysis over EEG window size/delay, feature extraction, and classifiers along with their associated hyperparameters. The sensitivity analysis includes Gradient Boosted Decision Trees and Random Forest classifiers trained on 10 extracted features rooted in fundamental EEG behavior, using an EEG-specific feature extraction process (pyEEG) and 5 different window sizes or delays (Bao et al. in Comput Intell Neurosci 2011:1687–5265, 2011). Results: The machine learning architecture described above achieved a maximum AUC of 76.02% with the Random Forest classifier trained on all extracted features. The highest performing features included SVD Entropy, Petrosian Fractal Dimension, and Power Spectral Intensity. Conclusion: The methods described are effective in automatically marking the end of PGES. Future work should include integrating these methods into the clinical setting and using the results to predict a patient's SUDEP risk.
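To make the windowed feature extraction more concrete, here is a minimal Python sketch of one of the named features, the Petrosian fractal dimension, computed over sliding windows. It uses the commonly published formula and may differ in detail from what pyEEG computes. The function names, the window size and step, and the toy signal are assumptions for illustration, not the settings used in the study.

```python
import math

def petrosian_fd(signal):
    """Petrosian fractal dimension: log10(n) / (log10(n) + log10(n / (n + 0.4 * N_delta)))."""
    n = len(signal)
    diffs = [b - a for a, b in zip(signal, signal[1:])]
    # N_delta: number of sign changes in the first difference of the signal.
    sign_changes = sum(1 for a, b in zip(diffs, diffs[1:]) if a * b < 0)
    return math.log10(n) / (
        math.log10(n) + math.log10(n / (n + 0.4 * sign_changes))
    )

def sliding_windows(signal, size, step):
    """Yield consecutive windows of `size` samples, advancing by `step` samples."""
    for start in range(0, len(signal) - size + 1, step):
        yield signal[start:start + size]

# Toy single-channel signal (hypothetical values, not EEG data).
toy_signal = [math.sin(0.3 * t) + 0.5 * math.sin(2.1 * t) for t in range(200)]
pfd_per_window = [petrosian_fd(w) for w in sliding_windows(toy_signal, size=50, step=25)]
```

Each window yields one feature value; stacking several such features per window gives the rows that a classifier like Random Forest could label as inside or outside the suppression period.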

