Pattern based Feature Selection and Classification Scheme for Cancer Diagnosis Data Analysis

Abstract: Electrical impedance spectroscopy (EIS) has been used as an adjunct to colposcopy for cervical cancer diagnosis for many years. Currently, EIS measurements are analysed with a template-matching method: the measured EIS spectra are compared with templates generated from three-dimensional finite element (FE) models of cancerous and non-cancerous cervical tissue, and the matches are used to derive a score that indicates how strongly the measured EIS is associated with High-Grade Cervical Intraepithelial Neoplasia (HG CIN). These FE models can be viewed as computational versions of the associated physical tissue models. In this paper, the problem is revisited with the objective of developing a new method for EIS data analysis that might reveal the relationship between the change in tissue structure due to disease and the change in the measured spectrum. This could provide important information for understanding the histopathological mechanism that underpins EIS-based HG CIN diagnostic decision making and the prognostic value of EIS for cervical cancer diagnosis. A further objective is to develop an alternative EIS data processing method for HG CIN detection that does not rely on physical models of tissue, so that the EIS technique can be extended to new medical diagnostic applications where template spectra are not available. To achieve these objectives, an EIS data-driven method was developed in which the EIS data analysis for cervical cancer diagnosis and prognosis was formulated as classification problems, and a Cole model-based spectrum curve-fitting approach was proposed to extract features from EIS readings for classification. Machine learning techniques were then used to build classification models with the selected features for cervical cancer diagnosis and for evaluating the prognostic value of the measured EIS.
Interpretable classification models were developed with real EIS data sets. These models make it possible to associate changes in the observed EIS, and the risk of having or developing HG CIN, with changes in tissue structure due to disease. The developed classification models were used for HG CIN detection and for evaluating the prognostic value of EIS, and the results demonstrated the effectiveness of the developed method. The method is of long-term benefit for EIS-based cervical cancer diagnosis and, in conjunction with standard colposcopy, has the potential to provide a more effective and efficient patient management strategy for clinical practice.
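
As an illustration of the Cole model-based curve-fitting step described in the abstract, the following minimal Python sketch fits the standard single-dispersion Cole equation, Z(ω) = R∞ + (R0 − R∞) / (1 + (jωτ)^α), to an impedance spectrum and returns the four fitted parameters as candidate features. It is not the authors' implementation: the fitting strategy (grid search over τ and α, with the resistances solved by linear least squares), the function names, the frequency range, and the synthetic parameter values are all assumptions made for the example.

```python
import math


def cole_impedance(freq_hz, r0, r_inf, tau, alpha):
    """Cole model: Z = R_inf + (R0 - R_inf) / (1 + (j*omega*tau)**alpha)."""
    omega = 2.0 * math.pi * freq_hz
    return r_inf + (r0 - r_inf) / (1.0 + (1j * omega * tau) ** alpha)


def fit_cole(freqs_hz, z_measured, taus, alphas):
    """Grid-search tau and alpha; solve R_inf and (R0 - R_inf) in closed form.

    For fixed (tau, alpha) the model is linear in R_inf and dR = R0 - R_inf:
    Z = R_inf + dR * g(omega), so a 2x2 least-squares system suffices.
    """
    best = None
    n = len(freqs_hz)
    for tau in taus:
        for alpha in alphas:
            g = [1.0 / (1.0 + (1j * 2.0 * math.pi * f * tau) ** alpha)
                 for f in freqs_hz]
            # Normal equations over the stacked real and imaginary residuals.
            s_ab = sum(gi.real for gi in g)
            s_bb = sum(abs(gi) ** 2 for gi in g)
            s_ay = sum(z.real for z in z_measured)
            s_by = sum(gi.real * z.real + gi.imag * z.imag
                       for gi, z in zip(g, z_measured))
            det = n * s_bb - s_ab ** 2
            if abs(det) < 1e-12:
                continue  # degenerate basis, skip this grid point
            r_inf = (s_ay * s_bb - s_ab * s_by) / det
            d_r = (n * s_by - s_ab * s_ay) / det
            sse = sum(abs(r_inf + d_r * gi - z) ** 2
                      for gi, z in zip(g, z_measured))
            if best is None or sse < best[0]:
                best = (sse, {"r0": r_inf + d_r, "r_inf": r_inf,
                              "tau": tau, "alpha": alpha})
    return best[1]


# Synthetic, noise-free spectrum with hypothetical parameter values.
freqs = [10 ** (2 + 0.25 * k) for k in range(17)]      # 100 Hz to 1 MHz
true_params = {"r0": 50.0, "r_inf": 10.0, "tau": 1e-5, "alpha": 0.8}
spectrum = [cole_impedance(f, **true_params) for f in freqs]

tau_grid = [10 ** (-7 + 0.25 * k) for k in range(17)]  # 1e-7 s to 1e-3 s
alpha_grid = [0.5 + 0.05 * k for k in range(11)]       # 0.5 to 1.0
fit = fit_cole(freqs, spectrum, tau_grid, alpha_grid)
print(fit)
```

The fitted R0, R∞, τ, and α could then serve as a compact feature vector for a classifier. With real, noisy measurements, a continuous nonlinear optimiser would likely replace the coarse grid used here.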

Improved Nonnegative Matrix Factorization Based Feature Selection for High Dimensional Data Analysis

Proceedings of the 2nd International Conference on Computer Science and Electronics Engineering (ICCSEE 2013), 2013. DOI: 10.2991/iccsee.2013.583

Author(s): Lincheng Jiang, Wentang Tan, Zhenwen Wang, Fengjing Yin, Bin Ge, ...

Keyword(s): Feature Selection, Data Analysis, Matrix Factorization, Nonnegative Matrix Factorization, High Dimensional Data, Nonnegative Matrix, High Dimensional, High Dimensional Data Analysis, Selection For

Text mining based on tax comments as big data analysis using SVM and feature selection

2018 International Conference on Information and Communications Technology (ICOIACT), 2018. DOI: 10.1109/icoiact.2018.8350743. Cited By ~ 1

Author(s): Mihuandayani, Ema Utami, Emha Taufiq Luthfi

Keyword(s): Feature Selection, Big Data, Data Analysis, Text Mining, Big Data Analysis

Wave2Vec: Vectorizing Electroencephalography Bio-Signal for Prediction of Brain Disease

International Journal of Environmental Research and Public Health, 2018, Vol 15 (8), pp. 1750. DOI: 10.3390/ijerph15081750. Cited By ~ 4

Author(s): Seonho Kim, Jungjoon Kim, Hong-Woo Chun

Keyword(s): Artificial Intelligence, Time Series, Feature Selection, Deep Learning, Natural Language Processing, Data Analysis, Natural Language, Real Number, Real Time, Language Processing

Interest in research on health and medical information analysis based on artificial intelligence, especially deep learning techniques, has recently been increasing. Most research in this field has focused on finding new knowledge for predicting and diagnosing disease by revealing the relation between disease and various features of the data. These features are extracted by analyzing clinical pathology data, such as electronic health records (EHR), and academic literature using data analysis, natural language processing, and similar techniques. However, more research is still needed on applying the latest artificial intelligence-based data analysis techniques to bio-signal data, which are continuous physiological records such as electroencephalography (EEG) and electrocardiogram (ECG) recordings. Unlike other types of data, bio-signal data take the form of time series of real numbers, and applying deep learning to them raises many issues in preprocessing, learning, and analysis. These issues include feature selection that is left out, learning stages that are black boxes, difficulty in recognizing and identifying effective features, and high computational complexity. To address these issues, this paper provides an encoding-based Wave2vec time series classifier model, which combines signal processing with deep learning-based natural language processing techniques. To demonstrate its advantages, we provide the results of three experiments conducted with EEG data from the University of California, Irvine, a real-world benchmark bio-signal dataset. The bio-signals (in the form of waves), which are real-number time series, are first converted through encoding into a sequence of symbols, or into a sequence of wavelet patterns that are in turn converted into symbols. The proposed model then vectorizes the symbols by learning the sequence using deep learning-based natural language processing.
A model for each class can be constructed by learning from the vectorized wavelet patterns and training data. The implemented models can be used for prediction and diagnosis of diseases by classifying new data. By converting the time series of real numbers into sequences of symbols, the proposed method makes the data more readable and the feature selection and learning processes more intuitive. It also makes influential patterns easier to recognize and identify. Furthermore, because encoding simplifies the data, it drastically reduces computational complexity without degrading analysis performance, which facilitates the real-time analysis of large-capacity data that is essential for developing real-time diagnosis systems.
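
The encoding idea in the abstract, turning a real-number time series into a sequence of symbols that can then be treated like text, can be sketched as follows. The abstract does not specify the encoding scheme, so the equal-width amplitude binning, the four-letter alphabet, the sliding-window "words", and the function names `encode_series` and `to_words` are assumptions made for illustration, not the Wave2vec procedure itself.

```python
import math


def encode_series(values, alphabet="abcd"):
    """Map each sample to a symbol by equal-width binning of its amplitude."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0          # avoid division by zero for flat signals
    k = len(alphabet)
    symbols = []
    for v in values:
        # Scale to [0, k) and clamp the maximum value into the last bin.
        idx = min(int((v - lo) / span * k), k - 1)
        symbols.append(alphabet[idx])
    return "".join(symbols)


def to_words(symbols, width=3):
    """Slide a fixed-width window over the symbol string to form 'words'."""
    return [symbols[i:i + width] for i in range(len(symbols) - width + 1)]


# Hypothetical signal: one period of a sine wave sampled at 16 points.
signal = [math.sin(2.0 * math.pi * t / 16.0) for t in range(16)]
encoded = encode_series(signal)
words = to_words(encoded)
print(encoded)
print(words[:5])
```

The resulting word sequences could be passed to a word-embedding model in the same way as tokenized text, which is the role the abstract assigns to the deep learning-based natural language processing stage.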
