Assessing Effects of Pre-Processing Mass Spectrometry Data on Classification Performance

2008 ◽  
Vol 14 (5) ◽  
pp. 267-273 ◽  
Author(s):  
Akin Ozcift ◽  
Arif Gulten

Disease prediction from mass spectrometry (MS) data is gaining importance in medical diagnosis. In cancer in particular, early prediction is one of the most life-saving stages. Because MS data are high-dimensional and noisy, successful disease prediction requires a two-phase study. First, the MS data must be pre-processed with stages such as baseline correction, normalization, de-noising and peak detection. Second, a classifier based on dimension reduction must be designed. Once the data are pre-processed, the prediction accuracy of the classifier algorithm becomes the most significant factor in the diagnosis phase, and because patients' health is at stake, that accuracy is critical. In this study, the effects of the pre-processing stages of MS data on classifier performance are addressed. Three pre-processing stages (baseline correction, normalization and de-noising) are applied to three MS data sets: high-resolution ovarian cancer, low-resolution prostate cancer and low-resolution ovarian cancer. To measure the effects of the pre-processing stages quantitatively, four diverse classifiers are applied to the data sets: genetic algorithm wrapped K-nearest neighbor (GA-KNN), principal component analysis-based linear discriminant analysis (PCA-LDA), a neural network (NN) and a support vector machine (SVM). The calculated classifier performances clearly demonstrate, in quantitative terms, the effects of the pre-processing stages and their importance for the prediction accuracy of the classifiers.
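The pre-processing stages listed above can be illustrated with a minimal sketch on a single spectrum stored as a list of intensities. The specific techniques are assumptions chosen for brevity, not the methods used in the study: a rolling-minimum baseline, total-ion-current (TIC) normalization, a centered moving-average smoother and a simple local-maximum peak detector. The function names, window sizes and threshold are hypothetical.

```python
"""Hypothetical MS pre-processing sketch: baseline correction,
normalization, de-noising and peak detection on one spectrum."""

from typing import List, Sequence, Tuple


def baseline_correct(intensities: Sequence[float], window: int) -> List[float]:
    """Subtract a rolling-minimum estimate of the baseline at each point."""
    n = len(intensities)
    half = window // 2
    corrected = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        corrected.append(intensities[i] - min(intensities[lo:hi]))
    return corrected


def tic_normalize(intensities: Sequence[float]) -> List[float]:
    """Divide by the total ion current (sum of intensities) so spectra are comparable."""
    total = sum(intensities)
    if total == 0:
        return list(intensities)
    return [v / total for v in intensities]


def moving_average(intensities: Sequence[float], window: int) -> List[float]:
    """De-noise with a centered moving average; the window shrinks at the edges."""
    n = len(intensities)
    half = window // 2
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        smoothed.append(sum(intensities[lo:hi]) / (hi - lo))
    return smoothed


def detect_peaks(intensities: Sequence[float], threshold: float) -> List[int]:
    """Return indices of local maxima whose intensity exceeds the threshold."""
    return [
        i
        for i in range(1, len(intensities) - 1)
        if intensities[i] > threshold
        and intensities[i] >= intensities[i - 1]
        and intensities[i] > intensities[i + 1]
    ]


def preprocess(
    spectrum: Sequence[float],
    baseline_window: int = 5,
    smooth_window: int = 3,
    peak_threshold: float = 0.0,
) -> Tuple[List[float], List[int]]:
    """Apply the stages in the order listed in the abstract."""
    corrected = baseline_correct(spectrum, baseline_window)
    normalized = tic_normalize(corrected)
    smoothed = moving_average(normalized, smooth_window)
    return smoothed, detect_peaks(smoothed, peak_threshold)
```

The rolling-minimum window must be wider than the peaks; otherwise the estimated baseline rises into the peaks and removes them. Swapping any single stage (for example, a different smoother) while keeping the others fixed is the kind of controlled change that lets its effect on downstream classifiers be measured.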

2019 ◽  
Vol 14 ◽  
Author(s):  
Pingan He ◽  
Longao Hou ◽  
Hong Tao ◽  
Qi Dai ◽  
Yuhua Yao

Background: The impact of cancer on society has created the need for new and faster theoretical models for its early diagnosis. Methods: In this work, a mass spectrometry (MS) data analysis method based on the star-like graph of proteins and a support vector machine (SVM) was proposed and applied to the early classification of ovarian cancer in an MS data set. First, the MS data are reduced and transformed into the corresponding protein sequences. Then, the topological indices of the star-like graph are calculated to describe the MS data of each cancer sample. Finally, an SVM model is used to classify the MS data. Results: The ovarian cancer detection models were evaluated with 10 independent training and testing experiments. With [0,1] normalization, the average prediction accuracy, sensitivity, and specificity of the model were 96.45%, 96.88%, and 95.67%, respectively; with [-1,1] normalization, they were 94.43%, 96.25%, and 91.11%, respectively. Conclusion: Combined with SELDI-TOF-MS technology, the model shows promise for the early clinical detection and diagnosis of ovarian cancer.
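The [0,1] and [-1,1] normalizations and the three reported metrics can be illustrated with a short sketch. It assumes min-max rescaling, x' = lo + (hi - lo)(x - x_min)/(x_max - x_min), applied per feature (column) across samples, and binary labels with 1 = cancer and 0 = control; the abstract does not state how the authors applied the scaling, and all function names are hypothetical. Sensitivity is TP/(TP + FN) and specificity is TN/(TN + FP).

```python
"""Hypothetical helpers: [0,1] / [-1,1] min-max scaling and binary
classification metrics (accuracy, sensitivity, specificity)."""

from typing import Dict, List, Sequence


def rescale_column(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Map values linearly so the column minimum becomes lo and the maximum hi."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # A constant feature carries no information; map it to the lower bound.
        return [lo] * len(values)
    span = vmax - vmin
    return [lo + (hi - lo) * (v - vmin) / span for v in values]


def rescale_matrix(rows: List[List[float]], lo: float, hi: float) -> List[List[float]]:
    """Rescale each feature (column) of a samples-by-features matrix independently."""
    columns = [rescale_column(col, lo, hi) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def average_metrics(runs: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each metric over repeated train/test experiments (e.g. 10 runs)."""
    return {key: sum(run[key] for run in runs) / len(runs) for key in runs[0]}
```

In practice the minimum and maximum would be computed on the training split only and reused for the test split, so that no information leaks from test samples into the scaling.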


2013 ◽  
Vol 295-298 ◽  
pp. 644-647 ◽  
Author(s):  
Yu Kai Yao ◽  
Hong Mei Cui ◽  
Ming Wei Len ◽  
Xiao Yun Chen

SVM (Support Vector Machine) is a powerful data mining algorithm that is mainly used for classification or regression tasks. In this paper, SVM is used for disease prediction. We focus on combining stratified sampling and grid search to improve the classification accuracy of SVM, and we propose an improved algorithm named SGSVM (Stratified sample and Grid search based SVM). To test the performance of SGSVM, heart-disease data from the UCI repository are used in our experiments, and the results show that SGSVM clearly improves classification accuracy, which is especially valuable in disease prediction.
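The combination of stratified sampling and grid search described above can be sketched without third-party libraries. This is an illustrative stand-in, not the SGSVM algorithm itself: it assumes stratified k-fold cross-validation, a linear SVM trained with the basic Pegasos stochastic sub-gradient method (chosen only to keep the example self-contained), and a grid over the regularization strength `lam` alone. A typical SVM grid search would instead tune the penalty C and kernel parameters such as the RBF gamma. All function names and grid values are hypothetical.

```python
"""Hypothetical sketch: stratified k-fold cross-validation plus grid search
over the regularization strength of a linear SVM (Pegasos training)."""

import random
from typing import Dict, List, Sequence, Tuple


def stratified_folds(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds that preserve the class proportions."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    by_class: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def train_linear_svm(X, y, lam: float, epochs: int = 20, seed: int = 0) -> List[float]:
    """Pegasos: stochastic sub-gradient descent on the regularized hinge loss.
    Labels must be +1/-1; an appended constant 1.0 feature acts as the bias."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: regularization step
            if margin < 1.0:  # hinge loss active: move toward the sample
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def predict(w: List[float], x) -> int:
    """Classify one sample as +1 or -1 by the sign of the decision function."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score > 0 else -1


def cv_accuracy(X, y, lam: float, k: int) -> float:
    """Mean accuracy over stratified folds, training on the remaining folds."""
    scores = []
    for test_idx in stratified_folds(y, k):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        w = train_linear_svm([X[i] for i in train_idx], [y[i] for i in train_idx], lam)
        correct = sum(predict(w, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def grid_search(X, y, lam_grid: Sequence[float], k: int = 5) -> Tuple[float, float]:
    """Return the (lam, mean CV accuracy) pair with the highest accuracy."""
    best = (lam_grid[0], -1.0)
    for lam in lam_grid:
        score = cv_accuracy(X, y, lam, k)
        if score > best[1]:
            best = (lam, score)
    return best
```

Stratifying the folds keeps the ratio of diseased to healthy cases the same in every fold, which matters when the classes are imbalanced; the grid search then keeps the parameter value with the best mean cross-validated accuracy (ties go to the first value tried).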


2005 ◽  
Vol 21 (10) ◽  
pp. 2200-2209 ◽  
Author(s):  
J. S. Yu ◽  
S. Ongarello ◽  
R. Fiedler ◽  
X. W. Chen ◽  
G. Toffolo ◽  
...  

2003 ◽  
Vol 19 (13) ◽  
pp. 1636-1643 ◽  
Author(s):  
B. Wu ◽  
T. Abbott ◽  
D. Fishman ◽  
W. McMurray ◽  
G. Mor ◽  
...  
