Ensemble of data mining methods for gene ranking

2012 ◽  
Vol 60 (3) ◽  
pp. 461-470 ◽  
Author(s):  
A. Wiliński ◽  
S. Osowski

Abstract The paper presents an ensemble of data mining methods for discovering the most important genes and gene sequences, generated by gene expression arrays, that are responsible for the recognition of a particular type of cancer. The analyzed methods include the correlation of a feature with the class, statistical hypothesis tests, the Fisher measure of discrimination, and the linear Support Vector Machine (SVM) used to characterize the discrimination ability of the features. In the first step of ranking we apply each method individually, choosing the genes most often selected in the cross-validation of the available data set. In the next step we combine the results of the different selection methods and once again choose the genes appearing most frequently in the selected sets. On this basis we form the final ranking of the genes. The most important genes form the input to the SVM classifier, which is responsible for the final separation of tumor from non-tumor data. Several ways of checking the correctness of the proposed ranking procedure have been applied. The first relies on mapping the distribution of the selected genes onto the two-coordinate system formed by the two most important principal components of the PCA transformation and applying cluster quality measures. The second presents the results graphically, showing the gene expressions as pixel intensities for the available data. The final confirmation of the quality of the proposed ranking method is the classification of cancer cases versus non-cancer (normal) ones, performed using the Gaussian kernel SVM. The results of selecting the most significant genes used by the SVM for recognition of prostate cancer cases from normal cases have confirmed good accuracy. The presented methodology is of potential use for practical application in bioinformatics.
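
The two-stage voting idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one common form of the Fisher measure, |m1 - m2| / (s1 + s2), uses only two of the four scoring methods, and skips the cross-validation stage. The function names and the toy expression values are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev

def fisher_score(values, labels):
    """Assumed form |m1 - m2| / (s1 + s2) for one feature and binary labels (0/1)."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    return abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-12)

def abs_correlation(values, labels):
    """Absolute Pearson correlation between a feature and the class label."""
    mv, ml = mean(values), mean(labels)
    cov = sum((v - mv) * (y - ml) for v, y in zip(values, labels))
    norm = (sum((v - mv) ** 2 for v in values)
            * sum((y - ml) ** 2 for y in labels)) ** 0.5
    return abs(cov) / (norm + 1e-12)

def top_k(columns, labels, scorer, k):
    """Indices of the k best-scoring features under one selection method."""
    scores = [scorer(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda i: -scores[i])[:k]

def ensemble_ranking(columns, labels, scorers, k):
    """Rank features by how many methods placed them in their top-k set."""
    votes = Counter()
    for scorer in scorers:
        votes.update(top_k(columns, labels, scorer, k))
    return [idx for idx, _ in votes.most_common()]

# Invented toy data: 3 "genes" (one list per gene) measured on 6 samples.
labels = [0, 0, 0, 1, 1, 1]
columns = [
    [1.0, 1.1, 0.9, 3.0, 3.2, 2.9],  # clearly separates the two classes
    [2.0, 2.5, 1.5, 2.1, 2.4, 1.6],  # carries almost no class information
    [0.5, 0.9, 0.7, 1.0, 1.4, 0.8],  # weakly informative
]
ranking = ensemble_ranking(columns, labels, [fisher_score, abs_correlation], k=2)
print(ranking)
```

In a fuller version, each scorer would first be run on every cross-validation fold and only the genes selected most often across folds would be passed on to the voting step.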

Author(s):  
Zahraa Faiz Hussain ◽  
Hind Raad Ibraheem ◽  
Mohammad Alsajri ◽  
Ahmed Hussein Ali ◽  
Mohd Arfian Ismail ◽  
...  

Data mining is the process of detecting patterns in large amounts of data, and it forms part of the knowledge discovery process. Classification is a form of data analysis that extracts a model describing important data classes. One of the outstanding classification methods in data mining is the support vector machine (SVM). It is capable of predicting results and is often more effective than other classification methods. The SVM is a well-known supervised machine learning technique that has been applied successfully to a variety of problems (regression, classification, and clustering) in diverse domains such as gene expression analysis and web text mining. In this study, we propose a new model for classifying the iris data set using an SVM classifier and a genetic algorithm to optimize the C and gamma parameters of the linear SVM; in addition, the principal component analysis (PCA) algorithm was used for feature reduction.


2021 ◽  
Vol 6 (2) ◽  
pp. 018-032
Author(s):  
Rasha Thamer Shawe ◽  
Kawther Thabt Saleh ◽  
Farah Neamah Abbas

These days, the detection of security threats, generally referred to as intrusions, has become a very significant and serious problem in network, information and data security. Thus, an intrusion detection system (IDS) has become a very important element of computer and network security. Preventing such intrusions depends wholly on the detection ability of the IDS, which plays a necessary role in network security because it identifies different kinds of attacks in the network. Moreover, data mining has been playing an important role in different disciplines of technology and science. In computer security, data mining is used to help the IDS detect intruders accurately. One of the vital techniques of data mining is classification, so we propose an intrusion detection system that uses a data mining approach: the Support Vector Machine (SVM). In the proposed system, classification is performed with the SVM, and the efficiency of the system is assessed through a number of experiments on the KDD Cup’99 data set. The SVM is one of the best-known classification techniques in the data mining field. The experimental results show that the long time taken to build the SVM model can be reduced by suitable pre-processing of the data set. The False Positive Rate (FPR) is decreased and the attack detection rate of the SVM is increased; combined with the classification algorithm, this gives the highest accuracy. Implementation environment: the intrusion detection system was implemented in MATLAB R2015a, and the experiments were run under the Windows 7 operating system on a Core i7 Duo CPU 2670 processor at 2.5 GHz with 8 GB of RAM.


Author(s):  
Nastaran Shahparian ◽  
Mehran Yazdi ◽  
Mohammad Reza Khosravi

Purpose: In recent years, resting-state functional magnetic resonance imaging (rs-fMRI) has been increasingly used as a noninvasive and practical method in different areas of neuroscience and psychology for understanding the brain’s mechanisms as well as diagnosing neurological diseases. In this work, we use rs-fMRI data for diagnosing Alzheimer’s disease. Design/methodology/approach: Using the rs-fMRI of a patient, we compute the time series of some anatomical regions and then apply the Latent Low Rank Representation method to extract suitable features. Next, based on the extracted features, we apply a Support Vector Machine (SVM) classifier to determine whether the patient belongs to the healthy category, the mild stage of the disease, or the Alzheimer’s stage. Findings: The classification accuracy obtained with the proposed method is more than 97.5%. Originality/value: We performed different experiments on a database of rs-fMRI data containing the images of 43 healthy subjects, 36 mild cognitive impairment patients and 32 Alzheimer’s patients, and the results demonstrated that the best performance is achieved when the SVM with a Gaussian kernel and the features of only 7 regions are used.


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Fusun Citak-Er ◽  
Metin Vural ◽  
Omer Acar ◽  
Tarik Esen ◽  
Aslihan Onay ◽  
...  

Objective. This study aimed at evaluating linear discriminant analysis (LDA) and support vector machine (SVM) classifiers for estimating final Gleason score preoperatively using multiparametric magnetic resonance imaging (mp-MRI) and clinical parameters. Materials and Methods. Thirty-three patients who underwent mp-MRI on a 3T clinical MR scanner and radical prostatectomy were enrolled in this study. The input features for classifiers were age, the presence of a palpable prostate abnormality, prostate specific antigen (PSA) level, index lesion size, and Likert scales of T2 weighted MRI (T2w-MRI), diffusion weighted MRI (DW-MRI), and dynamic contrast enhanced MRI (DCE-MRI) estimated by an experienced radiologist. SVM based recursive feature elimination (SVM-RFE) was used for eliminating features. Principal component analysis (PCA) was applied for data uncorrelation. Results. Using a standard PCA before final Gleason score classification resulted in mean sensitivities of 51.19% and 64.37% and mean specificities of 72.71% and 39.90% for LDA and SVM, respectively. Using a Gaussian kernel PCA resulted in mean sensitivities of 86.51% and 87.88% and mean specificities of 63.99% and 56.83% for LDA and SVM, respectively. Conclusion. SVM classifier resulted in a slightly higher sensitivity but a lower specificity than LDA method for final Gleason score prediction for prostate cancer for this limited patient population.


Breast cancer is a disease whose incidence among women has increased tremendously in recent years. Mammography is a low-dose X-ray technique for detecting and diagnosing cancer at an early stage. The proposed system addresses two problems. The first is to detect tumors as suspicious regions that have weak contrast with their background; the second is to extract features that categorize the tumors. This classification can be done with the SVM, a statistical learning method that has achieved significant results in various fields since its introduction in the early 1990s, which spurred interest in machine learning. Here the images are classified as benign, malignant, or normal using the SVM classifier. The technique shows how easily the tumor region in mammogram images can be detected, with accuracy above 80% for linear classification using the SVM. The proposed system uses 10-fold cross-validation to obtain a reliable estimate. The Wisconsin breast cancer diagnosis data set from the UCI machine learning repository is used. Accuracy, sensitivity, specificity, false discovery rate, false omission rate, and the Matthews correlation coefficient are evaluated for the proposed system. The system gives good results in both the training and testing phases. The techniques also show accuracies of 98.57% and 97.14% using the Support Vector Machine and K-Nearest Neighbors, respectively.
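
The evaluation measures listed in this abstract all follow from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name and the counts are invented for illustration and are not taken from the paper. It assumes every denominator except the one for the Matthews coefficient is nonzero.

```python
from math import sqrt

def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),           # true positive rate
        "specificity": tn / (tn + fp),           # true negative rate
        "false_discovery_rate": fp / (tp + fp),  # share of positive calls that are wrong
        "false_omission_rate": fn / (fn + tn),   # share of negative calls that are wrong
        "mcc": (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
    }

# Hypothetical counts, for illustration only.
m = binary_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Under 10-fold cross-validation, these counts would typically be accumulated over the ten held-out folds before the metrics are computed.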


2019 ◽  
Vol 1 (92) ◽  
pp. 65-70
Author(s):  
G.V. Marchuk ◽  
V.L. Levkivskyy ◽  
S.S. Kaliberda

The article studies data mining methods such as linear and polynomial regression and the support vector machine. Their successful application rests on the fact that data mining methods and technologies support the study of data and the discovery of hidden patterns in them. Such analysis helps to identify various features and parameters of the data, and is therefore a powerful tool at the stage of building forecasting models.


Electronics ◽  
2021 ◽  
Vol 10 (18) ◽  
pp. 2266
Author(s):  
Shih-Lin Lin

In recent years, artificial intelligence technology has been widely used in fault prediction and health management (PHM). Machine learning algorithms are widely used in the condition monitoring of rotating machines, and normal and fault data can be obtained through the data acquisition and monitoring system. After analyzing the data and establishing a model, the system can automatically learn features from the input data to predict equipment failures for maintenance and diagnosis, which is important for motor maintenance. This research proposes a medium Gaussian support vector machine (SVM) method and constructs a feature space by extracting, based on experience, the characteristics of the vibration signal collected on site. Different methods were used to cluster and classify the features in order to classify motor health. The influence of different Gaussian kernel functions, such as fine, medium, and coarse, on the performance of the SVM algorithm was analyzed. The performance of the various models was verified experimentally on the data set released by the Case Western Reserve University Motor Bearing Data Center. As the motor is often subject to noise interference in the actual application environment, simulated Gaussian white noise was added to the original vibration data in order to verify the performance of the research method in a noisy environment. The results also summarize recent classification results on related motor data sets obtained with different machine learning algorithms for motor fault detection and diagnosis. The results show that the medium Gaussian SVM method improves the reliability and accuracy of motor bearing fault estimation, detection, and identification under variable crack-size and load conditions.
This paper also provides a detailed discussion of the predictive analytical capabilities of machine learning algorithms, which can be used as a reference for the future motor predictive maintenance analysis of electric vehicles.
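
The noise-robustness check mentioned in this abstract requires adding simulated Gaussian white noise to a vibration signal. The sketch below shows one common way to do that at a chosen signal-to-noise ratio. The abstract does not state the noise level or the procedure used, so the 10 dB target, the synthetic sine "vibration" signal, and the function name are all assumptions.

```python
import math
import random

def add_white_noise(signal, snr_db, seed=0):
    """Return signal plus zero-mean Gaussian noise scaled to the target SNR in dB."""
    rng = random.Random(seed)  # seeded for reproducibility
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]

# Synthetic stand-in for a vibration record: a 50 Hz sine sampled at 1 kHz for 1 s.
clean = [math.sin(2 * math.pi * 50 * n / 1000) for n in range(1000)]
noisy = add_white_noise(clean, snr_db=10)

# Measure the noise power that was actually added.
measured_noise_power = sum((b - a) ** 2 for a, b in zip(clean, noisy)) / len(clean)
print(round(measured_noise_power, 4))
```

Features would then be extracted from `noisy` in the same way as from the clean signal, so that classifier accuracy can be compared across noise levels.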


2021 ◽  
Vol 5 (2) ◽  
pp. 335-341
Author(s):  
I Made Yudha Arya Dala ◽  
I Ketut Gede Darma Putra ◽  
Putu Wira Buana

Dengue disease has been known to the people of Indonesia since 1779. The Aedes mosquito has two types, namely Aedes aegypti and Aedes albopictus. Aedes aegypti is a mosquito that carries the dengue virus. Dengue fever cases in Bali province tend to increase from year to year, especially when approaching the rainy season. Preventive action by the government is needed to limit the spread of the dengue virus and the resulting casualties. Data mining attempts to extract knowledge from historical data by finding regular patterns and relationships in a data set. In this study, data mining is used to predict the number of dengue cases in Bali province. The prediction uses several database variables to estimate future values that are not currently known, based on patterns in the data set. This forecasting aims to help the government anticipate dengue fever cases in the coming period and prepare appropriate prevention efforts. Dengue fever cases are forecast using three methods: backpropagation, Gaussian, and support vector machine. The data comprised 528 samples from 2008 to 2018. The results show that the backpropagation method predicts dengue fever cases best, with a MAPE error rate of 0.025, while the Gaussian method has a MAPE error rate of 0.035 and the support vector machine has a MAPE error rate of 0.060.
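
The methods in this abstract are compared by MAPE (mean absolute percentage error). The sketch below computes it as a fraction rather than a percentage. Whether the reported values are fractions or percentages is not stated in the abstract, and the case counts used here are invented for illustration.

```python
def mape(actual, predicted):
    """Mean absolute percentage error as a fraction; actual values must be nonzero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical monthly case counts and forecasts.
actual_cases = [100, 200, 400]
forecast_cases = [110, 190, 400]
error = mape(actual_cases, forecast_cases)
print(error)
```

A lower MAPE means the forecasts are closer to the observed counts in relative terms, which is the basis for ranking backpropagation ahead of the other two methods.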

