An Accurate Tool for Uncovering Cancer Subtypes by Fast Kernel Learning Method to Integrate Multiple Profile Data

Author(s):  
Hongyu Zhang ◽  
Limin Jiang ◽  
Jijun Tang ◽  
Yijie Ding

In recent years, cancer has become a severe threat to human health. Accurately identifying cancer subtypes would be of great significance to research on anti-cancer drugs, to the development of personalized treatments, and ultimately to conquering cancer. In this paper, we obtain three feature representation datasets (gene expression profile, isoform expression, and DNA methylation data) on lung cancer and renal cancer from the Broad GDAC, which collects standardized data extracted from The Cancer Genome Atlas (TCGA). Because the feature dimension is too large, Principal Component Analysis (PCA) is used to reduce the feature vectors, eliminating redundant features and speeding up the classification model. Within a multiple kernel learning (MKL) framework, we use kernel target alignment (KTA), fast kernel learning (FKL), the Hilbert-Schmidt Independence Criterion (HSIC), and the mean to calculate the weights for kernel fusion. Finally, we feed the combined kernel into a support vector machine (SVM) and obtain excellent results. In the classification of renal cell carcinoma subtypes, the maximum accuracy reaches 0.978 using MKL with HSIC-calculated weights, while in the classification of lung cancer subtypes, the accuracy reaches 0.990 using MKL with FKL-calculated weights.
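As a rough illustration of the kernel-fusion step described above, the sketch below weights each base kernel by its kernel target alignment with the ideal kernel built from the labels, then sums the weighted kernels. Everything in it is an assumption made for illustration: the linear base kernels, the toy two-view data, the function names, and the choice to clip and normalize the alignments into weights. The abstract does not state the authors' exact weighting scheme.

```python
import math

def linear_kernel(X):
    """Gram matrix K[i][j] = <x_i, x_j> for a list of feature vectors."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

def frobenius_inner(A, B):
    """Frobenius inner product of two equally sized matrices."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def alignment(K, y):
    """Kernel target alignment between K and the ideal kernel y y^T (labels in {-1, +1})."""
    Y = [[yi * yj for yj in y] for yi in y]
    denom = math.sqrt(frobenius_inner(K, K) * frobenius_inner(Y, Y))
    return frobenius_inner(K, Y) / denom

def kta_weights(kernels, y):
    """Clip alignments at zero and normalize them so the weights sum to 1 (assumed heuristic)."""
    scores = [max(alignment(K, y), 0.0) for K in kernels]
    total = sum(scores)
    if total == 0.0:
        # Fall back to the plain mean when no kernel aligns with the labels.
        return [1.0 / len(kernels)] * len(kernels)
    return [s / total for s in scores]

def combine(kernels, weights):
    """Weighted sum of the base kernel matrices."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]

# Toy data (hypothetical): view_a separates the two classes, view_b carries no label signal.
y = [1, 1, -1, -1]
view_a = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [-0.9, -0.1]]
view_b = [[1.0], [-1.0], [1.0], [-1.0]]

kernels = [linear_kernel(view_a), linear_kernel(view_b)]
weights = kta_weights(kernels, y)
K_combined = combine(kernels, weights)
```

The combined matrix could then be handed to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.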

Lung cancer is a serious illness that contributes to rising mortality globally. Identifying lung cancer at an early stage is the most promising way to improve patient survival. Generally, a Computed Tomography (CT) scan is used to locate the tumor and determine the stage of the cancer. Existing works have presented effective diagnosis and classification models for CT lung images. This paper designs an effective diagnosis and classification model for CT lung images. The presented model involves four stages: pre-processing, segmentation, feature extraction, and classification. The initial stage applies adaptive histogram equalization (AHE) for image enhancement and bilateral filtering (BF) for noise removal. The pre-processed images are fed into the second stage, a watershed segmentation model, to segment the images effectively. Then, a deep learning based Xception model is applied to extract prominent features, and classification is performed by a logistic regression (LR) classifier. A comprehensive simulation is carried out on a benchmark dataset to verify the effective classification of lung CT images. The outcome indicates the outstanding performance of the presented model on the applied test images.


2019 ◽  
Vol 9 (17) ◽  
pp. 3589 ◽  
Author(s):  
Yunyun Dong ◽  
Wenkai Yang ◽  
Jiawen Wang ◽  
Juanjuan Zhao ◽  
Yan Qiang

Effective cancer treatment requires clear identification of the subtype. Due to the small sample size, high dimensionality, and class imbalance of cancer gene data, classifying cancer subtypes with traditional machine learning methods remains challenging. The gcForest algorithm combines machine learning methods with a deep neural network and has been shown to achieve better classification on small samples of data. However, the gcForest algorithm still faces many challenges when applied to the classification of cancer subtypes. In this paper, we propose an improved gcForest algorithm (MLW-gcForest) to study the applicability of this method to the small sample sizes, high dimensionality, and class imbalance of genetic data. The main contributions of this algorithm are as follows: (1) Different weights are assigned to different random forests according to the classification ability of the forests. (2) We propose a sorting optimization algorithm that assigns different weights to the feature vectors generated under different sliding windows. The MLW-gcForest model is trained on the methylation data of five data sets from The Cancer Genome Atlas (TCGA). The experimental results show that the MLW-gcForest algorithm achieves high accuracy and area under curve (AUC) values for the classification of cancer subtypes compared with those of traditional machine learning methods and state-of-the-art methods. The results also show that methylation data can be effectively used to diagnose cancer.
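Contribution (1) above, weighting forests by their classification ability, can be pictured with the small sketch below. It is only a guess at the general idea: the rule that makes weights proportional to validation accuracy, the function names, and all numbers are hypothetical and are not taken from the paper.

```python
def forest_weights(accuracies):
    """Turn per-forest validation accuracies into weights that sum to 1 (assumed rule)."""
    total = sum(accuracies)
    return [a / total for a in accuracies]

def weighted_class_vector(prob_vectors, weights):
    """Weighted average of the class-probability vectors produced by each forest."""
    n_classes = len(prob_vectors[0])
    return [sum(w * p[c] for w, p in zip(weights, prob_vectors))
            for c in range(n_classes)]

# Hypothetical validation accuracies of three forests in one cascade level.
accuracies = [0.9, 0.6, 0.5]
weights = forest_weights(accuracies)

# Hypothetical class-probability outputs of the three forests for one sample.
probs = [[0.8, 0.2], [0.4, 0.6], [0.5, 0.5]]
fused = weighted_class_vector(probs, weights)

# Index of the class with the highest fused probability.
predicted = max(range(len(fused)), key=fused.__getitem__)
```

With equal weights the three forests would each count the same; here the most accurate forest dominates the fused vector.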


2019 ◽  
Vol 8 (4) ◽  
pp. 160 ◽  
Author(s):  
Bingxin Liu ◽  
Ying Li ◽  
Guannan Li ◽  
Anling Liu

Spectral characteristics play an important role in the classification of oil film, but the presence of too many bands can lead to information redundancy and reduced classification accuracy. In this study, a classification model that combines spectral indices-based band selection (SIs) and one-dimensional convolutional neural networks is proposed to classify oil films automatically from hyperspectral remote sensing images. For comparison, minimum Redundancy Maximum Relevance (mRMR) was also tested for reducing the number of bands. The support vector machine (SVM), random forest (RF), and Hu’s convolutional neural network (CNN) were trained and tested. The results show that the classification accuracy of the one-dimensional convolutional neural network (1D CNN) models surpassed that of other machine learning algorithms such as SVM and RF. The SIs+1D CNN model produced an oil film distribution map with relatively higher accuracy in less time than the other models.


2013 ◽  
Vol 791-793 ◽  
pp. 1961-1964
Author(s):  
Xiao Li Yang ◽  
Qiong He

We propose a biomimetic pattern recognition (BPR) approach for the classification of proteomic profiles. The proposed approach preprocesses each profile using the iterative minimum in adaptive setting window (IMASW) method for baseline correction, the discrete wavelet transform (DWT) for fitting and smoothing, and average total ion normalization (ATIN) to remove the influence of varying sample amounts and degradation over time. Principal component analysis (PCA) and BPR are then used to build the classification model. By optimizing the parameters involved in the modeling, we obtain a satisfactory model for cancer diagnosis on three proteomic profile datasets. The predicted results show that the BPR technique is more reliable and efficient than the support vector machine (SVM) method.


Author(s):  
M. C. Girish Baabu ◽  
Padma M. C.

Hyperspectral imaging (HSI) data consist of several hundred narrow bands (NB) with high spectral correlation and are widely used in crop classification. Processing these images therefore incurs time and space complexity, resulting in high computational overhead and the Hughes phenomenon. Dimensionality reduction techniques such as band selection and feature extraction play an important part in enhancing the performance of hyperspectral image classification. However, existing methods are not efficient in noisy, mixed-pixel environments with dynamic illumination and climatic conditions. The proposed Semantic Feature Representation based HSI (SFR-HSI) crop classification method first employs an Image Fusion (IF) method to find spectrally meaningful features in the raw HSI. Second, it extracts inherent features that keep a spatially meaningful representation of different crops by eliminating shading elements. The resulting feature set is then used to train a support vector machine (SVM). Experimental results show that the proposed HSI crop classification model achieves much better accuracy and Kappa coefficient performance.
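For readers unfamiliar with the Kappa coefficient reported above, the following minimal sketch computes Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The crop labels and predictions are made up for illustration and have no connection to the paper's data.

```python
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa between reference labels and predicted labels."""
    n = len(y_true)
    # Observed agreement: fraction of samples where prediction matches reference.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: sum over classes of the product of marginal frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        # Degenerate case: a single class in both label lists.
        return 1.0
    return (observed - expected) / (1.0 - expected)

# Hypothetical reference and predicted crop labels for six pixels.
y_true = ["wheat", "wheat", "corn", "corn", "rice", "rice"]
y_pred = ["wheat", "corn", "corn", "corn", "rice", "rice"]
kappa = cohen_kappa(y_true, y_pred)
```

In this toy case five of six pixels agree (p_o = 5/6) and chance agreement is 1/3, so kappa works out to 0.75, lower than the raw accuracy because part of the agreement is attributed to chance.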


2020 ◽  
Vol 13 (1-2) ◽  
pp. 43-52
Author(s):  
Boudewijn van Leeuwen ◽  
Zalán Tobak ◽  
Ferenc Kovács

Classification of multispectral optical satellite data using machine learning techniques to derive land use/land cover thematic data is important for many applications. Comparing the latest algorithms, our research aims to determine the best option for classifying land use/land cover, with a special focus on temporarily inundated land in a flat area in the south of Hungary. These inundations disrupt agricultural practices and can cause large financial losses. Sentinel-2 data with high temporal and medium spatial resolution are classified using open source implementations of a random forest, a support vector machine, and an artificial neural network. Each classification model is applied to the same data set and the results are compared qualitatively and quantitatively. The accuracy of the results is high for all methods and does not show large overall differences. A quantitative spatial comparison demonstrates that the neural network gives the best results, but that all models are strongly influenced by atmospheric disturbances in the image.


2020 ◽  
Vol 53 (3-4) ◽  
pp. 184-190
Author(s):  
Ramaiah Arun ◽  
Shanmugasundaram Singaravelan

One of the biggest challenges the world faces today is mortality due to cancer. One in four of all diagnosed cancers involves the lung, and the mortality rate remains high despite many technical and medical advances. Most lung cancer cases are diagnosed in the third or fourth stage, when the disease is no longer treatable. The main reason for the high mortality of lung cancer is the lack of a prescreening system that can analyze cancer cells at early stages. It is therefore necessary to develop a prescreening system that helps doctors detect lung cancer early. Among the various types of lung cancer, adenocarcinoma is increasing at an alarming rate, which is mainly attributed to the increased rate of smoking, both active and passive. In the present work, a system for the classification of lung glandular cells for early detection of cancer using multiple color spaces is developed. For segmentation, clustering techniques such as K-Means clustering and Fuzzy C-Means clustering are applied in various color spaces, including HSV, CIELAB, CIEXYy and CIELUV. Features are extracted and classified using a Support Vector Machine (SVM).
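The segmentation idea above, clustering pixels in a chosen color space, can be sketched with plain K-Means on HSV values. This is a simplified illustration under several assumptions: the four pixel values are invented, the initial centers are simply the first k pixels, and Euclidean distance is used on HSV even though hue is circular. Fuzzy C-Means and the CIE color spaces mentioned in the abstract are not shown.

```python
import colorsys

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means on tuples; initial centers are the first k points (arbitrary choice)."""
    centers = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to the nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centers

# Hypothetical RGB pixels: pale background alternating with dark, saturated nuclei.
rgb_pixels = [(200, 180, 190), (90, 40, 120), (210, 190, 200), (80, 30, 110)]

# Convert to HSV with channels scaled to [0, 1], as colorsys expects.
hsv_pixels = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in rgb_pixels]

labels, centers = kmeans(hsv_pixels, k=2)
```

Each pixel's cluster label can then be reshaped back into the image grid to give a segmentation mask from which features are extracted.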


The detection and classification of lung cancer is currently one of the most tedious tasks in medicine, a field in which technology plays a very important role. Detecting and diagnosing lung cancer at an early stage with high accuracy is the most challenging part of this task. In this research article, a set of 400 images is used for the experiment. The best feature extraction technique and the best feature optimization technique are identified on the basis of minimum execution time and minimum error rate. The resulting selection of features then leads to an optimal classification. In this context, the support vector machine, one of the best classification algorithms, is proposed in a hybrid model for binary classification, and a feed-forward back-propagation neural network is implemented alongside the SVM. The proposed hybrid model reduces the complexity of the system, with a minimum execution time of 1.94 sec. and a minimum error rate of 29.25. A classification accuracy of 99.6507% is achieved using this hybrid model.


2022 ◽  
Vol 2022 ◽  
pp. 1-11
Author(s):  
Zijin Wu

With the development of the country’s economy, the field of culture and art is flourishing. However, the diversification of artistic expression has not brought development to folk music. On the contrary, it has had a huge impact, and some national music is even in danger of being lost. This article addresses the recognition and classification of emotions in folk music and seeks the model that makes the classification accuracy as high as possible. After settling on the Support Vector Machine (SVM) classification method, a variety of feature extraction approaches were tried, and good results were achieved. The article then explores the Deep Belief Network (DBN) pretraining and reverse fine-tuning process, using a DBN to learn fused features of music. The abstract features learned in this way are used to recognize and classify folk music emotions. The DBN is improved by adding “Dropout” to each Restricted Boltzmann Machine (RBM) and adjusting the increment rule for the weights and biases. The improved network avoids the overfitting problem and speeds up training. Experiments show that classification with the fusion features proposed in this paper improves the classification accuracy.

