leukemia dataset

Processes, 2019, Vol 7 (9), pp. 550
Author(s): Hui Nies, Zalmiyah Zakaria, Mohd Mohamad, Weng Chan, Nazar Zaki, ...

Clustering techniques can group genes based on similarity in biological function. Their main drawback is that the optimal number of clusters cannot be identified beforehand. Several existing optimization techniques can address this issue. In addition, clustering validation can predict the likely number of clusters and hence increase the chances of identifying biologically informative genes. This paper reviews and provides examples of existing methods for clustering genes, optimizing the objective function, and validating clusters. Clustering techniques can be categorized into partitioning, hierarchical, grid-based, and density-based techniques, and we highlight the advantages and disadvantages of each category. For optimizing the objective function, we introduce the swarm intelligence technique and compare the performance of other methods. We also discuss how internal and external criteria differ as measures of cluster quality. Finally, we investigate the performance of several clustering techniques by applying them to a leukemia dataset. The results show that grid-based clustering techniques provide better classification accuracy, whereas partitioning clustering techniques are superior in identifying prognostic markers of leukemia. This review therefore suggests combining clustering techniques such as CLIQUE and k-means to yield high-quality gene clusters.
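The interplay between a partitioning method and an internal validation criterion, as described in the abstract above, can be illustrated with a minimal sketch. It is not the review's own pipeline: the toy `profiles` data, the farthest-point initialisation, and the choice of the silhouette score as the internal criterion are all assumptions made for illustration.

```python
import math

def dist(a, b):
    """Euclidean distance between two profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=50):
    """Plain k-means; farthest-point initialisation is an assumption chosen for determinism."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def silhouette(points, labels):
    """Mean silhouette width, an internal validation criterion."""
    if len(set(labels)) < 2:
        return 0.0
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # singleton clusters score 0 by convention
            continue
        a = sum(same) / len(same)
        b = min(
            sum(dist(p, q) for q, l in zip(points, labels) if l == other) / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical two-dimensional "expression profiles" forming three groups.
profiles = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
scores = {k: silhouette(profiles, kmeans(profiles, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

Scanning candidate values of k and keeping the one with the highest validation score is one simple way to estimate the number of clusters when it is not known in advance.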


2018, Vol 2018, pp. 1-9
Author(s): Khawla EL bendadi, Yissam Lakhdar, El Hassan Sbai

Among kernel methods, an improved kernel credal classification algorithm (KCCR) has been proposed. The KCCR algorithm uses the Euclidean distance in the kernel function. In this article, we propose replacing the Euclidean distance in the kernel with a regularized Mahalanobis metric. Unlike the Euclidean distance, the Mahalanobis distance takes into account the dispersion of the data and the correlation between the variables. The robustness of the method is tested using synthetic data and a benchmark database. Finally, DNA microarray data from a leukemia dataset are used to show the performance of our method in a real-world application.
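The idea of swapping the Euclidean distance for a regularized Mahalanobis distance inside a kernel can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the form of the regularization or of the kernel, so the ridge-style term `lam` added to the covariance diagonal, the Gaussian kernel shape, and the toy `samples` are all hypothetical choices.

```python
import math

def covariance(rows):
    """Sample covariance matrix of the rows (one row per observation)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(m)] for i in range(m)]

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x

def mahalanobis_sq(x, y, cov, lam=1e-3):
    """Squared Mahalanobis distance using cov + lam * I (assumed regularization)."""
    m = len(cov)
    reg = [[cov[i][j] + (lam if i == j else 0.0) for j in range(m)] for i in range(m)]
    d = [a - b for a, b in zip(x, y)]
    return sum(di * zi for di, zi in zip(d, solve(reg, d)))

def kernel(x, y, cov, sigma=1.0, lam=1e-3):
    """Gaussian kernel with the Mahalanobis distance in place of the Euclidean one."""
    return math.exp(-mahalanobis_sq(x, y, cov, lam) / (2 * sigma ** 2))

# Hypothetical strongly correlated two-variable data.
samples = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
cov = covariance(samples)
along = mahalanobis_sq((0, 0), (1, 1), cov)    # step along the correlation direction
across = mahalanobis_sq((0, 0), (1, -1), cov)  # step against it
print(along < across)
```

Both steps have the same Euclidean length, but the Mahalanobis distance treats the step that contradicts the observed correlation as much larger, which is the property the abstract relies on.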


Author(s): Khadijah Khadijah, Sri Hartati

Abstract: Microarray data is used as an alternative for cancer diagnosis because diagnosis based on morphological structure is difficult: different cancer types often show only subtle morphological differences. This research aims to build a classifier for microarray data. Classification starts with dimensionality reduction using the discrete wavelet transform (DWT): each sample is decomposed to a chosen level, and the approximation coefficients at that level are used as the sample's features. These features are then the input to the classifier, an extreme learning machine (ELM) implemented on a radial basis function network (RBFN). The datasets are multiclass microarray data: GCM (16,063 genes, 14 classes) and Subtypes-Leukemia (12,600 genes, 7 classes). Testing was done by randomly splitting the data into training and test sets ten times, with the same proportion each time. The classifier does not yet perform well on the GCM dataset: accuracy is about 75% ± 6.25% and the mean minimum sensitivity is low, 15% ± 19.95%, which indicates that sensitivity is uneven across classes and that some classes still have low sensitivity. The classifier for the Subtypes-Leukemia dataset, which has fewer classes than GCM, performs better, with accuracy 87.68% ± 2.88% and mean minimum sensitivity 51.90% ± 20.29%.

Keywords: microarray, gene expression, DWT, ELM, RBFN
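The dimensionality-reduction step described above (decompose each sample to a given level and keep only the approximation coefficients) can be sketched with a Haar wavelet. The abstract does not name the wavelet family or the decomposition level, so the Haar filter, the padding rule for odd lengths, and the toy `expression` vector are assumptions; the ELM/RBFN classifier is not shown.

```python
import math

def haar_approximation(signal, level):
    """Return the Haar DWT approximation coefficients after `level` decompositions."""
    approx = list(signal)
    for _ in range(level):
        if len(approx) % 2:
            approx.append(approx[-1])  # assumed padding: repeat the last value
        # Each approximation coefficient is the scaled sum of a neighbouring pair.
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2)
                  for i in range(0, len(approx), 2)]
    return approx

# Hypothetical expression values for one sample across eight genes.
expression = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 2.0, 0.0]
features = haar_approximation(expression, level=2)
print(len(expression), "->", len(features))
```

Each decomposition level roughly halves the number of features, so a sample with thousands of genes shrinks to a much shorter vector before classification.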


2012, Vol 10 (05), pp. 1250010
Author(s): Alok Sharma, Seiya Imoto, Satoru Miyano

Feature selection algorithms play a crucial role in identifying and discovering important genes for cancer classification. They can be broadly categorized into two main groups: filter-based methods and wrapper-based methods. Filter-based methods have been quite popular in the literature due to their many advantages, including computational efficiency, a simple architecture, and an intuitive means of discovering biological and clinical aspects. However, these methods have limitations, and the genes they select tend to give lower classification accuracy. In this paper, we propose a set of univariate filter-based methods using a between-class overlapping criterion. The proposed techniques have been compared with many other univariate filter-based methods using an acute leukemia dataset. The following properties have been examined: classification accuracy of the selected individual genes and the gene subsets; redundancy among selected genes, checked using ridge regression and LASSO methods; similarity and sensitivity analyses; functional analysis; and stability analysis. A comprehensive experiment shows promising results for our proposed techniques. The univariate filter-based methods using the between-class overlapping criterion are accurate and robust, have biological significance, and are computationally efficient and easy to implement. Therefore, they are well suited for biological and clinical discoveries.
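To make the notion of a univariate, overlap-based filter concrete, here is a minimal sketch. It is not the criterion proposed in the paper, whose definition is not given in the abstract: the `range_overlap` score (shared part of the two class ranges divided by their combined span), the gene names, and the toy values are all hypothetical.

```python
def range_overlap(class_a, class_b):
    """Fraction of the combined value range shared by both classes (0 = fully separated)."""
    lo = max(min(class_a), min(class_b))
    hi = min(max(class_a), max(class_b))
    span = max(max(class_a), max(class_b)) - min(min(class_a), min(class_b))
    if span == 0:
        return 1.0  # identical constant values carry no class information
    return max(0.0, hi - lo) / span

def rank_genes(expression, labels):
    """Score every gene independently and sort by ascending overlap."""
    first, second = sorted(set(labels))
    scores = {}
    for gene, values in expression.items():
        a = [v for v, l in zip(values, labels) if l == first]
        b = [v for v, l in zip(values, labels) if l == second]
        scores[gene] = range_overlap(a, b)
    return sorted(scores, key=scores.get), scores

# Hypothetical expression values for six samples, three per class.
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
expression = {
    "gene_a": [1.0, 1.2, 0.9, 3.1, 2.9, 3.3],    # classes fully separated
    "gene_b": [1.0, 2.0, 3.0, 1.5, 2.5, 3.5],    # classes heavily overlapping
    "gene_c": [2.0, 2.1, 2.2, 2.15, 2.4, 2.6],   # slight overlap
}
ranked, scores = rank_genes(expression, labels)
print(ranked)
```

Because each gene is scored on its own, the cost grows linearly with the number of genes, which is the efficiency argument usually made for univariate filters.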


Author(s): Smita Patnaik, Tripti Swarnkar

Over the past few decades, rapid developments in genomic and other molecular research technologies, together with developments in information technology, have produced tremendous amounts of information related to molecular biology. It is not possible to study a large number of genes using traditional methods. The DNA microarray is one technology that makes it possible to monitor the expression levels of tens of thousands of genes in parallel. A common task with microarray data is to determine which genes are differentially expressed between two samples obtained under two different conditions. Several statistical methods have been proposed to solve this problem. The support vector machine is one of the most efficient and widely used statistical methods for microarray classification. In this paper, we classify a leukemia dataset using a support vector machine under two conditions and also show the performance of different types of kernels.


2009, Vol 2009, pp. 1-9
Author(s): Huanping Zhang, Xiaofeng Song, Huinan Wang, Xiaobai Zhang

Computational analysis of microarray data has provided an effective way to identify disease-related genes. Traditional methods for selecting disease genes from microarray data, such as statistical tests, focus on genes that are differentially expressed between samples and prioritize each gene individually. These methods might miss differentially coexpressed (DCE) gene subsets because they ignore the interactions between genes. In this paper, the MIClique algorithm is proposed to identify DCE gene subsets based on mutual information and clique analysis. Mutual information is used to measure the coexpression relationship between each pair of genes in two different kinds of samples. Clique analysis is a commonly used method for biological networks, in which a clique generally represents a biological module of similar function. By applying the MIClique algorithm to real gene expression data, some DCE gene subsets that are correlated under one experimental condition but uncorrelated under another are detected from the graphs of a colon dataset and a leukemia dataset.
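The two ingredients named in the abstract, pairwise mutual information under two conditions and clique analysis on the resulting graph, can be combined in a small sketch. This is not the MIClique algorithm itself: the median binarization, the edge rule (a difference in mutual information of at least `threshold`), the basic Bron-Kerbosch enumeration, and the toy data are assumptions chosen for illustration.

```python
import math
from collections import Counter
from itertools import combinations

def binarize(values):
    """Discretize a profile: 1 if above the median, else 0 (assumed scheme)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    median = ordered[mid] if len(ordered) % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    return [1 if v > median else 0 for v in values]

def mutual_information(xs, ys):
    """Mutual information in bits between two discrete sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())

def dce_edges(cond_a, cond_b, threshold=0.5):
    """Link gene pairs whose mutual information differs between the two conditions."""
    edges = set()
    for g, h in combinations(sorted(cond_a), 2):
        mi_a = mutual_information(binarize(cond_a[g]), binarize(cond_a[h]))
        mi_b = mutual_information(binarize(cond_b[g]), binarize(cond_b[h]))
        if abs(mi_a - mi_b) >= threshold:
            edges.add((g, h))
    return edges

def maximal_cliques(nodes, edges):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    adj = {n: set() for n in nodes}
    for g, h in edges:
        adj[g].add(h)
        adj[h].add(g)
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(nodes), set())
    return found

# Hypothetical data: g1, g2, g3 move together in condition A only; g4 is constant.
cond_a = {"g1": [1, 2, 3, 4], "g2": [2, 3, 5, 6], "g3": [1, 1.5, 4, 5], "g4": [2, 2, 2, 2]}
cond_b = {"g1": [1, 2, 3, 4], "g2": [5, 1, 6, 2], "g3": [1, 4, 5, 2], "g4": [2, 2, 2, 2]}
edges = dce_edges(cond_a, cond_b)
subsets = sorted(c for c in maximal_cliques(cond_a, edges) if len(c) >= 2)
print(subsets)
```

Single genes always form trivial cliques, so the usage above keeps only cliques with at least two members as candidate gene subsets.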

