New Short Term Prediction Method for Chemical Carcinogenicity by Hepatic Transcript Profiling following 28-Day Toxicity Tests in Rats

We have previously shown that the hepatic gene expression profiles of carcinogens in 28-day toxicity tests cluster into three major groups (Group-1 to 3). Here, we developed a new prediction method for Group-1 carcinogens, which consist mainly of genotoxic rat hepatocarcinogens. The prediction formula was generated by a support vector machine using 5 selected genes as the predictive genes, and a prediction score was introduced to judge carcinogenicity. It correctly predicted the carcinogenicity of all 17 Group-1 chemicals and 22 of 24 non-carcinogens regardless of genotoxicity. In the dose-response study, the prediction score changed from negative to positive as the dose increased, indicating that the characteristic gene expression profile emerged over a range of carcinogen-specific doses. We conclude that the prediction formula can quantitatively predict the carcinogenicity of Group-1 carcinogens. The same method may be applied to other groups of carcinogens to build a complete system for prediction of carcinogenicity.
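The idea of a signed prediction score can be illustrated with a minimal sketch. It assumes a linear decision function over five expression values, which the abstract does not state; the weights, bias, and dose series below are made-up placeholders, not values from the study.

```python
from typing import Sequence

# Hypothetical weights and bias for a linear decision function over five genes.
# These numbers are placeholders for illustration only.
WEIGHTS = [0.8, -0.5, 1.2, 0.3, -0.9]
BIAS = -0.4


def prediction_score(expression: Sequence[float]) -> float:
    """Return w . x + b for one sample's five expression values."""
    if len(expression) != len(WEIGHTS):
        raise ValueError("expected one expression value per predictive gene")
    return sum(w * x for w, x in zip(WEIGHTS, expression)) + BIAS


def is_predicted_carcinogen(expression: Sequence[float]) -> bool:
    """Read a positive score as a positive call, a negative score as negative."""
    return prediction_score(expression) > 0.0


# Made-up dose series for one compound: expression changes grow with dose.
low_dose = [0.1, 0.0, 0.1, 0.0, 0.0]
high_dose = [1.0, -0.5, 1.5, 0.2, -0.8]

if __name__ == "__main__":
    for name, sample in (("low", low_dose), ("high", high_dose)):
        print(name, round(prediction_score(sample), 2), is_predicted_carcinogen(sample))
```

With these placeholder numbers the score moves from negative at the low dose to positive at the high dose, which is the kind of sign change the dose-response study describes.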


Cancer classification of single-cell gene expression data by neural network

Bioinformatics ◽

10.1093/bioinformatics/btz772 ◽

2019 ◽

Cited By ~ 3

Author(s):

Bong-Hyun Kim ◽

Kijin Yu ◽

Peter C W Lee

Keyword(s):

Neural Network ◽

Gene Expression ◽

Single Cell ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Cancer Classification ◽

Supplementary Information ◽

Support Vector ◽

K Nearest Neighbors ◽

Normal Tissues

Abstract. Motivation: Cancer classification based on gene expression profiles has provided insight on the causes of cancer and cancer treatment. Recently, machine learning-based approaches have been attempted in downstream cancer analysis to address the large differences in gene expression values, as determined by single-cell RNA sequencing (scRNA-seq). Results: We designed cancer classifiers that can identify 21 types of cancers and normal tissues based on bulk RNA-seq as well as scRNA-seq data. Training was performed with 7398 cancer samples and 640 normal samples from 21 tumors and normal tissues in TCGA based on the 300 most significant genes expressed in each cancer. Then, we compared neural network (NN), support vector machine (SVM), k-nearest neighbors (kNN) and random forest (RF) methods. The NN performed consistently better than other methods. We further applied our approach to scRNA-seq transformed by kNN smoothing and found that our model successfully classified cancer types and normal samples. Availability and implementation: Cancer classification by neural network. Supplementary information: Supplementary data are available at Bioinformatics online.


Sub-Classification of Hyperdiploid Myeloma Using Global Gene Expression Profiling and SNP-Based Mapping Arrays.

Blood ◽

10.1182/blood.v108.11.3390.3390 ◽

2006 ◽

Vol 108 (11) ◽

pp. 3390-3390

Author(s):

Brian A. Walker ◽

Paola E. Leone ◽

Matthew W. Jenner ◽

David C. Johnson ◽

David Gonzalez ◽

...

Keyword(s):

Gene Expression ◽

Hierarchical Clustering ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Poor Response ◽

Response To Therapy ◽

Genomic Change ◽

Group 5 ◽

Mapping Arrays ◽

Group 1

Abstract The translocation/cyclin classification system in myeloma does not neatly define subgroups of hyperdiploidy (HRD) and we sought a more definitive sub-classification. Using 131 pre-treatment samples (49 HRD with no split IgH locus by FISH) we defined subgroups using both supervised and unsupervised hierarchical clustering of gene expression profiles. RNA was purified from CD138+ cells, amplified using a 2-cycle IVT and hybridised onto U133 Plus 2 GeneChips. On 30 of the 49 HRD samples we also performed 500K SNP mapping arrays to define the true extent of the genomic change in HRD. The most common trisomic chromosomes were 15 (97%), 9 (86%), 19 (80%), 5 (77%), 11 (74%), 3 (64%), 21 (54%) and 7 (54%). There was no association between HRD and any of the major genetic abnormalities (1p, 1q, 6q, 8p, 13, 16q and 17p) compared to the non-HRD (NHRD) group. Many interstitial deletions were seen in all HRD samples, on both odd and even numbered chromosomes. However, using gene mapping alone it was not possible to globally sub-classify HRD myeloma. We compared NHRD and HRD sample gene expression profiles, removing differences between t(4;14) and t(11;14) cases in the NHRD group. This analysis showed that HRD samples segregate into 2 groups; one with a pattern distinct to NHRD samples and another containing genes that are up-regulated in both HRD and NHRD samples. In this analysis 176 genes were up-regulated in the HRD samples and were predominantly located on the trisomic chromosomes, especially 19, 11, 9 and 5. These genes showed a predominant upregulation of HGF and TRAIL, and down-regulation of TRAIL-R2 compared to NHRD samples. Unsupervised hierarchical clustering split the HRD samples into 5 distinct groups suggesting that there are distinct pathological entities. Group 1 overexpressed 90 genes including BCL2, CCNL1 (cyclin L1) and CDK6, consistent with a proliferation signature. 
Group 2 overexpressed interferon inducible genes including IFI6, IFI27, IFIT1 as well as TRAIL. Group 3 upregulated genes included IL8, MMP9 and TIMP2. Group 4 upregulated transcripts include neurexophilin 3. Group 5 was less well defined but contained transcripts for CCND2, WNT5A and CXCR4. To define clinically relevant subgroups the HRD samples were clustered comparing response or no response to induction chemotherapy. Analysis showed that Group 1 cases cluster together and were either non or minimal responders. This is consistent with the Group 1 cases over-expressing cell-cycle and proliferation related genes. Group 5 clustered together and were either complete or partial responders, and had a low expression of the genes over expressed by Group 1. The non-responder group overexpressed 58 genes and include MMSET-like 1 (in a region on 8p paralogous to 4p containing FGFR1), DVL3 (dishevelled homolog 3) and CCNL1. 23 genes were over expressed in the complete response group including caspase 1 and manic fringe homolog. The unsupervised HRD cluster and the supervised response cluster shared 10 genes, including CCNL1 and ASS. We have used both genetic and expression data to further define the HRD sub-group in terms of gene expression signatures and response to therapy and have identified 5 groups, of which Group 1 has a proliferation signature and poor response to induction therapy.


Prediction of Breast Cancer Metastasis by Gene Expression Profiles: A Comparison of Metagenes and Single Genes

Cancer Informatics ◽

10.4137/cin.s10375 ◽

2012 ◽

Vol 11 ◽

pp. CIN.S10375 ◽

Cited By ~ 3

Author(s):

Mark Burton ◽

Mads Thomassen ◽

Qihua Tan ◽

Torben A. Kruse

Keyword(s):

Gene Expression ◽

Cross Validation ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Microarray Platform ◽

Support Vector ◽

Independent Data ◽

Performance Difference ◽

Feature Sets ◽

Prediction Of Metastasis

Background: The popularity of a large number of microarray applications in cancer research has led to the development of predictive or prognostic gene expression profiles. However, the diversity of microarray platforms has made the full validation of such profiles and their related gene lists across studies difficult, and classification accuracies are rarely validated in multiple independent datasets. Frequently, while the individual genes between such lists may not match, genes with the same function are included across such gene lists. Development of such lists does not take into account the fact that genes can be grouped together as metagenes (MGs) based on common characteristics such as pathways, regulation, or genomic location. Such MGs might be used as features in building a predictive model applicable for classifying independent data. There is, therefore, a need to systematically compare independent validation of gene lists or classifiers based on metagene or individual gene (SG) features. Methods: In this study we compared the performance of either metagene- or single gene-based feature sets and classifiers using random forest and two support vector machines for classifier building. The performance within the same dataset, feature set validation performance, and validation performance of entire classifiers in strictly independent datasets were assessed by 10 times repeated 10-fold cross validation, leave-one-out cross validation, and one-fold validation, respectively. To test the significance of the performance difference between MG- and SG-features/classifiers, we used a repeated down-sampled binomial test approach. Results: MG- and SG-feature sets are transferable and perform well for training and testing prediction of metastasis outcome in strictly independent data sets, both between different and within similar microarray platforms, while classifiers had a poorer performance when validated in strictly independent datasets. 
The study showed that MG- and SG-feature sets perform equally well in classifying independent data. Furthermore, SG-classifiers significantly outperformed MG-classifiers when validation was conducted between datasets using similar platforms, while no significant performance difference was found when validation was performed between different platforms. Conclusion: Prediction of metastasis outcome in lymph node–negative patients by MG- and SG-classifiers showed that SG-classifiers performed significantly better than MG-classifiers when validated in independent data based on the same microarray platform as used for developing the classifier. However, the MG- and SG-classifiers had similar performance when conducting classifier validation in independent data based on a different microarray platform. The latter was also true when only validating sets of MG- and SG-features in independent datasets, both between and within similar and different platforms.
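The resampling schemes named in the Methods (10 times repeated 10-fold cross validation and leave-one-out cross validation) can be sketched as index generators. This is a generic illustration, not the authors' code; the function names, the seed, and the unstratified shuffling are assumptions, and the "one-fold validation" on independent datasets is not covered.

```python
import random
from typing import Iterator, List, Tuple


def repeated_kfold(n_samples: int, k: int = 10, repeats: int = 10,
                   seed: int = 0) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (repeat, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh shuffle per repeat gives different folds
        for fold in range(k):
            test = order[fold::k]  # every k-th sample; fold sizes differ by at most 1
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield repeat, train, test


def leave_one_out(n_samples: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) with exactly one held-out sample."""
    for i in range(n_samples):
        yield [j for j in range(n_samples) if j != i], [i]


if __name__ == "__main__":
    splits = list(repeated_kfold(25))
    print(len(splits), "train/test splits from 10 x 10-fold CV")
```

Each repeat partitions the samples into ten disjoint test folds, so every sample is held out exactly once per repeat; averaging over repeats reduces the dependence of the estimate on one particular partition.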


Breast Cancer Case Identification Based on Deep Learning and Bioinformatics Analysis

Frontiers in Genetics ◽

10.3389/fgene.2021.628136 ◽

2021 ◽

Vol 12 ◽

Author(s):

Dongfang Jia ◽

Cheng Chen ◽

Chen Chen ◽

Fangfang Chen ◽

Ningrui Zhang ◽

...

Keyword(s):

Breast Cancer ◽

Neural Network ◽

Gene Expression ◽

Expression Profiles ◽

Differential Expression Analysis ◽

Gene Expression Profiles ◽

Diagnostic Methods ◽

The Cancer Genome Atlas ◽

Support Vector ◽

Hub Genes

Understanding the molecular mechanism of breast cancer (BC) can provide an in-depth understanding of BC pathology. This study reviewed existing technologies for diagnosing BC, such as mammography, ultrasound, magnetic resonance imaging (MRI), computed tomography (CT), and positron emission tomography (PET), and summarized the disadvantages of existing cancer diagnosis methods. The purpose of this article is to use gene expression profiles from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) to classify BC samples and normal samples. The method proposed in this article overcomes some of the shortcomings of traditional diagnostic methods and can diagnose BC more rapidly, with high sensitivity and without radiation exposure. This study first selected the genes most relevant to cancer through weighted gene co-expression network analysis (WGCNA) and differential expression analysis (DEA). Then it used the protein–protein interaction (PPI) network to screen 23 hub genes. Finally, it used the support vector machine (SVM), decision tree (DT), Bayesian network (BN), artificial neural network (ANN), and the convolutional neural networks CNN-LeNet and CNN-AlexNet to process the expression levels of the 23 hub genes. For gene expression profiles, the ANN model has the best performance in the classification of cancer samples. The ten-run average accuracy is 97.36% (±0.34%), the F1 value is 0.8535 (±0.0260), the sensitivity is 98.32% (±0.32%), the specificity is 89.59% (±3.53%) and the AUC is 0.99. In summary, this method effectively classifies cancer samples and normal samples and provides reasonable new ideas for the early diagnosis of cancer in the future.
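The reported metrics follow from the counts of a binary confusion matrix. The sketch below uses the standard definitions; the function name, the 0/1 label encoding (1 = cancer), and the toy labels are assumptions for illustration and are not taken from the study.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, precision and F1 with class 1 as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Return 0.0 instead of failing when a class is absent.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # recall on the positive class
        "specificity": ratio(tn, tn + fp),  # recall on the negative class
        "precision": ratio(tp, tp + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


if __name__ == "__main__":
    truth = [1, 1, 1, 1, 0, 0]   # toy labels, not study data
    called = [1, 1, 1, 0, 0, 1]
    print(binary_metrics(truth, called))
```

Note that sensitivity, specificity and F1 all depend on which class is treated as positive, so the same predictions can yield different values under the opposite encoding.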


Ensemble of Support Vector Machines to Improve the Cancer Class Prediction Based on the Gene Expression Profiles

Advances in Soft Computing - Innovations in Hybrid Intelligent Systems ◽

10.1007/978-3-540-74972-1_51 ◽

2007 ◽

pp. 393-400 ◽

Cited By ~ 4

Author(s):

Ángela Blanco ◽

Manuel Martín-Merino ◽

Javier De Las Rivas

Keyword(s):

Gene Expression ◽

Support Vector Machines ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Support Vector ◽

Class Prediction ◽

Vector Machines


Gene Expression Profiling in Adult Acute Lymphoblastic Leukemia, Biphenotypic Acute Leukemia, and Acute Myeloid Leukemia M0: Confirmation of Immunophenotypic and Cytogenetic Diagnostic Findings.

Blood ◽

10.1182/blood.v104.11.993.993 ◽

2004 ◽

Vol 104 (11) ◽

pp. 993-993

Author(s):

Wolfgang Kern ◽

Alexander Kohlmann ◽

Claudia Schoch ◽

Martin Dugas ◽

Sylvia Merk ◽

...

Keyword(s):

Gene Expression ◽

Diagnostic Criteria ◽

Prediction Accuracy ◽

Expression Profiles ◽

Lymphoblastic Leukemia ◽

Gene Expression Profiles ◽

Support Vector ◽

Specific Gene ◽

Acute Leukemias ◽

Acute Myeloid

Abstract. Diagnosis and classification of acute lymphoblastic leukemias (ALL) and their distinction from biphenotypic acute leukemias (BAL) and acute myeloid leukemias with minimal differentiation (AML M0) is largely based on immunophenotyping. The EGIL classification, adopted by the WHO classification, defines 4 different subtypes of both B-precursor and T-precursor ALL as well as detailed criteria for BAL. Specific cytogenetic features useful for classification are found in some cases only. We analyzed gene expression profiles in 173 such patients (Pro-B-ALL n=25, c-ALL/Pre-B-ALL n=65 (with t(9;22) n=35, without t(9;22) n=30), mature B-ALL n=13, Pro-T-ALL n=6, Pre-T-ALL n=13, cortical T-ALL n=20, BAL (myeloid and T-lineage) n=17, AML M0 n=14). All cases were assessed by cytomorphology, immunophenotyping, cytogenetics, and molecular genetics. All cases with Pro-B-ALL had t(4;11)/MLL-AF4, all cases with mature B-ALL had t(8;14). Samples were hybridized to both U133A and U133B microarrays (Affymetrix). The top 300 differentially expressed genes were identified for each group in comparison to all other groups and individual other groups and used for classification by various Support Vector Machines (SVM) with 10-fold cross validation (CV). Prediction accuracy for discriminating T- from B-precursor ALL was 100%. Accordingly, principal component analysis (PCA) yielded a complete separation of both groups. PCA of B-precursor ALL cases showed distinct clusters for Pro-B-ALL, c-ALL/Pre-B-ALL, and mature B-ALL; however, c-ALL/Pre-B-ALL with t(9;22) were not completely discriminated from those without. Accordingly, classifying B-precursor ALL with SVM resulted in an accuracy of 87.4%. Pre-T-ALL cases clustered distinctly from cortical T-ALL with the exception of two cases. The other Pre-T-ALLs clustered together with Pro-T-ALL. Analyzing T-precursor ALL with SVM and 10-fold CV resulted in an accuracy of only 56.4%. 
Including BAL and AML M0 into these analyses revealed significant overlaps between samples from these entities and T-ALL cases in PCA; prediction accuracy using SVM and 10-fold CV was 79.8%. This accuracy was confirmed applying 100 runs of SVM with 2/3 of samples being randomly selected as training set and 1/3 as test set which resulted in a median accuracy of 77.2% (range, 67.5% to 85.1%). A 100% prediction accuracy was achieved in Pro-B-ALL and mature B-ALL. Misclassifications were: c-ALL/Pre-B-ALL with t(9;22) as c-ALL/Pre-B-ALL without t(9;22) (6/35) and vice versa (6/30). Of the 13 Pre-T-ALL cases 4 were classified as BAL and 3 as cortical T-ALL. Of the 6 Pro-T-ALL cases 2 were classified as AML M0, 3 as BAL, and 1 as Pre-T-ALL. Of the 17 BAL cases 2 were classified as AML M0, 1 as c-ALL/Pre-B-ALL, 2 as Pre-T-ALL, and 1 as Pro-T-ALL. These analyses confirm that gene expression profiles allow the identification of Pro-B-ALL with t(4;11) and mature B-ALL with t(8;14) but do not unequivocally identify the presence of t(9;22) in c-ALL/Pre-B-ALL. Cortical T-ALL are characterized by a specific gene expression profile which is, however, shared by few cases currently diagnosed as Pre-T-ALL. Thus, diagnostic criteria (surface expression of CD1a only) should be optimized. The same applies to diagnostic criteria for more immature T-ALL, BAL, and AML M0. Loss of 5q is frequently observed in all of these latter entities and may be a future diagnostic marker superseding flow cytometry.


Classification Algorithm on Gene Expression Profiles of Tumor Using Neighborhood Rough Set and Support Vector Machine

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.850-851.1238 ◽

2013 ◽

Vol 850-851 ◽

pp. 1238-1242

Author(s):

Tao Chen

Keyword(s):

Gene Expression ◽

Rough Set ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Classification Algorithm ◽

Support Vector ◽

Data Set ◽

Filtering Method ◽

Neighborhood Rough Set ◽

Feature Filtering

Gene expression profiles of tumors have a limited number of samples relative to the high dimensionality of each sample. This paper proposes a classification algorithm based on the neighborhood rough set to improve classification accuracy. It first applies a feature filtering method, the Kruskal-Wallis rank sum test, to select a set of top-ranked related genes, and then applies the neighborhood rough set to these genes to generate an informative gene subset. Finally, an SVM is used to classify the GEP data set. The experimental results indicate that this method can effectively improve classification accuracy and that it generalizes better.
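The first filtering step, ranking genes by the Kruskal-Wallis rank sum test, can be sketched as follows. This is a minimal illustration that computes the H statistic with average ranks for ties but without the tie-correction factor; the function names and the toy expression values are assumptions, and the neighborhood rough set reduction and the SVM step are not shown.

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(values: Sequence[float], labels: Sequence[str]) -> float:
    """Kruskal-Wallis H statistic (no tie correction) for one gene."""
    n = len(values)
    rank_sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for rank, label in zip(average_ranks(values), labels):
        rank_sums[label] = rank_sums.get(label, 0.0) + rank
        counts[label] = counts.get(label, 0) + 1
    total = sum(rank_sums[g] ** 2 / counts[g] for g in rank_sums)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


def top_genes(expression: Dict[str, Sequence[float]],
              labels: Sequence[str], k: int) -> List[str]:
    """Return the k gene names with the largest H statistic."""
    ranked = sorted(expression,
                    key=lambda g: kruskal_wallis_h(expression[g], labels),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    classes = ["a", "a", "a", "b", "b", "b"]
    toy = {"gene_x": [1, 2, 3, 4, 5, 6], "gene_y": [1, 6, 3, 4, 5, 2]}  # made-up values
    print(top_genes(toy, classes, k=1))
```

A larger H means the gene's ranks differ more between classes, so keeping the top-ranked genes is a cheap way to shrink the feature space before a more expensive reduction step.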


Distinguishing three subtypes of hematopoietic cells based on gene expression profiles using a support vector machine

Biochimica et Biophysica Acta (BBA) - Molecular Basis of Disease ◽

10.1016/j.bbadis.2017.12.003 ◽

2018 ◽

Vol 1864 (6) ◽

pp. 2255-2265 ◽

Cited By ~ 6

Author(s):

Yu-Hang Zhang ◽

Yu Hu ◽

Yuchao Zhang ◽

Lan-Dian Hu ◽

Xiangyin Kong

Keyword(s):

Gene Expression ◽

Support Vector Machine ◽

Expression Profiles ◽

Hematopoietic Cells ◽

Gene Expression Profiles ◽

Support Vector


A SVM Model for Candidate Y-chromosome Gene Discovery in Prostate Cancer

10.29007/3nzw ◽

2019 ◽

Author(s):

Wageesha Rasanjana ◽

Sandun Rajapaksa ◽

Indika Perera ◽

Dulani Meedeniya

Keyword(s):

Gene Expression ◽

Prostate Cancer ◽

Y Chromosome ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Support Vector ◽

Body Tissues ◽

Average Accuracy ◽

To Come ◽

Significant Expression

Prostate cancer is widely known to be one of the most common cancers among men around the world. Due to its high heterogeneity, many of the studies carried out to identify the molecular level causes for cancer have only been partially successful. Among the techniques used in cancer studies, gene expression profiling is one of the most widely used. Gene expression profiles reveal information about the functionality of genes in different body tissues under different conditions. In order to identify cancer-decisive genes, differential gene expression analysis is carried out using statistical and machine learning methodologies. It helps to extract information about genes that have significant expression differences between healthy tissues and cancerous tissues. In this paper, we discuss a comprehensive supervised classification approach using Support Vector Machine (SVM) models to investigate differentially expressed Y-chromosome genes in prostate cancer. Eight SVM models, tuned to 98.3% average accuracy, were used for the analysis. Genes such as CD99 (MIC2), ASMTL, DDX3Y and TXLNGY emerged as the best candidates. Some of our results support existing findings, while others suggest novel candidates for prostate cancer.


Breast and Colon Cancer Classification from Gene Expression Profiles Using Data Mining Techniques

10.20944/preprints202002.0324.v1 ◽

2020 ◽

Cited By ~ 1

Author(s):

Mohamed Loey ◽

Mohammed Wajeeh Jasim ◽

Hazem M. EL-Bakry ◽

Mohamed Hamed N. Taha ◽

Nour Eldeen M. Khalifa

Keyword(s):

Gene Expression ◽

Classification Accuracy ◽

Information Gain ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Disease Diagnosis ◽

Performance Measure ◽

Support Vector ◽

Svm Classifier ◽

Cancer Type

Early detection of cancer increases the probability of recovery. This paper presents an intelligent decision support system (IDSS) for the early diagnosis of cancer based on gene expression profiles collected using DNA microarrays. Such datasets pose a challenge because of the small number of samples (no more than a few hundred) relative to the large number of genes (on the order of thousands). Therefore, a method of reducing the number of features (genes) that are not relevant to the disease of interest is necessary to avoid overfitting. The proposed methodology uses the information gain (IG) to select the most important features from the input patterns. Then, the selected features (genes) are reduced by applying the grey wolf optimization (GWO) algorithm. Finally, the methodology employs a support vector machine (SVM) classifier for cancer type classification. The proposed methodology was applied to two datasets (Breast and Colon) and was evaluated based on its classification accuracy, which is the most important performance measure in disease diagnosis. The experimental results indicate that the proposed methodology is able to enhance the stability of the classification accuracy as well as the feature selection.
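Information gain for a single gene can be sketched as the drop in class entropy after splitting samples at an expression threshold. The abstract does not say how continuous expression values are discretized, so the single-threshold split here is an assumption, as are the function names and toy values; the GWO reduction and the SVM classifier are not shown.

```python
import math
from typing import Dict, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts: Dict[str, int] = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(values: Sequence[float], labels: Sequence[str],
                     threshold: float) -> float:
    """Entropy of the labels minus the weighted entropy after a threshold split."""
    low = [lab for v, lab in zip(values, labels) if v <= threshold]
    high = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    # Skip an empty side so entropy() is never called on zero samples.
    conditional = sum(len(part) / n * entropy(part) for part in (low, high) if part)
    return entropy(labels) - conditional


if __name__ == "__main__":
    classes = ["tumor", "tumor", "normal", "normal"]  # toy labels, not study data
    print(information_gain([5.0, 6.0, 1.0, 2.0], classes, threshold=3.0))
    print(information_gain([1.0, 5.0, 2.0, 6.0], classes, threshold=3.0))
```

A gene whose threshold split separates the classes cleanly has a gain equal to the full class entropy, while a gene whose split leaves both sides evenly mixed has a gain of zero, which is why the measure works as a first-pass filter.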
