Identification of Gene Signatures Used to Recognize Biological Characteristics of Gastric Cancer upon Gene Expression Data

2014, Vol 9, pp. BMI.S13059
Author(s): Zhi Yan, Brian T. Luke, Shirley X. Tsang, Rui Xing, Yuanming Pan, ...

High-throughput gene expression microarrays can be examined by machine-learning algorithms to identify gene signatures that recognize the biological characteristics of specific human diseases, including cancer, with high sensitivity and specificity. A previous study compared 20 gastric cancer (GC) samples against 20 normal tissue (NT) samples and identified 1,519 differentially expressed genes (DEGs). In this study, the Classification Information Index (CII), Information Gain Index (IGI), and RELIEF algorithms are used to mine the previously reported gene expression profiling data. In all, 29 of these genes are identified by all three algorithms and are treated as GC candidate biomarkers. Three biomarkers, COL1A2, ATP4B, and HADHSC, are selected and further examined using quantitative real-time polymerase chain reaction (qRT-PCR) and immunohistochemistry (IHC) staining in two independent sets of GC and normal adjacent tissue (NAT) samples. Our study shows that COL1A2 and HADHSC are the two best biomarkers from the microarray data, distinguishing all GC samples from NT samples, whereas ATP4B is diagnostically significant in laboratory tests because of its wider range of expression fold-changes. Herein, a data-mining model applicable to small sample sizes is presented and discussed. Our results suggest that this mining model may be useful in small sample-size studies to identify putative biomarkers and potential biological features of GC.
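
The consensus step described above (keeping only genes that every algorithm ranks highly) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it implements an information-gain score over binary threshold splits and the basic Relief weighting, leaves out CII, and uses the intersection of each score's top-k genes as a stand-in for "identified by all three algorithms". The helper names, k, and the synthetic data are hypothetical.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (bits) of a 0/1 label vector."""
    if labels.size == 0:
        return 0.0
    p = np.bincount(labels, minlength=2) / labels.size
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def information_gain(x, y):
    """Best information gain over binary threshold splits of one gene's values."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    base, best = entropy(ys), 0.0
    for i in range(1, xs.size):
        if xs[i] == xs[i - 1]:
            continue  # only split between distinct expression values
        left, right = ys[:i], ys[i:]
        cond = (left.size * entropy(left) + right.size * entropy(right)) / ys.size
        best = max(best, base - cond)
    return best


def relief_weights(X, y):
    """Basic Relief: reward genes that differ from the nearest miss (other class)
    and penalise genes that differ from the nearest hit (same class)."""
    lo, span = X.min(axis=0), X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # constant genes would otherwise divide by zero
    Xn = (X - lo) / span   # scale every gene to [0, 1]
    w = np.zeros(X.shape[1])
    for i in range(X.shape[0]):
        dist = np.abs(Xn - Xn[i]).sum(axis=1)  # Manhattan distance to sample i
        dist[i] = np.inf                        # never match a sample to itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        w += np.abs(Xn[i] - Xn[miss]) - np.abs(Xn[i] - Xn[hit])
    return w / X.shape[0]


def consensus_genes(X, y, k):
    """Indices of genes ranked in the top k by BOTH scores (hypothetical helper)."""
    ig = np.array([information_gain(X[:, j], y) for j in range(X.shape[1])])
    rf = relief_weights(X, y)
    return sorted(set(np.argsort(-ig)[:k]) & set(np.argsort(-rf)[:k]))


# Synthetic example: 20 "NT" vs 20 "GC" samples, 50 genes, two planted signals.
rng = np.random.default_rng(0)
y = np.array([0] * 20 + [1] * 20)
X = rng.normal(size=(40, 50))
X[y == 1, 0] += 3.0  # gene 0 up-regulated in the "GC" group
X[y == 1, 1] -= 3.0  # gene 1 down-regulated in the "GC" group
print(consensus_genes(X, y, k=5))
```

Requiring agreement between scores with different biases is one way to make a small-sample ranking less dependent on the quirks of any single criterion.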

2019, Vol 21 (9), pp. 631-645
Author(s): Saeed Ahmed, Muhammad Kabir, Zakir Ali, Muhammad Arif, Farman Ali, ...

Aim and Objective: Cancer is a dangerous disease worldwide, caused by somatic mutations in the genome. Diagnosing this deadly disease at an early stage is a new clinical application of microarray data. In DNA microarray technology, gene expression data are high-dimensional, with a small sample size. Efficient and robust feature selection methods that identify a small set of genes are therefore indispensable for achieving better classification performance. Materials and Methods: In this study, we developed a hybrid feature selection method that integrates correlation-based feature selection (CFS) and a Multi-Objective Evolutionary Algorithm (MOEA) to select highly informative genes. The hybrid model with a radial basis function neural network (RBFNN) classifier was evaluated on 11 benchmark gene expression datasets using 10-fold cross-validation. Results: The experimental results were compared with seven conventional feature selection methods and with other methods in the literature; this extensive comparison shows that our approach has clear advantages in classification accuracy and in the number of genes selected. Conclusion: Our proposed CFS-MOEA algorithm attained up to 100% classification accuracy on six of the eleven datasets with a minimal-sized predictive gene subset.
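
The core of CFS is a merit heuristic that rewards features correlated with the class and penalizes features correlated with each other. The sketch below assumes absolute Pearson correlation as the correlation measure and substitutes a simple greedy forward search for the MOEA search used in the study; the RBFNN classifier and cross-validation are not shown. Names and data are hypothetical.

```python
import numpy as np


def cfs_merit(X, y, subset):
    """CFS merit: k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|).

    Uses absolute Pearson correlation (an assumption); assumes no constant columns.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    r_ff = 0.0
    if k > 1:
        r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                        for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(X, y, max_features=10):
    """Greedy forward search (a stand-in for the MOEA): add the feature that
    most improves the merit, stop when no addition helps."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        score, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if score <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best


rng = np.random.default_rng(1)
n = 100
y = rng.integers(0, 2, size=n).astype(float)
X = rng.normal(size=(n, 20))
X[:, 0] = y + 0.5 * rng.normal(size=n)         # informative gene
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=n)  # near-duplicate of gene 0 (redundant)
X[:, 2] = y + 0.5 * rng.normal(size=n)         # informative and non-redundant
selected, merit = greedy_cfs(X, y)
print(selected, round(float(merit), 3))
```

In the synthetic example, gene 1 is a near-copy of gene 0, so once one of them is chosen the other adds little merit, while gene 2 carries independent signal and is preferred as the second feature.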


Pathobiology, 2021, Vol 88 (2), pp. 156-169
Author(s): Williams Fernandes Barra, Dionison Pereira Sarquis, André Salim Khayat, Bruna Cláudia Meireles Khayat, Samia Demachki, ...

Identifying a microbiome pattern in gastric cancer (GC) is highly debatable because of the variation resulting from the diversity of the studied populations, clinical scenarios, and metagenomic approaches. H. pylori remains the main microorganism impacting gastric carcinogenesis and seems necessary for the initial steps of the process. Nevertheless, an additional non-H. pylori microbiome pattern has also been described, mainly in the final steps of carcinogenesis. Unfortunately, most of the reported results are not reproducible, and there is no consensus on candidates that share the protagonist role with H. pylori. Limitations to a consistent interpretation of metagenomic data include contamination at every step of the process, which might cause relevant misinterpretations. In addition, the functional consequences of an altered microbiome should also be addressed. Aiming to minimize methodological bias and the limitations due to small sample size and the lack of standardization of bioinformatics assessment and interpretation, we carried out a comprehensive analysis of the publicly available metagenomic data from various conditions relevant to gastric carcinogenesis. Notably, instead of only analyzing the results of each available publication, we used a new approach that analyzes the total pooled sample set with a standard protocol, aiming to produce a reliable interpretation from a large number of samples of different origins. Among the main results, Helicobacter and Prevotella figured in the “top 6” genera of every group. Helicobacter ranked first in the chronic gastritis (CG), gastric cancer (GC), and adjacent (ADJ) groups, while Prevotella was the leader among healthy control (HC) samples. Groups of bacteria are differentially abundant in each clinical situation, and bacterial metabolic pathways also diverge along the carcinogenesis cascade.
This information may support future microbiome interventions aiming to face the carcinogenesis process and/or reduce GC risk.
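
The “top 6” ranking reported above can be illustrated as ranking genera by mean relative abundance within each clinical group. This is a hypothetical sketch: the read counts are invented, the helper is not from the study, and it ignores the contamination filtering and compositional-data issues the abstract raises.

```python
from collections import defaultdict


def top_genera(samples, n=6):
    """Rank genera within each clinical group by mean relative abundance.

    samples: list of (group, {genus: read_count}) pairs, one per sample.
    Genera absent from a sample count as zero abundance in that sample.
    """
    per_group = defaultdict(list)
    for group, counts in samples:
        total = sum(counts.values())
        per_group[group].append({g: c / total for g, c in counts.items()})
    ranking = {}
    for group, profiles in per_group.items():
        genera = set().union(*profiles)
        mean_ra = {g: sum(p.get(g, 0.0) for p in profiles) / len(profiles) for g in genera}
        # Sort by decreasing abundance, breaking ties alphabetically for stable output.
        ranking[group] = sorted(mean_ra, key=lambda g: (-mean_ra[g], g))[:n]
    return ranking


# Invented read counts for illustration only.
samples = [
    ("GC", {"Helicobacter": 500, "Prevotella": 200, "Streptococcus": 100}),
    ("GC", {"Helicobacter": 300, "Prevotella": 100, "Veillonella": 100}),
    ("HC", {"Prevotella": 400, "Streptococcus": 300, "Helicobacter": 50}),
]
print(top_genera(samples, n=2))
# {'GC': ['Helicobacter', 'Prevotella'], 'HC': ['Prevotella', 'Streptococcus']}
```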


Stroke, 2014, Vol 45 (suppl_1)
Author(s): Blake Haas, Nestor R Gonzalez, Elina Nikkola, Mark Connolly, William Hsu, ...

Introduction: Intracranial aneurysm (IA) growth and rupture have been associated with chronic remodeling of the arterial wall. However, the pathobiology of this process remains poorly understood. The objective of the present study was to evaluate the feasibility of analyzing gene expression patterns in the peripheral blood of patients with ruptured and unruptured saccular IAs. Materials and Methods: We analyzed human whole-blood transcriptomes by performing paired-end, 100 bp RNA sequencing (RNAseq) on the Illumina platform. We used STAR to align reads to the genome, HTSeq to count reads, and DESeq to normalize counts across samples. Self-reported patient information was used to correct expression values for ancestry, age, and sex. We used weighted gene co-expression network analysis (WGCNA) to identify gene expression network modules associated with IA size and rupture. The DAVID tool was used to search for Gene Ontology enrichment in relevant modules. Results: Samples from 12 patients with IAs (9 females; age 57.6 ± 12 years) were analyzed. Four had ruptured aneurysms. RNA isolation and the methodology described above were successful in all samples. Although the small sample size prevents us from drawing definite conclusions, we observed promising novel co-expression networks for IAs: WGCNA showed down-regulation of two transcript modules associated with ruptured IA status (r=-0.78, p=0.008 and r=-0.77, p=0.009) and up-regulation of two modules associated with aneurysm size (r=0.86, p=0.002 and r=0.9, p=4e-04). DAVID analyses showed that genes up-regulated in an IA size-associated module were enriched for genes involved in cellular respiration and translation, while genes involved in transcription were down-regulated in a module associated with ruptured IAs. Conclusions: Whole-blood RNAseq analysis is a feasible tool to capture transcriptome dynamics and achieve a better understanding of the pathophysiology of IAs.
Further longitudinal studies of patients with IAs using network analysis are justified.
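
The WGCNA steps mentioned above (a soft-thresholded co-expression network, module eigengenes, and module-trait correlations) can be sketched with NumPy. This is an assumption-laden illustration, not the WGCNA package: the soft-threshold power β = 6 is chosen here as an assumption, module membership is taken as given rather than derived by hierarchical clustering, and the expression and aneurysm-size values are invented.

```python
import numpy as np


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned WGCNA-style adjacency: a_ij = |cor(gene_i, gene_j)| ** beta.

    expr is a samples x genes matrix; raising |r| to a power suppresses weak
    correlations while keeping strong ones.
    """
    return np.abs(np.corrcoef(expr, rowvar=False)) ** beta


def module_eigengene(expr_module):
    """First principal component of the standardised module genes (one value per sample)."""
    z = (expr_module - expr_module.mean(axis=0)) / expr_module.std(axis=0)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0] * s[0]
    # Orient the eigengene to agree with the module's average standardised expression.
    if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene


def module_trait_correlation(expr_module, trait):
    """Pearson correlation between a module eigengene and a sample trait."""
    return float(np.corrcoef(module_eigengene(expr_module), trait)[0, 1])


# Hypothetical data: 12 samples, a 10-gene module that tracks aneurysm size.
rng = np.random.default_rng(2)
size_mm = rng.uniform(3, 15, size=12)
size_z = (size_mm - size_mm.mean()) / size_mm.std()
module = size_z[:, None] + 0.3 * rng.normal(size=(12, 10))
adjacency = soft_threshold_adjacency(module)
print(adjacency.shape, round(module_trait_correlation(module, size_mm), 2))
```

With only 12 samples, a single correlation coefficient carries wide uncertainty, which is consistent with the abstract's caution about drawing definite conclusions.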


2021
Author(s): Xin Chen, Qingrun Zhang, Thierry Chekouo

Background: DNA methylation in critical regions is highly involved in cancer pathogenesis and drug response. However, identifying causal methylations among a large number of potential polymorphic DNA methylation sites is challenging. Such high-dimensional data pose two obstacles: first, many established statistical models do not scale to so many features; second, multiple testing and overfitting become serious problems. A method that quickly filters candidate sites to narrow down targets for downstream analyses is therefore urgently needed. Methods: BACkPAy is a pre-screening Bayesian approach that detects biologically meaningful clusters of potentially differential methylation levels with small sample sizes. BACkPAy prioritizes potentially important biomarkers using a Bayesian false discovery rate (FDR) approach. It filters out non-informative (i.e., non-differential) sites whose methylation levels are flat across experimental conditions. In this work, we applied BACkPAy to a genome-wide methylation dataset with 3 tissue types, each containing 3 gastric cancer samples. We also applied LIMMA (Linear Models for Microarray and RNA-Seq Data) to compare its results with those of BACkPAy. Cox proportional hazards regression models were then applied to The Cancer Genome Atlas (TCGA) data to identify prognostically significant markers in a survival analysis. Results: Using BACkPAy, we identified 8 biologically meaningful clusters (groups) of differential probes from the DNA methylation dataset. Using TCGA data, we also identified five prognostic genes (i.e., predictive of gastric cancer progression) that contain some differential methylation probes, whereas no significant results were identified using the Benjamini-Hochberg FDR in LIMMA. Conclusions: We showed the importance of using BACkPAy to analyze DNA methylation data with extremely small sample sizes in gastric cancer.
We found that RDH13, CLDN11, TMTC1, UCHL1, and FOXP2 can serve as predictive biomarkers for gastric cancer treatment, and that the promoter methylation levels of these five genes in serum could have prognostic and diagnostic value in gastric cancer patients.
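
The contrast between a Bayesian FDR rule and the Benjamini-Hochberg procedure can be sketched as follows. BACkPAy's clustering model is not reproduced here; the sketch assumes that some model has already produced, for each probe, a posterior probability of being non-differential, and it selects the largest set whose average posterior null probability (the estimated Bayesian FDR of that set) stays at or below α. Function names and inputs are hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvals, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    passes = p[order] <= alpha * np.arange(1, m + 1) / m
    if not passes.any():
        return np.array([], dtype=int)
    k = np.nonzero(passes)[0].max()  # largest rank i with p_(i) <= alpha * i / m
    return np.sort(order[: k + 1])


def bayesian_fdr_select(post_null, alpha=0.05):
    """Select sites so that the mean posterior null probability of the list is <= alpha.

    post_null[i] is an (assumed, model-supplied) posterior probability that
    site i is non-differential.
    """
    q = np.asarray(post_null, dtype=float)
    order = np.argsort(q)
    running_fdr = np.cumsum(q[order]) / np.arange(1, q.size + 1)
    # Sorting ascending makes running_fdr non-decreasing, so the passing set is a prefix.
    passing = np.nonzero(running_fdr <= alpha)[0]
    if passing.size == 0:
        return np.array([], dtype=int)
    return np.sort(order[: passing[-1] + 1])


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
post_null = [0.01, 0.02, 0.03, 0.2, 0.5, 0.9]
print(benjamini_hochberg(pvals), bayesian_fdr_select(post_null))
```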


Author(s): Weixiang Liu, Kehong Yuan, Jian Wu, Datian Ye, Zhen Ji, ...

Classification of gene expression samples is a core task in microarray data analysis. Reducing thousands of genes to a manageable set and selecting a suitable classifier are two key issues in gene expression data classification. This paper introduces a framework that combines feature extraction and classification simultaneously. Considering the non-negativity, high dimensionality, and small sample size of gene expression data, we apply a discriminative mixture model designed for non-negative gene expression data classification, using non-negative matrix factorization (NMF) for dimension reduction. To enhance the sparseness of the training data for fast learning of the mixture model, a generalized NMF is also adopted. Experimental results on several real gene expression datasets show that classification accuracy, stability, and decision quality can be significantly improved by using the generalized method, and that the proposed method performs better than some previously reported results on the same datasets.
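
The NMF dimension-reduction step can be sketched with the standard multiplicative updates for the Frobenius-norm objective. This is plain NMF, offered as an assumption-labeled illustration: it does not implement the paper's generalized NMF or discriminative mixture model, and the data are synthetic. Each column of H is a non-negative low-dimensional representation of one sample that a downstream classifier could use.

```python
import numpy as np


def nmf(V, rank, n_iter=500, eps=1e-10, seed=0):
    """Multiplicative updates minimising ||V - W H||_F^2 with W, H >= 0.

    V: non-negative genes x samples matrix.
    Returns W (genes x rank, "metagenes") and H (rank x samples).
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H stay non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical data: 200 genes x 30 samples of non-negative expression values.
rng = np.random.default_rng(3)
V = rng.random((200, 4)) @ rng.random((4, 30))
W, H = nmf(V, rank=4)
features = H.T  # one 4-dimensional, non-negative feature vector per sample
print(features.shape)
```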


2021, Vol 39 (15_suppl), pp. e15511-e15511
Author(s): Mojun Zhu, Douglas W. Mahoney, Kelli Burger, Patrick H. Foote, Karen A. Doering, ...

Background: Candidate aberrantly methylated DNA markers (MDMs) are strongly associated with primary colorectal cancer (CRC) before treatment and detect CRC recurrence with high sensitivity when assayed from plasma. The relationship between these MDMs and response to chemotherapy is unknown. Methods: In a prospective cohort of patients receiving systemic therapy for advanced CRC, peripheral blood was collected serially during restaging visits. Fifteen patients were retrospectively identified as having partial response (PR), stable disease (SD), or progressive disease (PD) (n=5 per group) according to the Response Evaluation Criteria in Solid Tumors (RECIST) version 1.1. Using paired samples from each patient before and after response assessment, we analyzed 11 MDMs (GRIN2D, ZNF671, ANKRD13B, QKI, VAV3, JAM3, SFMBT2, CHST2, ZNF568, FER1L4, and CNNM1) to assess correlation with treatment response. Cell-free DNA was extracted and bisulfite-treated before MDMs were quantified by a target enrichment long-probe quantitative-amplified signal assay and normalized to a methylated sequence of B3GALT6. Continuous variables are summarized as medians with interquartile ranges (IQR), and comparisons between subgroups were based on the Wilcoxon rank-sum test. Results: The median interval between pre- and post-response assessment visits was 69 days (IQR: 63-83 days), and tumor burden at pre-assessment was similar across all response types (Table 1). At baseline, patients with PD had higher levels of methylated GRIN2D, ZNF671, and ANKRD13B than those with PR or SD, and these markers may offer additional prognostic value over CEA, which was similar in the PR and PD groups before treatment (Table 1). Elevation of pre-assessment MDMs preceded radiographic evidence of disease progression by 82 days (IQR 69-83 days). Conclusions: Three MDMs, GRIN2D, ZNF671, and ANKRD13B, were found to reflect treatment response (PD vs. PR + SD), as shown in the table.
Although this pilot study was limited by a small sample size, it demonstrated the feasibility of using plasma-based MDMs to monitor response to systemic therapy for advanced CRC; these markers should be compared with CEA in a larger study. [Table: see text]
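
The subgroup comparison uses the Wilcoxon rank-sum test; with groups as small as 5 versus 10, its exact null distribution can be enumerated directly. The sketch below assumes no tied values and uses invented marker levels; statistical packages may differ in how they handle ties or when they switch to normal approximations.

```python
from itertools import combinations


def rank_sum_exact(x, y):
    """Two-sided exact Wilcoxon rank-sum test for small samples without ties.

    Returns (W, p): W is the rank sum of x, and p is the fraction of all
    equally likely rank assignments whose sum is at least as far from its
    null expectation as W.
    """
    pooled = sorted(x + y)
    if len(set(pooled)) != len(pooled):
        raise ValueError("ties are not handled in this sketch")
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    n, total = len(x), len(pooled)
    w_obs = sum(rank[v] for v in x)
    expected = n * (total + 1) / 2
    dev = abs(w_obs - expected)
    sums = [sum(c) for c in combinations(range(1, total + 1), n)]
    extreme = sum(1 for s in sums if abs(s - expected) >= dev)
    return w_obs, extreme / len(sums)


# Hypothetical normalized marker levels (marker / reference), not study data.
pd_group = [4.1, 5.3, 6.0, 7.2, 8.8]
pr_sd_group = [0.5, 0.9, 1.2, 1.8, 2.0, 2.4, 2.9, 3.1, 3.3, 3.8]
print(rank_sum_exact(pd_group, pr_sd_group))
```

Because every PD value in this toy example exceeds every PR + SD value, the observed rank sum is the most extreme possible, and only 2 of the C(15, 5) = 3003 rank assignments are at least as extreme.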


Author(s): Mi Xiao, Liang Gao, Xinyu Shao, Haobo Qiu, Li Nie

To reduce the tremendous computational expense of running complex simulations and analyses in engineering design, researchers increasingly focus on constructing approximation models. These approximation models, also called surrogate models or metamodels, can replace simulation and analysis codes in design and optimization. Commonly used metamodeling techniques include response surface methodology (RSM), kriging, and radial basis functions (RBF). In this paper, the gene expression programming (GEP) algorithm from evolutionary computing is investigated as an alternative approximation technique. The performance of GEP is examined through novel applications to the approximation of mathematical functions and engineering analyses. Compared with RSM, kriging, and RBF, GEP is shown to be more accurate for small sample sizes. For large sample sets, GEP also shows good approximation accuracy. Additionally, GEP offers the best transparency, since it provides explicit and compact functional relationships and clear factor contributions. Overall, as a novel metamodeling technique, GEP shows a strong capability to approximate a design space accurately and can be widely applied in engineering design, especially when only a few sample points are selected for approximation.
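
For comparison with the baselines named above, a Gaussian RBF interpolating surrogate can be fit to a handful of design points. This sketch is not GEP; the shape parameter, the 3 × 3 design grid, and the test function standing in for an expensive simulation are all hypothetical choices.

```python
import numpy as np


def fit_gaussian_rbf(X, y, shape=2.0):
    """Gaussian RBF interpolant s(x) = sum_j w_j * exp(-(shape * ||x - x_j||)**2).

    X: (n, d) sample sites, y: (n,) responses. Weights solve Phi w = y, so the
    surrogate reproduces the training responses exactly (up to round-off).
    """
    def kernel(A, B):
        d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
        return np.exp(-(shape * d) ** 2)

    w = np.linalg.solve(kernel(X, X), y)
    return lambda Xq: kernel(np.atleast_2d(Xq), X) @ w


def expensive_model(X):
    """Placeholder for a costly simulation (hypothetical test function)."""
    return np.sin(3 * X[:, 0]) + X[:, 1] ** 2


grid = np.linspace(0.0, 1.0, 3)
X_train = np.array([[a, b] for a in grid for b in grid])  # 9 design points
surrogate = fit_gaussian_rbf(X_train, expensive_model(X_train))
query = np.array([[0.25, 0.75]])
print(surrogate(query), expensive_model(query))
```

With only nine points the interpolant reproduces the training responses exactly, but its accuracy between points depends strongly on the shape parameter, which is one reason small-sample metamodeling is delicate.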


2012, Vol 30 (4_suppl), pp. 128-128
Author(s): Kohei Shitara, Yasushi Yatabe, Masato Sugano, Keitaro Matsuo, Chihiro Kondo, ...

Background: The ToGA study showed that trastuzumab given in combination with first-line chemotherapy (fluoropyrimidine plus cisplatin) improved the overall survival of patients with HER2-positive advanced gastric cancer (AGC). However, the prognostic value of HER2 and the efficacy of trastuzumab in second- or later-line chemotherapy remain controversial. Methods: We retrospectively analyzed 567 patients with AGC who initiated systemic chemotherapy before March 2011. Among them, 287 were evaluated for HER2 status. HER2 positivity was defined as IHC 3+ or IHC 2+ with amplification by FISH. Treatment outcomes were compared between patients with HER2-positive and HER2-negative AGC. To evaluate the impact of exposure to trastuzumab in any line of chemotherapy, we applied time-varying covariate (TVC) analysis to avoid possible lead-time bias. Results: The median survival time (MST) of HER2-evaluated patients (n=287) tended to be longer than that of HER2-non-evaluated patients (n=280; 14.5 vs. 13.2 months; P=0.03). Among the HER2-evaluated patients, 47 (16.3%) were HER2-positive and had longer survival than HER2-negative patients (24.1 vs. 13.4 months; P=0.05). Among the HER2-positive patients, 35 received trastuzumab: 15 as first-line therapy and 20 as second- or later-line therapy. The MST of HER2-positive patients treated with trastuzumab was significantly longer than that of HER2-positive patients without trastuzumab (26.6 vs. 13.5 months; P=0.015). HER2-negative patients and HER2-positive patients without trastuzumab had similar survival durations. In multivariate analysis with TVCs, exposure to trastuzumab was independently associated with a better prognosis (HR 0.54, P=0.04).
Conclusions: Although the retrospective nature and small sample size are major limitations of this study, HER2-positive AGC patients treated in recent years showed a better prognosis than HER2-negative patients, especially since the introduction of trastuzumab.
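
A time-varying covariate analysis can be set up by splitting each patient's follow-up into counting-process intervals, so that time before trastuzumab starts counts as unexposed. (The same concern is often discussed as immortal-time bias.) This sketch assumes months since chemotherapy initiation as the time scale and hypothetical patient records; the resulting rows would be passed to a Cox implementation that accepts (start, stop] intervals.

```python
def split_follow_up(patient_id, follow_up, died, trastuzumab_start=None):
    """Return (id, start, stop, event, exposed) rows for a Cox model with a
    time-varying treatment covariate.

    Time before trastuzumab starts is counted as unexposed, so survival
    accrued while waiting for second- or later-line therapy is not credited
    to the drug. Times are in months since chemotherapy initiation (assumed).
    """
    if trastuzumab_start is None or trastuzumab_start >= follow_up:
        return [(patient_id, 0.0, follow_up, int(died), 0)]
    rows = []
    if trastuzumab_start > 0:
        # Unexposed interval: the patient was alive and not yet on trastuzumab.
        rows.append((patient_id, 0.0, trastuzumab_start, 0, 0))
    # Exposed interval carries the death/censoring outcome.
    rows.append((patient_id, trastuzumab_start, follow_up, int(died), 1))
    return rows


# Hypothetical patients (months): A starts trastuzumab at month 6, B never does.
for row in split_follow_up("A", 20.0, True, 6.0) + split_follow_up("B", 10.0, False):
    print(row)
```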


2004, Vol 02 (04), pp. 669-679
Author(s): Masato Inoue, Shin-ichi Nishimura, Gen Hori, Hiroyuki Nakahara, Michiko Saito, ...

A gene-expression microarray datum is modeled as an exponential expression signal (log-normal distribution) plus additive noise. A variance-stabilizing transformation based on this model is useful for improving the uniformity of variance, which conventional statistical analysis methods often assume. However, the existing method of estimating the transformation parameters may be suboptimal because it handles outliers poorly. Using an information normalization technique, we have developed an improved parameter estimation method that allows statistically more straightforward outlier exclusion and works well even with small sample sizes. Validation with experimental data suggests that this method is superior to the conventional method.
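
For the model described above (background plus a log-normal signal plus additive noise), a well-known variance-stabilizing transformation is the generalized logarithm. The sketch assumes the parameters α (background) and c (which, under this model, relates the additive-noise variance to the multiplicative-noise variance) are already estimated; the paper's contribution, outlier-robust estimation of these parameters, is not shown, and the parameter values are hypothetical.

```python
import numpy as np


def glog(y, alpha, c):
    """Generalized-log transform for the model y = alpha + mu*exp(eta) + eps.

    Equals log((y - alpha) + sqrt((y - alpha)**2 + c)); it is computed through the
    identity arcsinh(z / sqrt(c)) + 0.5*log(c) to avoid cancellation when y < alpha.
    Behaves like log(2*(y - alpha)) for large signals and stays finite near and
    below the background.
    """
    z = np.asarray(y, dtype=float) - alpha
    return np.arcsinh(z / np.sqrt(c)) + 0.5 * np.log(c)


# Hypothetical parameters: background alpha = 50, c = 400.
intensities = np.array([20.0, 50.0, 100.0, 1_000.0, 10_000.0])
print(np.round(glog(intensities, alpha=50.0, c=400.0), 3))
```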


2021, Vol 2021, pp. 1-11
Author(s): Lianxin Zhong, Qingfang Meng, Yuehui Chen

The correct classification of cancer subtypes is of great significance for the in-depth study of cancer pathogenesis and for accurate treatment of cancer patients. In recent years, classifying cancer subtypes using deep neural networks and gene expression data has become an active research topic. However, most classifiers may face overfitting and low classification accuracy when dealing with small-sample, high-dimensional biological data. In this paper, the Cascade Flexible Neural Forest (CFNForest) model is proposed for cancer subtype classification. CFNForest extends the traditional flexible neural tree (FNT) structure to an FNT Group Forest using a bagging ensemble strategy and can automatically generate the model's structure and parameters. To deepen the FNT Group Forest without introducing new hyperparameters, a multilayer cascade framework was used to design the model, which transforms features between levels and improves performance. The CFNForest model also improves operational efficiency and robustness through a sample selection mechanism between layers and by assigning different weights to the output of each layer. For cancer subtype classification, FNT Group Forests with different feature sets are used to enrich the structural diversity of the model, making it more suitable for small-sample datasets. Experiments on RNA-seq gene expression data showed that CFNForest effectively improves the accuracy of cancer subtype classification and that its classification results are robust.
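
The bagging and cascade ideas above can be sketched with a deliberately simple base learner. This is not CFNForest: a nearest-centroid classifier stands in for flexible neural trees, and layer weighting and inter-layer sample selection are omitted. Each cascade level appends the previous level's class vote shares to the original features. All names and data are hypothetical.

```python
import numpy as np


class NearestCentroid:
    """Tiny stand-in base learner (CFNForest uses flexible neural trees instead)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=-1)
        return self.classes_[np.argmin(d, axis=1)]


def bagged_vote_fractions(X_train, y_train, X, classes, n_models=25, seed=0):
    """Bagging: fit each learner on a bootstrap resample, return per-class vote shares for X."""
    rng = np.random.default_rng(seed)
    votes = np.zeros((X.shape[0], classes.size))
    for _ in range(n_models):
        idx = rng.integers(0, X_train.shape[0], size=X_train.shape[0])
        pred = NearestCentroid().fit(X_train[idx], y_train[idx]).predict(X)
        votes += pred[:, None] == classes[None, :]
    return votes / n_models


def cascade_predict(X_train, y_train, X_test, n_levels=2):
    """Cascade: each level sees the original features plus the previous level's vote shares.

    The same seed is used for train and test at a level, so both sets of vote
    shares come from the same bootstrap learners. For brevity the training-set
    shares are computed in-sample; a careful implementation would use
    out-of-fold estimates to limit overfitting.
    """
    classes = np.unique(y_train)
    f_train, f_test = X_train, X_test
    for level in range(n_levels):
        p_train = bagged_vote_fractions(f_train, y_train, f_train, classes, seed=level)
        p_test = bagged_vote_fractions(f_train, y_train, f_test, classes, seed=level)
        f_train = np.hstack([X_train, p_train])
        f_test = np.hstack([X_test, p_test])
    return classes[np.argmax(p_test, axis=1)]


rng = np.random.default_rng(4)
X_train = np.vstack([rng.normal(0, 1, (30, 5)), rng.normal(3, 1, (30, 5))])
y_train = np.array([0] * 30 + [1] * 30)
X_test = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(3, 1, (20, 5))])
y_test = np.array([0] * 20 + [1] * 20)
print((cascade_predict(X_train, y_train, X_test) == y_test).mean())
```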

