MLW-gcForest: a multi-weighted gcForest model towards the staging of lung adenocarcinoma based on multi-modal genetic data

Abstract Background Lung cancer is one of the most common types of cancer, and lung adenocarcinoma accounts for the largest proportion of cases. Accurate staging is a prerequisite for effective diagnosis and treatment of lung adenocarcinoma. Previous research has mainly used single-modal data, such as gene expression data, for classification and prediction. Integrating multi-modal genetic data (gene expression RNA-seq, methylation data, and copy number variation) from the same patient makes it possible to use several modalities together for cancer prediction. A machine learning method called gcForest has recently been proposed and has been shown to be suitable for classification in some fields. However, the model may face challenges when applied to genetic data with small sample sizes and high dimensionality.
Results In this paper, we propose a multi-weighted gcForest algorithm (MLW-gcForest) to construct a lung adenocarcinoma staging model using multi-modal genetic data. The new algorithm is based on the standard gcForest algorithm. First, different weights are assigned to different random forests according to their classification performance in the standard gcForest model. Second, because the feature vectors generated under different scanning granularities influence the final classification result differently, the feature vectors are weighted according to the proposed sorting optimization algorithm. We then train three MLW-gcForest models on three single-modal datasets (gene expression RNA-seq, methylation data, and copy number variation) and perform decision fusion to stage lung adenocarcinoma. Experimental results suggest that the MLW-gcForest model outperforms both the standard gcForest model and traditional classification methods in constructing a staging model of lung adenocarcinoma. The accuracy, precision, recall, and AUC reached 0.908, 0.896, 0.882, and 0.96, respectively.
Conclusions The MLW-gcForest model has great potential in lung adenocarcinoma staging, which is helpful for the diagnosis and personalized treatment of lung adenocarcinoma. The results suggest that the MLW-gcForest algorithm is effective on multi-modal genetic data, which are high-dimensional and have small sample sizes.
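
The weighting and decision-fusion idea can be illustrated with a minimal Python sketch. The abstract does not give the exact weighting formula or the fusion rule, so this sketch assumes that each model's weight is proportional to a validation score and that fusion is a weighted average of class-probability vectors. The function names `weighted_average` and `fuse` and all numbers are hypothetical, chosen only for illustration.

```python
def weighted_average(prob_vectors, weights):
    """Combine class-probability vectors using non-negative weights.

    Assumption: weights are proportional to each model's validation score;
    the paper's actual weighting scheme is not specified in the abstract.
    """
    total = sum(weights)
    n_classes = len(prob_vectors[0])
    return [
        sum(w * p[c] for p, w in zip(prob_vectors, weights)) / total
        for c in range(n_classes)
    ]


def fuse(prob_vectors, weights):
    """Return (predicted class index, fused probability vector)."""
    fused = weighted_average(prob_vectors, weights)
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused


# Hypothetical outputs of three single-modal models (expression,
# methylation, copy number) for one patient and two stage classes.
modal_probs = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
modal_weights = [0.9, 0.5, 0.5]  # made-up validation scores
label, fused = fuse(modal_probs, modal_weights)
```

The same `weighted_average` helper could in principle be applied one level down, to the forests inside a single cascade layer, which is the first of the two weighting steps the abstract describes.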

Transcriptional profiling of iPSC-derived neurons with reciprocal monogenic CNV in 3p26.3 region

Nauchno-prakticheskii zhurnal «Medicinskaia genetika» ◽

10.25557/2073-7998.2020.03.10-11 ◽

2020 ◽

pp. 10-11

Author(s):

М.Е. Лопаткина ◽

В.С. Фишман ◽

М.М. Гридина ◽

Н.А. Скрябин ◽

Т.В. Никитина ◽

...

Keyword(s):

Gene Expression ◽

Copy Number ◽

System Development ◽

Transcriptional Profiling ◽

Global Gene Expression ◽

Nervous System Development ◽

Number Variation ◽

The Central Nervous System ◽

CNTN6 Gene ◽

Copy Number Changes

Gene expression was analyzed in neurons differentiated from induced pluripotent stem cells (iPSCs) of patients with idiopathic intellectual disability and reciprocal chromosomal mutations (a microdeletion and a microduplication) in the 3p26.3 region affecting a single gene, CNTN6. Global dysregulation of gene expression was demonstrated in neurons carrying either type of chromosomal aberration. In neurons with CNTN6 copy number changes, expression was downregulated for genes whose products are involved in central nervous system development.

Faculty Opinions recommendation of The impact of copy number variation on local gene expression in mouse hematopoietic stem and progenitor cells.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1158714.618930 ◽

2009 ◽

Author(s):

Michael Lassner ◽

Antoni Rafalski

Keyword(s):

Gene Expression ◽

Copy Number Variation ◽

Progenitor Cells ◽

Copy Number ◽

Hematopoietic Stem ◽

Stem And Progenitor Cells ◽

Number Variation ◽

The Impact

Evaluation of affinity-based genome-wide DNA methylation data: Effects of CpG density, amplification bias, and copy number variation

Genome Research ◽

10.1101/gr.110601.110 ◽

2010 ◽

Vol 20 (12) ◽

pp. 1719-1729 ◽

Cited By ~ 92

Author(s):

M. D. Robinson ◽

C. Stirzaker ◽

A. L. Statham ◽

M. W. Coolen ◽

J. Z. Song ◽

...

Keyword(s):

DNA Methylation ◽

Copy Number Variation ◽

Copy Number ◽

Methylation Data ◽

Amplification Bias ◽

Genome Wide ◽

Number Variation

Insights into dispersed duplications and complex structural mutations from whole genome sequencing 706 families

10.1101/2020.08.03.235358 ◽

2020 ◽

Author(s):

Christopher W. Whelan ◽

Robert E. Handsaker ◽

Giulio Genovese ◽

Seva Kashin ◽

Monkol Lek ◽

...

Keyword(s):

Gene Expression ◽

Copy Number Variation ◽

Copy Number ◽

De Novo ◽

Whole Genome ◽

Sequencing Data ◽

Number Variation ◽

Structural Mutations ◽

Or Gene ◽

Genomic Locations

Abstract Two intriguing forms of genome structural variation (SV) – dispersed duplications, and de novo rearrangements of complex, multi-allelic loci – have long escaped genomic analysis. We describe a new way to find and characterize such variation by utilizing identity-by-descent (IBD) relationships between siblings together with high-precision measurements of segmental copy number. Analyzing whole-genome sequence data from 706 families, we find hundreds of “IBD-discordant” (IBDD) CNVs: loci at which siblings’ CNV measurements and IBD states are mathematically inconsistent. We found that commonly-IBDD CNVs identify dispersed duplications; we mapped 95 of these common dispersed duplications to their true genomic locations through family-based linkage and population linkage disequilibrium (LD), and found several to be in strong LD with genome-wide association (GWAS) signals for common diseases or gene expression variation at their revealed genomic locations. Other CNVs that were IBDD in a single family appear to involve de novo mutations in complex and multi-allelic loci; we identified 26 de novo structural mutations that had not been previously detected in earlier analyses of the same families by diverse SV analysis methods. These included a de novo mutation of the amylase gene locus and multiple de novo mutations at chromosome 15q14. Combining these complex mutations with more-conventional CNVs, we estimate that segmental mutations larger than 1 kb arise in about one per 22 human meioses. These methods are complementary to previous techniques in that they interrogate genomic regions that are home to segmental duplication, high CNV allele frequencies, and multi-allelic CNVs.

Author Summary Copy number variation is an important form of genetic variation in which individuals differ in the number of copies of segments of their genomes. Certain aspects of copy number variation have traditionally been difficult to study using short-read sequencing data. For example, standard analyses often cannot tell whether the duplicated copies of a segment are located near the original copy or are dispersed to other regions of the genome. Another aspect of copy number variation that has been difficult to study is the detection of mutations in the copy number of DNA segments passed down from parents to their children, particularly when the mutations affect genome segments which already display common copy number variation in the population. We develop an analytical approach to solving these problems when sequencing data is available for all members of families with at least two children. This method is based on determining the number of parental haplotypes the two siblings share at each location in their genome, and using that information to determine the possible inheritance patterns that might explain the copy numbers we observe in each family member. We show that dispersed duplications and mutations can be identified by looking for copy number variants that do not follow these expected inheritance patterns. We use this approach to determine the location of 95 common duplications which are dispersed to distant regions of the genome, and demonstrate that these duplications are linked to genetic variants that affect disease risk or gene expression levels. We also identify a set of copy number mutations not detected by previous analyses of sequencing data from a large cohort of families, and show that repetitive and complex regions of the genome undergo frequent mutations in copy number.
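
The consistency check between sibling copy numbers and IBD state can be sketched as a brute-force enumeration in Python. This is an illustrative simplification, not the authors' pipeline: it assumes integer diploid copy numbers for both parents and two siblings, a known IBD state (0, 1, or 2 shared parental haplotypes), no measurement noise, and that every copy sits at the locus itself. The function names `haplotype_splits` and `is_ibd_consistent` are hypothetical.

```python
from itertools import product


def haplotype_splits(total):
    """All ways to split a diploid copy number across two haplotypes."""
    return [(a, total - a) for a in range(total + 1)]


def is_ibd_consistent(father_cn, mother_cn, sib1_cn, sib2_cn, ibd):
    """Return True if some inheritance pattern explains the observations.

    ibd is the number of parental haplotypes the siblings share (0, 1, 2).
    Assumption: copy numbers are exact integers and all copies are local.
    """
    for f_hap, m_hap in product(
        haplotype_splits(father_cn), haplotype_splits(mother_cn)
    ):
        # i, j: paternal haplotype inherited by sibling 1 and sibling 2;
        # k, l: maternal haplotype inherited by sibling 1 and sibling 2.
        for i, j, k, l in product((0, 1), repeat=4):
            if (i == j) + (k == l) != ibd:
                continue
            if (f_hap[i] + m_hap[k] == sib1_cn
                    and f_hap[j] + m_hap[l] == sib2_cn):
                return True
    return False


# Siblings sharing both parental haplotypes must have equal copy number,
# so a 2-versus-3 observation at an IBD2 locus is flagged as discordant.
discordant = not is_ibd_consistent(2, 2, 2, 3, ibd=2)
```

Under these assumptions, a locus for which `is_ibd_consistent` returns False corresponds to what the abstract calls an IBD-discordant CNV; deciding whether the cause is a dispersed duplication or a de novo mutation would require the further linkage analysis described above.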

Recent Advances in Studying of Copy Number Variation and Gene Expression

Gene Expression to Genetical Genomics ◽

10.4137/gegg.s14286 ◽

2014 ◽

Vol 7 ◽

pp. 1-5 ◽

Cited By ~ 1

Keyword(s):

Gene Expression ◽

Copy Number Variation ◽

Copy Number ◽

Recent Advances ◽

Number Variation

Copy number variation, gene expression and histological localization of human beta-defensin 2 in patients with adeno-tonsillar hypertrophy

Biotechnic & Histochemistry ◽

10.1080/10520295.2020.1752936 ◽

2020 ◽

Vol 95 (8) ◽

pp. 634-640

Author(s):

Fulvio Celsi ◽

Luisa Zupin ◽

Emmanouil Athanasakis ◽

Eva Orzan ◽

Domenico Leonardo Grasso ◽

...

Keyword(s):

Gene Expression ◽

Copy Number Variation ◽

Copy Number ◽

Tonsillar Hypertrophy ◽

Number Variation

Integrated Analyses of Copy Number Variations and Gene Expression in Lung Adenocarcinoma

PLoS ONE ◽

10.1371/journal.pone.0024829 ◽

2011 ◽

Vol 6 (9) ◽

pp. e24829 ◽

Cited By ~ 50

Author(s):

Tzu-Pin Lu ◽

Liang-Chuan Lai ◽

Mong-Hsun Tsai ◽

Pei-Chun Chen ◽

Chung-Ping Hsu ◽

...

Keyword(s):

Gene Expression ◽

Lung Adenocarcinoma ◽

Copy Number ◽

Copy Number Variations

Splicing, Mutation, and Methylation Alterations Drive Gene Expression in HPV-OPC more than Copy Number Variation: A Network Propagation Analysis

International Journal of Radiation Oncology*Biology*Physics ◽

10.1016/j.ijrobp.2019.11.119 ◽

2020 ◽

Vol 106 (5) ◽

pp. 1185

Author(s):

J.R. Qualliotine ◽

B. Rosenthal ◽

G. Xu ◽

A. Mark ◽

C.A. Nasamram ◽

...

Keyword(s):

Gene Expression ◽

Copy Number Variation ◽

Copy Number ◽

Splicing Mutation ◽

Drive Gene Expression ◽

Propagation Analysis ◽

Network Propagation ◽

Number Variation ◽

Drive Gene

Copy number variation is highly correlated with differential gene expression: a pan-cancer study

BMC Medical Genetics ◽

10.1186/s12881-019-0909-5 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 18

Author(s):

Xin Shao ◽

Ning Lv ◽

Jie Liao ◽

Jinbo Long ◽

Rui Xue ◽

...

Keyword(s):

Gene Expression ◽

Genetic Variation ◽

Copy Number Variation ◽

Differential Gene Expression ◽

Copy Number ◽

Close Correlation ◽

Number Variation ◽

Cancer Types ◽

Differential Gene ◽

The Relationship

Abstract Background Cancer is a heterogeneous disease with many genetic variations. Several lines of evidence have shown that copy number variations (CNVs) of certain genes are involved in the development and progression of many cancers by altering gene expression levels in one or several cancer types. However, it is not clear whether this correlation is a general phenomenon across multiple cancer types. Methods In this study, we applied a bioinformatics approach that integrates CNV and differential gene expression mathematically across 1025 cell lines and 9159 patient samples to detect their potential relationship. Results Our results showed a close correlation between CNV and differential gene expression, and copy number displayed a positive linear influence on gene expression for the majority of genes, indicating that genetic variation has a direct effect on the level of gene transcription. Another independent dataset was used to revalidate the relationship between copy number and expression level. Further analysis shows that genes with a general positive linear influence on gene expression are clustered in certain disease-related pathways, which suggests the involvement of CNV in the pathophysiology of diseases. Conclusions This study shows the close correlation between CNV and differential gene expression, revealing the qualitative relationship between genetic variation and its downstream effect, especially for oncogenes and tumor suppressor genes. It is of critical importance to elucidate the relationship between copy number variation and gene expression for the prevention, diagnosis, and treatment of cancer.
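
A positive linear influence of copy number on expression can be quantified per gene with a correlation coefficient and a regression slope. The abstract does not state which statistics the authors computed, so the Python sketch below assumes Pearson correlation and an ordinary least-squares slope. The function name `pearson_and_slope` is hypothetical and the sample values are made up for illustration.

```python
from math import sqrt


def pearson_and_slope(copy_number, expression):
    """Return (Pearson r, least-squares slope) of expression on copy number."""
    n = len(copy_number)
    mean_x = sum(copy_number) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in copy_number)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(copy_number, expression)
    )
    return sxy / sqrt(sxx * syy), sxy / sxx


# Made-up values for one gene across five samples (illustration only).
gene_copy_number = [1, 2, 2, 3, 4]
gene_expression = [2.0, 4.1, 3.9, 6.0, 8.0]
r, slope = pearson_and_slope(gene_copy_number, gene_expression)
```

A value of `r` near 1 together with a positive `slope` would indicate the kind of positive linear relationship the abstract reports for the majority of genes; both inputs must vary across samples, otherwise the denominators are zero.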
