Toolbox for Mobile-Element Insertion Detection on Cancer Genomes

2014 ◽  
Vol 13s4 ◽  
pp. CIN.S13979 ◽  
Author(s):  
Wan-Ping Lee ◽  
Jiantao Wu ◽  
Gabor T. Marth

Mobile elements constitute greater than 45% of the human genome as a result of repeated insertion events during human genome evolution. Although most mobile elements are fixed within the human population, some elements (including ALU, long interspersed elements (LINE) 1 (L1), and SVA) are still actively duplicating and may result in life-threatening human diseases such as cancer, motivating the need for accurate mobile-element insertion (MEI) detection tools. We developed a software package, TANGRAM, for MEI detection in next-generation sequencing data, currently serving as the primary MEI detection tool in the 1000 Genomes Project. TANGRAM takes advantage of valuable mapping information provided by our own MOSAIK mapper, and until recently required MOSAIK mappings as its input. In this study, we report a new feature that enables TANGRAM to be used on alignments generated by any mainstream short-read mapper, making it accessible to many genomics users. To demonstrate its utility for cancer genome analysis, we have applied TANGRAM to the TCGA (The Cancer Genome Atlas) mutation calling benchmark 4 dataset. TANGRAM is fast, accurate, easy to use, and open source at https://github.com/jiantao/Tangram.

2015 ◽  
Vol 14s1 ◽  
pp. CIN.S24657
Author(s):  
Wan-Ping Lee ◽  
Jiantao Wu ◽  
Gabor T. Marth


2019 ◽  
Author(s):  
Matthew G. Durrant ◽  
Michelle M. Li ◽  
Ben Siranosian ◽  
Ami S. Bhatt

Abstract Mobile genetic elements contribute to bacterial adaptation and evolution; however, detecting these elements in a high-throughput and unbiased manner remains challenging. Here, we demonstrate a de novo approach to identify mobile elements from short-read sequencing data. The method identifies the precise site of mobile element insertion and infers the identity of the inserted sequence. This is an improvement over previous methods that either rely on curated databases of known mobile elements or rely on ‘split-read’ alignments that assume the inserted element exists within the reference genome. We apply our approach to 12,419 sequenced isolates of nine prevalent bacterial pathogens, and we identify hundreds of known and novel mobile genetic elements, including many candidate insertion sequences. We find that the mobile element repertoire and insertion rate vary considerably across species, and that many of the identified mobile elements are biased toward certain target sequences, several of them being highly specific. Mobile element insertion hotspots often cluster near genes involved in mechanisms of antibiotic resistance, and such insertions are associated with antibiotic resistance in laboratory experiments and clinical isolates. Finally, we demonstrate that mutagenesis caused by these mobile elements contributes to antibiotic resistance in a genome-wide association study of mobile element insertions in pathogenic Escherichia coli. In summary, by applying a de novo approach to precisely identify mobile genetic elements and their insertion sites, we thoroughly characterize the mobile element repertoire and insertion spectrum of nine pathogenic bacterial species and find that mobile element insertions play a significant role in the evolution of clinically relevant phenotypes, such as antibiotic resistance.
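The abstract does not describe how insertion sites are located, so the following is only a generic illustration of one common signal in short-read data: reads whose alignment is soft-clipped at the same reference coordinate. It is a minimal sketch and not the authors' method. The function names, the `(position, CIGAR)` input format, and the `min_support` threshold are all assumptions made for this example.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the reference sequence.
REF_CONSUMING = set("MDN=X")


def clip_boundaries(pos, cigar):
    """Return 1-based reference coordinates of aligned bases that sit
    directly next to a soft clip. `pos` is the leftmost aligned base."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    ops = [(n, op) for n, op in ops if op != "H"]  # ignore hard clips
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    if not ops or ref_len == 0:
        return []  # unaligned or fully clipped read: no boundary
    boundaries = []
    if ops[0][1] == "S":
        boundaries.append(pos)                # first aligned base
    if ops[-1][1] == "S":
        boundaries.append(pos + ref_len - 1)  # last aligned base
    return boundaries


def candidate_sites(reads, min_support=2):
    """Count clip boundaries over (pos, cigar) pairs and keep the
    coordinates supported by at least `min_support` reads."""
    counts = Counter()
    for pos, cigar in reads:
        counts.update(clip_boundaries(pos, cigar))
    return sorted((p, c) for p, c in counts.items() if c >= min_support)


# Hypothetical toy alignments, not data from the study.
toy_reads = [(100, "20S30M"), (100, "15S35M"), (71, "30M20S"), (200, "50M")]
print(candidate_sites(toy_reads))
```

In practice the clipped sequence itself would then be assembled or compared across reads to infer what was inserted; this sketch stops at locating candidate coordinates.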


2018 ◽  
Author(s):  
Seo Jeong Shin ◽  
Seng Chan You ◽  
Yu Rang Park ◽  
Jin Roh ◽  
Jang-Hee Kim ◽  
...  

BACKGROUND Clinical sequencing data should be shared in order to achieve the sufficient scale and diversity required to provide strong evidence for improving patient care. A distributed research network allows researchers to share this evidence rather than the patient-level data across centers, thereby avoiding privacy issues. The Observational Medical Outcomes Partnership (OMOP) common data model (CDM) used in distributed research networks has low coverage of sequencing data and does not reflect the latest trends of precision medicine. OBJECTIVE The aim of this study was to develop and evaluate the feasibility of a genomic CDM (G-CDM), as an extension of the OMOP-CDM, for application of genomic data in clinical practice. METHODS Existing genomic data models and sequencing reports were reviewed to extend the OMOP-CDM to cover genomic data. The Human Genome Organisation Gene Nomenclature Committee and Human Genome Variation Society nomenclature were adopted to standardize the terminology in the model. Sequencing data of 114 and 1060 patients with lung cancer were obtained from the Ajou University School of Medicine database of Ajou University Hospital and The Cancer Genome Atlas, respectively, which were transformed to a format appropriate for the G-CDM. The data were compared with respect to gene name, variant type, and actionable mutations. RESULTS The G-CDM was extended into four tables linked to tables of the OMOP-CDM. Upon comparison with The Cancer Genome Atlas data, a clinically actionable mutation, p.Leu858Arg, in the EGFR gene was 6.64 times more frequent in the Ajou University School of Medicine database, while the p.Gly12Xaa mutation in the KRAS gene was 2.02 times more frequent in The Cancer Genome Atlas dataset. The data-exploring tool GeneProfiler was further developed to conduct descriptive analyses automatically using the G-CDM, which provides the proportions of genes, variant types, and actionable mutations. GeneProfiler also allows for querying the specific gene name and Human Genome Variation Society nomenclature to calculate the proportion of patients with a given mutation. CONCLUSIONS We developed the G-CDM for effective integration of genomic data with standardized clinical data, allowing for data sharing across institutes. The feasibility of the G-CDM was validated by assessing the differences in data characteristics between two different genomic databases through the proposed data-exploring tool GeneProfiler. The G-CDM may facilitate analyses of interoperating clinical and genomic datasets across multiple institutions, minimizing privacy issues and enabling researchers to better understand the characteristics of patients and promote personalized medicine in clinical practice.
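The comparison described above (the proportion of patients carrying a given mutation in each database, then how many times more frequent it is in one than the other) can be illustrated with a minimal sketch. This is not GeneProfiler's code or the G-CDM schema: the function names, the in-memory data layout, and the toy cohorts are hypothetical and chosen only to show the arithmetic.

```python
def mutation_proportion(patients, gene, hgvs_p):
    """Fraction of patients carrying the variant (gene, hgvs_p).

    `patients` maps a patient ID to a set of (gene, HGVS protein
    change) tuples. This layout is an assumption for illustration.
    """
    if not patients:
        raise ValueError("cohort is empty")
    carriers = sum(
        1 for variants in patients.values() if (gene, hgvs_p) in variants
    )
    return carriers / len(patients)


def fold_difference(proportion_a, proportion_b):
    """How many times more frequent the variant is in cohort A than B."""
    if proportion_b == 0:
        raise ValueError("variant absent from cohort B; ratio undefined")
    return proportion_a / proportion_b


# Hypothetical toy cohorts, not data from the study.
cohort_a = {
    "a1": {("EGFR", "p.Leu858Arg")},
    "a2": {("EGFR", "p.Leu858Arg"), ("TP53", "p.Arg175His")},
    "a3": {("KRAS", "p.Gly12Cys")},
    "a4": set(),
}
cohort_b = {
    "b1": {("EGFR", "p.Leu858Arg")},
    "b2": {("KRAS", "p.Gly12Cys")},
    "b3": {("KRAS", "p.Gly12Asp")},
    "b4": set(),
}

p_a = mutation_proportion(cohort_a, "EGFR", "p.Leu858Arg")
p_b = mutation_proportion(cohort_b, "EGFR", "p.Leu858Arg")
print(p_a, p_b, fold_difference(p_a, p_b))
```

Each patient is counted at most once per variant, so the result is a proportion of patients rather than of variant calls, which matches the quantity the abstract describes.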


2020 ◽  
Vol 40 (12) ◽  
Author(s):  
Dafeng Xu ◽  
Yu Wang ◽  
Kailun Zhou ◽  
Jincai Wu ◽  
Zhensheng Zhang ◽  
...  

Abstract Although extracellular vesicles (EVs) in body fluid have been considered to be ideal biomarkers for cancer diagnosis and prognosis, it is still difficult to distinguish EVs derived from tumor tissue from those derived from normal tissue. Therefore, the prognostic value of tumor-specific EVs was evaluated through related molecules in pancreatic tumor tissue. RNA sequencing data of pancreatic adenocarcinoma (PAAD) were acquired from The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC). EV-related genes in pancreatic cancer were obtained from exoRBase. Protein–protein interaction (PPI) network analysis was used to identify modules related to clinical stage. CIBERSORT was used to assess the abundance of immune and non-immune cells in the tumor microenvironment. A total of 12 PPI modules were identified, and the 3-PPI-MOD was identified based on the randomForest package. The genes of this model are involved in DNA damage and repair and cell membrane-related pathways. The independent external verification cohorts showed that the 3-PPI-MOD can significantly classify patient prognosis. Moreover, compared with the model constructed by pure gene expression, the 3-PPI-MOD showed better prognostic value. The expression of genes in the 3-PPI-MOD had a significant positive correlation with immune cells. Genes related to the hypoxia pathway were significantly enriched in the high-risk tumors predicted by the 3-PPI-MOD. External databases were used to verify the gene expression in the 3-PPI-MOD. The 3-PPI-MOD had satisfactory predictive performance and could be used as a prognostic predictive biomarker for pancreatic cancer.


2014 ◽  
Vol 15 (10) ◽  
Author(s):  
Djie Tjwan Thung ◽  
Joep de Ligt ◽  
Lisenka EM Vissers ◽  
Marloes Steehouwer ◽  
Mark Kroon ◽  
...  

BMC Genomics ◽  
2014 ◽  
Vol 15 (1) ◽  
pp. 795 ◽  
Author(s):  
Jiantao Wu ◽  
Wan-Ping Lee ◽  
Alistair Ward ◽  
Jerilyn A Walker ◽  
Miriam K Konkel ◽  
...  

2018 ◽  
Vol 72 ◽  
pp. 991-996
Author(s):  
Marzena Anna Lewandowska ◽  
Łukasz Żołna ◽  
Krzysztof Roszkowski ◽  
Janusz Kowalewski

Fifteen years after the publication of the full sequence of the human genome, which revolutionized medicine and biotechnology, profound elucidation of the molecular mechanisms of genetic disorders remains a challenge. National and international institutions conduct a number of research projects in genomics. Some of them are focused on the characterization of functional elements of the genome (e.g., the Genome Browser database by the ENCODE consortium), some gather information on polymorphisms (HapMap, The 1000 Genomes Project) and mutations (The Human Gene Mutation Database), while others are specifically dedicated to the genomic characterization of cancer (The Cancer Genome Atlas, The Pediatric Cancer Genome Project). Even though the projects are conducted independently, juxtapositions of the constantly updated project data may be performed, leading to interesting results. The genome-wide association studies (GWAS) allowed the identification of millions of SNPs and short insertions/deletions, as well as thousands of structural variants of polymorphic gene products. Further data-mining studies allowed the distinction between synonymous and nonsynonymous SNPs, which became the basis for the epidemiological studies of various types of genetic disorders. The results of the sequencing of entire genomes and transcriptomes may be useful in the identification of novel prognostic and predictive markers. High-throughput technologies are emerging methods in molecular diagnostics; furthermore, the correlation of DNA methylation patterns and gene expression profiles may also provide useful results in cancer diagnostics.


Genes ◽  
2020 ◽  
Vol 11 (10) ◽  
pp. 1202
Author(s):  
Arina Zagoskina ◽  
Sergei Firsov ◽  
Irina Lazebnaya ◽  
Oleg Lazebny ◽  
Dmitry V. Mukha

The structural and functional organization of the ribosomal RNA gene cluster and the full-length R2 non-LTR retrotransposon (integrated into a specific site of 28S ribosomal RNA genes) of the German cockroach, Blattella germanica, is described. A partial sequence of the R2 retrotransposon of the cockroach Rhyparobia maderae is also analyzed. The analysis of previously published next-generation sequencing data from the B. germanica genome reveals a new type of retrotransposon closely related to R2 retrotransposons but with a random distribution in the genome. Phylogenetic analysis reveals that these newly described retrotransposons form a separate clade. It is shown that proteins corresponding to the open reading frames of newly described retrotransposons exhibit unequal structural domains. Within these retrotransposons, a recombination event is described. A new mechanism of transposition activity is discussed. The essential structural features of R2 retrotransposons are conserved in cockroaches and are typical of previously described R2 retrotransposons. However, the investigation of the number and frequency of 5′-truncated R2 retrotransposon insertion variants in eight B. germanica populations suggests recent mobile element activity. It is shown that the pattern of 5′-truncated R2 retrotransposon copies can be an informative molecular genetic marker for revealing genetic distances between insect populations.


2019 ◽  
Vol 35 (18) ◽  
pp. 3484-3486 ◽  
Author(s):  
Tao Jiang ◽  
Bo Liu ◽  
Junyi Li ◽  
Yadong Wang

Abstract Summary Mobile element insertion (MEI) is a major category of structural variations (SVs). The rapid development of long-read sequencing technologies provides the opportunity to detect MEIs sensitively. However, the signals of MEI implied by noisy long reads are highly complex due to the repetitiveness of mobile elements as well as the high sequencing error rates. Herein, we propose the Realignment-based Mobile Element insertion detection Tool for Long read (rMETL). Benchmarking results on simulated and real datasets demonstrate that rMETL can handle these complex signals and discover MEIs sensitively. It is suited to producing high-quality MEI callsets in many genomics studies. Availability and implementation rMETL is available from https://github.com/hitbc/rMETL. Supplementary information Supplementary data are available at Bioinformatics online.

