Comparison of Exome and Genome Sequencing Technologies for the Complete Capture of Protein‐Coding Regions

Long-read, single-molecule sequencing technologies are now routinely used for whole-genome sequencing and assembly. However, even after multiple rounds of correction, many errors remain that can critically affect protein-coding regions, resulting in significantly altered and often truncated protein predictions.

Progress in plant genome sequencing: research directions

Vavilov Journal of Genetics and Breeding ◽

10.18699/vj19.459 ◽

2019 ◽

Vol 23 (1) ◽

pp. 38-48 ◽

Cited By ~ 1

Author(s):

M. K. Bragina ◽

D. A. Afonnikov ◽

E. A. Salina

Keyword(s):

Genome Sequencing ◽

Plant Traits ◽

Plant Genome ◽

Targeted Sequencing ◽

Genome Sequences ◽

Crop Species ◽

High Coverage ◽

Protein Coding ◽

Sequencing Technologies ◽

A Genome

Since the first plant genome, that of Arabidopsis thaliana, was sequenced and published, genome sequencing technologies have undergone significant changes. New algorithms, sequencing technologies and bioinformatic approaches were adopted to obtain genome, transcriptome and exome sequences for model and crop species, which have permitted deep inferences into plant biology. As a result of improved genome assembly and analysis methods, genome sequencing costs have plummeted and the number of high-quality plant genome sequences is constantly growing. Consequently, more than 300 plant genome sequences have been published over the past twenty years. Although many of the published genomes are considered incomplete, they have proved to be a valuable tool for identifying genes involved in the formation of economically valuable plant traits, for marker-assisted and genomic selection, and for comparative analysis of plant genomes in order to determine the basic patterns of origin of various plant species. Since high coverage and resolution of a genome sequence are not enough to detect all changes in complex samples, targeted sequencing, which consists of the isolation and sequencing of a specific region of the genome, has begun to develop. Targeted sequencing has a higher detection power (the ability to identify new differences/variants) and resolution (down to a single base). In addition, exome sequencing (the method of sequencing only protein-coding gene regions) is being actively developed, which allows for the sequencing of non-expressed alleles and genes that cannot be found with RNA-seq. In this review, the development of sequencing technologies and the construction of “reference” genomes of plants are analyzed, and methods of targeted sequencing that rely on a reference DNA sequence are compared.

Exome sequencing in genetic disease: recent advances and considerations

F1000Research ◽

10.12688/f1000research.19444.1 ◽

2020 ◽

Vol 9 ◽

pp. 336

Author(s):

Jay P. Ross ◽

Patrick A. Dion ◽

Guy A. Rouleau

Keyword(s):

Exome Sequencing ◽

Medical Science ◽

Data Generation ◽

Protein Coding ◽

Disease Etiology ◽

Depth Of Knowledge ◽

Coding Regions ◽

Disease Research ◽

Sequencing Technologies ◽

Recent Advances

Over the past decade, exome sequencing (ES) has allowed significant advancements to the field of disease research. By targeting the protein-coding regions of the genome, ES combines the depth of knowledge on protein-altering variants with high-throughput data generation and ease of analysis. New discoveries continue to be made using ES, and medical science has benefitted both theoretically and clinically from its continued use. In this review, we describe recent advances and successes of ES in disease research. Through selected examples of recent publications, we explore how ES continues to be a valuable tool to find variants that might explain disease etiology or provide insight into the biology underlying the disease. We then discuss shortcomings of ES in terms of variant discoveries made by other sequencing technologies that would be missed because of the scope and techniques of ES. We conclude with a brief outlook on the future of ES, suggesting that although newer and more thorough sequencing methods will soon supplant ES, its results will continue to be useful for disease research.

RNAsamba: coding potential assessment using ORF and whole transcript sequence information

10.1101/620880 ◽

2019 ◽

Author(s):

Antonio P. Camargo ◽

Vsevolod Sourkov ◽

Marcelo F. Carazzolle

Keyword(s):

High Throughput Sequencing ◽

Model Organisms ◽

Sequence Information ◽

Protein Coding ◽

Rna Molecules ◽

Coding Regions ◽

Sequencing Technologies ◽

Partial Length ◽

Non Coding Rnas ◽

Coding Potential

Abstract Motivation The advent of high-throughput sequencing technologies made it possible to obtain large volumes of genetic information quickly and inexpensively. Thus, many efforts are devoted to unveiling the biological roles of genomic elements, one of the main tasks being the identification of protein-coding and long non-coding RNAs. Results We describe RNAsamba, a tool to predict the coding potential of RNA molecules from sequence information using a deep-learning model that processes both the whole sequence and the ORF to look for patterns that distinguish coding and non-coding RNAs. We evaluated the model in the classification of coding and non-coding transcripts of humans and five other model organisms and show that RNAsamba mostly outperforms other state-of-the-art methods. We also show that RNAsamba can identify coding signals in partial-length ORFs and UTR sequences, evidencing that its model is not dependent on the presence of complete coding regions. RNAsamba is a fast and easy tool that can provide valuable contributions to genome annotation pipelines. Availability and implementation The source code of RNAsamba is freely available at: https://github.com/apcamargo/RNAsamba.
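
The abstract above refers to the ORF of a transcript as one of the two model inputs. As a rough illustration of what "the ORF" means here, the following Python sketch extracts the longest complete ATG-to-stop open reading frame on the forward strand. It is not RNAsamba's code: the function name `longest_orf`, the forward-strand-only search, and the requirement of a complete ORF are all assumptions made for this example (the abstract notes that RNAsamba also handles partial-length ORFs, which this sketch does not).

```python
# Illustrative sketch only; not taken from RNAsamba.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> str:
    """Return the longest complete ORF (ATG ... stop, stop included)
    found in any of the three forward reading frames, or "" if none."""
    # Accept RNA input by mapping U to T.
    seq = seq.upper().replace("U", "T")
    best = ""
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # close this ORF and keep scanning
    return best


if __name__ == "__main__":
    print(longest_orf("CCATGAAATTTTAGGG"))
```

A coding-potential classifier would typically derive features from both the full transcript and a region like the one returned here; how RNAsamba does this in practice is described in its repository rather than in this sketch.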

Evolutionary Analysis of DNA-Protein-Coding Regions Based on a Genetic Code Cube Metric

Current Topics in Medicinal Chemistry ◽

10.2174/1568026613666131204110022 ◽

2014 ◽

Vol 14 (3) ◽

pp. 407-417

Author(s):

Robersy Sanchez

Keyword(s):

Genetic Code ◽

Evolutionary Analysis ◽

Protein Coding ◽

Coding Regions

The open targets post-GWAS analysis pipeline

Bioinformatics ◽

10.1093/bioinformatics/btaa020 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2936-2937 ◽

Cited By ~ 4

Author(s):

Gareth Peat ◽

William Jones ◽

Michael Nuhn ◽

José Carlos Marugán ◽

William Newell ◽

...

Keyword(s):

Drug Targets ◽

Gene Expression Regulation ◽

Association Studies ◽

Genome Wide Association Studies ◽

Protein Coding ◽

Data Resource ◽

Coding Regions ◽

Genome Wide ◽

Causal Genes ◽

Interactive Data

Abstract Motivation Genome-wide association studies (GWAS) are a powerful method to detect even weak associations between variants and phenotypes; however, many of the identified associated variants are in non-coding regions, and presumably influence gene expression regulation. Identifying potential drug targets, i.e. causal protein-coding genes, therefore, requires crossing the genetics results with functional data. Results We present a novel data integration pipeline that analyses GWAS results in the light of experimental epigenetic and cis-regulatory datasets, such as ChIP-Seq, Promoter-Capture Hi-C or eQTL, and presents them in a single report, which can be used for inferring likely causal genes. This pipeline was then fed into an interactive data resource. Availability and implementation The analysis code is available at www.github.com/Ensembl/postgap and the interactive data browser at postgwas.opentargets.io.

Investigation of long non-coding RNAs as regulatory players of grapevine response to powdery and downy mildew infection

BMC Plant Biology ◽

10.1186/s12870-021-03059-6 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Garima Bhatia ◽

Santosh K. Upadhyay ◽

Anuradha Upadhyay ◽

Kashmir Singh

Keyword(s):

Downy Mildew ◽

Plasmopara Viticola ◽

Defense Responses ◽

Protein Coding ◽

Functional Roles ◽

Real Time Quantitative Pcr ◽

Transcriptional Reprogramming ◽

Sequencing Technologies ◽

Non Coding Rnas ◽

Fungal Phytopathogens

Abstract Background Long non-coding RNAs (lncRNAs) are regulatory transcripts of length > 200 nt. Owing to the rapidly progressing RNA-sequencing technologies, lncRNAs are emerging as considerable nodes in the plant antifungal defense networks. Therefore, we investigated their role in Vitis vinifera (grapevine) in response to obligate biotrophic fungal phytopathogens, Erysiphe necator (powdery mildew, PM) and Plasmopara viticola (downy mildew, DM), which impose a huge agro-economic burden on grape-growers worldwide. Results Using a computational approach based on RNA-seq data, 71 PM- and 83 DM-responsive V. vinifera lncRNAs were identified and comprehensively examined for their putative functional roles in plant defense response. V. vinifera protein coding sequences (CDS) were also profiled based on expression levels, and 1037 PM-responsive and 670 DM-responsive CDS were identified. Next, co-expression analysis-based functional annotation revealed their association with gene ontology (GO) terms for ‘response to stress’, ‘response to biotic stimulus’, ‘immune system process’, etc. Further investigation based on analysis of domains, enzyme classification, pathways enrichment, transcription factors (TFs), interactions with microRNAs (miRNAs), and real-time quantitative PCR of lncRNAs and co-expressing CDS pairs suggested their involvement in modulation of basal and specific defense responses such as: Ca2+-dependent signaling, cell wall reinforcement, reactive oxygen species metabolism, pathogenesis related proteins accumulation, phytohormonal signal transduction, and secondary metabolism. Conclusions Overall, the identified lncRNAs provide insights into the underlying intricacy of grapevine transcriptional reprogramming/post-transcriptional regulation to delay or seize the living cell-dependent pathogen growth. Therefore, in addition to defense-responsive genes such as TFs, the identified lncRNAs can be further examined and leveraged as candidates for biotechnological improvement/breeding to enhance fungal stress resistance in this susceptible fruit crop of economic and nutritional importance.

Hapo-G, haplotype-aware polishing of genome assemblies with accurate reads

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab034 ◽

2021 ◽

Vol 3 (2) ◽

Author(s):

Jean-Marc Aury ◽

Benjamin Istace

Keyword(s):

Single Molecule ◽

Direct Consequence ◽

High Quality ◽

Sequencing Errors ◽

Coding Regions ◽

Sequencing Technologies ◽

Long Reads ◽

Oxford Nanopore ◽

Long Read ◽

Genome Assemblies

Abstract Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and then, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture of the two haplotypes and can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; generally, they inspect nucleotides one by one and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes and they typically switch from one haplotype to another. Herein we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long reads) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.
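
To make the point about disrupted coding frames concrete, here is a small Python sketch showing how a single-base deletion, the kind of error an unpolished long-read assembly may carry, can shift the reading frame and introduce a premature stop codon. The toy sequence and the helper names `first_stop_codon_index` and `delete_base` are invented for this illustration and have nothing to do with Hapo-G's implementation.

```python
# Illustrative sketch only; the sequence and helpers are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon_index(cds: str):
    """Return the 0-based codon index of the first in-frame stop codon,
    reading from position 0, or None if no in-frame stop is found."""
    for codon_index, i in enumerate(range(0, len(cds) - 2, 3)):
        if cds[i:i + 3] in STOP_CODONS:
            return codon_index
    return None


def delete_base(seq: str, pos: int) -> str:
    """Simulate a single-base deletion error at 0-based position pos."""
    return seq[:pos] + seq[pos + 1:]


# Toy coding sequence: ATG GCT GAT AGC CTG AAA TAA (stop is the 7th codon).
reference = "ATGGCTGATAGCCTGAAATAA"
# Dropping the G at position 3 shifts the frame: ATG CTG ATA GCC TGA ...
erroneous = delete_base(reference, 3)

if __name__ == "__main__":
    print(first_stop_codon_index(reference))
    print(first_stop_codon_index(erroneous))
```

In this toy case the stop codon moves from codon index 6 to codon index 4, so a gene predictor working on the erroneous sequence would report a truncated protein.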

An Optimized Method for the Preparation of Monascus purpureus DNA for Genome Sequencing

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.563.379 ◽

2014 ◽

Vol 563 ◽

pp. 379-383 ◽

Cited By ~ 1

Author(s):

Yue Yang ◽

Xin Jun Du ◽

Ping Li ◽

Bin Liang ◽

Shuo Wang

Keyword(s):

Genome Sequencing ◽

Genomic Dna ◽

Benzyl Chloride ◽

Monascus Purpureus ◽

Sequencing Technologies ◽

Fungal Evolution ◽

Ctab Method ◽

Fungal Dna ◽

Gene Functional Analysis ◽

Generation Sequencing

More and more attention has been paid to filamentous fungal evolution, metabolic pathways and gene functional analysis via genome sequencing. However, the published methods for the extraction of fungal genomic DNA were usually costly or inefficient. In the present study, we compared five different DNA extraction protocols: CTAB protocol with some modifications, benzyl chloride protocol with some modifications, snailase protocol, SDS protocol and extraction with the E.Z.N.A. Fungal DNA Maxi Kit (Omega Bio-Tek, USA). The CTAB method, which we established with modifications to several steps, is not only economical and convenient but can also be reliably used to obtain large amounts of highly pure genomic DNA from Monascus purpureus for sequencing with next-generation sequencing technologies (Illumina and 454).

Novel exon 1 protein‐coding regions N‐terminally extend human KCNE3 and KCNE4

The FASEB Journal ◽

10.1096/fj.201600467r ◽

2016 ◽

Vol 30 (8) ◽

pp. 2959-2969 ◽

Cited By ~ 8

Author(s):

Geoffrey W. Abbott

Keyword(s):

Protein Coding ◽

Coding Regions ◽

Exon 1 ◽

Novel Exon
