A Synthetic Defective Interfering SARS-CoV-2



On spatial molecular arrangements of SARS-CoV2 genomes of Indian patients

10.1101/2020.05.01.071985 ◽ 2020 ◽ Cited By ~ 2

Author(s): Sk. Sarif Hassan ◽ Atanu Moitra ◽ Ranjeet Kumar Rout ◽ Pabitra Pal Choudhury ◽ Prasanta Pramanik ◽ ...

Keyword(s): Virulence Factors ◽ Phylogenetic Relationships ◽ Viral Genome ◽ The Other ◽ Fast Process ◽ Protein Coding ◽ Individual Gene ◽ Protein Coding Genes ◽ Indian Origin ◽ Different Origins

Abstract: The whole world has been experiencing a pandemic caused by SARS-CoV2 since December 2019. A thorough understanding beyond just sequence similarities among the protein coding genes of SARS-CoV2 is important in order to differentiate it from, or relate it to, the other known CoVs of the same genus. In this study, we compare three genomes from India, namely MT012098 (India-Kerala), MT050493 (India-Kerala), and MT358637 (India-Gujrat), with NC_045512 (China-Wuhan) to view the spatial as well as molecular arrangements of nucleotide bases of all the genes embedded in these four genomes. Based on different features extracted for each gene embedded in these genomes, corresponding phylogenetic relationships have been built up. Differences in phylogenetic tree arrangement among individual genes suggest that the three genomes of Indian origin have come from three different origins or that the evolution of the viral genome is a very fast process. This study would also help to understand the virulence factors, disease pathogenicity, origin and transmission of SARS-CoV2.


4. Proteins

10.1093/actrade/9780198723882.003.0004 ◽ 2016

Author(s): Aysha Divan ◽ Janice A. Royds

Keyword(s): Alternative Splicing ◽ Human Genome ◽ The Body ◽ Biological Functions ◽ Protein Coding ◽ Post Translational Modifications ◽ Protein Coding Genes ◽ Composition And Structure ◽ A Cell ◽ Structure Of Proteins

Biological functions require protein and the protein makeup of a cell determines its behaviour and identity. Proteins, therefore, are the most abundant molecules in the body except for water. The approximately 20,000 protein coding genes in the human genome can, by alternative splicing, multiple translation starts, and post-translational modifications, produce over 1,000,000 different proteins, collectively called ‘the proteome’. It is the size of the proteome and not the genome that defines the complexity of an organism. ‘Proteins’ describes the composition and structure of proteins and how they are studied. What information is required in order to understand how proteins work and what happens when this function is impaired in disease?


Failed detection of the full-length genome of SARS-CoV-2 by ultra-deep sequencing from the recovered and discharged patients retested viral PCR positive

10.1101/2020.03.27.20043299 ◽ 2020 ◽ Cited By ~ 4

Author(s): Fengyu Hu ◽ Fengjuan Chen ◽ Yaping Wang ◽ Teng Xu ◽ Xiaoping Tang ◽ ...

Keyword(s): Public Health ◽ Deep Sequencing ◽ Viral Genome ◽ Full Length ◽ Health Concern ◽ Public Health Concern ◽ Low Concentration ◽ Discharged Patients ◽ Full Length Genome

Abstract: Over 10 percent of recovered and discharged patients retested positive for SARS-CoV-2, raising a public health concern about whether they could be potential sources of infection. In this study, we found that a detectable viral genome in discharged patients might only indicate the presence of viral fragments, which could hardly act as a source of infection given their extremely low concentration.


Improvement of genome assembly completeness and identification of novel full-length protein-coding genes by RNA-seq in the giant panda genome

Scientific Reports ◽ 10.1038/srep18019 ◽ 2015 ◽ Vol 5 (1) ◽ Cited By ~ 8

Author(s): Meili Chen ◽ Yibo Hu ◽ Jingxing Liu ◽ Qi Wu ◽ Chenglin Zhang ◽ ...

Keyword(s): Genome Assembly ◽ Giant Panda ◽ Full Length ◽ Rna Seq ◽ Protein Coding ◽ Protein Coding Genes


De novo gene evolution: How do we transition from non-coding to coding?

10.7287/peerj.preprints.3031 ◽ 2017

Author(s): Jorge Ruiz-Orera ◽ José Luis Villanueva-Cañas ◽ William Blevins ◽ M.Mar Albà

Keyword(s): De Novo ◽ Gene Evolution ◽ Neutral Evolution ◽ Functional Protein ◽ Protein Coding ◽ Coding Sequences ◽ Sequence Composition ◽ Protein Coding Genes ◽ Small Proteins ◽ De Novo Gene

Recent years have witnessed the discovery of protein-coding genes which appear to have evolved de novo from previously non-coding sequences. This has changed the long-standing view that coding sequences can only evolve from other coding sequences. However, there are still many open questions regarding how new protein-coding sequences can arise from non-genic DNA. Two prerequisites for the birth of a new functional protein-coding gene are that the corresponding DNA fragment is transcribed and that it is also translated. Transcription is known to be pervasive in the genome, producing a large number of transcripts that do not correspond to conserved protein-coding genes, and which are usually annotated as long non-coding RNAs (lncRNAs). Recently, sequencing of ribosome-protected fragments (Ribo-Seq) has provided evidence that many of these transcripts are actually translated into small proteins. We have used mouse non-synonymous and synonymous variation data to estimate the strength of purifying selection acting on the translated open reading frames (ORFs). Whereas a subset of the lncRNAs is likely to represent true protein-coding genes (and thus to have been previously misclassified), the bulk of lncRNAs code for proteins which show variation patterns consistent with neutral evolution. We also show that the ORFs that have a more favorable, coding-like sequence composition are more likely to be translated than other ORFs in lncRNAs. This study provides strong evidence that there is a large and ever-changing reservoir of low-abundance proteins; some of these peptides may become useful and act as seeds for de novo gene evolution.


Mitochondrial genome sequences of representatives of three families of scorpionflies (Order Mecoptera) and evolution in a major duplication of coding sequence

Genome ◽ 10.1139/g11-006 ◽ 2011 ◽ Vol 54 (5) ◽ pp. 368-376 ◽ Cited By ~ 17

Author(s): Andrew T Beckenbach

Keyword(s): Dna Sequences ◽ Complete Sequence ◽ Trna Genes ◽ Coding Region ◽ Functional Protein ◽ Protein Coding ◽ Coding Sequence ◽ Protein Coding Genes ◽ Standard Set ◽ Considerable Period

The complete mitochondrial DNA sequences of a hangingfly, Bittacus pilicornis (Mecoptera: Bittacidae), a snow scorpion fly, Boreus elegans (Mecoptera: Boreidae), and a nearly complete sequence from another scorpionfly species, Microchorista philpotti (Mecoptera: Nannochoristidae) were determined. The coding sequence of all three genomes includes the 37 genes normally found in insect mtDNAs, in the same gene order as first described in Drosophila. In addition to the standard set of genes, the Microchorista sequence includes a large duplication of the coding region. The duplication is at least 4 kb (and may be much larger) and includes the remnants of three protein-coding genes and seven tRNA genes. The duplication evidently arose as a single event, and the duplicated region can be aligned in its entirety with the corresponding region of the functional genome. Although most of the genes contain defects that render them nonfunctional, analysis shows that the protein-coding genes in the duplicated region evolved for a considerable period under constraints expected of functional protein-coding genes. It is evident, therefore, that for a period two copies of some of the mitochondrial genes were functional in this species, including genes coding for proteins.


Identification of protein-protected mRNA fragments and structured excised intron RNAs in human plasma by TGIRT-seq peak calling

10.1101/2020.06.25.171439 ◽ 2020

Author(s): Jun Yao ◽ Douglas C. Wu ◽ Ryan M. Nottingham ◽ Alan M. Lambowitz

Keyword(s): Human Plasma ◽ Binding Sites ◽ Full Length ◽ Peak Calling ◽ Healthy Individuals ◽ Protein Coding ◽ Protein Binding Sites ◽ Protein Coding Genes ◽ Non Coding Rnas ◽ Potential Biomarkers

Summary: Human plasma contains >40,000 different coding and non-coding RNAs that are potential biomarkers for human diseases. Here, we used thermostable group II intron reverse transcriptase sequencing (TGIRT-seq) combined with peak calling to simultaneously profile all RNA biotypes in apheresis-prepared human plasma pooled from healthy individuals. Extending previous TGIRT-seq analysis, we found that human plasma contains largely fragmented mRNAs from >19,000 protein-coding genes, abundant full-length, mature tRNAs and other structured small non-coding RNAs, and less abundant tRNA fragments and mature and pre-miRNAs. Many of the mRNA fragments identified by peak calling correspond to annotated protein-binding sites and/or have stable predicted secondary structures that could afford protection from plasma nucleases. Peak calling also identified novel repeat RNAs, miRNA-sized RNAs, and putatively structured intron RNAs of potential biological, evolutionary, and biomarker significance, including a family of full-length excised intron RNAs, subsets of which correspond to mirtron pre-miRNAs or agotrons.


How do we transition from non-coding to coding?

10.7287/peerj.preprints.3031v1 ◽ 2017

Author(s): Jorge Ruiz-Orera ◽ José Luis Villanueva-Cañas ◽ William Blevins ◽ M.Mar Albà

Keyword(s): De Novo ◽ Gene Evolution ◽ Purifying Selection ◽ Neutral Evolution ◽ Functional Protein ◽ Protein Coding ◽ Coding Sequences ◽ Sequence Composition ◽ Protein Coding Genes ◽ Small Proteins

Recent years have witnessed the discovery of protein-coding genes which appear to have evolved de novo from previously non-coding sequences. This has changed the long-standing view that coding sequences can only evolve from other coding sequences. However, there are still many open questions regarding how new protein-coding sequences can arise from non-genic DNA. Two prerequisites for the birth of a new functional protein-coding gene are that the corresponding DNA fragment is transcribed and that it is also translated. Transcription is known to be pervasive in the genome, producing a large number of transcripts that do not correspond to conserved protein-coding genes, and which are usually annotated as long non-coding RNAs (lncRNAs). Recently, sequencing of ribosome-protected fragments (Ribo-Seq) has provided evidence that many of these transcripts are actually translated into small proteins. We have used mouse non-synonymous and synonymous variation data to estimate the strength of purifying selection acting on the translated open reading frames (ORFs). Whereas a subset of the lncRNAs is likely to represent true protein-coding genes (and thus to have been previously misclassified), the bulk of lncRNAs code for proteins which show variation patterns consistent with neutral evolution. We also show that the ORFs that have a more favorable, coding-like sequence composition are more likely to be translated than other ORFs in lncRNAs. This study provides strong evidence that there is a large and ever-changing reservoir of low-abundance proteins; some of these peptides may become useful and act as seeds for de novo gene evolution.


De novo gene evolution: How do we transition from non-coding to coding?

10.7287/peerj.preprints.3031v2 ◽ 2017

Author(s): Jorge Ruiz-Orera ◽ José Luis Villanueva-Cañas ◽ William Blevins ◽ M.Mar Albà

Keyword(s): De Novo ◽ Gene Evolution ◽ Neutral Evolution ◽ Functional Protein ◽ Protein Coding ◽ Coding Sequences ◽ Sequence Composition ◽ Protein Coding Genes ◽ Small Proteins ◽ De Novo Gene

Recent years have witnessed the discovery of protein-coding genes which appear to have evolved de novo from previously non-coding sequences. This has changed the long-standing view that coding sequences can only evolve from other coding sequences. However, there are still many open questions regarding how new protein-coding sequences can arise from non-genic DNA. Two prerequisites for the birth of a new functional protein-coding gene are that the corresponding DNA fragment is transcribed and that it is also translated. Transcription is known to be pervasive in the genome, producing a large number of transcripts that do not correspond to conserved protein-coding genes, and which are usually annotated as long non-coding RNAs (lncRNAs). Recently, sequencing of ribosome-protected fragments (Ribo-Seq) has provided evidence that many of these transcripts are actually translated into small proteins. We have used mouse non-synonymous and synonymous variation data to estimate the strength of purifying selection acting on the translated open reading frames (ORFs). Whereas a subset of the lncRNAs is likely to represent true protein-coding genes (and thus to have been previously misclassified), the bulk of lncRNAs code for proteins which show variation patterns consistent with neutral evolution. We also show that the ORFs that have a more favorable, coding-like sequence composition are more likely to be translated than other ORFs in lncRNAs. This study provides strong evidence that there is a large and ever-changing reservoir of low-abundance proteins; some of these peptides may become useful and act as seeds for de novo gene evolution.
