JUMPER Enables Discontinuous Transcript Assembly in Coronaviruses

Author(s):  
Palash Sashittal ◽  
Chuanyi Zhang ◽  
Jian Peng ◽  
Mohammed El-Kebir

Abstract Genes in SARS-CoV-2 and other viruses in the order of Nidovirales are expressed by a process of discontinuous transcription mediated by the viral RNA-dependent RNA polymerase. This process is distinct from alternative splicing in eukaryotes and produces subgenomic RNAs that express different viral genes. Here, we introduce the DISCONTINUOUS TRANSCRIPT ASSEMBLY problem of finding transcripts T and their abundances c given an alignment R of paired-end short reads under a maximum likelihood model that accounts for varying transcript lengths. Underpinning our approach is the concept of a segment graph, a directed acyclic graph that, distinct from the splice graph used to characterize alternative splicing, has a unique Hamiltonian path. We provide a compact characterization of solutions as subsets of non-overlapping edges in this graph, enabling the formulation of an efficient progressive heuristic that uses a mixed integer linear program. We show using simulations that our method, JUMPER, drastically outperforms existing methods for classical transcript assembly. On short-read data of SARS-CoV-1, SARS-CoV-2 and MERS-CoV samples, we find that JUMPER not only identifies canonical transcripts that are part of the reference transcriptome, but also predicts expression of non-canonical transcripts that are well supported by direct evidence from long-read data, presence in multiple, independent samples or a conserved core sequence. Moreover, application of JUMPER on samples with and without treatment reveals viral drug response at the transcript level. As such, JUMPER enables detailed analyses of Nidovirales transcriptomes under varying conditions.
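
The segment graph idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the half-open coordinate convention, and the example jump positions are all assumptions made for illustration. The sketch splits a genome at the breakpoints of observed jumps, links consecutive segments with continuous edges (which together form the single Hamiltonian path), and adds one discontinuous edge per jump. A candidate transcript is then read off as a path that follows a chosen set of non-overlapping discontinuous edges.

```python
from typing import List, Tuple


def build_segment_graph(genome_length: int, jumps: List[Tuple[int, int]]):
    """Build a toy segment graph.

    jumps holds hypothetical (donor, acceptor) positions with
    0 < donor < acceptor < genome_length; a jump skips [donor, acceptor).
    Returns (segments, continuous_edges, discontinuous_edges).
    """
    # Every jump endpoint, plus the genome ends, is a segment boundary.
    breakpoints = sorted({0, genome_length, *(p for jump in jumps for p in jump)})
    segments = list(zip(breakpoints[:-1], breakpoints[1:]))
    starts = {start: i for i, (start, _) in enumerate(segments)}
    ends = {end: i for i, (_, end) in enumerate(segments)}
    # Continuous edges join adjacent segments: the unique Hamiltonian path.
    continuous = [(i, i + 1) for i in range(len(segments) - 1)]
    # A jump links the segment ending at the donor to the one starting at the acceptor.
    discontinuous = [(ends[d], starts[a]) for d, a in jumps]
    return segments, continuous, discontinuous


def transcript_path(n_segments: int, chosen: List[Tuple[int, int]]) -> List[int]:
    """Return the segment indices visited when following the chosen jumps."""
    chosen = sorted(chosen)
    # Two jumps overlap if the next one starts before the previous one lands.
    for (_, landing), (takeoff, _) in zip(chosen, chosen[1:]):
        if takeoff < landing:
            raise ValueError("chosen discontinuous edges overlap")
    jump = dict(chosen)
    path, node = [0], 0
    while node < n_segments - 1:
        node = jump.get(node, node + 1)  # take the jump if one starts here
        path.append(node)
    return path


segments, continuous, discontinuous = build_segment_graph(100, [(10, 60), (10, 80)])
full_genome = transcript_path(len(segments), [])
one_jump = transcript_path(len(segments), [discontinuous[0]])
```

With an empty set of jumps the path visits every segment, which corresponds to the full-length genomic RNA; each non-empty, non-overlapping set of jumps yields a shorter candidate transcript.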

2021 ◽  
Author(s):  
Palash Sashittal ◽  
Chuanyi Zhang ◽  
Jian Peng ◽  
Mohammed El-Kebir

Abstract Genes in SARS-CoV-2 and, more generally, in viruses in the order of Nidovirales are expressed by a process of discontinuous transcription mediated by the viral RNA-dependent RNA polymerase. This process is distinct from alternative splicing in eukaryotes, rendering current transcript assembly methods unsuitable for Nidovirales sequencing samples. Here, we introduce the Discontinuous Transcript Assembly problem of finding transcripts T and their abundances c given an alignment R under a maximum likelihood model that accounts for varying transcript lengths. Underpinning our approach is the concept of a segment graph, a directed acyclic graph that, distinct from the splice graph used to characterize alternative splicing, has a unique Hamiltonian path. We provide a compact characterization of solutions as subsets of non-overlapping edges in this graph, enabling the formulation of an efficient mixed integer linear program. We show using simulations that our method, Jumper, drastically outperforms existing methods for classical transcript assembly. On short-read data of SARS-CoV-1 and SARS-CoV-2 samples, we find that Jumper not only identifies canonical transcripts that are part of the reference transcriptome, but also predicts expression of non-canonical transcripts that are well supported by direct evidence from long-read data, presence in multiple, independent samples or a conserved core sequence. Jumper enables detailed analyses of Nidovirales transcriptomes. Code availability: Software is available at https://github.com/elkebir-group/Jumper


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Palash Sashittal ◽  
Chuanyi Zhang ◽  
Jian Peng ◽  
Mohammed El-Kebir

Abstract Genes in SARS-CoV-2 and other viruses in the order of Nidovirales are expressed by a process of discontinuous transcription which is distinct from alternative splicing in eukaryotes and is mediated by the viral RNA-dependent RNA polymerase. Here, we introduce the DISCONTINUOUS TRANSCRIPT ASSEMBLY problem of finding transcripts and their abundances given an alignment of paired-end short reads under a maximum likelihood model that accounts for varying transcript lengths. We show, using simulations, that our method, JUMPER, outperforms existing methods for classical transcript assembly. On short-read data of SARS-CoV-1, SARS-CoV-2 and MERS-CoV samples, we find that JUMPER not only identifies canonical transcripts that are part of the reference transcriptome, but also predicts expression of non-canonical transcripts that are supported by subsequent orthogonal analyses. Moreover, application of JUMPER on samples with and without treatment reveals viral drug response at the transcript level. As such, JUMPER enables detailed analyses of Nidovirales transcriptomes under varying conditions.


BMC Genomics ◽  
2022 ◽  
Vol 23 (1) ◽  
Author(s):  
David J. Wright ◽  
Nicola A. L. Hall ◽  
Naomi Irish ◽  
Angela L. Man ◽  
Will Glynn ◽  
...  

Abstract Background: Alternative splicing is a key mechanism underlying cellular differentiation and a driver of complexity in mammalian neuronal tissues. However, understanding of which isoforms are differentially used or expressed and how this affects cellular differentiation remains unclear. Long read sequencing allows full-length transcript recovery and quantification, enabling transcript-level analysis of alternative splicing processes and how these change with cell state. Here, we utilise Oxford Nanopore Technologies sequencing to produce a custom annotation of a well-studied human neuroblastoma cell line SH-SY5Y, and to characterise isoform expression and usage across differentiation. Results: We identify many previously unannotated features, including a novel transcript of the voltage-gated calcium channel subunit gene, CACNA2D2. We show differential expression and usage of transcripts during differentiation identifying candidates for future research into state change regulation. Conclusions: Our work highlights the potential of long read sequencing to uncover previously unknown transcript diversity and mechanisms influencing alternative splicing.


2020 ◽  
Author(s):  
V Vern Lee ◽  
Louise M. Judd ◽  
Aaron R. Jex ◽  
Kathryn E. Holt ◽  
Christopher J. Tonkin ◽  
...  

Abstract Alternative splicing is a widespread phenomenon in metazoans by which single genes are able to produce multiple isoforms of the gene product. However, this has been poorly characterised in apicomplexans, a major phylum of some of the most important global parasites. Efforts have been hampered by atypical transcriptomic features, such as the high AT content of Plasmodium RNA, but also the limitations of short read sequencing in deciphering complex splicing events. In this study, we utilised the long read direct RNA sequencing platform developed by Oxford Nanopore Technologies (ONT) to survey the alternative splicing landscape of Toxoplasma gondii and Plasmodium falciparum. We find that while native RNA sequencing has a reduced throughput, it allows us to obtain full-length or near full-length transcripts with comparable quantification to Illumina sequencing. By comparing these data with available gene models, we find widespread alternative splicing, particularly intron retention, in these parasites. Most of these transcripts contain premature stop codons, suggesting that in these parasites, alternative splicing represents a pathway to transcriptomic diversity, rather than expanding proteomic diversity. Moreover, alternative splicing rates are comparable between parasites, suggesting a shared splicing machinery, despite notable transcriptomic differences between the parasites. This work highlights a strategy in using long read sequencing to understand splicing events at the whole transcript level, and has implications in future interpretation of RNA-seq studies.


2021 ◽  
Author(s):  
David J Wright ◽  
Nicola Hall ◽  
Naomi Irish ◽  
Angela L Man ◽  
Will Glynn ◽  
...  

ABSTRACT Alternative splicing (AS) is a key mechanism underlying cellular differentiation and a driver of complexity in mammalian neuronal tissues. However, understanding of which isoforms are differentially used or expressed and how this affects cellular differentiation remains unclear. Long read sequencing allows full-length transcript recovery and quantification, enabling transcript-level analysis of AS processes and how these change with cell state. Here, we utilise Oxford Nanopore Technologies sequencing to produce a custom annotation of a well-studied human neuroblastoma cell line and to characterise isoform expression and usage across differentiation. We identify many previously unannotated features, including a novel transcript of the voltage-gated calcium channel subunit gene, CACNA2D2. We show differential expression and usage of transcripts during differentiation, and identify a putative molecular regulator underlying this state change. Our work highlights the potential of long read sequencing to uncover previously unknown transcript diversity and mechanisms influencing alternative splicing.


mSystems ◽  
2021 ◽  
Vol 6 (2) ◽  
Author(s):  
V. Vern Lee ◽  
Louise M. Judd ◽  
Aaron R. Jex ◽  
Kathryn E. Holt ◽  
Christopher J. Tonkin ◽  
...  

ABSTRACT Alternative splicing is a widespread phenomenon in metazoans by which single genes are able to produce multiple isoforms of the gene product. However, this has been poorly characterized in apicomplexans, a major phylum of some of the most important global parasites. Efforts have been hampered by atypical transcriptomic features, such as the high AU content of Plasmodium RNA, but also the limitations of short-read sequencing in deciphering complex splicing events. In this study, we utilized the long read direct RNA sequencing platform developed by Oxford Nanopore Technologies to survey the alternative splicing landscape of Toxoplasma gondii and Plasmodium falciparum. We find that while native RNA sequencing has a reduced throughput, it allows us to obtain full-length or nearly full-length transcripts with comparable quantification to Illumina sequencing. By comparing these data with available gene models, we find widespread alternative splicing, particularly intron retention, in these parasites. Most of these transcripts contain premature stop codons, suggesting that in these parasites, alternative splicing represents a pathway to transcriptomic diversity, rather than expanding proteomic diversity. Moreover, alternative splicing rates are comparable between parasites, suggesting a shared splicing machinery, despite notable transcriptomic differences between the parasites. This study highlights a strategy in using long-read sequencing to understand splicing events at the whole-transcript level and has implications in the future interpretation of transcriptome sequencing studies.
IMPORTANCE We have used a novel nanopore sequencing technology to directly analyze parasite transcriptomes. The very long reads of this technology reveal the full-length genes of the parasites that cause malaria and toxoplasmosis. Gene transcripts must be processed in a process called splicing before they can be translated to protein. Our analysis reveals that these parasites very frequently only partially process their gene products, in a manner that departs dramatically from their human hosts.


2018 ◽  
Vol 8 (10) ◽  
pp. 1978 ◽  
Author(s):  
Jaber Valinejad ◽  
Taghi Barforoshi ◽  
Mousa Marzband ◽  
Edris Pouresmaeil ◽  
Radu Godina ◽  
...  

This paper presents the analysis of a novel framework of study and the impact of different market design criteria for generation expansion planning (GEP) in competitive electricity market incentives, under variable uncertainties in a single-year horizon. As investment incentives conventionally consist of firm contracts and capacity payments, in this study, the electricity generation investment problem is considered from the perspective of a strategic generation company (GENCO) and modelled as a bi-level optimization problem. The first level includes decision steps related to investment incentives to maximize the total profit in the planning horizon. The second level includes optimization steps focusing on maximizing social welfare when the electricity market is regulated for the current horizon. In addition, variable uncertainties, on offering and investment, are modelled using a set of different scenarios. The bi-level optimization problem is then converted to a single-level problem and represented as a mixed integer linear program (MILP) after linearization. The efficiency of the proposed framework is assessed on the Mazandaran Regional Electric Company (MREC) transmission network, an integral part of Iran's interconnected power system, for both elastic and inelastic demands. Simulations show the significance of optimizing the firm contract and the capacity payment that encourages the generation investment for peak technology and improves long-term stability of electricity markets.


Author(s):  
Aamod Sathe ◽  
Elise Miller-Hooks

The ability to locate military units or equipment, police forces, and first responders optimally and to relocate idle units quickly in response to changing conditions is crucial to a country's ability to guard its critical facilities. Such facilities include vital components of the transportation infrastructure, government and monumental buildings, locations of large gatherings, emergency operations centers, and public and private utilities and communications facilities. In this paper, the problem of making optimal location and relocation decisions for a fixed fleet of response units in a transportation network, where travel conditions are uncertain, is addressed. A mixed integer linear program with multiple objectives (maximize secondary coverage and minimize cost) is presented. Because exact solution of such problems may require considerable computational effort, a metaheuristic based on the principles of genetic algorithms is proposed. The heuristic seeks the set of Pareto-optimal location and relocation decisions for each network state. All facilities of concern must be covered by at least one response unit. If the state of the network changes so that coverage is lost (e.g., travel times increase or a response unit is no longer available), one or more of the response units must be relocated. These relocation decisions are also addressed.
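
To make the notion of Pareto-optimal location decisions in the abstract above concrete, here is a minimal sketch of a Pareto filter over the two stated objectives (maximize secondary coverage, minimize cost). It is not the authors' genetic algorithm; the plan names and objective values are invented purely for illustration, and the function name is hypothetical.

```python
from typing import List, Tuple


def pareto_front(candidates: List[Tuple[str, float, float]]) -> List[str]:
    """Return the names of non-dominated plans.

    Each candidate is (name, secondary_coverage, cost). A plan is dominated
    when another plan is at least as good on both objectives (coverage no
    lower, cost no higher) and strictly better on at least one.
    """
    front = []
    for name, coverage, cost in candidates:
        dominated = any(
            other_cov >= coverage
            and other_cost <= cost
            and (other_cov > coverage or other_cost < cost)
            for _, other_cov, other_cost in candidates
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical location plans: (name, secondary coverage, cost).
plans = [("A", 5, 10), ("B", 4, 12), ("C", 6, 15), ("D", 5, 9)]
best_plans = pareto_front(plans)
```

In this toy data, plan D matches A's coverage at lower cost and A beats B on both objectives, so only C and D remain as trade-offs between coverage and cost.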

