Discovering Novel Genes in Non-Model Fly Accessory Glands Using De Novo Nanopore Transcriptomics

Author(s):  
Mrinalini Mrinalini ◽  
Nalini Puniamoorthy

Abstract Background Oxford Nanopore Technologies (ONT) long-read transcriptomes offer many advantages including long reads (>10 kbp), end-to-end transcripts, structural variants, isoform-level resolution of genes and expression. However, uptake of ONT transcriptomics is still low, largely due to high error rates (2 to 13%) and reliance on reference databases that are unavailable for many non-model species. Additionally, bioinformatics tools and pipelines for de novo ONT transcriptomics are still in early stages of development. Results Here, we use de novo ONT GridION transcriptomics to discover novel genes from the male accessory glands (AG) of a widespread, non-model dung fly, Sepsis punctum. Insect AGs are of particular interest for this purpose because they are hotspots for rapid evolution of novel reproductive genes, and they synthesize seminal fluid proteins that lack homology to any other known proteins. We implement a completely de novo ONT GridION transcriptome pipeline, incorporating quality-filtering and rigorous error-correction procedures, to characterize this novel gene set and to quantify its expression. Specifically, we compare these ONT genes and their expression against de novo Illumina HiSeq transcriptome data. We find 40 high-quality and high-confidence ONT genes that cross-verify against Illumina genes, twenty-six of which are novel and specific to S. punctum. Read-count-based expression quantification in ONT samples is highly congruent with Illumina’s Transcripts per Million (TPM), both in overall pattern and within functional categories. Novel genes account for an average of 81% of total gene expression, underscoring their functional importance in S. punctum AGs. Eighty percent of these genes are secretory in nature and are responsible for 74% of total gene expression. Notably, median sequence similarities of ONT nucleotide and protein sequences match within-Illumina sequence similarities, indicating that our de novo ONT transcriptome pipeline successfully mitigated sequencing errors. Conclusions This is the first study to adapt ONT transcriptomics for completely de novo characterization of novel genes in animals. Our study demonstrates that ONT long-reads, constituting a quarter of the number of bases sequenced at less than a third the cost of Illumina reads, can be a resource-friendly and cost-effective solution for end-to-end sequencing of unknown genes even in the absence of a reference database.
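The comparison of ONT read counts with Illumina TPM can be illustrated with a minimal sketch. The gene names, counts, and lengths below are hypothetical toy values, not data from the study. The `tpm` function follows the standard TPM definition; the counts-per-million scaling used for the long reads is an assumption made here for illustration (each full-length read counted once per transcript), not a description of the authors' pipeline.

```python
def tpm(counts, lengths_bp):
    """Standard TPM: length-normalize counts, then scale to one million."""
    # Reads per kilobase of transcript for each gene.
    rpk = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    scale = sum(rpk.values()) / 1e6
    return {g: v / scale for g, v in rpk.items()}


def cpm(counts):
    """Counts per million: library-size scaling without length normalization."""
    total = sum(counts.values())
    return {g: c * 1e6 / total for g, c in counts.items()}


# Hypothetical toy data (assumption, for illustration only).
illumina_counts = {"geneA": 500, "geneB": 1000, "geneC": 250}
lengths_bp = {"geneA": 1000, "geneB": 4000, "geneC": 500}
ont_counts = {"geneA": 40, "geneB": 20, "geneC": 40}

illumina_tpm = tpm(illumina_counts, lengths_bp)
ont_cpm = cpm(ont_counts)

for gene in sorted(illumina_tpm):
    print(gene, round(illumina_tpm[gene]), round(ont_cpm[gene]))
```

In this toy case the short-read counts differ widely because longer transcripts yield more fragments, yet after length normalization the two platforms give the same relative expression profile.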

2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Ricardo Pérez-Sánchez ◽  
Ángel Carnero-Morán ◽  
Beatriz Soriano ◽  
Carlos Llorens ◽  
Ana Oleaga

Abstract Background The argasid tick Ornithodoros erraticus is the main vector of tick-borne human relapsing fever (TBRF) and African swine fever (ASF) in the Mediterranean Basin. Tick salivary proteins secreted into the host at the feeding interface play critical roles in tick feeding and may contribute to host infection by tick-borne pathogens; accordingly, these proteins represent interesting antigen targets for the development of vaccines aimed at the control and prevention of tick infestations and tick-borne diseases. Methods To identify these proteins, the transcriptome of the salivary glands of O. erraticus was de novo assembled and the salivary gene expression dynamics assessed throughout the trophogonic cycle using Illumina sequencing. The genes differentially upregulated after feeding were selected and discussed as potential antigen candidates for tick vaccines. Results Transcriptome assembly resulted in 22,007 transcripts and 18,961 annotated transcripts, representing an annotation success rate of 86.15%. Most salivary gene expression took place during the first 7 days after feeding (2088 upregulated transcripts), while only a few genes (122 upregulated transcripts) were differentially expressed from day 7 post-feeding onwards. The protein families most abundantly overrepresented after feeding were lipocalins, acid and basic tail proteins, proteases (particularly metalloproteases), protease inhibitors, secreted phospholipases A2, 5′-nucleotidases/apyrases and heme-binding vitellogenin-like proteins. All of them are functionally related to blood ingestion and regulation of host defensive responses, so they can be interesting candidate protective antigens for vaccines. Conclusions The O. erraticus sialotranscriptome contains thousands of protein coding sequences, many of them belonging to large conserved multigene protein families, and shows a complexity and functional redundancy similar to those observed in the sialomes of other argasid and ixodid tick species. This high functional redundancy emphasises the need for developing multiantigenic tick vaccines to reach full protection. This research provides a set of promising candidate antigens for the development of vaccines for the control of O. erraticus infestations and prevention of tick-borne diseases of public and veterinary health relevance, such as TBRF and ASF. Additionally, this transcriptome constitutes a valuable reference database for proteomics studies of the saliva and salivary glands of O. erraticus.


2018 ◽  
Author(s):  
Kristoffer Sahlin ◽  
Paul Medvedev

Abstract Long-read sequencing of transcripts with PacBio Iso-Seq and Oxford Nanopore Technologies has proven to be central to the study of complex isoform landscapes in many organisms. However, current de novo transcript reconstruction algorithms from long-read data are limited, leaving the potential of these technologies unfulfilled. A common bottleneck is the dearth of scalable and accurate algorithms for clustering long reads according to their gene family of origin. To address this challenge, we develop isONclust, a clustering algorithm that is greedy (in order to scale) and makes use of quality values (in order to handle variable error rates). We test isONclust on three simulated and five biological datasets, across a breadth of organisms, technologies, and read depths. Our results demonstrate that isONclust is a substantial improvement over previous approaches in terms of overall accuracy and/or scalability to large datasets. Our tool is available at https://github.com/ksahlin/isONclust.
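The idea of a greedy, quality-aware clustering pass can be sketched as follows. This is a deliberately simplified illustration and not the isONclust algorithm itself: the k-mer-set similarity, the `min_shared` threshold, and the toy reads are all assumptions introduced here. Reads are visited from most to least accurate (using Phred quality values), and each read joins the first cluster whose representative it resembles, otherwise it founds a new cluster.

```python
def kmers(seq, k=5):
    """Set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mean_error(phred):
    """Mean per-base error probability from a list of Phred scores."""
    return sum(10 ** (-q / 10) for q in phred) / len(phred)


def greedy_cluster(reads, k=5, min_shared=0.5):
    """Cluster reads given as (name, sequence, phred_scores) tuples.

    Greedy: a single pass, each read compared only to cluster representatives.
    Quality-aware: the most accurate reads are processed first, so they
    become the representatives.
    """
    ordered = sorted(reads, key=lambda r: mean_error(r[2]))
    clusters = []  # list of (representative_kmers, member_names)
    for name, seq, _ in ordered:
        ks = kmers(seq, k)
        for rep_kmers, members in clusters:
            # Fraction of this read's k-mers found in the representative.
            if ks and len(ks & rep_kmers) / len(ks) >= min_shared:
                members.append(name)
                break
        else:
            clusters.append((ks, [name]))
    return [members for _, members in clusters]


# Hypothetical toy reads: r2 is a lower-quality copy of r1, r3 is unrelated.
toy_reads = [
    ("r2", "ACGTACGTTAGCCGATAGCA", [20] * 20),
    ("r3", "TTTTGGGGCCCCAAAATTTT", [25] * 20),
    ("r1", "ACGTACGTTAGCCGATAGCT", [30] * 20),
]
toy_clusters = greedy_cluster(toy_reads)
print(toy_clusters)
```

Because every read is compared only against cluster representatives rather than against all other reads, the number of comparisons grows with the number of clusters instead of the square of the number of reads.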


2021 ◽  
Vol 15 (2) ◽  
pp. e0009105
Author(s):  
Ana Oleaga ◽  
Beatriz Soriano ◽  
Carlos Llorens ◽  
Ricardo Pérez-Sánchez

The argasid tick Ornithodoros moubata is the main vector of human relapsing fever (HRF) and African swine fever (ASF) in Africa. Salivary proteins are part of the host-tick interface and play vital roles in the tick feeding process and the host infection by tick-borne pathogens; they represent interesting targets for immune interventions aimed at tick control. The present work describes the transcriptome profile of salivary glands of O. moubata and assesses the gene expression dynamics along the trophogonic cycle using Illumina sequencing. De novo transcriptome assembly resulted in 71,194 transcript clusters and 41,011 annotated transcripts, representing an annotation success rate of 57.6%. Most salivary gene expression takes place during the first 7 days after feeding (6,287 upregulated transcripts), while a minority of genes (203 upregulated transcripts) are differentially expressed between 7 and 14 days after feeding. The functional protein groups most abundantly overrepresented after blood feeding were lipocalins, proteases (especially metalloproteases), protease inhibitors including the Kunitz/BPTI-family, proteins with phospholipase A2 activity, acid tail proteins, basic tail proteins, vitellogenins, the 7DB family and proteins involved in tick immunity and defence. The complexity and functional redundancy observed in the sialotranscriptome of O. moubata are comparable to those of the sialomes of other argasid and ixodid ticks. This transcriptome provides a valuable reference database for ongoing proteomics studies of the salivary glands and saliva of O. moubata aimed at confirming and expanding previous data on the O. moubata sialoproteome.


2019 ◽  
Author(s):  
Angel Ruiz-Reche ◽  
Joel A. Indi ◽  
Ivan de la Rubia ◽  
Eduardo Eyras

Long-read sequencing technologies allow the systematic interrogation of transcriptomes from any species. However, functional characterization requires the determination of the correct 5’-to-3’ orientation of reads. Oxford Nanopore Technologies (ONT) allows the direct measurement of RNA molecules in the native orientation (Garalde et al. 2018), but sequencing of complementary-DNA (cDNA) libraries generally yields a larger number of reads (Workman et al. 2018). Although strand-specific adapters can be used, error rates hinder their detection. Current methods rely on the comparison to a genome or transcriptome reference (Wyman and Mortazavi 2018; Workman et al. 2018) or on the use of additional technologies (Fu et al. 2018), which limits the applicability of rapid and cost-effective long-read sequencing for transcriptomics beyond model species. To facilitate the de novo interrogation of transcriptomes in species or samples for which a genome or transcriptome reference is not available, we have developed ReorientExpress (https://github.com/comprna/reorientexpress), a new tool to perform reference-free orientation of ONT reads from a cDNA library, with or without stranded adapters. ReorientExpress uses a deep neural network (DNN) to predict the orientation of cDNA long-reads independently of adapters and without using a reference.
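To make the orientation problem concrete, the sketch below shows how a read and its reverse complement could be turned into labeled training examples for a classifier. The k-mer frequency featurization, the label convention (1 for forward, 0 for reverse complement), and all function names are assumptions introduced for illustration; they are not taken from the ReorientExpress code, and no neural network is trained here.

```python
from itertools import product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_frequencies(seq, k=2):
    """Normalized counts of every possible k-mer, in a fixed order."""
    keys = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(keys, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[key] / total for key in keys]


def training_pair(seq):
    """Two labeled examples from one transcript of known orientation."""
    return [
        (kmer_frequencies(seq), 1),                      # forward strand
        (kmer_frequencies(reverse_complement(seq)), 0),  # reversed strand
    ]


# Hypothetical toy sequence.
examples = training_pair("AACGTTTGCA")
print(len(examples), len(examples[0][0]))
```

A classifier trained on such pairs would, at prediction time, flip any read it labels as reversed by applying `reverse_complement`.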


2020 ◽  
Author(s):  
Sina Baharlouei ◽  
Meisam Razaviyayn ◽  
Elizabeth Tseng ◽  
David Tse

Long-read sequencing technologies demonstrate high potential for de novo discovery of complex transcript isoforms, but high error rates pose a significant challenge. Existing error correction methods rely on clustering reads based on isoform-level alignment and cannot be efficiently scaled. We propose a new method, I-CONVEX, that performs fast, alignment-free isoform clustering with almost linear computational complexity, and leads to better consensus accuracy on simulated, synthetic, and real datasets.


2015 ◽  
Vol 18 (4) ◽  
pp. 134 ◽  
Author(s):  
Asad A Shah

Background: Bicuspid aortic valves predispose to ascending aortic aneurysms, but the mechanisms underlying this aortopathy remain incompletely characterized. We sought to identify epigenetic pathways predisposing to aneurysm formation in bicuspid patients. Methods: Ascending aortic aneurysm tissue samples were collected at the time of aortic replacement in subjects with bicuspid and trileaflet aortic valves. Genome-wide DNA methylation status was determined on DNA from tissue using the Illumina 450K methylation chip, and gene expression was profiled on the same samples using Illumina Whole-Genome DASL arrays. Gene methylation and expression were compared between bicuspid and trileaflet individuals using an unadjusted Wilcoxon rank sum test. Results: Twenty-seven probes in 9 genes showed significant differential methylation and expression (P < 5.5 × 10⁻⁴). The top gene was protein tyrosine phosphatase, non-receptor type 22 (PTPN22), which was hypermethylated (delta beta range: +15.4 to +16.0%) and underexpressed (log2 gene expression intensity: bicuspid 5.1 vs. trileaflet 7.9, P = 2 × 10⁻⁵) in bicuspid patients, as compared to tricuspid patients. Numerous genes involved in cardiovascular development were also differentially methylated, but not differentially expressed, including ACTA2 (4 probes, delta beta range: -10.0 to -22.9%), which when mutated causes the syndrome of familial thoracic aortic aneurysms and dissections. Conclusions: Using an integrated, unbiased genomic approach, we have identified novel genes associated with ascending aortic aneurysms in patients with bicuspid aortic valves, modulated through epigenetic mechanisms. The top gene was PTPN22, which is involved in T-cell receptor signaling and associated with various immune disorders. These differences highlight novel potential mechanisms of aneurysm development in the bicuspid population.
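The Wilcoxon rank sum comparison named in the Methods can be sketched in a few lines. The implementation below uses the normal approximation without tie or continuity corrections, which is a simplification chosen here and not necessarily what the study used. The expression values are hypothetical numbers placed near the reported group levels (5.1 and 7.9), not the study's data.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum (Mann-Whitney U) test.

    Returns (U statistic for x, approximate p-value). Uses the normal
    approximation with no tie or continuity correction.
    """
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        # Find the run of tied values and give each the average rank.
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[t] = average_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_value


# Hypothetical log2 expression intensities (assumption, not study data).
bicuspid = [5.0, 5.2, 4.9, 5.1, 5.3]
trileaflet = [7.8, 8.0, 7.9, 7.7, 8.1]
u_stat, p_val = rank_sum_test(bicuspid, trileaflet)
print(u_stat, p_val)
```

With groups this small an exact test would normally be preferred over the normal approximation; the sketch only shows how the ranks drive the statistic.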


2021 ◽  
Vol 11 (15) ◽  
pp. 6975
Author(s):  
Tao Zhang ◽  
Lun He ◽  
Xudong Li ◽  
Guoqing Feng

Lipreading aims to recognize sentences being spoken by a talking face. In recent years, lipreading methods have achieved high accuracy on large datasets and made breakthrough progress. However, lipreading is still far from solved: existing methods tend to have high error rates on in-the-wild data and suffer from vanishing gradients during training and slow convergence. To overcome these problems, we propose an efficient end-to-end sentence-level lipreading model, using an encoder based on a 3D convolutional network, ResNet50, and a Temporal Convolutional Network (TCN), with a CTC objective function as the decoder. More importantly, the proposed architecture incorporates the TCN as a feature learner to decode features. It can partly eliminate the vanishing gradients and insufficient performance of RNNs (LSTM, GRU), which yields a notable performance improvement as well as faster convergence. Experiments show that training and convergence are 50% faster than with the state-of-the-art method, and that accuracy improves by 2.4% on the GRID dataset.
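How a CTC-trained model turns per-frame predictions into a sentence can be shown with a minimal greedy decoder. Greedy (best-path) decoding is an assumption made here for illustration; the abstract does not state which decoding rule the authors use. The alphabet and frame probabilities are hypothetical.

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Best-path CTC decoding.

    frame_probs: one list of class probabilities per video frame.
    alphabet: symbol for each class index; index `blank` is the CTC blank.
    """
    # Pick the most likely class in every frame.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded = []
    previous = None
    for index in best:
        # Collapse repeated classes, then drop blanks.
        if index != previous and index != blank:
            decoded.append(alphabet[index])
        previous = index
    return "".join(decoded)


# Hypothetical toy output over the classes [blank, "a", "b"].
toy_alphabet = ["-", "a", "b"]
toy_frames = [
    [0.1, 0.8, 0.1],  # a
    [0.2, 0.7, 0.1],  # a (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank separates the two a's
    [0.1, 0.8, 0.1],  # a
    [0.1, 0.1, 0.8],  # b
    [0.1, 0.2, 0.7],  # b (repeat, collapsed)
    [0.8, 0.1, 0.1],  # blank
]
decoded_text = ctc_greedy_decode(toy_frames, toy_alphabet)
print(decoded_text)
```

The blank symbol is what lets the decoder emit a doubled character: without the blank frame between them, the two runs of "a" would collapse into one.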


Cells ◽  
2021 ◽  
Vol 10 (2) ◽  
pp. 324
Author(s):  
Matthias Deutsch ◽  
Anne Günther ◽  
Rodrigo Lerchundi ◽  
Christine R. Rose ◽  
Sabine Balfanz ◽  
...  

Uncovering the physiological role of individual proteins that are part of the intricate process of cellular signaling is often a complex and challenging task. A straightforward strategy of studying a protein’s function is by manipulating the expression rate of its gene. In recent years, the Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)/Cas9-based technology was established as a powerful gene-editing tool for generating sequence specific changes in proliferating cells. However, obtaining homogeneous populations of transgenic post-mitotic neurons by CRISPR/Cas9 turned out to be challenging. These constraints can be partially overcome by CRISPR interference (CRISPRi), which mediates the inhibition of gene expression by competing with the transcription machinery for promoter binding and, thus, transcription initiation. Notably, CRISPR/Cas is only one of several described approaches for the manipulation of gene expression. Here, we targeted neurons with recombinant Adeno-associated viruses to induce either CRISPRi or RNA interference (RNAi), a well-established method for impairing de novo protein biosynthesis by using cellular regulatory mechanisms that induce the degradation of pre-existing mRNA. We specifically targeted hyperpolarization-activated and cyclic nucleotide-gated (HCN) channels, which are widely expressed in neuronal tissues and play essential physiological roles in maintaining biophysical characteristics in neurons. Both of the strategies reduced the expression levels of three HCN isoforms (HCN1, 2, and 4) with high specificity. Furthermore, detailed analysis revealed that the knock-down of just a single HCN isoform (HCN4) in hippocampal neurons did not affect basic electrical parameters of transduced neurons, whereas substantial changes emerged in HCN-current specific properties.


Animals ◽  
2021 ◽  
Vol 11 (8) ◽  
pp. 2273
Author(s):  
Menelaos Kavouras ◽  
Emmanouil E. Malandrakis ◽  
Ewout Blom ◽  
Kyriaki Tsilika ◽  
Theodoros Danis ◽  
...  

In farmed flatfish, such as common sole, color disturbances are common. Dyschromia is a general term that includes the color defects on the blind and ocular sides of the fish. The purpose of this study was to examine the difference in gene expression between normally pigmented juveniles and juveniles presenting ambicoloration. The analysis was carried out with next-generation sequencing techniques and de novo assembly of the transcriptome. Transcripts that showed significant differences (FDR < 0.05) in expression between the two groups were related to those of zebrafish (Danio rerio), functionally identified, and classified into gene ontology categories. The results revealed that ambicolorated juveniles exhibit divergent function, mainly of the central nervous system at the synaptic level, as well as of the ionic channels. A close association of chromophore cells with the growth of nerve cells and the nervous system was recorded. The pathway, glutamate binding–activation of AMPA and NMDA receptors–long-term stimulation of postsynaptic potential–LTP (long term potentiation)–plasticity of synapses, appears to be affected. In addition, the development of synapses also seems to be affected by the interaction of the LGI (leucine-rich glioma inactivated) protein family with the ADAM (a disintegrin and metalloprotease) family.
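The FDR < 0.05 cutoff can be illustrated with the Benjamini-Hochberg procedure, a common way to control the false discovery rate. That this specific procedure was used is an assumption, since the abstract only reports the threshold. The p-values below are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-transcript p-values (assumption, not study data).
toy_pvalues = [0.001, 0.02, 0.03, 0.5]
toy_qvalues = benjamini_hochberg(toy_pvalues)
significant = [q < 0.05 for q in toy_qvalues]
print(toy_qvalues, significant)
```

A transcript is reported as differentially expressed when its adjusted value falls below the chosen threshold, here 0.05.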

