Full-length transcriptome analysis and identification of transcript structures in Eimeria necatrix from different developmental stages by single-molecule real-time sequencing

2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Yang Gao ◽  
Zeyang Suding ◽  
Lele Wang ◽  
Dandan Liu ◽  
Shijie Su ◽  
...  

Abstract Background: Eimeria necatrix is one of the most pathogenic parasites, causing high mortality in chickens. Although its genome sequence has been published, the sequences and complete structures of its mRNA transcripts remain unclear, limiting exploration of novel biomarkers, drug targets and genetic functions in E. necatrix. Methods: Second-generation merozoites (MZ-2) of E. necatrix were collected using Percoll density gradients, and high-quality RNA was extracted from them. Single-molecule real-time (SMRT) sequencing and Illumina sequencing were combined to generate the transcripts of MZ-2. Combined with the SMRT sequencing data of sporozoites (SZ) collected in our previous study, the transcriptome and transcript structures of E. necatrix were studied. Results: SMRT sequencing yielded 21,923 consensus isoforms in MZ-2. A total of 17,151 novel isoforms of known genes and 3918 isoforms of novel genes were successfully identified. We also identified 2752 (SZ) and 3255 (MZ-2) alternative splicing (AS) events, 1705 (SZ) and 1874 (MZ-2) genes with alternative polyadenylation (APA) sites, 4019 (SZ) and 2588 (MZ-2) fusion transcripts, 159 (SZ) and 84 (MZ-2) putative transcription factors (TFs) and 3581 (SZ) and 2039 (MZ-2) long non-coding RNAs (lncRNAs). To validate fusion transcripts, reverse transcription-PCR was performed on 16 candidates, with an accuracy reaching up to 87.5%. Sanger sequencing of the PCR products further confirmed the authenticity of chimeric transcripts. Comparative analysis of transcript structures revealed a total of 3710 consensus isoforms, 815 AS events, 1139 genes with APA sites, 20 putative TFs and 352 lncRNAs in both SZ and MZ-2. Conclusions: We obtained many long-read isoforms in E. necatrix SZ and MZ-2, from which a series of lncRNAs, AS events, APA events and fusion transcripts were identified. Information on TFs will improve understanding of transcriptional regulation, and fusion event data will greatly improve draft versions of gene models in E. necatrix. This information offers insights into the mechanisms governing the development of E. necatrix and will aid in the development of novel strategies for coccidiosis control.
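
The comparison between SZ and MZ-2 amounts to asking which features occur in both stages. A minimal sketch of that idea, assuming each AS event can be reduced to a hashable key; the key layout (gene ID, event type, start, end) and all records below are hypothetical and are not taken from the study:

```python
def partition_events(stage_a, stage_b):
    """Split two collections of event keys into shared and stage-specific sets."""
    a, b = set(stage_a), set(stage_b)
    return {
        "shared": a & b,   # present in both stages
        "only_a": a - b,   # specific to the first stage
        "only_b": b - a,   # specific to the second stage
    }

# Hypothetical event keys: (gene_id, event_type, start, end)
sz_events = {
    ("gene1", "SE", 100, 200),
    ("gene2", "RI", 50, 80),
    ("gene3", "A3", 10, 40),
}
mz2_events = {
    ("gene2", "RI", 50, 80),
    ("gene3", "A3", 10, 40),
    ("gene4", "SE", 5, 25),
}

parts = partition_events(sz_events, mz2_events)
print({name: len(keys) for name, keys in parts.items()})
```

How two events are judged to be "the same" across stages (exact coordinates, tolerance windows, gene-level matching) is a design choice the abstract does not specify; exact key equality is used here only to keep the sketch short.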

2020 ◽  
Vol 10 (10) ◽  
pp. 3505-3514
Author(s):  
Hongmei Zhuang ◽  
Qiang Wang ◽  
Hongwei Han ◽  
Huifang Liu ◽  
Hao Wang

This study aimed to generate the full-length transcriptome of Xinjiang green and purple turnips (Brassica rapa var. rapa) using single-molecule real-time (SMRT) sequencing. Samples of the two varieties of Brassica rapa var. rapa at five developmental stages were collected and combined for SMRT sequencing. In parallel, next-generation sequencing was performed to correct the SMRT sequencing data. A series of analyses were performed to investigate the transcript structure. Finally, the obtained transcripts were mapped to the genome of Brassica rapa ssp. pekinensis Chiifu to identify potential novel transcripts. For green turnip (F01), a total of 19.54 Gb of clean data were obtained from 8 cells. The numbers of reads of insert (ROI) and full-length non-chimeric (FLNC) reads were 510,137 and 267,666, respectively. In addition, 82,640 consensus isoforms were obtained by isoform sequence clustering, of which 69,480 were high-quality, and the 13,160 low-quality sequences were corrected using Illumina RNA-seq data. For purple turnip (F02), there were 20.41 Gb of clean data, 552,829 ROIs, and 274,915 FLNC sequences. A total of 93,775 consensus isoforms were obtained, of which 78,798 were high-quality, and the 14,977 low-quality sequences were corrected. Following the removal of redundant sequences, there were 46,516 and 49,429 non-redundant transcripts for F01 and F02, respectively; 7,774 and 9,385 alternative splicing events were predicted for F01 and F02; 63,890 simple sequence repeats, 59,460 complete coding sequences, and 535 long non-coding RNAs were predicted. Moreover, 5,194 and 5,369 novel transcripts were identified by mapping to Brassica rapa ssp. pekinensis Chiifu. The obtained transcriptome data may improve turnip genome annotation and facilitate further study of the Brassica rapa var. rapa genome and transcriptome.


Genes ◽  
2020 ◽  
Vol 11 (11) ◽  
pp. 1333
Author(s):  
Mariana R. Botton ◽  
Yao Yang ◽  
Erick R. Scott ◽  
Robert J. Desnick ◽  
Stuart A. Scott

The SLC6A4 gene has been implicated in psychiatric disorder susceptibility and antidepressant response variability. The SLC6A4 promoter is defined by a variable number of homologous 20–24 bp repeats (5-HTTLPR), and long (L) and short (S) alleles are associated with higher and lower expression, respectively. However, this insertion/deletion variant is most informative when considered as a haplotype with the rs25531 and rs25532 variants. Therefore, we developed a long-read single molecule real-time (SMRT) sequencing method to interrogate the SLC6A4 promoter region. A total of 120 samples were subjected to SLC6A4 long-read SMRT sequencing, primarily selected based on available short-read sequencing data. Short-read genome sequencing from the 1000 Genomes (1KG) Project (~5X) and the Genetic Testing Reference Material Coordination Program (~45X), as well as high-depth short-read capture-based sequencing (~330X), could not identify the 5-HTTLPR short (S) allele, nor could short-read sequencing phase any identified variants. In contrast, long-read SMRT sequencing unambiguously identified the 5-HTTLPR short (S) allele (frequency of 0.467) and phased SLC6A4 promoter haplotypes. Additionally, discordant rs25531 genotypes were reviewed and determined to be short-read errors. Taken together, long-read SMRT sequencing is an innovative and robust method for phased resolution of the SLC6A4 promoter, which could enable more accurate pharmacogenetic testing for both research and clinical applications.
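
Once haplotypes are phased, haplotype and allele frequencies follow from simple counting. A minimal sketch, assuming each sample is diploid and represented as two phased haplotypes of the form (5-HTTLPR allele, rs25531 base); the sample records and function names are hypothetical and are not data or code from the study:

```python
from collections import Counter

def haplotype_frequencies(samples):
    """Frequency of each phased haplotype across all chromosomes in the cohort."""
    counts = Counter(hap for pair in samples for hap in pair)
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}

def repeat_allele_frequency(hap_freqs, allele):
    """Marginal frequency of a 5-HTTLPR allele ('L' or 'S') summed over haplotypes."""
    return sum(f for (repeat, _base), f in hap_freqs.items() if repeat == allele)

# Hypothetical phased calls: two haplotypes per sample
samples = [
    (("L", "A"), ("S", "A")),
    (("L", "G"), ("L", "A")),
    (("S", "A"), ("S", "A")),
    (("L", "A"), ("S", "A")),
]

hap_freqs = haplotype_frequencies(samples)
s_freq = repeat_allele_frequency(hap_freqs, "S")
print(hap_freqs)
print(s_freq)
```

As a rough consistency check on the reported figure: if all 120 samples were diploid and fully genotyped (an assumption, not stated in the abstract), an S allele frequency of 0.467 would correspond to 112 of 240 chromosomes.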


Blood ◽  
2020 ◽  
Vol 136 (Supplement 1) ◽  
pp. 16-17
Author(s):  
Claudia Haferlach ◽  
Wencke Walter ◽  
Manja Meggendorfer ◽  
Constance Baer ◽  
Anna Stengel ◽  
...  

Background: Genomic alterations are a hallmark of hematological malignancies and comprise small nucleotide variants, copy number alterations and structural variants (SV). SV lead to the co-localization of remote genomic material, resulting in 2 different scenarios: 1) breakpoints are located within 2 genes, leading to a chimeric fusion gene and a fusion transcript; 2) breakpoints are located outside of genes, frequently placing one nearby gene under the influence of the regulatory sequences of the partner, leading to a deregulated, usually increased, transcription. Aim: The frequency of fusion transcripts was determined across hematological entities in order to 1) identify recurrent partner genes across entities, 2) evaluate the specificity of fusion transcripts and genes involved in fusions for distinct entities. Cohort and Methods: Whole transcriptome sequencing (WTS) was performed in 3,549 patients in 25 different hematological entities (table). 101 bp paired-end reads were produced on a NovaSeq 6000 system (Illumina, San Diego, CA) with a yield between 35 and 125 million paired reads per sample. Potential fusions were called using 3 different callers (Arriba, STAR-Fusion, Manta); only fusions called by at least 2 callers, validated by whole genome sequencing (data available for all cases) and with at least one protein coding partner were kept for further analyses. Reciprocal fusion transcripts were counted as one fusion event. Results: In total 1,309 fusion transcripts were identified in 932 of 3,549 (26.3%) patients. 221 patients showed > 1 fusion (2 fusions: 150, 3: 36, >3: 35). 806 distinct fusion transcripts were divided into recurrent fusions (n=50) and unique fusions, i.e. found only in 1 case (n=756). Out of 932 patients with at least 1 fusion, 541 (58%) patients harbored a minimum of one recurrent fusion. 
The proportion of patients harboring any or a recurrent fusion varied substantially between different entities with high frequencies for both in CML (96.5%/96.5%), B-lineage ALL (53.1%/41.3%), AML (42.8%/31.2%), and T-lineage ALL (35.3%/12.6%). In several myeloid entities low fusion frequencies were observed (e.g. PMF, MDS/MPN-U, MDS, figure A). No fusion transcripts were detected in ET. Strikingly, fusions were detected in a substantial proportion of cases with lymphoid neoplasms but only very few occurred recurrently (e.g. T-PLL: 47.8%/4.3%, FL: 39.3%/4.9%, figure A). With regard to age, only patients with AML and T-ALL harboring recurrent fusions were significantly younger than corresponding cases without recurrent fusions (59 vs 71 yrs, p<0.0001; 35 vs 38 yrs, p=0.02). Only in AML were patients with unique fusions older than those without (70 vs 66 yrs, p=0.02), while no age differences were observed between cases with and without unique fusions in other entities. 23/50 (46%) of the recurrent fusions were specific for one entity (12 in myeloid, 11 in lymphatic entities), while the other 54% (27/50) were observed in 2 to 7 different entities. Of these 27 recurrent fusions, only 16 fusions were shared between myeloid and lymphatic entities, while 10 were restricted to myeloid and one fusion to lymphatic entities (figure B). In total 1,270 different genes were involved in the 806 distinct fusions, indicating a broad spectrum of potential functional impact. 54 genes were involved only in recurrent fusions, 27 genes in both recurrent and unique fusions, while 1,189 genes were solely involved in unique fusions. Four genes involved in recurrent fusions and 32 genes involved in unique fusions are FDA approved drug targets (Human Protein Atlas). Only 16% (199/1270) of the genes were involved in more than one fusion: 3 genes (ETV6, KMT2A, RUNX1) in 14 fusions, 2 genes (ABL1, BCR) in 11 fusions, 16 genes in 4 to 10 fusions, 38 genes in 3 fusions, 140 in 2 fusions. 
Several genes frequently involved in fusions in hematological malignancies (e.g. ABL1, ETV6, KMT2A) and 78/1189 genes only involved in unique fusions were also reported to be partners in fusions in non-hematological malignancies. Conclusions: As known, in CML and several acute leukemias a high proportion of patients harbor fusions, many of which occur recurrently, suggesting a substantial pathogenic impact and thus requiring detection in a diagnostic work-up. In BCR-ABL1 negative chronic myeloid malignancies few fusions were observed, while lymphoma patients frequently carry non-recurrent fusions with so far unknown impact on pathogenesis and prognosis. Disclosures: No relevant conflicts of interest to declare.
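
The caller-agreement rule described in the methods (keep a fusion only if at least 2 of the 3 callers report it, and count reciprocal transcripts as one event) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the input tuples, the `GENE_X` partner and the function name are hypothetical, and the further filters mentioned in the abstract (whole genome sequencing validation, at least one protein coding partner) are not modeled.

```python
from collections import defaultdict

def consensus_fusions(calls, min_callers=2):
    """Return gene pairs reported by at least `min_callers` distinct callers.

    `calls` is an iterable of (caller, gene_a, gene_b) tuples.
    """
    support = defaultdict(set)
    for caller, gene_a, gene_b in calls:
        # Sorting the pair maps A-B and its reciprocal B-A to the same key
        pair = tuple(sorted((gene_a, gene_b)))
        support[pair].add(caller)
    return sorted(pair for pair, callers in support.items()
                  if len(callers) >= min_callers)

# Hypothetical calls; caller names come from the abstract, records do not
calls = [
    ("Arriba", "BCR", "ABL1"),
    ("STAR-Fusion", "ABL1", "BCR"),   # reciprocal orientation of the same event
    ("Manta", "BCR", "ABL1"),
    ("Arriba", "ETV6", "RUNX1"),      # single caller only, dropped
    ("Manta", "KMT2A", "GENE_X"),
    ("STAR-Fusion", "KMT2A", "GENE_X"),
]

kept = consensus_fusions(calls)
print(kept)
```

Using a set of caller names per gene pair means repeated calls from the same tool do not inflate support, which is the point of requiring agreement between independent callers.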


2021 ◽  
Author(s):  
Yaoxian Lv ◽  
Lei Cai ◽  
Jingyang Gao

Abstract Background: Single-molecule real-time (SMRT) sequencing data are characterized by long reads and high read depth. Compared with next-generation sequencing (NGS), SMRT sequencing data can reveal more structural variations (SVs) and offer greater advantages for variant calling. However, SMRT sequencing data contain high levels of sequencing error and noise, which makes SV calling from these data inaccurate. Most existing tools are unable to overcome the sequencing errors and detect genomic deletions. Methods and results: In this investigation, we propose a new method for calling deletions from SMRT sequencing data, called MaxDEL. MaxDEL can effectively overcome the noise of SMRT sequencing data and integrates new machine learning and deep learning technologies. First, it uses a machine learning method to calibrate the deletion regions from a variant call format (VCF) file. Second, MaxDEL develops a novel feature visualization method to convert the variant features to images and uses these images to accurately call the deletions with a convolutional neural network (CNN). The results show that MaxDEL performs better in terms of accuracy and recall for calling variants when compared with existing methods on both real and simulated data. Conclusions: We propose a method (MaxDEL) for calling deletion variations, which effectively utilizes both machine learning and deep learning methods. We tested it with different SMRT data and evaluated its effectiveness. The results show that the use of machine learning and deep learning methods has great potential in calling deletion variations.
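
The abstract reports accuracy and recall but does not say how a called deletion is matched to a true one. One common convention, assumed here purely for illustration, is to count a call as correct when it has at least 50% reciprocal overlap with a truth-set deletion. The intervals, the threshold and the function names below are hypothetical and do not describe MaxDEL's actual evaluation.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions for intervals (chrom, start, end)."""
    if a[0] != b[0]:
        return 0.0
    inter = min(a[2], b[2]) - max(a[1], b[1])
    if inter <= 0:
        return 0.0
    return min(inter / (a[2] - a[1]), inter / (b[2] - b[1]))

def precision_recall(calls, truth, min_ro=0.5):
    """Precision and recall of deletion calls against a truth set."""
    matched_calls = sum(
        any(reciprocal_overlap(c, t) >= min_ro for t in truth) for c in calls
    )
    matched_truth = sum(
        any(reciprocal_overlap(c, t) >= min_ro for c in calls) for t in truth
    )
    precision = matched_calls / len(calls) if calls else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    return precision, recall

# Hypothetical half-open intervals: (chromosome, start, end)
truth = [("chr1", 1000, 2000), ("chr1", 5000, 5600), ("chr2", 300, 900)]
calls = [("chr1", 1050, 2010), ("chr1", 7000, 7500), ("chr2", 280, 880)]

precision, recall = precision_recall(calls, truth)
print(precision, recall)
```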


2018 ◽  
Vol 3 (1) ◽  
Author(s):  
Jennifer Reiner ◽  
Laura Pisani ◽  
Wanqiong Qiao ◽  
Ram Singh ◽  
Yao Yang ◽  
...  

BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Huiyan Wang ◽  
Ning Wang ◽  
Yixin Huo

Abstract Background: Azadirachtin A is a triterpenoid from the neem tree that exhibits excellent activity against over 600 insect species in agriculture. The production of azadirachtin A depends on extraction from neem tissues, which is neither an eco-friendly nor a sustainable process. The low yield and discontinuous supply of azadirachtin A impede further applications. The biosynthetic pathway of azadirachtin A is still unknown and is the focus of our study. Results: We attempted to explore the azadirachtin A biosynthetic pathway and identified the key genes involved by analyzing transcriptome data from five neem tissues through a hybrid-sequencing (Illumina HiSeq and Pacific Biosciences Single Molecule Real-Time (SMRT)) approach. Candidates were first screened by comparing the expression levels between the five tissues. After phylogenetic analysis, domain prediction, and molecular docking studies, 22 candidates encoding 2,3-oxidosqualene cyclase (OSC), alcohol dehydrogenase, cytochrome P450 (CYP450), acyltransferase, and esterase were proposed to be potential genes involved in azadirachtin A biosynthesis. Among them, two unigenes encoding homologs of MaOSC1 and MaCYP71CD2 were identified. A unigene encoding the complete homolog of MaCYP71BQ5 was reported. Accuracy of the assembly was verified by quantitative real-time PCR (qRT-PCR) and full-length PCR cloning. Conclusions: By integrating and analyzing transcriptome data from hybrid-seq technology, 22 differentially expressed genes (DEGs) were finally selected as candidates involved in the azadirachtin A pathway. The reliable and accurate sequencing data obtained provide important novel information for understanding the neem genome. Our data shed new light on the biosynthesis of other triterpenoids in neem trees and provide a reference for exploring the biosynthesis of other valuable natural products in plants.


Food Control ◽  
2018 ◽  
Vol 93 ◽  
pp. 226-234 ◽  
Author(s):  
Jicheng Wang ◽  
Yi Zheng ◽  
Xiaoxia Xi ◽  
Qiangchuan Hou ◽  
Haiyan Xu ◽  
...  

2017 ◽  
Vol 5 (40) ◽  
Author(s):  
Jason N. Woodhouse ◽  
A. Katharina Makower ◽  
Hans-Peter Grossart ◽  
Elke Dittmann

ABSTRACT Two genome sequences of the phylum Armatimonadetes, derived from terrestrial environments, have been previously described. Here, two additional Armatimonadetes genome sequences were obtained via single-molecule real-time (SMRT) sequencing of an enrichment culture of the bloom-forming cyanobacterium Microcystis sp. isolated from a eutrophic lake (Brandenburg, Germany). The genomes are most closely affiliated with the class Fimbriimonadales, although they are smaller than the 5.6-Mbp type strain genome.

