Homeolog expression quantification methods for allopolyploids

2018 ◽  
Author(s):  
Tony Kuo ◽  
Masaomi Hatakeyama ◽  
Toshiaki Tameshige ◽  
Kentaro K. Shimizu ◽  
Jun Sese

Abstract Genome duplication with hybridization, or allopolyploidization, occurs in animals, fungi, and plants, and is especially common in crop plants. There is increasing interest in the study of allopolyploids due to advances in polyploid genome assembly; however, the high level of sequence similarity in duplicated gene copies (homeologs) poses many challenges. Here we compared standard RNA-seq expression quantification approaches currently used for diploid species against subgenome-classification approaches that map reads to each subgenome separately. We examined mapping error using our previous and new RNA-seq data in which a subgenome is experimentally added (synthetic allotetraploid Arabidopsis kamchatica) or reduced (allohexaploid wheat Triticum aestivum versus extracted allotetraploid) as ground truth. The error rates in the two species were very similar. The standard approaches showed higher error rates (>10% using pseudo-alignment with Kallisto) while subgenome-classification approaches showed much lower error rates (<1% using EAGLE-RC, <2% using HomeoRoq). Although downstream analysis may partly mitigate mapping errors, the difference in methods was substantial in hexaploid wheat, where Kallisto appeared to have systematic differences relative to other methods. Only approximately half of the differentially expressed homeologs detected using Kallisto overlapped with those detected by any other method. In general, disagreement in low-expression genes was responsible for most of the discordance between methods, which is consistent with known biases in Kallisto. We also observed that uncertainties in genome sequences and annotation can affect each method differently. Overall, subgenome-classification approaches tend to perform better than standard approaches, with EAGLE-RC having the highest precision.

2018 ◽  
Vol 21 (2) ◽  
pp. 395-407 ◽  
Author(s):  
Tony C Y Kuo ◽  
Masaomi Hatakeyama ◽  
Toshiaki Tameshige ◽  
Kentaro K Shimizu ◽  
Jun Sese

Abstract Genome duplication with hybridization, or allopolyploidization, occurs in animals, fungi and plants, and is especially common in crop plants. There is an increasing interest in the study of allopolyploids because of advances in polyploid genome assembly; however, the high level of sequence similarity in duplicated gene copies (homeologs) poses many challenges. Here we compared standard RNA-seq expression quantification approaches currently used for diploid species against subgenome-classification approaches that map reads to each subgenome separately. We examined mapping error using our previous and new RNA-seq data in which a subgenome is experimentally added (synthetic allotetraploid Arabidopsis kamchatica) or reduced (allohexaploid wheat Triticum aestivum versus extracted allotetraploid) as ground truth. The error rates in the two species were very similar. The standard approaches showed higher error rates (>10% using pseudo-alignment with Kallisto) while subgenome-classification approaches showed much lower error rates (<1% using EAGLE-RC, <2% using HomeoRoq). Although downstream analysis may partly mitigate mapping errors, the difference in methods was substantial in hexaploid wheat, where Kallisto appeared to have systematic differences relative to other methods. Only approximately half of the differentially expressed homeologs detected using Kallisto overlapped with those detected by any other method in wheat. In general, disagreement in low-expression genes was responsible for most of the discordance between methods, which is consistent with known biases in Kallisto. We also observed that uncertainties in genome sequences and annotation can affect each method differently. Overall, subgenome-classification approaches tend to perform better than standard approaches, with EAGLE-RC having the highest precision.
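The error rates quoted above rest on samples in which one subgenome is known to be absent, so any read assigned to that subgenome is a mis-assignment. The following is a minimal sketch of that idea, not the authors' pipeline: the function name `misassignment_rate`, the label strings, and the choice to exclude unassigned reads from the denominator are all assumptions made for illustration.

```python
from collections import Counter


def misassignment_rate(read_labels, absent_subgenome, unassigned="unassigned"):
    """Fraction of classified reads assigned to a subgenome known to be absent.

    read_labels: one subgenome label per read (hypothetical input format).
    absent_subgenome: label of the subgenome missing from the sample.
    """
    counts = Counter(read_labels)
    # Assumption: reads the classifier left unassigned are not counted as errors
    # and are excluded from the denominator.
    classified = sum(n for label, n in counts.items() if label != unassigned)
    if classified == 0:
        return 0.0
    return counts[absent_subgenome] / classified


# Toy sample containing only subgenome "A": 97 correct reads, 3 wrongly sent to "B".
labels = ["A"] * 97 + ["B"] * 3 + ["unassigned"] * 10
print(f"{misassignment_rate(labels, absent_subgenome='B'):.3f}")  # 0.030
```

Whether unassigned reads belong in the denominator changes the reported rate, so that choice should be stated alongside any number produced this way.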


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e12233
Author(s):  
Diem-Trang Tran ◽  
Matthew Might

Normalization of RNA-seq data has been an active area of research since the problem was first recognized a decade ago. Despite the active development of new normalizers, their performance measures have been given little attention. To evaluate normalizers, researchers have been relying on ad hoc measures, most of which are either qualitative, potentially biased, or easily confounded by parametric choices of downstream analysis. We propose a metric called condition-number based deviation, or cdev, to quantify normalization success. cdev measures how much an expression matrix differs from another. If a ground truth normalization is given, cdev can then be used to evaluate the performance of normalizers. To establish experimental ground truth, we compiled an extensive set of public RNA-seq assays with external spike-ins. This data collection, together with cdev, provides a valuable toolset for benchmarking new and existing normalization methods.


2018 ◽  
Vol 11 (2) ◽  
Author(s):  
Véronique Drai-Zerbib ◽  
Thierry Baccino

The study investigated the cross-modal integration hypothesis for expert musicians using eye tracking. Twenty randomized excerpts of classical music were presented in two modes (auditory and visual), at the same time (simultaneously) or successively (sequentially). Musicians (N = 53, 26 experts and 27 non-experts) were asked to detect a note modified between the auditory and visual versions, either in the same major/minor key or violating the key. Experts carried out the task faster and with greater accuracy than non-experts. Sequential presentation was more difficult than simultaneous (longer fixations and higher error rates) and the modified notes were more easily detected when violating the key (fewer errors), but with longer fixations (speed/accuracy trade-off strategy). Experts detected the modified note faster, especially in the simultaneous condition in which cross-modal integration may be applied. These results support the hypothesis that the main difference between experts and non-experts derives from the difference in knowledge structures in memory built over time with practice. They also suggest that these high-level knowledge structures in memory contain harmony and tonal rules, arguing in favour of cross-modal integration capacities for experts, which are related to and can be explained by the long-term working memory (LTWM) model of expert memory (e.g. Drai-Zerbib & Baccino, 2014; Ericsson & Kintsch, 1995).


GigaScience ◽  
2019 ◽  
Vol 8 (12) ◽  
Author(s):  
Hong Zheng ◽  
Kevin Brennan ◽  
Mikel Hernaez ◽  
Olivier Gevaert

Abstract Background Long non-coding RNAs (lncRNAs) are emerging as important regulators of various biological processes. While many studies have exploited public resources such as RNA sequencing (RNA-Seq) data in The Cancer Genome Atlas to study lncRNAs in cancer, it is crucial to choose the optimal method for accurate expression quantification. Results In this study, we compared the performance of pseudoalignment methods Kallisto and Salmon, alignment-based transcript quantification method RSEM, and alignment-based gene quantification methods HTSeq and featureCounts, in combination with read aligners STAR, Subread, and HISAT2, in lncRNA quantification, by applying them to both un-stranded and stranded RNA-Seq datasets. Full transcriptome annotation, including protein-coding and non-coding RNAs, greatly improves the specificity of lncRNA expression quantification. Pseudoalignment methods and RSEM outperform HTSeq and featureCounts for lncRNA quantification at both sample- and gene-level comparison, regardless of RNA-Seq protocol type, choice of aligners, and transcriptome annotation. Pseudoalignment methods and RSEM detect more lncRNAs and correlate highly with simulated ground truth. On the contrary, HTSeq and featureCounts often underestimate lncRNA expression. Antisense lncRNAs are poorly quantified by alignment-based gene quantification methods, which can be improved using stranded protocols and pseudoalignment methods. Conclusions Considering the consistency with ground truth and computational resources, pseudoalignment methods Kallisto or Salmon in combination with full transcriptome annotation is our recommended strategy for RNA-Seq analysis for lncRNAs.
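Comparisons against simulated ground truth of the kind described above usually come down to two questions: how well do the estimates correlate with the truth, and how often is expression underestimated? The sketch below illustrates both with plain Python. The function names, the log1p transform, and the two-fold underestimation threshold are assumptions for illustration and are not taken from the study.

```python
import math


def log_pearson(estimated, truth):
    """Pearson correlation between log1p-transformed estimates and ground truth."""
    x = [math.log1p(v) for v in estimated]
    y = [math.log1p(v) for v in truth]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def fraction_underestimated(estimated, truth, fold=2.0):
    """Share of truly expressed genes whose estimate is more than `fold` times too low."""
    expressed = [(e, t) for e, t in zip(estimated, truth) if t > 0]
    if not expressed:
        return 0.0
    under = sum(1 for e, t in expressed if e * fold < t)
    return under / len(expressed)


# Toy counts for four genes (hypothetical values).
truth = [10.0, 50.0, 200.0, 5.0]
estimate = [9.0, 20.0, 190.0, 0.0]
print(round(log_pearson(estimate, truth), 3))
print(fraction_underestimated(estimate, truth))  # 2 of 4 genes -> 0.5
```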


2018 ◽  
Author(s):  
Ping-Han Hsieh ◽  
Yen-Jen Oyang ◽  
Chien-Yu Chen

Abstract Background Correct quantification of transcript expression is essential to understand the functional products of the genome in different physiological conditions and developmental stages. Recently, the development of high-throughput RNA sequencing (RNA-Seq) has allowed researchers to perform transcriptome analysis for organisms without a reference genome and transcriptome. For such projects, de novo transcriptome assembly must be carried out prior to quantification. However, a large number of erroneous contigs produced by the assemblers might result in unreliable estimation of transcript abundance. In this regard, this study comprehensively investigates how assembly quality affects the performance of quantification for RNA-Seq analysis based on de novo transcriptome assembly. Results Several important factors that might seriously affect the accuracy of the RNA-Seq analysis were thoroughly discussed. First, we found that the assemblers perform comparatively well for the transcriptomes with lower biological complexity. Second, we examined the over-extended and incomplete contigs, and then demonstrated that assembly completeness has a strong impact on the estimation of contig abundance. Lastly, we investigated the behavior of the quantifiers with respect to sequence ambiguity, which might be originally present in the transcriptome or accidentally produced by assemblers. The results suggest that the quantifiers often over-estimate the expression of family-collapse contigs and under-estimate the expression of duplicated contigs. For organisms without a reference transcriptome, it remained challenging to detect inaccurate abundance estimation on family-collapse contigs. In contrast, we observed that under-estimation on duplicated contigs can be flagged by analyzing the read distribution of the duplicated contigs. Conclusions In summary, we explicated the behavior of quantifiers when erroneous contigs are present and outlined the potential problems that the assemblers might cause for the downstream analysis of RNA-Seq. We anticipate that the analysis conducted in this study provides valuable insights for future development of transcriptome assembly and quantification. Availability We propose an open-source Python-based package, QuantEval, that builds connected components for the assembled contigs based on sequence similarity and evaluates the quantification results for each connected component. The package can be downloaded from https://github.com/dn070017/QuantEval.
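Grouping contigs into connected components by sequence similarity can be illustrated with a small union-find structure. This is a sketch of the general technique only and does not reproduce QuantEval's code: the function `connected_components` and the `similar_pairs` input (assumed to come from some all-versus-all similarity search filtered by a threshold) are hypothetical.

```python
def connected_components(contigs, similar_pairs):
    """Group contigs so that any two linked by a chain of similar pairs share a component."""
    parent = {c: c for c in contigs}

    def find(c):
        # Walk up to the root, halving the path as we go.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    groups = {}
    for c in contigs:
        groups.setdefault(find(c), []).append(c)
    # Sort for a deterministic result.
    return sorted(sorted(g) for g in groups.values())


contigs = ["c1", "c2", "c3", "c4", "c5"]
pairs = [("c1", "c2"), ("c2", "c3")]  # hypothetical similarity hits
print(connected_components(contigs, pairs))
# [['c1', 'c2', 'c3'], ['c4'], ['c5']]
```

Evaluating quantification per component rather than per contig means that reads shuffled between near-identical contigs in the same component do not count as errors at the component level.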


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yue You ◽  
Luyi Tian ◽  
Shian Su ◽  
Xueyi Dong ◽  
Jafar S. Jabbari ◽  
...  

Abstract Background Single-cell RNA-sequencing (scRNA-seq) technologies and associated analysis methods have rapidly developed in recent years. This includes preprocessing methods, which assign sequencing reads to genes to create count matrices for downstream analysis. While several packaged preprocessing workflows have been developed to provide users with convenient tools for handling this process, how they compare to one another and how they influence downstream analysis have not been well studied. Results Here, we systematically benchmark the performance of 10 end-to-end preprocessing workflows (Cell Ranger, Optimus, salmon alevin, alevin-fry, kallisto bustools, dropSeqPipe, scPipe, zUMIs, celseq2, and scruff) using datasets yielding different biological complexity levels generated by CEL-Seq2 and 10x Chromium platforms. We compare these workflows in terms of their quantification properties directly and their impact on normalization and clustering by evaluating the performance of different method combinations. While the scRNA-seq preprocessing workflows compared vary in their detection and quantification of genes across datasets, after downstream analysis with performant normalization and clustering methods, almost all combinations produce clustering results that agree well with the known cell type labels that provided the ground truth in our analysis. Conclusions In summary, the choice of preprocessing method was found to be less important than other steps in the scRNA-seq analysis process. Our study comprehensively compares common scRNA-seq preprocessing workflows and summarizes their characteristics to guide workflow users.
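Agreement between clustering results and known cell type labels needs a score that ignores how clusters happen to be numbered. The abstract does not name the metric used, so the adjusted Rand index below is an assumption chosen because it is a common option; the implementation is a self-contained sketch using the standard pair-counting formula.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, predicted):
    """Chance-corrected agreement between two labelings of the same cells."""
    if len(truth) != len(predicted):
        raise ValueError("labelings must have the same length")
    n = len(truth)
    if n < 2:
        return 1.0
    # Count cell pairs that share a label in both, in truth, and in predicted.
    both = sum(comb(c, 2) for c in Counter(zip(truth, predicted)).values())
    in_truth = sum(comb(c, 2) for c in Counter(truth).values())
    in_pred = sum(comb(c, 2) for c in Counter(predicted).values())
    expected = in_truth * in_pred / comb(n, 2)
    max_index = (in_truth + in_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are trivial
    return (both - expected) / (max_index - expected)


cell_types = ["T", "T", "B", "B", "NK", "NK"]  # hypothetical ground truth
clusters = [2, 2, 0, 0, 1, 1]                  # same grouping, different names
print(adjusted_rand_index(cell_types, clusters))  # 1.0
```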


2021 ◽  
Author(s):  
Madalina Ciortan ◽  
Matthieu Defrance

Single-cell RNA sequencing (scRNA-seq) produces transcriptomic profiling for individual cells. Due to the lack of cell-class annotations, scRNA-seq is routinely analyzed with unsupervised clustering methods. Because these methods are typically limited to producing clustering predictions (that is, assignment of cells to clusters of similar cells), numerous model agnostic differential expression (DE) libraries have been proposed to identify the genes expressed differently in the detected clusters, as needed in the downstream analysis. In parallel, the advancements in neural networks (NN) brought several model-specific explainability methods to identify salient features based on gradients, eliminating the need for external models. We propose a comprehensive study to compare the performance of dedicated DE methods, with that of explainability methods typically used in machine learning, both model agnostic (such as SHAP, permutation importance) and model-specific (such as NN gradient-based methods). The DE analysis is performed on the results of 3 state-of-the-art clustering methods based on NNs. Our results on 36 simulated datasets indicate that all analyzed DE methods have limited agreement between them and with ground-truth genes. The gradients method outperforms the traditional DE methods, which encourages the development of NN-based clustering methods to provide an out-of-the-box DE capability. Employing DE methods on the input data preprocessed by clustering method outperforms the traditional approach of using the original count data, albeit still performing worse than gradient-based methods.


2018 ◽  
Author(s):  
Hong Zheng ◽  
Kevin Brennan ◽  
Mikel Hernaez ◽  
Olivier Gevaert

ABSTRACT Long non-coding RNAs (lncRNAs) emerge as important regulators of various biological processes. Many lncRNAs with tumor-suppressor or oncogenic functions in cancer have been discovered. While many studies have exploited public resources such as RNA-Seq data in The Cancer Genome Atlas (TCGA) to study lncRNAs in cancer, it is crucial to choose the optimal method for accurate expression quantification of lncRNAs. In this benchmarking study, we compared the performance of pseudoalignment methods Kallisto and Salmon, and alignment-based methods HTSeq, featureCounts, and RSEM, in lncRNA quantification, by applying them to a simulated RNA-Seq dataset and a pan-cancer RNA-Seq dataset from TCGA. We observed that full transcriptome annotation, including both protein-coding and non-coding RNAs, greatly improves the specificity of lncRNA expression quantification. Pseudoalignment-based methods detect more lncRNAs than alignment-based methods and correlate highly with simulated ground truth. In contrast, alignment-based methods tend to underestimate lncRNA expression or even fail to capture lncRNA signal in the ground truth. These underestimated genes include cancer-relevant lncRNAs such as TERC and ZEB2-AS1. Overall, 10–16% of lncRNAs can be detected in the samples, with antisense RNAs and lincRNAs the two most abundant categories. A higher proportion of antisense RNAs are detected than lincRNAs. Moreover, among the expressed lncRNAs, more antisense RNAs are discordant from ground truth than lincRNAs when measured by alignment-based methods, indicating that antisense RNAs are more susceptible to mis-quantification. In addition, lncRNAs with fewer transcripts, fewer than three exons, and lower sequence uniqueness tend to be more discordant. In summary, pseudoalignment methods Kallisto or Salmon in combination with the full transcriptome annotation is our recommended strategy for RNA-Seq analysis for lncRNAs.
AUTHOR SUMMARY Long non-coding RNAs (lncRNAs) emerge as important regulators of various biological processes. Our benchmarking work on both a simulated RNA-Seq dataset and a pan-cancer dataset provides timely and useful recommendations for the wide research community studying lncRNAs, especially for those who are exploring public resources such as TCGA RNA-Seq data. We demonstrate that using full transcriptome annotation in RNA-Seq analysis is strongly recommended, as it greatly improves the specificity of lncRNA quantification. Moreover, pseudoalignment methods Kallisto and Salmon outperform alignment-based methods in lncRNA quantification. It is worth noting that the default workflow for TCGA RNA-Seq data stored in the Genomic Data Commons (GDC) data portal uses HTSeq, an alignment-based method. Thus, reanalyzing the data might be considered when checking gene expression in TCGA datasets. In summary, pseudoalignment methods Kallisto or Salmon in combination with full transcriptome annotation is our recommended strategy for RNA-Seq analysis for lncRNAs.


1998 ◽  
Vol 79 (06) ◽  
pp. 1184-1190 ◽  
Author(s):  
Yoshiaki Tomiyama ◽  
Shigenori Honda ◽  
Kayoko Senzaki ◽  
Akito Tanaka ◽  
Mitsuru Okubo ◽  
...  

Summary This study investigated the difference in [Ca2+]i movement in platelets in response to thrombin and TRAP. The involvement of αIIbβ3 in this signaling was also studied. Stimulation of platelets with thrombin at 0.03 U/ml caused platelet aggregation and a two-peak increase in [Ca2+]i. The second peak of [Ca2+]i, but not the first peak, was abolished by the inhibition of platelet aggregation with αIIbβ3 antagonists or by scavenging endogenous ADP with apyrase. A cyclooxygenase inhibitor, aspirin, and a TXA2 receptor antagonist, BM13505, also abolished the second peak of [Ca2+]i but not the first peak, although these reagents did not inhibit aggregation. Under the same assay conditions, measurement of TXB2 demonstrated that αIIbβ3 antagonists and aspirin almost completely inhibited the production of TXB2. In contrast to thrombin stimulation, TRAP caused only a single peak of [Ca2+]i even in the presence of platelet aggregation, and a high level of [Ca2+]i increase was needed for the induction of platelet aggregation. The inhibition of aggregation with αIIbβ3 antagonists had no effect on the [Ca2+]i change and TXB2 production induced by TRAP. Inhibition studies using anti-GPIb antibodies suggested that GPIb may be involved in the thrombin response, but not in the TRAP response. Our findings suggest that low-dose thrombin causes a different [Ca2+]i response and TXA2-producing signal from TRAP. Endogenous ADP release and fibrinogen binding to αIIbβ3 are responsible for the synthesis of TXA2, which results in the induction of the second peak of [Ca2+]i in low thrombin- but not TRAP-stimulated platelets.


2018 ◽  
Vol 1 (1) ◽  
pp. 6-21 ◽  
Author(s):  
I. K. Razumova ◽  
N. N. Litvinova ◽  
M. E. Shvartsman ◽  
A. Yu. Kuznetsov

Introduction. The paper presents survey results on the awareness and practice of Open Access scholarly publishing among Russian academics. Materials and Methods. We employed methods of statistical analysis of survey results. Materials comprise the processed results of a Russian survey conducted in 2018 and published results of the latest international surveys. The survey comprised 1383 respondents from 182 organizations. We performed comparative studies of the responses from academics and research institutions as well as different research areas. The study compares results obtained in Russia with the recently published results of surveys conducted in the United Kingdom and Europe. Results. Our findings show that 95% of Russian respondents support open access, 94% agree to post their publications in open repositories and 75% have experience in open access publishing. We did not find any difference in the awareness of and attitude towards open access among seven reference groups. Our analysis revealed a difference in the structure of open access publications between authors from universities and those from research institutes. Discussion and Conclusions. Results reveal a high level of awareness of and support for open access, and successful practice of open access publishing, in the Russian scholarly community. The results for Russia demonstrate close similarity with the results for UK academics. Governmental open access policies and programs would foster the practical realization of open access in Russia.

