iSeqQC: A Tool for Expression-Based Quality Control in RNA Sequencing

2019 ◽  
Author(s):  
Gaurav Kumar ◽  
Adam Ertel ◽  
George Feldman ◽  
Joan Kupper ◽  
Paolo Fortina

Abstract Quality control in any high-throughput sequencing technology is a critical step which, if overlooked, can compromise the data. A number of methods exist to identify biases during sequencing or alignment, yet few tools exist to interpret biases due to outliers or batch effects. Hence, we developed iSeqQC, an expression-based QC tool that detects outliers produced either by batch effects due to laboratory conditions or by dissimilarity within a phenotypic group. iSeqQC implements various statistical approaches, including unsupervised clustering, agglomerative hierarchical clustering and correlation coefficients, to provide insight into outliers. It can be used either through the command line (GitHub: https://github.com/gkumar09/iSeqQC) or a web interface (http://cancerwebpa.jefferson.edu/iSeqQC). iSeqQC is a fast, lightweight, expression-based QC tool that detects outliers by implementing various statistical approaches.
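The correlation-based outlier check mentioned in the abstract can be illustrated with a minimal sketch. This is not iSeqQC's implementation: the function names, the use of Pearson correlation, the median summary and the 0.8 threshold are all assumptions chosen for illustration, and the expression values are toy data.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def median_correlation(expr):
    """For each sample, the median correlation with every other sample.

    expr maps a sample name to its list of per-gene expression values.
    The median is used so that one aberrant sample does not drag down
    the summary of the well-behaved ones.
    """
    summary = {}
    for sample, values in expr.items():
        others = [pearson(values, expr[t]) for t in expr if t != sample]
        summary[sample] = statistics.median(others)
    return summary


def flag_outliers(expr, threshold=0.8):
    """Samples whose median correlation falls below a hypothetical threshold."""
    return sorted(s for s, r in median_correlation(expr).items() if r < threshold)


# Toy expression matrix: four samples from one phenotypic group, five genes.
expr = {
    "ctrl_1": [10.0, 20.0, 30.0, 40.0, 50.0],
    "ctrl_2": [11.0, 19.0, 31.0, 42.0, 49.0],
    "ctrl_3": [9.0, 21.0, 29.0, 39.0, 52.0],
    "ctrl_4": [50.0, 12.0, 45.0, 8.0, 30.0],
}
outliers = flag_outliers(expr)
```

In this toy data the first three samples track each other closely while `ctrl_4` does not, so only `ctrl_4` is flagged. A real analysis would work on normalized counts across thousands of genes and would typically combine this with clustering.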

2019 ◽  
Author(s):  
Florian Heyl ◽  
Rolf Backofen

The prediction of binding sites (peak calling) is a common task in the data analysis of methods such as crosslinking or chromatin immunoprecipitation in combination with high-throughput sequencing (CLIP-Seq, ChIP-Seq). The predicted binding sites are often analyzed further, for example to predict sequence motifs or structure patterns. However, the obtained peaks can vary in their profile shapes because of the peak caller used, different binding domains of the protein, protocol biases, or other factors. A tool that evaluates and classifies the predicted peaks based on their shapes has so far been missing. We hereby present StoatyDive, a tool that can be used to filter for specific peak profile shapes in sequencing data such as CLIP and ChIP. StoatyDive therefore fine-tunes downstream analysis steps such as structure or sequence motif predictions and acts as a quality control. With StoatyDive we were able to classify distinct peak profile shapes from CLIP-seq data of the histone stem-loop-binding protein (SLBP). We show the potential of StoatyDive as a quality control tool and as a filter to pick different shapes based on biological or methodical questions. StoatyDive is open source and freely available under GPL-3 at https://github.com/BackofenLab/StoatyDive and on Bioconda at https://anaconda.org/bioconda/stoatydive.


2017 ◽  
Author(s):  
Aziz Khan ◽  
Anthony Mathelier

Abstract Background: A common task for scientists relies on comparing lists of genes or genomic regions derived from high-throughput sequencing experiments. While several tools exist to intersect and visualize sets of genes, similar tools dedicated to the visualization of genomic region sets are currently limited. Results: To address this gap, we have developed the Intervene tool, which provides an easy and automated interface for the effective intersection and visualization of genomic region or list sets, thus facilitating their analysis and interpretation. Intervene contains three modules: venn to generate Venn diagrams of up to six sets, upset to generate UpSet plots of multiple sets, and pairwise to compute and visualize intersections of multiple sets as clustered heat maps. Intervene, and its interactive web ShinyApp companion, generate publication-quality figures for the interpretation of genomic region and list sets. Conclusions: Intervene and its web application companion provide an easy command line, and an interactive web interface to compute intersections of multiple genomic and list sets. They also have the capacity to plot intersections using easy-to-interpret visual approaches. Intervene is developed and designed to meet the needs of both computer scientists and biologists. The source code is freely available at https://bitbucket.org/CBGR/intervene, with the web application available at https://asntech.shinyapps.io/intervene.
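The idea behind a pairwise comparison of list sets can be sketched in a few lines. This is an illustration only and does not use or reproduce Intervene's code or command-line interface: the function name, the set names and the gene symbols are hypothetical, and genomic regions (which need interval overlap rather than exact matching) are not handled.

```python
from itertools import combinations


def pairwise_intersections(sets):
    """Symmetric matrix of intersection sizes between named sets.

    sets maps a set name to a Python set of items (for example gene symbols).
    The diagonal holds each set's own size.
    """
    names = sorted(sets)
    matrix = {a: {b: 0 for b in names} for a in names}
    for name in names:
        matrix[name][name] = len(sets[name])
    for a, b in combinations(names, 2):
        shared = len(sets[a] & sets[b])
        matrix[a][b] = shared
        matrix[b][a] = shared
    return matrix


# Toy gene lists from three hypothetical experiments.
gene_lists = {
    "exp_A": {"TP53", "MYC", "EGFR", "KRAS"},
    "exp_B": {"MYC", "KRAS", "BRCA1"},
    "exp_C": {"PTEN", "TP53"},
}
matrix = pairwise_intersections(gene_lists)
```

A matrix of this kind is what a clustered heat map would then display; the clustering and plotting steps are left out here.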


2019 ◽  
Vol 48 (2) ◽  
pp. e7-e7 ◽  
Author(s):  
Carine Legrand ◽  
Francesca Tuorto

Abstract Recently developed ribosome profiling methods based on high-throughput sequencing of ribosome-protected mRNA footprints allow genome-wide translational changes to be studied in detail. However, computational analysis of the sequencing data still represents a bottleneck for many laboratories. Further, specific pipelines for quality control and statistical analysis of ribosome profiling data, providing high levels of both accuracy and confidence, are currently lacking. In this study, we describe automated bioinformatic and statistical diagnoses to perform robust quality control of ribosome profiling data (RiboQC), to efficiently visualize ribosome positions and to estimate ribosome speed (RiboMine) in an unbiased way. We present an R pipeline to set up and undertake the analyses that offers the user an HTML page to scan their own data regarding the following aspects: periodicity, ligation and digestion of footprints; reproducibility and batch effects of replicates; drug-related artifacts; unbiased codon enrichment including variability between mRNAs, for A, P and E sites; mining of some causal or confounding factors. We expect our pipeline to allow an optimal use of the wealth of information provided by ribosome profiling experiments.


2020 ◽  
Author(s):  
Renesh Bedre ◽  
Carlos Avila ◽  
Kranthi Mandadi

Abstract Motivation: Use of high-throughput sequencing (HTS) has become indispensable in life science research. Raw HTS data contains several sequencing artifacts, and as a first step it is imperative to remove the artifacts for reliable downstream bioinformatics analysis. Although multiple stand-alone tools are available that can perform the various quality control steps separately, an integrated tool that allows one-step, automated quality control analysis of HTS datasets would significantly improve the handling of large numbers of samples in parallel. Results: Here, we developed HTSeqQC, a stand-alone, flexible, and easy-to-use software for one-step quality control analysis of raw HTS data. HTSeqQC can evaluate HTS data quality and perform filtering and trimming analysis in a single run. We evaluated the performance of HTSeqQC for conducting batch analysis of HTS datasets with 322 sample datasets with an average of ∼1M (paired-end) sequence reads per sample. HTSeqQC accomplished the QC analysis in ∼3 hours in distributed mode and ∼31 hours in shared mode, thus underscoring its utility and robust performance. Availability and implementation: HTSeqQC software, Docker image and Nextflow template are available for download at https://github.com/reneshbedre/HTSeqQC and a graphical user interface (GUI) is available at CyVerse Discovery Environment (DE) (https://cyverse.org/). Documentation is available at https://reneshbedre.github.io/blog/htseqqc.html and https://cyverse-htseqqc-cyverse-tutorial.readthedocs-hosted.com/en/latest/ (for CyVerse). Contact: Kranthi Mandadi ([email protected]) Supplementary information: Supplementary information provided in Supplementary File 1.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Tatsuhiko Hoshino ◽  
Ryohei Nakao ◽  
Hideyuki Doi ◽  
Toshifumi Minamoto

Abstract The combination of high-throughput sequencing technology and environmental DNA (eDNA) analysis has the potential to be a powerful tool for comprehensive, non-invasive monitoring of species in the environment. To understand the correlation between the abundance of eDNA and that of species in natural environments, we have to obtain quantitative eDNA data, usually via individual assays for each species. The recently developed quantitative sequencing (qSeq) technique enables simultaneous phylogenetic identification and quantification of individual species by counting random tags added to the 5′ end of the target sequence during the first DNA synthesis. Here, we applied qSeq to eDNA analysis to test its effectiveness in biodiversity monitoring. eDNA was extracted from water samples taken over 4 days from aquaria containing five fish species (Hemigrammocypris neglectus, Candidia temminckii, Oryzias latipes, Rhinogobius flumineus, and Misgurnus anguillicaudatus), and quantified by qSeq and microfluidic digital PCR (dPCR) using a TaqMan probe. The eDNA abundance quantified by qSeq was consistent with that quantified by dPCR for each fish species at each sampling time. The correlation coefficients between qSeq and dPCR were 0.643, 0.859, and 0.786 for H. neglectus, O. latipes, and M. anguillicaudatus, respectively, indicating that qSeq accurately quantifies fish eDNA.
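The tag-counting principle described above can be sketched as follows. This is a simplified illustration rather than the published qSeq calculation: it assumes each read has already been assigned to a species and had its random tag extracted, and it ignores sequencing errors within tags and the chance of two molecules receiving the same tag. The function name and the tag sequences are hypothetical.

```python
from collections import defaultdict


def count_tagged_molecules(reads):
    """Count distinct random tags per species.

    reads is an iterable of (species, tag) pairs. Reads sharing a tag are
    treated as copies of one tagged molecule, so each distinct tag
    contributes one count for its species.
    """
    tags_by_species = defaultdict(set)
    for species, tag in reads:
        tags_by_species[species].add(tag)
    return {species: len(tags) for species, tags in tags_by_species.items()}


# Toy reads: the first two are duplicates of the same tagged molecule.
reads = [
    ("Oryzias latipes", "ACGTAC"),
    ("Oryzias latipes", "ACGTAC"),
    ("Oryzias latipes", "TTGACA"),
    ("Misgurnus anguillicaudatus", "GGCATT"),
]
counts = count_tagged_molecules(reads)
```

Counting distinct tags instead of reads is what makes the estimate insensitive to uneven amplification of individual molecules.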


2021 ◽  
Vol 11 (11) ◽  
pp. 5046
Author(s):  
Zong-Wei Liu ◽  
Chun-Mei Yang ◽  
Ying Jiang ◽  
Lei Xie ◽  
Jin-Yan Du ◽  
...  

Array gain is investigated based on the acoustic channel characteristics manifested by the fluctuating transmission loss and the decrease in the acoustic channel spatial coherence. An analytical expression is derived as the summation of the products of the acoustic channel correlation coefficients and root-mean-square pressures. The formula provides insight into the physical mechanisms of the gain degradation in the ocean waveguide. Furthermore, this formula provides a new method to study array gain in the ocean waveguide from the underwater acoustic field. The obtained expression is a more general formula that is applicable to shallow water, deep sea, and continental slope, with the traditional methods as a special case. Numerical results show that array gains calculated by previous formulas are generally overestimated because those formulas ignore the effect of transmission loss fluctuation.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Xue Lin ◽  
Yingying Hua ◽  
Shuanglin Gu ◽  
Li Lv ◽  
Xingyu Li ◽  
...  

Abstract Background: Localized genomic hypermutation regions have been found in cancers and reported to be related to cancer prognosis. This localized hypermutation differs markedly from ordinary somatic mutation in its frequency of occurrence and genomic density. It resembles a “violent storm” of mutations, which is what the Greek word “kataegis” means. Results: There is a need for a lightweight and simple-to-use toolkit to identify and visualize the localized hypermutation regions in a genome. We therefore developed the R package “kataegis” to meet this need. The package uses only three steps to identify the genomic hypermutation regions: i) read in the variation files in standard formats; ii) calculate the inter-mutational distances; iii) identify the hypermutation regions with appropriate parameters. One further step visualizes the nucleotide contents and spectra of both the foci and flanking regions, and the genomic landscape of these regions. Conclusions: The kataegis package is available on Bioconductor/GitHub (https://github.com/flosalbizziae/kataegis), which provides a lightweight and simple-to-use toolkit for quickly identifying and visualizing the genomic hypermutation regions.
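Steps ii) and iii) above can be sketched in Python for a single chromosome. This is not the kataegis package's algorithm or API (the package itself is written in R): the function names are hypothetical, and the parameter values (`max_distance=1000` bp, `min_mutations=6`) are assumptions chosen for illustration, not defaults taken from the package.

```python
def intermutation_distances(positions):
    """Distances in bp between consecutive mutations on one chromosome."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def find_hypermutation_regions(positions, max_distance=1000, min_mutations=6):
    """Runs of closely spaced mutations, as (start, end, n_mutations) tuples.

    A run is extended while each consecutive gap is at most max_distance,
    and it is reported only if it contains at least min_mutations mutations.
    """
    ordered = sorted(positions)
    regions = []
    run_start = 0
    for i in range(1, len(ordered) + 1):
        # Close the current run at the end of the list or at a large gap.
        if i == len(ordered) or ordered[i] - ordered[i - 1] > max_distance:
            if i - run_start >= min_mutations:
                regions.append((ordered[run_start], ordered[i - 1], i - run_start))
            run_start = i
    return regions


# Toy mutation positions: a tight cluster between 5000 and 6400.
positions = [100, 5000, 5200, 5350, 5900, 6100, 6400, 20000, 45000]
distances = intermutation_distances(positions)
regions = find_hypermutation_regions(positions)
```

With these toy positions the six mutations between 5000 and 6400 form one reported region, while the isolated mutations on either side are ignored. A real workflow would first read positions from a variant file and process each chromosome separately.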


Author(s):  
David W. Adams ◽  
Cameron D. E. Summerville ◽  
Brendan M. Voss ◽  
Jack Jeswiet ◽  
Matthew C. Doolan

Traditional quality control of resistance spot welds by analysis of the dynamic resistance signature (DRS) relies on manual feature selection to reduce the dimensionality prior to analysis. Manually selected features of the DRS may contain information that is not directly correlated to strength, reducing the accuracy of any classification performed. In this paper, correlations between the DRS and weld strength are automatically detected by calculating correlation coefficients between weld strength and principal components of the DRS. The key features of the DRS that correlate to weld strength are identified in a systematic manner. Systematically identifying relevant features of the DRS is useful as the correlations between weld strength and DRS may vary with process parameters.


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e7853 ◽  
Author(s):  
Yuchen Yan ◽  
Gengyun Niu ◽  
Yaoyao Zhang ◽  
Qianying Ren ◽  
Shiyu Du ◽  
...  

Labriocimbex sinicus Yan & Wei gen. et sp. nov. of Cimbicidae is described. The new genus is similar to Praia Andre and Trichiosoma Leach. A key to extant Holarctic genera of Cimbicinae is provided. To identify the phylogenetic placement of Cimbicidae, the mitochondrial genome of L. sinicus was annotated and characterized using high-throughput sequencing data. The complete mitochondrial genome of L. sinicus was obtained with a length of 15,405 bp (GenBank: MH136623; SRA: SRR8270383) and a typical set of 37 genes (22 tRNAs, 13 PCGs, and two rRNAs). The results demonstrated that all PCGs were initiated by ATN codons and ended with TAA or T stop codons. The study reveals that all tRNA genes have a typical clover-leaf secondary structure, except for trnS1. Remarkably, the secondary structures of the rrnS and rrnL of L. sinicus were very different from those of Corynis lateralis. Phylogenetic analyses verified the monophyly and positions of the three Cimbicidae species within the superfamily Tenthredinoidea and demonstrated a relationship as (Tenthredinidae + Cimbicidae) + (Argidae + Pergidae) with strong nodal support. Furthermore, we found that the generic relationships of Cimbicidae revealed by the phylogenetic analyses based on COI genes agree quite closely with the systematic arrangement of the genera based on morphological characters. Phylogenetic trees based on two methods show that L. sinicus is the sister group of Praia with high support values. We suggest that Labriocimbex belongs to the tribe Trichiosomini of Cimbicinae based on adult morphology and molecular data. We also suggest promoting the subgenus Asitrichiosoma to a valid genus.


2019 ◽  
Author(s):  
Anthony Federico ◽  
Stefano Monti

Abstract Summary: Geneset enrichment is a popular method for annotating high-throughput sequencing data. Existing tools fall short in providing the flexibility to tackle the varied challenges researchers face in such analyses, particularly when analyzing many signatures across multiple experiments. We present a comprehensive R package for geneset enrichment workflows that offers multiple enrichment, visualization, and sharing methods in addition to novel features such as hierarchical geneset analysis and built-in markdown reporting. hypeR is a one-stop solution to performing geneset enrichment for a wide audience and range of use cases. Availability and implementation: The most recent version of the package is available at https://github.com/montilab/hypeR. Supplementary information: Comprehensive documentation and tutorials are available at https://montilab.github.io/hypeR-docs.

