STAR Chimeric Post For Rapid Detection of Circular RNA and Fusion Transcripts

Abstract Motivation The biological relevance of chimeric RNA alignments is now well established. Chimeras arising from chromosomal fusions are often drivers of cancer, and circular RNAs, discovered only recently, are just beginning to be characterized. While software already exists for fusion discovery and quantification, high false-positive rates and long run times hamper scalable fusion discovery on large datasets. Furthermore, very little software is available for circular RNA detection and quantification. Results Here we present STAR Chimeric Post (STARChip), a novel software package that processes chimeric alignments from the STAR aligner and produces annotated circular RNAs and high-precision fusions in a rapid, efficient and scalable manner appropriate for high-dimensional medical omics datasets. Availability and implementation STARChip is available at https://github.com/LosicLab/ Supplementary information Supplementary figures and tables are available online.
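STARChip works on the chimeric output of STAR. As a rough illustration of how a single chimeric junction might be sorted into circular RNA or fusion candidates, the Python sketch below assumes that the first six tab-separated columns of STAR's `Chimeric.out.junction` file are donor chromosome, donor position, donor strand, acceptor chromosome, acceptor position and acceptor strand. The function name `classify_chimeric_junction`, the three labels and the `max_circ_span` cutoff are hypothetical. This is not STARChip's filtering logic, which the abstract does not describe.

```python
def classify_chimeric_junction(line: str, max_circ_span: int = 100_000) -> str:
    """Crudely label one chimeric junction record (hypothetical helper).

    Assumes the first six tab-separated fields are: donor chromosome,
    donor position, donor strand, acceptor chromosome, acceptor position,
    acceptor strand.
    """
    fields = line.rstrip("\n").split("\t")
    d_chr, d_pos, d_strand, a_chr, a_pos, a_strand = fields[:6]
    d_pos, a_pos = int(d_pos), int(a_pos)

    # Segments on different chromosomes or strands are treated here
    # as fusion candidates.
    if d_chr != a_chr or d_strand != a_strand:
        return "fusion_candidate"

    # A back-splice joins a downstream donor to an upstream acceptor:
    # on the + strand the acceptor lies before the donor, on - after it.
    if d_strand == "+":
        back_splice = a_pos < d_pos
    else:
        back_splice = a_pos > d_pos

    if back_splice and abs(d_pos - a_pos) <= max_circ_span:
        return "circ_candidate"
    # Same-strand, forward-oriented or very distant junctions
    # (e.g. read-through or rearrangements) are left for other filters.
    return "other"


print(classify_chimeric_junction("chr1\t2000\t+\tchr1\t1000\t+\t1"))  # circ_candidate
print(classify_chimeric_junction("chr1\t1000\t+\tchr5\t5000\t+\t1"))  # fusion_candidate
```

A single read is weak evidence. A real pipeline would typically aggregate supporting reads per junction before reporting anything.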

IBDkin: fast estimation of kinship coefficients from identity by descent segments

Bioinformatics ◽

10.1093/bioinformatics/btaa569 ◽

2020 ◽

Vol 36 (16) ◽

pp. 4519-4520

Author(s):

Ying Zhou ◽

Sharon R Browning ◽

Brian L Browning

Keyword(s):

Software Package ◽

Large Datasets ◽

Supplementary Information ◽

Supplementary Data ◽

UK Biobank ◽

Identity By Descent ◽

Fast Estimation ◽

Kinship Coefficients ◽

Related Individuals ◽

The UK

Abstract Motivation Estimation of pairwise kinship coefficients in large datasets is computationally challenging because the number of related pairs of individuals increases quadratically with sample size. Results We present IBDkin, a software package written in C for estimating kinship coefficients from identity by descent (IBD) segments. We use IBDkin to estimate kinship coefficients for 7.95 billion pairs of individuals in the UK Biobank who share at least one detected IBD segment with length ≥ 4 cM. Availability and implementation https://github.com/YingZhou001/IBDkin. Supplementary information Supplementary data are available at Bioinformatics online.
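A common way to turn IBD segments into a kinship coefficient starts from a definition: kinship is the probability that alleles drawn at random from the two individuals are IBD. This can be approximated as the total IBD length shared across the four haplotype pairs divided by four times the genome length. For example, a parent and child are IBD on one haplotype pair along the whole genome, giving G / (4G) = 1/4.

The Python sketch below applies that approximation. It is not IBDkin's algorithm, which this abstract does not describe. The function name, the input tuple layout and the genome length are assumptions. Only pairs that share at least one segment are stored, so memory grows with the number of related pairs rather than with all n(n−1)/2 pairs.

```python
from collections import defaultdict


def kinship_from_ibd(segments, genome_length_cm, min_length_cm=4.0):
    """Approximate pairwise kinship from haplotype-level IBD segments.

    segments: iterable of (id1, id2, start_cm, end_cm), one record per IBD
    segment between a haplotype of id1 and a haplotype of id2
    (hypothetical layout).
    """
    shared_cm = defaultdict(float)  # only pairs with >= 1 kept segment appear
    for id1, id2, start_cm, end_cm in segments:
        length = end_cm - start_cm
        if length < min_length_cm:  # drop short segments
            continue
        pair = (id1, id2) if id1 <= id2 else (id2, id1)
        shared_cm[pair] += length
    # Kinship = expected IBD fraction averaged over the 4 haplotype pairs.
    return {pair: total / (4.0 * genome_length_cm) for pair, total in shared_cm.items()}


segments = [("A", "B", 0.0, 50.0), ("B", "A", 100.0, 150.0), ("A", "C", 10.0, 12.0)]
kin = kinship_from_ibd(segments, genome_length_cm=1000.0)
print(kin)  # {('A', 'B'): 0.025}; the 2 cM A-C segment is filtered out
```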

CircMiner: accurate and rapid detection of circular RNA through splice-aware pseudo-alignment scheme

Bioinformatics ◽

10.1093/bioinformatics/btaa232 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3703-3711 ◽

Cited By ~ 1

Author(s):

Hossein Asghari ◽

Yen-Yi Lin ◽

Yang Xu ◽

Ehsan Haghshenas ◽

Colin C Collins ◽

...

Keyword(s):

Cell Line ◽

Rapid Detection ◽

High Throughput Sequencing ◽

Circular RNA ◽

Supplementary Information ◽

Circular RNAs ◽

Alignment Technique ◽

Nucleotide Resolution ◽

High Degree ◽

Splice Junctions

Abstract Motivation High-throughput sequencing in a variety of eukaryotes has revealed that circular RNAs (circRNAs) are ubiquitous and abundant. circRNAs are associated with some diseases, such as cancer, in which they can act as oncogenes or tumor suppressors. They therefore have potential as biomarkers or therapeutic targets. Accurate and rapid detection of circRNAs from short reads remains computationally challenging because identifying chimeric reads, which is essential for finding back-splice junctions, is complex. The sensitivity of discovery methods depends to a large degree on the underlying mapper used to find chimeric reads. Furthermore, all available circRNA discovery pipelines are resource intensive. Results We introduce CircMiner, a novel stand-alone circRNA detection method that rapidly identifies and filters out linear RNA sequencing reads and detects back-splice junctions. CircMiner employs a rapid pseudo-alignment technique to identify linear reads that originate from transcripts, genes or the genome. CircMiner further processes the remaining reads to identify back-splice junctions and detect circRNAs with single-nucleotide resolution. We evaluated the efficacy of CircMiner using simulated datasets generated from known back-splice junctions and showed that CircMiner has superior accuracy and speed compared to existing circRNA detection tools. Additionally, on two RNase R-treated cell line datasets, CircMiner detected most of the consistent, high-confidence circRNAs identified by comparison with untreated samples of the same cell line. Availability and implementation CircMiner is implemented in C++ and is available online at https://github.com/vpc-ccg/circminer. Supplementary information Supplementary data are available at Bioinformatics online.
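CircMiner's first step sets aside reads that pseudo-align to linear sequence. The Python sketch below shows the basic k-mer compatibility idea behind pseudo-alignment, in the style popularized by kallisto. It assumes a simple in-memory index from k-mers to transcript ids. It is not CircMiner's splice-aware scheme, and the function names and toy sequences are hypothetical. A read whose k-mers are not jointly consistent with any single linear transcript, such as one spanning a back-splice junction, is kept as a candidate for back-splice detection.

```python
def build_kmer_index(transcripts, k):
    """Map each k-mer to the set of transcript ids that contain it."""
    index = {}
    for tid, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(tid)
    return index


def pseudo_align(read, index, k):
    """Return the transcripts compatible with every k-mer of the read.

    An empty set means no single linear transcript explains the whole read,
    so the read stays in the pool of possible chimeric / back-spliced reads.
    """
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if hits is None:
            return set()  # k-mer absent from every transcript
        compatible = hits if compatible is None else compatible & hits
        if not compatible:
            return set()  # k-mers point to incompatible transcripts
    return set(compatible) if compatible else set()


idx = build_kmer_index({"t1": "ACGTACGGTTCA", "t2": "GGGCCCTTTAAA"}, k=4)
print(pseudo_align("CGTACGG", idx, 4))   # {'t1'}: a linear read
print(pseudo_align("TTCAACGT", idx, 4))  # set(): end of t1 joined to its start
```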

SCHNEL: Scalable clustering of high dimensional single-cell data

10.1101/2020.03.30.015925 ◽

2020 ◽

Cited By ~ 1

Author(s):

Tamim Abdelaal ◽

Paul de Raadt ◽

Boudewijn P.F. Lelieveldt ◽

Marcel J.T. Reinders ◽

Ahmed Mahfouz

Keyword(s):

Single Cell ◽

Graph Clustering ◽

Original Data ◽

Large Datasets ◽

Supplementary Information ◽

High Dimensional ◽

Sequencing Data ◽

Novel Approach ◽

Cellular Markers ◽

Cell Data

Abstract Motivation Single-cell data measure multiple cellular markers at the single-cell level for thousands to millions of cells. Identifying distinct cell populations, usually by clustering these data, is a key step toward further biological understanding. Clustering tools based on dimensionality reduction either do not scale to large datasets containing millions of cells or are not fully automated, requiring an initial manual estimate of the number of clusters. Graph clustering tools provide automated and reliable clustering of single-cell data but scale poorly to large datasets. Results We developed SCHNEL, a scalable, reliable and automated clustering tool for high-dimensional single-cell data. SCHNEL transforms large high-dimensional data into a hierarchy of datasets containing subsets of data points that follow the original data manifold. The novel approach of SCHNEL combines this hierarchical representation of the data with graph clustering, making graph clustering scalable to millions of cells. On seven different cytometry datasets, SCHNEL outperformed three popular clustering tools for cytometry data and produced meaningful clustering results for datasets of 3.5 and 17.2 million cells within workable time frames. In addition, we show that SCHNEL is a general clustering tool by applying it to single-cell RNA sequencing data, as well as to the popular machine learning benchmark dataset MNIST. Availability and implementation Implementation is available on GitHub (https://github.com/paulderaadt/HSNE-clustering). Supplementary information Supplementary data are available at Bioinformatics online.

SCHNEL: scalable clustering of high dimensional single-cell data

Bioinformatics ◽

10.1093/bioinformatics/btaa816 ◽

2020 ◽

Vol 36 (Supplement_2) ◽

pp. i849-i856

Author(s):

Tamim Abdelaal ◽

Paul de Raadt ◽

Boudewijn P F Lelieveldt ◽

Marcel J T Reinders ◽

Ahmed Mahfouz

Keyword(s):

Single Cell ◽

Graph Clustering ◽

Original Data ◽

Large Datasets ◽

Supplementary Information ◽

High Dimensional ◽

Sequencing Data ◽

Novel Approach ◽

Cellular Markers ◽

Cell Data

Abstract Motivation Single-cell data measure multiple cellular markers at the single-cell level for thousands to millions of cells. Identifying distinct cell populations, usually by clustering these data, is a key step toward further biological understanding. Clustering tools based on dimensionality reduction either do not scale to large datasets containing millions of cells or are not fully automated, requiring an initial manual estimate of the number of clusters. Graph clustering tools provide automated and reliable clustering of single-cell data but scale poorly to large datasets. Results We developed SCHNEL, a scalable, reliable and automated clustering tool for high-dimensional single-cell data. SCHNEL transforms large high-dimensional data into a hierarchy of datasets containing subsets of data points that follow the original data manifold. The novel approach of SCHNEL combines this hierarchical representation of the data with graph clustering, making graph clustering scalable to millions of cells. On seven different cytometry datasets, SCHNEL outperformed three popular clustering tools for cytometry data and produced meaningful clustering results for datasets of 3.5 and 17.2 million cells within workable time frames. In addition, we show that SCHNEL is a general clustering tool by applying it to single-cell RNA sequencing data, as well as to the popular machine learning benchmark dataset MNIST. Availability and implementation Implementation is available on GitHub (https://github.com/biovault/SCHNELpy). All datasets used in this study are publicly available. Supplementary information Supplementary data are available at Bioinformatics online.
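The core idea is to cluster a small set of representative points with a graph method and then pass the labels down to all cells. The simplified Python sketch below illustrates that idea, but it is not SCHNEL. SCHNEL builds its hierarchy with HSNE (as the repository name suggests) and clusters with a dedicated graph clustering method. This sketch instead:

- picks evenly spaced landmarks;
- uses connected components of a k-nearest-neighbour graph as a crude stand-in for graph clustering;
- gives each cell the label of its nearest landmark.

All names and parameters are hypothetical.

```python
import math


def landmark_clustering(points, n_landmarks, k):
    """Cluster points by clustering a landmark subset, then propagating labels.

    points: list of equal-length coordinate tuples.
    """
    step = max(1, len(points) // n_landmarks)
    landmarks = points[::step]  # stand-in for one level of a hierarchy
    m = len(landmarks)

    # Union-find over landmarks; each landmark is linked to its k nearest ones.
    parent = list(range(m))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(m):
        others = sorted((j for j in range(m) if j != i),
                        key=lambda j: math.dist(landmarks[i], landmarks[j]))
        for j in others[:k]:
            parent[find(i)] = find(j)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    component_id = {}
    landmark_label = [component_id.setdefault(find(i), len(component_id))
                      for i in range(m)]

    # Each point inherits the label of its nearest landmark.
    return [landmark_label[min(range(m), key=lambda j: math.dist(p, landmarks[j]))]
            for p in points]


points = [(float(i), 0.0) for i in range(10)] + [(1000.0 + i, 0.0) for i in range(10)]
labels = landmark_clustering(points, n_landmarks=10, k=2)
print(labels)  # ten 0s followed by ten 1s
```

Only the landmark graph is clustered, so the expensive graph step scales with the number of landmarks rather than with the number of cells. Assigning the remaining cells costs one nearest-landmark search per cell.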

Rapid detection and quantification of adulteration in Chinese hawthorn fruits powder by near-infrared spectroscopy combined with chemometrics

Spectrochimica Acta Part A Molecular and Biomolecular Spectroscopy ◽

10.1016/j.saa.2020.119346 ◽

2021 ◽

Vol 250 ◽

pp. 119346

Author(s):

Xuefen Sun ◽

Huiling Li ◽

Yuan Yi ◽

Haimin Hua ◽

Ying Guan ◽

...

Keyword(s):

Infrared Spectroscopy ◽

Near Infrared Spectroscopy ◽

Rapid Detection ◽

Near Infrared ◽

Detection And Quantification

Train Fast While Reducing False Positives: Improving Animal Classification Performance Using Convolutional Neural Networks

Geomatics ◽

10.3390/geomatics1010004 ◽

2021 ◽

Vol 1 (1) ◽

pp. 34-49

Author(s):

Mael Moreni ◽

Jerome Theau ◽

Samuel Foucher

Keyword(s):

False Positive ◽

Classification Performance ◽

Large Datasets ◽

False Positives ◽

Fine Tuning ◽

Training Time ◽

In The Wild ◽

Test Sets ◽

High Level ◽

Time Decrease

The combination of unmanned aerial vehicles (UAVs) with deep learning models has the capacity to replace manned aircraft for wildlife surveys. However, the scarcity of animals in the wild often leads to highly unbalanced, large datasets for which even a good detection method can return a large number of false detections. Our objectives in this paper were to design a training method that would reduce training time, decrease the number of false positives and alleviate the fine-tuning effort of an image classifier in the context of animal surveys. We acquired two highly unbalanced datasets of deer images with a UAV and trained a ResNet-18 classifier using hard-negative mining and a series of recent techniques. Our method achieved very low false positive rates on the two test sets (1 false positive per 19,162 and per 213,312 negatives, respectively) while training on small but relevant fractions of the data. As a result, training times were significantly shorter than they would have been using the whole datasets. This high level of efficiency was achieved with little tuning effort and simple techniques. We believe this parsimonious approach to highly unbalanced, large datasets could be particularly useful to projects with either limited resources or extremely large datasets.
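Hard-negative mining concentrates training on the negative images the current model gets wrong. Below is a minimal Python sketch of one selection round. It assumes the classifier's scores for known-negative images are available as a dictionary. The function name, the 0.5 threshold and the `max_keep` budget are illustrative assumptions, not the authors' settings.

```python
def select_hard_negatives(negative_scores, max_keep, threshold=0.5):
    """Pick the negatives the current classifier is most wrong about.

    negative_scores: dict mapping image id -> predicted probability of
    "animal" for images known to contain no animal (hypothetical format).
    Returns up to max_keep ids, highest score first.
    """
    # Negatives scored at or above the threshold are current false positives.
    false_positives = [(score, img) for img, score in negative_scores.items()
                       if score >= threshold]
    false_positives.sort(key=lambda pair: pair[0], reverse=True)
    return [img for _, img in false_positives[:max_keep]]


scores = {"a": 0.9, "b": 0.1, "c": 0.6, "d": 0.7}
print(select_hard_negatives(scores, max_keep=2))  # ['a', 'd']
```

The selected images would be added to the next small training set together with the positives, so each round trains on a fraction of the data rather than on all negatives.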

CPVA: a web-based metabolomic tool for chromatographic peak visualization and annotation

Bioinformatics ◽

10.1093/bioinformatics/btaa200 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3913-3915

Author(s):

Hemi Luan ◽

Xingen Jiang ◽

Fenfen Ji ◽

Zhangzhang Lan ◽

Zongwei Cai ◽

...

Keyword(s):

False Positive ◽

Supplementary Information ◽

Liquid Chromatography Mass Spectrometry ◽

Targeted Metabolomics ◽

Metabolomics Data ◽

Web Based ◽

Tremendous Amount ◽

Chromatographic Peaks ◽

User Friendly

Abstract Motivation Liquid chromatography–mass spectrometry-based non-targeted metabolomics is routinely performed to qualitatively and quantitatively analyze a tremendous number of metabolite signals in complex biological samples. However, many popular software tools detect false-positive peaks in these datasets as metabolite signals, resulting in unreliable measurements. Results To reduce false-positive calling, we developed an interactive web tool, termed CPVA, for visualization and accurate annotation of the detected peaks in non-targeted metabolomics data. We used a chromatogram-centric strategy to reveal the characteristics of chromatographic peaks through visualization of peak morphology metrics, with additional functions to annotate adducts, isotopes and contaminants. CPVA is a free, user-friendly tool that helps users identify peaks arising from background noise and contaminants. This reduces false-positive or redundant peak calls and thereby improves the data quality of non-targeted metabolomics studies. Availability and implementation CPVA is freely available at http://cpva.eastus.cloudapp.azure.com. Source code and installation instructions are available on GitHub: https://github.com/13479776/cpva. Supplementary information Supplementary data are available at Bioinformatics online.
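Isotope annotation in LC-MS data typically relies on the fixed mass difference between 13C and 12C (about 1.00336 Da), divided by the ion's charge. The Python sketch below flags co-eluting peak pairs with that spacing. The ppm and retention-time tolerances, the "isotope peak is weaker" heuristic and the function name are assumptions for illustration, not CPVA's implementation. Adduct annotation could follow the same pattern with adduct-specific mass differences.

```python
C13_C12_DELTA = 1.0033548  # mass difference between 13C and 12C (Da)


def find_isotope_pairs(peaks, charge=1, ppm_tol=10.0, rt_tol=0.05):
    """Return (monoisotopic_index, isotope_index) pairs of co-eluting peaks.

    peaks: list of (mz, retention_time, intensity) tuples (hypothetical
    layout); rt_tol is in the same units as retention_time.
    """
    pairs = []
    for i, (mz_i, rt_i, int_i) in enumerate(peaks):
        expected_mz = mz_i + C13_C12_DELTA / charge
        for j, (mz_j, rt_j, int_j) in enumerate(peaks):
            if j == i or abs(rt_j - rt_i) > rt_tol:
                continue  # an isotope peak must co-elute with its parent
            ppm_error = abs(mz_j - expected_mz) / expected_mz * 1e6
            # Heuristic for small molecules: the 13C peak is weaker.
            if ppm_error <= ppm_tol and int_j < int_i:
                pairs.append((i, j))
    return pairs


peaks = [(100.0000, 5.00, 1000.0), (101.0034, 5.01, 100.0), (150.0000, 5.00, 500.0)]
print(find_isotope_pairs(peaks))  # [(0, 1)]
```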
