metID: a R package for automatable compound annotation for LC−MS-based data

Author(s):  
Xiaotao Shen ◽  
Si Wu ◽  
Liang Liang ◽  
Songjie Chen ◽  
Kévin Contrepois ◽  
...  

Abstract Summary Accurate and efficient compound annotation is a long-standing challenge for LC−MS-based data (e.g., untargeted metabolomics and exposomics). Substantial efforts have been devoted to overcoming this obstacle, whereas current tools are limited by the sources of spectral information used (in-house and public databases) and are not automated and streamlined. Therefore, we developed metID, an R package that combines information from all major databases for comprehensive and streamlined compound annotation. metID is a flexible, simple, and powerful tool that can be installed on all platforms, allowing the compound annotation process to be fully automatic and reproducible. A detailed tutorial and a case study are provided in Supplementary Materials. Availability and implementation https://jaspershen.github.io/metID. Supplementary information Supplementary data are available at Bioinformatics online.

2021 ◽  
Author(s):  
Xiaotao Shen ◽  
Si Wu ◽  
Liang Liang ◽  
Songjie Chen ◽  
Kevin Contrepois ◽  
...  

Accurate and efficient compound annotation is a long-standing challenge for LC−MS-based data (e.g., untargeted metabolomics and exposomics). Substantial efforts have been devoted to overcoming this obstacle, whereas current tools are limited by the sources of spectral information used (in-house and public databases) and are not automated and streamlined. Therefore, we developed metID, an R package that combines information from all major databases for comprehensive and streamlined compound annotation. metID is a flexible, simple, and powerful tool that can be installed on all platforms, allowing the compound annotation process to be fully automatic and reproducible. A detailed tutorial and a case study are provided in Supplementary Materials.


2017 ◽  
Author(s):  
Zhun Miao ◽  
Ke Deng ◽  
Xiaowo Wang ◽  
Xuegong Zhang

Abstract Summary The excessive zeros in single-cell RNA-seq data include “real” zeros due to the on-off nature of gene transcription in single cells and “dropout” zeros due to technical reasons. Existing differential expression (DE) analysis methods cannot distinguish these two types of zeros. We developed an R package, DEsingle, which employs a Zero-Inflated Negative Binomial model to estimate the proportion of real and dropout zeros and to define and detect 3 types of DE genes in single-cell RNA-seq data with higher accuracy. Availability and Implementation The R package DEsingle is freely available at https://github.com/miaozhun/DEsingle and is under Bioconductor’s consideration. Supplementary information Supplementary data are available at bioRxiv online.
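As a rough illustration of the zero-inflated negative binomial (ZINB) idea mentioned in the abstract above, the following is a minimal sketch and not DEsingle's implementation. It assumes the textbook ZINB form: a point mass at zero with weight `theta` mixed with a negative binomial in its mean/size parameterisation. The function names and parameter values are hypothetical, and the abstract does not say which mixture component the authors interpret as "real" versus "dropout" zeros, so the sketch labels zeros only by component.

```python
from math import exp, lgamma, log

def nb_pmf(y, size, mu):
    """Negative binomial pmf with dispersion `size` and mean `mu` (mu > 0)."""
    p = size / (size + mu)
    log_pmf = (lgamma(y + size) - lgamma(size) - lgamma(y + 1)
               + size * log(p) + y * log(1.0 - p))
    return exp(log_pmf)

def zinb_pmf(y, theta, size, mu):
    """ZINB pmf: point mass at zero with weight theta, NB with weight 1 - theta."""
    return theta * (y == 0) + (1.0 - theta) * nb_pmf(y, size, mu)

def zero_split(theta, size, mu):
    """Fractions of all zeros that come from the point mass and from the NB part."""
    p0 = zinb_pmf(0, theta, size, mu)
    return theta / p0, (1.0 - theta) * nb_pmf(0, size, mu) / p0

# Hypothetical parameters for one gene: 30% point mass, NB mean 5, size 2.
from_point_mass, from_nb = zero_split(theta=0.3, size=2.0, mu=5.0)
print(from_point_mass, from_nb)
```

In practice the parameters would be estimated from the counts (for example by maximum likelihood); the sketch only shows how a fitted model splits the observed zeros between its two components.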


2019 ◽  
Vol 36 (8) ◽  
pp. 2587-2588 ◽  
Author(s):  
Christopher M Ward ◽  
Thu-Hien To ◽  
Stephen M Pederson

Abstract Motivation High-throughput next-generation sequencing (NGS) has become exceedingly cheap, making studies with large sample numbers feasible. Quality control (QC) is an essential stage of analytic pipelines, and the outputs of popular bioinformatics tools such as FastQC and Picard can provide information on individual samples. Although these tools provide considerable power when carrying out QC, large sample numbers can make inspection of all samples and identification of systemic bias a challenge. Results We present ngsReports, an R package designed for the management and visualization of NGS reports from within an R environment. The available methods allow direct import into R of FastQC reports along with outputs from other tools. Visualization can be carried out across many samples using default, highly customizable plots with options to perform hierarchical clustering to quickly identify outlier libraries. Moreover, these can be displayed in an interactive shiny app or HTML report for ease of analysis. Availability and implementation The ngsReports package is available on Bioconductor and the GUI shiny app is available at https://github.com/UofABioinformaticsHub/shinyNgsreports. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (11) ◽  
pp. 3516-3521 ◽  
Author(s):  
Lixiang Zhang ◽  
Lin Lin ◽  
Jia Li

Abstract Motivation Cluster analysis is widely used to identify interesting subgroups in biomedical data. Since true class labels are unknown in the unsupervised setting, it is challenging to validate any cluster obtained computationally, an important problem barely addressed by the research community. Results We have developed a toolkit called covering point set (CPS) analysis to quantify uncertainty at the levels of individual clusters and overall partitions. Functions have been developed to effectively visualize the inherent variation in any cluster for data of high dimension, and to provide a more comprehensive view of potentially interesting subgroups in the data. Applying it to three usage scenarios for biomedical data, we demonstrate that CPS analysis is more effective for evaluating the uncertainty of clusters compared with state-of-the-art measurements. We also showcase how to use CPS analysis to select data generation technologies or visualization methods. Availability and implementation The method is implemented in an R package called OTclust, available on CRAN. Supplementary information Supplementary data are available at Bioinformatics online.


2018 ◽  
Vol 35 (10) ◽  
pp. 1797-1798 ◽  
Author(s):  
Han Cao ◽  
Jiayu Zhou ◽  
Emanuel Schwarz

Abstract Motivation Multi-task learning (MTL) is a machine learning technique for simultaneous learning of multiple related classification or regression tasks. Despite its increasing popularity, MTL algorithms are currently not available in the widely used software environment R, creating a bottleneck for their application in biomedical research. Results We developed an efficient, easy-to-use R library for MTL (www.r-project.org) comprising 10 algorithms applicable for regression, classification, joint predictor selection, task clustering, low-rank learning and incorporation of biological networks. We demonstrate the utility of the algorithms using simulated data. Availability and implementation The RMTL package is an open source R package and is freely available at https://github.com/transbioZI/RMTL. RMTL will also be available on cran.r-project.org. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 36 (7) ◽  
pp. 2291-2292 ◽  
Author(s):  
Saskia Freytag ◽  
Ryan Lister

Abstract Summary Due to the scale and sparsity of single-cell RNA-sequencing data, traditional plots can obscure vital information. Our R package schex overcomes this by implementing hexagonal binning, which has the additional advantages of improving speed and reducing storage for resulting plots. Availability and implementation schex is freely available from Bioconductor via http://bioconductor.org/packages/release/bioc/html/schex.html and its development version can be accessed on GitHub via https://github.com/SaskiaFreytag/schex. Supplementary information Supplementary data are available at Bioinformatics online.
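To make the hexagonal binning mentioned above concrete, here is a minimal sketch of the general technique, not schex's code or API. It assumes the common two-lattice approach: hexagon centres lie on two staggered rectangular grids, and each point is assigned to the nearer of its closest centre on each grid. The function name `hexbin_counts` and the sample coordinates are hypothetical.

```python
from collections import Counter
from math import floor, sqrt

def hexbin_counts(points, width):
    """Count points per hexagon; neighbouring hexagon centres are `width` apart."""
    height = width * sqrt(3.0)  # vertical period of each rectangular lattice
    counts = Counter()
    for x, y in points:
        # Nearest centre on lattice A: (i * width, j * height)
        ax = round(x / width) * width
        ay = round(y / height) * height
        # Nearest centre on lattice B: lattice A shifted by half a cell in both axes
        bx = (floor(x / width) + 0.5) * width
        by = (floor(y / height) + 0.5) * height
        da = (x - ax) ** 2 + (y - ay) ** 2
        db = (x - bx) ** 2 + (y - by) ** 2
        counts[(ax, ay) if da <= db else (bx, by)] += 1
    return counts

# Hypothetical 2-D embedding coordinates for four cells.
cells = [(0.1, 0.1), (-0.1, 0.05), (0.5, 0.9), (0.45, 0.8)]
for centre, n in hexbin_counts(cells, width=1.0).items():
    print(centre, n)
```

Plotting one hexagon per occupied bin, coloured by count or by a per-bin summary, keeps the number of drawn objects bounded by the number of bins rather than the number of cells, which is the source of the speed and storage advantages the abstract describes.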


2019 ◽  
Vol 35 (20) ◽  
pp. 4190-4192 ◽  
Author(s):  
Vincenzo Belcastro ◽  
Stephane Cano ◽  
Diego Marescotti ◽  
Stefano Acali ◽  
Carine Poussin ◽  
...  

Abstract Summary GladiaTOX R package is an open-source, flexible solution to high-content screening data processing and reporting in biomedical research. GladiaTOX takes advantage of the ‘tcpl’ core functionalities and provides a number of extensions: it provides a web-service solution to fetch raw data; it computes severity scores and exports ToxPi formatted files; furthermore it contains a suite of functionalities to generate PDF reports for quality control and data processing. Availability and implementation GladiaTOX R package (bioconductor). Also available via: git clone https://github.com/philipmorrisintl/GladiaTOX.git. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Zachary B Abrams ◽  
Dwayne G Tally ◽  
Lynne V Abruzzo ◽  
Kevin R Coombes

Abstract Summary Cytogenetics data, or karyotypes, are among the most common clinically used forms of genetic data. Karyotypes are stored as standardized text strings using the International System for Human Cytogenomic Nomenclature (ISCN). Historically, these data have not been used in large-scale computational analyses due to limitations in the ISCN text format and structure. Recently developed computational tools such as CytoGPS have enabled large-scale computational analyses of karyotypes. To further enable such analyses, we have now developed RCytoGPS, an R package that takes JSON files generated from CytoGPS.org and converts them into objects in R. This conversion facilitates the analysis and visualizations of karyotype data. In effect this tool streamlines the process of performing large-scale karyotype analyses, thus advancing the field of computational cytogenetic pathology. Availability and Implementation Freely available at https://CRAN.R-project.org/package=RCytoGPS. The code for the underlying CytoGPS software can be found at https://github.com/i2-wustl/CytoGPS. Supplementary information There is no supplementary data.


Author(s):  
Xiaohua Douglas Zhang ◽  
Dandan Wang ◽  
Shixue Sun ◽  
Heping Zhang

Abstract Motivation High-throughput screening (HTS) is a vital automation technology in biomedical research in both industry and academia. The well-known Z-factor has been widely used as a gatekeeper to assure assay quality in an HTS study. However, many researchers and users may not have realized that Z-factor has major issues. Results In this article, the following four major issues are explored and demonstrated so that researchers may use the Z-factor appropriately. First, the Z-factor violates the Pythagorean theorem of statistics. Second, there is no adjustment of sampling error in the application of the Z-factor for quality control (QC) in HTS studies. Third, the expectation of the sample-based Z-factor does not exist. Fourth, the thresholds in the Z-factor-based criterion lack a theoretical basis. Here, an approach to avoid these issues was proposed and new QC criteria under homoscedasticity were constructed so that researchers can choose a statistically grounded criterion for QC in the HTS studies. We implemented this approach in an R package and demonstrated its utility in multiple CRISPR/CAS9 or siRNA HTS studies. Availability and implementation The R package qcSSMDhomo is freely available from GitHub: https://github.com/Karena6688/qcSSMDhomo. The file qcSSMDhomo_1.0.0.tar.gz (for Windows) containing qcSSMDhomo is also available at Bioinformatics online. qcSSMDhomo is distributed under the GNU General Public License. Supplementary information Supplementary data are available at Bioinformatics online.
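For readers unfamiliar with the quantities discussed above, the sketch below computes the sample Z-factor and the strictly standardized mean difference (SSMD) for two independent groups, using their standard textbook definitions. It is not the qcSSMDhomo API; the function names and control readouts are hypothetical. One reading of the abstract's first issue is visible in the formulas: for independent groups the variance of a difference is the sum of the variances, so SSMD scales by the square root of summed variances, whereas the Z-factor adds the two standard deviations directly.

```python
from math import sqrt
from statistics import mean, stdev

def z_factor(pos, neg):
    """Sample Z-factor: 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    return 1.0 - 3.0 * (stdev(pos) + stdev(neg)) / abs(mean(pos) - mean(neg))

def ssmd(pos, neg):
    """SSMD for independent groups: mean difference / sqrt(var_pos + var_neg)."""
    return (mean(pos) - mean(neg)) / sqrt(stdev(pos) ** 2 + stdev(neg) ** 2)

# Hypothetical control-well readouts from a single plate.
positive = [98.0, 100.0, 102.0]
negative = [8.0, 10.0, 12.0]
print(z_factor(positive, negative), ssmd(positive, negative))
```

Both functions use plain sample estimates; the adjustment for sampling error and the homoscedastic criteria proposed in the article are not reproduced here.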


Author(s):  
Daniel G Bunis ◽  
Jared Andrews ◽  
Gabriela K Fragiadakis ◽  
Trevor D Burt ◽  
Marina Sirota

Abstract Summary A visualization suite for major forms of bulk and single-cell RNAseq data in R. dittoSeq is color blindness-friendly by default, robustly documented to power ease-of-use and allows highly customizable generation of both daily-use and publication-quality figures. Availability and implementation dittoSeq is an R package available through Bioconductor via an open source MIT license. Supplementary information Supplementary data are available at Bioinformatics online.

