GARCOM: A user-friendly R package for genetic mutation counts

Targeted sequencing is an increasingly popular next-generation sequencing (NGS) approach for studying populations, in which sequencing effort is focused on specific parts of the genome of a species of interest. Methodologies and tools for designing targeted baits are scarce but in high demand. Here, we present specific guidelines and considerations for designing capture sequencing experiments for population genetics, covering both neutral genomic regions and regions subject to selection. We describe the bait design process carried out in our research group for three diverse fish species (Atlantic salmon, Atlantic cod and tiger shark) and evaluate the performance of our approach across both historical and modern samples. The workflow used for designing these three bait sets has been implemented in the R package supeRbaits, which encompasses our considerations and guidelines for bait design to benefit researchers and practitioners. The supeRbaits R package is user-friendly and versatile. It is written in C++ and implemented in R. supeRbaits and its manual are available from GitHub: https://github.com/BelenJM/supeRbaits


ngsReports: a Bioconductor package for managing FastQC reports and other NGS related log files

Bioinformatics ◽

10.1093/bioinformatics/btz937 ◽

2019 ◽

Vol 36 (8) ◽

pp. 2587-2588 ◽

Cited By ~ 10

Author(s):

Christopher M Ward ◽

Thu-Hien To ◽

Stephen M Pederson

Keyword(s):

Quality Control ◽

R Package ◽

Supplementary Information ◽

Bioconductor Package ◽

Supplementary Data ◽

Large Sample ◽

Log Files ◽

Shiny App ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Motivation: High-throughput next-generation sequencing (NGS) has become exceedingly cheap, making studies with large sample numbers feasible. Quality control (QC) is an essential stage of analytic pipelines, and the outputs of popular bioinformatics tools such as FastQC and Picard provide information on individual samples. Although these tools provide considerable power when carrying out QC, large sample numbers can make inspection of all samples and identification of systemic bias a challenge. Results: We present ngsReports, an R package designed for the management and visualization of NGS reports from within an R environment. The available methods allow direct import into R of FastQC reports along with outputs from other tools. Visualization can be carried out across many samples using default, highly customizable plots, with options to perform hierarchical clustering to quickly identify outlier libraries. Moreover, these can be displayed in an interactive shiny app or HTML report for ease of analysis. Availability and implementation: The ngsReports package is available on Bioconductor and the GUI shiny app is available at https://github.com/UofABioinformaticsHub/shinyNgsreports. Supplementary information: Supplementary data are available at Bioinformatics online.


Multiple moderator meta-analysis using the R-package Meta-CART

Behavior Research Methods ◽

10.3758/s13428-020-01360-0 ◽

2020 ◽

Vol 52 (6) ◽

pp. 2657-2673

Author(s):

Xinru Li ◽

Elise Dusseldorp ◽

Xiaogang Su ◽

Jacqueline J. Meulman

Keyword(s):

Random Effects ◽

Treatment Effectiveness ◽

Meta Analysis ◽

R Package ◽

Effect Sizes ◽

Tree Model ◽

Effective Interventions ◽

Look Ahead ◽

Fixed And Random Effects ◽

User Friendly

In meta-analysis, heterogeneity often exists between studies. Knowledge about study features (i.e., moderators) that can explain the heterogeneity in effect sizes can help researchers assess the effectiveness of existing interventions and design new, potentially effective interventions. When there are multiple moderators, they may amplify or attenuate each other’s effect on treatment effectiveness. However, in most meta-analysis studies, interaction effects are neglected due to the lack of appropriate methods. The method meta-CART was recently proposed to identify interactions between multiple moderators. The analysis result is a tree model in which the studies are partitioned into more homogeneous subgroups by combinations of moderators. This paper describes the R package metacart, which provides user-friendly functions to conduct meta-CART analyses in R. This package can fit both fixed- and random-effects meta-CART, and can handle dichotomous, categorical, ordinal and continuous moderators. In addition, a new look-ahead procedure is presented. The application of the package is illustrated step-by-step using diverse examples.
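The idea of partitioning studies into more homogeneous subgroups can be illustrated with a minimal Python sketch. It is not the metacart API and not the authors' algorithm: it assumes a fixed-effect model, evaluates only a single split, and scores each candidate moderator by the between-subgroup Q statistic. The function names, the moderator names, and the toy effect sizes are all hypothetical.

```python
def q_statistic(effects, variances):
    """Cochran's Q: weighted squared deviations from the fixed-effect pooled estimate."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def between_q(effects, variances, labels):
    """Heterogeneity explained by a moderator: total Q minus the summed within-subgroup Q."""
    total = q_statistic(effects, variances)
    within = 0.0
    for level in set(labels):
        idx = [i for i, label in enumerate(labels) if label == level]
        within += q_statistic([effects[i] for i in idx],
                              [variances[i] for i in idx])
    return total - within


def best_moderator(effects, variances, moderators):
    """Pick the moderator whose subgroups explain the most heterogeneity (one split only)."""
    return max(moderators,
               key=lambda name: between_q(effects, variances, moderators[name]))


# Hypothetical toy data: four studies, two dichotomous moderators.
effects = [0.2, 0.2, 0.8, 0.8]
variances = [0.04, 0.04, 0.04, 0.04]
moderators = {
    "setting": ["lab", "lab", "field", "field"],
    "blinded": ["yes", "no", "yes", "no"],
}
chosen = best_moderator(effects, variances, moderators)
print(chosen, between_q(effects, variances, moderators[chosen]))
```

A tree-based method would repeat this kind of search within each resulting subgroup, which is how combinations of moderators (interactions) can emerge; the sketch stops after the first split.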


gpart: human genome partitioning and visualization of high-density SNP data by identifying haplotype blocks

Bioinformatics ◽

10.1093/bioinformatics/btz308 ◽

2019 ◽

Vol 35 (21) ◽

pp. 4419-4421 ◽

Cited By ~ 3

Author(s):

Sun Ah Kim ◽

Myriam Brossard ◽

Delnaz Roshandel ◽

Andrew D Paterson ◽

Shelley B Bull ◽

...

Keyword(s):

Clustering Algorithms ◽

R Package ◽

Supplementary Information ◽

Visualization Tool ◽

Sequencing Data ◽

Haplotype Blocks ◽

Snp Data ◽

Computing Environments ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Summary: For the analysis of high-throughput genomic data produced by next-generation sequencing (NGS) technologies, researchers need to identify linkage disequilibrium (LD) structure in the genome. In this work, we developed an R package gpart which provides clustering algorithms to define LD blocks or analysis units consisting of SNPs. The visualization tool in gpart can display the LD structure and gene positions for up to 20 000 SNPs in one image. The gpart functions facilitate construction of LD blocks and SNP partitions for vast amounts of genome sequencing data within reasonable time and memory limits in personal computing environments. Availability and implementation: The R package is available at https://bioconductor.org/packages/gpart. Supplementary information: Supplementary data are available at Bioinformatics online.
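As background for the LD structure mentioned above, the following minimal Python sketch computes the standard pairwise r² statistic between two biallelic SNPs. It does not reproduce gpart's clustering algorithms or its interface. It assumes phased haplotypes coded as 0/1, and the function name `ld_r2` and the example data are hypothetical.

```python
def ld_r2(hap_a, hap_b):
    """Pairwise LD (r^2) between two biallelic SNPs from phased 0/1 haplotypes."""
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    p_a = sum(hap_a) / n                                 # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n                                 # frequency of allele 1 at SNP B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                 # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    return d * d / denom


# Hypothetical haplotypes for three SNPs across four chromosomes.
snp1 = [0, 1, 1, 0]
snp2 = [0, 1, 1, 0]   # always co-occurs with snp1
snp3 = [0, 0, 1, 1]   # statistically independent of snp1 in this sample
print(ld_r2(snp1, snp2), ld_r2(snp1, snp3))
```

Block-detection methods operate on a matrix of such pairwise values across many SNPs; how they group SNPs into blocks is method-specific and is not shown here.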


sgDI-tector: defective interfering viral genome bioinformatics for detection of coronavirus subgenomic RNAs

10.1101/2021.11.30.470527 ◽

2021 ◽

Author(s):

Andrea Di Gioacchino ◽

Rachel Legendre ◽

Yannis Rahou ◽

Valerie Najburg ◽

Pierre Charneau ◽

...

Keyword(s):

Regulatory Sequences ◽

Accessory Proteins ◽

Viral Genomes ◽

Bioinformatic Tools ◽

Subgenomic Rnas ◽

Next Generation Sequencing Ngs ◽

User Friendly ◽

Ngs Data ◽

Initial Knowledge ◽

Generation Sequencing

Coronavirus RNA-dependent RNA polymerases produce subgenomic RNAs (sgRNAs) that encode viral structural and accessory proteins. User-friendly bioinformatic tools to detect and quantify sgRNA production are urgently needed to study the growing volume of next-generation sequencing (NGS) data of SARS-CoV-2. We introduced sgDI-tector to identify and quantify sgRNA in SARS-CoV-2 NGS data. sgDI-tector allowed detection of sgRNA without initial knowledge of the transcription-regulatory sequences. We produced NGS data and successfully detected the nested set of sgRNAs with the ranking M>ORF3a>N>ORF6>ORF7a>ORF8>S>E>ORF7b. We also compared the level of sgRNA production with other types of viral RNA products such as defective interfering viral genomes.


SequencEnG: an Interactive Knowledge Base of Sequencing Techniques

10.1101/319079 ◽

2018 ◽

Author(s):

Yi Zhang ◽

Mohith Manjunath ◽

Yeonsung Kim ◽

Joerg Heintz ◽

Jun S. Song

Keyword(s):

Knowledge Base ◽

Educational Resource ◽

Rapid Progress ◽

Structured Knowledge ◽

Ngs Data Analysis ◽

Next Generation Sequencing Ngs ◽

User Friendly ◽

Ngs Data ◽

Generation Sequencing ◽

Acute Challenge

Next-generation sequencing (NGS) techniques are revolutionizing biomedical research by providing powerful methods for generating genomic and epigenomic profiles. The rapid progress is posing an acute challenge to students and researchers to stay acquainted with the numerous available methods. We have developed an interactive online educational resource called SequencEnG (acronym for Sequencing Techniques Engine for Genomics) to provide a tree-structured knowledge base of 66 different sequencing techniques and step-by-step NGS data analysis pipelines comparing popular tools. SequencEnG is designed to facilitate barrier-free learning of current NGS techniques and provides a user-friendly interface for searching through experimental and analysis methods. SequencEnG is part of the project KnowEnG (Knowledge Engine for Genomics) and is freely available at http://education.knoweng.org/sequenceng/.


Easymap: A User-Friendly Software Package for Rapid Mapping-by-Sequencing of Point Mutations and Large Insertions

Frontiers in Plant Science ◽

10.3389/fpls.2021.655286 ◽

2021 ◽

Vol 12 ◽

Author(s):

Samuel Daniel Lup ◽

David Wilson-Sánchez ◽

Sergio Andreu-Sánchez ◽

José Luis Micol

Keyword(s):

Point Mutations ◽

Rapid Identification ◽

Graphical Interface ◽

Rna Seq ◽

Source Program ◽

Next Generation Sequencing Ngs ◽

User Friendly ◽

Ngs Data ◽

Mapping By Sequencing ◽

Generation Sequencing

Mapping-by-sequencing strategies combine next-generation sequencing (NGS) with classical linkage analysis, allowing rapid identification of the causal mutations of the phenotypes exhibited by mutants isolated in a genetic screen. Computer programs are available that identify a causal mutation by analyzing NGS data obtained from a mapping population of individuals derived from a mutant of interest; however, the installation and usage of such programs require bioinformatic skills, modifying or combining pieces of existing software, or purchasing licenses. To ease this process, we developed Easymap, an open-source program that simplifies the data analysis workflows from raw NGS reads to candidate mutations. Easymap can perform bulked segregant mapping of point mutations induced by ethyl methanesulfonate (EMS) with DNA-seq or RNA-seq datasets, as well as tagged-sequence mapping for large insertions, such as transposons or T-DNAs. The mapping analyses implemented in Easymap have been validated with experimental and simulated datasets from different plant and animal model species. Easymap was designed to be accessible to all users regardless of their bioinformatics skills by implementing a user-friendly graphical interface, a simple universal installation script, and detailed mapping reports, including informative images and complementary data for assessment of the mapping results. Easymap is available at http://genetics.edu.umh.es/resources/easymap; its Quickstart Installation Guide details the recommended procedure for installation.


Sample-Size Planning for More Accurate Statistical Power: A Method Adjusting Sample Effect Sizes for Publication Bias and Uncertainty

Psychological Science ◽

10.1177/0956797617723724 ◽

2017 ◽

Vol 28 (11) ◽

pp. 1547-1562 ◽

Cited By ~ 84

Author(s):

Samantha F. Anderson ◽

Ken Kelley ◽

Scott E. Maxwell

Keyword(s):

Sample Size ◽

Publication Bias ◽

Effect Size ◽

Statistical Power ◽

Web Applications ◽

R Package ◽

Effect Sizes ◽

Alternative Approach ◽

Sample Size Planning ◽

User Friendly

The sample size necessary to obtain a desired level of statistical power depends in part on the population value of the effect size, which is, by definition, unknown. A common approach to sample-size planning uses the sample effect size from a prior study as an estimate of the population value of the effect to be detected in the future study. Although this strategy is intuitively appealing, effect-size estimates, taken at face value, are typically not accurate estimates of the population effect size because of publication bias and uncertainty. We show that the use of this approach often results in underpowered studies, sometimes to an alarming degree. We present an alternative approach that adjusts sample effect sizes for bias and uncertainty, and we demonstrate its effectiveness for several experimental designs. Furthermore, we discuss an open-source R package, BUCSS, and user-friendly Web applications that we have made available to researchers so that they can easily implement our suggested methods.
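The underpowering problem described above can be made concrete with a small Python sketch. It does not implement the authors' adjustment or the BUCSS package; it only shows, under a normal approximation for a two-sided, two-sample comparison, how planning around a published effect size loses power when the true effect is smaller. The function names and the effect sizes (0.5 published, 0.3 true) are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample test of standardized effect d."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)


def achieved_power(d_true, n, alpha=0.05):
    """Approximate power with n per group when the true standardized effect is d_true.

    Ignores the negligible probability of rejecting in the wrong tail.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(d_true * sqrt(n / 2) - z_alpha)


# Plan the study around a hypothetical published estimate of d = 0.5 ...
planned_n = n_per_group(0.5)
# ... then check the power if the population effect is actually d = 0.3.
actual_power = achieved_power(0.3, planned_n)
print(planned_n, round(actual_power, 2))
```

Under these assumptions the planned sample size delivers far less than the nominal 80% power, which is the gap that an adjustment for publication bias and uncertainty is meant to close.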


Amalgams: data-driven amalgamation for the dimensionality reduction of compositional data

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa076 ◽

2020 ◽

Vol 2 (4) ◽

Cited By ~ 1

Author(s):

Thomas P Quinn ◽

Ionas Erb

Keyword(s):

Compositional Data ◽

R Package ◽

Data Driven ◽

Alternative Methods ◽

Compositional Data Analysis ◽

Relative Information ◽

Technical Factors ◽

User Friendly ◽

Log Ratio ◽

Generation Sequencing

Many next-generation sequencing datasets contain only relative information because of biological and technical factors that limit the total number of transcripts observed for a given sample. It is not possible to interpret any one component in isolation. The field of compositional data analysis has emerged with alternative methods for relative data based on log-ratio transforms. However, these data often contain many more features than samples, and thus require creative new ways to reduce the dimensionality of the data. The summation of parts, called amalgamation, is a practical way of reducing dimensionality, but can introduce a non-linear distortion to the data. We exploit this non-linearity to propose a powerful yet interpretable dimension-reduction method called data-driven amalgamation. Our new method, implemented in the user-friendly R package amalgam, can reduce the dimensionality of compositional data by finding amalgamations that optimally (i) preserve the distance between samples, or (ii) classify samples as diseased or not. Our benchmark on 13 real datasets confirms that these amalgamations compete with state-of-the-art methods in terms of performance, but result in new features that are easily understood: they are groups of parts added together.
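A minimal Python sketch of the two building blocks named above, amalgamation (summing parts) and a log-ratio transform, is given here for orientation. It is not the amalgam package and performs no data-driven search: the grouping of parts is fixed by hand, whereas the method described in the abstract looks for groupings that optimize an objective. The function names, the counts, and the grouping are hypothetical.

```python
from math import log


def closure(parts):
    """Rescale non-negative parts so they sum to 1 (a composition)."""
    total = sum(parts)
    return [p / total for p in parts]


def amalgamate(composition, groups):
    """Sum the parts listed in each index group, giving a lower-dimensional composition."""
    return closure([sum(composition[i] for i in group) for group in groups])


def clr(composition):
    """Centered log-ratio transform: log of each part minus the mean log of all parts."""
    logs = [log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical sample with four parts, reduced to two amalgams.
sample = closure([10, 30, 40, 20])
groups = [[0, 1], [2, 3]]          # hand-picked grouping, not learned from data
reduced = amalgamate(sample, groups)
print(reduced, clr(reduced))
```

Each new feature is simply a named group of original parts added together, which is why the resulting dimensions stay easy to interpret.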


sgDI-tector: defective interfering viral genome bioinformatics for detection of coronavirus subgenomic RNAs

RNA ◽

10.1261/rna.078969.121 ◽

2021 ◽

pp. rna.078969.121

Author(s):

Andrea Di Gioacchino ◽

Rachel Legendre ◽

Yannis Rahou ◽

Valérie Najburg ◽

Pierre Charneau ◽

...

Keyword(s):

Regulatory Sequences ◽

Accessory Proteins ◽

Viral Genomes ◽

Bioinformatic Tools ◽

Subgenomic Rnas ◽

Next Generation Sequencing Ngs ◽

User Friendly ◽

Ngs Data ◽

Initial Knowledge ◽

Generation Sequencing

Coronavirus RNA-dependent RNA polymerases produce subgenomic RNAs (sgRNAs) that encode viral structural and accessory proteins. User-friendly bioinformatic tools to detect and quantify sgRNA production are urgently needed to study the growing volume of next-generation sequencing (NGS) data of SARS-CoV-2. We introduced sgDI-tector to identify and quantify sgRNA in SARS-CoV-2 NGS data. sgDI-tector allowed detection of sgRNA without initial knowledge of the transcription-regulatory sequences. We produced NGS data and successfully detected the nested set of sgRNAs with the ranking M>ORF3a>N>ORF6>ORF7a>ORF8>S>E>ORF7b. We also compared the level of sgRNA production with other types of viral RNA products such as defective interfering viral genomes.
