BSAseq: an interactive and integrated web-based workflow for identification of causal mutations in bulked F2 populations

Abstract Summary With the advance of next-generation sequencing technologies and reductions in the costs of these techniques, bulked segregant analysis (BSA) has become not only a powerful tool for mapping quantitative trait loci but also a useful way to identify causal gene mutations underlying phenotypes of interest. However, due to the presence of background mutations and errors in sequencing, genotyping, and reference assembly, it is often difficult to distinguish true causal mutations from background mutations. In this study, we developed the BSAseq workflow, which includes an automated bioinformatics analysis pipeline with a probabilistic model for estimating the linked region (the region linked to the causal mutation) and an interactive Shiny web application for visualizing the results. We deeply sequenced a sorghum male-sterile parental line (ms8) to capture the majority of background mutations in our bulked F2 data. We applied the workflow to 11 bulked sorghum F2 populations and 1 rice F2 population and identified the true causal mutation in each population. The workflow is intuitive and straightforward, facilitating its adoption by users without bioinformatics analysis skills. We anticipate that the BSAseq workflow will be broadly applicable to the identification of causal mutations for many phenotypes of interest. Availability and implementation BSAseq is freely available on https://www.sciapps.org/page/bsa. Supplementary information Supplementary data are available at Bioinformatics online.
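The abstract does not spell out BSAseq's probabilistic model, but the two ideas it relies on can be illustrated. The sketch below is a hypothetical simplification, not the BSAseq pipeline. It assumes that variant calls from a mutant bulk and from the deeply sequenced parental line are already available as simple records (the `Variant` type and both function names are invented for illustration). It drops bulk variants that also occur in the parent as likely background, then averages the alternate-allele fraction in sliding windows. For a recessive causal mutation, sites linked to it in a bulk of mutant F2 plants are expected to approach a fraction of 1, while unlinked segregating sites stay near 0.5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record; not a BSAseq data structure."""
    chrom: str
    pos: int         # 1-based position on the chromosome
    ref: str
    alt: str
    alt_reads: int   # reads supporting the alternate allele
    depth: int       # total reads covering the site


def remove_background(bulk, parental):
    """Drop bulk variants whose (chrom, pos, alt) also appears in the parental calls."""
    background = {(v.chrom, v.pos, v.alt) for v in parental}
    return [v for v in bulk if (v.chrom, v.pos, v.alt) not in background]


def window_alt_fraction(variants, chrom, window, step):
    """Mean alternate-allele fraction per window [start, start + window) on one chromosome."""
    sites = sorted(
        (v for v in variants if v.chrom == chrom and v.depth > 0),
        key=lambda v: v.pos,
    )
    results = []
    if not sites:
        return results
    start, last = 0, sites[-1].pos
    while start <= last:
        end = start + window
        fractions = [v.alt_reads / v.depth for v in sites if start <= v.pos < end]
        if fractions:  # skip windows without any informative site
            results.append((start, end, sum(fractions) / len(fractions)))
        start += step
    return results
```

Windows whose mean fraction stays close to 1 after background removal would be the candidate linked region in this simplified view; the real workflow replaces this heuristic with its own probabilistic estimate.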


BSAseq: an interactive and integrated web-based workflow for identification of causal mutations in bulked F2 populations

DOI: 10.1101/2020.04.08.029801 (2020)

Author(s): Liya Wang, Zhenyuan Lu, Michael Regulski, Yinping Jiao, Junping Chen, ...

Keyword(s): Web Application; Bioinformatics Analysis; Bulked Segregant Analysis; Gene Mutations; Parental Line; Male Sterile; Analysis Pipeline; Web Based; Next Generation Sequencing (NGS); Generation Sequencing

Abstract Summary With the advance of next-generation sequencing (NGS) technologies and reductions in the costs of these techniques, bulked segregant analysis (BSA) has become not only a powerful tool for mapping quantitative trait loci (QTL) but also a useful way to identify causal gene mutations underlying phenotypes of interest. However, due to the presence of background mutations and errors in sequencing, genotyping, and reference assembly, it is often difficult to distinguish true causal mutations from background mutations. In this study, we developed the BSAseq workflow, which includes an automated bioinformatics analysis pipeline with a probabilistic model for estimating the segregation region and an interactive Shiny web application for visualizing the results. We deeply sequenced a male-sterile parental line (ms8) to capture the majority of background mutations in our bulked F2 data. We applied the workflow to 11 bulked F2 populations and identified the true causal mutation in each population. The workflow is intuitive and straightforward, facilitating its adoption by users without bioinformatics analysis skills. We anticipate that BSAseq will be broadly applicable to the identification of causal mutations for many phenotypes of interest. Availability BSAseq is freely available on https://www.sciapps.org/page/


PlanExp: intuitive integration of complex RNA-seq datasets with planarian omics resources

Bioinformatics (2019). DOI: 10.1093/bioinformatics/btz802. Cited by ~3

Author(s): S Castillo-Lara, E Pascual-Carreras, J F Abril

Keyword(s): Single Cell; Web Application; Transcript Level; Supplementary Information; Expression Data; RNA-Seq; Making Sense; Sequencing Technologies; Planarian Regeneration; Current Sequence

Abstract Motivation There is an increasing amount of transcriptomic and genomic data available for planarians with the advent of both traditional and single-cell RNA sequencing technologies. Exploring, visualizing and making sense of all these data in order to understand planarian regeneration and development can therefore be challenging. Results In this work, we present PlanExp, a web application to explore and visualize gene expression data from different RNA-seq experiments (both traditional and single-cell RNA-seq) for the planarian Schmidtea mediterranea. PlanExp provides tools for creating different interactive plots, such as heatmaps and scatterplots, and links them with the current sequence annotations at both the genome and the transcript level through its integration with the PlanNET web application. PlanExp also provides a full gene/protein network editor, a prediction of genetic interactions from single-cell RNA-seq data, and a network expression mapper that will help researchers close the gap between systems biology and planarian regeneration. Availability and implementation PlanExp is freely available at https://compgen.bio.ub.edu/PlanNET/planexp. The source code is available at https://compgen.bio.ub.edu/PlanNET/downloads. Supplementary information Supplementary data are available at Bioinformatics online.


A single nucleotide polymorphism in an R2R3 MYB transcription factor gene triggers the male sterility in soybean ms6 (Ames1)

Theoretical and Applied Genetics (2021). DOI: 10.1007/s00122-021-03920-0

Author(s): Junping Yu, Guolong Zhao, Wei Li, Ying Zhang, Peng Wang, ...

Keyword(s): Transcription Factor; Glycine Max; Male Sterility; Bulked Segregant Analysis; Soybean Protein; Anther Development; Hybrid Breeding; MYB Transcription Factor; Male Sterile; R2R3 MYB

Abstract Key message Identification and functional analysis of the male-sterile gene MS6 in Glycine max. Abstract Soybean (Glycine max (L.) Merr.) is an important crop providing vegetable oil and protein. Male sterility-based hybrid breeding is a promising method for improving soybean yield to meet growing global demand. In this research, we identified a soybean genic male-sterile locus, MS6, by combining bulked segregant analysis sequencing with map-based cloning. MS6, which is highly expressed in anthers, encodes an R2R3 MYB transcription factor (GmTDF1-1) that is homologous to Tapetal Development and Function 1 (TDF1), a key factor for anther development in Arabidopsis and rice. In the male-sterile line ms6 (Ames1), the mutant allele carries a missense mutation that replaces the 76th residue, leucine, with histidine (L76H) in the DNA-binding domain of GmTDF1-1. Expression of soybean MS6 under the control of the AtTDF1 promoter could rescue the male sterility of attdf1, whereas ms6 could not. Additionally, ms6 overexpression in wild-type Arabidopsis did not affect anther development. These results show that GmTDF1-1 is a functional TDF1 homolog and that L76H disrupts its function. Notably, GmTDF1-1 shares 92% sequence identity with another soybean protein, GmTDF1-2, whose expression also restored the fertility of attdf1. However, GmTDF1-2 is constitutively expressed at a very low level in soybean and is therefore unable to compensate for the MS6 deficiency. Analysis of the TDF1-involved anther development regulatory pathway showed that the expression of genes downstream of TDF1 is significantly suppressed in ms6, indicating that GmTDF1-1 is a core transcription factor regulating soybean anther development.
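The title attributes the male sterility to a single-nucleotide polymorphism, and the abstract reports the resulting L76H substitution. As a worked check, assuming the standard genetic code and coding-DNA numbering from the A of the start codon, the sketch below enumerates every single-base change that turns a leucine codon into a histidine codon and computes which coding positions codon 76 occupies. It is an illustration written for this summary (the helper names are hypothetical), not the authors' analysis, and it does not know which leucine codon the soybean gene actually uses.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def single_base_changes(from_aa, to_aa):
    """List (codon, mutant codon, position 1-3, ref base, alt base) for one-base from_aa -> to_aa changes."""
    changes = []
    for codon, aa in CODON_TABLE.items():
        if aa != from_aa:
            continue
        for i, ref in enumerate(codon):
            for alt in BASES:
                if alt == ref:
                    continue
                mutant = codon[:i] + alt + codon[i + 1:]
                if CODON_TABLE[mutant] == to_aa:
                    changes.append((codon, mutant, i + 1, ref, alt))
    return changes


def codon_span(aa_position):
    """First and last coding-DNA (c.) positions of the codon for a given residue number."""
    first = 3 * (aa_position - 1) + 1
    return first, first + 2
```

Under these assumptions only CTT>CAT and CTC>CAC qualify, so the polymorphism would be a T>A change at the middle base of codon 76, which spans c.226 to c.228.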


Apollo: a sequencing-technology-independent, scalable and accurate assembly polishing algorithm

Bioinformatics, Vol 36 (12), pp. 3669-3679 (2020). DOI: 10.1093/bioinformatics/btaa179. Cited by ~3

Author(s): Can Firtina, Jeremie S Kim, Mohammed Alser, Damla Senol Cali, A Ercument Cicek, ...

Keyword(s): Genome Analysis; Supplementary Information; Third Generation; Sequencing Technology; Base Pairs; Sequencing Technologies; Third Generation Sequencing; Long Reads; Generation Sequencing; Large Genomes

Abstract Motivation Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs. These long reads are used to construct an assembly (i.e. the subject’s genome), which is further used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates, and a large proportion of base pairs in these long reads is incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by fixing errors in the assembly using information from alignments between reads and the assembly (i.e. read-to-assembly alignment information). However, current assembly polishing algorithms can polish an assembly only with reads from a particular sequencing technology, or only when the assembly is small. These technology and assembly-size dependencies require researchers to (i) run multiple polishing algorithms to use all available read sets and (ii) split a large genome into small chunks to polish it. Results We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e. both large and small genomes) using reads from all sequencing technologies (i.e. second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo (i) models an assembly as a profile hidden Markov model (pHMM), (ii) uses read-to-assembly alignment to train the pHMM with the Forward–Backward algorithm and (iii) decodes the trained model with the Viterbi algorithm to produce a polished assembly. Our experiments with real read sets demonstrate that Apollo is the only algorithm that (i) uses reads from any sequencing technology within a single run and (ii) scales well to polish large assemblies without splitting the assembly into multiple parts. Availability and implementation Source code is available at https://github.com/CMU-SAFARI/Apollo. Supplementary information Supplementary data are available at Bioinformatics online.
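Apollo's final step decodes a trained model with the Viterbi algorithm. The sketch below shows plain Viterbi decoding on a deliberately tiny two-state HMM. It is not Apollo's profile HMM, whose states and transitions follow the assembly sequence; the states (`ok`, `err`), observation symbols (`=` for a read base matching the assembly, `x` for a mismatch) and all probabilities are invented for illustration. Scores are kept in log space so that long sequences do not underflow.

```python
import math


def _log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for obs and its log-probability."""
    scores = [{s: _log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in states}]
    backptr = [{}]
    for t in range(1, len(obs)):
        scores.append({})
        backptr.append({})
        for s in states:
            # Best predecessor state r for reaching s at step t.
            best_prev = max(states, key=lambda r: scores[t - 1][r] + _log(trans_p[r][s]))
            scores[t][s] = (
                scores[t - 1][best_prev]
                + _log(trans_p[best_prev][s])
                + _log(emit_p[s][obs[t]])
            )
            backptr[t][s] = best_prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: scores[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return path, scores[-1][last]


# Toy parameters (hypothetical): "err" marks an assembly base that looks wrong.
STATES = ("ok", "err")
START = {"ok": 0.9, "err": 0.1}
TRANS = {"ok": {"ok": 0.9, "err": 0.1}, "err": {"ok": 0.5, "err": 0.5}}
EMIT = {"ok": {"=": 0.9, "x": 0.1}, "err": {"=": 0.2, "x": 0.8}}
```

With these toy parameters, `viterbi("==xx=", STATES, START, TRANS, EMIT)` labels the two mismatching positions as `err` and the rest as `ok`; in a polishing setting, the decoded path is what gets turned back into corrected sequence.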


SurfaceGenie: a web-based application for prioritizing cell-type-specific marker candidates

Bioinformatics, Vol 36 (11), pp. 3447-3456 (2020). DOI: 10.1093/bioinformatics/btaa092. Cited by ~2

Author(s): Matthew Waas, Shana T Snarrenberg, Jack Littrell, Rachel A Jones Lipinski, Polly A Hansen, ...

Keyword(s): Cell Surface; Specific Surface; Web Application; Rank Order; Surface Proteins; Supplementary Information; Live Cells; Specific Marker; Cell Type; Cell Type Specific

Abstract Motivation Cell-type-specific surface proteins can be exploited as valuable markers for a range of applications including immunophenotyping live cells, targeted drug delivery and in vivo imaging. Despite their utility and relevance, the unique combination of molecules present at the cell surface is not yet described for most cell types. A significant challenge in analyzing ‘omic’ discovery datasets is selecting the candidate markers that are most applicable to downstream applications. Results Here, we developed GenieScore, a prioritization metric that integrates a consensus-based prediction of cell surface localization with user-input data to rank-order candidate cell-type-specific surface markers. In this report, we demonstrate the utility of GenieScore for analyzing human and rodent data from proteomic and transcriptomic experiments in the areas of cancer, stem cell and islet biology. We also demonstrate that permutations of GenieScore, termed IsoGenieScore and OmniGenieScore, can efficiently prioritize co-expressed and intracellular cell-type-specific markers, respectively. Availability and implementation Calculation of GenieScores and lookup of SPC scores are made freely accessible via the SurfaceGenie web application: www.cellsurfer.net/surfacegenie. Supplementary information Supplementary data are available at Bioinformatics online.


MUM&Co: accurate detection of all SV types through whole-genome alignment

Bioinformatics, Vol 36 (10), pp. 3242-3243 (2020). DOI: 10.1093/bioinformatics/btaa115. Cited by ~2

Author(s): Samuel O’Donnell, Gilles Fischer

Keyword(s): De Novo; Supplementary Information; Genome Alignment; Whole Genome; Structural Variations; Sequencing Technologies; Third Generation Sequencing; Human Genomes; Whole Genome Alignment; Primary Output

Abstract Summary MUM&Co is a single bash script that detects structural variations (SVs) using whole-genome alignment (WGA). Using MUMmer’s nucmer alignment, MUM&Co can detect insertions, deletions, tandem duplications, inversions and translocations greater than 50 bp. Its versatility depends on the WGA, and it therefore benefits from contiguous de novo assemblies generated by third-generation sequencing technologies. Benchmarked against five WGA SV-calling tools, MUM&Co outperforms all of them on simulated SVs in yeast, plant and human genomes and performs similarly on two real human datasets. Additionally, MUM&Co is unique in its ability to find inversions in both simulated and real datasets. Lastly, MUM&Co’s primary output is an intuitive tabulated file listing SVs with only the necessary genomic details. Availability and implementation https://github.com/SAMtoBAM/MUMandCo. Supplementary information Supplementary data are available at Bioinformatics online.
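MUM&Co derives its calls from nucmer whole-genome alignments. As a rough illustration of how insertions and deletions can be read off such alignments, the sketch below compares the reference gap with the query gap between two consecutive, same-strand alignment blocks (coordinates of the kind one might parse from nucmer/show-coords output) and reports a call when the difference is greater than 50 bp. The `Block` type, the function names and this gap rule are assumptions made for illustration only; MUM&Co's actual logic, which also handles tandem duplications, inversions and translocations, lives in the linked repository.

```python
from typing import NamedTuple, Optional, Tuple

MIN_SV_LEN = 50  # abstract: SVs "greater than 50 bp"


class Block(NamedTuple):
    """One collinear alignment block, 1-based inclusive coordinates, forward strand (hypothetical)."""
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int


def classify_gap(a: Block, b: Block, min_len: int = MIN_SV_LEN) -> Optional[Tuple[str, int]]:
    """Compare unaligned bases between blocks a and b on the reference and on the query."""
    ref_gap = b.ref_start - a.ref_end - 1
    qry_gap = b.qry_start - a.qry_end - 1
    diff = qry_gap - ref_gap
    if diff > min_len:
        return "insertion", diff    # extra sequence in the query assembly
    if -diff > min_len:
        return "deletion", -diff    # sequence missing from the query assembly
    return None


def call_indels(blocks, min_len: int = MIN_SV_LEN):
    """Scan reference-ordered blocks and return (ref breakpoint, type, length) calls."""
    ordered = sorted(blocks)  # NamedTuples sort by ref_start first
    calls = []
    for a, b in zip(ordered, ordered[1:]):
        call = classify_gap(a, b, min_len)
        if call is not None:
            calls.append((a.ref_end, *call))
    return calls
```

Because the rule only looks at gap lengths, it depends entirely on the alignment being collinear and contiguous, which mirrors the abstract's point that MUM&Co benefits from contiguous de novo assemblies.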


PathScore: a web tool for identifying altered pathways in cancer data

DOI: 10.1101/067090 (2016). Cited by ~2

Author(s): Stephen G. Gaffney, Jeffrey P. Townsend

Keyword(s): Web Application; Somatic Mutations; Supplementary Information; Web Tool; Cancer Data; Link Type; Novel Approach; Supplementary Material; User Friendly; Pathway Effect

Abstract Summary PathScore quantifies the level of enrichment of somatic mutations within curated pathways, applying a novel approach that identifies pathways enriched across patients. The application provides several user-friendly, interactive graphical interfaces for data exploration, including tools for comparing pathway effect sizes, significance, gene-set overlap and enrichment differences between projects. Availability and Implementation Web application available at pathscore.publichealth.yale.edu. Site implemented in Python and MySQL, with all major browsers supported. Source code available at github.com/sggaffney/pathscore with a GPLv3 license. Supplementary Information Additional documentation can be found at http://pathscore.publichealth.yale.edu/faq.
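PathScore's own enrichment method is described only as a novel approach that works across patients, so it is not reproduced here. For orientation, the sketch below shows the conventional baseline that pathway-enrichment tools build on: a one-sided hypergeometric test of whether a pathway contains more mutated genes than expected by chance. The function names and example counts are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes in universe, K pathway genes, n mutated genes)."""
    upper = min(K, n)
    if k > upper:
        return 0.0
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible terms contribute nothing.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return hits / comb(N, n)


def pathway_p_value(mutated_genes, pathway_genes, universe_size):
    """One-sided enrichment p-value; assumes both gene sets come from the same gene universe."""
    mutated, pathway = set(mutated_genes), set(pathway_genes)
    overlap = len(mutated & pathway)
    return hypergeom_sf(overlap, universe_size, len(pathway), len(mutated))
```

For example, with a 10-gene universe, a 3-gene pathway and 2 mutated genes of which 1 falls in the pathway, the p-value is 24/45. A plain test like this treats every gene as equally likely to be mutated, which is one of the simplifications more specialized approaches aim to avoid.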


Identification of candidate genes controlling chilling tolerance of rice in the cold region at the booting stage by BSA-Seq and RNA-Seq

Royal Society Open Science, Vol 7 (11), pp. 201081 (2020). DOI: 10.1098/rsos.201081

Author(s): Zhenhua Guo, Lijun Cai, Zhiqiang Chen, Ruiying Wang, Lanming Zhang, ...

Keyword(s): Candidate Genes; Bulked Segregant Analysis; Expression Patterns; Enrichment Analysis; Chilling Tolerance; Pathway Enrichment Analysis; Potential Candidate; Rice Varieties; Booting Stage; F2 Population

Rice is sensitive to low temperatures, especially at the booting stage. Chilling tolerance in rice is a quantitative trait governed by multiple genes, so identifying its genetic basis precisely with conventional methods is an arduous task. In this study, we investigated candidate genes related to chilling tolerance at the booting stage of rice. An F2 population was derived from a cross between Longjing25 (chilling-tolerant) and Longjing11 (chilling-sensitive), and two bulked segregant analysis pools were constructed. Candidate regions totaling 0.82 Mb and containing 98 annotated genes on chromosomes 6 and 9 were identified as associated with chilling tolerance of rice at the booting stage. Transcriptomic analysis of Longjing25 and Longjing11 revealed 50 differentially expressed genes (DEGs) within the candidate intervals. KEGG pathway enrichment analysis of these DEGs identified nine enriched pathways, which contained 10 DEGs. A total of four genes had different expression patterns or levels between Longjing25 and Longjing11. Four of the 10 DEGs were considered potential candidate genes for chilling tolerance. This study will assist in cloning the candidate genes responsible for chilling tolerance and in molecular breeding of chilling-tolerant rice varieties.
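The abstract does not state which statistic was used to locate the candidate regions. A common choice in BSA-seq is the SNP-index, the fraction of reads at a site that carry one parent's allele, compared between the two pools as a ΔSNP-index. The sketch below assumes that definition (with the index counted for the Longjing25 allele, so the tolerant pool should score high near a tolerance locus), a hypothetical minimum depth of 10 reads, and invented function names. It illustrates the general technique, not the authors' pipeline.

```python
from statistics import mean


def snp_index(allele_reads: int, depth: int) -> float:
    """Fraction of reads at a site that carry the tracked parental allele."""
    return allele_reads / depth


def delta_snp_index(tol_reads, tol_depth, sen_reads, sen_depth, min_depth=10):
    """Tolerant-pool SNP-index minus sensitive-pool SNP-index; None if either pool is too shallow."""
    if tol_depth < min_depth or sen_depth < min_depth:
        return None
    return snp_index(tol_reads, tol_depth) - snp_index(sen_reads, sen_depth)


def window_means(sites, window):
    """Average ΔSNP-index per non-overlapping window; sites are (position, delta) pairs."""
    bins = {}
    for pos, delta in sites:
        if delta is not None:  # skip sites filtered for low depth
            bins.setdefault(pos // window, []).append(delta)
    return {b * window: mean(vals) for b, vals in sorted(bins.items())}
```

Windows with a strongly positive mean would point to regions where the tolerant pool is enriched for the Longjing25 allele; published pipelines usually add confidence intervals from simulation before calling a region.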


SVIM-asm: Structural variant detection from haploid and diploid genome assemblies

DOI: 10.1101/2020.10.27.356907 (2020)

Author(s): David Heller, Martin Vingron

Keyword(s): Genetic Information; Source Code; Supplementary Information; Supplementary Data; Diploid Genome; Insertions and Deletions; Structural Variant; Sequencing Technologies; Variant Detection; Genome Assemblies

Abstract Motivation With the availability of new sequencing technologies, the generation of haplotype-resolved genome assemblies up to chromosome scale has become feasible. These assemblies capture the complete genetic information of both parental haplotypes, increase structural variant (SV) calling sensitivity and enable direct genotyping and phasing of SVs. Yet, existing SV callers are designed for haploid genome assemblies only, do not support genotyping, or detect only a limited set of SV classes. Results We introduce our method SVIM-asm for the detection and genotyping of six common classes of SVs from haploid and diploid genome assemblies. Compared against DipCall, the only other existing SV caller for diploid assemblies, SVIM-asm detects more SV classes and reaches higher F1 scores for the detection of insertions and deletions on two recently published assemblies of the HG002 individual. Availability and Implementation SVIM-asm has been implemented in Python and can be easily installed via bioconda. Its source code is available at github.com/eldariont/ Supplementary information Supplementary data are available online.
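The comparison with DipCall is reported as F1 scores. The sketch below shows how precision, recall and F1 follow from true-positive, false-positive and false-negative counts, using a deliberately naive greedy matcher (same SV type, breakpoints within a hypothetical 500 bp) to produce those counts. Real SV benchmarks use more careful matching rules, for example size and sequence similarity, so treat the matcher, its tolerance and the function names as assumptions for illustration only.

```python
def match_calls(calls, truth, max_dist=500):
    """Greedily pair (type, position) calls with unused truth SVs; return (TP, FP, FN)."""
    remaining = list(truth)
    tp = 0
    for sv_type, pos in calls:
        candidates = [
            (abs(t_pos - pos), idx)
            for idx, (t_type, t_pos) in enumerate(remaining)
            if t_type == sv_type and abs(t_pos - pos) <= max_dist
        ]
        if candidates:
            _, idx = min(candidates)  # closest compatible truth SV
            remaining.pop(idx)        # each truth SV can be matched only once
            tp += 1
    return tp, len(calls) - tp, len(remaining)


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and their harmonic mean (F1), guarding against empty denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

Because F1 is the harmonic mean, a caller cannot raise it by trading many false positives for a few extra true positives, which is why it is a common single-number summary for comparisons like SVIM-asm versus DipCall.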


Mutations in NOTCH3 Gene may Promote the Clinical Presentation of Spinocerebellar Ataxia Type 37 Caused by Mutations in DAB1 Gene

Frontiers in Molecular Biosciences, Vol 8 (2021). DOI: 10.3389/fmolb.2021.668312

Author(s): Zhao-Wei Wang, Li-Ping Wang, Ye Du, Qi Liu

Keyword(s): Spinocerebellar Ataxia; Sanger Sequencing; Autosomal Dominant; Bioinformatics Analysis; Clinical Presentation; Cerebellar Atrophy; Gene Mutations; Brain Magnetic Resonance Imaging; Diagnostic Methods; Spinocerebellar Ataxia Type

Background: Autosomal dominant spinocerebellar ataxia type 37 (SCA37) and cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL) result from DAB1 and NOTCH3 gene mutations, respectively. Methods: In addition to conventional diagnostic methods, next-generation sequencing (NGS) and Sanger sequencing were performed to identify and confirm the DAB1 and NOTCH3 gene mutations in a Chinese pedigree. Bioinformatics analysis of the mutated DAB1 and NOTCH3 proteins was also performed using available software tools. Results: Brain magnetic resonance imaging showed diffuse leukoencephalopathy and cerebellar atrophy in the proband. NGS and Sanger sequencing identified two novel heterozygous mutations: NM_021080:c.318T>G (p.H106Q) in the DAB1 gene and NM_000435:c.3298C>T (p.R1100C) in the NOTCH3 gene. Bioinformatics analysis suggested that the DAB1 and NOTCH3 gene mutations are disease-causing and may be responsible for the phenotypes. Conclusion: This is the first report of a pedigree with both SCA37 and CADASIL phenotypes carrying the corresponding gene mutations. Mutations in the NOTCH3 gene may promote the clinical presentation of spinocerebellar ataxia type 37 caused by mutations in the DAB1 gene. In addition to general examinations, it is vital for physicians to apply molecular genetics to obtain an accurate diagnosis in the clinic, especially for rare diseases.
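The reported variants use HGVS coding-DNA notation, in which c.1 is the A of the ATG start codon, so position n falls in codon (n - 1) // 3 + 1. The sketch below assumes the standard genetic code and handles only simple substitutions written without spaces (for example c.318T>G); the function names are hypothetical. It checks that each c. change is compatible with its protein change and lists which reference codons would make it so. It does not look up the actual NM_021080 or NM_000435 sequences.

```python
import re
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

SUBSTITUTION = re.compile(r"c\.(\d+)([ACGT])>([ACGT])")


def codon_position(c_pos):
    """Map a c. position to (codon number, base 1-3 within that codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def compatible_codons(hgvs_c, ref_aa, alt_aa):
    """Return (codon number, reference codons for which hgvs_c changes ref_aa into alt_aa)."""
    match = SUBSTITUTION.fullmatch(hgvs_c)
    if match is None:
        raise ValueError(f"unsupported notation: {hgvs_c}")
    c_pos, ref, alt = int(match.group(1)), match.group(2), match.group(3)
    codon_no, offset = codon_position(c_pos)
    i = offset - 1
    hits = [
        codon
        for codon, aa in CODON_TABLE.items()
        if aa == ref_aa
        and codon[i] == ref
        and CODON_TABLE[codon[:i] + alt + codon[i + 1:]] == alt_aa
    ]
    return codon_no, hits
```

Under these assumptions, c.3298C>T falls on the first base of codon 1100, so p.R1100C requires a CGT or CGC reference codon, while c.318T>G falls on the third base of codon 106, which fits only CAT (His) changing to CAG (Gln). An empty list would flag a c./p. pair that cannot be reconciled by a single substitution.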
