Aneuvis: Web-based exploration of numerical chromosomal variation in single cells

2018 ◽  
Author(s):  
Daniel G Piqué ◽  
Grasiella A Andriani ◽  
Elaine Maggi ◽  
Samuel E Zimmerman ◽  
John M Greally ◽  
...  

Motivation: Aberrations in chromosomal copy number are one of the most common molecular features observed in cancer. Quantifying the degree of numerical chromosomal variation in single cells across a population of cells is of interest to researchers studying whole chromosomal instability (W-CIN). W-CIN, a state of high numerical chromosomal variation, contributes to treatment resistance in cancer. Results: Here, we introduce aneuvis, a web application that allows users to determine whether numerical chromosomal variation exists between experimental treatment groups. The web interface allows users to upload molecular cytogenetic or processed whole-genome sequencing data in a cell-by-chromosome matrix format and automatically generates visualizations and summary statistics that reflect the degree of numerical chromosomal variability. Aneuvis is the first user-friendly web application to help researchers identify the genetic and environmental perturbations that promote numerical chromosomal variation. Availability and Implementation: Aneuvis is freely available as a web application at https://dpique.shinyapps.io/aneuvis/. The website is implemented using Shiny version 1.0.5 and supports all major browsers. All source code for the application is available at https://github.com/dpique/aneuvis.
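To make the cell-by-chromosome matrix format concrete, the following is a minimal sketch of how per-group summary statistics could be computed from such a matrix. It is not aneuvis's own code (aneuvis is a Shiny application, and the abstract does not specify its statistics); the function name `aneuploidy_summary`, the two example statistics, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def aneuploidy_summary(cells, euploid=2):
    """Summarize copy-number variation per treatment group.

    cells: list of (group, copy_numbers) pairs, where copy_numbers holds one
    integer copy number per chromosome for a single cell (one matrix row).
    The two statistics below are illustrative choices, not aneuvis's own.
    """
    by_group = defaultdict(list)
    for group, copies in cells:
        by_group[group].append(copies)

    summary = {}
    for group, rows in by_group.items():
        # A cell counts as aneuploid if any chromosome deviates from euploid.
        n_aneuploid = sum(any(c != euploid for c in row) for row in rows)
        # Mean absolute deviation from the euploid copy number,
        # averaged over every cell-by-chromosome entry in the group.
        total_dev = sum(abs(c - euploid) for row in rows for c in row)
        n_entries = sum(len(row) for row in rows)
        summary[group] = {
            "cells": len(rows),
            "fraction_aneuploid": n_aneuploid / len(rows),
            "mean_abs_deviation": total_dev / n_entries,
        }
    return summary


# Hypothetical toy matrix: two groups, two cells each, three chromosomes.
cells = [
    ("control", [2, 2, 2]),
    ("control", [2, 2, 2]),
    ("treated", [2, 3, 2]),
    ("treated", [1, 2, 4]),
]
summary = aneuploidy_summary(cells)
```

Comparing `fraction_aneuploid` and `mean_abs_deviation` between groups gives a rough sense of whether a treatment shifts numerical chromosomal variation; a real analysis would also need a statistical test.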

PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e11333
Author(s):  
Daniyar Karabayev ◽  
Askhat Molkenov ◽  
Kaiyrgali Yerulanuly ◽  
Ilyas Kabimoldayev ◽  
Asset Daniyarov ◽  
...  

Background: High-throughput sequencing platforms generate a massive amount of high-dimensional genomic data that is available for analysis. Modern, user-friendly bioinformatics tools for the analysis and interpretation of genomics data become essential when working with sequencing data. Different standard data types and file formats have been developed to store and analyze sequence and genomics data. Variant Call Format (VCF) is the most widespread genomics file type and the standard format for storing genomic information and variants of sequenced samples. Results: Existing tools for processing VCF files don’t usually have an intuitive graphical interface, but instead have just a command-line interface that may be challenging to use for the broader biomedical community interested in genomics data analysis. re-Searcher solves this problem by pre-processing VCF files in chunks so as not to overload the computer's RAM. The tool can be used as a standalone, user-friendly, multiplatform GUI application as well as a web application (https://nla-lbsb.nu.edu.kz). The software, including source code, tested VCF files, and additional information, is publicly available in the GitHub repository (https://github.com/LabBandSB/re-Searcher).
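The idea of processing a VCF file in chunks, rather than loading it whole, can be sketched as follows. This is an illustrative example using only the Python standard library, not re-Searcher's implementation; the function name `iter_vcf_chunks`, the chunk size, and the in-memory demo file are assumptions.

```python
import io
from itertools import islice


def iter_vcf_chunks(handle, chunk_size=1000):
    """Yield lists of VCF data records, at most chunk_size records per list.

    Only one chunk is held in memory at a time, so memory use stays bounded
    regardless of file size. Meta-information lines (##...) and the #CHROM
    header line are skipped; each record is split into tab-separated fields.
    """
    records = (
        line.rstrip("\n").split("\t")
        for line in handle
        if line.strip() and not line.startswith("#")
    )
    while True:
        chunk = list(islice(records, chunk_size))
        if not chunk:
            return
        yield chunk


# Hypothetical three-record VCF held in memory for demonstration.
demo = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\t.\tA\tG\t50\tPASS\t.\n"
    "1\t200\t.\tC\tT\t60\tPASS\t.\n"
    "2\t300\t.\tG\tA\t70\tPASS\t.\n"
)
chunk_sizes = [len(chunk) for chunk in iter_vcf_chunks(demo, chunk_size=2)]
```

With a real file, the same generator would be called on an open file handle, for example `with open(path) as handle:` followed by a loop over `iter_vcf_chunks(handle)`.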


2017 ◽  
Author(s):  
Philipp N. Spahn ◽  
Tyler Bath ◽  
Ryan J. Weiss ◽  
Jihoon Kim ◽  
Jeffrey D. Esko ◽  
...  

Background: Large-scale genetic screens using CRISPR/Cas9 technology have emerged as a major tool for functional genomics. With its increased popularity, experimental biologists frequently acquire large sequencing datasets for which they often do not have an easy analysis option. While a few bioinformatic tools have been developed for this purpose, their utility is still hindered either by limited functionality or by the requirement of bioinformatic expertise. Results: To make sequencing data analysis of CRISPR/Cas9 screens more accessible to a wide range of scientists, we developed a Platform-independent Analysis of Pooled Screens using Python (PinAPL-Py), which is operated as an intuitive web service. PinAPL-Py implements state-of-the-art tools and statistical models, assembled in a comprehensive workflow covering sequence quality control, automated sgRNA sequence extraction, alignment, sgRNA enrichment/depletion analysis and gene ranking. The workflow is set up to use a variety of popular sgRNA libraries as well as custom libraries that can be easily uploaded. Various analysis options are offered, suitable for analyzing a large variety of CRISPR/Cas9 screening experiments. Analysis output includes ranked lists of sgRNAs and genes, and publication-ready plots. Conclusions: PinAPL-Py helps to advance genome-wide screening efforts by combining comprehensive functionality with user-friendly implementation. PinAPL-Py is freely accessible at http://pinapl-py.ucsd.edu with instructions, documentation and test datasets. The source code is available at https://github.com/LewisLabUCSD/PinAPL-Py
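The sgRNA enrichment/depletion step in a workflow of this kind can be illustrated with a generic sketch: normalize read counts to counts per million, then rank sgRNAs by log2 fold change between a treated and a control sample. The abstract does not describe PinAPL-Py's statistical models, so this should not be read as its method; the function names, the pseudocount, and the read counts below are all hypothetical.

```python
import math


def counts_per_million(counts):
    """Scale raw sgRNA read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {sgrna: 1e6 * n / total for sgrna, n in counts.items()}


def log2_fold_change(treated, control, pseudocount=1.0):
    """Per-sgRNA log2 ratio of normalized treated vs. control counts.

    The pseudocount (an illustrative choice) avoids division by zero for
    sgRNAs with no reads in one of the samples.
    """
    t = counts_per_million(treated)
    c = counts_per_million(control)
    return {
        sgrna: math.log2((t.get(sgrna, 0.0) + pseudocount) / (c[sgrna] + pseudocount))
        for sgrna in c
    }


# Hypothetical read counts for three sgRNAs.
control = {"sgA_1": 100, "sgB_1": 100, "sgC_1": 100}
treated = {"sgA_1": 400, "sgB_1": 100, "sgC_1": 25}

lfc = log2_fold_change(treated, control)
# Most enriched sgRNA first, most depleted last.
ranked = sorted(lfc, key=lfc.get, reverse=True)
```

A positive value marks an sgRNA as enriched in the treated sample and a negative value as depleted; a full analysis would add replicate handling and significance testing before ranking genes.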


2021 ◽  
Author(s):  
Thea G. Fennell ◽  
Grace A. Blackwell ◽  
Nicholas R. Thomson ◽  
Matthew J. Dorman

Members of the bacterial genus Vibrio utilise chitin both as a metabolic substrate and a signal to activate natural competence. Vibrio cholerae is a bacterial enteric pathogen, sub-lineages of which can cause pandemic cholera. However, the chitin metabolic pathway in V. cholerae has been dissected using only a limited number of laboratory strains of this species. Here, we survey the complement of key chitin metabolism genes amongst 195 diverse V. cholerae. We show that the gene encoding GbpA, known to be an important colonisation and virulence factor in pandemic isolates, is not ubiquitous amongst V. cholerae. We also identify a putatively novel chitinase, and present experimental evidence in support of its functionality. Our data indicate that the chitin metabolic pathway within the V. cholerae species is more complex than previously thought, and emphasise the importance of considering genes and functions in the context of a species in its entirety, rather than simply relying on traditional reference strains. Impact statement: It is thought that the ability to metabolise chitin is ubiquitous amongst Vibrio spp., and that this enables these species to survive in aqueous and estuarine environmental contexts. Although chitin metabolism pathways have been detailed in several members of this genus, little is known about how these processes vary within a single Vibrio species. Here, we present the distribution of genes encoding key chitinase and chitin-binding proteins across diverse Vibrio cholerae, and show that our canonical understanding of this pathway in this species is challenged when isolates from non-pandemic V. cholerae lineages are considered alongside those linked to pandemics. Furthermore, we show that genes previously thought to be species core genes are not in fact ubiquitous, and we identify novel components of the chitin metabolic cascade in this species, and present functional validation for these observations. Data summary: The authors confirm that all supporting data, code, and protocols have been provided within the article or through supplementary data files. No whole-genome sequencing data were generated in this study. Accession numbers for the publicly-available sequences used for these analyses are listed in Supplementary Table 1, Table 2, and the Methods. All other data which underpin the figures in this manuscript, including pangenome data matrices, modified and unmodified sequence alignments and phylogenetic trees, original images of gels and immunoblots, raw fluorescence data, amplicon sequencing reads, and the R code used to generate Figure 7, are available in Figshare: https://dx.doi.org/10.6084/m9.figshare.13169189 (Note for peer-review: Figshare DOI is inactive but will be activated upon publication, please use temporary URL https://figshare.com/s/7795a2d80c13f694f8fa for review).


2018 ◽  
Author(s):  
Arda Soylev ◽  
Thong Le ◽  
Hajar Amini ◽  
Can Alkan ◽  
Fereydoun Hormozdiari

Motivation: Several algorithms have been developed that use high-throughput sequencing technology to characterize structural variations (SVs). Most of the existing approaches focus on detecting relatively simple types of SVs such as insertions, deletions, and short inversions. In fact, complex SVs are of crucial importance and several have been associated with genomic disorders. To better understand the contribution of complex SVs to human disease, we need new algorithms to accurately discover and genotype such variants. Additionally, due to similar sequencing signatures, inverted duplications or gene conversion events that include inverted segmental duplications are often characterized as simple inversions, and duplications and gene conversions in direct orientation may be called as simple deletions. Therefore, there is still a need for accurate algorithms to fully characterize complex SVs and thus improve the calling accuracy of simpler variants. Results: We developed novel algorithms to accurately characterize tandem, direct and inverted interspersed segmental duplications using short-read whole-genome sequencing data sets. We integrated these methods into our TARDIS tool, which is now capable of detecting various types of SVs using multiple sequence signatures such as read pair, read depth and split read. We evaluated the prediction performance of our algorithms through several experiments using both simulated and real data sets. In the simulation experiments, at 30× coverage, TARDIS achieved 96% sensitivity with only a 4% false discovery rate. For experiments that involve real data, we used two haploid genomes (CHM1 and CHM13) and one human genome (NA12878) from the Illumina Platinum Genomes set. Comparison of our results with orthogonal PacBio call sets from the same genomes revealed higher accuracy for TARDIS than for state-of-the-art methods. Furthermore, we showed a surprisingly low false discovery rate for our approach in predicting tandem, direct and inverted interspersed segmental duplications on CHM1 (less than 5% for the top 50 predictions). Availability: TARDIS source code is available at https://github.com/BilkentCompGen/tardis, and a corresponding Docker image is available at https://hub.docker.com/r/alkanlab/tardis/.
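For readers less familiar with the two metrics reported above, the sketch below shows how sensitivity and false discovery rate are computed from counts of true positives, false negatives and false positives. The counts are hypothetical, chosen only so that the results match the reported 96% and 4%; they are not taken from the TARDIS evaluation.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real variants that were detected: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def false_discovery_rate(true_pos, false_pos):
    """Fraction of reported calls that are wrong: FP / (TP + FP)."""
    return false_pos / (true_pos + false_pos)


# Hypothetical counts: 100 simulated variants, 96 found, 4 missed,
# plus 4 spurious calls among the 100 calls reported.
true_pos, false_neg, false_pos = 96, 4, 4

sens = sensitivity(true_pos, false_neg)
fdr = false_discovery_rate(true_pos, false_pos)
```

Note that the two metrics use different denominators: sensitivity is measured against the set of true variants, while the false discovery rate is measured against the set of calls made.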


2017 ◽  
Author(s):  
Ryan M. Moore ◽  
Amelia O. Harrison ◽  
Sean M. McAllister ◽  
Shawn W. Polson ◽  
K. Eric Wommack

Phylogenetic trees are an important analytical tool for evaluating community diversity and evolutionary history. In the case of microorganisms, the decreasing cost of sequencing has enabled researchers to generate ever-larger sequence datasets, which in turn have begun to fill gaps in the evolutionary history of microbial groups. However, phylogenetic analyses of these types of datasets create complex trees that can be challenging to interpret. Scientific inferences made by visual inspection of phylogenetic trees can be simplified and enhanced by customizing various parts of the tree. Yet, manual customization is time-consuming and error prone, and programs designed to assist in batch tree customization often require programming experience or complicated file formats for annotation. Iroki, a user-friendly web interface for tree visualization, addresses these issues by providing automatic customization of large trees based on metadata contained in tab-separated text files. Iroki’s utility for exploring biological and ecological trends in sequencing data was demonstrated through a variety of microbial ecology applications in which trees with hundreds to thousands of leaf nodes were customized according to extensive collections of metadata. The Iroki web application and documentation are available at https://www.iroki.net or through the VIROME portal (http://virome.dbi.udel.edu). Iroki’s source code is released under the MIT license and is available at https://github.com/mooreryan/iroki.


2015 ◽  
Vol 53 (8) ◽  
pp. 2402-2403 ◽  
Author(s):  
Claire Jenkins

The accessibility of whole-genome sequencing (WGS) presents the opportunity for national reference laboratories to provide a state-of-the-art public health surveillance service. The replacement of traditional serology-based typing of Escherichia coli by WGS is supported by user-friendly, freely available data analysis Web tools. An article in this issue of the Journal of Clinical Microbiology (K. G. Joensen, A. M. M. Tetzschner, A. Iguchi, F. M. Aarestrup, and F. Scheutz, J Clin Microbiol, 53:2410–2426, 2015, http://dx.doi.org/10.1128/JCM.00008-15) describes SerotypeFinder, an essential guide to serotyping E. coli in the 21st century.


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e8584 ◽  
Author(s):  
Ryan M. Moore ◽  
Amelia O. Harrison ◽  
Sean M. McAllister ◽  
Shawn W. Polson ◽  
K. Eric Wommack

Phylogenetic trees are an important analytical tool for evaluating community diversity and evolutionary history. In the case of microorganisms, the decreasing cost of sequencing has enabled researchers to generate ever-larger sequence datasets, which in turn have begun to fill gaps in the evolutionary history of microbial groups. However, phylogenetic analyses of these types of datasets create complex trees that can be challenging to interpret. Scientific inferences made by visual inspection of phylogenetic trees can be simplified and enhanced by customizing various parts of the tree. Yet, manual customization is time-consuming and error prone, and programs designed to assist in batch tree customization often require programming experience or complicated file formats for annotation. Iroki, a user-friendly web interface for tree visualization, addresses these issues by providing automatic customization of large trees based on metadata contained in tab-separated text files. Iroki’s utility for exploring biological and ecological trends in sequencing data was demonstrated through a variety of microbial ecology applications in which trees with hundreds to thousands of leaf nodes were customized according to extensive collections of metadata. The Iroki web application and documentation are available at https://www.iroki.net or through the VIROME portal http://virome.dbi.udel.edu. Iroki’s source code is released under the MIT license and is available at https://github.com/mooreryan/iroki.


2017 ◽  
Author(s):  
Kemal Eren ◽  
Steven Weaver ◽  
Robert Ketteringham ◽  
Morné Valentyn ◽  
Melissa Laird Smith ◽  
...  

Next-generation sequencing of viral populations has advanced our understanding of viral population dynamics, the development of drug resistance, and escape from host immune responses. Many applications require complete gene sequences, which can be impossible to reconstruct from short reads. HIV-1 env, the protein of interest for HIV vaccine studies, is exceptionally challenging for long-read sequencing and analysis due to its length, high substitution rate, and extensive indel variation. While long-read sequencing is attractive in this setting, the analysis of such data is not well handled by existing methods. To address this, we introduce FLEA (Full-Length Envelope Analyzer), which performs end-to-end analysis and visualization of long-read sequencing data. FLEA consists of both a pipeline (optionally run on a high-performance cluster) and a client-side web application that provides interactive results. The pipeline transforms FASTQ reads into high-quality consensus sequences (HQCSs) and uses them to build a codon-aware multiple sequence alignment. The resulting alignment is then used to infer phylogenies, selection pressure, and evolutionary dynamics. The web application provides publication-quality plots and interactive visualizations, including an annotated viral alignment browser, time series plots of evolutionary dynamics, visualizations of gene-wide selective pressures (such as dN/dS) across time and across protein structure, and a phylogenetic tree browser. We demonstrate how FLEA may be used to process Pacific Biosciences HIV-1 env data and describe recent examples of its use. Simulations show how FLEA dramatically reduces the error rate of this sequencing platform, providing an accurate portrait of complex and variable HIV-1 env populations. A public instance of FLEA is hosted at http://flea.datamonkey.org. The Python source code for the FLEA pipeline can be found at https://github.com/veg/flea-pipeline. The client-side application is available at https://github.com/veg/flea-web-app. A live demo of the P018 results can be found at http://flea.murrell.group/view/P018.


2021 ◽  
Vol 7 (6) ◽  
Author(s):  
Einar Gabbassov ◽  
Miguel Moreno-Molina ◽  
Iñaki Comas ◽  
Maxwell Libbrecht ◽  
Leonid Chindelevitch

The occurrence of multiple strains of a bacterial pathogen such as M. tuberculosis or C. difficile within a single human host, referred to as a mixed infection, has important implications for both healthcare and public health. However, methods for detecting it from whole-genome sequencing (WGS) data, and especially for determining the proportions and identities of the underlying strains, have been limited. In this paper we introduce SplitStrains, a novel method for addressing these challenges. Grounded in a rigorous statistical model, SplitStrains not only outperforms existing methods in proportion estimation on both simulated and real M. tuberculosis data, but also successfully determines the identity of the underlying strains. We conclude that SplitStrains is a powerful addition to the existing toolkit of analytical methods for data coming from bacterial pathogens and holds the promise of enabling previously inaccessible conclusions to be drawn in the realm of public health microbiology.

