Varstation: a complete and efficient tool to support NGS data analysis

Abstract. Summary: Varstation is a cloud-based NGS data processor and analyzer for human genetic variation. This resource provides a customizable, centralized, safe and clinically validated environment that aims to improve and optimize the flow of NGS analyses and reports related to clinical and research genetics. Availability and implementation: Varstation is freely available at http://varstation.com, for academic use. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

Haplotype-aware graph indexes

10.1101/559583 ◽

2019 ◽

Cited By ~ 7

Author(s):

Jouni Sirén ◽

Erik Garrison ◽

Adam M. Novak ◽

Benedict Paten ◽

Richard Durbin

Keyword(s):

Genetic Variation ◽

Chromosome 17 ◽

Supplementary Information ◽

Whole Genome ◽

Supplementary Data ◽

1000 Genomes Project ◽

1000 Genomes ◽

Link Type ◽

Supplementary Material ◽

Haplotype Information

Abstract. Motivation: The variation graph toolkit (VG) represents genetic variation as a graph. Although each path in the graph is a potential haplotype, most paths are nonbiological, unlikely recombinations of true haplotypes. Results: We augment the VG model with haplotype information to identify which paths are more likely to exist in nature. For this purpose, we develop a scalable implementation of the graph extension of the positional Burrows–Wheeler transform (GBWT). We demonstrate the scalability of the new implementation by building a whole-genome index of the 5,008 haplotypes of the 1000 Genomes Project, and an index of all 108,070 TOPMed Freeze 5 chromosome 17 haplotypes. We also develop an algorithm for simplifying variation graphs for k-mer indexing without losing any k-mers in the haplotypes. Availability: Our software is available at https://github.com/vgteam/vg, https://github.com/jltsiren/gbwt, and https://github.com/jltsiren/[email protected] Supplementary information: Supplementary data are available.
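The abstract describes GBWT as a graph extension of the positional Burrows–Wheeler transform. As a rough illustration of the underlying idea only, the sketch below computes the positional prefix arrays of the plain (non-graph) transform over a small matrix of binary haplotypes. It is not the GBWT or vg implementation; the function name `pbwt_orders` and the toy data are assumptions made for this example.

```python
from typing import List, Sequence


def pbwt_orders(haplotypes: Sequence[str]) -> List[List[int]]:
    """Return the positional prefix array after each site.

    haplotypes: equal-length strings over {'0', '1'}, one per haplotype.
    orders[k] lists haplotype indices sorted by their reversed prefix
    ending at site k, so haplotypes that share a long match ending at
    site k end up next to each other.
    """
    order = list(range(len(haplotypes)))
    n_sites = len(haplotypes[0]) if haplotypes else 0
    orders: List[List[int]] = []
    for site in range(n_sites):
        # Stable partition by the allele at this site: all '0' carriers
        # first, then all '1' carriers, each keeping the previous order.
        zeros = [h for h in order if haplotypes[h][site] == "0"]
        ones = [h for h in order if haplotypes[h][site] == "1"]
        order = zeros + ones
        orders.append(order)
    return orders


# Hypothetical toy data: four haplotypes over four biallelic sites.
toy_haplotypes = ["0101", "0100", "1101", "0001"]
toy_orders = pbwt_orders(toy_haplotypes)
print(toy_orders[-1])
```

The stable partition at each site is what keeps the update linear in the number of haplotypes; the final order equals a sort of the haplotypes by their reversed sequences.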

P–582 High level of concordance between invasive and non-invasive preimplantation genetic testing for aneuploidies (niPGT-A) at day5 and day6–7

Human Reproduction ◽

10.1093/humrep/deab130.581 ◽

2021 ◽

Vol 36 (Supplement_1) ◽

Author(s):

A Biricik ◽

V Bianchi ◽

F Lecciso ◽

M Surdo ◽

M Manno ◽

...

Keyword(s):

Data Analysis ◽

Embryo Culture ◽

Culture Media ◽

Embryo Biopsy ◽

Non Invasive ◽

Culture Time ◽

Ngs Data Analysis ◽

High Level ◽

Ngs Data

Abstract

Study question: To explore ploidy concordance between invasive and non-invasive PGT-A (niPGT-A) at different embryo culture times.

Summary answer: A high concordance rate (>84%) for ploidy and sex, with sensitivity >88% and specificity of 76%, was obtained for both day 6/7 and day 5 samples.

What is known already: The analysis of embryo cell-free DNA (cfDNA) released into culture media during in vitro embryo development has the potential to evaluate embryo ploidy status. However, obtaining cfDNA of sufficient quality and quantity is essential to achieve interpretable results for niPGT-A. The amount of cfDNA released is expected to be directly proportional to culture time, but embryo culture time is limited by in vitro embryo survival potential. It is therefore important to estimate the culture duration that yields the maximum cfDNA without adversely affecting the development of the embryo.

Study design, size, duration: A total of 105 spent culture media (SCM) samples from day 5 to day 7 blastocyst-stage embryos were included in this cohort study. The cfDNA of the SCM samples was amplified and analyzed for niPGT-A by NGS. The SCM samples were divided into two subgroups according to embryo culture time (Day 5 and Day 6/7 groups). DNA concentration, informativity and euploidy results were then compared with those of the corresponding embryos after trophectoderm (TE) biopsy and PGT-A analysis by NGS.

Participants/materials, setting, methods: Embryos cultured until Day 3 were washed and cultured again in 20 µl of fresh culture media until embryo biopsy on Day 5, 6, or 7. After biopsy, SCM samples were immediately collected in PCR tubes and stored at –20 °C until whole genome amplification by MALBAC® (Yicon Genomics). The TE and SCM samples were analyzed by next-generation sequencing (NGS) using the Illumina MiSeq® System. NGS data analysis was performed with Bluefuse Multi Software 4.5 (Illumina) for both SCM and TE samples.

Main results and the role of chance: Only SCM samples whose corresponding embryo had a conclusive result were included in this cohort (n = 105). Overall, 97.1% (102/105) of SCM samples gave successful DNA amplification, with concentrations ranging from 32.4 to 128.5 ng/µl. Non-informative (NI) results, including chaotic profiles (>5 chromosome aneuploidies), were observed in 17 samples, so 83.3% (85/102) of SCM samples were informative for NGS data analysis. The ploidy concordance rate with the corresponding TE biopsies (euploid vs euploid, aneuploid vs aneuploid) was 84.7% (72/85). Sensitivity and specificity were 92.8% and 76.7%, respectively, with no significant difference in any parameter between day 6/7 and day 5 samples. The false-negative rate was 3.5% (3/85) and the false-positive rate was 11.7% (10/85).

Limitations, reasons for caution: The sample size is relatively small, and larger prospective studies are needed. As this is a single-center study, the impact of variations in embryo culture conditions may be underestimated. Maternal DNA contamination cannot be detected in SCM, so the use of molecular markers would increase reliability.

Wider implications of the findings: Non-invasive analysis of embryo cfDNA in spent culture media shows high concordance with TE biopsy results at both early and late culture times. A non-invasive approach to aneuploidy screening offers important advantages, such as avoiding invasive embryo biopsy and reducing cost, potentially increasing accessibility for a wider patient population.

Trial registration number: Not applicable
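The percentages in the main results can be cross-checked with a small worked calculation. The abstract reports 85 informative samples, 72 concordant calls, 3 false negatives and 10 false positives, but not the split of the concordant calls. The sketch below assumes 39 true positives and 33 true negatives (aneuploid taken as the positive class), a split inferred here because it is the one consistent with the reported sensitivity and specificity; it is not stated in the source.

```python
def diagnostic_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute concordance and related rates from a 2x2 confusion table."""
    total = tp + tn + fp + fn
    return {
        "concordance": (tp + tn) / total,    # calls agreeing with TE biopsy
        "sensitivity": tp / (tp + fn),       # aneuploid embryos detected
        "specificity": tn / (tn + fp),       # euploid embryos confirmed
        "false_negative_share": fn / total,  # reported over all 85 samples
        "false_positive_share": fp / total,  # reported over all 85 samples
    }


# tp and tn are inferred assumptions; fp, fn and the total come from the abstract.
metrics = diagnostic_metrics(tp=39, tn=33, fp=10, fn=3)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Under this assumed split the values agree with the abstract to within rounding: 39/42 and 10/85 round to 92.9% and 11.8%, whereas the abstract gives 92.8% and 11.7%, which suggests the published figures were truncated rather than rounded.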

Appendix A: Common File Types Used in Next-Generation Sequencing (NGS) Data Analysis

Next-Generation Sequencing Data Analysis ◽

10.1201/b19532-20 ◽

2016 ◽

pp. 199-202

Keyword(s):

Data Analysis ◽

Next Generation Sequencing ◽

Next Generation ◽

Ngs Data Analysis ◽

Next Generation Sequencing Ngs ◽

Ngs Data ◽

Generation Sequencing

Bio-Docklets: Virtualization Containers for Single-Step Execution of NGS Pipelines

10.1101/116962 ◽

2017 ◽

Cited By ~ 3

Author(s):

Baekdoo Kim ◽

Thahmina Ali ◽

Carlos Lijeron ◽

Enis Afgan ◽

Konstantinos Krampis

Keyword(s):

Data Analysis ◽

Cloud Service ◽

Application Programming Interface ◽

Single Step ◽

Easy Access ◽

Complex Data ◽

The Galaxy ◽

Ngs Data Analysis ◽

Single Data ◽

Ngs Data

Abstract. Background: Processing of Next-Generation Sequencing (NGS) data requires significant technical skills, involving installation, configuration, and execution of bioinformatics data pipelines, in addition to specialized post-analysis visualization and data mining software. To address some of these challenges, developers have leveraged virtualization containers for seamless deployment of preconfigured bioinformatics software and pipelines on any computational platform. Findings: We present an approach for abstracting the complex data operations of multi-step bioinformatics pipelines for NGS data analysis. As examples, we have deployed two pipelines for RNAseq and CHIPseq, pre-configured within Docker virtualization containers we call Bio-Docklets. Each Bio-Docklet exposes a single data input and output endpoint, and from a user perspective, running the pipelines is as simple as running a single bioinformatics tool. This is achieved through a "meta-script" that automatically starts the Bio-Docklets and controls the pipeline execution through the BioBlend software library and the Galaxy Application Programming Interface (API). The pipeline output is post-processed using the Visual Omics Explorer (VOE) framework, providing interactive data visualizations that users can access through a web browser. Conclusions: The goal of our approach is to enable easy access to NGS data analysis pipelines for non-bioinformatics experts, on any computing environment, whether a laboratory workstation, university computer cluster, or a cloud service provider. Besides end-users, Bio-Docklets also enable developers to programmatically deploy and run a large number of pipeline instances for concurrent analysis of multiple datasets.

FastqCleaner: an interactive Bioconductor application for quality-control, filtering and trimming of FASTQ files

10.1101/393140 ◽

2018 ◽

Cited By ~ 1

Author(s):

Leandro Gabriel Roser ◽

Fernán Agüero ◽

Daniel Oscar Sánchez

Keyword(s):

Quality Control ◽

Data Analysis ◽

The Novel ◽

Web Environment ◽

Ngs Data Analysis ◽

Novel Concept ◽

Next Generation Sequencing Ngs ◽

User Friendly ◽

Ngs Data ◽

Analysis Platform

Abstract. Background: Exploration and processing of FASTQ files are the first steps in state-of-the-art data analysis workflows of Next Generation Sequencing (NGS) platforms. The large amount of data generated by these technologies poses a challenge in terms of rapid analysis and visualization of sequencing information. Recent integration of the R data analysis platform with web visual frameworks has stimulated the development of user-friendly, powerful, and dynamic NGS data analysis applications. Results: This paper presents FastqCleaner, a Bioconductor visual application for both quality control (QC) and pre-processing of FASTQ files. The interface shows diagnostic information for the input and output data and allows the user to select a series of filtering and trimming operations in an interactive framework. FastqCleaner combines the technology of Bioconductor for NGS data analysis with the data visualization advantages of a web environment. Conclusions: FastqCleaner is a user-friendly, offline-capable tool that enables access to advanced Bioconductor infrastructure. The novel concept of a Bioconductor interactive application that can be used without programming skills makes FastqCleaner a valuable resource for NGS data analysis.
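To make "filtering and trimming operations" concrete, here is a minimal sketch of two common FASTQ pre-processing steps: trimming low-quality bases from the 3' end and discarding reads that are too short or contain N. It is written in Python for illustration and does not reflect FastqCleaner's actual R/Bioconductor interface; the function names, the thresholds, and the assumption of Phred+33 quality encoding are all choices made for this example.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from well-formed four-line FASTQ input."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def trim_3prime(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Remove trailing bases whose Phred score is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def clean(records: Iterable[Record], min_q: int = 20, min_len: int = 4) -> Iterator[Record]:
    """Trim each read, then keep it only if long enough and free of N."""
    for header, seq, qual in records:
        seq, qual = trim_3prime(seq, qual, min_q)
        if len(seq) >= min_len and "N" not in seq:
            yield header, seq, qual


# Hypothetical reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
example = [
    "@read1", "ACGTACGT", "+", "IIIIII##",
    "@read2", "ACGT", "+", "####",
    "@read3", "ACNTACGT", "+", "IIIIIIII",
]
kept = list(clean(parse_fastq(example)))
print(kept)
```

In this toy input only the first read survives: its two low-quality trailing bases are trimmed, the second read is trimmed to nothing, and the third is dropped because it contains an N.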

miND pipeline AWS EC2 installation and setup v2

10.17504/protocols.io.b3f6qjre ◽

2022 ◽

Author(s):

Andreas B Diendorfer ◽

Kseniya Khamina ◽

Marianne Pultar

Keyword(s):

Data Analysis ◽

Public Repository ◽

Sequencing Data ◽

Analysis Pipeline ◽

Ngs Data Analysis ◽

Ngs Data ◽

Data Analysis Pipeline

miND is an NGS data analysis pipeline for smallRNA sequencing data. In this protocol, the pipeline is set up and run on an AWS EC2 instance with example data from a public repository. Please see the publication on F1000 for more details on the pipeline and how to use it.

MetumpX—a metabolomics support package for untargeted mass spectrometry

Bioinformatics ◽

10.1093/bioinformatics/btz765 ◽

2019 ◽

Vol 36 (5) ◽

pp. 1647-1648 ◽

Cited By ~ 1

Author(s):

Bilal Wajid ◽

Hasan Iqbal ◽

Momina Jamil ◽

Hafsa Rafique ◽

Faria Anwar

Keyword(s):

Mass Spectrometry ◽

Data Analysis ◽

Small Molecules ◽

Software Package ◽

Life Sciences ◽

Supplementary Information ◽

Supplementary Data ◽

Software Packages ◽

Develop Software ◽

User Friendly

Abstract. Motivation: Metabolomics is a data analysis and interpretation field that aims to study the functions of small molecules within the organism. Consequently, metabolomics requires life-science researchers to be comfortable downloading, installing and scripting software that is mostly not user-friendly and lacks basic GUIs. As researchers struggle with these skills, there is a pressing need for software packages that can automatically install software pipelines, shortening the time needed to build software workstations. This paper therefore presents MetumpX, a software package that eases the installation of 103 software tools by automatically resolving their individual dependencies while allowing users to choose which software works best for them. Results: MetumpX is an Ubuntu-based software package that facilitates easy download and installation of 103 tools spread across the standard metabolomics pipeline. As far as the authors know, MetumpX is the only solution of its kind in which the focus lies on automating the development of software workstations. Availability and implementation: https://github.com/hasaniqbal777/MetumpX-bin. Supplementary information: Supplementary data are available at Bioinformatics online.

Fuzzy Indication of Reliability in Metagenomics NGS Data Analysis

Procedia Computer Science ◽

10.1016/j.procs.2015.05.448 ◽

2015 ◽

Vol 51 ◽

pp. 2859-2863 ◽

Cited By ~ 1

Author(s):

Milko Krachunov ◽

Dimitar Vassilev ◽

Maria Nisheva ◽

Ognyan Kulev ◽

Valeriya Simeonova ◽

...

Keyword(s):

Data Analysis ◽

Ngs Data Analysis ◽

Ngs Data

Crosslink: A fast, scriptable genetic mapper for outcrossing species

10.1101/135277 ◽

2017 ◽

Cited By ~ 6

Author(s):

Robert J. Vickerstaff ◽

Richard J. Harrison

Keyword(s):

Large Datasets ◽

Supplementary Information ◽

Supplementary Data ◽

Link Type ◽

Mapping Software ◽

Outcrossing Species ◽

Supplementary Material ◽

Novel Approaches ◽

Similar Accuracy ◽

General Public License

Abstract. Summary: Crosslink is genetic mapping software for outcrossing species, designed to run efficiently on large datasets by combining the best from existing tools with novel approaches. Tests show it runs much faster than several comparable programs whilst retaining similar accuracy. Availability and implementation: Available under the GNU General Public License version 2 from https://github.com/eastmallingresearch/[email protected] Supplementary information: Supplementary data are available at Bioinformatics online and from https://github.com/eastmallingresearch/crosslink/releases/tag/v0.5.

pyseer: a comprehensive tool for microbial pangenome-wide association studies

10.1101/266312 ◽

2018 ◽

Cited By ~ 1

Author(s):

John A Lees ◽

Marco Galardini ◽

Stephen D Bentley ◽

Jeffrey N Weiser ◽

Jukka Corander

Keyword(s):

Input Data ◽

Association Studies ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Supplementary Data ◽

New Methods ◽

Link Type ◽

Genome Wide

Abstract. Summary: Genome-wide association studies (GWAS) in microbes face different challenges from those in eukaryotes and have been addressed by a number of different methods. pyseer brings these techniques together in one package tailored to microbial GWAS, allows greater flexibility in the input data used, and adds new methods to interpret the association results. Availability and implementation: pyseer is written in Python and is freely available at https://github.com/mgalardini/pyseer, or can be installed through pip. Documentation and a tutorial are available at http://[email protected] and [email protected] Supplementary information: Supplementary data are available online.
