bioinformatic pipeline Latest Research Papers

We established a protocol for fast, cost-efficient SARS-CoV-2 sequencing with as little hands-on time as possible (around 3 h in total, excluding RNA extraction). The whole sequencing run, including the bioinformatic pipeline, can be completed in one working day. The cost per sample is around $40, starting from already isolated RNA. We adapted and simplified existing workflows using the ‘midnight’ 1,200 bp amplicon split primer sets for PCR, which produce tiled overlapping amplicons covering almost all of the SARS-CoV-2 genome. Subsequently, we applied the Oxford Nanopore Rapid barcoding protocol and the portable MinION Mk1C sequencer in combination with the ARTIC bioinformatics pipeline. We tested the simplified and less time-consuming workflow on confirmed SARS-CoV-2-positive specimens from clinical routine and identified pre-analytical parameters that may help to decrease the rate of sequencing failures. The complete pipeline took approx. 7 h for one specimen and approx. 11 h for 12 multiplexed barcoded specimens. This protocol is a modified version of Nikki Freed and Olin Silander's protocol; for details such as primers, see their protocol: Nikki Freed, Olin Silander 2020. nCoV-2019 sequencing protocol (RAPID barcoding, 1200bp amplicon). doi: 10.1093/biomethods/bpaa014. Our peer-reviewed paper is available here: https://www.mdpi.com/2076-2607/9/12/2598

nf-rnaSeqCount: A Nextflow pipeline for obtaining raw read counts from RNA-seq data

South African Computer Journal ◽ 10.18489/sacj.v33i2.830 ◽ 2021 ◽ Vol 33 (2)

Author(s): Phelelani Mpangase ◽ Jacqueline Frost ◽ Mohammed Tikly ◽ Michèle Ramsay ◽ Scott Hazelhurst

Keyword(s): Differential Expression Analysis ◽ Workflow Management ◽ RNA Seq ◽ Bioinformatic Pipeline ◽ Data Analyses ◽ Sequence Production ◽ Differential Gene ◽ Next Generation Sequencing (NGS) ◽ Computational Analyses ◽ Improved Technology

The rate of raw sequence production through Next-Generation Sequencing (NGS) has been growing exponentially due to improved technology and reduced costs. This has enabled researchers to answer many biological questions through "multi-omics" data analyses. Even though such data promise new insights into how biological systems function and into disease mechanisms, computational analyses performed on such large datasets come with challenges and potential pitfalls. The aim of this study was to develop a robust, portable and reproducible bioinformatic pipeline for the automation of RNA sequencing (RNA-seq) data analyses. Using Nextflow as a workflow management system and Singularity for application containerisation, the nf-rnaSeqCount pipeline was developed for mapping raw RNA-seq reads to a reference genome and quantifying the abundance of identified genomic features for differential gene expression analyses. The pipeline provides a quick and efficient way to obtain a matrix of read counts that can be used with tools such as DESeq2 and edgeR for differential expression analysis. Robust and flexible bioinformatic and computational pipelines for RNA-seq data analysis, from QC to sequence alignment and comparative analyses, will reduce analysis time and increase the accuracy and reproducibility of findings to promote transcriptome research.
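
To make the phrase "matrix of read counts" concrete, the following is a minimal sketch in plain Python of how per-sample counts could be combined into a genes-by-samples table. It is not code from nf-rnaSeqCount (which is built on Nextflow); the function name `build_count_matrix` and the toy sample and gene names are hypothetical and chosen only for illustration.

```python
def build_count_matrix(per_sample_counts):
    """Combine {sample: {gene: count}} into a genes x samples matrix.

    Genes missing from a sample are given a count of 0.
    """
    samples = sorted(per_sample_counts)
    genes = sorted({g for counts in per_sample_counts.values() for g in counts})
    matrix = [
        [per_sample_counts[s].get(g, 0) for s in samples]
        for g in genes
    ]
    return genes, samples, matrix


# Hypothetical toy input: two samples with partly overlapping genes.
toy_counts = {
    "sampleA": {"geneX": 10, "geneY": 0},
    "sampleB": {"geneX": 7, "geneZ": 3},
}

genes, samples, matrix = build_count_matrix(toy_counts)
for gene, row in zip(genes, matrix):
    print(gene, row)
```

Rows correspond to genes and columns to samples, which is the orientation that count-based differential expression tools generally expect.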

SNPfiltR: an R package for interactive and reproducible SNP filtering

10.22541/au.163976415.53888836/v1 ◽ 2021

Author(s): Devon DeRaad

Keyword(s): State Of The Art ◽ R Package ◽ Working Environment ◽ The Novel ◽ Bioinformatic Pipeline ◽ Reduced Representation ◽ Performance Benchmarking ◽ R Programming Language ◽ Current State ◽ R Programming

Here I describe the novel R package SNPfiltR and demonstrate its functionalities as the backbone of a customizable, reproducible SNP filtering pipeline implemented exclusively via the widely adopted R programming language. SNPfiltR extends existing SNP filtering functionalities by automating the visualization of key parameters such as depth, quality, and missing data, then allowing users to set filters based on optimized thresholds, all within a single, cohesive working environment. All SNPfiltR functions require a vcfR object as input, which can be easily generated by reading a SNP dataset stored as a standard vcf file into an R working environment using the function read.vcfR() from the R package vcfR. Performance benchmarking reveals that for moderately sized SNP datasets (up to 50M genotypes with associated quality information), SNPfiltR performs filtering with comparable efficiency to current state-of-the-art command-line-based programs. These benchmarking results indicate that for most reduced-representation genomic datasets, SNPfiltR is an ideal choice for investigating, visualizing, and filtering SNPs as part of a cohesive and easily documentable bioinformatic pipeline. The SNPfiltR package can be downloaded from CRAN with the command install.packages("SNPfiltR"), and a development version is available from GitHub at github.com/DevonDeRaad/SNPfiltR. Additionally, thorough documentation for SNPfiltR, including multiple comprehensive vignettes, is available at devonderaad.github.io/SNPfiltR/.

ClusTRace, a bioinformatic pipeline for analyzing clusters in virus phylogenies

10.1101/2021.12.09.471941 ◽ 2021

Author(s): Ilya Plyusnin ◽ Phuoc Thien Truong Nguyen ◽ Tarja Sironen ◽ Olli Vapalahti ◽ Teemu Smura ◽ ...

Keyword(s): Variant Calling ◽ Etiologic Agent ◽ Tree Reconstruction ◽ Bioinformatic Pipeline ◽ Transmission Chain ◽ Main Emphasis ◽ Depth Analysis ◽ Public Health Challenge ◽ High Level ◽ Scalable Analysis

Summary: SARS-CoV-2 is the highly transmissible etiologic agent of coronavirus disease 2019 (COVID-19) and has become a global scientific and public health challenge since December 2019. Several new variants of SARS-CoV-2 have emerged globally, raising concern about prevention and treatment of COVID-19. Early detection and in-depth analysis of the emerging variants, allowing pre-emptive alerts and mitigation efforts, are thus of paramount importance. Here we present ClusTRace, a novel bioinformatic pipeline for fast and scalable analysis of sequence clusters or clades in large viral phylogenies. ClusTRace offers several high-level functionalities including outlier filtering, aligning, phylogenetic tree reconstruction, cluster or clade extraction, variant calling, visualization and reporting. ClusTRace was developed as an aid for COVID-19 transmission chain tracing in Finland, and the main emphasis has been on fast and unsupervised screening of phylogenies for markers of super-spreading events and other features of concern, such as high rates of cluster growth and/or accumulation of novel mutations. Availability: All code is freely available from https://bitbucket.org/plyusnin/clustrace/

Tailer: A Pipeline for Sequencing-Based Analysis of Non-Polyadenylated RNA 3’ End Processing

10.1101/2021.12.06.471174 ◽ 2021

Author(s): Tim Nicholson-Shaw ◽ Jens Lykke-Andersen

Keyword(s): Quality Control ◽ Genome Annotation ◽ Data Exploration ◽ Bioinformatic Pipeline ◽ Bioinformatic Tools ◽ Polyadenylated RNA ◽ RNA Targets ◽ Non Coding RNAs ◽ Polyadenylated RNAs ◽ Generation Sequencing

Post-transcriptional trimming and tailing of RNA 3’ ends play key roles in the processing and quality control of non-coding RNAs (ncRNAs). However, bioinformatic tools to examine changes in the RNA 3’ “tailome” are sparse and not standardized. Here we present Tailer, a bioinformatic pipeline in two parts that allows for robust quantification and analysis of tail information from next-generation sequencing experiments that preserve RNA 3’ end information. The first part of Tailer, Tailer-Processing, uses genome annotation or reference FASTA gene sequences to quantify RNA 3’ ends from SAM-formatted alignment files or FASTQ sequence read files produced from sequencing experiments. The second part, Tailer-Analysis, uses the output of Tailer-Processing to identify statistically significant RNA targets of trimming and tailing and to create graphs for data exploration. We apply Tailer to RNA 3’ end sequencing experiments from three published studies and find that it accurately and reproducibly recapitulates key findings. Thus, Tailer should be a useful and easily accessible tool to globally investigate tailing dynamics of non-polyadenylated RNAs and conditions that perturb them.
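
As an illustration of what quantifying a 3’ end can mean, here is a deliberately simplified Python sketch. It is not Tailer's algorithm: it assumes a read has already been placed at a known start position on a reference gene sequence, treats the longest exactly matching stretch as templated, reports the 3’ end position relative to the annotated end (negative values mean trimming), and calls the remaining read bases the tail. The function name `describe_three_prime_end` and the toy sequences are hypothetical.

```python
def describe_three_prime_end(reference, read, start):
    """Return (end_offset, tail) for a read placed at `start` on `reference`.

    end_offset: templated 3' end relative to the annotated end of the
                reference (0 = full length, negative = trimmed).
    tail:       read bases after the templated (matching) portion.
    """
    if not 0 <= start <= len(reference):
        raise ValueError("start must lie within the reference")
    matched = 0
    # Extend the templated portion while read and reference agree.
    while (
        matched < len(read)
        and start + matched < len(reference)
        and read[matched] == reference[start + matched]
    ):
        matched += 1
    end_offset = (start + matched) - len(reference)
    return end_offset, read[matched:]


# Hypothetical toy example: the read stops one base short of the
# annotated end and carries one extra, non-templated base.
toy_reference = "ACGTACGTAC"
toy_read = "GTACGTAA"
offset, tail = describe_three_prime_end(toy_reference, toy_read, start=2)
print(offset, tail)
```

A real tool would work from alignment records and handle mismatches within the body of the read; exact matching is used here only to keep the sketch short.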

A Bioinformatic Workflow for InDel Analysis in the Wheat Multi-Copy α-Gliadin Gene Family Engineered with CRISPR/Cas9

International Journal of Molecular Sciences ◽ 10.3390/ijms222313076 ◽ 2021 ◽ Vol 22 (23) ◽ pp. 13076

Author(s): María H. Guzmán-López ◽ Miriam Marín-Sanz ◽ Susana Sánchez-León ◽ Francisco Barro

Keyword(s): Gene Family ◽ Amplicon Sequencing ◽ Wild Type ◽ Bioinformatic Pipeline ◽ Cleavage Activity ◽ Large Deletions ◽ Wheat Sensitivity ◽ Wheat Lines ◽ Multiple Samples

The α-gliadins of wheat, along with other gluten components, are responsible for the viscoelastic properties of bread. However, they are also related to human pathologies such as celiac disease or non-celiac wheat sensitivity. CRISPR/Cas has been used successfully to knock out α-gliadin genes in bread and durum wheat, thereby producing low-gluten wheat lines. Nevertheless, the mutation analysis of these genes is complex, as they are present as multiple, highly homologous copies arranged in tandem in the A, B, and D subgenomes. In this work, we present a bioinformatic pipeline based on NGS amplicon sequencing for the analysis of insertions and deletions (InDels) in α-gliadin genes targeted with two single guide RNAs (sgRNAs). This approach allows the identification of mutated amplicons and the analysis of InDels through comparison to the most similar wild-type parental sequence. TMM normalization was performed for inter-sample comparisons, making it possible to study the abundance of each InDel across generations and to observe the effects of the segregation of the Cas9 coding sequence in different lines. The workflow is also useful for identifying possible genomic rearrangements, such as large deletions due to Cas9 cleavage activity. This pipeline enables fast characterization of mutations in multiple samples for a multi-copy gene family.
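
The idea of comparing each amplicon to its most similar wild-type sequence can be sketched in Python as follows. This is not the authors' workflow: it uses the standard-library `difflib.SequenceMatcher` similarity ratio as a stand-in for whatever similarity measure the pipeline applies, and the function name `closest_wild_type` and the toy sequences are hypothetical.

```python
from difflib import SequenceMatcher


def closest_wild_type(amplicon, wild_types):
    """Return (name, similarity) of the wild-type sequence most similar
    to `amplicon`, using difflib's ratio as an illustrative score."""
    if not wild_types:
        raise ValueError("at least one wild-type sequence is required")
    scored = [
        (SequenceMatcher(None, amplicon, seq).ratio(), name)
        for name, seq in wild_types.items()
    ]
    best_score, best_name = max(scored)
    return best_name, best_score


# Hypothetical toy data: the amplicon resembles wtA with one base deleted.
toy_wild_types = {
    "wtA": "ACGTACGTAC",
    "wtB": "TTTTGGGGCC",
}
name, score = closest_wild_type("ACGTACTAC", toy_wild_types)
print(name, round(score, 3))
```

Once the closest parental copy is chosen, insertions and deletions can be called against that copy alone, which avoids attributing differences between gene copies to editing.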

PhyloHerb: A phylogenomic pipeline for processing genome skimming data for plants

10.1101/2021.11.29.470431 ◽ 2021

Author(s): Liming Cai ◽ Hongrui Zhang ◽ Charles C Davis

Keyword(s): Genomic DNA ◽ High Throughput Sequencing ◽ Data Partitioning ◽ Published Data ◽ Herbarium Specimens ◽ Bioinformatic Pipeline ◽ Biodiversity Research ◽ BLAST Search ◽ Genome Skimming ◽ Low Coverage

Premise of the study: The application of high-throughput sequencing, especially to herbarium specimens, is greatly accelerating biodiversity research. Among various techniques, low-coverage Illumina sequencing of total genomic DNA (genome skimming) can simultaneously recover the plastid, mitochondrial, and nuclear ribosomal regions across hundreds of species. Here, we introduce PhyloHerb, a bioinformatic pipeline to efficiently and effectively assemble phylogenomic datasets derived from genome skimming. Methods and Results: PhyloHerb uses either a built-in database or user-specified references to extract orthologous sequences using BLAST search. It outputs FASTA files and offers a suite of utility functions to assist with alignment, data partitioning, concatenation, and phylogeny inference. The program is freely available at https://github.com/lmcai/PhyloHerb/. Conclusions: Using published data from Clusiaceae, we demonstrated that PhyloHerb can accurately identify genes using highly fragmented assemblies derived from sequencing older herbarium specimens. Our approach is effective at all taxonomic depths and is scalable to thousands of species.
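
Concatenation and data partitioning, two of the utility steps mentioned above, can be illustrated with a short Python sketch. This is not PhyloHerb's code or interface: the function name `concatenate_alignments`, the choice to order loci alphabetically, and the toy locus and taxon names are assumptions made for the example. Taxa missing from a locus are padded with gaps, and each locus is recorded as a 1-based, inclusive coordinate range in the concatenated matrix.

```python
def concatenate_alignments(alignments):
    """Concatenate per-locus alignments into one supermatrix.

    alignments: {locus: {taxon: aligned_sequence}}
    Returns (supermatrix, partitions) where supermatrix maps each taxon
    to its concatenated sequence and partitions is a list of
    (locus, start, end) tuples with 1-based inclusive coordinates.
    """
    taxa = sorted({t for aln in alignments.values() for t in aln})
    pieces = {t: [] for t in taxa}
    partitions = []
    start = 1
    for locus in sorted(alignments):
        aln = alignments[locus]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences in locus {locus!r} differ in length")
        length = lengths.pop()
        for taxon in taxa:
            # Pad taxa that lack this locus with gap characters.
            pieces[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((locus, start, start + length - 1))
        start += length
    supermatrix = {t: "".join(parts) for t, parts in pieces.items()}
    return supermatrix, partitions


# Hypothetical toy alignments for two loci and three taxa.
toy_alignments = {
    "rbcL": {"sp1": "ATGC", "sp2": "ATGA"},
    "ITS": {"sp1": "GG-T", "sp3": "GGAT"},
}
supermatrix, partitions = concatenate_alignments(toy_alignments)
print(partitions)
for taxon, seq in supermatrix.items():
    print(taxon, seq)
```

Keeping the partition coordinates alongside the supermatrix lets downstream phylogeny software fit a separate substitution model to each locus.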

Estimated prevalence of Niemann–Pick type C disease in Quebec

Scientific Reports ◽ 10.1038/s41598-021-01966-0 ◽ 2021 ◽ Vol 11 (1)

Author(s): Marjorie Labrecque ◽ Lahoud Touma ◽ Claude Bhérer ◽ Antoine Duquette ◽ Martine Tétreault

Keyword(s): Autosomal Recessive ◽ Age Of Onset ◽ Sequencing Data ◽ Bioinformatic Pipeline ◽ Metropolitan Regions ◽ Pathogenic Variants ◽ Hardy Weinberg Equilibrium ◽ Type C ◽ Niemann Pick ◽ Psychiatric Problems

Niemann–Pick type C (NP-C) disease is an autosomal recessive disease caused by variants in the NPC1 or NPC2 genes. It has a large range of symptoms depending on age of onset, which makes it difficult to diagnose. In adults, symptoms appear mainly in the form of psychiatric problems. The prevalence varies from 0.35 to 2.2 per 100,000 births depending on the country. The aim of this study is to estimate the prevalence of NP-C in Quebec to determine whether it is underdiagnosed in this population. The CARTaGENE database is a unique database that groups individuals between 40 and 69 years old from metropolitan regions of Quebec. RNA-sequencing data were available for 911 individuals and exome sequencing data for 198 individuals. We used a bioinformatic pipeline on those individuals to extract the variants in the NPC1/2 genes. The prevalence in Quebec was estimated assuming Hardy–Weinberg equilibrium. Two pathogenic variants were used. The variant p.Pro543Leu was found in three heterozygous individuals who share a common haplotype, which suggests a founder French-Canadian pathogenic variant. The variant p.Ile1061Thr was found in two heterozygous individuals. Both variants have previously been reported and are usually associated with infantile onset. The estimated prevalence calculated using those two variants is 0.61 per 100,000 births. This study represents the first estimate of NP-C prevalence in Quebec. The prevalence of NP-C is likely underestimated due to misdiagnosis or missed cases. It is therefore important to diagnose all NP-C patients to initiate early treatment.
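
The Hardy–Weinberg reasoning behind such an estimate can be written out in a few lines of Python. This is a generic sketch, not the authors' calculation: it assumes every carrier is heterozygous, pools all pathogenic alleles into one frequency q = carriers / (2 × individuals), and takes q² as the expected frequency of affected (biallelic) births. The function name and the example numbers are hypothetical, and the sketch does not attempt to reproduce the 0.61 per 100,000 figure, because the abstract does not state which denominators were used for each variant.

```python
def hwe_prevalence_per_100k(carrier_count, individuals):
    """Expected prevalence of a recessive disease per 100,000 births
    under Hardy-Weinberg equilibrium.

    carrier_count: number of heterozygous carriers observed
    individuals:   number of individuals screened
    """
    if individuals <= 0:
        raise ValueError("individuals must be positive")
    if not 0 <= carrier_count <= individuals:
        raise ValueError("carrier_count must be between 0 and individuals")
    # Each heterozygous carrier contributes one pathogenic allele
    # out of 2 * individuals alleles in total.
    q = carrier_count / (2 * individuals)
    # Under Hardy-Weinberg equilibrium, affected genotypes occur at q^2.
    return q * q * 100_000


# Hypothetical illustration: 5 carriers among 1,000 screened individuals.
example = hwe_prevalence_per_100k(carrier_count=5, individuals=1000)
print(round(example, 3))
```

Because the result scales with the square of the allele frequency, small changes in the number of carriers or in the size of the screened cohort shift the estimate considerably.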

Bloodstream infections' origins following fecal microbiota transplantation: a strain-level analysis

Blood Advances ◽ 10.1182/bloodadvances.2021005110 ◽ 2021

Author(s): Adi Eshel ◽ Itai Sharon ◽ Arnon Nagler ◽ David Bomze ◽ Ivetta Danylesko ◽ ...

Keyword(s): Escherichia Coli ◽ Fecal Microbiota Transplantation ◽ Bloodstream Infections ◽ Fecal Microbiota ◽ Immunocompromised Patients ◽ Bioinformatic Pipeline ◽ Graft Versus Host ◽ Stool Samples ◽ Level Analysis ◽ Rule Out

We observed high rates of bloodstream infections (BSIs) following fecal microbiota transplantation (FMT) for graft-versus-host disease (33 events in 22 patients). To trace the origin of the BSIs, we applied a metagenomic bioinformatic pipeline to screen donor and recipient stool samples for bacteremia-causing strains in 13 cases. Offending strains were not detected in FMT donations. Enterococcus faecium, Escherichia coli, Pseudomonas aeruginosa, and Acinetobacter baumannii could be detected in stool samples before emerging in the blood. In this largest report of BSIs post-FMT, we present an approach that may be applicable for evaluating BSI origin following microbiota-based interventions. Our findings support FMT safety in immunocompromised patients but do not rule out FMT as an inducer of bacterial translocation.

Utilizing the VirIdAl Pipeline to Search for Viruses in the Metagenomic Data of Bat Samples

Viruses ◽ 10.3390/v13102006 ◽ 2021 ◽ Vol 13 (10) ◽ pp. 2006

Author(s): Anna Y Budkina ◽ Elena V Korneenko ◽ Ivan A Kotov ◽ Daniil A Kiselev ◽ Ilya V Artyushin ◽ ...

Keyword(s): Large Scale ◽ High Throughput Sequencing ◽ Metagenomic Data ◽ Sequencing Data ◽ Viral Pathogens ◽ Genomic Databases ◽ Bioinformatic Pipeline ◽ Viral Genomes ◽ Sequencing Technologies ◽ Viral Screening

According to various estimates, only a small percentage of existing viruses have been discovered, and even fewer are represented in genomic databases. High-throughput sequencing technologies are developing rapidly, enabling large-scale screening of various biological samples for the presence of pathogen-associated nucleotide sequences, but many organisms have yet to be assigned specific loci for identification. This problem particularly impedes viral screening because of the vast heterogeneity of viral genomes. In this paper, we present a new bioinformatic pipeline, VirIdAl, for detecting and identifying viral pathogens in sequencing data. We also demonstrate the utility of the new software by applying it to viral screening of feces from bats collected in the Moscow region, which revealed a significant variety of viruses associated with bats, insects, plants, and protozoa. The presence of alpha- and betacoronavirus reads, including the MERS-like bat virus, deserves special mention, as it once again indicates that bats are reservoirs for many viral pathogens. In addition, alignment-based methods were unable to identify the taxon for a large proportion of reads; we therefore applied other approaches and showed that they can further reveal the presence of viral agents in sequencing data. However, the incompleteness of viral databases remains a significant problem in studies of viral diversity and necessitates the use of combined approaches, including those based on machine learning methods.

bioinformatic pipeline
Recently Published Documents

nanopore nCoV-2019 sequencing protocol (RAPID barcoding, 1200bp amplicon, combined RT-PCR) v2

nf-rnaSeqCount: A Nextflow pipeline for obtaining raw read counts from RNA-seq data

SNPfiltR: an R package for interactive and reproducible SNP filtering

ClusTRace, a bioinformatic pipeline for analyzing clusters in virus phylogenies

Tailer: A Pipeline for Sequencing-Based Analysis of Non-Polyadenylated RNA 3’ End Processing

A Bioinformatic Workflow for InDel Analysis in the Wheat Multi-Copy α-Gliadin Gene Family Engineered with CRISPR/Cas9

PhyloHerb: A phylogenomic pipeline for processing genome skimming data for plants

Estimated prevalence of Niemann–Pick type C disease in Quebec

Bloodstream infections' origins following fecal microbiota transplantation: a strain-level analysis

Utilizing the VirIdAl Pipeline to Search for Viruses in the Metagenomic Data of Bat Samples
