bioinformatic pipeline
Recently Published Documents


Total documents: 94 (five years: 72)

H-index: 7 (five years: 2)

2021 ◽  
Author(s):  
Anton Pembaur ◽  
Erwan Sallard ◽  
Patrick Weil ◽  
Jennifer Ortelt ◽  
Parviz Ahmad-Nejad ◽  
...  

We established a protocol for fast, cost-efficient SARS-CoV-2 sequencing with as little hands-on time as possible (around 3 h in total, excluding RNA extraction). The whole sequencing workflow, including the bioinformatic pipeline, can be completed in one working day. The cost per sample is around $40 when starting from already isolated RNA. We adapted and simplified existing workflows using the ‘midnight’ 1,200 bp amplicon split primer sets for PCR, which produce tiled overlapping amplicons covering almost all of the SARS-CoV-2 genome. Subsequently, we applied the Oxford Nanopore Rapid barcoding protocol and the portable MinION Mk1C sequencer in combination with the ARTIC bioinformatics pipeline. We tested the simplified and less time-consuming workflow on confirmed SARS-CoV-2-positive specimens from clinical routine and identified pre-analytical parameters that may help to decrease the rate of sequencing failures. The complete pipeline took approx. 7 hrs for one specimen and approx. 11 hrs for 12 multiplexed barcoded specimens. This protocol is a modified version of Nikki Freed and Olin Silander's protocol; for details such as primers, see their protocol: Nikki Freed, Olin Silander 2020. nCoV-2019 sequencing protocol (RAPID barcoding, 1200bp amplicon). doi: 10.1093/biomethods/bpaa014. Our peer-reviewed paper is available here: https://www.mdpi.com/2076-2607/9/12/2598


2021 ◽  
Vol 33 (2) ◽  
Author(s):  
Phelelani Mpangase ◽  
Jacqueline Frost ◽  
Mohammed Tikly ◽  
Michèle Ramsay ◽  
Scott Hazelhurst

The rate of raw sequence production through Next-Generation Sequencing (NGS) has been growing exponentially due to improved technology and reduced costs. This has enabled researchers to answer many biological questions through "multi-omics" data analyses. Even though such data promise new insights into how biological systems function and into disease mechanisms, computational analyses performed on such large datasets come with their own challenges and potential pitfalls. The aim of this study was to develop a robust, portable, and reproducible bioinformatic pipeline for the automation of RNA sequencing (RNA-seq) data analyses. Using Nextflow as a workflow management system and Singularity for application containerisation, the nf-rnaSeqCount pipeline was developed for mapping raw RNA-seq reads to a reference genome and quantifying the abundance of identified genomic features for differential gene expression analyses. The pipeline provides a quick and efficient way to obtain a matrix of read counts that can be used with tools such as DESeq2 and edgeR for differential expression analysis. Robust and flexible bioinformatic and computational pipelines for RNA-seq data analysis, from QC to sequence alignment and comparative analyses, will reduce analysis time and increase the accuracy and reproducibility of findings to promote transcriptome research.
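As a rough illustration of the "matrix of read counts" mentioned above, the following Python sketch merges per-sample gene counts into one genes-by-samples table. It is not code from nf-rnaSeqCount; the function names, the input layout (a dictionary of per-sample counts), and the tab-separated output are assumptions made for this example only.

```python
def build_count_matrix(per_sample_counts):
    """Merge {sample: {gene: count}} into (genes, samples, rows).

    Hypothetical helper: genes missing from a sample are given a count of 0.
    """
    samples = sorted(per_sample_counts)
    genes = sorted({g for counts in per_sample_counts.values() for g in counts})
    rows = [[per_sample_counts[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, rows


def to_tsv(genes, samples, rows):
    """Render the matrix as tab-separated text, one gene per line."""
    lines = ["gene\t" + "\t".join(samples)]
    for gene, row in zip(genes, rows):
        lines.append(gene + "\t" + "\t".join(str(c) for c in row))
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up counts for two samples, for illustration only.
    counts = {
        "s1": {"geneA": 10, "geneB": 0},
        "s2": {"geneA": 3, "geneC": 7},
    }
    print(to_tsv(*build_count_matrix(counts)))
```

A table in this shape (genes as rows, samples as columns, raw integer counts) is the kind of input that count-based differential expression tools generally expect.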


Author(s):  
Devon DeRaad

Here I describe the novel R package SNPfiltR and demonstrate its functionalities as the backbone of a customizable, reproducible SNP filtering pipeline implemented exclusively via the widely adopted R programming language. SNPfiltR extends existing SNP filtering functionalities by automating the visualization of key parameters such as depth, quality, and missing data, then allowing users to set filters based on optimized thresholds, all within a single, cohesive working environment. All SNPfiltR functions require a vcfR object as input, which can be easily generated by reading a SNP dataset stored as a standard vcf file into an R working environment using the function read.vcfR() from the R package vcfR. Performance benchmarking reveals that for moderately sized SNP datasets (up to 50M genotypes with associated quality information), SNPfiltR performs filtering with comparable efficiency to current state-of-the-art command-line-based programs. These benchmarking results indicate that for most reduced-representation genomic datasets, SNPfiltR is an ideal choice for investigating, visualizing, and filtering SNPs as part of a cohesive and easily documentable bioinformatic pipeline. The SNPfiltR package can be downloaded from CRAN with the command install.packages("SNPfiltR"), and a development version is available from GitHub at github.com/DevonDeRaad/SNPfiltR. Additionally, thorough documentation for SNPfiltR, including multiple comprehensive vignettes, is available at devonderaad.github.io/SNPfiltR/.
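The filtering idea described above (thresholds on depth and quality, followed by a cutoff on missing data) can be sketched in a few lines of Python. This is a language-neutral illustration, not the SNPfiltR API: the function names, the tuple layout `(call, depth, quality)`, and the default thresholds are all assumptions chosen for the example.

```python
def mask_low_confidence(snps, min_depth=5, min_quality=30):
    """Set genotypes that fail the depth or quality threshold to None (missing).

    snps: list of SNPs; each SNP is a list with one entry per sample, either
    None (already missing) or a hypothetical (call, depth, quality) tuple.
    """
    masked = []
    for snp in snps:
        masked.append([
            g if g is not None and g[1] >= min_depth and g[2] >= min_quality else None
            for g in snp
        ])
    return masked


def missing_fraction(snp):
    """Proportion of samples with a missing genotype (assumes >= 1 sample)."""
    return sum(g is None for g in snp) / len(snp)


def drop_sparse_snps(snps, max_missing=0.5):
    """Keep only SNPs whose missing-data proportion is at most max_missing."""
    return [snp for snp in snps if missing_fraction(snp) <= max_missing]


if __name__ == "__main__":
    # Two made-up SNPs across three samples.
    snps = [
        [("0/1", 12, 40), ("0/0", 3, 40), ("1/1", 20, 35)],
        [("0/1", 2, 40), ("0/0", 10, 10), None],
    ]
    kept = drop_sparse_snps(mask_low_confidence(snps), max_missing=0.5)
    print(len(kept), "SNP(s) retained")
```

Masking individual genotypes first and applying the missingness cutoff afterwards matters here, because low-confidence calls turned into missing data can push a SNP over the cutoff.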


2021 ◽  
Author(s):  
Ilya Plyusnin ◽  
Phuoc Thien Truong Nguyen ◽  
Tarja Sironen ◽  
Olli Vapalahti ◽  
Teemu Smura ◽  
...  

Summary: SARS-CoV-2 is the highly transmissible etiologic agent of coronavirus disease 2019 (COVID-19) and has become a global scientific and public health challenge since December 2019. Several new variants of SARS-CoV-2 have emerged globally, raising concern about the prevention and treatment of COVID-19. Early detection and in-depth analysis of the emerging variants, allowing pre-emptive alert and mitigation efforts, are thus of paramount importance. Here we present ClusTRace, a novel bioinformatic pipeline for fast and scalable analysis of sequence clusters or clades in large viral phylogenies. ClusTRace offers several high-level functionalities including outlier filtering, aligning, phylogenetic tree reconstruction, cluster or clade extraction, variant calling, visualization and reporting. ClusTRace was developed as an aid for COVID-19 transmission chain tracing in Finland, and the main emphasis has been on fast and unsupervised screening of phylogenies for markers of super-spreading events and other features of concern, such as high rates of cluster growth and/or accumulation of novel mutations. Availability: All code is freely available from https://bitbucket.org/plyusnin/clustrace/


2021 ◽  
Author(s):  
Tim Nicholson-Shaw ◽  
Jens Lykke-Andersen

Abstract: Post-transcriptional trimming and tailing of RNA 3’ ends play key roles in the processing and quality control of non-coding RNAs (ncRNAs). However, bioinformatic tools to examine changes in the RNA 3’ “tailome” are sparse and not standardized. Here we present Tailer, a bioinformatic pipeline in two parts that allows for robust quantification and analysis of tail information from next-generation sequencing experiments that preserve RNA 3’ end information. The first part of Tailer, Tailer-Processing, uses genome annotation or reference FASTA gene sequences to quantify RNA 3’ ends from SAM-formatted alignment files or FASTQ sequence read files produced from sequencing experiments. The second part, Tailer-Analysis, uses the output of Tailer-Processing to identify statistically significant RNA targets of trimming and tailing and create graphs for data exploration. We apply Tailer to RNA 3’ end sequencing experiments from three published studies and find that it accurately and reproducibly recapitulates key findings. Thus, Tailer should be a useful and easily accessible tool to globally investigate tailing dynamics of non-polyadenylated RNAs and conditions that perturb them.
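To make "quantifying RNA 3’ ends" concrete, here is a minimal Python sketch of one way to describe a single read: where its templated part ends relative to the annotated 3’ end, and which bases follow as a non-templated tail. It is not Tailer's algorithm. The function name, the assumption that the read's start position on the gene is already known, and the rule that the first mismatch marks the start of the tail are simplifications made for illustration.

```python
def call_tail(gene_seq, read_seq, read_start):
    """Describe the 3' end of one read relative to a reference gene sequence.

    gene_seq   : reference sequence of the mature RNA (5'->3')
    read_seq   : read sequence in the same orientation
    read_start : 0-based position in gene_seq where the read begins (assumed known)

    Returns (end_offset, tail):
      end_offset = templated 3' end relative to the annotated end
                   (0 = at the annotated end, negative = trimmed)
      tail       = read bases after the templated part (non-templated tail)
    """
    matched = 0
    # Extend along the reference while the read agrees with it.
    while (matched < len(read_seq)
           and read_start + matched < len(gene_seq)
           and read_seq[matched] == gene_seq[read_start + matched]):
        matched += 1
    end_offset = (read_start + matched) - len(gene_seq)
    tail = read_seq[matched:]
    return end_offset, tail


if __name__ == "__main__":
    gene = "ACGTACGTAC"
    print(call_tail(gene, "CGTACAAA", 5))  # full-length end plus an A tail
    print(call_tail(gene, "CGTUU", 5))     # trimmed end plus a U tail
```

One known limitation of this simplification: a tail base that happens to match the next reference base is counted as templated, and a sequencing error inside the read is treated as the start of the tail.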


2021 ◽  
Vol 22 (23) ◽  
pp. 13076
Author(s):  
María H. Guzmán-López ◽  
Miriam Marín-Sanz ◽  
Susana Sánchez-León ◽  
Francisco Barro

The α-gliadins of wheat, along with other gluten components, are responsible for the viscoelastic properties of bread. However, they are also related to human pathologies such as celiac disease or non-celiac wheat sensitivity. CRISPR/Cas was successfully used to knock out α-gliadin genes in bread and durum wheat, thereby obtaining low-gluten wheat lines. Nevertheless, the mutation analysis of these genes is complex, as they are present as multiple, highly homologous copies arranged in tandem in the A, B, and D subgenomes. In this work, we present a bioinformatic pipeline based on NGS amplicon sequencing for the analysis of insertions and deletions (InDels) in α-gliadin genes targeted with two single guide RNAs (sgRNAs). This approach allows the identification of mutated amplicons and the analysis of InDels through comparison to the most similar wild-type parental sequence. TMM normalization was performed for inter-sample comparisons, making it possible to study the abundance of each InDel throughout generations and to observe the effects of the segregation of the Cas9 coding sequence in different lines. The workflow is also useful for identifying possible genomic rearrangements, such as large deletions due to Cas9 cleavage activity. This pipeline enables a fast characterization of mutations in multiple samples for a multi-copy gene family.


2021 ◽  
Author(s):  
Liming Cai ◽  
Hongrui Zhang ◽  
Charles C Davis

Premise of the study: The application of high-throughput sequencing, especially to herbarium specimens, is greatly accelerating biodiversity research. Among various techniques, low-coverage Illumina sequencing of total genomic DNA (genome skimming) can simultaneously recover the plastid, mitochondrial, and nuclear ribosomal regions across hundreds of species. Here, we introduce PhyloHerb, a bioinformatic pipeline to efficiently and effectively assemble phylogenomic datasets derived from genome skimming. Methods and Results: PhyloHerb uses either a built-in database or user-specified references to extract orthologous sequences using BLAST search. It outputs FASTA files and offers a suite of utility functions to assist with alignment, data partitioning, concatenation, and phylogeny inference. The program is freely available at https://github.com/lmcai/PhyloHerb/. Conclusions: Using published data from Clusiaceae, we demonstrated that PhyloHerb can accurately identify genes using highly fragmented assemblies derived from sequencing older herbarium specimens. Our approach is effective at all taxonomic depths and is scalable to thousands of species.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Marjorie Labrecque ◽  
Lahoud Touma ◽  
Claude Bhérer ◽  
Antoine Duquette ◽  
Martine Tétreault

Abstract: Niemann–Pick type C (NP-C) disease is an autosomal recessive disease caused by variants in the NPC1 or NPC2 genes. It has a large range of symptoms depending on age of onset, which makes it difficult to diagnose. In adults, symptoms appear mainly in the form of psychiatric problems. The prevalence varies from 0.35 to 2.2 per 100,000 births depending on the country. The aim of this study is to estimate the prevalence of NP-C in Quebec to determine whether it is underdiagnosed in this population. The CARTaGENE database is a unique database that groups individuals between 40 and 69 years old from metropolitan regions of Quebec. RNA-sequencing data were available for 911 individuals and exome sequencing data for 198 individuals. We used a bioinformatic pipeline on those individuals to extract the variants in the NPC1/2 genes. The prevalence in Quebec was estimated assuming Hardy–Weinberg Equilibrium. Two pathogenic variants were used. The variant p.Pro543Leu was found in three heterozygous individuals who share a common haplotype, which suggests a founder French-Canadian pathogenic variant. The variant p.Ile1061Thr was found in two heterozygous individuals. Both variants have previously been reported and are usually associated with infantile onset. The estimated prevalence calculated using those two variants is 0.61 per 100,000 births. This study represents the first estimate of NP-C prevalence in Quebec. The estimated prevalence for NP-C is likely underestimated due to misdiagnosis or missed cases. It is therefore important to diagnose all NP-C patients to initiate early treatment.
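For readers unfamiliar with the Hardy–Weinberg step, the general form of such an estimate for a recessive disease can be sketched as follows: each heterozygous carrier contributes one pathogenic allele out of 2N, the per-variant allele frequencies are summed to a combined frequency q, and the expected proportion of affected births is q². The Python sketch below is an illustration under stated assumptions, not the authors' calculation: the function name is hypothetical, and the cohort size of 1,000 is a made-up round number, so the result is not expected to reproduce the 0.61 per 100,000 reported in the abstract.

```python
def estimated_prevalence_per_100k(carriers_per_variant, cohort_size):
    """Recessive-disease prevalence per 100,000 births under Hardy-Weinberg.

    carriers_per_variant : heterozygous carrier counts, one per pathogenic variant
    cohort_size          : number of individuals screened (hypothetical here)
    """
    # Each heterozygous carrier has 1 pathogenic allele among 2 * cohort_size alleles.
    q = sum(c / (2 * cohort_size) for c in carriers_per_variant)
    # Affected individuals carry two pathogenic alleles (homozygous or
    # compound heterozygous), expected at frequency q**2.
    return q ** 2 * 100_000


if __name__ == "__main__":
    # Carrier counts of 3 and 2 as in the abstract; cohort size 1,000 is assumed.
    print(estimated_prevalence_per_100k([3, 2], 1000))
```

Because the estimate scales with the square of the allele frequency, it is sensitive to the denominator used, which is one reason the cohort size must be chosen carefully when two sequencing datasets of different sizes are involved.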


Author(s):  
Adi Eshel ◽  
Itai Sharon ◽  
Arnon Nagler ◽  
David Bomze ◽  
Ivetta Danylesko ◽  
...  

We observed high rates of bloodstream infections (BSIs) following fecal microbiota transplantation (FMT) for graft-versus-host disease (33 events in 22 patients). To trace the origin of the BSIs, we applied a metagenomic bioinformatic pipeline screening donor and recipient stool samples for bacteremia-causing strains in 13 cases. Offending strains were not detected in FMT donations. Enterococcus faecium, Escherichia coli, Pseudomonas aeruginosa, and Acinetobacter baumannii could be detected in stool samples before emerging in the blood. In this report, the largest to date of BSIs post-FMT, we present an approach that may be applicable for evaluating BSI origin following microbiota-based interventions. Our findings support FMT safety in immunocompromised patients but do not rule out FMT as an inducer of bacterial translocation.


Viruses ◽  
2021 ◽  
Vol 13 (10) ◽  
pp. 2006
Author(s):  
Anna Y Budkina ◽  
Elena V Korneenko ◽  
Ivan A Kotov ◽  
Daniil A Kiselev ◽  
Ilya V Artyushin ◽  
...  

According to various estimates, only a small percentage of existing viruses have been discovered, and even fewer are represented in genomic databases. High-throughput sequencing technologies are developing rapidly, empowering large-scale screening of various biological samples for the presence of pathogen-associated nucleotide sequences, but many organisms are yet to be attributed specific loci for identification. This problem particularly impedes viral screening because of the vast heterogeneity of viral genomes. In this paper, we present a new bioinformatic pipeline, VirIdAl, for detecting and identifying viral pathogens in sequencing data. We also demonstrate the utility of the new software by applying it to viral screening of the feces of bats collected in the Moscow region, which revealed a significant variety of viruses associated with bats, insects, plants, and protozoa. The presence of alphacoronavirus and betacoronavirus reads, including the MERS-like bat virus, deserves special mention, as it once again indicates that bats are reservoirs for many viral pathogens. In addition, alignment-based methods were unable to identify the taxon for a large proportion of reads, so we applied other approaches and showed that they can further reveal the presence of viral agents in sequencing data. However, the incompleteness of viral databases remains a significant problem in studies of viral diversity and therefore necessitates the use of combined approaches, including those based on machine learning methods.

