Selection of optimal bioinformatic tools and proper reference for reducing the alignment error in targeted sequencing data

2021 ◽  
Vol 11 (1) ◽  
pp. 37
Author(s):  
Mohammadreza Sehhati ◽  
HannaneMohammadi Nodehi ◽  
MohammadAmin Tabatabaiefar
Genes ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 1979
Author(s):  
Francesco Musacchia ◽  
Marianthi Karali ◽  
Annalaura Torella ◽  
Steve Laurie ◽  
Valeria Policastro ◽  
...  

Homozygous deletions (HDs) may be the cause of rare diseases and cancer, and their discovery in targeted sequencing is a challenging task. Different tools have been developed to disentangle HD discovery but a sensitive caller is still lacking. We present VarGenius-HZD, a sensitive and scalable algorithm that leverages breadth-of-coverage for the detection of rare homozygous and hemizygous single-exon deletions (HDs). To assess its effectiveness, we detected both real and synthetic rare HDs in fifty exomes from the 1000 Genomes Project obtaining higher sensitivity in comparison with state-of-the-art algorithms that each missed at least one event. We then applied our tool on targeted sequencing data from patients with Inherited Retinal Dystrophies and solved five cases that still lacked a genetic diagnosis. We provide VarGenius-HZD either stand-alone or integrated within our recently developed software, enabling the automated selection of samples using the internal database. Hence, it could be extremely useful for both diagnostic and research purposes.
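To illustrate the breadth-of-coverage idea mentioned in this abstract, the sketch below computes, for each exon, the fraction of bases covered at or above a minimum depth, and flags an exon as a candidate homozygous deletion when one sample has almost no breadth while the remaining samples are well covered. This is not the VarGenius-HZD implementation: the function names, the input layout, and the thresholds (`min_depth`, `max_boc_case`, `min_boc_controls`) are all assumptions made for illustration.

```python
from statistics import mean


def breadth_of_coverage(depths, min_depth=1):
    """Fraction of positions in an exon whose depth is >= min_depth."""
    if not depths:
        return 0.0
    return sum(1 for d in depths if d >= min_depth) / len(depths)


def candidate_deletions(exon_depths, max_boc_case=0.2, min_boc_controls=0.8):
    """Flag (exon, sample) pairs that look like homozygous deletions.

    exon_depths: {exon_id: {sample_id: [per-base depth, ...]}}
    The thresholds are hypothetical values, not taken from the paper.
    """
    calls = []
    for exon, per_sample in exon_depths.items():
        boc = {s: breadth_of_coverage(d) for s, d in per_sample.items()}
        for sample, value in boc.items():
            others = [v for s, v in boc.items() if s != sample]
            if not others:
                continue  # no other samples to compare against
            # Nearly uncovered in this sample, well covered in the rest
            if value <= max_boc_case and mean(others) >= min_boc_controls:
                calls.append((exon, sample))
    return calls
```

Comparing each sample against the rest of the cohort is what keeps an exon that is poorly captured in every sample from being reported as a deletion.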


2021 ◽  
Author(s):  
Francesco Musacchia ◽  
Marianthi Karali ◽  
Annalaura Torella ◽  
Steve Laurie ◽  
Valeria Policastro ◽  
...  

Motivation: Homozygous deletions (HDs) may be the cause of rare diseases and cancer, and their discovery in targeted sequencing is a challenging task. Different tools have been developed to disentangle HD discovery but a sensitive caller is still lacking. Results: We present VarGenius-HZD, a sensitive and scalable algorithm that leverages breadth-of-coverage for the detection of rare homozygous and hemizygous single-exon deletions (HDs). To assess its effectiveness, we detected both real and synthetic rare HDs in fifty exomes from the 1000 Genomes Project, obtaining higher sensitivity in comparison with state-of-the-art algorithms, which missed at least one event each. We then applied our tool on targeted sequencing data from patients with Inherited Retinal Dystrophies and solved five cases that still lacked a genetic diagnosis. Availability and implementation: We provide VarGenius-HZD either stand-alone or integrated within our recently developed software, enabling the automated selection of samples using the internal database. Hence, it could be extremely useful for both diagnostic and research purposes. Our tool is available under the GNU General Public License, version 3, at https://github.com/frankMusacchia/VarGenius-HZD. Contact: [email protected]. Supplementary information is available online.


2019 ◽  
Vol 102 (5) ◽  
pp. 1263-1270 ◽  
Author(s):  
Weili Xiong ◽  
Melinda A McFarland ◽  
Cary Pirone ◽  
Christine H Parker

Abstract Background: To effectively safeguard the food-allergic population and support compliance with food-labeling regulations, the food industry and regulatory agencies require reliable methods for food allergen detection and quantification. MS-based detection of food allergens relies on the systematic identification of robust and selective target peptide markers. The selection of proteotypic peptide markers, however, relies on the availability of high-quality protein sequence information, a bottleneck for the analysis of many plant-based proteomes. Method: In this work, data were compiled for reference tree nut ingredients and evaluated using a parsimony-driven global proteomics workflow. Results: The utility of supplementing existing incomplete protein sequence databases with translated genomic sequencing data was evaluated for English walnut and provided enhanced selection of candidate peptide markers and differentiation between closely related species. Highlights: Future improvements of protein databases and release of genomics-derived sequences are expected to facilitate the development of robust and harmonized LC–tandem MS-based methods for food allergen detection.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Guilherme B. Neumann ◽  
Paula Korkuć ◽  
Danny Arends ◽  
Manuel J. Wolf ◽  
Katharina May ◽  
...  

Abstract Background German Black Pied cattle (DSN) are an endangered dual-purpose breed which was largely replaced by Holstein cattle due to their lower milk yield. DSN cattle are kept as a genetic reserve with a current herd size of around 2500 animals. The ability to track sequence variants specific to DSN could help to support the conservation of DSN’s genetic diversity and to provide avenues for genetic improvement. Results Whole-genome sequencing data of 304 DSN cattle were used to design a customized DSN200k SNP chip harboring 182,154 variants (173,569 SNPs and 8,585 indels) based on ten selection categories. We included variants of interest to DSN such as DSN unique variants and variants from previous association studies in DSN, but also variants of general interest such as variants with predicted consequences of high, moderate, or low impact on the transcripts and SNPs from the Illumina BovineSNP50 BeadChip. Further, the selection of variants based on haplotype blocks ensured that the whole genome was uniformly covered with an average variant distance of 14.4 kb on autosomes. Using 300 DSN and 162 animals from other cattle breeds including Holstein, endangered local cattle populations, and also a Bos indicus breed, performance of the SNP chip was evaluated. Altogether, 171,978 (94.31%) of the variants were successfully called in at least one of the analyzed breeds. In DSN, the number of successfully called variants was 166,563 (91.44%) while 156,684 (86.02%) were segregating at a minor allele frequency > 1%. The concordance rate between technical replicates was 99.83 ± 0.19%. Conclusion The DSN200k SNP chip proved useful for DSN and other Bos taurus as well as one Bos indicus breed. It is suitable for genetic diversity management and marker-assisted selection of DSN animals. Moreover, variants that were segregating in other breeds can be used for the design of breed-specific customized SNP chips. This will be of great value in the application of conservation programs for endangered local populations in the future.


2019 ◽  
Vol 20 (S24) ◽  
Author(s):  
Yu Zhang ◽  
Changlin Wan ◽  
Pengcheng Wang ◽  
Wennan Chang ◽  
Yan Huo ◽  
...  

Abstract Background Various statistical models have been developed to model single-cell RNA-seq expression profiles, capture their multimodality, and conduct differential gene expression tests. However, for expression data generated by different experimental designs and platforms, there is currently a lack of capability to determine the most proper statistical model. Results We developed an R package, namely Multi-Modal Model Selection (M3S), for gene-wise selection of the most proper multi-modality statistical model and downstream analysis, useful in single-cell or large-scale bulk tissue transcriptomic data. M3S features (1) gene-wise selection of the most parsimonious model among the 11 most commonly utilized ones that can best fit the expression distribution of the gene, (2) parameter estimation of a selected model, and (3) differential gene expression testing based on the selected model. Conclusion A comprehensive evaluation suggested that M3S can accurately capture the multimodality on simulated and real single-cell data. An open source package is available through GitHub at https://github.com/zy26/M3S.
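The gene-wise selection of a parsimonious model can be pictured with a much smaller toy version. The sketch below is not M3S (which is an R package covering 11 candidate models): it is a Python illustration that fits only two candidates to one gene's expression values, a single Gaussian and a two-component Gaussian mixture, and keeps the one with the lower Bayesian information criterion (BIC). Using BIC as the parsimony criterion, the EM initialisation, and every function name here are assumptions, not details taken from the abstract.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fit_gaussian(xs):
    """Maximum-likelihood single Gaussian. Returns (log-likelihood, n_params)."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = max(math.sqrt(sum((x - mu) ** 2 for x in xs) / n), 1e-3)
    loglik = sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
                 - (x - mu) ** 2 / (2 * sigma ** 2) for x in xs)
    return loglik, 2


def fit_two_gaussians(xs, iters=200):
    """Two-component Gaussian mixture fitted by EM. Returns (log-likelihood, n_params)."""
    n = len(xs)
    ordered = sorted(xs)
    lower, upper = ordered[: n // 2], ordered[n // 2:]
    mus = [sum(lower) / len(lower), sum(upper) / len(upper)]  # start from the two halves
    overall_mu = sum(xs) / n
    sd = max(math.sqrt(sum((x - overall_mu) ** 2 for x in xs) / n), 1e-3)
    sigmas, weights = [sd, sd], [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each value
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
            total = sum(p) or 1e-300
            resp.append([v / total for v in p])
        # M-step: update weight, mean and standard deviation per component
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue
            weights[k] = nk / n
            mus[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigmas[k] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsing component
    loglik = sum(
        math.log(max(sum(w * normal_pdf(x, m, s)
                         for w, m, s in zip(weights, mus, sigmas)), 1e-300))
        for x in xs
    )
    return loglik, 5  # two means, two standard deviations, one free weight


def select_model(xs):
    """Return the name of the candidate model with the lowest BIC."""
    n = len(xs)
    candidates = {"gaussian": fit_gaussian(xs), "two_gaussians": fit_two_gaussians(xs)}
    bic = {name: k * math.log(n) - 2 * ll for name, (ll, k) in candidates.items()}
    return min(bic, key=bic.get)
```

BIC penalises each extra parameter by the log of the sample size, so the mixture is chosen only when its gain in likelihood outweighs its three additional parameters.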


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Ting Hon ◽  
Kristin Mars ◽  
Greg Young ◽  
Yu-Chih Tsai ◽  
Joseph W. Karalius ◽  
...  

Abstract The PacBio® HiFi sequencing method yields highly accurate long-read sequencing datasets with read lengths averaging 10–25 kb and accuracies greater than 99.5%. These accurate long reads can be used to improve results for complex applications such as single nucleotide and structural variant detection, genome assembly, assembly of difficult polyploid or highly repetitive genomes, and assembly of metagenomes. Currently, there is a need for sample data sets to both evaluate the benefits of these long accurate reads as well as for development of bioinformatic tools including genome assemblers, variant callers, and haplotyping algorithms. We present deep coverage HiFi datasets for five complex samples including the two inbred model genomes Mus musculus and Zea mays, as well as two complex genomes, octoploid Fragaria × ananassa and the diploid anuran Rana muscosa. Additionally, we release sequence data from a mock metagenome community. The datasets reported here can be used without restriction to develop new algorithms and explore complex genome structure and evolution. Data were generated on the PacBio Sequel II System.


2020 ◽  
Vol 36 (16) ◽  
pp. 4510-4512
Author(s):  
Giulio Isacchini ◽  
Carlos Olivares ◽  
Armita Nourmohammad ◽  
Aleksandra M Walczak ◽  
Thierry Mora

Abstract Summary Recent advances in modelling VDJ recombination and subsequent selection of T- and B-cell receptors provide useful tools to analyse and compare immune repertoires across time, individuals and tissues. A suite of tools—IGoR, OLGA and SONIA—have been publicly released to the community that allow for the inference of generative and selection models from high-throughput sequencing data. However, using these tools requires some scripting or command-line skills and familiarity with complex datasets. As a result, the application of the above models has not been available to a broad audience. In this application note, we fill this gap by presenting Simple OLGA & SONIA (SOS), a web-based interface where users with no coding skills can compute the generation and post-selection probabilities of their sequences, as well as generate batches of synthetic sequences. The application also functions on mobile phones. Availability and implementation SOS is freely available to use at sites.google.com/view/statbiophysens/sos with source code at github.com/statbiophys/sos.


Author(s):  
Liam F Spurr ◽  
Mehdi Touat ◽  
Alison M Taylor ◽  
Adrian M Dubuc ◽  
Juliann Shih ◽  
...  

Abstract Summary The expansion of targeted panel sequencing efforts has created opportunities for large-scale genomic analysis, but tools for copy-number quantification on panel data are lacking. We introduce ASCETS, a method for the efficient quantitation of arm and chromosome-level copy-number changes from targeted sequencing data. Availability and implementation ASCETS is implemented in R and is freely available to non-commercial users on GitHub: https://github.com/beroukhim-lab/ascets, along with detailed documentation. Supplementary information Supplementary data are available at Bioinformatics online.
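As a rough picture of what arm-level copy-number quantification from segment data can look like, the sketch below takes copy-number segments that overlap one chromosome arm and reports the arm as amplified, deleted, or neutral when a large enough fraction of the covered bases agrees. It is an illustrative Python sketch, not the ASCETS code (which is implemented in R): the function name, the call labels, and the thresholds `log2_cutoff` and `min_fraction` are assumptions.

```python
def call_arm(segments, arm_start, arm_end, log2_cutoff=0.2, min_fraction=0.7):
    """Summarise segment-level log2 ratios into a single call for one arm.

    segments: iterable of (start, end, log2_ratio) on the arm's chromosome.
    Returns "AMP", "DEL", "NEUTRAL", or "NC" (no call).
    Thresholds are hypothetical defaults chosen for illustration.
    """
    amp = dele = neutral = 0
    for start, end, log2_ratio in segments:
        # Clip the segment to the arm boundaries
        lo, hi = max(start, arm_start), min(end, arm_end)
        if hi <= lo:
            continue  # segment lies outside this arm
        length = hi - lo
        if log2_ratio >= log2_cutoff:
            amp += length
        elif log2_ratio <= -log2_cutoff:
            dele += length
        else:
            neutral += length
    covered = amp + dele + neutral
    if covered == 0:
        return "NC"  # nothing on this arm was covered
    if amp / covered >= min_fraction:
        return "AMP"
    if dele / covered >= min_fraction:
        return "DEL"
    if neutral / covered >= min_fraction:
        return "NEUTRAL"
    return "NC"  # mixed signal, no state dominates
```

Normalising by the covered length rather than the full arm length matters for panel data, where only a small part of each arm is actually sequenced.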

