SeeCiTe: a method to assess CNV calls from SNP arrays using trio data

Author(s):  
Ksenia Lavrichenko ◽  
Øyvind Helgeland ◽  
Pål R Njølstad ◽  
Inge Jonassen ◽  
Stefan Johansson

Abstract. Motivation: Single nucleotide polymorphism (SNP) genotyping arrays remain an attractive platform for assaying copy number variants (CNVs) in large population-wide cohorts. However, current tools for calling CNVs are still prone to extensive false positive calls when applied to biobank-scale arrays. Moreover, there is a lack of methods that exploit cohorts with trios available (e.g. nuclear families) to assist in quality control and downstream analyses following the calling. Results: We developed SeeCiTe (Seeing CNVs in Trios), a novel CNV quality control tool that postprocesses output from current CNV-calling tools, exploiting child-parent trio data to classify calls into quality categories and provide a set of visualizations for each putative CNV call in the offspring. We apply it to the Norwegian Mother, Father and Child Cohort Study (MoBa) and show that SeeCiTe improves the specificity and sensitivity compared to the common empiric filtering strategies. To our knowledge, it is the first tool that utilizes probe-level CNV data in trios (and singletons) to systematically highlight potential artifacts and visualize signal intensities in a streamlined fashion suitable for biobank-scale studies. Availability and implementation: The software is implemented in R, with the source code freely available at https://github.com/aksenia/SeeCiTe. Supplementary information: Supplementary data are available at Bioinformatics online.

2020 ◽  
Author(s):  
Ksenia Lavrichenko ◽  
Øyvind Helgeland ◽  
Pål R Njølstad ◽  
Inge Jonassen ◽  
Stefan Johansson

Abstract. Motivation: Single nucleotide polymorphism (SNP) genotyping arrays remain an attractive platform for assaying copy number variants (CNVs) in large population-wide cohorts. However, current tools for calling CNVs are still prone to extensive false positive calls when applied to biobank-scale arrays. Moreover, there is a lack of methods that exploit cohorts with trios available (e.g. nuclear families) to assist in quality control and downstream analyses following the calling. Results: We developed SeeCiTe (Seeing CNVs in Trios), a novel CNV quality control tool that post-processes output from current CNV calling tools, exploiting child-parent trio data to classify calls into quality categories and provide a set of visualizations for each putative CNV call in the offspring. We apply it to the Norwegian Mother, Father, and Child Cohort Study (MoBa) and show that SeeCiTe improves the specificity and sensitivity compared to the common empiric filtering strategies. To our knowledge, it is the first tool that utilizes probe-level CNV data in trios to systematically highlight potential artefacts and visualize signal intensities in a streamlined fashion suitable for biobank-scale studies. Availability and Implementation: The software is implemented in R, with the source code freely available at https://github.com/aksenia/SeeCiTe.


2019 ◽  
Vol 35 (21) ◽  
pp. 4442-4444 ◽  
Author(s):  
Jia-Xing Yue ◽  
Gianni Liti

Abstract. Summary: Simulated genomes with pre-defined and random genomic variants can be very useful for benchmarking genomic and bioinformatics analyses. Here we introduce simuG, a lightweight tool for simulating the full spectrum of genomic variants (single nucleotide polymorphisms, insertions/deletions, copy number variants, inversions and translocations) for any organism (including human). The simplicity and versatility of simuG make it a unique general-purpose genome simulator for a wide range of simulation-based applications. Availability and implementation: Code in Perl, along with a user manual and testing data, is available at https://github.com/yjx1217/simuG. This software is free for use under the MIT license. Supplementary information: Supplementary data are available at Bioinformatics online.


Author(s):  
Quang Tran ◽  
Alexej Abyzov

Abstract. Summary: Defining the precise location of structural variations (SVs) at single-nucleotide breakpoint resolution is a challenging problem due to large gaps in alignment. Previously, Alignment with Gap Excision (AGE) enabled us to define breakpoints of SVs at single-nucleotide resolution; however, AGE requires a vast amount of memory when aligning a pair of long sequences. To address this, we developed a memory-efficient implementation, LongAGE, based on the classical Hirschberg algorithm. We demonstrate an application of LongAGE for resolving breakpoints of SVs embedded into segmental duplications on Pacific Biosciences (PacBio) reads that can be longer than 10 kb. Furthermore, we observed different breakpoints for a deletion and a duplication in the same locus, providing direct evidence that such multi-allelic copy number variants (mCNVs) arise from two or more independent ancestral mutations. Availability and implementation: LongAGE is implemented in C++ and available on GitHub at https://github.com/Coaxecva/LongAGE. Supplementary information: Supplementary data are available at Bioinformatics online.
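
For readers unfamiliar with the classical Hirschberg algorithm mentioned in this abstract, the following is a minimal Python sketch of its divide-and-conquer idea applied to plain global alignment. It is not the LongAGE or AGE implementation (which is written in C++ and handles gap excision); the function names and the scoring values (match 1, mismatch -1, gap -1) are assumptions chosen only for illustration.

```python
def last_row(a, b, match=1, mismatch=-1, gap=-1):
    """Last row of the global-alignment score matrix, keeping only two rows in memory."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev


def nw_align(a, b, match, mismatch, gap):
    """Full-matrix alignment with traceback; only called when one input has length <= 1."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1]); out_b.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def hirschberg(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment of a and b using memory linear in the input length."""
    if len(a) <= 1 or len(b) <= 1:
        return nw_align(a, b, match, mismatch, gap)
    mid = len(a) // 2
    # Scores of aligning the first half of a with every prefix of b ...
    left = last_row(a[:mid], b, match, mismatch, gap)
    # ... and the second half of a with every suffix of b (computed on reversed strings).
    right = last_row(a[mid:][::-1], b[::-1], match, mismatch, gap)
    m = len(b)
    split = max(range(m + 1), key=lambda j: left[j] + right[m - j])
    left_a, left_b = hirschberg(a[:mid], b[:split], match, mismatch, gap)
    right_a, right_b = hirschberg(a[mid:], b[split:], match, mismatch, gap)
    return left_a + right_a, left_b + right_b


def alignment_score(x, y, match=1, mismatch=-1, gap=-1):
    """Score of an already aligned pair of equal-length strings ('-' marks a gap)."""
    total = 0
    for p, q in zip(x, y):
        if p == "-" or q == "-":
            total += gap
        else:
            total += match if p == q else mismatch
    return total
```

Calling `hirschberg("AGTACGCA", "TATGC")` returns two gapped strings of equal length. The memory saving comes from `last_row`, which never stores more than two rows of the score matrix, at the cost of roughly doubling the computation.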


Author(s):  
Zekun Yin ◽  
Hao Zhang ◽  
Meiyang Liu ◽  
Wen Zhang ◽  
Honglei Song ◽  
...  

Abstract. Motivation: Modern sequencing technologies continue to revolutionize many areas of biology and medicine. Since the generated datasets are error-prone, downstream applications usually require quality control methods to pre-process FASTQ files. However, existing tools for this task are currently not able to fully exploit the capabilities of computing platforms, leading to slow runtimes. Results: We present RabbitQC, an extremely fast integrated quality control tool for FASTQ files, which can take full advantage of modern hardware. It includes a variety of operations and supports different sequencing technologies (Illumina, Oxford Nanopore and PacBio). RabbitQC achieves speedups of between one and two orders of magnitude compared to other state-of-the-art tools. Availability and implementation: C++ sources and binaries are available at https://github.com/ZekunYin/RabbitQC. Supplementary information: Supplementary data are available at Bioinformatics online.
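
To make the kind of FASTQ pre-processing described here concrete, below is a minimal single-threaded Python sketch of one common quality control operation: discarding reads whose mean Phred quality falls below a threshold. It does not reflect RabbitQC's implementation or options; the function names, the Phred+33 encoding and the threshold of 20 are assumptions for illustration only.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a four-lines-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip stray blank lines
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string, assuming Phred+33 encoding."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(handle, min_mean_quality=20.0):
    """Keep only reads whose mean Phred quality is at least min_mean_quality."""
    return [
        (name, seq, qual)
        for name, seq, qual in read_fastq(handle)
        if mean_phred(qual) >= min_mean_quality
    ]
```

For example, `filter_reads(io.StringIO(text))` can be used on an in-memory FASTQ string, and an open file handle works the same way. A production tool would additionally parallelize parsing and filtering, which is where the speedups reported in the abstract would come from.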


Author(s):  
Omer Sabary ◽  
Yoav Orlev ◽  
Roy Shafir ◽  
Leon Anavy ◽  
Eitan Yaakobi ◽  
...  

Abstract. Motivation: Recent years have seen a growing number and an expanding scope of studies using synthetic oligo libraries for a range of applications in synthetic biology. As experiments grow in number and complexity, analysis tools can facilitate quality control and support better assessment and inference. Results: We present a novel analysis tool, called SOLQC, which enables fast and comprehensive analysis of synthetic oligo libraries, based on NGS analysis performed by the user. SOLQC provides statistical information such as the distribution of variant representation, different error rates and their dependence on sequence or library properties. SOLQC produces graphical reports from the analysis, in a flexible format. We demonstrate SOLQC by analyzing literature libraries. We also discuss the potential benefits and relevance of the different components of the analysis. Availability and implementation: SOLQC is free software for non-commercial use, available at https://app.gitbook.com/@yoav-orlev/s/solqc/. For commercial use, please contact the authors. Supplementary information: Supplementary data are available at Bioinformatics online.


2021 ◽  
Author(s):  
Carmen Seller Oria ◽  
Adrian Thummerer ◽  
Jeffrey Free ◽  
Johannes A. Langendijk ◽  
Stefan Both ◽  
...  

2020 ◽  
Vol 36 (Supplement_2) ◽  
pp. i831-i839
Author(s):  
Dong-gi Lee ◽  
Myungjun Kim ◽  
Sang Joon Son ◽  
Chang Hyung Hong ◽  
Hyunjung Shin

Abstract. Motivation: Recently, various approaches for diagnosing and treating dementia have received significant attention, especially in identifying key genes that are crucial for dementia. If the mutations of such key genes could be tracked, it would be possible to predict the time of onset of dementia and significantly aid in developing drugs to treat dementia. However, gene finding involves tremendous cost, time and effort. To alleviate these problems, research on utilizing computational biology to decrease the search space of candidate genes is being actively conducted. In this study, we propose a framework in which diseases, genes and single-nucleotide polymorphisms are represented by a layered network, and key genes are predicted by a machine learning algorithm. The algorithm utilizes a network-based semi-supervised learning model that can be applied to layered data structures. Results: The proposed method was applied to a dataset extracted from public databases related to diseases and genes, with data collected from 186 patients. A portion of the key genes obtained using the proposed method was verified in silico through PubMed literature, and the remaining genes were left as possible candidate genes. Availability and implementation: The code for the framework will be available at http://www.alphaminers.net/. Supplementary information: Supplementary data are available at Bioinformatics online.
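
As a rough illustration of network-based semi-supervised learning in general, the Python sketch below spreads scores from a few known "seed" genes across a single-layer weighted graph and ranks the remaining nodes. It is a generic label-propagation scheme, not the authors' layered model; the graph, node names, the damping parameter `alpha` and both function names are hypothetical.

```python
def propagate_labels(adjacency, seeds, alpha=0.8, iterations=100):
    """Iterative label propagation on an undirected weighted graph.

    adjacency: dict mapping node -> {neighbor: weight}, assumed symmetric.
    seeds: dict mapping known nodes -> initial score (unlabeled nodes default to 0).
    Each round mixes the degree-normalized scores of a node's neighbors
    (weight alpha) with the node's own initial score (weight 1 - alpha).
    """
    nodes = list(adjacency)
    degree = {u: sum(adjacency[u].values()) for u in nodes}
    initial = {u: float(seeds.get(u, 0.0)) for u in nodes}
    scores = dict(initial)
    for _ in range(iterations):
        updated = {}
        for u in nodes:
            spread = sum(
                w * scores[v] / (degree[u] * degree[v]) ** 0.5
                for v, w in adjacency[u].items()
            )
            updated[u] = alpha * spread + (1 - alpha) * initial[u]
        scores = updated
    return scores


def rank_candidates(scores, seeds):
    """Non-seed nodes ordered from highest to lowest propagated score."""
    candidates = [u for u in scores if u not in seeds]
    return sorted(candidates, key=lambda u: scores[u], reverse=True)
```

On a small chain such as `{"g1": {"g2": 1.0}, "g2": {"g1": 1.0, "g3": 1.0}, "g3": {"g2": 1.0, "g4": 1.0}, "g4": {"g3": 1.0}}` with `g1` as the only seed, nodes closer to the seed receive higher scores, which is the intuition behind shrinking the search space of candidate genes.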


2019 ◽  
Vol 24 ◽  
pp. 121-128
Author(s):  
Sigal Ben-Zaken ◽  
Yoav Meckel ◽  
Dan Nemet ◽  
Alon Eliakim

The ACSL A/G polymorphism is associated with endurance trainability. Previous studies have demonstrated that homozygotes for the minor allele (AA) had a reduced maximal oxygen consumption response to training compared to homozygotes for the common allele (GG), and that the ACSL A/G single nucleotide polymorphism explained 6.1% of the variance in the VO2max response to endurance training. The contribution of the ACSL single nucleotide polymorphism to endurance trainability was shown in non-athletes; however, its potential role in professional athletes is not clear. Moreover, the genetic basis of anaerobic trainability is even less studied. Therefore, the aim of the present study was to examine the prevalence of the ACSL single nucleotide polymorphism among professional Israeli long-distance runners (n=59), middle-distance runners (n=31), sprinters and jumpers (n=48) and non-athletic controls (n=60). The main finding of the present study was that the prevalence of the ACSL1 AA genotype, previously shown to be associated with reduced endurance trainability, was not higher among sprinters and jumpers (15%) compared to middle- (16%) and long-distance runners (15%). This suggests that, in contrast to previous studies indicating that the ACSL1 single nucleotide polymorphism may influence endurance trainability among non-athletic individuals, the role of this polymorphism among professional athletes is still not clear.


2016 ◽  
Vol 145 (3) ◽  
pp. 308-315 ◽  
Author(s):  
Patrick C. Mathias ◽  
Emily H. Turner ◽  
Sheena M. Scroggins ◽  
Stephen J. Salipante ◽  
Noah G. Hoffman ◽  
...  

Genomics ◽  
1987 ◽  
Vol 1 (3) ◽  
pp. 228-231 ◽  
Author(s):  
Takenori Takizawa ◽  
Yoshimasa Yoneyama ◽  
Shiro Miwa ◽  
Akira Yoshida
