Large scale statistical analysis of genome data with Ruby and R: skipping interface libraries

2014 ◽  
Vol 20 (0) ◽  
Author(s):  
Sergio RP Line ◽  
Ana P De Souza ◽  
Luciana S Mofatto
2006 ◽  
Vol 141 (3) ◽  
pp. 811-824 ◽  
Author(s):  
Sunchung Park ◽  
Nobuko Sugimoto ◽  
Matthew D. Larson ◽  
Randy Beaudry ◽  
Steven van Nocker

Author(s):  
Sen Zhao ◽  
Oleg Agafonov ◽  
Abdulrahman Azab ◽  
Tomasz Stokowy ◽  
Eivind Hovig

Abstract: Advances in next-generation sequencing technology have enabled whole genome sequencing (WGS) to be widely used for identification of causal variants in a spectrum of genetic-related disorders, and have provided new insight into how genetic polymorphisms affect disease phenotypes. The development of different bioinformatics pipelines has continuously improved the variant analysis of WGS data; however, a systematic performance comparison of these pipelines is needed to provide guidance on the application of WGS-based scientific and clinical genomics. In this study, we evaluated the performance of three variant calling pipelines (GATK, DRAGEN™ and DeepVariant) using Genome in a Bottle Consortium, “synthetic-diploid” and simulated WGS datasets. DRAGEN™ and DeepVariant show better accuracy in SNP and indel calling, with no significant differences in their F1-scores. The DRAGEN™ platform offers accuracy, flexibility and a highly efficient running speed, and therefore a clear advantage in the analysis of WGS data on a large scale. The combination of DRAGEN™ and DeepVariant also provides a good balance of accuracy and efficiency as an alternative solution for germline variant detection in further applications. Our results facilitate the standardization of benchmarking analysis of bioinformatics pipelines for reliable variant detection, which is critical in genetics-based medical research and clinical application.
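The F1-score used above to compare pipelines is the harmonic mean of precision and recall. As a minimal sketch (not the benchmarking code used in the study; the function name `f1_score` and the counts in the usage line are illustrative assumptions), it can be computed from true-positive, false-positive and false-negative call counts:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1-score of a call set from TP, FP and FN counts."""
    if tp == 0:
        # No correct calls: precision and recall are zero (or undefined),
        # so report 0.0 rather than dividing by zero.
        return 0.0
    precision = tp / (tp + fp)  # fraction of reported calls that are correct
    recall = tp / (tp + fn)     # fraction of true variants that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
print(round(f1_score(tp=90, fp=10, fn=10), 3))
```

Because the harmonic mean is dominated by the smaller of the two values, a pipeline cannot reach a high F1-score by trading away either precision or recall.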


mBio ◽  
2018 ◽  
Vol 9 (1) ◽  
Author(s):  
Xyrus X. Maurer-Alcalá ◽  
Rob Knight ◽  
Laura A. Katz

ABSTRACT: Separate germline and somatic genomes are found in numerous lineages across the eukaryotic tree of life, often separated into distinct tissues (e.g., in plants, animals, and fungi) or distinct nuclei sharing a common cytoplasm (e.g., in ciliates and some foraminifera). In ciliates, germline-limited (i.e., micronuclear-specific) DNA is eliminated during the development of a new somatic (i.e., macronuclear) genome in a process that is tightly linked to large-scale genome rearrangements, such as deletions and reordering of protein-coding sequences. Most studies of germline genome architecture in ciliates have focused on the model ciliates Oxytricha trifallax, Paramecium tetraurelia, and Tetrahymena thermophila, for which the complete germline genome sequences are known. Outside of these model taxa, only a few dozen germline loci have been characterized from a limited number of cultivable species, which is likely due to difficulties in obtaining sufficient quantities of “purified” germline DNA in these taxa. Combining single-cell transcriptomics and genomics, we have overcome these limitations and provide the first insights into the structure of the germline genome of the ciliate Chilodonella uncinata, a member of the understudied class Phyllopharyngea. Our analyses reveal the following: (i) large gene families contain a disproportionate number of genes from scrambled germline loci; (ii) germline-soma boundaries in the germline genome are demarcated by substantial shifts in GC content; (iii) single-cell omics techniques provide large-scale quality germline genome data with limited effort, at least for ciliates with extensively fragmented somatic genomes. Our approach provides an efficient means to understand better the evolution of genome rearrangements between germline and soma in ciliates.
IMPORTANCE: Our understanding of the distinctions between germline and somatic genomes in ciliates has largely relied on studies of a few model genera (e.g., Oxytricha, Paramecium, Tetrahymena). We have used single-cell omics to explore germline-soma distinctions in the ciliate Chilodonella uncinata, which likely diverged from the better-studied ciliates ~700 million years ago. The analyses presented here indicate that developmentally regulated genome rearrangements between germline and soma are demarcated by rapid transitions in local GC composition and lead to diversification of protein families. The approaches used here provide the basis for future work aimed at discerning the evolutionary impacts of germline-soma distinctions among diverse ciliates.


1987 ◽  
Author(s):  
A GOGUEL ◽  
A HOUBOUYAN ◽  
J ROUSSI

One of the aims of the survey conducted in December 1986 was to assess the efficacy of two standardization procedures: (1) the INR system, derived from thromboplastin calibration and adopted by the WHO in 1983; and (2) the Reference Calibrated Plasmas (RCP) procedure, evaluated on a large scale through French interlaboratory trials (1977-85), which showed a net improvement in the dispersion of overall data. Labs were asked to measure, with their local thromboplastin and method, the PT of a human lyophilized plasma, 86 H/I, obtained from patients on long-term antivitamin K (AVK) treatment. Results were expressed in time; in % activity, according to the traditional procedure based on saline dilutions of normal plasma; and in INR, using the ISI of the local reagent as calibrated by the manufacturer. The calibrated plasmas procedure allows the determination of corrected activity, in % activity and in INR, from the linear calibration curve obtained from the PT of two reference calibrated plasmas with determined activities in INR and % activity. These RCP were provided with, and tested under the same conditions as, plasma 86 H/I (two systems of RCP: AVK and artificially depleted). Statistical analysis shows that the RCP procedure gives the greatest reduction in interlaboratory variation for the overall data and the most uniform mean results, whatever the mode of expression (%, INR), the thromboplastin brand, or the method of PT testing. The results also favour a system of AVK reference plasmas, which gives a better grouping than the artificial calibrated plasmas. The INR system nevertheless provides a common scale for reporting data, but might benefit from an efficient standardization procedure such as the calibrated AVK plasmas procedure. Coefficient of variation (CV) expressed in %. Overall data PT of 86 H/I. French Etalonorme Survey.
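For reference, the INR is conventionally defined as the ratio of the patient PT to the mean normal PT, raised to the power of the reagent's ISI, and the CV is the standard deviation expressed as a percentage of the mean. The sketch below illustrates both quantities under stated assumptions: the function names and all numeric values are hypothetical, and the use of the sample standard deviation is an assumption, since the abstract does not say which estimator the survey used.

```python
from statistics import mean, stdev


def inr(pt_patient_s: float, pt_normal_s: float, isi: float) -> float:
    """INR = (patient PT / mean normal PT) ** ISI, with PT in seconds."""
    return (pt_patient_s / pt_normal_s) ** isi


def coefficient_of_variation(values: list[float]) -> float:
    """CV in %, using the sample standard deviation (an assumption)."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical PT readings (seconds) and ISI values from three labs.
lab_results = [(26.0, 12.5, 1.1), (31.0, 13.0, 0.95), (22.5, 11.5, 1.3)]
inrs = [inr(pt, normal, isi) for pt, normal, isi in lab_results]
print([round(x, 2) for x in inrs])
print(round(coefficient_of_variation(inrs), 1))
```

A lower CV across laboratories for the same plasma indicates better interlaboratory agreement, which is the criterion the survey uses to compare the two procedures.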


GigaScience ◽  
2020 ◽  
Vol 9 (6) ◽  
Author(s):  
Ksenia Krasheninnikova ◽  
Mark Diekhans ◽  
Joel Armstrong ◽  
Aleksei Dievskii ◽  
Benedict Paten ◽  
...  

Abstract. Background: Large-scale sequencing projects provide high-quality full-genome data that can be used for reconstruction of chromosomal exchanges and rearrangements that disrupt conserved syntenic blocks. The highest resolution of cross-species homology can be obtained on the basis of whole-genome, reference-free alignments. Very large multiple alignments of full-genome sequence stored in a binary format demand an accurate and efficient computational approach for synteny block production. Findings: halSynteny performs efficient processing of pairwise alignment blocks for any pair of genomes in the alignment. The tool is part of the HAL comparative genomics suite and is targeted to build synteny blocks for multi-hundred–way, reference-free vertebrate alignments built with the Cactus system. Conclusions: halSynteny enables an accurate and rapid identification of synteny in multiple full-genome alignments. The method is implemented in C++11 as a component of the halTools software and released under the MIT license. The package is available at https://github.com/ComparativeGenomicsToolkit/hal/.


2019 ◽  
Vol 35 (24) ◽  
pp. 5359-5360 ◽  
Author(s):  
Caroline J Sands ◽  
Arnaud M Wolfer ◽  
Gonçalo D S Correia ◽  
Noureddin Sadawi ◽  
Arfan Ahmed ◽  
...  

Abstract. Summary: As large-scale metabolic phenotyping studies become increasingly common, the need for systemic methods for pre-processing and quality control (QC) of analytical data prior to statistical analysis has become increasingly important, both within a study and to allow meaningful inter-study comparisons. The nPYc-Toolbox provides software for the import, pre-processing, QC and visualization of metabolic phenotyping datasets, either interactively or in automated pipelines. Availability and implementation: The nPYc-Toolbox is implemented in Python and is freely available from the Python package index at https://pypi.org/project/nPYc/; source is available at https://github.com/phenomecentre/nPYc-Toolbox. Full documentation can be found at http://npyc-toolbox.readthedocs.io/ and exemplar datasets and tutorials at https://github.com/phenomecentre/nPYc-toolbox-tutorials.

