snpQT: flexible, reproducible, and comprehensive quality control and imputation of genomic data

F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 567
Author(s):  
Christina Vasilopoulou ◽  
Benjamin Wingfield ◽  
Andrew P. Morris ◽  
William Duddy

Quality control of genomic data is an essential but complicated multi-step procedure, often requiring separate installation of, and expert familiarity with, a combination of different bioinformatics tools. Software incompatibilities and inconsistencies across computing environments are recurrent challenges, leading to poor reproducibility. Existing semi-automated or automated solutions lack comprehensive quality checks, flexible workflow architecture, and user control. To address these challenges, we have developed snpQT: a scalable, stand-alone software pipeline using Nextflow and BioContainers for comprehensive, reproducible and interactive quality control of human genomic data. snpQT offers some 36 discrete quality filters or correction steps in a complete standardised pipeline, producing graphical reports to demonstrate the state of the data before and after each quality-control procedure. This includes human genome build conversion, population stratification against data from the 1000 Genomes Project, automated population outlier removal, and built-in imputation with its own pre- and post-imputation quality controls. Common input formats are used, and a synthetic dataset and comprehensive online tutorial are provided for testing, educational purposes, and demonstration. The snpQT pipeline is designed to run with minimal user input and coding experience; quality-control steps are implemented with numerous user-modifiable thresholds, and workflows can be flexibly combined in custom combinations. snpQT is open source and freely available at https://github.com/nebfield/snpQT. A comprehensive online tutorial and installation guide, covering the workflow through to GWAS, is provided at https://snpqt.readthedocs.io/en/latest/, introducing snpQT using a synthetic demonstration dataset and a real-world Amyotrophic Lateral Sclerosis SNP-array dataset.
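As a hedged illustration of the kind of discrete quality filter such a pipeline chains together, the sketch below implements per-sample call-rate filtering in Python. The function name, threshold, and genotype encoding are assumptions for demonstration only; snpQT itself orchestrates established command-line tools inside Nextflow processes rather than exposing a Python API.

import numpy as np

def filter_samples_by_call_rate(genotypes, min_call_rate=0.98):
    """Drop samples whose fraction of non-missing genotype calls
    falls below min_call_rate (missing calls encoded as -1)."""
    # genotypes: samples x variants matrix of 0/1/2 allele counts
    call_rate = (genotypes != -1).mean(axis=1)
    keep = call_rate >= min_call_rate
    return genotypes[keep], keep

# Toy example: 3 samples x 5 variants; the third sample has a 40%
# call rate and is removed at a 0.8 threshold
g = np.array([[0, 1, 2, 0, 1],
              [1, -1, 2, 0, 0],
              [-1, -1, 0, 1, -1]])
filtered, kept = filter_samples_by_call_rate(g, min_call_rate=0.8)
print(kept)  # [ True  True False]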


2021 ◽  
Vol 12 ◽  
Author(s):  
Ting-Hsuan Sun ◽  
Yu-Hsuan Joni Shao ◽  
Chien-Lin Mao ◽  
Miao-Neng Hung ◽  
Yi-Yun Lo ◽  
...  

Background: Single-nucleotide polymorphism (SNP) arrays are an ideal technology for genotyping genetic variants in mass screening. However, using SNP arrays to detect rare variants [minor allele frequency (MAF) < 1%] remains a challenge because of noisy signals and batch effects. An approach that improves genotyping quality is needed for clinical applications.

Methods: We developed a quality-control procedure for rare variants which integrates different algorithms, filters, and experiments to increase the accuracy of variant calling. Using data from the TWB 2.0 custom Axiom array, we adopted an advanced normalization adjustment to prevent false calls caused by cluster splitting, and a rare-het adjustment which decreases false calls in rare variants. The concordance of allelic frequencies from the array data was compared against sequencing datasets of Taiwanese individuals. Finally, the genotyping results were used to detect familial hypercholesterolemia (FH), thrombophilia (TH), and maturity-onset diabetes of the young (MODY) to assess performance in disease screening. All heterozygous calls were verified by Sanger sequencing or qPCR. The positive predictive value (PPV) of each step was estimated to evaluate the performance of our procedure.

Results: We analyzed SNP array data from 43,433 individuals, interrogating 267,247 rare variants. The advanced normalization and rare-het adjustment methods adjusted genotype calling for 168,134 variants (96.49%). We further removed 3916 probesets whose MAFs were discordant between the SNP array and sequencing data. The PPV for detecting pathogenic variants with 0.01% < MAF ≤ 1% exceeded 99.37%. For variants with MAF ≤ 0.01%, PPVs improved from 95% to 100% for FH, from 42.11% to 85.19% for TH, and from 18.24% to 72.22% for MODY after adopting our rare-variant quality-control procedure and experimental verification.

Conclusion: With our quality-control procedure, SNP arrays can adequately detect variants with MAF values ranging from 0.01% to 0.1%. For variants with MAF ≤ 0.01%, experimental validation is needed unless sequencing data from a homogeneous population of >10,000 individuals are available. These results demonstrate that our procedure performs correct genotype calling of rare variants, providing a solution for pathogenic variant detection through SNP arrays and bringing tremendous promise for implementing precision medicine in medical practice.
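The positive predictive value used to benchmark each step has the standard definition TP/(TP + FP); a minimal Python sketch follows, with invented counts rather than the study's actual verification numbers:

def positive_predictive_value(true_positives, false_positives):
    """PPV = TP / (TP + FP): the fraction of array-called variants
    confirmed by an orthogonal method (Sanger sequencing or qPCR)."""
    return true_positives / (true_positives + false_positives)

# Invented example: 100 heterozygous calls verified, 95 confirmed
print(f"PPV = {positive_predictive_value(95, 5):.2%}")  # PPV = 95.00%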


1970 ◽  
Vol 68 (2) ◽  
pp. 221-232 ◽  
Author(s):  
R. J. Gilbert

SUMMARY

There is no official scheme for testing disinfectants and detergent/disinfectants for use in the retail food trade, and few recommended procedures have been given for the cleaning of equipment with these agents. Therefore, field trials were carried out in a large self-service store. Comparisons were made of the various cleaning efficiencies, as determined by bacterial plate counts, of detergent and disinfectant solutions and machine cleaning oils applied with either clean cloths or disposable paper towels to items of equipment. The most satisfactory results were always obtained when anionic detergent (0·75% w/v) and hypochlorite (200 p.p.m. available chlorine) solutions were applied in a ‘two-step’ procedure.

Tests were made to compare the calcium alginate swab-rinse and the agar sausage (Agaroid) techniques for the enumeration of bacteria on stainless steel, plastic, formica and wooden surfaces before and after a cleaning process. Although recovery rates were always greater by the swab-rinse technique, the agar sausage technique was considered to be a useful routine control method for surface sampling.


2016 ◽  
Author(s):  
Robert J. H. Dunn ◽  
Kate M. Willett ◽  
David E. Parker ◽  
Lorna Mitchell

Abstract. HadISD is a sub-daily, station-based, quality-controlled dataset designed to study past extremes of temperature, pressure and humidity and to allow comparisons with future projections. Herein we describe the first major update to the HadISD dataset. The temporal coverage has been extended to span 1931 to the present, doubling the time range over which data are provided. Improvements made to the station selection and merging procedures result in 7677 stations being provided in version 2.0.0.2015p of this dataset. The selection of stations to merge into composites has also been improved and made more robust. The underlying structure of the quality-control procedure is the same as for HadISD.1.0.x, but a number of improvements have been implemented in individual tests, and more detailed quality-control tests for wind speed and direction have been added. The data will be made available as netCDF files at www.metoffice.gov.uk/hadobs/hadisd and updated annually.
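A hedged sketch of loading one HadISD station file in Python follows; the filename and the "temperatures" variable name are assumptions for illustration, and the actual netCDF schema should be checked against the files distributed at www.metoffice.gov.uk/hadobs/hadisd:

import xarray as xr

# Open a single HadISD v2 station file (filename is hypothetical)
ds = xr.open_dataset("hadisd_station_012345.nc")

# List the observed variables and quality-control flags the file provides
print(ds.data_vars)

# Sub-daily temperatures for the newly added early period, assuming the
# variable is named "temperatures"
temps = ds["temperatures"].sel(time=slice("1931-01-01", "1931-12-31"))
print(float(temps.mean()))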


2018 ◽  
Vol 77 (OCE3) ◽  
Author(s):  
S. Cassidy ◽  
B. Phillips ◽  
J. Caldeira Fernandes da Silva ◽  
A. Parle

Author(s):  
Oladotun A. Ojo ◽  
Peter A. Oluwafisoye ◽  
Charles O. Chime

The sensitivity of radiographic films is an important factor in the clarity and accuracy of X-ray exposure of patients during treatment or diagnostic procedures. A thorough analysis of film sensitivity before and after exposure is therefore important to enhance the Quality Assurance (QA) and Quality Control (QC) of exposure procedures. The optical density (OD) of each film was measured with a densitometer (model MA 5336, made by GAMMEX), and these values were then converted to the absorbed dose (X mGy), the amount of dose absorbed by each patient. The optical density versus dose curve followed the expected pattern, showing good agreement with the general model and indicating that the films employed in the exposures were of good quality and standard. Hence the optical density versus dose sensitometric curves depict the sensitivity of the various films after exposure to X-ray radiation through the patients.
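A minimal sketch of how a sensitometric (OD versus dose) calibration can be built and inverted in Python; the calibration readings below are invented, and a simple linear fit over the film's linear response region stands in for the general model referred to above:

import numpy as np

# Invented calibration points: absorbed dose (mGy) vs. measured optical density
dose = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
od = np.array([0.32, 0.51, 0.88, 1.62, 3.05])

# Fit OD = a*dose + b over the (assumed) linear response region
a, b = np.polyfit(dose, od, 1)

def od_to_dose(od_reading):
    """Invert the calibration to convert an OD reading to absorbed dose."""
    return (od_reading - b) / a

print(f"OD 1.20 -> {od_to_dose(1.20):.2f} mGy")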


Author(s):  
Radovan Kasarda ◽  
Nina Moravčíková ◽  
Ondrej Kadlečík ◽  
Anna Trakovická ◽  
Marko Halo ◽  
...  

The objective of this study was to analyse the level of pedigree and genomic inbreeding in a herd of Norik of Muran horses. The pedigree file included 1374 animals (603 stallions and 771 mares), while the reference population consisted of animals genotyped on a 70k SNP platform (n = 25). The trend of pedigree inbreeding was expressed as the probability that an animal carries two alleles identical by descent, computed according to classical formulas. The trend of genomic inbreeding was derived from the distribution of runs of homozygosity (ROHs) of various lengths across the genome, on the assumption that these regions reflect autozygosity originating from past generations of ancestors. A maximum of 19 generations was found in the pedigree file. As expected, the highest level of pedigree completeness was found in the first five generations. Subsequent quality control of the genomic data retained a total of 54,432 SNP markers covering 2.242 Mb of the autosomal genome. The pedigree analysis showed that pedigree inbreeding in the current generation can be expected at a level of 0.23% (ΔFPEDi = 0.19 ± 1.17%). Comparable results were obtained by the genomic analysis, in which inbreeding in the current generation reached 0.11%. Thus, in terms of genetic diversity, both analyses reflected a sufficient level of variability across the analysed population of Norik of Muran horses.
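The genomic side of such an analysis commonly summarises inbreeding as F_ROH, the fraction of the covered autosomal genome that lies in runs of homozygosity; a short Python sketch with invented segment data follows (the genome length here is illustrative, not the study's figure):

def f_roh(roh_lengths_bp, covered_autosome_bp):
    """F_ROH = sum of ROH segment lengths / covered autosomal length."""
    return sum(roh_lengths_bp) / covered_autosome_bp

# Invented example: three ROH segments against a 2.4 Gb covered autosome
segments_bp = [5_200_000, 12_800_000, 3_100_000]
print(f"F_ROH = {f_roh(segments_bp, 2.4e9):.4f}")  # F_ROH = 0.0088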


2015 ◽  
Vol 54 (6) ◽  
pp. 1267-1282 ◽  
Author(s):  
Youlong Xia ◽  
Trent W. Ford ◽  
Yihua Wu ◽  
Steven M. Quiring ◽  
Michael B. Ek

Abstract. The North American Soil Moisture Database (NASMD) was initiated in 2011 to provide support for developing climate forecasting tools, calibrating land surface models, and validating satellite-derived soil moisture algorithms. The NASMD has collected data from over 30 soil moisture observation networks providing millions of in situ soil moisture observations in all 50 states, as well as Canada and Mexico. It is recognized that the quality of measured soil moisture in NASMD is highly variable because of the diversity of climatological conditions, land cover, soil texture, and topographies of the stations, and differences in measurement devices (e.g., sensors) and installation. It is also recognized that error, inaccuracy, and imprecision in the data can have significant impacts on practical operations and scientific studies. Therefore, developing an appropriate quality control procedure is essential to ensure that the data are of the best quality. In this study, an automated quality control approach is developed using the North American Land Data Assimilation System, phase 2 (NLDAS-2), Noah soil porosity, soil temperature, and fraction of liquid and total soil moisture to flag erroneous and/or spurious measurements. Overall results show that this approach is able to flag unreasonable values when the soil is partially frozen. A validation example using NLDAS-2 multiple model soil moisture products at the 20-cm soil layer showed that the quality control procedure had a significant positive impact in Alabama, North Carolina, and west Texas. It had a greater impact in colder regions, particularly during spring and autumn. Over 433 NASMD stations have been quality controlled using the methodology proposed in this study, and the algorithm will be implemented to control data quality from the other ~1200 NASMD stations in the near future.
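A hedged sketch of the kind of model-assisted flagging described above; the thresholds and flag logic here are illustrative assumptions, not the paper's published criteria:

import numpy as np

def flag_soil_moisture(obs_sm, porosity, soil_temp_c, liquid_frac,
                       min_liquid_frac=0.9):
    """Return a boolean mask of suspect observations.

    Flags a value when it exceeds the model soil porosity, is
    non-positive, or falls in partially frozen soil, where in situ
    sensors are known to report spurious values.
    """
    obs_sm = np.asarray(obs_sm)
    exceeds_porosity = obs_sm > porosity
    nonphysical = obs_sm <= 0.0
    partially_frozen = (np.asarray(soil_temp_c) <= 0.0) | \
                       (np.asarray(liquid_frac) < min_liquid_frac)
    return exceeds_porosity | nonphysical | partially_frozen

# Invented example: volumetric soil moisture (m3 m-3) at the 20-cm layer
flags = flag_soil_moisture([0.25, 0.61, 0.18, 0.30], porosity=0.45,
                           soil_temp_c=[5.0, 4.0, -1.5, 6.0],
                           liquid_frac=[1.0, 1.0, 0.6, 1.0])
print(flags)  # [False  True  True False]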

