Impact of index hopping and bias towards the reference allele on accuracy of genotype calls from low-coverage sequencing

2018 ◽  
Author(s):  
Roger Ros-Freixedes ◽  
Mara Battagin ◽  
Martin Johnsson ◽  
Gregor Gorjanc ◽  
Alan J Mileham ◽  
...  

Abstract. Background: Inherent sources of error and bias that affect the quality of the sequence data include index hopping and bias towards the reference allele. The impact of these artefacts is likely greater for low-coverage data than for high-coverage data because low-coverage data has scant information and standard tools for processing sequence data were designed for high-coverage data. With the proliferation of cost-effective low-coverage sequencing there is a need to understand the impact of these errors and bias on resulting genotype calls. Results: We used a dataset of 26 pigs sequenced both at 2x with multiplexing and at 30x without multiplexing to show that index hopping and bias towards the reference allele due to alignment had little impact on genotype calls. However, pruning of alternative haplotypes supported by a number of reads below a predefined threshold, a default and desired step for removing potential sequencing errors in high-coverage data, introduced an unexpected bias towards the reference allele when applied to low-coverage data. This bias reduced best-guess genotype concordance of low-coverage sequence data by 19.0 absolute percentage points. Conclusions: We propose a simple pipeline to correct this bias and we recommend that users of low-coverage sequencing be wary of unexpected biases produced by tools designed for high-coverage sequencing.
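
The pruning effect described in the abstract above can be illustrated with a toy calculation. The sketch below is not the authors' pipeline and does not model any particular variant caller. It assumes a heterozygous site where each read carries the alternative allele with probability 0.5, and a hypothetical parameter `min_alt_reads` standing in for the read-support threshold below which the alternative haplotype is discarded. Under these assumptions it computes how often the alternative allele survives pruning at 2x and at 30x depth.

```python
from math import comb


def prob_alt_survives(depth: int, min_alt_reads: int, p_alt: float = 0.5) -> float:
    """Probability that at least `min_alt_reads` of `depth` reads carry the
    alternative allele at a heterozygous site (simple binomial model).

    `min_alt_reads` is a hypothetical stand-in for a pruning threshold.
    """
    return sum(
        comb(depth, k) * p_alt**k * (1 - p_alt) ** (depth - k)
        for k in range(min_alt_reads, depth + 1)
    )


if __name__ == "__main__":
    for depth in (2, 30):
        for threshold in (1, 2):
            p = prob_alt_survives(depth, threshold)
            print(f"depth={depth:>2} min_alt_reads={threshold} P(alt kept)={p:.6f}")
```

In this toy model, raising the threshold from one to two supporting reads barely matters at 30x, but at 2x it cuts the chance of keeping the alternative allele from 0.75 to 0.25, so most heterozygous sites would look homozygous for the reference allele.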

Animals ◽  
2021 ◽  
Vol 11 (3) ◽  
pp. 899
Author(s):  
Fotis Pappas ◽  
Christos Palaiokostas

Incorporation of genomic technologies into fish breeding programs is a modern reality, promising substantial advances regarding the accuracy of selection, monitoring the genetic diversity and pedigree record verification. Single nucleotide polymorphism (SNP) arrays are the most commonly used genomic tool, but the investments required make them unsustainable for emerging species, such as Arctic charr (Salvelinus alpinus), where production volume is low. The requirement to genotype a large number of animals for breeding practices necessitates cost effective genotyping approaches. In the current study, we used double digest restriction site-associated DNA (ddRAD) sequencing of either high or low coverage to genotype Arctic charr from the Swedish national breeding program and performed analytical procedures to assess their utility in a range of tasks. SNPs were identified and used for deciphering the genetic structure of the studied population, estimating genomic relationships and implementing an association study for growth-related traits. Missing information and underestimation of heterozygosity in the low coverage set were limiting factors in genetic diversity and genomic relationship analyses, where high coverage performed notably better. On the other hand, the high coverage dataset proved to be valuable when it comes to identifying loci that are associated with phenotypic traits of interest. In general, both genotyping strategies offer sustainable alternatives to hybridization-based genotyping platforms and show potential for applications in aquaculture selective breeding.


Author(s):  
S. Rubinacci ◽  
D.M. Ribeiro ◽  
R. Hofmeister ◽  
O. Delaneau

Abstract: Low-coverage whole genome sequencing followed by imputation has been proposed as a cost-effective genotyping approach for disease and population genetics studies. However, its competitiveness against SNP arrays is undermined as current imputation methods are computationally expensive and unable to leverage large reference panels. Here, we describe a method, GLIMPSE, for phasing and imputation of low-coverage sequencing datasets from modern reference panels. We demonstrate its remarkable performance across different coverages and human populations. It achieves imputation of a full genome for less than $1, outperforming existing methods by orders of magnitude, with an increased accuracy of more than 20% at rare variants. We also show that 1x coverage enables effective association studies and is better suited than dense SNP arrays to assess the impact of rare variations. Overall, this study demonstrates the promising potential of low-coverage imputation and suggests a paradigm shift in the design of future genomic studies.


2018 ◽  
Vol 34 (S1) ◽  
pp. 20-21
Author(s):  
Shaun Harris ◽  
Deborah Fitzsimmons ◽  
Roshan das Nair ◽  
Lucy Bradshaw

Introduction: People with traumatic brain injuries (TBIs) commonly report memory impairments which are persistent, debilitating, and reduce quality of life. As part of the Rehabilitation of Memory in Brain Injury trial, a cost-effectiveness analysis was undertaken to examine the comparative costs and effects of a group memory rehabilitation program for people with TBI. Methods: Individual-level cost and outcome data were collected. Patients were randomized to usual care (n=157) or usual care plus memory rehabilitation (n=171). The primary outcome for the economic analysis was the EuroQol-5D quality of life score at 12 months. A UK NHS costing perspective was used. Missing data were addressed by multiple imputation. One-way sensitivity analyses examined the impact of varying different parameters, and the impact of available cases, on base case findings whilst non-parametric bootstrapping examined joint uncertainty. Results: At 12 months, the intervention was GBP 26.89 (USD 35.76) (SE 249.15) cheaper than usual care, but this difference was not statistically significant (p=0.914). At 12 months, a QALY loss of −0.007 (95% CI: −0.025 to 0.012) was observed in the intervention group and a QALY gain of 0.004 (95% CI: −0.017 to 0.025) in the usual care group. This difference was not statistically significant (p=0.442). The base case analysis gave an ICER of GBP 2,445 (USD 3,252), reflecting that the intervention was less effective and less costly compared to usual care. Sensitivity analyses illustrated considerable uncertainty. When joint uncertainty was examined, the probability of the intervention being cost-effective was 29 percent at a willingness-to-pay threshold of GBP 20,000 per QALY gain and 24 percent at GBP 30,000. Conclusions: Our cost-utility analysis indicates that memory rehabilitation was cheaper but less effective than usual care, but these findings must be interpreted in the light of small, statistically non-significant differences and considerable uncertainty. The ReMemBrIn intervention is unlikely to be considered cost-effective for people with TBI.
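
The ICER in the abstract above can be approximately reproduced from its rounded figures. The sketch below assumes the incremental cost is −26.89 GBP (intervention cheaper) and the incremental QALY is the intervention change minus the usual-care change. The trial's own value was presumably derived from unrounded, imputed data, so the agreement is only approximate. The function names `icer` and `quadrant` are illustrative and not taken from the study.

```python
def icer(delta_cost: float, delta_effect: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    if delta_effect == 0:
        raise ZeroDivisionError("ICER is undefined when the incremental effect is zero")
    return delta_cost / delta_effect


def quadrant(delta_cost: float, delta_effect: float) -> str:
    """Locate the result on the cost-effectiveness plane (effect on x, cost on y)."""
    if delta_effect >= 0:
        return "north-east" if delta_cost >= 0 else "south-east"
    return "north-west" if delta_cost >= 0 else "south-west"


# Rounded figures quoted in the abstract (intervention minus usual care).
delta_cost = -26.89             # GBP; the intervention was cheaper
delta_qaly = -0.007 - 0.004     # QALY change in intervention minus usual care

ratio = icer(delta_cost, delta_qaly)

if __name__ == "__main__":
    print(f"ICER of about GBP {ratio:,.0f} per QALY, {quadrant(delta_cost, delta_qaly)} quadrant")
```

Because both increments are negative, the ratio is positive but sits in the south-west quadrant: it expresses the saving per QALY forgone, not a cost per QALY gained, which is why the abstract describes the intervention as less costly and less effective.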


Author(s):  
Marta Byrska-Bishop ◽  
Uday S. Evani ◽  
Xuefang Zhao ◽  
Anna O. Basile ◽  
Haley J. Abel ◽  
...  

Abstract: The 1000 Genomes Project (1kGP), launched in 2008, is the largest fully open resource of whole genome sequencing (WGS) data consented for public distribution of raw sequence data without access or use restrictions. The final (phase 3) 2015 release of 1kGP included 2,504 unrelated samples from 26 populations, representing five continental regions of the world and was based on a combination of technologies including low coverage WGS (mean depth 7.4X), high coverage whole exome sequencing (mean depth 65.7X), and microarray genotyping. Here, we present a new, high coverage WGS resource encompassing the original 2,504 1kGP samples, as well as an additional 698 related samples that result in 602 complete trios in the 1kGP cohort. We sequenced this expanded 1kGP cohort of 3,202 samples to a targeted depth of 30X using Illumina NovaSeq 6000 instruments. We performed SNV/INDEL calling against the GRCh38 reference using GATK’s HaplotypeCaller, and generated a comprehensive set of SVs by integrating multiple analytic methods through a sophisticated machine learning model, upgrading the 1kGP dataset to current state-of-the-art standards. Using this strategy, we defined over 111 million SNVs, 14 million INDELs, and ∼170 thousand SVs across the entire cohort of 3,202 samples with estimated false discovery rate (FDR) of 0.3%, 1.0%, and 1.8%, respectively. By comparison to the low-coverage phase 3 callset, we observed substantial improvements in variant discovery and estimated FDR that were facilitated by high coverage re-sequencing and expansion of the cohort. Specifically, we called 7% more SNVs, 59% more INDELs, and 170% more SVs per genome than the phase 3 callset. Moreover, we leveraged the presence of families in the cohort to achieve superior haplotype phasing accuracy and we demonstrate improvements that the high coverage panel brings especially for INDEL imputation. We make all the data generated as part of this project publicly available and we envision this updated version of the 1kGP callset to become the new de facto public resource for the worldwide scientific community working on genomics and genetics.


2019 ◽  
Author(s):  
Sarah E. Jensen ◽  
Jean Rigaud Charles ◽  
Kebede Muleta ◽  
Peter Bradbury ◽  
Terry Casstevens ◽  
...  

Abstract: Successful management and utilization of increasingly large genomic datasets is essential for breeding programs to increase genetic gain and accelerate cultivar development. To help with data management and storage, we developed a sorghum Practical Haplotype Graph (PHG) pangenome database that stores all identified haplotypes and variant information for a given set of individuals. We developed two PHGs in sorghum, one with 24 individuals and another with 398 individuals, that reflect the diversity across genic regions of the sorghum genome. 24 founders of the Chibas sorghum breeding program were sequenced at low coverage (0.01x) and processed through the PHG to identify genome-wide variants. The PHG called SNPs with only 5.9% error at 0.01x coverage, an accuracy only 3% lower than when calling SNPs from 8x coverage sequence. Additionally, 207 progeny from the Chibas genomic selection (GS) training population were sequenced and processed through the PHG. Missing genotypes in the progeny were imputed from the parental haplotypes available in the PHG and used for genomic prediction. Mean prediction accuracies with PHG SNP calls range from 0.57 to 0.73 for different traits, and are similar to prediction accuracies obtained with genotyping-by-sequencing (GBS) or markers from sequencing targeted amplicons (rhAmpSeq). This study provides a proof of concept for using a sorghum PHG to call and impute SNPs from low-coverage sequence data and also shows that the PHG can unify genotype calls from different sequencing platforms. By reducing the amount of input sequence needed, the PHG has the potential to decrease the cost of genotyping for genomic selection, making GS more feasible and facilitating larger breeding populations that can capture maximum recombination. Our results demonstrate that the PHG is a useful research and breeding tool that can maintain variant information from a diverse group of taxa, store sequence data in a condensed but readily accessible format, unify genotypes from different genotyping methods, and provide a cost-effective option for genomic selection for any species.


The flood of applications that demand massive data has challenged 5G cellular networks to deliver high data rates, better quality of service, and low energy consumption. Heterogeneous ultra-dense networks (HUDNs) are one of the major technologies for addressing these challenges. HUDNs play a big role in a cellular system: they deliver cost-effective coverage with low transmit power and high capacity to meet rising data demand and users' high performance expectations. In this paper, we introduce the impact of small cells on the cellular system and the technologies that small cells use to help the cellular system meet subscribers' demands. First, we discuss the fundamentals of the technologies used in small cells. Next, we study small cell management. Then, self-organizing networks are studied. After that, we review small cell power consumption, mobility, and handover. Finally, the real-world experience of mm-waves and MIMO in 5G small cells is discussed.


Author(s):  
Manuel García-Goñi

Education programs are beneficial for patients with different chronic conditions. Prior studies have examined direct education, where information is transferred directly to patients. In contrast, in this program, information is transferred directly to nurses who become specialists and transfer education individually to patients. Hence, this paper evaluates the impact of having specialist nurses for stoma patients at hospitals, as those nurses provide healthcare to patients but also inform and educate patients about their condition and needs. The analysis uses an observational study with ostomized patients in Spain at hospitals with and without specialist nurses, and measures health service utilization and health-related quality of life (HRQL), besides performing a cost analysis and a cost-effectiveness analysis at both types of hospitals. The results show that patients with access to specialist nurses self-manage better, experience fewer adverse events and a better evolution of HRQL, and demand significantly more consultations with specialist nurses and fewer with A&E, primary care or specialists, resulting in important savings for the health system. Consequently, specializing or hiring nurses to provide indirect education to stoma patients is cost-effective and highly beneficial for patients. This type of indirect education strategy might be considered for specific conditions with low incidence or difficulties in identifying target patients or delivering information directly to them.


2004 ◽  
Vol 133 (1) ◽  
pp. 159-171 ◽  
Author(s):  
R. G. PEBODY ◽  
N. J. GAY ◽  
A. GIAMMANCO ◽  
S. BARON ◽  
J. SCHELLEKENS ◽  
...  

High titres of pertussis toxin (PT) antibody have been shown to be predictive of recent infection with Bordetella pertussis. The seroprevalence of standardized anti-PT antibody was determined in six Western European countries between 1994 and 1998 and related to historical surveillance and vaccine programme data. Standardized anti-PT titres were calculated for a series of whole-cell and acellular pertussis vaccine trials. For the serological surveys, high-titre sera (>125 units/ml) were distributed throughout all age groups in both high- (>90%) and low-coverage (<90%) countries. High-titre sera were more likely in infants in countries using high-titre-producing vaccines in their primary programme (Italy, 11·5%; Western Germany, 13·3%; France, 4·3%; Eastern Germany, 4·0%) compared to other countries (The Netherlands, 0·5%; Finland, 0%). Recent infection was significantly more likely in adolescents (10–19 years old) and adults in high-coverage countries (Finland, The Netherlands, France, East Germany), whereas infection was more likely in children (3–9 years old) than adolescents in low-coverage (<90%; Italy, West Germany, United Kingdom) countries. The impact and role of programmatic changes introduced after these surveys aimed at protecting infants from severe disease by accelerating the primary schedule or vaccinating older children and adolescents with booster doses can be evaluated with this approach.


2021 ◽  
Author(s):  
Fiolet ◽  
Yousra Kherabi ◽  
Conor MacDonald ◽  
Jade Ghosn ◽  
Nathan Peiffer-Smadja

Vaccines are critical cost-effective tools to control the COVID-19 pandemic. However, the emergence of more transmissible SARS-CoV-2 variants may threaten the potential herd immunity sought from mass vaccination campaigns. The objective of this study was to provide an up-to-date comparative analysis of the characteristics, adverse events, efficacy, effectiveness and impact of the variants of concern (Alpha, Beta, Gamma and Delta) for fourteen currently authorized COVID-19 vaccines (BNT162b2, mRNA-1273, AZD1222, Ad26.COV2.S, Sputnik V, NVX-CoV2373, Ad5-nCoV, CoronaVac, BBIBP-CorV, COVAXIN, Wuhan Sinopharm vaccine, QazCovid-In, Abdala and ZF200) and two vaccines (CVnCoV and NVX-CoV2373) currently in rolling review in several national drug agencies. Overall, all COVID-19 vaccines had a high efficacy against the traditional strain and the variants of SARS-CoV-2, and were well tolerated. BNT162b2, mRNA-1273 and Sputnik V had the highest efficacy (>90%) after two doses at preventing symptomatic cases in phase III trials. Efficacy ranged from 10.4% for AZD1222 in South Africa to 50% for NVX-CoV2373 in South Africa and 50% for CoronaVac in Brazil, where the 501Y.V2 and P1 variants were dominant. Seroneutralization studies showed a negligible reduction in neutralization activity against Alpha for most vaccines, whereas the impact was modest for Delta. Beta and Gamma exhibited a greater reduction in neutralizing activity for mRNA vaccines, Sputnik V and CoronaVac. Regarding observational real-life data, most studies concerned the Pfizer and Moderna vaccines. Full immunization with mRNA vaccines effectively prevents SARS-CoV-2 infection against Alpha and Beta. All vaccines appeared to be safe and effective tools to prevent symptomatic and severe COVID-19, hospitalization and death against all variants of concern, but the quality of evidence greatly varied depending on the vaccines considered. There are remaining questions regarding specific populations excluded from trials, the duration of immunity and heterologous vaccination. Serious adverse events are very rare, particularly anaphylaxis (2.5-4.7 cases per million doses among adults) and myocarditis (3.5 cases per million) for mRNA vaccines; thrombosis with thrombocytopenia syndrome for Janssen vaccine (3 cases per million) and AstraZeneca vaccine (2 cases per million); and Guillain-Barre syndrome (7.8 cases per million) for Janssen vaccine. COVID-19 vaccine benefits outweigh risks, despite rare serious adverse effects.


2018 ◽  
Author(s):  
Torsten Günther ◽  
Carl Nettelblad

Abstract: High quality reference genomes are an important resource in genomic research projects. A consequence is that DNA fragments carrying the reference allele will be more likely to map successfully, or receive higher quality scores. This reference bias can have effects on downstream population genomic analysis when heterozygous sites are falsely considered homozygous for the reference allele. In palaeogenomic studies of human populations, mapping against the human reference genome is used to identify endogenous human sequences. Ancient DNA studies usually operate with low sequencing coverages and fragmentation of DNA molecules causes a large proportion of the sequenced fragments to be shorter than 50 bp, reducing the amount of accepted mismatches, and increasing the probability of multiple matching sites in the genome. These ancient DNA specific properties are potentially exacerbating the impact of reference bias on downstream analyses, especially since most studies of ancient human populations use pseudohaploid data, i.e. they randomly sample only one sequencing read per site. We show that reference bias is pervasive in published ancient DNA sequence data of pre-historic humans with some differences between individual genomic regions. We illustrate that the strength of reference bias is negatively correlated with fragment length. Reference bias can cause differences in the results of downstream analyses such as population affinities, heterozygosity estimates and estimates of archaic ancestry. These spurious results highlight how important it is to be aware of these technical artifacts and that we need strategies to mitigate the effect. Therefore, we suggest some post-mapping filtering strategies to resolve reference bias which help to reduce its impact substantially.

