mapping bias
Recently Published Documents



2021 ◽  
Author(s):  
Charles Markello ◽  
Charles Huang ◽  
Alex Rodriguez ◽  
Andrew Carroll ◽  
Pi-Chuan Chang ◽  
...  

Methods that use a linear genome reference for genome sequencing data analysis are reference biased. In the field of clinical genetics for rare diseases, a resulting reduction in genotyping accuracy in some regions has likely prevented the resolution of some cases. Pangenome graphs embed population variation into a reference structure. While pangenome graphs have helped to reduce reference mapping bias, further performance improvements are possible. We introduce VG-Pedigree, a pedigree-aware workflow based on the pangenome-mapping tool Giraffe (Sirén et al. 2021) and the variant-calling tool DeepTrio (Kolesnikov et al. 2021) using a specially trained model for Giraffe-based alignments. We demonstrate mapping and variant calling improvements in both single-nucleotide variants (SNVs) and insertion and deletion (INDEL) variants over those produced by alignments created using BWA MEM to a linear reference and Giraffe mapping to a pangenome graph containing data from the 1000 Genomes Project. We have also adapted and upgraded the deleterious-variant (DV) detecting methods and programs of Gu et al. into a streamlined workflow (Gu et al. 2019). We used these workflows in combination to detect small lists of candidate DVs among 15 family quartets and quintets of the Undiagnosed Diseases Program (UDP). All candidate DVs that were previously diagnosed using the Mendelian models covered by the previously published Gu et al. methods were recapitulated by these workflows. The results of these experiments indicate that a slightly greater absolute count of DVs is detected in the proband population than in their matched unaffected siblings.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Shuhua Zhan ◽  
Cortland Griswold ◽  
Lewis Lukens

Background: Genetic variation for gene expression is a source of phenotypic variation for natural and agricultural species. The common approach to map and to quantify gene expression from genetically distinct individuals is to assign their RNA-seq reads to a single reference genome. However, RNA-seq reads from alleles dissimilar to this reference genome may fail to map correctly, causing transcript levels to be underestimated. Presently, the extent of this mapping problem is not clear, particularly in highly diverse species. We investigated whether mapping bias occurred and whether chromosomal features were associated with mapping bias. Zea mays presents a model species to assess these questions, given it has genotypically distinct and well-studied genetic lines. Results: In Zea mays, the inbred B73 genome is the standard reference genome and template for RNA-seq read assignments. In the absence of mapping bias, B73 and a second inbred line, Mo17, would each have an approximately equal number of regulatory alleles that increase gene expression. Remarkably, Mo17 had 2–4 times fewer such positively acting alleles than did B73 when RNA-seq reads were aligned to the B73 reference genome. Reciprocally, over one-half of the B73 alleles that increased gene expression were not detected when reads were aligned to the Mo17 genome template. Genes at dissimilar chromosomal ends were strongly affected by mapping bias, and genes at more similar pericentromeric regions were less affected. Biased transcript estimates were higher in untranslated regions and lower in splice junctions. Bias occurred across software and alignment parameters. Conclusions: Mapping bias very strongly affects gene transcript abundance estimates in maize, and bias varies across chromosomal features. Individual genome or transcriptome templates are likely necessary for accurate transcript estimation across genetically variable individuals in maize and other species.
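
The mechanism described in the abstract above, reads from a divergent allele failing to align to a single reference, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the sequences are invented, the "aligner" is a naive ungapped scan with a mismatch budget, and the names `tile_reads` and `count_mapped` are hypothetical. It is not the authors' pipeline or any real RNA-seq aligner.

```python
# Toy illustration of reference mapping bias (hypothetical data and helpers).

REFERENCE = "ACGTTGCAAGGCTTACCGAT"
# Hypothetical alternate allele: two nearby substitutions relative to REFERENCE.
ALTERNATE = REFERENCE[:9] + "T" + REFERENCE[10] + "A" + REFERENCE[12:]


def tile_reads(sequence, length=8):
    """Return every read of the given length that can be cut from the sequence."""
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


def mismatches(read, segment):
    """Count positions at which the read and an equal-length segment differ."""
    return sum(a != b for a, b in zip(read, segment))


def count_mapped(reads, reference, max_mismatches=1):
    """Count reads that match some window of the reference within the mismatch budget."""
    mapped = 0
    for read in reads:
        n = len(read)
        for start in range(len(reference) - n + 1):
            if mismatches(read, reference[start:start + n]) <= max_mismatches:
                mapped += 1
                break  # one acceptable placement is enough
    return mapped


ref_reads = tile_reads(REFERENCE)
alt_reads = tile_reads(ALTERNATE)
print("reference-allele reads mapped:", count_mapped(ref_reads, REFERENCE), "of", len(ref_reads))
print("alternate-allele reads mapped:", count_mapped(alt_reads, REFERENCE), "of", len(alt_reads))
```

Both alleles contribute the same number of reads, yet fewer alternate-allele reads pass the mismatch filter, so a count-based expression estimate for that allele would be deflated.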


2021 ◽  
Author(s):  
Erin Conwell ◽  
Felix Pichardo ◽  
Gregor Horvath ◽  
Amanda Auen

Children’s ability to learn words with multiple meanings may be hindered by their adherence to a one-to-one form-to-meaning mapping bias. Previous research on children’s learning of pseudohomophones has yielded mixed results, suggesting a range of factors that may impact when children entertain a new meaning for a familiar word. One such factor is repetition of the new meaning (Storkel & Maekawa, 2005) and another is the acoustic differentiation of the two meanings (Conwell, 2017). This study asked 72 4-year-old English-learning children to assign novel meanings to familiar words and manipulated how many times they heard the words with their new referents as well as whether the productions were acoustically longer than typical productions of the words. The results show that repetition supports the learning of a pseudohomophone, but acoustic differentiation does not. There was no evidence of an interaction of the two factors. Homophone learning is facilitated by increased exposure to a second meaning, but children do not use acoustic differences in homophone learning, despite the availability of such differences in their experience.


2021 ◽  
Author(s):  
S. Sánchez-Ramírez ◽  
A. D. Cutter

Summary: Changes to regulatory sequences account for important phenotypic differences between species and populations. In heterozygous individuals, regulatory polymorphism typically manifests as allele-specific expression (ASE) of transcripts. ASE data from inter-species and inter-population hybrids, in conjunction with expression data from the parents, can be used to infer regulatory changes in cis and trans throughout the genome. Improper data handling, however, can create problems of mapping bias and excessive loss of information, which are prone to arise unintentionally from the cumbersome pipelines with multiple dependencies that are common among current methods. Here, we introduce CompMap, a new, self-contained method implemented in Python that generates allele-specific expression counts from genotype-specific map alignments. Rather than assessing individual SNPs, our approach sorts and counts reads within a given homologous region by comparing individual read-mapping statistics from each parental alignment. Reads that are aligned ambiguously to both references are resolved proportionally to the allele-specific matching read counts or statistically using a binomial distribution. Using simulations, we show that CompMap has low error rates in assessing regulatory divergence. Availability: The Python code with examples and installation instructions is available on the GitHub repository https://github.com/santiagosnchez/[email protected]
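
The two ways of resolving ambiguous reads mentioned in the abstract above, proportional assignment and a binomial draw, might look roughly like the sketch below. This is not CompMap's code: the function name `resolve_ambiguous`, its parameters, and the even-split fallback for regions with no allele-specific reads are all assumptions made for illustration.

```python
import random


def resolve_ambiguous(unique_a, unique_b, ambiguous, mode="proportional", seed=0):
    """Split ambiguous reads between two parental alleles (hypothetical helper).

    unique_a, unique_b: reads assigned unambiguously to allele A or allele B.
    ambiguous: reads that aligned equally well to both parental references.
    Returns the adjusted (count_a, count_b).
    """
    total = unique_a + unique_b
    # Assumed fallback: with no allele-specific evidence, split evenly.
    p_a = unique_a / total if total > 0 else 0.5

    if mode == "proportional":
        # Deterministic split in proportion to the unambiguous counts.
        extra_a = ambiguous * p_a
    elif mode == "binomial":
        # One Bernoulli trial per ambiguous read, i.e. a Binomial(ambiguous, p_a) draw.
        rng = random.Random(seed)
        extra_a = sum(rng.random() < p_a for _ in range(ambiguous))
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    return unique_a + extra_a, unique_b + (ambiguous - extra_a)


print(resolve_ambiguous(30, 10, 20))                   # proportional split
print(resolve_ambiguous(30, 10, 20, mode="binomial"))  # stochastic split
```

The proportional mode yields fractional counts and is reproducible; the binomial mode keeps counts as integers at the cost of sampling noise, which is why a seed is exposed here.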


2020 ◽  
Author(s):  
Emiliana Weiss ◽  
Heloisa S. Andrade ◽  
Juliana Rodrigues Lara ◽  
Andreia S. Souza ◽  
Michelle A. Paz ◽  
...  

KIR2DL4 is an important immune modulator expressed in natural killer cells, with HLA-G as its main ligand. We characterize KIR2DL4 gene diversity considering the promoter, all exons, and all introns, in a highly admixed Brazilian population sample using massively parallel sequencing. We also introduce a molecular method to amplify and sequence the complete KIR2DL4 gene. To avoid the mapping bias and genotype errors commonly observed in gene families, we have developed a bioinformatic pipeline designed to minimize mapping, genotyping, and haplotyping errors. We have applied this method to survey the variability of 220 samples from the State of São Paulo, southeastern Brazil. We have also compared the KIR2DL4 genetic diversity in Brazilian samples with that previously reported by the 1000 Genomes consortium. KIR2DL4 presents high linkage disequilibrium throughout the gene, with coding sequences associated with specific promoters. There were few, but divergent, promoter haplotypes. We have also detected many new KIR2DL4 sequences, all with nucleotide exchanges in introns and encoding previously described proteins. Exons 3 and 4, which encode the external domains, were the most variable ones. Ancestry background influences KIR2DL4 allele frequencies and must be considered in association studies regarding KIR2DL4.


2020 ◽  
Author(s):  
Jeremy Kuhn ◽  
Carlo Geraci ◽  
Philippe Schlenker ◽  
Brent Strickland

The idea that the form of a word reflects information about its meaning has its roots in Platonic philosophy, and has been experimentally investigated for concrete, sensory-based properties since the early 20th century. Here, we provide evidence for an abstract property of ‘boundedness’ that introduces a systematic, iconic bias on the phonological expectations of a novel lexicon. We show that this abstract property is general across events and objects. In Experiment 1, we show that subjects are systematically more likely to associate sign language signs that end with a gestural boundary with telic verbs (denoting events with temporal boundaries, e.g., die, arrive) and with count nouns (denoting objects with spatial boundaries, e.g., ball, coin). In Experiments 2-3, we show that this iconic mapping acts on conceptual representations, not on grammatical features. Specifically, the mapping does not carry over to psychological nouns (e.g. people are not more likely to associate a gestural boundary with idea than with knowledge). Although these psychological nouns are still syntactically encoded as either count or mass, they do not denote objects that are conceived of as having spatial boundaries. The mapping bias thus breaks down. Experiments 4-5 replicate these findings with a new set of stimuli. Finally, in Experiments 6-11, we explore possible extensions to a similar bias for spoken language stimuli, with mixed results. Generally, the results here suggest that ‘boundedness’ of words’ referents (in space or time) has a powerful effect on intuitions regarding the form that the words should take.


Water ◽  
2020 ◽  
Vol 12 (3) ◽  
pp. 801 ◽  
Author(s):  
Brian Ayugi ◽  
Guirong Tan ◽  
Niu Ruoyun ◽  
Hassen Babaousmail ◽  
Moses Ojara ◽  
...  

This study uses the quantile mapping bias correction (QMBC) method to correct the bias in five regional climate models (RCMs) from the latest output of the Rossby Center Climate Regional Model (RCA4) over Kenya. The outputs were validated using various scalar metrics such as root-mean-square difference (RMSD), mean absolute error (MAE), and mean bias. The study found that the performance of the QMBC algorithm varies among the models in the study domain. The results show that most of the models exhibit reasonable improvement after correction at seasonal and annual timescales. Specifically, the European Community Earth-System (EC-EARTH) and Commonwealth Scientific and Industrial Research Organization (CSIRO) models show remarkable improvement compared to the other models. In contrast, the Institute Pierre Simon Laplace Model CM5A-MR (IPSL-CM5A-MR) model shows little improvement across the rainfall seasons (i.e., March–May (MAM) and October–December (OND)). Projections forced with bias-corrected historical simulations tally with observed values and are satisfactory compared to the uncorrected RCM outputs. This study has demonstrated that using QMBC on outputs from RCA4 is an important intermediate step to improve climate data before performing any regional impact analysis. The corrected models may be used in projections of drought and flood extreme events over the study area.
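
As a rough illustration of what quantile mapping does, the sketch below implements a basic empirical variant with NumPy: model values are located within the model's historical quantiles and replaced by the observed value at the same probability. The abstract does not say which QMBC variant the study used, so the function name `quantile_map`, the number of quantiles, and the synthetic data are assumptions rather than the authors' method.

```python
import numpy as np


def quantile_map(model_hist, obs_hist, model_values, n_quantiles=100):
    """Empirical quantile mapping (illustrative sketch, hypothetical helper).

    model_hist:   model output over the calibration period.
    obs_hist:     observations over the same period.
    model_values: model output to be corrected.
    """
    probs = np.linspace(0.0, 1.0, n_quantiles + 1)
    model_q = np.quantile(model_hist, probs)
    obs_q = np.quantile(obs_hist, probs)
    # Locate each value within the model quantiles, then read off the observed
    # quantile at the same probability. np.interp clamps values outside the
    # calibration range to the end quantiles, and it assumes model_q is
    # increasing, so heavily tied data (e.g. many dry days) need extra care.
    return np.interp(model_values, model_q, obs_q)


# Synthetic example: a "model" that is too wet and too variable.
rng = np.random.default_rng(0)
observed = rng.gamma(shape=2.0, scale=3.0, size=5000)
modelled = 1.5 * rng.gamma(shape=2.0, scale=3.0, size=5000) + 2.0

corrected = quantile_map(modelled, observed, modelled)
print("mean bias before:", float(np.mean(modelled) - np.mean(observed)))
print("mean bias after: ", float(np.mean(corrected) - np.mean(observed)))
```

In practice the transfer function is fitted on a historical period and then applied to a separate projection period, often per season or per month.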


Author(s):  
Brian Ayugi ◽  
Guirong Tan ◽  
Ruoyun Niu ◽  
Hassen Babaousmail ◽  
Moses Ojara ◽  
...  

Accurate assessment and projection of extreme climate events require climate datasets with no or minimal error. This study uses the quantile mapping bias correction (QMBC) method to correct the bias of five Regional Climate Models (RCMs) from the latest output of the Rossby Climate Model Center (RCA4) over Kenya, East Africa. The outputs were validated using various scalar metrics such as Root Mean Square Difference (RMSD), Mean Absolute Error (MAE), and mean bias. The study found that the performance of the QMBC algorithm varies among the models in the study domain. The results show that most of the models exhibit significant improvement after correction at seasonal and annual timescales. Specifically, the European Community Earth-System (EC-EARTH) and Commonwealth Scientific and Industrial Research Organization (CSIRO) models show exemplary improvement compared to the other models. In contrast, the Institute Pierre Simon Laplace Model CM5A-MR (IPSL-CM5A-MR) model shows little improvement across various timescales (i.e., March-April-May (MAM) and October-November-December (OND)). Projections forced with bias-corrected historical simulations tally with observed values and are satisfactory compared to the uncorrected RCM outputs. This study has demonstrated that using QMBC on outputs from RCA4 is an important intermediate step to improve climate data prior to performing any regional impact analysis. The corrected models can be used for projections of drought and flood extreme events over the study area. This analysis is important for sustainable planning of climate change adaptation and mitigation and for disaster risk reduction.
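
The three validation metrics named in the abstract above have standard definitions, sketched here with NumPy. The function name `validation_metrics` and the sign convention for mean bias (simulated minus observed) are assumptions, since the abstract does not spell them out.

```python
import numpy as np


def validation_metrics(simulated, observed):
    """Return RMSD, MAE, and mean bias for paired simulated and observed values."""
    sim = np.asarray(simulated, dtype=float)
    obs = np.asarray(observed, dtype=float)
    diff = sim - obs  # assumed convention: positive means the model overestimates
    return {
        "rmsd": float(np.sqrt(np.mean(diff ** 2))),
        "mae": float(np.mean(np.abs(diff))),
        "mean_bias": float(np.mean(diff)),
    }


# Hypothetical seasonal rainfall totals (mm), for illustration only.
print(validation_metrics([210.0, 180.0, 95.0], [200.0, 185.0, 120.0]))
```

RMSD penalizes large errors more heavily than MAE, while mean bias keeps the sign of the error and so shows whether a model is systematically too wet or too dry.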

