mouse dataset Latest Research Papers

Testcrosses are an efficient strategy for identifying cis regulatory variation: Bayesian analysis of allele specific expression (BASE)

10.1101/2020.10.01.322362 ◽

2020 ◽

Author(s):

Brecca Miller ◽

Alison Morse ◽

Jacqueline E. Borgert ◽

Zihao Liu ◽

Kelsey Sinclair ◽

...

Keyword(s):

Hypothesis Test ◽

Bioinformatics Pipeline ◽

Specific Expression ◽

Regulatory Variation ◽

Reduction Techniques ◽

Direct Cross ◽

Allele Specific ◽

Diploid Individual ◽

Prohibitive Cost ◽

Mouse Dataset

Allelic imbalance (AI) occurs when alleles in a diploid individual are differentially expressed and indicates cis-acting regulatory variation. What is the distribution of allelic effects in a natural population? Are all alleles the same? Are all alleles distinct? Tests of allelic effect are performed by crossing individuals and comparing expression between alleles directly in the F1. However, a crossing scheme that compares alleles pairwise is prohibitively costly for more than a handful of alleles, as the number of crosses is at least (n² - n)/2, where n is the number of alleles. We show here that a testcross design followed by a hypothesis test of AI between testcrosses can be used to infer differences between non-tester alleles, allowing n alleles to be compared with n crosses. Using a mouse dataset where both testcrosses and direct comparisons have been performed, we show that ∼75% of the predicted differences between non-tester alleles are validated in a background of ∼10% differences in AI. Testing for AI involves several complex bioinformatics steps. BASE is a complete bioinformatics pipeline that incorporates state-of-the-art error reduction techniques and a flexible Bayesian approach to estimating AI and formally comparing levels of AI between conditions. The modular structure of BASE has been packaged in Galaxy and made available in Nextflow and sbatch (https://github.com/McIntyre-Lab/BASE_2020). In the mouse data, the direct test identifies more cis effects than the testcross. Cis-by-trans interactions, with trans-acting factors on the X contributing to observed cis effects in autosomal genes in the direct cross, remain a possible explanation for the discrepancy.

Information theoretic alignment free variant calling

PeerJ Computer Science ◽

10.7717/peerj-cs.71 ◽

2016 ◽

Vol 2 ◽

pp. e71

Author(s):

Justin Bedo ◽

Benjamin Goudey ◽

Jeremy Wazny ◽

Zeyu Zhou

Keyword(s):

Sequence Data ◽

Multinomial Distribution ◽

Variant Calling ◽

Whole Genome Sequence ◽

Reference Sequence ◽

Information Theoretic ◽

Learning Tasks ◽

Leibler Divergence ◽

Suitable Reference ◽

Mouse Dataset

While traditional methods for calling variants across whole genome sequence data rely on alignment to an appropriate reference sequence, alternative techniques are needed when a suitable reference does not exist. We present a novel alignment and assembly free variant calling method based on information theoretic principles designed to detect variants that have strong statistical evidence for their ability to segregate samples in a given dataset. Our method uses the context surrounding a particular nucleotide to define variants. Given a set of reads, we model the probability of observing a given nucleotide conditioned on the surrounding prefix and suffix of length k as a multinomial distribution. We then estimate which of these contexts are stable intra-sample and varying inter-sample using a statistic based on the Kullback–Leibler divergence. The utility of the variant calling method was evaluated through analysis of a pair of bacterial datasets and a mouse dataset. We found that our variants are highly informative for supervised learning tasks with performance similar to standard reference based calls and another reference free method (DiscoSNP++). Comparisons against reference based calls showed our method was able to capture very similar population structure on the bacterial dataset. The algorithm’s focus on discriminatory variants makes it suitable for many common analysis tasks for organisms that are too diverse to be mapped back to a single reference sequence.
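The context model described in this abstract can be sketched roughly as follows. The code counts the nucleotide observed between each length-k prefix and suffix, turns the counts into a smoothed multinomial, and compares two samples with the Kullback–Leibler divergence. This is a minimal sketch under stated assumptions, not the authors' statistic: the pseudocount smoothing, the restriction to contexts shared by both samples, the function names, and the toy reads are all illustrative choices, and the intra-sample stability part of the method is not modelled here.

```python
import math
from collections import Counter, defaultdict

ALPHABET = "ACGT"


def context_counts(reads, k):
    """Count the nucleotide seen between each (prefix, suffix) pair of length k."""
    counts = defaultdict(Counter)
    for read in reads:
        for i in range(k, len(read) - k):
            context = (read[i - k:i], read[i + 1:i + 1 + k])
            counts[context][read[i]] += 1
    return counts


def multinomial(counter, pseudocount=1.0):
    """Smoothed multinomial estimate over A, C, G, T (pseudocount is an assumption)."""
    total = sum(counter[base] for base in ALPHABET) + pseudocount * len(ALPHABET)
    return [(counter[base] + pseudocount) / total for base in ALPHABET]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def context_divergences(reads_a, reads_b, k):
    """KL divergence per context, for contexts observed in both samples."""
    counts_a = context_counts(reads_a, k)
    counts_b = context_counts(reads_b, k)
    shared = counts_a.keys() & counts_b.keys()
    return {
        context: kl_divergence(multinomial(counts_a[context]),
                               multinomial(counts_b[context]))
        for context in shared
    }


if __name__ == "__main__":
    # Toy reads: the two samples differ only at the base between "TA" and "GT".
    sample_a = ["TTACGTT"] * 20
    sample_b = ["TTAGGTT"] * 20
    for context, value in context_divergences(sample_a, sample_b, k=2).items():
        print(context, round(value, 3))
```

Contexts with a large divergence between samples are candidates for variants that segregate those samples, which is the property the abstract says the method targets.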

Information theoretic alignment free variant calling

10.7287/peerj.preprints.2015 ◽

2016 ◽

Author(s):

Justin Bedo ◽

Benjamin Goudey ◽

Jeremy Wazny ◽

Zeyu Zhou

Keyword(s):

Sequence Data ◽

Multinomial Distribution ◽

Variant Calling ◽

Whole Genome Sequence ◽

Reference Sequence ◽

Information Theoretic ◽

Learning Tasks ◽

Leibler Divergence ◽

Suitable Reference ◽

Mouse Dataset

While traditional methods for calling variants across whole genome sequence data rely on alignment to an appropriate reference sequence, alternative techniques are needed when a suitable reference does not exist. We present a novel alignment and assembly free variant calling method based on information theoretic principles designed to detect variants that have strong statistical evidence for their ability to segregate samples in a given dataset. Our method uses the context surrounding a particular nucleotide to define variants. Given a set of reads, we model the probability of observing a given nucleotide conditioned on the surrounding prefix and suffix of length k as a multinomial distribution. We then estimate which of these contexts are stable intra-sample and varying inter-sample using a statistic based on the Kullback–Leibler divergence. The utility of the variant calling method was evaluated through analysis of a pair of bacterial datasets and a mouse dataset. We found that our variants are highly informative for supervised learning tasks with performance similar to standard reference based calls and another reference free method (DiscoSNP++). Comparisons against reference based calls showed our method was able to capture very similar population structure on the bacterial dataset. The algorithm’s focus on discriminatory variants makes it suitable for many common analysis tasks for organisms that are too diverse to be mapped back to a single reference sequence.

Information theoretic alignment free variant calling

10.7287/peerj.preprints.2015v1 ◽

2016 ◽

Author(s):

Justin Bedo ◽

Benjamin Goudey ◽

Jeremy Wazny ◽

Zeyu Zhou

Keyword(s):

Sequence Data ◽

Multinomial Distribution ◽

Variant Calling ◽

Whole Genome Sequence ◽

Reference Sequence ◽

Information Theoretic ◽

Learning Tasks ◽

Leibler Divergence ◽

Suitable Reference ◽

Mouse Dataset

While traditional methods for calling variants across whole genome sequence data rely on alignment to an appropriate reference sequence, alternative techniques are needed when a suitable reference does not exist. We present a novel alignment and assembly free variant calling method based on information theoretic principles designed to detect variants that have strong statistical evidence for their ability to segregate samples in a given dataset. Our method uses the context surrounding a particular nucleotide to define variants. Given a set of reads, we model the probability of observing a given nucleotide conditioned on the surrounding prefix and suffix of length k as a multinomial distribution. We then estimate which of these contexts are stable intra-sample and varying inter-sample using a statistic based on the Kullback–Leibler divergence. The utility of the variant calling method was evaluated through analysis of a pair of bacterial datasets and a mouse dataset. We found that our variants are highly informative for supervised learning tasks with performance similar to standard reference based calls and another reference free method (DiscoSNP++). Comparisons against reference based calls showed our method was able to capture very similar population structure on the bacterial dataset. The algorithm’s focus on discriminatory variants makes it suitable for many common analysis tasks for organisms that are too diverse to be mapped back to a single reference sequence.
