HAHap: a read-based haplotyping method using hierarchical assembly

PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5852
Author(s):  
Yu-Yu Lin ◽  
Ping Chun Wu ◽  
Pei-Lung Chen ◽  
Yen-Jen Oyang ◽  
Chien-Yu Chen

Background The need for read-based phasing arises with advances in sequencing technologies. The minimum error correction (MEC) approach is the prevailing way to resolve haplotypes: it reduces conflicts in a single nucleotide polymorphism (SNP)-fragment matrix. However, the solution with the optimal MEC score is frequently not the real haplotypes, because MEC methods consider all positions together and conflicts in noisy regions can mislead the selection of corrections. To tackle this problem, we present a hierarchical assembly-based method designed to progressively resolve local conflicts. Results This study presents HAHap, a new phasing algorithm based on hierarchical assembly. HAHap leverages high-confidence variant pairs to build haplotypes progressively. On both real and simulated data, HAHap achieved better phasing error rates than other MEC-based methods when constructing haplotypes from short reads produced by whole-genome sequencing. Comparing the number of error corrections (ECs) on real data with other methods shows that HAHap can predict haplotypes with fewer ECs. We also used simulated data to investigate the behavior of HAHap under different sequencing conditions, highlighting the applicability of HAHap in certain situations.
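As a rough illustration of the MEC objective mentioned in the abstract, the following Python sketch scores a candidate haplotype against a small SNP-fragment matrix. It is not HAHap's implementation: the function names, the `None` encoding for uncovered sites, and the toy fragments are all assumptions made for this example. Each fragment is charged the number of alleles that would need correcting to make it agree with the closer of the two complementary haplotypes.

```python
from typing import List, Optional, Sequence

# One row of the SNP-fragment matrix: 0/1 allele per SNP site,
# None where the read does not cover the site (hypothetical encoding).
Fragment = Sequence[Optional[int]]


def mismatches(fragment: Fragment, haplotype: Sequence[int]) -> int:
    """Count covered sites where the fragment disagrees with the haplotype."""
    return sum(1 for a, h in zip(fragment, haplotype) if a is not None and a != h)


def mec_score(fragments: List[Fragment], haplotype: Sequence[int]) -> int:
    """Total corrections needed so every fragment matches one of the two haplotypes."""
    complement = [1 - h for h in haplotype]  # diploid, heterozygous sites only
    return sum(
        min(mismatches(f, haplotype), mismatches(f, complement)) for f in fragments
    )


# Toy data (invented): five reads over four heterozygous SNPs.
fragments = [
    [0, 0, None, None],
    [None, 0, 1, None],
    [1, 1, 0, None],
    [None, None, 0, 1],
    [0, 1, None, None],  # conflicts with the other reads at one site
]

print(mec_score(fragments, [0, 0, 1, 0]))  # 1
print(mec_score(fragments, [0, 1, 1, 0]))  # 3
```

In this toy case the first candidate needs a single correction and the second needs three, so an MEC-based method would prefer the first. The abstract's point is that on noisy real data the lowest score does not always correspond to the true haplotypes.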

2016 ◽  
Vol 2016 ◽  
pp. 1-9 ◽  
Author(s):  
Jingyu Guo ◽  
Hongliang Qi ◽  
Yuan Xu ◽  
Zijia Chen ◽  
Shulong Li ◽  
...  

Limited-angle computed tomography (CT) has great impact in some clinical applications. Existing iterative reconstruction algorithms cannot reconstruct high-quality images from limited-angle data, leaving severe artifacts near edges. The choice of initial image influences iterative reconstruction performance but has not yet been studied in depth. In this work, we propose generating an optimized initial image, followed by total variation (TV) based iterative reconstruction that takes image symmetry into account. Reconstruction results on simulated and real data indicate that the proposed method effectively removes the artifacts near edges.
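For readers unfamiliar with the total variation term that TV-based reconstruction penalizes, here is a minimal Python sketch of the anisotropic TV of a 2D image. The function name and the toy images are assumptions for illustration. The abstract does not say which TV variant or solver the authors use, so this shows only the quantity being minimized, not their reconstruction algorithm.

```python
from typing import List


def total_variation(image: List[List[float]]) -> float:
    """Anisotropic TV: sum of absolute differences between neighboring pixels."""
    rows, cols = len(image), len(image[0])
    tv = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # horizontal neighbor
                tv += abs(image[r][c + 1] - image[r][c])
            if r + 1 < rows:  # vertical neighbor
                tv += abs(image[r + 1][c] - image[r][c])
    return tv


# A clean vertical edge versus the same image with one streak-like outlier.
clean = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]
streaked = [[0, 0, 1], [0, 1, 1], [0, 0, 1]]

print(total_variation(clean))     # 3.0
print(total_variation(streaked))  # 5.0
```

A piecewise-constant image with sharp edges has low TV, while streak-like artifacts raise it, which is why a TV penalty tends to suppress such artifacts while keeping edges.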


2014 ◽  
Vol 26 (1) ◽  
pp. 12 ◽  
Author(s):  
C. Ponsart ◽  
D. Le Bourhis ◽  
H. Knijn ◽  
S. Fritz ◽  
C. Guyader-Joly ◽  
...  

Genomic tools are now available for most livestock species and are used routinely for genomic selection (GS) in cattle. One of the most important developments resulting from the introduction of genomic testing for dairy cattle is the application of reasonably priced low-density single nucleotide polymorphism technology in the selection of females. In this context, combining genome testing and reproductive biotechnologies in young heifers enables new strategies to generate replacement and elite females in a given period of time. Moreover, multiple markers have been detected in biopsies of preimplantation stage embryos, thus paving the way to develop new strategies based on preimplantation diagnosis and the genetic screening of embryos. Based on recent advances in GS, the present review focuses on new possibilities inherent in reproductive technologies used for commercial purposes and in genetic schemes, possible side effects and beneficial impacts on reproductive efficiency. A particular focus is on the different steps allowing embryo genotyping, including embryo micromanipulation, DNA production and quality assessment.


Mobile DNA ◽  
2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Pol Vendrell-Mir ◽  
Fabio Barteri ◽  
Miriam Merenciano ◽  
Josefa González ◽  
Josep M. Casacuberta ◽  
...  

Abstract Background Transposable elements (TEs) are an important source of genomic variability in eukaryotic genomes. Their activity impacts genome architecture and gene expression and can lead to drastic phenotypic changes. Therefore, identifying TE polymorphisms is key to better understanding the link between genotype and phenotype. However, most genotype-to-phenotype analyses have concentrated on single nucleotide polymorphisms, as they are easier to detect reliably using short-read data. Many bioinformatic tools have been developed to identify transposon insertions from resequencing data using short reads. Nevertheless, the performance of most of these tools has been tested using simulated insertions, which do not accurately reproduce the complexity of natural insertions. Results We have overcome this limitation by building a dataset of insertions from the comparison of two high-quality rice genomes, followed by extensive manual curation. This dataset contains validated insertions of two very different types of TEs, LTR-retrotransposons and MITEs. Using this dataset, we have benchmarked the sensitivity and precision of 12 commonly used tools, and our results suggest that in general their sensitivity was previously overestimated when using simulated data. Our results also show that increasing coverage leads to better sensitivity, but at a cost in precision. Moreover, we found important differences in tool performance, with some tools performing better on a specific type of TE. We have also used two sets of experimentally validated insertions in Drosophila and humans and show that this trend is maintained in genomes of different size and complexity. Conclusions We discuss the possible choice of tools depending on the goals of the study and show that an appropriate combination of tools could be an option for most approaches, increasing sensitivity while maintaining good precision.
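The sensitivity and precision figures discussed in this abstract can be computed along the following lines. This Python sketch is not the authors' benchmarking pipeline: the `(chromosome, position)` representation, the greedy one-to-one matching, and the `tolerance` window are assumptions chosen for the example, and the coordinates are invented.

```python
from typing import List, Tuple

Insertion = Tuple[str, int]  # (chromosome, position); hypothetical representation


def benchmark(predicted: List[Insertion], truth: List[Insertion],
              tolerance: int = 5) -> Tuple[float, float]:
    """Return (sensitivity, precision) for predicted insertions against a curated set.

    A prediction is a true positive if it lies within `tolerance` bp of a
    not-yet-matched true insertion on the same chromosome.
    """
    unmatched = list(truth)
    true_positives = 0
    for chrom, pos in predicted:
        hit = next(
            (t for t in unmatched if t[0] == chrom and abs(t[1] - pos) <= tolerance),
            None,
        )
        if hit is not None:
            unmatched.remove(hit)  # each true insertion can be matched only once
            true_positives += 1
    sensitivity = true_positives / len(truth) if truth else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, precision


# Invented coordinates: four curated insertions, three calls from one tool.
truth = [("chr1", 100), ("chr1", 500), ("chr2", 40), ("chr3", 900)]
predicted = [("chr1", 102), ("chr2", 40), ("chr2", 700)]

sensitivity, precision = benchmark(predicted, truth)
print(f"sensitivity={sensitivity:.2f} precision={precision:.2f}")
```

Here two of the four curated insertions are recovered (sensitivity 0.50) and two of the three calls are correct (precision 0.67). Running the same function on the union of several tools' calls would be one way to explore the tool combinations the conclusions mention.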


Plants ◽  
2020 ◽  
Vol 9 (8) ◽  
pp. 1004
Author(s):  
Hiroshi Kato ◽  
Feng Li ◽  
Akemi Shimizu

We have succeeded in selecting four higher-yielding mutants from five gamma-ray irradiated high-yielding Japanese rice varieties using a novel approach. A total of 464 M2 plants which had heavier total panicle weights per plant were first selected from 9801 irradiated M2 plants. Their higher yields were confirmed by yield trials conducted for three years with a six to ten-pairwise replicated plot design. FukuhibikiH6 and FukuhibikiH8 were selected from an irradiated high-yielding variety, Fukuhibiki, and showed significantly higher yields than the original variety, by 1.2% to 22.5%. YamadawaraH3 was selected from an irradiated high-yielding variety, Yamadawara, and its yield advantages were 2.7% to 3.9%. However, there was no difference in the genotypes of the 96 SNP (single nucleotide polymorphism) markers between the higher-yielding mutants and their respective original varieties. The differences in the measured phenotypical traits between each mutant and its original variety were not constant and the actual differences were marginal. Therefore, the higher yields of the selected mutants were likely to have been caused by physiological traits rather than phenotypical traits. The selection method used in this study is an application of the directed evolution method, which has long been commonly used for substantial improvements of microorganisms and their proteins.


2016 ◽  
Vol 14 (05) ◽  
pp. 1644003 ◽  
Author(s):  
Kento Kodama ◽  
Hiroto Saigo

Despite the accumulation of quantitative trait loci (QTL) data in many complex human diseases, most current approaches that have attempted to relate genotype to phenotype have achieved limited success, and the genetic factors of many common diseases remain to be elucidated. One of the reasons that makes this problem complex is the existence of single nucleotide polymorphism (SNP) interaction, or epistasis. Because of the excessive amount of computation needed to search the combinatorial space, existing approaches cannot fully incorporate high-order SNP interactions into their models, but limit themselves to detecting only lower-order SNP interactions. We present an empirical approach based on ridge regression with polynomial kernels and a model selection technique for determining the true degree of epistasis among SNPs. Computer experiments on simulated data show the ability of the proposed method to correctly predict the number of interacting SNPs, provided that the number of samples is large enough relative to the number of SNPs. For cases in which the number of available samples is limited, we propose a sliding window approach to ensure a sufficiently large sample/SNP ratio in each window. In computational experiments using heterogeneous stock mice data, our approach has successfully detected subregions that harbor known causal SNPs. Our analysis further suggests the existence of additional candidate causal SNPs interacting with each other in the neighborhood of the known causal gene. Software is available from https://github.com/HirotoSaigo/KDSNP.
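The idea of pairing ridge regression with polynomial kernels and then selecting the kernel degree can be sketched in plain Python as follows. This is not the KDSNP software. The simulated genotypes, the regularization value `lam=1.0`, and the use of held-out mean squared error as the selection criterion are all assumptions made for illustration, since the abstract does not specify the authors' model selection technique.

```python
import random


def poly_kernel(x, z, degree):
    """Polynomial kernel (x . z + 1)^degree; degree d captures up to d-way interactions."""
    return (sum(a * b for a, b in zip(x, z)) + 1.0) ** degree


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit(X, y, degree, lam):
    """Kernel ridge regression: alpha = (K + lam * I)^-1 y."""
    n = len(X)
    K = [[poly_kernel(X[i], X[j], degree) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, y)


def predict(X_train, alpha, degree, x):
    return sum(a * poly_kernel(xi, x, degree) for a, xi in zip(alpha, X_train))


def validation_mse(X_tr, y_tr, X_va, y_va, degree, lam=1.0):
    alpha = fit(X_tr, y_tr, degree, lam)
    return sum((predict(X_tr, alpha, degree, x) - t) ** 2
               for x, t in zip(X_va, y_va)) / len(y_va)


rng = random.Random(0)


def simulate(n, n_snps=4):
    """Invented data: genotypes coded 0/1/2, phenotype driven by a 2-SNP interaction."""
    X = [[rng.choice([0, 1, 2]) for _ in range(n_snps)] for _ in range(n)]
    y = [x[0] * x[1] + rng.gauss(0, 0.1) for x in X]
    return X, y


X_tr, y_tr = simulate(60)
X_va, y_va = simulate(40)
errors = {d: validation_mse(X_tr, y_tr, X_va, y_va, d) for d in (1, 2, 3)}
best_degree = min(errors, key=errors.get)
print(errors, best_degree)
```

Because the simulated phenotype depends on the product of two SNPs, the degree-1 (purely additive) model leaves a large held-out error that the degree-2 kernel removes. Whether degree 2 or 3 is finally selected depends on the noise and sample size, which echoes the abstract's caveat about needing enough samples relative to the number of SNPs.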


Author(s):  
Kotaro Dokan ◽  
Sayu Kawamura ◽  
Kosuke M Teshima

Abstract Single nucleotide polymorphism (SNP) data are widely used in research on natural populations. Although they are useful, SNP genotyping data are known to contain bias, normally referred to as ascertainment bias, because they are conditioned on already confirmed variants. This bias is introduced during the genotyping process, including the selection of populations for novel SNP discovery, the number of individuals involved in the discovery panel, and the selection of SNP markers. It is widely recognized that ascertainment bias can cause inaccurate inferences in population genetics, and several methods to address these bias issues have been proposed. However, especially in natural populations, it is not always possible to apply an ideal ascertainment scheme because natural populations tend to have complex structures and histories. In addition, it has not been fully assessed whether ascertainment bias has the same effect on different types of population structure. Here we examine the effects of bias produced during the selection of populations for SNP discovery and the consequent SNP marker selection processes under three demographic models: the island, stepping-stone, and population split models. Results show that site frequency spectra and summary statistics contain biases that depend on the joint effect of population structure and ascertainment schemes. Additionally, population structure inferences are also affected by ascertainment bias. Based on these results, it is recommended to evaluate the validity of the ascertainment strategy prior to the actual typing process because the direction and extent of ascertainment bias vary depending on several factors.


Author(s):  
Aleksey V. Zimin ◽  
Steven L. Salzberg

Abstract The introduction of third-generation DNA sequencing technologies in recent years has allowed scientists to generate dramatically longer sequence reads, which when used in whole-genome sequencing projects have yielded better repeat resolution and far more contiguous genome assemblies. While the promise of better contiguity has held true, the relatively high error rate of long reads, averaging 8–15%, has made it challenging to generate a highly accurate final sequence. Current long-read sequencing technologies display a tendency toward systematic errors, in particular in homopolymer regions, which present additional challenges. A cost-effective strategy to generate highly contiguous assemblies with a very low overall error rate is to combine long reads with low-cost short-read data, which currently have an error rate below 0.5%. This hybrid strategy can be pursued either by incorporating the short-read data into the early phase of assembly, during the read correction step, or by using short reads to “polish” the consensus built from long reads. In this report, we present the assembly polishing tool POLCA (POLishing by Calling Alternatives) and compare its performance with two other popular polishing programs, Pilon and Racon. We show that on simulated data POLCA is more accurate than Pilon, and comparable in accuracy to Racon. On real data, all three programs show similar performance, but POLCA is consistently much faster than either of the other polishing programs.


Author(s):  
M. A. Kosataya ◽  
A. V. Marina ◽  
A. V. Sergeeva ◽  
T. V. Zhilyaeva ◽  
A. S. Blagonravova ◽  
...  

Studying the association between individual genetic risk factors and the adverse effects of antipsychotics in chronically ill patients with schizophrenia is relevant both for predicting side effects and for personalizing drug therapy. The purpose of this study was therefore to evaluate the association between carriage of the T allele of the single nucleotide polymorphism MTHFR677C>T and the severity of the metabolic side effects of antipsychotics. Carriers of the minor T allele showed a greater risk of developing components of the metabolic syndrome, as well as a significantly higher fasting blood glucose level.

