HapIso: An Accurate Method for the Haplotype-Specific Isoforms Reconstruction from Long Single-Molecule Reads

2016 ◽  
Author(s):  
Serghei Mangul ◽  
Harry (Taegyun) Yang ◽  
Farhad Hormozdiari ◽  
Elizabeth Tseng ◽  
Alex Zelikovsky ◽  
...  

Abstract Sequencing of RNA makes it possible to study an individual’s transcriptome landscape and determine allelic expression ratios. Single-molecule protocols generate multi-kilobase reads, longer than most transcripts, allowing complete haplotype isoforms to be sequenced and the reads to be partitioned into the two parental haplotypes. While the reads of single-molecule protocols are long, their relatively high error rate limits the ability to accurately detect genetic variants and assemble them into haplotype-specific isoforms. In this paper, we present HapIso (Haplotype-specific Isoform Reconstruction), a method able to tolerate the relatively high error rate of the single-molecule platform and partition the isoform reads into the parental alleles. Phasing the reads according to their allele of origin allows our method to efficiently distinguish read errors from true biological mutations. HapIso uses a k-means clustering algorithm that groups the reads into two clusters, maximizing the similarity of reads within a cluster and minimizing the similarity of reads from different clusters; each cluster corresponds to a parental haplotype. We use family pedigree information to evaluate our approach. Experimental validation suggests that HapIso tolerates the relatively high error rate and accurately partitions the reads into the parental alleles of the isoform transcripts. Furthermore, our method is the first able to reconstruct haplotype-specific isoforms from long single-molecule reads. The open source Python implementation of HapIso is freely available for download at https://github.com/smangul1/HapIso/
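
The clustering step described in the abstract can be illustrated with a minimal sketch: each read is encoded as a vector over candidate variant positions and the reads are partitioned into two groups with k-means (k = 2). The 0/1 encoding, the helper function, and the use of scikit-learn are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): partition isoform reads into two
# putative parental haplotypes by k-means clustering over candidate SNV columns.
import numpy as np
from sklearn.cluster import KMeans

def partition_reads(read_matrix):
    """read_matrix: reads x candidate-variant positions, encoded 0/1
    (0 = reference base, 1 = alternative base). Returns a cluster
    label (0 or 1) for every read."""
    km = KMeans(n_clusters=2, n_init=10, random_state=0)
    return km.fit_predict(read_matrix)

# Toy example: six error-containing reads over four candidate positions.
reads = np.array([
    [0, 0, 0, 1],   # haplotype A read with one sequencing error
    [0, 0, 0, 0],
    [0, 1, 0, 0],   # haplotype A read with one sequencing error
    [1, 1, 1, 1],
    [1, 1, 0, 1],   # haplotype B read with one sequencing error
    [1, 1, 1, 1],
])
print(partition_reads(reads))  # e.g. [0 0 0 1 1 1]; cluster labels are arbitrary
```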

2016 ◽  
Author(s):  
Alexander Artyomenko ◽  
Nicholas C Wu ◽  
Serghei Mangul ◽  
Eleazar Eskin ◽  
Ren Sun ◽  
...  

Abstract As a result of a high rate of mutations and recombination events, an RNA virus exists as a heterogeneous “swarm” of mutant variants. The long read length offered by single-molecule sequencing technologies allows each mutant variant to be sequenced in a single pass. However, the high error rate limits the ability to reconstruct a heterogeneous viral population composed of rare, related mutant variants. In this paper, we present 2SNV, a method able to tolerate the high error rate of the single-molecule protocol and reconstruct mutant variants. 2SNV uses linkage between single nucleotide variations to efficiently distinguish them from read errors. To benchmark the sensitivity of 2SNV, we performed a single-molecule sequencing experiment on a sample containing a titrated level of known viral mutant variants. Our method accurately reconstructs a clone with a frequency of 0.2% and distinguishes clones that differ in only two nucleotides distantly located on the genome. 2SNV outperforms existing methods for full-length viral mutant reconstruction. The open source implementation of 2SNV is freely available for download at http://alan.cs.gsu.edu/NGS/?q=content/2snv
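
The linkage idea can be sketched roughly as follows: two minor alleles that co-occur on the same reads far more often than independent sequencing errors would predict are evidence for a real mutant variant. The counts, the assumed per-base error rate, and the use of a simple binomial test are illustrative assumptions and not the 2SNV algorithm itself.

```python
# Illustrative sketch of the linkage intuition (not the 2SNV implementation):
# among reads carrying a minor allele at one position, test whether the minor
# allele at a second position appears more often than errors alone would explain.
from scipy.stats import binomtest

def linkage_pvalue(reads_with_first, reads_with_both, per_base_error=0.05):
    """per_base_error: assumed rate of miscalling a specific alternative base."""
    return binomtest(reads_with_both, reads_with_first,
                     per_base_error, alternative="greater").pvalue

# Toy counts: 520 reads carry the first minor allele; 45 of them also carry the
# second, whereas independent errors would explain only about 26 of the 520.
print(linkage_pvalue(520, 45))  # small p-value -> the two variants look linked
```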


1993 ◽  
Vol 41 (6) ◽  
pp. 852-863 ◽  
Author(s):  
Q. Wang ◽  
G. Li ◽  
V.K. Bhargava ◽  
L.J. Mason

2021 ◽  
Author(s):  
Dan Levy ◽  
Zihua Wang ◽  
Andrea Moffitt ◽  
Michael H. Wigler

Replication of tandem repeats of simple sequence motifs, also known as microsatellites, is error prone, and variable lengths frequently arise during population expansions. Microsatellite length variations could therefore serve as markers for cancer. However, accurate, error-free quantitation of microsatellite lengths is difficult with current methods because of the high error rate during amplification and sequencing. We solve this problem by using partial mutagenesis to disrupt enough of the repeat structure that it can replicate faithfully, yet not so much that the flanking regions cannot be reliably identified. In this work we use bisulfite mutagenesis to convert C to U, which is later read as T. Compared to untreated templates, we achieve a three-orders-of-magnitude reduction in the error rate per round of replication. By requiring two independent first copies of an initial template, we reach error rates below one in a million. We discuss potential clinical applications of this method.
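
The arithmetic behind the "two independent first copies" requirement can be sketched as follows; the per-copy error rate used here is an assumed round number chosen only to match the orders of magnitude quoted in the abstract.

```python
# Back-of-the-envelope arithmetic (assumed numbers, for illustration only):
# the abstract reports a ~1,000-fold error reduction per round of replication
# and a final error rate below one in a million.
per_copy_error = 1e-3          # assumed error rate of a single first copy after mutagenesis
# A length call is accepted only if two independently made first copies agree,
# so a wrong call requires both copies to err in (roughly) the same way:
consensus_error = per_copy_error ** 2
print(f"consensus error rate ~ {consensus_error:.0e}")  # ~1e-06, below one in a million
```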


Author(s):  
Serghei Mangul ◽  
Harry Yang ◽  
Farhad Hormozdiari ◽  
Elizabeth Tseng ◽  
Alex Zelikovsky ◽  
...  

1984 ◽  
Vol 20 (23) ◽  
pp. 986 ◽  
Author(s):  
M. Moeneclaey ◽  
H. Bruneel

2019 ◽  
Vol 97 (Supplement_3) ◽  
pp. 39-40
Author(s):  
Pattarapol Sumreddee ◽  
Sajjad Toghiani ◽  
Andrew J Roberts ◽  
El H Hay ◽  
Samuel E Aggrey ◽  
...  

Abstract Pedigree information was traditionally used to assess inbreeding. The availability of high-density marker panels provides an alternative, particularly in the presence of incomplete and error-prone pedigrees. Assessment of autozygosity across chromosomal segments using runs of homozygosity (ROH) is emerging as a valuable tool to estimate inbreeding due to its general flexibility and its ability to quantify the chromosomal contribution to genome-wide inbreeding. Unfortunately, identifying ROH segments is sensitive to the parameters used during the search process, and these parameters are set heuristically, leading to significant variation in the results. The minimum length required to identify a ROH segment has major effects on the estimation of inbreeding, yet it is set arbitrarily. Understanding the rise, purging, and effects of deleterious mutations requires the ability to discriminate between ancient and recent inbreeding, but thresholds to discriminate between short and long ROH segments are largely unknown. To address these questions, an inbred Hereford cattle population of 785 animals genotyped for 30,220 SNPs was used. A search algorithm approximating mutation loads was used to determine the minimum length of ROH segments; it consisted of finding genome segments with significant differences in trait means between animals with high and low autozygosity in intervals at certain threshold values. The minimum length was around 1 Mb for weaning weight, yearling weight, and ADG, and 2.5 Mb for birth weight. Using a model-based clustering algorithm, a mixture of three Gaussian distributions was clearly separable, resulting in three classes of short (<6.16 Mb), medium (6.16–12.57 Mb), and long (>12.27 Mb) ROH segments, representing ancient, intermediate, and recent inbreeding. The contributions of ancient, intermediate, and recent inbreeding to genome-wide inbreeding were 37.4%, 40.1%, and 22.5%, respectively. Inbreeding depression analyses showed a greater damaging effect of recent inbreeding, likely due to purging of old, highly deleterious haplotypes.
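
The model-based clustering of ROH lengths can be illustrated with the sketch below: a three-component Gaussian mixture separates short, medium, and long segments, and each class's share of genome-wide inbreeding is tallied. The log-length transform, the scikit-learn mixture model, and the assumed genome length are illustrative choices, not the authors' pipeline.

```python
# Illustrative sketch (not the authors' pipeline): classify ROH segment lengths
# with a three-component Gaussian mixture and report each class's contribution
# to genome-wide inbreeding (F_ROH).
import numpy as np
from sklearn.mixture import GaussianMixture

def classify_roh(lengths_mb, genome_length_mb=2500.0):
    """lengths_mb: ROH segment lengths in Mb for one animal.
    genome_length_mb: assumed autosomal length covered by the SNP panel."""
    lengths = np.asarray(lengths_mb, dtype=float)
    X = np.log(lengths).reshape(-1, 1)                 # lengths are right-skewed
    gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
    labels = gmm.predict(X)
    order = np.argsort(gmm.means_.ravel())             # 0 = short, 1 = medium, 2 = long
    rank = {comp: r for r, comp in enumerate(order)}
    classes = np.array([rank[c] for c in labels])
    f_roh = lengths.sum() / genome_length_mb           # genome-wide inbreeding coefficient
    shares = [lengths[classes == k].sum() / lengths.sum() for k in range(3)]
    return f_roh, shares

# Toy example: a handful of ROH segments spanning short to long lengths.
print(classify_roh([0.8, 1.2, 2.0, 7.5, 9.0, 15.0, 22.0]))
```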


Genes ◽  
2018 ◽  
Vol 9 (9) ◽  
pp. 460 ◽  
Author(s):  
Yuta Suzuki ◽  
Yunhao Wang ◽  
Kin Au ◽  
Shinichi Morishita

We address the problem of observing personal diploid methylomes, i.e., CpG methylome pairs of homologous chromosomes that are distinguishable with respect to phased heterozygous variants (PHVs), which is challenging due to the scarcity of PHVs in personal genomes. Single molecule real-time (SMRT) sequencing is promising because it outputs long reads carrying CpG methylation information, but a serious concern is whether reliable PHVs are available in erroneous SMRT reads with an error rate of ∼15%. To overcome this issue, we propose a statistical model that reduces the error rate of phasing CpG sites to 1%, thereby calling CpG hypomethylation in each haplotype with >90% precision and sensitivity. Using our statistical model, we examined the GNAS complex locus, known for a combination of maternally, paternally, or biallelically expressed isoforms, and observed allele-specific methylation patterns almost perfectly reflecting their respective allele-specific expression status, demonstrating the merit of elucidating comprehensive personal diploid methylomes and transcriptomes.
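
The basic phasing step can be sketched as a vote over the PHVs covered by a read, with a margin so that sporadic base errors rarely flip the assignment. The helper, its parameters, and the margin rule below are assumptions for illustration, not the statistical model proposed in the paper.

```python
# Illustrative sketch of read phasing with phased heterozygous variants (PHVs);
# not the paper's statistical model.
def phase_read(read_alleles, phv_hap1, min_margin=2):
    """read_alleles: {position: base} observed on one SMRT read.
    phv_hap1: {position: base} giving the haplotype-1 allele at each PHV.
    Returns 1, 2, or None (unassigned) using a simple majority vote with a
    margin, so that sporadic ~15% base errors rarely flip the assignment."""
    votes_h1 = sum(1 for pos, base in read_alleles.items()
                   if pos in phv_hap1 and base == phv_hap1[pos])
    votes_h2 = sum(1 for pos, base in read_alleles.items()
                   if pos in phv_hap1 and base != phv_hap1[pos])
    if votes_h1 - votes_h2 >= min_margin:
        return 1
    if votes_h2 - votes_h1 >= min_margin:
        return 2
    return None   # too ambiguous; discard rather than risk mis-phasing

# Toy example: a read matching haplotype 1 at three of four PHVs (one error).
print(phase_read({100: "A", 250: "C", 900: "G", 1500: "T"},
                 {100: "A", 250: "C", 900: "G", 1500: "A"}))   # -> 1
```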


1981 ◽  
Vol 38 (9) ◽  
pp. 1168-1170 ◽  
Author(s):  
Harold E. Welch ◽  
Kenneth H. Mills

Fish can be permanently marked by scarring soft fin rays. Advantages over existing marking methods include rapidity of application, permanence, individual identification, low costs, and lack of adverse effects caused by the mark. Disadvantages include lack of recognition by untrained observers and a relatively high error rate when reading marks.

Key words: fish marking, fin rays

