Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2

Genes ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 141 ◽  
Author(s):  
Feichen Shen ◽  
Jeffrey M. Kidd

Gene duplication is a major mechanism for the evolution of gene novelty, and copy-number variation makes a major contribution to inter-individual genetic diversity. However, most approaches for studying copy-number variation rely upon uniquely mapping reads to a genome reference and are unable to distinguish among duplicated sequences. Specialized approaches to interrogate specific paralogs are comparatively slow and have a high degree of computational complexity, limiting their effective application to emerging population-scale data sets. We present QuicK-mer2, a self-contained, mapping-free approach that enables the rapid construction of paralog-specific copy-number maps from short-read sequence data. This approach is based on the tabulation of unique k-mer sequences from short-read data sets, and is able to analyze a 20X coverage human genome in approximately 20 min. We applied our approach to newly released sequence data from the 1000 Genomes Project, constructed paralog-specific copy-number maps from 2457 unrelated individuals, and uncovered copy-number variation of paralogous genes. We identify nine genes where none of the analyzed samples have a copy number of two, 92 genes where the majority of samples have a copy number other than two, and describe rare copy-number variation affecting multiple genes at the APOBEC3 locus.
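
The abstract says only that QuicK-mer2 tabulates unique k-mers from short reads. A minimal Python sketch of that general idea follows, assuming a precomputed set of paralog-specific k-mers for a region plus a set of control k-mers assumed to be diploid (copy number 2): count canonical k-mers in the reads, then scale the region's mean depth by the median control depth. The function names, the control-normalization scheme and the toy data are assumptions for illustration, not QuicK-mer2's actual implementation or interface.

```python
from collections import Counter
from statistics import median

# Maps each base to its complement; used to build reverse complements.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_kmers(reads, targets, k):
    """Count occurrences of target k-mers (given in canonical form) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = canonical(read[i:i + k])
            if kmer in targets:
                counts[kmer] += 1
    return counts


def estimate_copy_number(counts, region_kmers, control_kmers):
    """Hypothetical estimator: mean region depth scaled by median control depth (controls assumed CN=2)."""
    control_depth = median(counts[km] for km in control_kmers)
    if control_depth == 0:
        raise ValueError("control k-mers have zero depth; cannot normalize")
    region_depth = sum(counts[km] for km in region_kmers) / len(region_kmers)
    return 2 * region_depth / control_depth
```

Canonicalizing each k-mer lets reads from either strand contribute to the same count, and restricting the region set to k-mers that occur in only one paralog is what allows the depth ratio to be read as a paralog-specific copy number.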

2008 ◽  
Vol 7 (4) ◽  
pp. 319-326 ◽  
Author(s):  
Hylke M Blauw ◽  
Jan H Veldink ◽  
Michael A van Es ◽  
Paul W van Vught ◽  
Christiaan GJ Saris ◽  
...  

2020 ◽  
Vol 52 (1) ◽  
Author(s):  
Dailu Guan ◽  
Amparo Martínez ◽  
Anna Castelló ◽  
Vincenzo Landi ◽  
María Gracia Luigi-Sierra ◽  
...  

2017 ◽  
Author(s):  
Vijay Kumar ◽  
Julie Rosenbaum ◽  
Zihua Wang ◽  
Talitha Forcier ◽  
Michael Ronemus ◽  
...  

We introduce a new protocol, mutational sequencing or muSeq, which randomly deaminates unmethylated cytosines at a fixed and tunable rate. The muSeq protocol marks each initial template molecule with a unique mutation signature that is present in every copy of the template, and in every fragmented copy of a copy. In the sequenced read data, this signature is observed as a unique pattern of C-to-T or G-to-A nucleotide conversions. Clustering reads with the same conversion pattern enables accurate counting and long-range assembly of initial template molecules from short-read sequence data. We explore counting and low-error sequencing by profiling a 135,000-fragment PstI representation, demonstrating that muSeq improves copy-number inference and significantly reduces sporadic sequencer error. We explore long-range assembly in the context of cDNA, generating contiguous transcript clusters greater than 3,000 bp in length. The muSeq assemblies reveal transcriptional diversity not observable from short-read data alone.
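
The read-clustering idea can be illustrated with a short Python sketch. It assumes reads are already aligned gap-free to a reference (real data would need a proper aligner), records the reference positions showing C-to-T or G-to-A changes, and links two reads when their conversion patterns agree over their overlap and share at least `min_shared` conversions; linked reads are merged with union-find. The helper names, the `min_shared` threshold, pooling both conversion types into one signature, and the linking rule itself are hypothetical simplifications, not the published muSeq procedure.

```python
from itertools import combinations


def conversion_signature(ref, read, start):
    """Reference positions where the read shows C->T or G->A (assumes a gap-free alignment at `start`)."""
    sig = set()
    for offset, base in enumerate(read):
        pos = start + offset
        r = ref[pos]
        if (r == "C" and base == "T") or (r == "G" and base == "A"):
            sig.add(pos)
    return sig


def cluster_reads(ref, reads, min_shared=2):
    """Group aligned reads, given as (start, seq), whose conversion patterns agree where they overlap."""
    spans = [(start, start + len(seq)) for start, seq in reads]
    sigs = [conversion_signature(ref, seq, start) for start, seq in reads]
    parent = list(range(len(reads)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(reads)), 2):
        lo = max(spans[i][0], spans[j][0])
        hi = min(spans[i][1], spans[j][1])
        if lo >= hi:
            continue  # no overlap, so no evidence either way
        a = {p for p in sigs[i] if lo <= p < hi}
        b = {p for p in sigs[j] if lo <= p < hi}
        # Require identical patterns in the overlap and enough shared conversions
        # so that unconverted stretches do not merge unrelated templates.
        if a == b and len(a) >= min_shared:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(reads)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Comparing patterns only within the overlap reflects the abstract's point that the signature is carried by every fragmented copy: two fragments of the same template cover different stretches but must agree wherever they coincide.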


PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e2308 ◽  
Author(s):  
Rediat Tewolde ◽  
Timothy Dallman ◽  
Ulf Schaefer ◽  
Carmen L. Sheppard ◽  
Philip Ashton ◽  
...  

Multilocus sequence typing (MLST) is an effective method to describe bacterial populations. Conventionally, MLST involves Polymerase Chain Reaction (PCR) amplification of housekeeping genes followed by Sanger DNA sequencing. Public Health England (PHE) is in the process of replacing the conventional MLST methodology with a method based on short-read sequence data derived from Whole Genome Sequencing (WGS). This paper reports a comparison of the reliability of MLST results derived from WGS data by mapping- and assembly-based approaches against conventional methods, using 323 bacterial genomes of diverse species. The sensitivity of the two WGS-based methods was further investigated with 26 mixed and 29 low-coverage genomic data sets from Salmonella enteritidis and Streptococcus pneumoniae. Of the 323 samples, full MLST profiles were derived for 92.9% (n = 300), 97.5% (n = 315) and 99.7% (n = 322) by the conventional method and the assembly- and mapping-based approaches, respectively. For the samples typed by the conventional method (92.9%), concordance with both WGS methods was 100%. From the 55 mixed and low-coverage genomes, full MLST profiles were derived for 89.1% (n = 49) and 67.3% (n = 37) by the mapping- and assembly-based approaches, respectively. In conclusion, deriving MLST from WGS data is more sensitive than the conventional method. When comparing WGS-based methods, the mapping-based approach was the most sensitive. In addition, the mapping-based approach described here derives quality metrics, which are difficult to determine quantitatively using conventional and WGS assembly-based approaches.
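
As a rough illustration of the final typing step, the Python sketch below assumes each locus sequence has already been extracted from a sample (by mapping or assembly). It assigns allele numbers by exact match against a per-locus allele database, then looks up the sequence type (ST) from the full allelic profile. The data layout, function names and toy loci are hypothetical, and the sketch omits the quality metrics the paper's mapping-based method derives. A small helper checks the percentage arithmetic reported in the abstract.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical layouts: allele_db[locus][sequence] -> allele number;
# st_table[(allele numbers in loci_order)] -> sequence type.
AlleleDB = Dict[str, Dict[str, int]]
STTable = Dict[Tuple[int, ...], int]


def call_alleles(sample_loci: Dict[str, str], allele_db: AlleleDB) -> Dict[str, Optional[int]]:
    """Assign an allele number per locus by exact sequence match (None = novel or missing)."""
    return {locus: allele_db.get(locus, {}).get(seq) for locus, seq in sample_loci.items()}


def assign_st(profile: Dict[str, Optional[int]], st_table: STTable,
              loci_order: Sequence[str]) -> Optional[int]:
    """Return the ST for a full profile, or None if any locus is uncalled or the profile is unknown."""
    key = tuple(profile.get(locus) for locus in loci_order)
    if any(allele is None for allele in key):
        return None  # not a full MLST profile
    return st_table.get(key)


def percent_full_profiles(n_typed: int, n_total: int) -> float:
    """Percentage of samples with a full profile, rounded to one decimal place."""
    return round(100 * n_typed / n_total, 1)
```

The counts in the abstract are consistent with the stated percentages: for example, `percent_full_profiles(300, 323)` gives 92.9 and `percent_full_profiles(49, 55)` gives 89.1.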


BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Nedenia Bonvino Stafuzza ◽  
Rafael Medeiros de Oliveira Silva ◽  
Breno de Oliveira Fragomeni ◽  
Yutaka Masuda ◽  
Yijian Huang ◽  
...  

BMC Genomics ◽  
2010 ◽  
Vol 11 (1) ◽  
pp. 357 ◽  
Author(s):  
Krishna R Kalari ◽  
Scott J Hebbring ◽  
High Chai ◽  
Liang Li ◽  
Jean-Pierre A Kocher ◽  
...  

PLoS ONE ◽  
2015 ◽  
Vol 10 (5) ◽  
pp. e0128465 ◽  
Author(s):  
Julia Brenndörfer ◽  
André Altmann ◽  
Regina Widner-Andrä ◽  
Benno Pütz ◽  
Darina Czamara ◽  
...  
