Sensitive detection of DNA contamination in tumor samples via microhaplotypes

2020 ◽  
Author(s):  
Brett Whitty ◽  
John F. Thompson

Abstract Background: Low levels of sample contamination can have disastrous effects on the accurate identification of somatic variation in tumor samples. Detection of DNA sample contamination is generally based on the observation of low-frequency variants suggesting that more than one source of DNA is present. This strategy works with standard DNA samples but is especially problematic in solid tumor FFPE samples, where massive copy number changes arising from large gains and losses across the genome can cause huge variations in allele frequency (AF). These highly variable allele frequencies make contamination difficult to detect. A method that does not rely on individual AFs is needed to determine accurately whether a sample is contaminated and to what degree. Methods: We used microhaplotypes to determine whether sample contamination is present. Microhaplotypes are sets of variants on the same sequencing read that can be unambiguously phased. Instead of measuring AF, we determine the number and frequency of microhaplotypes, so contamination detection rests on two fundamental genomic properties, linkage disequilibrium (LD) and the diploid nature of human DNA, rather than on variant frequencies. We optimized microhaplotype content based on 164 single nucleotide variant sets located in genes already sequenced within a cancer panel. Contamination detection therefore uses existing sequence data and does not require sequencing of any extraneous regions. The content was chosen using LD data from the 1000 Genomes Project to be ancestry agnostic, providing the same sensitivity for contamination detection in samples from individuals of African, East Asian, and European ancestry. Results: Detection of contamination at 1% and below is possible using this design. The methods described here can also be extended to other DNA mixtures, such as forensic and non-invasive prenatal testing samples, in which DNA mixes of 1% or less can be similarly detected. Conclusions: The microhaplotype method allows sensitive detection of DNA contamination in FFPE tumor samples. These methods provide a foundation for examining DNA mixtures in a variety of contexts. With appropriate panels and high sequencing depth, low levels of secondary DNA can be detected, which can be valuable in many applications.
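The core idea, that a single diploid individual can show at most two haplotypes at a microhaplotype locus, can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes each read has already been reduced to a string of its alleles at the phased SNV positions, and the noise threshold, the `min_loci` rule, and the locus names are hypothetical choices for illustration. Haplotypes the contaminant shares with the host are invisible, and tumor copy number changes or host homozygosity can push a contaminant haplotype into the top two, so the estimate is rough.

```python
from collections import Counter


def haplotype_frequencies(reads):
    """Relative frequency of each microhaplotype observed at one locus.

    Each read is given as the string of alleles it carries at the phased
    SNV positions of the microhaplotype, e.g. "AC" for a two-SNV set.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}


def extra_haplotype_fraction(reads, noise_threshold=0.002):
    """Fraction of reads carrying haplotypes beyond the two most abundant.

    One diploid individual carries at most two haplotypes per locus, so reads
    supporting a third or fourth haplotype above the (assumed) noise threshold
    point to a second DNA source.
    """
    ranked = sorted(haplotype_frequencies(reads).values(), reverse=True)
    return sum(f for f in ranked[2:] if f >= noise_threshold)


def assess_sample(loci, noise_threshold=0.002, min_loci=3):
    """Flag a sample when at least `min_loci` loci show extra haplotypes.

    Returns (flagged, estimate). Where the host is heterozygous and both
    contaminant haplotypes are new, the extra fraction is roughly the
    contamination level; loci sharing haplotypes with the host give less,
    so the maximum across loci serves as a rough estimate.
    """
    fractions = [extra_haplotype_fraction(r, noise_threshold) for r in loci.values()]
    informative = [f for f in fractions if f > 0]
    flagged = len(informative) >= min_loci
    return flagged, (max(informative) if informative else 0.0)


if __name__ == "__main__":
    # Hypothetical reads: host haplotypes AC/GT plus a 1% contaminant carrying AT/GC.
    mixed = ["AC"] * 495 + ["GT"] * 495 + ["AT"] * 5 + ["GC"] * 5
    clean = ["AC"] * 500 + ["GT"] * 500
    sample = {"mh01": mixed, "mh02": mixed, "mh03": mixed, "mh04": clean}
    print(assess_sample(sample))  # -> (True, 0.01)
```

Counting distinct haplotypes rather than per-variant AF is what makes the approach robust to copy number changes: a gain or loss shifts the ratio between the host's two haplotypes but cannot create a third one.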

2015 ◽  
Vol 7 (26) ◽  
pp. 14243-14253 ◽  
Author(s):  
Ponnaboina Thirupathi ◽  
Joo-Young Park ◽  
Lok Nath Neupane ◽  
Mallela Y. L. N. Kishore ◽  
Keun-Hyeung Lee

Plant Disease ◽  
2021 ◽  
Author(s):  
Terry Torres-Cruz ◽  
Briana Whitaker ◽  
Robert Proctor ◽  
Kirk Broders ◽  
Imane Laraba ◽  
...  

Species within Fusarium are of global agricultural, medical, and food/feed safety concern and have been extensively characterized. However, accurate identification of species is challenging and usually requires DNA sequence data. FUSARIUM-ID (http://isolate.fusariumdb.org/) is a publicly available database designed to support the identification of Fusarium species using sequences of multiple phylogenetically informative loci, especially the highly informative ~680 bp 5' portion of the translation elongation factor 1-alpha (TEF1) gene, which has been adopted as the primary barcoding locus in the genus. However, FUSARIUM-ID v.1.0 and 2.0 had several limitations, including inconsistent metadata annotation for the archived sequences and poor representation of some species complexes and marker loci. Here, we present FUSARIUM-ID v.3.0, which provides the following improvements: (i) additional and updated annotation of metadata for isolates associated with each sequence, (ii) expanded taxon representation in the TEF1 sequence database, (iii) availability of the sequence database as a downloadable file to enable local BLAST queries, and (iv) a tutorial file for users to perform local BLAST searches using either freely available software (SequenceServer, the BLAST+ command-line executables, or Galaxy) or the proprietary Geneious software. FUSARIUM-ID will be updated on a regular basis by archiving sequences of TEF1 and other loci from newly identified species and from more in-depth sampling of currently recognized species.
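As an illustration of a local query, a downloaded sequence file could be indexed with `makeblastdb -in fusarium_tef1.fasta -dbtype nucl -out fusarium_tef1` and searched with `blastn -query my_isolates.fasta -db fusarium_tef1 -outfmt 6` (file and database names are hypothetical; the FUSARIUM-ID tutorial describes its own workflow). The sketch below summarizes the resulting tabular output by keeping the highest-bitscore hit per query. The example rows are made up, and a single top hit is only a starting point for identification.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(tabular_text):
    """Return the highest-bitscore hit for each query in -outfmt 6 text."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        hit = dict(zip(OUTFMT6_FIELDS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        hit["length"] = int(hit["length"])
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Made-up rows for a hypothetical query against two hypothetical references.
example = (
    "query1\trefA\t99.85\t680\t1\t0\t1\t680\t1\t680\t0.0\t1251\n"
    "query1\trefB\t97.21\t680\t19\t0\t1\t680\t1\t680\t0.0\t1149\n"
)
for query, hit in best_hits(example).items():
    print(query, hit["sseqid"], hit["pident"], hit["bitscore"])
```

Ranking by bitscore rather than percent identity avoids favoring short, high-identity local alignments over longer ones covering the full ~680 bp TEF1 region.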


F1000Research ◽  
2021 ◽  
Vol 9 ◽  
pp. 915
Author(s):  
Muhammad Afrisal ◽  
Yukio Iwatsuki ◽  
Andi Iqbal Burhanuddin

Background: The Lethrinidae (emperors) include many important food fish species. Accurate determination of species and stocks is important for fisheries management. The taxonomy of the genus Lethrinus is problematic, for example with regard to the identification of the thumbprint emperor Lethrinus harak. Little research has been done on L. harak diversity in the Pacific and Indian Oceans. This study aimed to evaluate the morphometric and genetic characters of the thumbprint emperor, L. harak (Forsskål, 1775), in the Pacific and Indian Oceans. Methods: This research was conducted in the Marine Biology Laboratory, Faculty of Marine Science and Fisheries, Hasanuddin University, and the Division of Fisheries Science, University of Miyazaki. Morphometric character measurements were based on holotype character data, while genetic analysis was performed on cytochrome oxidase subunit I (COI) sequence data. Morphometric data were analysed using principal component analysis (PCA) in MINITAB, and genetic data were analysed in MEGA 6. Results: Analysis of morphometric characters revealed groupings largely representative of the Indian and Pacific Oceans. The Seychelles population was separated from other Indian Ocean sites, and Australian populations were closer to the Pacific than to the Indian Ocean group. The genetic distance between the groups was low (0.000–0.042). The reconstructed phylogenetic topology accorded well with the morphometric analysis, with two main L. harak clades representing the Indian and Pacific Oceans and Australia falling in the Pacific Ocean clade. Conclusions: These results indicate a change in morphological character size between L. harak from Makassar and the holotype from Saudi Arabia. The phylogenetic reconstruction is consistent with the low genetic distances observed.
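Group distances such as the 0.000–0.042 range above are averages of pairwise distances between COI sequences. The abstract does not say which distance model was used in MEGA 6, so the sketch below assumes the simplest one, the uncorrected p-distance (proportion of differing sites), with pairwise deletion of gaps and ambiguous bases. Sequences must already be aligned.

```python
VALID_BASES = set("ACGT")


def p_distance(seq_a, seq_b):
    """Uncorrected p-distance between two aligned sequences.

    Sites with a gap or ambiguity code in either sequence are skipped
    (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID_BASES and b in VALID_BASES:
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def mean_between_group_distance(group_a, group_b):
    """Average p-distance over all pairs with one sequence from each group."""
    pairs = [(a, b) for a in group_a for b in group_b]
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


print(p_distance("ACGTACGTAC", "ACGTACGTAT"))  # one difference in 10 sites -> 0.1
```

A between-group value near 0.04 therefore means that, on average, about 4 in every 100 compared COI sites differ between an Indian Ocean and a Pacific Ocean specimen under this model.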


2021 ◽  
Vol 5 (Supplement_1) ◽  
pp. A737-A738
Author(s):  
Sydney Chang ◽  
Alan Copperman ◽  
Andrea Elizabeth Dunaif

Abstract We found 17 rare PCOS-specific functional protein-altering variants (PAVs), or mutations, in the AMH gene that decrease the biologic activity of the encoded protein, anti-Müllerian hormone (AMH), in the heterozygous state. Approximately 3% of European-ancestry PCOS cases in our cohort of ~700 were affected. Our preliminary studies found evidence for a metabolic phenotype both in women with PCOS and in their male first-degree relatives who were heterozygous carriers of these AMH PAVs. We performed this study to test the hypothesis that AMH mutations are associated with metabolic abnormalities in the general population. The Mount Sinai BioMe Biobank is an electronic health record (EHR)-linked biobank containing anonymized whole exome sequences from 30,813 participants of diverse ancestries. We interrogated the sequence data to identify individuals with PCOS-related AMH PAVs. IRB approval was obtained to review the linked EHR. Outcomes were the presence of obesity (BMI ≥ 30 kg/m²), type 2 diabetes (hemoglobin A1C ≥ 6.5%), prediabetes (hemoglobin A1C 5.7%–6.4%), elevated cholesterol (total cholesterol ≥ 200 mg/dL), hypertriglyceridemia (TG ≥ 150 mg/dL), and hypertension (≥2 blood pressure values ≥140/90, or administration of antihypertensive medications). Control subjects were obtained from the National Health and Nutrition Examination Survey using propensity score matching (for sex, age, and BMI) with a 1:4 case:control ratio. A total of 292 individuals with AMH PAVs were identified, giving a combined AMH PAV prevalence of 0.95% in an unselected population (1.07% in Europeans, 0.28% in African Americans, 0.54% in Hispanics, and 0.07% in Asians). After adjusting for age, BMI, and race/ethnicity, there was a statistically significant increase in the prevalence of hypertriglyceridemia in both women (OR 7.29, 95% CI 3.77–14.00) and men (OR 10.15, 95% CI 4.68–22.00) with AMH PAVs compared with sex-, age-, and BMI-matched controls.
There was also a statistically significant increased prevalence of elevated cholesterol in men with AMH PAVs compared to controls (OR 2.48, 95% CI 1.15-5.34). There were no significant differences between individuals with AMH PAVs and matched controls with respect to the other outcomes. These findings suggest that decreased bioactivity of AMH is causally related to dyslipidemia in the general population. TG levels are elevated in both sexes, whereas increases in cholesterol are only seen in men. The mechanisms by which decreases in AMH bioactivity alter circulating lipid levels are of considerable interest since AMH has no known metabolic actions. It is possible that the putative metabolic effects of AMH are mediated by increases in circulating testosterone (T) levels that in turn alter lipid metabolism. Since T levels are not commonly available in EHR, future studies will be needed to investigate this hypothesis as well as to explore metabolic actions of AMH.
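The headline numbers can be cross-checked using only values reported in the abstract. The sketch below recomputes the combined prevalence (292 of 30,813 participants) and, assuming the 95% CIs are symmetric on the log scale (Wald-type, as typically reported from logistic regression; the abstract does not state the method), back-calculates the standard error of each log odds ratio, the OR implied by the CI midpoint, and an approximate two-sided p-value. Adjusted ORs come from a model, so they cannot be recomputed from raw counts here.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def prevalence_percent(carriers, total):
    """Carrier prevalence as a percentage."""
    return 100 * carriers / total


def wald_ci_summary(odds_ratio, lower, upper, z=Z_95):
    """Back-calculate SE(log OR), implied OR, and p-value from a 95% CI.

    Assumes the CI was computed as exp(log OR +/- z * SE).
    """
    log_lower, log_upper = math.log(lower), math.log(upper)
    se = (log_upper - log_lower) / (2 * z)
    implied_or = math.exp((log_lower + log_upper) / 2)  # geometric midpoint
    z_stat = math.log(odds_ratio) / se
    p_value = math.erfc(abs(z_stat) / math.sqrt(2))  # two-sided normal p
    return implied_or, se, p_value


print(round(prevalence_percent(292, 30813), 2))  # -> 0.95

reported = {
    "hypertriglyceridemia, women": (7.29, 3.77, 14.00),
    "hypertriglyceridemia, men": (10.15, 4.68, 22.00),
    "elevated cholesterol, men": (2.48, 1.15, 5.34),
}
for label, (odds_ratio, lower, upper) in reported.items():
    implied, se, p = wald_ci_summary(odds_ratio, lower, upper)
    print(f"{label}: implied OR {implied:.2f}, SE(log OR) {se:.3f}, p ~ {p:.1e}")
```

For all three outcomes the CI midpoint lies close to the reported OR, which is consistent with log-symmetric intervals; the wide intervals for hypertriglyceridemia reflect large standard errors on the log scale.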


2021 ◽  
pp. 1-10
Author(s):  
Tae-Hwi Schwantes-An ◽  
Matteo Vatta ◽  
Marco Abreu ◽  
Leah Wetherill ◽  
Howard J. Edenberg ◽  
...  

Introduction: Patients with chronic kidney disease experience high rates of cardiovascular mortality and morbidity. When kidney disease progresses to the need for dialysis, sudden cardiac death (SCD) accounts for 25–35% of all cardiovascular deaths. The objective was to determine whether rare genetic variants known to be associated with cardiovascular death in the general population are associated with SCD in patients undergoing hemodialysis. Methods: We performed a case-control study comparing 126 SCD subjects (37 African American [AfAn] and 89 European ancestry [EA]) with 107 controls (34 AfAn and 73 EA), matched for age, sex, self-reported race, dialysis duration (<2, 2–5, and >5 years), and the presence or absence of diabetes mellitus. To target the coding regions of genes previously reported to be associated with 15 inherited cardiac conditions (ICCs), we used the TruSight Cardio Kit (Illumina, San Diego, CA, USA) to capture the genetic regions of interest. In all, the kit targets 572 kb, comprising the protein-coding regions and the 40-bp 5′ and 3′ flanking regions of 174 genes associated with the 15 ICCs. Using the sequence data, we conducted burden tests to identify genes with an increased number of variants among SCD cases compared with matched controls. Results: Eleven genes were nominally associated with SCD, but after correction for multiple testing none of the 174 genes, including previously identified genes, had significantly more variants in SCD cases than in matched controls. Secondary burden tests grouping variants by disease and gene function did not produce statistically significant associations. Discussion/Conclusions: We found no associations between genes known to be associated with ICCs and SCD in our sample of patients undergoing hemodialysis.
This suggests that genetic causes are unlikely to be a major pathogenic factor in SCD in hemodialysis patients, although our sample size limits definitive conclusions.
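The abstract does not specify which burden test was used. As an assumed illustration only, the sketch below implements a simple carrier-collapsing test: per gene, subjects are counted as carriers if they have at least one qualifying rare variant, cases and controls are compared with a one-sided Fisher's exact test, and a Bonferroni threshold is applied across genes. With 174 genes and α = 0.05, that threshold is 0.05 / 174 ≈ 2.9 × 10⁻⁴, which illustrates how nominal associations can fail to survive correction. Gene names and carrier counts passed to `burden_test` are hypothetical, and a matched design may call for a conditional test instead.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for an excess of carriers among cases.

    Table layout:  cases    -> a carriers, b non-carriers
                   controls -> c carriers, d non-carriers
    The p-value is the hypergeometric upper tail P(X >= a).
    """
    n_cases, n_carriers, n = a + b, a + c, a + b + c + d
    upper = min(n_cases, n_carriers)
    tail = sum(comb(n_carriers, x) * comb(n - n_carriers, n_cases - x)
               for x in range(a, upper + 1))
    return tail / comb(n, n_cases)


def burden_test(carrier_counts, n_cases, n_controls, alpha=0.05, n_tests=None):
    """Gene-level collapsing burden test with a Bonferroni threshold.

    carrier_counts maps gene -> (case carriers, control carriers).
    Returns (threshold, {gene: (p_value, significant)}).
    """
    n_tests = n_tests or len(carrier_counts)
    threshold = alpha / n_tests
    results = {}
    for gene, (case_carriers, control_carriers) in carrier_counts.items():
        p = fisher_greater(case_carriers, n_cases - case_carriers,
                           control_carriers, n_controls - control_carriers)
        results[gene] = (p, p < threshold)
    return threshold, results


# Study sizes from the abstract: 126 cases and 107 controls, 174 genes tested.
threshold, _ = burden_test({}, 126, 107, n_tests=174)
print(f"Bonferroni threshold: {threshold:.2e}")
```

Collapsing variants into a per-gene carrier status trades variant-level resolution for power, since individually rare variants are too sparse to test one at a time in a study of this size.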


2019 ◽  
Author(s):  
Vanessa R. Marcelino ◽  
Philip T.L.C. Clausen ◽  
Jan P. Buchmann ◽  
Michelle Wille ◽  
Jonathan R. Iredell ◽  
...  

Abstract High-throughput sequencing of DNA and RNA from environmental and host-associated samples (metagenomics and metatranscriptomics) is a powerful tool to assess which organisms are present in a sample. Taxonomic identification software usually aligns individual short sequence reads to a reference database, sometimes containing only taxa with complete genomes. This is a challenging task, given that different species can share identical sequence regions and complete genome sequences are available for only a fraction of organisms. A recently developed approach to mapping sequence reads to reference databases involves weighting all high-scoring read mappings against the database as a whole to produce better-informed alignments. We used this novel concept in read mapping to develop a highly accurate metagenomic classification pipeline named CCMetagen. Using simulated fungal and bacterial metagenomes, we demonstrate that CCMetagen substantially outperforms other commonly used metagenome classifiers, attaining a 3- to 1580-fold increase in precision and a 2- to 922-fold increase in F1 scores for species-level classifications when compared with Kraken2, Centrifuge, and KrakenUniq. CCMetagen is sufficiently fast and memory efficient to use the entire NCBI nucleotide collection (nt) as a reference, enabling the assessment of species with incomplete genome sequence data from all biological kingdoms. Our pipeline efficiently produced a comprehensive overview of the microbiome of two biological data sets, including both eukaryotes and prokaryotes. CCMetagen is user-friendly, and its results can be easily integrated into microbial community analysis software for streamlined and automated microbiome studies.
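Precision and F1 score, the metrics behind the fold increases above, can be sketched for species-level presence/absence calls as follows. This assumes a simple set-based definition (a species is a true positive if it is both present in the simulated community and reported by the classifier); the benchmark in the paper may define them differently, for example with abundance thresholds. A fold increase is then the ratio of one tool's score to another's.

```python
def classification_scores(predicted, truth):
    """Precision, recall, and F1 for species-level presence/absence calls."""
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)   # reported and actually present
    fp = len(predicted - truth)   # reported but absent
    fn = len(truth - predicted)   # present but missed
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical species sets: two correct calls, two false positives, one miss.
print(classification_scores({"sp_a", "sp_b", "sp_c", "sp_d"}, {"sp_a", "sp_b", "sp_e"}))
```

Because precision penalizes every spurious species call, a classifier that reports many low-abundance false positives from shared sequence regions can have very low precision even when its recall is high, which is why large fold differences in precision are possible.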


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 915
Author(s):  
Muhammad Afrisal ◽  
Yukio Iwatsuki ◽  
Andi Iqbal Burhanuddin

Background: The Lethrinidae (emperors) include many important food fish species. Accurate determination of species and stocks is important for fisheries management. The taxonomy of the genus Lethrinus is problematic, for example with regard to the identification of the thumbprint emperor Lethrinus harak. Little research has been done on L. harak diversity in the Pacific and Indian Oceans. This study aimed to evaluate the morphometric and genetic characters of the thumbprint emperor, L. harak (Forsskål, 1775), in the Pacific and Indian Oceans. Methods: This research was conducted in the Marine Biology Laboratory, Faculty of Marine Science and Fisheries, Hasanuddin University, and the Division of Fisheries Science, University of Miyazaki. Morphometric character measurements were based on holotype character data, while genetic analysis was performed on cytochrome oxidase subunit I (COI) sequence data. Morphometric data were analysed using principal component analysis (PCA) in MINITAB, and genetic data were analysed in MEGA 6. Results: Analysis of morphometric characters revealed groupings largely representative of the Indian and Pacific Oceans. The Seychelles population was separated from other Indian Ocean sites, and Australian populations were closer to the Pacific than to the Indian Ocean group. The genetic distance between the groups was low (0.000–0.042). The reconstructed phylogenetic topology accorded well with the morphometric analysis, with two main L. harak clades representing the Indian and Pacific Oceans and Australia falling in the Pacific Ocean clade. Conclusions: These results indicate that geographical and environmental factors can affect the morphometric and genetic characteristics of L. harak.
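COI barcoding studies often report distances under the Kimura two-parameter (K2P) model rather than raw p-distances; the abstract does not say which model was chosen, so this is an assumed alternative, not the study's stated method. K2P separates transitions (A↔G, C↔T) from transversions: with P and Q the proportions of transition and transversion differences among compared sites, d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q). The sketch applies pairwise deletion of gaps and ambiguous bases.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID_BASES = PURINES | PYRIMIDINES


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in VALID_BASES or b not in VALID_BASES:
            continue  # pairwise deletion of gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("distance undefined (sequences too divergent)")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


print(round(k2p_distance("ACGTACGTAC", "ACGTACGTAT"), 4))  # one C<->T transition in 10 sites
```

At divergences as low as those reported here (at most about 0.04), K2P and p-distances differ only slightly; the correction matters more for deeper comparisons where multiple substitutions at the same site become common.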


2015 ◽  
Author(s):  
Shane McCarthy ◽  
Sayantan Das ◽  
Warren Kretzschmar ◽  
Olivier Delaneau ◽  
Andrew R. Wood ◽  
...  

We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs, constructed using whole genome sequence data from 20 studies of predominantly European ancestry. Using this resource enables accurate genotype imputation at minor allele frequencies as low as 0.1%, greatly increases the number of SNPs that can be tested in association studies, and can help to discover and refine causal loci. We describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently.
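To put the 0.1% figure in context relative to the panel size, the arithmetic below shows that 64,976 haplotypes correspond to 32,488 diploid individuals and that a variant at 0.1% minor allele frequency is expected on only about 65 haplotypes in the panel. This is only frequency arithmetic; the helper names are hypothetical, and imputation itself relies on haplotype-matching models that are not sketched here.

```python
def minor_allele_frequency(alleles):
    """MAF at one biallelic site; `alleles` holds 0 (ref) or 1 (alt) per haplotype."""
    n = len(alleles)
    alt = sum(alleles)
    return min(alt, n - alt) / n


def expected_minor_copies(maf, n_haplotypes):
    """Expected number of panel haplotypes carrying the minor allele."""
    return maf * n_haplotypes


N_HAPLOTYPES = 64_976                 # panel size stated in the abstract
print(N_HAPLOTYPES // 2)              # diploid individuals -> 32488
print(round(expected_minor_copies(0.001, N_HAPLOTYPES)))  # -> 65
```

A large panel matters because a rare allele can only be imputed well if enough reference haplotypes carry it to define the surrounding haplotype background.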


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Alejandra Vergara-Lope ◽  
M. Reza Jabalameli ◽  
Clare Horscroft ◽  
Sarah Ennis ◽  
Andrew Collins ◽  
...  

Abstract Quantification of linkage disequilibrium (LD) patterns in the human genome is essential for genome-wide association studies, selection signature mapping, and studies of recombination. Whole genome sequence (WGS) data provide optimal source data for this quantification, as they are free from biases introduced by the design of array genotyping platforms. The Malécot-Morton model of LD allows the creation of a cumulative map for each chromosome, analogous to an LD form of a linkage map. Here we report LD maps generated from WGS data for a large population of European ancestry, as well as populations of Baganda, Ethiopian and Zulu ancestry. We achieve high average genetic marker densities of 2.3–4.6/kb. These maps show good agreement with prior, low-resolution maps and are consistent between populations. Files are provided in BED format to allow researchers to readily utilise this resource.
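In the Malécot-Morton framework, as it is usually written, the expected association between two markers a distance d apart is ρ = (1 − L) M e^(−εd) + L, where ε is the rate of exponential decline of LD, M the association at zero distance, and L the residual association at large distances. A cumulative LD map in LD units (LDU) is then built by summing ε_i d_i over adjacent marker intervals. The sketch below assumes the per-interval ε values have already been estimated (the fitting step is not shown), and the marker positions in the example are hypothetical.

```python
import math


def malecot_rho(d_kb, epsilon, M, L):
    """Expected LD between markers d_kb apart: (1 - L) * M * exp(-epsilon * d) + L."""
    return (1 - L) * M * math.exp(-epsilon * d_kb) + L


def cumulative_ldu_map(positions_kb, interval_epsilons):
    """Cumulative LDU positions from per-interval decline rates.

    Each interval between adjacent markers contributes epsilon_i * d_i LDU,
    so the map is flat where LD is strong (small epsilon) and steps up
    where LD breaks down (large epsilon), e.g. at recombination hot spots.
    """
    if len(interval_epsilons) != len(positions_kb) - 1:
        raise ValueError("need one epsilon per interval between adjacent markers")
    ldu = [0.0]
    for (start, end), eps in zip(zip(positions_kb, positions_kb[1:]), interval_epsilons):
        ldu.append(ldu[-1] + eps * (end - start))
    return ldu


# Hypothetical markers: a strong-LD block, a short high-epsilon interval, another block.
print(cumulative_ldu_map([0.0, 10.0, 12.0, 40.0], [0.01, 2.0, 0.02]))
```

Plotting LDU against physical position gives the step-like profile that makes LD blocks and their boundaries visible, which is what makes such maps useful for association and recombination studies.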

