Measuring genome sizes using read-depth, k-mers, and flow cytometry: methodological comparisons in beetles (Coleoptera)

2019 ◽  
Author(s):  
James M. Pflug ◽  
Valerie Renee Holmes ◽  
Crystal Burrus ◽  
J. Spencer Johnston ◽  
David R. Maddison

ABSTRACT Measuring genome size across different species can yield important insights into evolution of the genome and allow for more informed decisions when designing next-generation genomic sequencing projects. New techniques for estimating genome size using shallow genomic sequence data have emerged which have the potential to augment our knowledge of genome sizes, yet these methods have only been used in a limited number of empirical studies. In this project, we compare estimation methods using next-generation sequencing (k-mer methods and average read depth of single-copy genes) to measurements from flow cytometry, the gold standard for genome size measures, using ground beetles (Carabidae) and other members of the beetle suborder Adephaga as our test system. We also present a new protocol for using read-depth of single-copy genes to estimate genome size. Additionally, we report flow cytometry measurements for five previously unmeasured carabid species, as well as 21 new draft genomes and six new draft transcriptomes across eight species of adephagan beetles. No single sequence-based method performed well on all species, and all tended to underestimate the genome sizes, although only slightly in most samples. For one species, Bembidion haplogonum, most sequence-based methods yielded estimates half the size suggested by flow cytometry. This discrepancy for k-mer methods can be explained by a large number of repetitive sequences, but we have no explanation for why read-depth methods yielded results that were also strikingly low.

2020 ◽  
Vol 10 (9) ◽  
pp. 3047-3060 ◽  
Author(s):  
James M. Pflug ◽  
Valerie Renee Holmes ◽  
Crystal Burrus ◽  
J. Spencer Johnston ◽  
David R. Maddison

Measuring genome size across different species can yield important insights into evolution of the genome and allow for more informed decisions when designing next-generation genomic sequencing projects. New techniques for estimating genome size using shallow genomic sequence data have emerged which have the potential to augment our knowledge of genome sizes, yet these methods have only been used in a limited number of empirical studies. In this project, we compare estimation methods using next-generation sequencing (k-mer methods and average read depth of single-copy genes) to measurements from flow cytometry, a standard method for genome size measures, using ground beetles (Carabidae) and other members of the beetle suborder Adephaga as our test system. We also present a new protocol for using read-depth of single-copy genes to estimate genome size. Additionally, we report flow cytometry measurements for five previously unmeasured carabid species, as well as 21 new draft genomes and six new draft transcriptomes across eight species of adephagan beetles. No single sequence-based method performed well on all species, and all tended to underestimate the genome sizes, although only slightly in most samples. For one species, Bembidion sp. nr. transversale, most sequence-based methods yielded estimates half the size suggested by flow cytometry.
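The two sequence-based ideas named in this abstract (average read depth of single-copy genes, and k-mer counting) both reduce to a simple ratio. The sketch below illustrates that arithmetic only; it is not the authors' protocol, and the function names and input values are hypothetical. It assumes that read depth over single-copy genes approximates depth across the whole genome, an assumption that repeat-rich genomes can violate.

```python
from statistics import mean


def genome_size_from_read_depth(total_bases: float, single_copy_depths: list[float]) -> float:
    """Estimate genome size (bp) as total sequenced bases / mean depth on single-copy genes."""
    if not single_copy_depths:
        raise ValueError("need at least one single-copy gene depth")
    depth = mean(single_copy_depths)
    if depth <= 0:
        raise ValueError("mean depth must be positive")
    return total_bases / depth


def genome_size_from_kmers(total_kmers: float, peak_kmer_depth: float) -> float:
    """Estimate genome size (bp) as total counted k-mers / depth of the main k-mer peak."""
    if peak_kmer_depth <= 0:
        raise ValueError("peak k-mer depth must be positive")
    return total_kmers / peak_kmer_depth


# Hypothetical inputs: 10 Gb of sequence, three single-copy genes at ~20x depth.
estimate = genome_size_from_read_depth(10_000_000_000, [19.0, 20.0, 21.0])
print(f"{estimate / 1e6:.0f} Mb")  # 500 Mb
```

In practice the depth values would come from mapping reads to a curated set of single-copy genes, and the k-mer peak from a k-mer frequency histogram; both steps are outside this sketch.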


Genome ◽  
2013 ◽  
Vol 56 (9) ◽  
pp. 487-494 ◽  
Author(s):  
Kate L. Hertweck

The research field of comparative genomics is moving from a focus on genes to a more holistic view including the repetitive complement. This study aimed to characterize relative proportions of the repetitive fraction of large, complex genomes in a nonmodel system. The monocotyledonous plant order Asparagales (onion, asparagus, agave) comprises some of the largest angiosperm genomes and represents variation in both genome size and structure (karyotype). Anonymous, low coverage, single-end Illumina data from 11 exemplar Asparagales taxa were assembled using a de novo method. Resulting contigs were annotated using a reference library of available monocot repetitive sequences. Mapping reads to contigs provided rough estimates of relative proportions of each type of transposon in the nuclear genome. The results were parsed into general repeat types and synthesized with genome size estimates and a phylogenetic context to describe the pattern of transposable element evolution among these lineages. The major finding is that although some lineages in Asparagales exhibit conservation in repeat proportions, there is generally wide variation in types and frequency of repeats. This approach is an appropriate first step in characterizing repeats in evolutionary lineages with a paucity of genomic resources.


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6563
Author(s):  
Jianying Sun ◽  
Xiaofeng Dong ◽  
Qinghe Cao ◽  
Tao Xu ◽  
Mingku Zhu ◽  
...  

Background: Ipomoea is the largest genus in the family Convolvulaceae. The species in this genus have been widely used in many fields, such as agriculture, nutrition, and medicine. With the development of next-generation sequencing, more than 50 chloroplast genomes of Ipomoea species have been sequenced. However, the repeats and divergence regions in Ipomoea have not been well investigated. In the present study, we sequenced and assembled eight chloroplast genomes from sweet potato’s close wild relatives. By combining these with 32 published chloroplast genomes, we conducted a detailed comparative analysis of a broad range of Ipomoea species.
Methods: Eight chloroplast genomes were assembled using short DNA sequences generated by next-generation sequencing technology. By combining these chloroplast genomes with 32 other published Ipomoea chloroplast genomes downloaded from GenBank and the Oxford Research Archive, we conducted a comparative analysis of the repeat sequences and divergence regions across the Ipomoea genus. In addition, separate analyses of the Batatas group and Quamoclit group were also performed.
Results: The eight newly sequenced chloroplast genomes ranged from 161,225 to 161,721 bp in length and displayed the typical circular quadripartite structure, consisting of a pair of inverted repeat (IR) regions (30,798–30,910 bp each) separated by a large single copy (LSC) region (87,575–88,004 bp) and a small single copy (SSC) region (12,018–12,051 bp). The average guanine-cytosine (GC) content was approximately 40.5% in the IR region, 36.1% in the LSC region, 32.2% in the SSC regions, and 37.5% in complete sequence for all the generated plastomes. The eight chloroplast genome sequences from this study included 80 protein-coding genes, four rRNAs (rrn23, rrn16, rrn5, and rrn4.5), and 37 tRNAs. The boundaries of single copy regions and IR regions were highly conserved in the eight chloroplast genomes. In Ipomoea, 57–89 pairs of repetitive sequences and 39–64 simple sequence repeats were found. By conducting a sliding window analysis, we found six relatively high variable regions (ndhA intron, ndhH-ndhF, ndhF-rpl32, rpl32-trnL, rps16-trnQ, and ndhF) in the Ipomoea genus, eight (trnG, rpl32-trnL, ndhA intron, ndhF-rpl32, ndhH-ndhF, ccsA-ndhD, trnG-trnR, and pasA-ycf3) in the Batatas group, and eight (ndhA intron, petN-psbM, rpl32-trnL, trnG-trnR, trnK-rps16, ndhC-trnV, rps16-trnQ, and trnG) in the Quamoclit group. Our maximum-likelihood tree based on whole chloroplast genomes confirmed the phylogenetic topology reported in previous studies.
Conclusions: The chloroplast genome sequence and structure were highly conserved in the eight newly-sequenced Ipomoea species. Our comparative analysis included a broad range of Ipomoea chloroplast genomes, providing valuable information for Ipomoea species identification and enhancing the understanding of Ipomoea genetic resources.


Genome ◽  
2018 ◽  
Vol 61 (8) ◽  
pp. 567-574 ◽  
Author(s):  
Wen Zhou ◽  
Bin Li ◽  
Lin Li ◽  
Wen Ma ◽  
Yuanchu Liu ◽  
...  

Dioscorea zingiberensis (Dioscoreaceae) is the main plant source of diosgenin (steroidal sapogenins), the precursor for the production of steroid hormones in the pharmaceutical industry. Despite its large economic value, genomic information of the genus Dioscorea is currently unavailable. Here, we present an initial survey of the D. zingiberensis genome performed by next-generation sequencing technology together with a genome size investigation inferred by flow cytometry. The whole genome survey of D. zingiberensis generated 31.48 Gb of sequence data with approximately 78.70× coverage. The estimated genome size is 800 Mb, with a high level of heterozygosity based on K-mer analysis. These reads were assembled into 334 288 contigs with an N50 length of 1079 bp, which were further assembled into 92 163 scaffolds with a total length of 173.46 Mb. A total of 4935 genes, 81 tRNAs, 69 rRNAs, and 661 miRNAs were predicted by the genome analysis, and 263 484 repeated sequences were obtained with 419 372 simple sequence repeats (SSRs). Among these SSRs, the mononucleotide repeat type was the most abundant (up to 54.60% of the total SSRs), followed by the dinucleotide (29.60%), trinucleotide (11.37%), tetranucleotide (3.53%), pentanucleotide (0.65%), and hexanucleotide (0.25%) repeat types. The 1C-value of D. zingiberensis was calibrated against Salvia miltiorrhiza and calculated as 0.87 pg (851 Mb) by flow cytometry, which was very close to the result of the genome survey. This is the first report of genome-wide characterization within this taxon.
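The pairing "0.87 pg (851 Mb)" in this abstract is consistent with the commonly used conversion of 1 pg of DNA to roughly 978 Mb. The sketch below assumes that factor (the abstract does not state which one the authors used) and compares the flow cytometry value with the 800 Mb survey estimate; the function names are illustrative.

```python
MB_PER_PG = 978.0  # assumed conversion factor: 1 pg of DNA is roughly 978 Mb


def pg_to_mb(c_value_pg: float) -> float:
    """Convert a 1C-value in picograms to megabases."""
    return c_value_pg * MB_PER_PG


def relative_difference(estimate: float, reference: float) -> float:
    """Absolute difference between two estimates, as a fraction of the reference."""
    return abs(estimate - reference) / reference


flow_mb = pg_to_mb(0.87)  # flow cytometry 1C-value from the abstract
print(round(flow_mb))  # 851
print(f"{relative_difference(800.0, flow_mb):.1%}")  # survey estimate vs. flow cytometry
```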


2004 ◽  
Vol 385 (11) ◽  
pp. 1059-1067 ◽  
Author(s):  
Ewa Golonka ◽  
Renata Filipek ◽  
Artur Sabat ◽  
Anna Sinczak ◽  
Jan Potempa

Abstract Staphylococcus aureus, a leading cause of bacterial infections in humans, is endowed with a wealth of virulence factors that contribute to the disease process. Several extracellular proteolytic enzymes, including cysteine proteinases referred to as the staphopains (staphopain A, encoded by the scpA gene, and staphopain B, encoded by sspB), have proposed roles for staphylococcal virulence. Here we present data regarding the distribution, copy number and genetic variability of the genes encoding the staphopains in a large number of S. aureus strains. The polymorphism of the scpA and sspB genes in three laboratory strains and 126 clinical isolates was analyzed by polymerase chain reaction (PCR)-restriction fragment length polymorphism (RFLP). Both genes were detected in all isolates by PCR amplification and, based on the PCR-RFLP patterns, classified as four types for scpA and six types for sspB. Those with the most divergent patterns were subjected to DNA sequencing and compared with genomic sequence data for the seven available strains of S. aureus. Southern blot analysis of the scpA and sspB sequences indicates that they are strongly conserved as single-copy genes in the genome of each S. aureus strain investigated. Taken together, these data suggest that the staphopains have important housekeeping and/or virulence functions, and therefore may constitute an interesting target for the development of therapeutic inhibitors for the treatment of staphylococcal diseases.


1998 ◽  
Vol 180 (24) ◽  
pp. 6697-6703 ◽  
Author(s):  
Jeanne Carr ◽  
Glenmore Shearer

ABSTRACT The genome size, complexity, and ploidy of the dimorphic pathogenic fungus Histoplasma capsulatum were determined by using DNA renaturation kinetics, genomic reconstruction, and flow cytometry. Nuclear DNA was isolated from two strains, G186AS and Downs, and analyzed by renaturation kinetics and genomic reconstruction with three putative single-copy genes (calmodulin, α-tubulin, and β-tubulin). G186AS was found to have a genome of approximately 2.3 × 10⁷ bp with less than 0.5% repetitive sequences. The Downs strain, however, was found to have a genome approximately 40% larger with more than 16 times more repetitive DNA. The Downs genome was determined to be 3.2 × 10⁷ bp with approximately 8% repetitive DNA. To determine ploidy, the DNA mass per cell measured by flow cytometry was compared with the 1n genome estimate to yield a DNA index (DNA per cell/1n genome size). Strain G186AS was found to have a DNA index of 0.96, and Downs had a DNA index of 0.94, indicating that both strains are haploid. Genomic reconstruction and Southern blot data obtained with α- and β-tubulin probes indicated that some genetic duplication has occurred in the Downs strain, which may be aneuploid or partially diploid.
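The DNA index defined in this abstract (DNA per cell divided by the 1n genome size) and the "approximately 40% larger" comparison are both simple ratios. The following sketch reproduces that arithmetic from the figures quoted above; the function names are illustrative, and rounding the index to the nearest whole number is an assumed reading of how an index near 1 indicates a haploid strain.

```python
def dna_index(dna_per_cell_bp: float, haploid_genome_bp: float) -> float:
    """DNA index as defined in the abstract: DNA per cell / 1n genome size."""
    if haploid_genome_bp <= 0:
        raise ValueError("haploid genome size must be positive")
    return dna_per_cell_bp / haploid_genome_bp


def inferred_ploidy(index: float) -> int:
    """Nearest whole number of genome copies (assumed interpretation of the index)."""
    return round(index)


# Genome sizes quoted in the abstract, in base pairs.
G186AS_BP = 2.3e7
DOWNS_BP = 3.2e7

print(f"{DOWNS_BP / G186AS_BP:.2f}")  # 1.39, i.e. roughly 40% larger
print(inferred_ploidy(0.96), inferred_ploidy(0.94))  # 1 1
```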


2014 ◽  
Vol 13 (2) ◽  
pp. 142-152 ◽  
Author(s):  
Alexandra Marina Gottlieb ◽  
Lidia Poggio

The development of modern approaches to the genetic improvement of the tree crops Ilex paraguariensis (‘yerba mate’) and Ilex dumosa (‘yerba señorita’) is halted by the scarcity of basic genetic information. In this study, we characterized the implementation of low-cost methodologies such as representational difference analysis (RDA), single-strand conformation polymorphisms (SSCP), and reverse and direct dot-blot filter hybridization assays coupled with thorough bioinformatic characterization of sequence data for both species. Also, we estimated the genome size of each species using flow cytometry. This study contributes to the better understanding of the genetic differences between two cultivated species, by generating new quantitative and qualitative genome-level data. Using the RDA technique, we isolated a group of non-coding repetitive sequences, tentatively considered as Ilex-specific, which were 1.21- to 39.62-fold more abundant in the genome of I. paraguariensis. Another group of repetitive DNA sequences involved retrotransposons, which appeared 1.41- to 35.77-fold more abundantly in the genome of I. dumosa. The genomic DNA of each species showed different performances in filter hybridizations: while I. paraguariensis showed a high intraspecific affinity, I. dumosa exhibited a higher affinity for the genome of the former species (i.e. interspecific). These differences could be attributed to the occurrence of homologous but slightly divergent repetitive DNA sequences, highly amplified in the genome of I. paraguariensis but not in the genome of I. dumosa. Additionally, our hybridization outcomes suggest that the genomes of both species have less than 80% similarity. Moreover, for the first time, we report herein a genome size estimate of 1670 Mbp for I. paraguariensis and that of 1848 Mbp for I. dumosa.


Author(s):  
S. Negm ◽  
A. Greenberg ◽  
A.M. Larracuente ◽  
J.S. Sproul

Abstract Study of DNA repeats in model organisms highlights the role of repetitive DNA in many processes that drive genome evolution and phenotypic change. Because repetitive DNA is much more dynamic than single-copy DNA, repetitive sequences can reveal signals of evolutionary history over short time scales that may not be evident in sequences from slower-evolving genomic regions. Many tools for studying repeats are directed toward organisms with existing genomic resources, including genome assemblies and repeat libraries. However, signals in repeat variation may prove especially valuable in disentangling evolutionary histories in diverse non-model groups, for which genomic resources are limited. Here we introduce RepeatProfiler, a tool for generating, visualizing, and comparing repetitive DNA profiles from low-coverage, short-read sequence data. RepeatProfiler automates the generation and visualization of repetitive DNA coverage depth profiles and allows for statistical comparison of profile shape across samples. In addition, RepeatProfiler facilitates comparison of profiles by extracting signal from sequence variants across profiles which can then be analyzed as molecular morphological characters using phylogenetic analysis. We validate RepeatProfiler with data sets from ground beetles (Bembidion), flies (Drosophila), and tomatoes (Solanum). We highlight the potential of repetitive DNA profiles as a high-resolution data source for studies in species delimitation, comparative genomics, and repeat biology.


2020 ◽  
Vol 12 (12) ◽  
pp. 2384-2390
Author(s):  
Pepijn W Kooij ◽  
Jaume Pellicer

Abstract Each day, as the amount of genomic data and bioinformatics resources grows, researchers are increasingly challenged with selecting the most appropriate approach to analyze their data. In addition, the opportunity to undertake comparative genomic analyses is growing rapidly. This is especially true for fungi due to their small genome sizes (i.e., mean 1C = 44.2 Mb). Given these opportunities and aiming to gain novel insights into the evolution of mutualisms, we focus on comparing the quality of whole genome assemblies for fungus-growing ant cultivars (Hymenoptera: Formicidae: Attini) and a free-living relative. Our analyses reveal that currently available methodologies and pipelines for analyzing whole-genome sequence data need refining. By using different genome assemblers, we show that the genome assembly size depends on what software is used. This, in turn, impacts gene number predictions, with higher gene numbers correlating positively with genome assembly size. Furthermore, the majority of fungal genome size data currently available are based on estimates derived from whole-genome assemblies generated from short-read genome data, rather than from the more accurate technique of flow cytometry. Here, we estimated the haploid genome sizes of three ant fungal symbionts by flow cytometry using the fungus Pleurotus ostreatus (Jacq.) P. Kumm. (1871) as a calibration standard. We found that published genome sizes based on genome assemblies are 2.5- to 3-fold larger than our estimates based on flow cytometry. We, therefore, recommend that flow cytometry be used to precalibrate genome assembly pipelines, to avoid incorrect estimates of genome sizes and ensure robust assemblies.

