GenomeScope 2.0 and Smudgeplots: Reference-free profiling of polyploid genomes

2019 ◽  
Author(s):  
T. Rhyker Ranallo-Benavidez ◽  
Kamil S. Jaron ◽  
Michael C. Schatz

An important assessment prior to genome assembly and related analyses is genome profiling, where the k-mer frequencies within raw sequencing reads are analyzed to estimate major genome characteristics such as genome size, heterozygosity, and repetitiveness. Here we introduce GenomeScope 2.0 (https://github.com/tbenavi1/genomescope2.0), which applies combinatorial theory to establish a detailed mathematical model of how k-mer frequencies are distributed in heterozygous and polyploid genomes. We describe and evaluate a practical implementation of the polyploid-aware mixture model that, within seconds, accurately infers genome properties across thousands of simulated and eleven real datasets spanning a broad range of complexity. We also present a new method called Smudgeplots (https://github.com/KamilSJaron/smudgeplot) to visualize and infer the ploidy and genome structure of a genome by analyzing heterozygous k-mer pairs. We successfully apply the approach to systems of known variable ploidy levels in the Meloidogyne genus and also the extreme case of octoploid Fragaria × ananassa.
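To make the idea of genome profiling concrete, the following is a minimal Python sketch of building a k-mer frequency spectrum from reads and deriving a naive genome-size estimate from it. It is not the GenomeScope 2.0 model, which fits a polyploid-aware mixture model to the spectrum; the function names `kmer_spectrum` and `naive_genome_size`, and the assumption that the caller already knows the depth of the main homozygous peak, are illustrative only.

```python
from collections import Counter

# Translation table used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_spectrum(reads, k):
    """Map each multiplicity to the number of distinct canonical k-mers seen that often."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing ambiguous bases.
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


def naive_genome_size(spectrum, peak_depth):
    """Estimate genome size as total k-mer occurrences divided by the peak depth.

    peak_depth is assumed to be the multiplicity of the main homozygous peak,
    supplied by the caller; a real profiler would first discard low-count
    error k-mers and would model heterozygous and repeat peaks explicitly.
    """
    total_kmers = sum(multiplicity * n for multiplicity, n in spectrum.items())
    return total_kmers / peak_depth
```

In practice the spectrum would come from a dedicated k-mer counter run over the full read set; the pure-Python loop here only illustrates what the histogram represents.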

2021 ◽  
Author(s):  
Nicolas Pompidor ◽  
Carine Charron ◽  
Catherine Hervouet ◽  
Stéphanie Bocs ◽  
Gaëtan Droc ◽  
...  

Background and Aims: Modern sugarcane cultivars (Saccharum spp.) are high polyploids, aneuploids (2n = ~12x = ~120) derived from interspecific hybridizations between the domesticated sweet species Saccharum officinarum and the wild species S. spontaneum. Methods: To analyse the architecture and origin of such a complex genome, we analysed the sequences of all 12 hom(oe)ologous haplotypes (BAC clones) from two distinct genomic regions of a typical modern cultivar, as well as the corresponding sequence in Miscanthus sinense and Sorghum bicolor, and monitored their distribution among representatives of the Saccharum genus. Key Results: The diversity observed among haplotypes suggested the existence of three founding genomes (A, B, C) in modern cultivars, which diverged between 0.8 and 1.3 Mya. Two genomes (A, B) were contributed by S. officinarum; these were also found in its wild presumed ancestor S. robustum, and one genome (C) was contributed by S. spontaneum. These results suggest that S. officinarum and S. robustum are derived from interspecific hybridization between two unknown ancestors (A and B genomes). The A genome contributed most haplotypes (nine or ten) while the B and C genomes contributed one or two haplotypes in the regions analysed of this typical modern cultivar. Interspecific hybridizations likely involved accessions or gametes with distinct ploidy levels and/or were followed by a series of backcrosses with the A genome. The three founding genomes were found in all S. barberi, S. sinense and modern cultivars analysed. None of the analysed accessions contained only the A genome or the B genome, suggesting that representatives of these founding genomes remain to be discovered. Conclusions: This evolutionary model, which combines interspecificity and high polyploidy, can explain the variable chromosome pairing affinity observed in Saccharum. It represents a major revision of the understanding of Saccharum diversity.


Author(s):  
Suchart Chanama ◽  
Chanwit Suriyachadkun ◽  
Manee Chanama

A novel actinomycete, strain SMC 257T, was isolated from a soil sample collected from mountain forest, Nan Province, Thailand. Strain SMC 257T formed tightly closed spiral spore chains on aerial mycelia. A polyphasic approach was used for the taxonomic study of this strain. Phylogenetic analysis based on 16S rRNA gene sequences indicated that strain SMC 257T belonged to the genus Nonomuraea, and the closest phylogenetically related species were Nonomuraea roseoviolacea subsp. carminata JCM 9946T (98.9 % 16S rRNA gene sequence similarity), Nonomuraea rhodomycinica TBRC 6557T (98.4 %), and Nonomuraea roseoviolacea subsp. roseoviolacea JCM 3145T (98.3 %). Genome sequencing revealed a genome size of 9.76 Mbp and a G+C content of 72.3 mol%. The genome average nucleotide identity (ANI) and the digital DNA–DNA hybridization (dDDH) values that distinguished this novel strain from its closest related species were below the species boundary thresholds of 95–96 % and 70 %, respectively. The cell wall peptidoglycan contained meso-diaminopimelic acid. The whole-cell sugars were glucose, ribose, madurose and mannose. The major menaquinone was MK-9(H4). The polar lipid profile consisted of phosphatidylethanolamine, hydroxyphosphatidylethanolamine, lysophosphatidylethanolamine, diphosphatidylglycerol, N-phosphatidylglycerol, phosphatidylinositol and phosphatidylinositol mannosides. The predominant cellular fatty acids were C17 : 0 10-methyl and iso-C16 : 0. Based on comparative analysis of phenotypic, chemotaxonomic and genotypic data, strain SMC 257T is considered to represent a novel species of the genus Nonomuraea, for which the name Nonomuraea montanisoli is proposed. The type strain is SMC 257T (=TBRC 13065T=NBRC 114772T).


2020 ◽  
Vol 16 (12) ◽  
pp. e1008439
Author(s):  
Jennifer Lu ◽  
Steven L. Salzberg

GC skew is a phenomenon observed in many bacterial genomes, wherein the two replication strands of the same chromosome contain different proportions of guanine and cytosine nucleotides. Here we demonstrate that this phenomenon, which was first discovered in the mid-1990s, can be used today as an analysis tool for the 15,000+ complete bacterial genomes in NCBI’s Refseq library. In order to analyze all 15,000+ genomes, we introduce a new method, SkewIT (Skew Index Test), that calculates a single metric representing the degree of GC skew for a genome. Using this metric, we demonstrate how GC skew patterns are conserved within certain bacterial phyla, e.g. Firmicutes, but show different patterns in other phylogenetic groups such as Actinobacteria. We also discovered that outlier values of SkewIT highlight potential bacterial mis-assemblies. Using our newly defined metric, we identify multiple mis-assembled chromosomal sequences in previously published complete bacterial genomes. We provide a SkewIT web app https://jenniferlu717.shinyapps.io/SkewIT/ that calculates SkewI for any user-provided bacterial sequence. The web app also provides an interactive interface for the data generated in this paper, allowing users to further investigate the SkewI values and thresholds of the Refseq-97 complete bacterial genomes. Individual scripts for analysis of bacterial genomes are provided in the following repository: https://github.com/jenniferlu717/SkewIT.
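As an illustration of the underlying quantity, the sketch below computes the conventional windowed GC skew, (G − C) / (G + C), and its cumulative sum along a sequence. This is not the SkewI metric that SkewIT defines, which the abstract does not spell out; the function names and the non-overlapping window scheme are assumptions made for the example.

```python
from itertools import accumulate


def gc_skew(sequence, window):
    """Return (G - C) / (G + C) for each non-overlapping window of the sequence.

    Windows with no G or C are assigned a skew of 0.0. A trailing fragment
    shorter than the window size is ignored.
    """
    sequence = sequence.upper()
    skews = []
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


def cumulative_skew(skews):
    """Running sum of windowed skew values along the chromosome."""
    return list(accumulate(skews))
```

On a typical circular bacterial chromosome, the cumulative curve tends to reach its minimum near the replication origin and its maximum near the terminus, which is why an unexpected shape can hint at a mis-assembly.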


2020 ◽  
Vol 70 (8) ◽  
pp. 4646-4652 ◽  
Author(s):  
Nadezhda V. Agafonova ◽  
Elena N. Kaparullina ◽  
Denis S. Grouzdev ◽  
Nina V. Doronina

Novel aerobic, restricted facultatively methylotrophic bacteria were isolated from buds of English oak (Quercus robur L.; strain DubT) and northern red oak (Quercus rubra L.; strain KrD). The isolates were Gram-negative, asporogenous, motile short rods that multiplied by binary fission. They utilized methanol, methylamine and a few polycarbon compounds as carbon and energy sources. Optimal growth occurred at 25 °C and pH 7.5. The dominant phospholipids were phosphatidylethanolamine, phosphatidylcholine, diphosphatidylglycerol and phosphatidylglycerol. The major cellular fatty acids were C18 : 1 ω7c, 11-methyl C18 : 1 ω7c and C16 : 0. The major ubiquinone was Q-10. Analysis of 16S rRNA gene sequences showed that the strains were closely related to the members of the genus Hansschlegelia: Hansschlegelia zhihuaiae S113T (97.5–98.0 %), Hansschlegelia plantiphila S1T (97.4–97.6 %) and Hansschlegelia beijingensis PG04T (97.0–97.2 %). The 16S rRNA gene sequence similarity between strains DubT and KrD was 99.7 %, and the DNA–DNA hybridization (DDH) result between the strains was 85 %. The ANI and the DDH values between strain DubT and H. zhihuaiae S113T were 80.1 and 21.5 %, respectively. Genome sequencing of the strain DubT revealed a genome size of 3.57 Mbp and a G+C content of 67.0 mol%. Based on the results of the phenotypic, chemotaxonomic and genotypic analyses, it is proposed that the isolates be assigned to the genus Hansschlegelia as Hansschlegelia quercus sp. nov. with the type strain DubT (=VKM B-3284T=CCUG 73648T=JCM 33463T).


2020 ◽  
Vol 70 (3) ◽  
pp. 1868-1875 ◽  
Author(s):  
Shan-Hui Li ◽  
Jaeho Song ◽  
Yeonjung Lim ◽  
Yochan Joung ◽  
Ilnam Kang ◽  
...  

A Gram-stain-negative, rod-shaped, aerobic, non-flagellated, chemoheterotrophic bacterium, designated IMCC14385T, was isolated from surface seawater of the East Sea, Republic of Korea. The 16S rRNA gene sequence analysis indicated that IMCC14385T represented a member of the genus Halioglobus sharing 94.6–97.8 % similarities with species of the genus. Whole-genome sequencing of IMCC14385T revealed a genome size of 4.3 Mbp and DNA G+C content of 56.7 mol%. The genome of IMCC14385T shared an average nucleotide identity of 76.6 % and digital DNA–DNA hybridization value of 21.6 % with the genome of Halioglobus japonicus KCTC 23429T. The genome encoded the complete poly-β-hydroxybutyrate biosynthesis pathway. The strain contained summed feature 8 (C18 : 1 ω7c and/or C18 : 1 ω6c), summed feature 3 (C16 : 1 ω7c and/or C16 : 1 ω6c) and C17 : 1 ω8c as the predominant cellular fatty acids as well as ubiquinone-8 (Q-8) as the respiratory quinone. The polar lipids detected in the strain were phosphatidylethanolamine, phosphatidylglycerol, diphosphatidylglycerol, five unidentified phospholipids, an unidentified aminolipid, an unidentified aminophospholipid and four unidentified lipids. On the basis of taxonomic data obtained in this study, it is suggested that IMCC14385T represents a novel species of the genus Halioglobus, for which the name Halioglobus maricola sp. nov. is proposed. The type strain is IMCC14385T (=KCTC 72520T=NBRC 114072T).


Author(s):  
Angéline Antezack ◽  
Manon Boxberger ◽  
Mariem Ben Khedher ◽  
Bernard La Scola ◽  
Virginie Monnet-Corti

A Gram-stain-negative bacterium, designated strain Marseille-Q3039T, was isolated from subgingival dental plaque of a woman with gingivitis in Marseille, France. Strain Marseille-Q3039T was found to be an anaerobic, motile and spore-forming crescent-shaped bacterium that grew at 25–41.5 °C (optimum, 37 °C), pH 5.5–8.5 (optimum, pH 7.5) and salinity of 5.0 g l−1 NaCl. The results of 16S rRNA gene sequence analysis revealed that strain Marseille-Q3039T was closely related to Selenomonas infelix ATCC 43532T (98.42 % similarity), Selenomonas dianae ATCC 43527T (97.25 %) and Centipeda periodontii DSM 2778T (97.19 %). The orthologous average nucleotide identity and digital DNA–DNA hybridization relatedness between strain Q3039T and its closest phylogenetic neighbours were respectively 84.57 and 28.2 % for S. infelix ATCC 43532T and 83.93 and 27.2 % for C. periodontii DSM 2778T. The major fatty acids were identified as C13 : 0 (27.7 %), C15 : 0 (24.4 %) and specific C13 : 0 3-OH (12.3 %). Genome sequencing revealed a genome size of 2 351 779 bp and a G+C content of 57.2 mol%. On the basis of the results from phenotypic, chemotaxonomic, genomic and phylogenetic analyses and data, we concluded that strain Marseille-Q3039T represents a novel species of the genus Selenomonas, for which the name Selenomonas timonae sp. nov. is proposed (=CSUR Q3039=CECT 30128).


Author(s):  
Héléna Cuny ◽  
Clément Offret ◽  
Amine M. Boukerb ◽  
Leila Parizadeh ◽  
Olivier Lesouhaitier ◽  
...  

Three bacterial strains, named hOe-66T, hOe-124 and hOe-125, were isolated from the haemolymph of different specimens of the flat oyster Ostrea edulis collected in Concarneau bay (Finistère, France). These strains were characterized by a polyphasic approach, including (i) whole genome analyses with 16S rRNA gene sequence alignment and pangenome analysis, determination of the G+C content, average nucleotide identity (ANI), and in silico DNA–DNA hybridization (isDDH), and (ii) fatty acid methyl ester and other phenotypic analyses. Strains hOe-66T, hOe-124 and hOe-125 were closely related to both type strains Pseudoalteromonas rhizosphaerae RA15T and Pseudoalteromonas neustonica PAMC 28425T with less than 93.3% ANI and 52.3% isDDH values. Regarding their phenotypic traits, the three strains were Gram-negative, 1–2 µm rod-shaped, aerobic, motile and non-spore-forming bacteria. Cells grew optimally at 25 °C in 2.5% NaCl and at pH 7–8. The most abundant fatty acids were summed feature 3 (C16:1 ω7c/C16:1 ω6c), C16:0 and C17:1 ω8c. The strains carried a genome average size of 4.64 Mb and a G+C content of 40.28 mol%. The genetic and phenotypic results suggested that strains hOe-66T, hOe-124 and hOe-125 belong to a new species of the genus Pseudoalteromonas. In this context, we propose the name Pseudoalteromonas ostreae sp. nov. The type strain is hOe-66T (=CECT 30303T=CIP 111911T).


Plants ◽  
2019 ◽  
Vol 8 (8) ◽  
pp. 270 ◽  
Author(s):  
Yun Gyeong Lee ◽  
Sang Chul Choi ◽  
Yuna Kang ◽  
Kyeong Min Kim ◽  
Chon-Sik Kang ◽  
...  

Whole genome sequencing (WGS) has become a crucial tool in understanding genome structure and genetic variation. MinION sequencing from Oxford Nanopore Technologies (ONT) is an excellent approach for performing WGS and has advantages over other Next-Generation Sequencing (NGS) platforms: it is relatively inexpensive and portable, has simple library preparation, can be monitored in real time, and has no theoretical limit on read length. Sorghum bicolor (L.) Moench is diploid (2n = 2x = 20) with a genome size of about 730 Mb, and its genome sequence is available in the Phytozome database. Therefore, sorghum can be used as a good reference. However, plant species have complex and large genomes when compared to animals or microorganisms. As a result, complete genome sequencing is difficult for plant species. MinION sequencing, which produces long reads, can be an excellent tool for overcoming the weak assembly of short reads generated from NGS by minimizing gaps and spanning the repetitive sequences found in plant genomes. Here, we sequenced the genome of S. bicolor cv. BTx623 using the MinION platform and obtained 895,678 reads totalling 17.9 gigabases (Gb) (ca. 25× coverage of the reference) of long-read sequence data. Using two different de novo assembly tools, a total of 6124 contigs (covering 45.9%) were generated with Canu and a total of 2661 contigs (covering 50%) were generated with Minimap and Miniasm followed by Racon; the assembled contigs were then mapped against the sorghum reference genome. Our results provide an optimal series of long-read sequencing analyses for plant species using the MinION platform and a clue for determining the total sequencing scale needed for optimal coverage based on genome size.
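The coverage figure quoted above follows from simple arithmetic, sketched here in Python. The sketch assumes the reported 17.9 Gb refers to sequenced bases and uses the roughly 730 Mb genome size given in the abstract; the helper names and the 50× target in the last line are hypothetical and chosen only for illustration.

```python
def sequencing_depth(total_bases, genome_size):
    """Average depth of coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size


def bases_needed(target_depth, genome_size):
    """Total bases to sequence to reach a target average depth."""
    return target_depth * genome_size


# Figures taken from the abstract (17.9 Gb interpreted as bases, an assumption).
TOTAL_BASES = 17.9e9
GENOME_SIZE = 730e6
READ_COUNT = 895_678

depth = sequencing_depth(TOTAL_BASES, GENOME_SIZE)   # about 24.5, i.e. ca. 25x
mean_read_length = TOTAL_BASES / READ_COUNT           # average bases per read

# Hypothetical planning example: bases required for 50x depth of the same genome.
required = bases_needed(50, GENOME_SIZE)
```

The same two functions can be applied to other genome sizes to estimate the sequencing scale required for a chosen depth.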


2020 ◽  
Vol 10 (6) ◽  
pp. 2057-2068 ◽  
Author(s):  
Jessica R. Eisenstatt ◽  
Lars Boeckmann ◽  
Wei-Chun Au ◽  
Valerie Garcia ◽  
Levi Bursch ◽  
...  

The evolutionarily conserved centromeric histone H3 variant (Cse4 in budding yeast, CENP-A in humans) is essential for faithful chromosome segregation. Mislocalization of CENP-A to non-centromeric chromatin contributes to chromosomal instability (CIN) in yeast, fly, and human cells and CENP-A is highly expressed and mislocalized in cancers. Defining mechanisms that prevent mislocalization of CENP-A is an area of active investigation. Ubiquitin-mediated proteolysis of overexpressed Cse4 (GALCSE4) by E3 ubiquitin ligases such as Psh1 prevents mislocalization of Cse4, and psh1Δ strains display synthetic dosage lethality (SDL) with GALCSE4. We previously performed a genome-wide screen and identified five alleles of CDC7 and DBF4 that encode the Dbf4-dependent kinase (DDK) complex, which regulates DNA replication initiation, among the top twelve hits that displayed SDL with GALCSE4. We determined that cdc7-7 strains exhibit defects in ubiquitin-mediated proteolysis of Cse4 and show mislocalization of Cse4. Mutation of MCM5 (mcm5-bob1) bypasses the requirement of Cdc7 for replication initiation and rescues replication defects in a cdc7-7 strain. We determined that mcm5-bob1 does not rescue the SDL and defects in proteolysis of GALCSE4 in a cdc7-7 strain, suggesting a DNA replication-independent role for Cdc7 in Cse4 proteolysis. The SDL phenotype, defects in ubiquitin-mediated proteolysis, and the mislocalization pattern of Cse4 in a cdc7-7 psh1Δ strain were similar to that of cdc7-7 and psh1Δ strains, suggesting that Cdc7 regulates Cse4 in a pathway that overlaps with Psh1. Our results define a DNA replication initiation-independent role of DDK as a regulator of Psh1-mediated proteolysis of Cse4 to prevent mislocalization of Cse4.


Author(s):  
Gábor Hetyei

We show how Viennot’s combinatorial theory of orthogonal polynomials may be used to generalize some recent results of Sukumar and Hodges (Hodges & Sukumar 2007 Proc. R. Soc. A 463, 2401–2414 (doi:10.1098/rspa.2007.0001); Sukumar & Hodges 2007 Proc. R. Soc. A 463, 2415–2427 (doi:10.1098/rspa.2007.0003)) on the matrix entries in powers of certain operators in a representation of su(1, 1). Our results link these calculations to finding the moments and inverse polynomial coefficients of certain Laguerre polynomials and Meixner polynomials of the second kind. As an immediate consequence of results by Koelink, Groenevelt and Van Der Jeugt (Van Der Jeugt 1997 J. Math. Phys. 38, 2728–2740 (doi:10.1063/1.531984); Koelink & Van Der Jeugt 1998 SIAM J. Math. Anal. 29, 794–822 (doi:10.1137/S003614109630673X); Groenevelt & Koelink 2002 J. Phys. A 35, 65–85 (doi:10.1088/0305-4470/35/1/306)), for the related operators, substitutions into essentially the same Laguerre polynomials and Meixner polynomials of the second kind may be used to express their eigenvectors. Our combinatorial approach explains and generalizes this ‘coincidence’.

