Optimization of I/O Intensive Genome Assemblies on the Cori Supercomputer with Burst Buffer

Author(s):  
Joshua Pritchett ◽  
Bill Andreopoulos
2021 ◽  
Vol 12 (1) ◽  
Author(s):  
John T. Lovell ◽  
Nolan B. Bentley ◽  
Gaurab Bhattarai ◽  
Jerry W. Jenkins ◽  
Avinash Sreedasyam ◽  
...  

Abstract Genome-enabled biotechnologies have the potential to accelerate breeding efforts in long-lived perennial crop species. Despite the transformative potential of molecular tools in pecan and other outcrossing tree species, highly heterozygous genomes, significant presence–absence gene content variation, and histories of interspecific hybridization have constrained breeding efforts. To overcome these challenges, here, we present diploid genome assemblies and annotations of four outbred pecan genotypes, including a PacBio HiFi chromosome-scale assembly of both haplotypes of the ‘Pawnee’ cultivar. Comparative analysis and pan-genome integration reveal substantial and likely adaptive interspecific genomic introgressions, including an over-retained haplotype introgressed from bitternut hickory into pecan breeding pedigrees. Further, by leveraging our pan-genome presence–absence and functional annotation database among genomes and within the two outbred haplotypes of the ‘Lakota’ genome, we identify candidate genes for pest and pathogen resistance. Combined, these analyses and resources highlight significant progress towards functional and quantitative genomics in highly diverse and outbred crops.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
John W Davey ◽  
Carolina M C Catta-Preta ◽  
Sally James ◽  
Sarah Forrester ◽  
Maria Cristina M Motta ◽  
...  

Abstract Angomonas deanei is an endosymbiont-bearing trypanosomatid with several highly fragmented genome assemblies and unknown chromosome number. We present an assembly of the A. deanei nuclear genome based on Oxford Nanopore sequence that resolves into 29 complete or close-to-complete chromosomes. The assembly has several previously unknown special features; it has a supernumerary chromosome, a chromosome with a 340-kb inversion, and there is a translocation between two chromosomes. We also present an updated annotation of the chromosomal genome with 10,365 protein-coding genes, 59 transfer RNAs, 26 ribosomal RNAs, and 62 noncoding RNAs.


2021 ◽  
Vol 22 (5) ◽  
pp. 2244
Author(s):  
Anton E. Shikov ◽  
Yury V. Malovichko ◽  
Arseniy A. Lobov ◽  
Maria E. Belousova ◽  
Anton A. Nizhnikov ◽  
...  

Bacillus thuringiensis, commonly referred to as Bt, has long interested microbiologists because of its highly effective insecticidal properties, which make Bt a prominent source of biologicals. To categorize the abundance of Bt strains discovered, serotyping assays are used in which flagellin serves as the primary seroreactive molecule. Despite its convenience, this approach is not indicative of Bt strains’ phenotypes, nor does it reflect actual phylogenetic relationships within the species. In this respect, comparative genomic and proteomic techniques appear more informative, but their use in Bt strain classification remains limited. In the present work, we used a bottom-up proteomic approach based on fluorescent two-dimensional difference gel electrophoresis (2D-DIGE) coupled with liquid chromatography/tandem mass spectrometry (LC-MS/MS) protein identification to assess which stage of Bt culture, vegetative or spore, would be more informative for strain characterization. To this end, the proteomic differences for the israelensis-attributed strains were assessed to compare sporulating cultures of the virulent derivative to the avirulent one as well as to the vegetative stage virulent bacteria. Using the same approach, virulent spores of the israelensis strain were also compared to the spores of strains belonging to two other major Bt serovars, namely darmstadiensis and thuringiensis. The identified proteins were analyzed regarding the presence of the respective genes in the 104 openly available Bt genome assemblies with serovar attributions specified. Of 21 proteins identified, 15 were found to be encoded in all the present assemblies at a 67% identity threshold, including several virulence factors. Notably, individual phylogenies of these core genes conformed to neither the serotyping nor the flagellin-based phylogeny but corroborated the reconstruction based on phylogenomics approaches in terms of tree topology similarity. In turn, the distribution of accessory protein genes was not confined to the existing serovars. The obtained results indicate that neither gene presence nor the core gene sequence may serve as a distinctive basis for the serovar attribution, undermining the notion that the serotyping system reflects strains’ phenotypic or genetic similarity. We also provide a set of loci that fit the phylogenomics data plausibly and thus may serve for draft phylogeny estimation of novel strains.


Author(s):  
Wayne Xu ◽  
James R Tucker ◽  
Wubishet A Bekele ◽  
Frank M You ◽  
Yong-Bi Fu ◽  
...  

Abstract Barley (Hordeum vulgare L.) is one of the most important global crops. The six-row barley cultivar Morex reference genome has been used by the barley research community worldwide. However, this reference genome can have limitations when used for genomic and genetic diversity analysis studies, gene discovery, and marker development when working in two-row germplasm that is more common to Canadian barley. Here we assembled, for the first time, the genome sequence of a Canadian two-row malting barley, cultivar AAC Synergy. We applied deep Illumina paired-end reads, long mate-pair reads, PacBio sequences, 10X chromium linked read libraries, and chromosome conformation capture sequencing (Hi-C) to generate a contiguous assembly. The genome assembled from super-scaffolds had a size of 4.85 Gb, N50 of 2.32 Mb and an estimated 93.9% of complete genes from a plant database (BUSCO, benchmarking universal single-copy orthologous genes). After removal of small scaffolds (< 300 Kb), the assembly was arranged into pseudomolecules of 4.14 Gb in size with seven chromosomes plus unanchored scaffolds. The completeness and annotation of the assembly were assessed by comparing it with the updated version of six-row Morex and recently released two-row Golden Promise genome assemblies.
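The N50 figure quoted in the abstract above is a standard contiguity statistic: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch of the calculation follows, assuming the scaffold lengths are already available as a list of integers; the function name `n50` and the toy lengths are illustrative and do not come from the AAC Synergy assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Scaffolds are sorted from longest to shortest and accumulated until
    the running total reaches at least half of the total assembly size;
    the length of the scaffold that crosses that point is the N50.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy lengths (hypothetical, not taken from any real assembly).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_lengths)
```

For the toy list, the total is 32, and the two longest scaffolds (8 + 8) already reach half of it, so the N50 is 8.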


2021 ◽  
Vol 9 (7) ◽  
pp. 1468
Author(s):  
Gavin J. Fenske ◽  
Joy Scaria

Salmonella enterica is a common foodborne pathogen that generates both enteric and systemic infections in hosts. Antibiotic resistance is common in certain serovars of the pathogen and of great concern to public health. Recent reports have documented the co-occurrence of metal resistance with antibiotic resistance in one serovar of S. enterica. Therefore, we sought to identify possible co-occurrence in a large genomic dataset. Genome assemblies of 56,348 strains of S. enterica comprising 20 major serovars were downloaded from NCBI. The downloaded assemblies were quality controlled and in silico serotyped to ensure consistency and avoid improper annotation from public databases. Metal and antibiotic resistance genes were identified in the genomes as well as plasmid replicons. Co-occurrent genes were identified by constructing a co-occurrence matrix and grouping said matrix using k-means clustering. Three groups of co-occurrent genes were identified using k-means clustering. Group 1 was comprised of the pco and sil operons that confer resistance to copper and silver, respectively. Group 1 was distributed across four serovars. Group 2 contained the majority of the genes and little to no co-occurrence was observed. Metal and antibiotic co-occurrence was identified in group 3, which contained genes conferring resistance to arsenic, mercury, beta-lactams, sulfonamides, and tetracyclines. Group 3 genes were also associated with an IncQ1 class plasmid replicon. Metal and antibiotic co-occurrence from group 3 genes is mostly isolated to one clade of S. enterica I 4,[5],12:i:-.
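The co-occurrence matrix mentioned in the abstract above can be illustrated with a small sketch. This is not the authors' pipeline: it assumes each genome has already been reduced to a list of detected resistance genes, and the genome identifiers, gene lists, and the function name `cooccurrence_matrix` are hypothetical. The k-means grouping step is not shown; the rows of the resulting matrix would be the natural input to it.

```python
def cooccurrence_matrix(genomes):
    """Build a symmetric gene-by-gene co-occurrence matrix.

    `genomes` maps a genome identifier to the genes detected in it.
    Cell [i][j] counts the genomes carrying both gene i and gene j;
    the diagonal therefore counts the genomes carrying each gene.
    """
    genes = sorted({gene for present in genomes.values() for gene in present})
    index = {gene: i for i, gene in enumerate(genes)}
    matrix = [[0] * len(genes) for _ in genes]
    for present in genomes.values():
        unique = set(present)  # ignore duplicate hits within one genome
        for a in unique:
            for b in unique:
                matrix[index[a]][index[b]] += 1
    return genes, matrix


# Toy input (hypothetical genomes and gene calls, for illustration only).
toy_genomes = {
    "genome_1": ["pcoA", "silA"],
    "genome_2": ["pcoA", "silA", "tetA"],
    "genome_3": ["tetA"],
}
toy_genes, toy_matrix = cooccurrence_matrix(toy_genomes)
```

In the toy data, `pcoA` and `silA` appear together in two genomes, while `tetA` shares only one genome with each of them, which is the kind of pattern a clustering step would then group.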


2017 ◽  
Vol 9 (7) ◽  
pp. 1873-1879 ◽  
Author(s):  
Cessa Rauch ◽  
Gregor Christa ◽  
Jan de Vries ◽  
Christian Woehle ◽  
Sven B. Gould

2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Jean-Marc Aury ◽  
Benjamin Istace

Abstract Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and then, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture of the two haplotypes and can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; generally, they inspect the nucleotides one by one and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes and they typically switch from one haplotype to another. Herein we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long reads) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Zev N. Kronenberg ◽  
Arang Rhie ◽  
Sergey Koren ◽  
Gregory T. Concepcion ◽  
Paul Peluso ◽  
...  

Abstract Haplotype-resolved genome assemblies are important for understanding how combinations of variants impact phenotypes. To date, these assemblies have been best created with complex protocols, such as cultured cells that contain a single-haplotype (haploid) genome, single cells where haplotypes are separated, or co-sequencing of parental genomes in a trio-based approach. These approaches are impractical in most situations. To address this issue, we present FALCON-Phase, a phasing tool that uses ultra-long-range Hi-C chromatin interaction data to extend phase blocks of partially-phased diploid assemblies to chromosome or scaffold scale. FALCON-Phase uses the inherent phasing information in Hi-C reads, skipping variant calling, and reduces the computational complexity of phasing. Our method is validated on three benchmark datasets generated as part of the Vertebrate Genomes Project (VGP), including human, cow, and zebra finch, for which high-quality, fully haplotype-resolved assemblies are available using the trio-based approach. FALCON-Phase is accurate without having parental data and performance is better in samples with higher heterozygosity. For cow and zebra finch the accuracy is 97% compared to 80–91% for human. FALCON-Phase is applicable to any draft assembly that contains long primary contigs and phased associate contigs.


DNA Research ◽  
2020 ◽  
Vol 27 (3) ◽  
Author(s):  
Nolan Bornowski ◽  
John P Hamilton ◽  
Pan Liao ◽  
Joshua C Wood ◽  
Natalia Dudareva ◽  
...  

Abstract Species within the mint family, Lamiaceae, are widely used for their culinary, cultural, and medicinal properties due to production of a wide variety of specialized metabolites, especially terpenoids. To further our understanding of genome diversity in the Lamiaceae and to provide a resource for mining biochemical pathways, we generated high-quality genome assemblies of four economically important culinary herbs, namely, sweet basil (Ocimum basilicum L.), sweet marjoram (Origanum majorana L.), oregano (Origanum vulgare L.), and rosemary (Rosmarinus officinalis L.), and characterized their terpenoid diversity through metabolite profiling and genomic analyses. A total of 25 monoterpenes and 11 sesquiterpenes were identified in leaf tissue from the four species. Genes encoding enzymes responsible for the biosynthesis of precursors for mono- and sesquiterpene synthases were identified in all four species. Across all four species, a total of 235 terpene synthases were identified, ranging from 27 in O. majorana to 137 in the tetraploid O. basilicum. This study provides valuable resources for further investigation of the genetic basis of chemodiversity in these important culinary herbs.

