Genome sequencing and population resequencing provide insights into the genetic basis of domestication and diversity of vegetable soybean

Horticulture Research ◽

10.1093/hr/uhab052 ◽

2022 ◽

Vol 9 ◽

Author(s):

Na Liu ◽

Yongchao Niu ◽

Guwen Zhang ◽

Zhijuan Feng ◽

Yuanpeng Bo ◽

...

Keyword(s):

De Novo ◽

Phylogenetic Analyses ◽

Repetitive Sequences ◽

Wild Soybean ◽

Sugar Transport ◽

Sucrose Phosphate Synthase ◽

Comparative Genomic ◽

Vegetable Soybean ◽

De Novo Genome Assembly ◽

The Difference

Abstract Vegetable soybean is one of the most important vegetables in China, and the demand for this vegetable has markedly increased worldwide over the past two decades. Here, we present a high-quality de novo genome assembly of the vegetable soybean cultivar Zhenong 6 (ZN6), which is one of the most popular cultivars in China. The 20 pseudochromosomes cover 94.57% of the total 1.01 Gb assembly size, with contig N50 of 3.84 Mb and scaffold N50 of 48.41 Mb. A total of 55 517 protein-coding genes were annotated. Approximately 54.85% of the assembled genome was annotated as repetitive sequences, with long terminal repeat transposable elements being the most abundant. Comparative genomic and phylogenetic analyses with grain soybean Williams 82, six other Fabaceae species and Arabidopsis thaliana genomes highlight the differences between ZN6 and other species. Furthermore, we resequenced 60 vegetable soybean accessions. Alongside 103 previously resequenced wild soybean and 155 previously resequenced grain soybean accessions, we performed analyses of population structure and selective sweep of vegetable, grain, and wild soybean. They were clearly divided into three clades. We found 1112 and 1047 genes under selection in the vegetable soybean and grain soybean populations compared with the wild soybean population, respectively. Among them, we identified 134 selected genes shared between vegetable soybean and grain soybean populations. Additionally, we report four sucrose synthase genes, one sucrose-phosphate synthase gene, and four sugar transport genes as candidate genes related to important traits such as seed sweetness and seed size in vegetable soybean. This study provides essential genomic resources to promote evolutionary and functional genomics studies and genomically informed breeding for vegetable soybean.
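
Several abstracts in this listing report contig and scaffold N50 values. As a reading aid, the following is a minimal sketch of how that statistic is commonly computed, assuming the usual definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly). The function name and the toy lengths are hypothetical illustration values, not data from any of the studies listed here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Assumed definition: sort lengths from longest to shortest and return
    the length at which the running total first reaches half of the
    total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (arbitrary units), for illustration only.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))
```

Because the statistic depends only on the sorted lengths, the same function applies to contigs and to scaffolds; only the input list changes.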

Long-read sequencing and de novo genome assembly of marine medaka (Oryzias melastigma)

BMC Genomics ◽

10.1186/s12864-020-07042-7 ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Pingping Liang ◽

Hafiz Sohaib Ahmed Saqib ◽

Xiaomin Ni ◽

Yingjia Shen

Keyword(s):

Dna Repair ◽

Positive Selection ◽

Single Molecule ◽

Genome Assembly ◽

De Novo ◽

Gene Families ◽

Comparative Genomic ◽

De Novo Genome Assembly ◽

Marine Medaka ◽

Oryzias Melastigma

Abstract Background Marine medaka (Oryzias melastigma) is considered an important ecotoxicological indicator for studying the biochemical, physiological and molecular responses of marine organisms to increasing amounts of pollutants in marine and estuarine waters. Results In this study, we report a high-quality and accurate de novo genome assembly of marine medaka through the integration of single-molecule sequencing, Illumina paired-end sequencing, and 10X Genomics linked-reads. The 844.17 Mb assembly is estimated to cover more than 98% of the genome and is more continuous, with fewer gaps and errors, than the previous genome assembly. Comparison of O. melastigma with closely related species showed significant expansion of gene families associated with DNA repair and ATP-binding cassette (ABC) transporter pathways. We identified 274 genes that appear to be under significant positive selection and are involved in DNA repair, cellular transportation processes, and conservation and stability of the genome. The positive selection of genes and the considerable expansion in gene numbers, especially those related to stimulus responses, provide strong support for the adaptation of O. melastigma to varying environmental stresses. Conclusions The highly contiguous marine medaka genome and comparative genomic analyses will increase our understanding of the mechanisms underlying its extraordinary adaptation capability and accelerate ongoing and future investigations in marine ecotoxicology.

De novo sequencing, assembly and functional annotation of Armillaria borealis genome

BMC Genomics ◽

10.1186/s12864-020-06964-6 ◽

2020 ◽

Vol 21 (S7) ◽

Author(s):

Vasilina S. Akulova ◽

Vadim V. Sharov ◽

Anastasiya I. Aksyonova ◽

Yuliya A. Putintseva ◽

Natalya V. Oreshkova ◽

...

Keyword(s):

Comparative Analysis ◽

Genome Assembly ◽

Functional Annotation ◽

De Novo ◽

Fundamental Problem ◽

Repetitive Sequences ◽

Far East ◽

White Rot ◽

Climatic Effects ◽

De Novo Genome Assembly

Abstract Background Massive forest decline has been observed almost everywhere as a result of negative anthropogenic and climatic effects, which can interact with pests, fungi and other phytopathogens and aggravate their effects. Climatic changes can weaken trees and make fungi such as Armillaria more destructive. Armillaria borealis (Marxm. & Korhonen) is a fungus from the Physalacriaceae family (Basidiomycota) widely distributed in Eurasia, including Siberia and the Far East. Species from this genus cause the root white rot disease that weakens and often kills woody plants. However, little is known about the ecological behavior and genetics of A. borealis. According to field research data, A. borealis is less pathogenic than A. ostoyae, and its aggressive behavior is quite rare. A. borealis mainly behaves as a secondary pathogen, killing trees already weakened by other factors. However, a changing environment might cause unpredictable effects in fungus behavior. Results The de novo genome assembly and annotation were performed for the A. borealis species for the first time and presented in this study. The A. borealis genome assembly contained ~ 68 Mbp and was comparable with ~ 60 and ~ 79.5 Mbp for the A. ostoyae and A. mellea genomes, respectively. The N50 for contigs equaled 50,544 bp. Functional annotation analysis revealed 21,969 protein coding genes and provided data for further comparative analysis. Repetitive sequences were also identified. The main focus for further study and comparative analysis will be on the enzymes and regulatory factors associated with pathogenicity. Conclusions Pathogenic fungi such as Armillaria are currently one of the main problems in forest conservation. A comprehensive study of these species and their pathogenicity is of great importance and needs good genomic resources. The assembled genome of A. borealis presented in this study is of sufficiently good quality for further detailed comparative study on the composition of enzymes in other Armillaria species. There is also a fundamental problem with the identification and classification of species of the Armillaria genus, where the study of repetitive sequences in the genomes of basidiomycetes and their comparative analysis will help us identify the taxonomy of these species more accurately and reveal their evolutionary relationships.

Benchmarking topological accuracy of bacterial phylogenomic workflows using in silico evolution

10.1101/2021.08.03.454900 ◽

2021 ◽

Author(s):

Boas CL van der Putten ◽

Niek AH Huijsmans ◽

Daniel R Mende ◽

Constance Schultsz

Keyword(s):

De Novo ◽

Phylogenetic Analyses ◽

Bacterial Species ◽

Phylogenetic Reconstruction ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

De Novo Genome Assembly ◽

Relevant Alternatives ◽

Wide Range ◽

Similar Accuracy

Phylogenetic analyses are widely used in microbiological research, for example to trace the progression of bacterial outbreaks based on whole-genome sequencing data. In practice, multiple analysis steps such as de novo assembly, alignment and phylogenetic inference are combined to form phylogenetic workflows. Comprehensive benchmarking of the accuracy of complete phylogenetic workflows is lacking. To benchmark different phylogenetic workflows, we simulated bacterial evolution under a wide range of evolutionary models, varying the relative rates of substitution, insertion, deletion, gene duplication, gene loss and lateral gene transfer events. The generated datasets corresponded to a genetic diversity usually observed within bacterial species (≥95% average nucleotide identity). We replicated each simulation three times to assess replicability. In total, we benchmarked seventeen distinct phylogenetic workflows using 8 different simulated datasets. We found that recently developed k-mer alignment methods such as kSNP and SKA achieve accuracy similar to that of reference mapping. The high accuracy of k-mer alignment methods can be explained by the large fractions of genomes these methods can align, relative to other approaches. We also found that the choice of de novo assembly algorithm influences the accuracy of phylogenetic reconstruction, with workflows employing SPAdes or SKESA outperforming those employing Velvet. Finally, we found that the results of phylogenetic benchmarking are highly variable between replicates. We conclude that for phylogenomic reconstruction k-mer alignment methods are relevant alternatives to reference mapping at species level, especially in the absence of suitable reference genomes. We show de novo genome assembly accuracy to be an underappreciated parameter required for accurate phylogenomic reconstruction.

Comparative Genomics of Clinical Isolates of the Emerging Tick-Borne Pathogen Neoehrlichia mikurensis

Microorganisms ◽

10.3390/microorganisms9071488 ◽

2021 ◽

Vol 9 (7) ◽

pp. 1488

Author(s):

Anna Grankvist ◽

Daniel Jaén-Luchoro ◽

Linda Wass ◽

Per Sikora ◽

Christine Wennerås

Keyword(s):

Vascular Endothelium ◽

De Novo ◽

Phylogenetic Analyses ◽

Geographic Origin ◽

Comparative Genomic ◽

Whole Genome ◽

Illumina Hiseq ◽

Protein Coding ◽

Ehrlichia Ruminantium ◽

Protein Coding Genes

Tick-borne ‘Neoehrlichia (N.) mikurensis’ is the cause of neoehrlichiosis, an infectious vasculitis of humans. This strict intracellular pathogen is a member of the family Anaplasmataceae and has been unculturable until recently. The only available genetic data on this new pathogen are six partially sequenced housekeeping genes. The aim of this study was to advance the knowledge regarding ‘N. mikurensis’ genomic relatedness with other Anaplasmataceae members, intra-species genotypic variability and potential virulence factors explaining its tropism for vascular endothelium. Here, we present the de novo whole-genome sequences of three ‘N. mikurensis’ strains derived from Swedish patients diagnosed with neoehrlichiosis. The genomes were obtained by extraction of DNA from patient plasma, library preparation using 10x Chromium technology, and sequencing by Illumina Hiseq-4500. ‘N. mikurensis’ was found to have the next smallest genome of the Anaplasmataceae family (1.1 Mbp with 27% GC content) consisting of 845 protein-coding genes, one third of which have unknown function. Comparative genomic analyses revealed that ‘N. mikurensis’ was more closely related to Ehrlichia chaffeensis than to Ehrlichia ruminantium, the opposite of what 16S rRNA sequence-based phylogenetic analyses determined. The genetic variability of the three whole-genome-sequenced ‘N. mikurensis’ strains was extremely low, between 0.14 and 0.22‰, a variation that was associated with geographic origin. No protein-coding genes exclusively shared by N. mikurensis and E. ruminantium were identified to explain their common tropism for vascular endothelium.

1199. Phylogenomic analysis of Campylobacter jejuni isolated from gastroenteritis cases in Michigan

Open Forum Infectious Diseases ◽

10.1093/ofid/ofaa439.1384 ◽

2020 ◽

Vol 7 (Supplement_1) ◽

pp. S621-S621

Author(s):

Jose A Rodrigues ◽

Heather M Blankenship ◽

Wonhee Cha ◽

Rebekah Mosci ◽

Shannon D Manning

Keyword(s):

De Novo ◽

Phylogenetic Analyses ◽

Foodborne Pathogen ◽

Antibiotic Resistance Genes ◽

Resistance Mechanisms ◽

Phylogenomic Analysis ◽

De Novo Genome Assembly ◽

Specific Strain ◽

Phylogenomic Analyses ◽

Clade 1

Abstract Background C. jejuni is the leading cause of bacterial gastroenteritis worldwide. It has been classified as a serious antibiotic resistant threat, causing 13,000 hospitalizations and 120 deaths annually. Our goal was to describe the diversity of clinical C. jejuni using phylogenomics and classify resistance mechanisms. Methods Isolates were collected via sentinel surveillance at four hospitals, and demographic and clinical data were obtained. DNA was extracted and sequenced. Raw reads were processed with Trimmomatic and quality checked with FastQC. De novo genome assembly was performed in SPAdes. Assembled genomes were filtered for quality and completeness; samples of 1.4-2.1 Mb were annotated in Prokka followed by pangenome and phylogenetic analyses. Multilocus sequence typing loci and virulence and antibiotic resistance genes were extracted from each genome. Results Among the 214 C. jejuni isolates recovered, 86 unique sequence types (STs) were identified; five were novel STs with unique allele combinations. ST353 (8.3%: n=18), ST982 (7.4%: n=16), ST50 (5.1%: n=11) and ST48 (5.1%: n=11) were the most prevalent STs identified, while the majority (50.1%: n=50) of STs were singletons. The pangenome analysis identified 8781, 615, and 1169 total, core, and shell core genes, respectively, which grouped the isolates into three major clades. Most isolates belonged to clade 1. A neighbor-net analysis detected significant recombination among all 86 STs (pairwise homoplasy index p < 0.00001) and evidence of horizontal gene transfer across clades. The beta-lactamase gene, blaOXA-605, was the most common resistance gene identified (58.8%: n=125) followed by tet(O) (56.0%: n=121), which mediate resistance to beta-lactams and tetracyclines, respectively. Resistance phenotypes were confirmed using microbroth dilution. Conclusion: Together, these data demonstrate that the C. jejuni population is highly diverse and carries important resistance determinants. The phylogenomic analyses also provide insight into the evolution of this major foodborne pathogen. Future work will focus on identifying molecular and epidemiological factors associated with specific strain types and resistance and virulence profiles circulating in Michigan. Disclosures All Authors: No reported disclosures
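
The assembly size filter mentioned in the Methods of this abstract can be sketched as follows. This is a minimal illustration only, assuming that the stated 1.4-2.1 range refers to total assembly length in megabases. The constant names, function names and toy FASTA text are hypothetical and are not taken from the authors' pipeline, which the abstract does not describe in further detail.

```python
import io

MIN_BP = 1_400_000  # assumed lower bound: 1.4 Mb
MAX_BP = 2_100_000  # assumed upper bound: 2.1 Mb


def assembly_length(fasta_handle):
    """Sum the sequence lengths of all records in a FASTA text stream."""
    total = 0
    for line in fasta_handle:
        line = line.strip()
        # Header lines start with '>' and carry no sequence.
        if line and not line.startswith(">"):
            total += len(line)
    return total


def passes_size_filter(total_bp, min_bp=MIN_BP, max_bp=MAX_BP):
    """Return True if the assembly length falls inside the accepted range."""
    return min_bp <= total_bp <= max_bp


# Hypothetical toy input; a real assembly would be read from a file.
toy_fasta = io.StringIO(">contig_1\nACGTAC\nGT\n>contig_2\nGGCC\n")
total = assembly_length(toy_fasta)
print(total, passes_size_filter(total))
```

In practice the same check would be applied to each assembled genome before annotation, so that fragmentary or contaminated assemblies far outside the expected genome size are excluded.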

Whole-genome sequence of the Tibetan frog Nanorana parkeri and the comparative evolution of tetrapod genomes

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1501764112 ◽

2015 ◽

Vol 112 (11) ◽

pp. E1257-E1262 ◽

Cited By ~ 103

Author(s):

Yan-Bo Sun ◽

Zi-Jun Xiong ◽

Xue-Yan Xiang ◽

Shi-Ping Liu ◽

Wei-Wei Zhou ◽

...

Keyword(s):

De Novo ◽

Structural Evolution ◽

Whole Genome Sequence ◽

Comparative Genomic ◽

Whole Genome ◽

Protein Coding ◽

Comparable Rate ◽

Evolutionary Studies ◽

Genomic Studies ◽

The Difference

The development of efficient sequencing techniques has resulted in large numbers of genomes being available for evolutionary studies. However, only one genome is available for all amphibians, that of Xenopus tropicalis, which is distantly related to the majority of frogs. More than 96% of frogs belong to the Neobatrachia, and no genome exists for this group. This dearth of amphibian genomes greatly restricts genomic studies of amphibians and, more generally, our understanding of tetrapod genome evolution. To fill this gap, we provide the de novo genome of a Tibetan Plateau frog, Nanorana parkeri, and compare it to that of X. tropicalis and other vertebrates. This genome encodes more than 20,000 protein-coding genes, a number similar to that of Xenopus. Although the genome size of Nanorana is considerably larger than that of Xenopus (2.3 vs. 1.5 Gb), most of the difference is due to the respective number of transposable elements in the two genomes. The two frogs exhibit considerable conserved whole-genome synteny despite having diverged approximately 266 Ma, indicating a slow rate of DNA structural evolution in anurans. Multigenome synteny blocks further show that amphibians have fewer interchromosomal rearrangements than mammals but have a comparable rate of intrachromosomal rearrangements. Our analysis also identifies 11 Mb of anuran-specific highly conserved elements that will be useful for comparative genomic analyses of frogs. The Nanorana genome offers an improved understanding of evolution of tetrapod genomes and also provides a genomic reference for other evolutionary studies.

Insights into triterpene synthesis and unsaturated fatty-acid accumulation provided by chromosomal-level genome analysis of Akebia trifoliata subsp. australis

Horticulture Research ◽

10.1038/s41438-020-00458-y ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Hui Huang ◽

Juan Liang ◽

Qi Tan ◽

Linfeng Ou ◽

Xiaolin Li ◽

...

Keyword(s):

Fatty Acids ◽

Fatty Acid ◽

Unsaturated Fatty Acids ◽

Developmental Stages ◽

De Novo ◽

Acyl Carrier Protein ◽

Repetitive Sequences ◽

Gene Families ◽

Terpene Synthase ◽

Comparative Genomic

Abstract Akebia trifoliata subsp. australis is a well-known medicinal and potential woody oil plant in China. The limited genetic information available for A. trifoliata subsp. australis has hindered its exploitation. Here, a high-quality chromosome-level genome sequence of A. trifoliata subsp. australis is reported. The de novo genome assembly of 682.14 Mb was generated with a scaffold N50 of 43.11 Mb. The genome includes 25,598 protein-coding genes, and 71.18% (485.55 Mb) of the assembled sequences were identified as repetitive sequences. An ongoing massive burst of long terminal repeat (LTR) insertions, which occurred ~1.0 million years ago, has contributed a large proportion of LTRs in the genome of A. trifoliata subsp. australis. Phylogenetic analysis shows that A. trifoliata subsp. australis is closely related to Aquilegia coerulea and forms a clade with Papaver somniferum and Nelumbo nucifera, which supports the well-established hypothesis of a close relationship between basal eudicot species. The expansion of UDP-glucoronosyl and UDP-glucosyl transferase gene families and β-amyrin synthase-like genes and the exclusive contraction of terpene synthase gene families may be responsible for the abundant oleanane-type triterpenoids in A. trifoliata subsp. australis. Furthermore, the acyl-ACP desaturase gene family, including 12 stearoyl-acyl-carrier protein desaturase (SAD) genes, has expanded exclusively. A combined transcriptome and fatty-acid analysis of seeds at five developmental stages revealed that homologs of SADs, acyl-lipid desaturase omega fatty acid desaturases (FADs), and oleosins were highly expressed, consistent with the rapid increase in the content of fatty acids, especially unsaturated fatty acids. The genomic sequences of A. trifoliata subsp. australis will be a valuable resource for comparative genomic analyses and molecular breeding.

Chromosome-scale genome assembly for the duckweed Spirodela intermedia, integrating cytogenetic maps, PacBio and Oxford Nanopore libraries

Scientific Reports ◽

10.1038/s41598-020-75728-9 ◽

2020 ◽

Vol 10 (1) ◽

Cited By ~ 1

Author(s):

Phuong T. N. Hoang ◽

Anne Fiebig ◽

Petr Novák ◽

Jiří Macas ◽

Hieu X. Cao ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

Repetitive Sequences ◽

Small Chromosome ◽

De Novo Genome Assembly ◽

Comparative Cytogenetics ◽

Protein Coding ◽

Copy Numbers ◽

Oxford Nanopore ◽

Rdna Copy

Abstract Duckweeds are small, free-floating, morphologically highly reduced organisms belonging to the monocot order Alismatales. They display the most rapid growth among flowering plants, vary ~ 14-fold in genome size and comprise five genera. Spirodela is the phylogenetically oldest genus with only two mainly asexually propagating species: S. polyrhiza (2n = 40; 160 Mbp/1C) and S. intermedia (2n = 36; 160 Mbp/1C). This study combined comparative cytogenetics and de novo genome assembly based on PacBio, Illumina and Oxford Nanopore (ON) reads to obtain the first genome reference for S. intermedia and to compare its genomic features with those of the sister species S. polyrhiza. Both species’ genomes revealed a little more than 20,000 putative protein-coding genes, very low rDNA copy numbers and a low amount of repetitive sequences, mainly Ty3/gypsy retroelements. The detection of a few new small chromosome rearrangements between both Spirodela species refined the karyotype and the chromosomal sequence assignment for S. intermedia.

s-aligner: a greedy algorithm for non-greedy de novo genome assembly

10.1101/2021.02.02.429443 ◽

2021 ◽

Author(s):

Juanjo Bermúdez

Keyword(s):

Genome Assembly ◽

De Novo ◽

Biological Research ◽

De Novo Genome Assembly ◽

Valid Conclusion ◽

Inconclusive Result ◽

Large Virus ◽

The Difference ◽

Virus Genomes ◽

Assembly Tool

Genome assembly is a fundamental tool for biological research. Particularly, in microbiology, where budgets per sample are often scarce, it can make the difference between an inconclusive result and a fully valid conclusion. Identifying new strains or estimating the relative abundance of quasi-species in a sample are some example tasks that can’t be properly accomplished without previously generating assemblies with little structure ambiguity and covering most of the genome. In this work, we present a new genome assembly tool based on a greedy strategy. We compare the results obtained applying this tool to the results obtained with previously existing software. We find that, when applied to viral studies, comparatively, the software we developed often gets far larger contigs and higher genome fraction coverage than previous software. We also find a significant advantage when applied to exceptionally large virus genomes.

A de novo genome assembly of the dwarfing pear rootstock Zhongai 1

Scientific Data ◽

10.1038/s41597-019-0291-3 ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 1

Author(s):

Chunqing Ou ◽

Fei Wang ◽

Jiahong Wang ◽

Song Li ◽

Yanjie Zhang ◽

...

Keyword(s):

De Novo ◽

Repetitive Sequences ◽

Draft Genome ◽

Genome Sequences ◽

Fruit Characteristics ◽

De Novo Genome Assembly ◽

Protein Coding ◽

Protein Coding Genes ◽

Long Reads ◽

Cultivated Species

Abstract ‘Zhongai 1’ [(Pyrus ussuriensis × communis) × spp.] is an excellent pear dwarfing rootstock common in China. It is itself dwarf and has a high dwarfing efficiency on most of the main cultivated Pyrus species when used as an interstock. Here we describe the draft genome sequence of ‘Zhongai 1’, which was assembled using PacBio long reads, Illumina short reads and Hi-C technology. We estimated the genome size to be approximately 511.33 Mb by k-mer analysis and obtained a final genome of 510.59 Mb with a contig N50 size of 1.28 Mb. Next, 506.31 Mb (99.16%) of contigs were clustered into 17 chromosomes with a scaffold N50 size of 23.45 Mb. We further predicted 309.86 Mb (60.68%) of repetitive sequences and 43,120 protein-coding genes. The assembled genome will be a valuable resource and reference for future pear breeding, genetic improvement, and comparative genomics among related species. Moreover, it will help identify genes involved in dwarfism, early flowering, stress tolerance, and commercially desirable fruit characteristics.