Draft genome of the peanut A-genome progenitor (Arachis duranensis) provides insights into geocarpy, oil biosynthesis, and allergens

Xiaoping Chen; Hongjie Li; Manish K. Pandey; Qingli Yang; Xiyin Wang; Vanika Garg; Haifen Li; Xiaoyuan Chi; Dadakhalandar Doddamani; Yanbin Hong; Hari Upadhyaya; Hui Guo; Aamir W. Khan; Fanghe Zhu; Xiaoyan Zhang; Lijuan Pan; Gary J. Pierce; Guiyuan Zhou; Katta A. V. S. Krishnamohan; Mingna Chen; Ni Zhong; Gaurav Agarwal; Shuanzhu Li; Annapurna Chitikineni; Guo-Qiang Zhang; Shivali Sharma; Na Chen; Haiyan Liu; Pasupuleti Janila; Shaoxiong Li; Min Wang; Tong Wang; Jie Sun; Xingyu Li; Chunyan Li; Mian Wang; Lina Yu; Shijie Wen; Sube Singh; Zhen Yang; Jinming Zhao; Chushu Zhang; Yue Yu; Jie Bi; Xiaojun Zhang; Zhong-Jian Liu; Andrew H. Paterson; Shuping Wang; Xuanqiang Liang; Rajeev K. Varshney; Shanlin Yu

doi:10.1073/pnas.1600899113

Proceedings of the National Academy of Sciences ◽

2016 ◽

Vol 113 (24) ◽

pp. 6785-6790 ◽

Cited By ~ 106

Keyword(s):

Draft Genome ◽

Gene Families ◽

Specific Gene ◽

Protein Coding ◽

Oil Biosynthesis ◽

Etiolated Seedlings ◽

Polyploid Formation ◽

A Genome ◽

Arachis Duranensis ◽

Staple Crop

Peanut or groundnut (Arachis hypogaea L.), a legume of South American origin, has high seed oil content (45–56%) and is a staple crop in semiarid tropical and subtropical regions, partially because of drought tolerance conferred by its geocarpic reproductive strategy. We present a draft genome of the peanut A-genome progenitor, Arachis duranensis, and 50,324 protein-coding gene models. Patterns of gene duplication suggest the peanut lineage has been affected by at least three polyploidizations since the origin of eudicots. Resequencing of synthetic Arachis tetraploids reveals extensive gene conversion in only three seed-to-seed generations since their formation by human hands, indicating that this process begins virtually immediately following polyploid formation. Expansion of some specific gene families suggests roles in the unusual subterranean fructification of Arachis. For example, the S1Fa-like transcription factor family has 126 Arachis members, in contrast to no more than five members in other examined plant species, and is more highly expressed in roots and etiolated seedlings than green leaves. The A. duranensis genome provides a major source of candidate genes for fructification, oil biosynthesis, and allergens, expanding knowledge of understudied areas of plant biology and human health impacts of plants, informing peanut genetic improvement and aiding deeper sequencing of Arachis diversity.

Understanding the Early Evolutionary Stages of a Tandem Drosophila melanogaster-Specific Gene Family: A Structural and Functional Population Study

Molecular Biology and Evolution ◽

10.1093/molbev/msaa109 ◽

2020 ◽

Vol 37 (9) ◽

pp. 2584-2600 ◽

Cited By ~ 3

Author(s):

Bryan D Clifton ◽

Jamie Jimenez ◽

Ashlyn Kimura ◽

Zeinab Chahine ◽

Pablo Librado ◽

...

Keyword(s):

Gene Family ◽

Sequence Similarity ◽

Gene Families ◽

Read Depth ◽

Specific Gene ◽

Protein Variant ◽

Protein Coding ◽

Expression Levels ◽

Number Variation ◽

Reference Quality

Abstract Gene families underlie genetic innovation and phenotypic diversification. However, our understanding of the early genomic and functional evolution of tandemly arranged gene families remains incomplete, because sequence similarity among paralogs hinders their accurate characterization. The Drosophila melanogaster-specific gene family Sdic is tandemly repeated and impacts sperm competition. We scrutinized Sdic in 20 geographically diverse populations using reference-quality genome assemblies, read-depth methodologies, and qPCR, finding that ∼90% of individuals harbor 3–7 copies, as well as evidence of population differentiation. In strains with reliable gene annotations, copy number variation (CNV) and differential transposable element insertions distinguish one structurally distinct version of the Sdic region per strain. All 31 annotated copies featured protein-coding potential and, based on the protein variant encoded, were categorized into 13 paratypes differing in their 3′ ends, with 3–5 paratypes coexisting in any strain examined. Despite widespread gene conversion, the only copy present in all strains has functionally diverged at both the coding and regulatory levels under positive selection. In contrast to artificial tandem duplications of the Sdic region, which increased male expression, CNV in cosmopolitan strains did not correlate with expression levels, likely as a result of differences in genome modifier composition. Duplicating the region did not enhance sperm competitiveness, suggesting a fitness cost at high expression levels or a plateau effect. Beyond facilitating a minimally optimal expression level, Sdic CNV acts as a catalyst of protein and regulatory diversity, showcasing a possible evolutionary path that recently formed tandem multigene families can follow toward long-term consolidation in eukaryotic genomes.

Draft Genome Sequence of a Novel Bacterium, Pseudomonas sp. Strain MR 02, Capable of Pyomelanin Production, Isolated from the Mahananda River at Siliguri, West Bengal, India

Genome Announcements ◽

10.1128/genomea.01443-17 ◽

2018 ◽

Vol 6 (3) ◽

pp. e01443-17 ◽

Cited By ~ 1

Author(s):

Vivek Kumar Ranjan ◽

Tilak Saha ◽

Shriparna Mukherjee ◽

Ranadhir Chakraborty

Keyword(s):

Genome Sequence ◽

West Bengal ◽

Draft Genome ◽

Homogentisic Acid ◽

Draft Genome Sequence ◽

Gene Length ◽

Protein Coding ◽

Protein Coding Genes ◽

Novel Bacterium ◽

A Genome

ABSTRACT The draft genome sequence of a novel strain, Pseudomonas sp. MR 02, a pyomelanin-producing bacterium isolated from the Mahananda River at Siliguri, West Bengal, India, is reported here. This strain has a genome size of 5.94 Mb, with an overall G+C content of 62.6%. The draft genome contains 5,799 genes (mean gene length, 923 bp), of which 5,503 are protein-coding genes, including the genes required for the catabolism of tyrosine or phenylalanine that underlies the characteristic production of homogentisic acid (HGA). Excess HGA, once excreted, auto-oxidizes and polymerizes to form pyomelanin.
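Several abstracts in this listing report an overall G+C content (62.6% for strain MR 02 above). The following is a minimal Python sketch of how such a percentage is commonly computed from a nucleotide string. It assumes that ambiguous bases such as N are excluded from the denominator, which is one common convention and not necessarily the one the authors used; the function name `gc_content` is hypothetical.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among unambiguous A/C/G/T bases.

    Assumption: ambiguous symbols (e.g. N) are ignored rather than counted
    in the denominator; input is treated case-insensitively.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / (gc + at)


if __name__ == "__main__":
    # Two of the four unambiguous bases are G or C, so this prints 50.0.
    print(gc_content("AANNGC"))
```

For a multi-contig draft assembly, the contig sequences would typically be concatenated (or their G+C and A+T counts summed) before taking the ratio, so that the result reflects the whole genome rather than an average of per-contig percentages.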

Draft Genome Assembly and Annotation of Red Raspberry Rubus idaeus

10.1101/546135 ◽

2019 ◽

Cited By ~ 4

Author(s):

Haley Wight ◽

Junhui Zhou ◽

Muzi Li ◽

Sridhar Hannenhalli ◽

Stephen M. Mount ◽

...

Keyword(s):

De Novo ◽

Draft Genome ◽

Rubus Idaeus ◽

Slow Process ◽

Red Raspberry ◽

Protein Coding ◽

Draft Genome Assembly ◽

Protein Coding Genes ◽

A Genome ◽

Exceptional Value

Abstract The red raspberry, Rubus idaeus, is widely distributed in all temperate regions of Europe, Asia, and North America and is a major commercial fruit valued for its taste and its high antioxidant and vitamin content. However, Rubus breeding is a long and slow process hampered by limited genomic and molecular resources. Genomic resources such as a complete genome sequence and transcriptome will be of exceptional value for improving research on and breeding of this high-value crop. Using a hybrid sequence assembly approach that includes both long and short sequence reads, we present the first assembly of the Rubus idaeus genome (Joan J. variety). The de novo assembled genome consists of 2,145 scaffolds, with a genome completeness of 95.3% and an N50 of 638 kb. Leveraging a linkage map, we anchored 80.1% of the genome onto seven chromosomes. Using over 1 billion paired-end RNA-seq reads, we annotated 35,566 protein-coding genes with a transcriptome completeness score of 97.2%. The Rubus idaeus genome provides an important new resource for researchers and breeders.
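Many of the assemblies in this listing are summarized by a contig or scaffold N50 (638 kb for the raspberry assembly above). The following is a minimal Python sketch of the conventional N50 calculation. It assumes the input is a plain list of sequence lengths in base pairs; the function name `n50` is illustrative and is not taken from any of the cited papers' pipelines.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a set of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length until
    # the running sum reaches half of the total.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length
    raise AssertionError("unreachable: the full sum always reaches half")


if __name__ == "__main__":
    # Total is 300 bp; 100 + 80 = 180 bp is the first running sum >= 150 bp.
    print(n50([100, 80, 60, 40, 20]))
```

Because N50 depends only on the length distribution, an assembly with fewer, longer scaffolds has a higher N50 even at the same total size. N50 says nothing about base-level accuracy, which is why these abstracts also report completeness scores such as BUSCO.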

Draft Genome Sequence of Novel Filterable Rhodospirillales Bacterium Strain TMPK1, Isolated from Soil

Microbiology Resource Announcements ◽

10.1128/mra.00393-21 ◽

2021 ◽

Vol 10 (28) ◽

Author(s):

Ryosuke Nakai ◽

Hiroyuki Kusada ◽

Fumihiro Sassa ◽

Susumu Morigasaki ◽

Hisayoshi Hayashi ◽

...

Keyword(s):

Genome Sequence ◽

Sequence Data ◽

Draft Genome ◽

Draft Genome Sequence ◽

Soil Suspension ◽

Protein Coding ◽

Coding Sequences ◽

Carotenoid Production ◽

A Genome ◽

Bacterium Strain

We report the draft genome sequence of a novel Rhodospirillales bacterium strain, TMPK1, isolated from a micropore-filtered soil suspension. This strain has a genome of 4,249,070 bp, comprising 4,151 protein-coding sequences. The genome sequence data further suggest that strain TMPK1 is an alphaproteobacterium capable of carotenoid production.

Draft Genome Sequence of the UV-Resistant Antarctic Bacterium Sphingomonas sp. Strain UV9

Microbiology Resource Announcements ◽

10.1128/mra.01651-18 ◽

2019 ◽

Vol 8 (7) ◽

Cited By ~ 3

Author(s):

Juan J. Marizcurrena ◽

Danilo Morales ◽

Pablo Smircich ◽

Susana Castro-Sowinski

Keyword(s):

Genome Size ◽

Genome Sequence ◽

Draft Genome ◽

Gc Content ◽

Draft Genome Sequence ◽

Protein Coding ◽

Coding Sequences ◽

Antarctic Bacterium ◽

A Genome ◽

The Antarctic

We report the draft genome sequence of the Antarctic UV-resistant bacterium Sphingomonas sp. strain UV9. The strain has a genome size of 4.25 Mb, a 65.62% GC content, and 3,879 protein-coding sequences.

How to make a rodent giant: Genomic basis and tradeoffs of gigantism in the capybara, the world’s largest rodent

Molecular Biology and Evolution ◽

10.1093/molbev/msaa285 ◽

2020 ◽

Author(s):

Santiago Herrera-Álvarez ◽

Elinor Karlsson ◽

Oliver A Ryder ◽

Kerstin Lindblad-Toh ◽

Andrew J Crawford

Keyword(s):

Body Size ◽

Draft Genome ◽

Large Body ◽

Comparative Genomic ◽

Specific Gene ◽

Large Body Size ◽

Synonymous Mutations ◽

Intragenomic Conflict ◽

Genome Wide ◽

A Genome

Abstract Gigantism results when one lineage within a clade evolves extremely large body size relative to its small-bodied ancestors, a common phenomenon in animals. Theory predicts that the evolution of giants should be constrained by two tradeoffs. First, because body size is negatively correlated with population size, purifying selection is expected to be less efficient in species of large body size, leading to increased mutational load. Second, gigantism is achieved by generating a higher number of cells along with higher rates of cell proliferation, thus increasing the likelihood of cancer. To explore the genetic basis of gigantism in rodents and uncover genomic signatures of gigantism-related tradeoffs, we assembled a draft genome of the capybara (Hydrochoerus hydrochaeris), the world’s largest living rodent. We found that the genome-wide ratio of non-synonymous to synonymous mutations (ω) is elevated in the capybara relative to other rodents, likely caused by a generation-time effect and consistent with a nearly neutral model of molecular evolution. A genome-wide scan for adaptive protein evolution in the capybara highlighted several genes controlling post-natal bone growth regulation and musculoskeletal development, which are relevant to the anatomical and developmental modifications underlying an increase in overall body size. Capybara-specific gene-family expansions included a putative novel anticancer adaptation involving T cell-mediated tumor suppression, offering a potential resolution to the increased cancer risk in this lineage. Our comparative genomic results uncovered the signature of an intragenomic conflict in which the evolution of gigantism in the capybara involved selection on genes and pathways that are directly linked to cancer.

A draft genome sequence of the elusive giant squid, Architeuthis dux

GigaScience ◽

10.1093/gigascience/giz152 ◽

2020 ◽

Vol 9 (1) ◽

Cited By ~ 1

Author(s):

Rute R da Fonseca ◽

Alvarina Couto ◽

Andre M Machado ◽

Brona Brejova ◽

Carolin B Albertin ◽

...

Keyword(s):

Deep Sea ◽

Draft Genome ◽

Deep Ocean ◽

High Arctic ◽

Dosidicus Gigas ◽

Final Size ◽

Protein Coding ◽

Draft Genome Assembly ◽

Giant Squid ◽

A Genome

ABSTRACT Background The giant squid (Architeuthis dux; Steenstrup, 1857) is an enigmatic giant mollusc with a circumglobal distribution in the deep ocean, except in high Arctic and Antarctic waters. The elusiveness of the species makes it difficult to study. Thus, a genome assembly for this deep-sea–dwelling species will help address several open evolutionary questions. Findings We present a draft genome assembly built from 200 Gb of Illumina reads, 4 Gb of Moleculo synthetic long reads, and 108 Gb of Chicago libraries, with a final size matching the estimated genome size of 2.7 Gb and a scaffold N50 of 4.8 Mb. We also present an alternative assembly that includes 27 Gb of raw reads generated on the Pacific Biosciences platform. In addition, we sequenced the proteome of the same individual, and RNA from 3 different tissue types from 3 other squid species (Onychoteuthis banksii, Dosidicus gigas, and Sthenoteuthis oualaniensis), to assist genome annotation. We annotated 33,406 protein-coding genes supported by evidence, and genome completeness as estimated by BUSCO reached 92%. Repetitive regions cover 49.17% of the genome. Conclusions This annotated draft genome of A. dux provides a critical resource for investigating the unique traits of this species, including its gigantism and key adaptations to deep-sea environments.

A high-quality chromosome-level genome assembly reveals genetics for important traits in eggplant

Horticulture Research ◽

10.1038/s41438-020-00391-0 ◽

2020 ◽

Vol 7 (1) ◽

Author(s):

Qingzhen Wei ◽

Jinglei Wang ◽

Wuhong Wang ◽

Tianhua Hu ◽

Haijiao Hu ◽

...

Keyword(s):

Genome Assembly ◽

Reference Genome ◽

Repetitive Sequences ◽

Gene Families ◽

Specific Gene ◽

High Quality ◽

Total Size ◽

Protein Coding ◽

Fruit Length ◽

Protein Coding Genes

Abstract Eggplant (Solanum melongena L.) is an economically important vegetable crop in the Solanaceae family, with extensive diversity among landraces and close relatives. Here, we report a high-quality reference genome for the eggplant inbred line HQ-1315 (S. melongena-HQ), assembled using a combination of Illumina, Nanopore, and 10X Genomics sequencing technologies together with Hi-C. The assembled genome has a total size of ~1.17 Gb across 12 chromosomes, with a contig N50 of 5.26 Mb, and contains 36,582 protein-coding genes. Repetitive sequences comprise 70.09% (811.14 Mb) of the eggplant genome, most of which are long terminal repeat (LTR) retrotransposons (65.80%), followed by long interspersed nuclear elements (LINEs, 1.54%) and DNA transposons (0.85%). The S. melongena-HQ eggplant genome carries a total of 563 accession-specific gene families containing 1,009 genes. In total, 73 expanded gene families (892 genes) and 34 contracted gene families (114 genes) were functionally annotated. Comparative analysis of different eggplant genomes identified three types of variation: single-nucleotide polymorphisms (SNPs), insertions/deletions (indels), and structural variants (SVs). Asymmetric SV accumulation was found in potential regulatory regions of protein-coding genes among the different eggplant genomes. Furthermore, we performed QTL-seq for eggplant fruit length using the S. melongena-HQ reference genome and detected a QTL interval of 71.29–78.26 Mb on chromosome E03. The gene Smechr0301963, which belongs to the SUN gene family, is predicted to be a key candidate gene for eggplant fruit length regulation. Moreover, we anchored a total of 210 linkage markers associated with 71 traits to the eggplant chromosomes and identified 26 QTL hotspots. The eggplant HQ-1315 genome assembly can be accessed at http://eggplant-hq.cn.
In conclusion, the eggplant genome presented herein provides a global view of genomic divergence at the whole-genome level and powerful tools for the identification of candidate genes for important traits in eggplant.

Whole Genome Sequencing of the Blue Tilapia (Oreochromis aureus) Provides a Valuable Genetic Resource for Biomedical Research on Tilapias

Marine Drugs ◽

10.3390/md17070386 ◽

2019 ◽

Vol 17 (7) ◽

pp. 386 ◽

Cited By ~ 6

Author(s):

Chao Bian ◽

Jia Li ◽

Xueqiang Lin ◽

Xiyang Chen ◽

Yunhai Yi ◽

...

Keyword(s):

Genome Assembly ◽

Genetic Resource ◽

Draft Genome ◽

Close Relative ◽

Protein Coding ◽

Oreochromis Aureus ◽

Genome Wide ◽

A Genome ◽

Blue Tilapia ◽

First Time

Blue tilapia (Oreochromis aureus) is an economically important fish in Asian countries. It can grow and reproduce in both freshwater and brackish water, and it is also considered a significant invasive species worldwide. This species has been widely used as a hybridization parent in tilapia breeding, with the major aim of producing novel strains. However, genomic resources for this important tilapia species are still limited. Here, we sequenced and assembled, for the first time, a draft genome for a seawater-cultured blue tilapia (0.92 Gb), with 97.8% completeness and a scaffold N50 of 1.1 Mb, which suggests that this genome assembly is of relatively high quality. We also predicted 23,117 protein-coding genes in the blue tilapia genome. Comparisons of predicted antimicrobial peptides between the blue tilapia and its close relative, the Nile tilapia, showed that these immune-related genes are highly similar and are scattered across the genome. As a valuable genetic resource, our blue tilapia genome assembly will benefit biomedical research and practical molecular breeding for resistance to various diseases, which have been a critical problem in tilapia aquaculture.

A high-quality genome assembly for the endangered golden snub-nosed monkey (Rhinopithecus roxellana)

GigaScience ◽

10.1093/gigascience/giz098 ◽

2019 ◽

Vol 8 (8) ◽

Cited By ~ 5

Author(s):

Lu Wang ◽

Jinwei Wu ◽

Xiaomei Liu ◽

Dandan Di ◽

Yuhong Liang ◽

...

Keyword(s):

Single Molecule ◽

Genome Assembly ◽

Gene Families ◽

Rhinopithecus Roxellana ◽

High Quality ◽

Chromosome Conformation ◽

Protein Coding ◽

A Genome ◽

Close Relationship ◽

High Quality Genome

Abstract Background The golden snub-nosed monkey (Rhinopithecus roxellana) is an endangered colobine species endemic to China, with several distinct traits including a unique social structure. Although a genome assembly for R. roxellana is available, it is incomplete and fragmented because it was constructed using short-read sequencing technology. Thus, important information such as genome structural variation and repeat sequences may be absent. Findings To obtain a high-quality chromosomal assembly for R. roxellana qinlingensis, we used 5 methods: Pacific Biosciences single-molecule real-time sequencing, Illumina paired-end sequencing, BioNano optical maps, 10X Genomics linked-reads, and high-throughput chromosome conformation capture. The assembled genome was ∼3.04 Gb, with a contig N50 of 5.72 Mb and a scaffold N50 of 144.56 Mb. This represented a 100-fold improvement over the previously published genome. In the new genome, 22,497 protein-coding genes were predicted, of which 22,053 were functionally annotated. Gene family analysis showed that 993 and 2,745 gene families were expanded and contracted, respectively. The reconstructed phylogeny recovered a close relationship between R. roxellana and Macaca mulatta, and these 2 species diverged ∼13.4 million years ago. Conclusion We constructed a high-quality genome assembly of the Qinling golden snub-nosed monkey with superior continuity and accuracy, which may be useful for future genetic studies in this species and as a new standard reference genome for colobine primates. In addition, the updated genome assembly may improve our understanding of this species and could assist conservation efforts.
