Whole genome sequences of 23 species from the Drosophila montium species group (Diptera: Drosophilidae): A resource for testing evolutionary hypotheses

Abstract: Large groups of species with well-defined phylogenies are excellent systems for testing evolutionary hypotheses. In this paper, we describe the creation of a comparative genomic resource consisting of 23 genomes from the species-rich Drosophila montium species group, 22 of which are presented here for the first time. The montium group is uniquely positioned for comparative studies. Within the montium clade, evolutionary distances are such that large numbers of sequences can be accurately aligned while also recovering strong signals of divergence; and the distance between the montium group and D. melanogaster is short enough so that orthologous sequence can be readily identified. All genomes were assembled from a single, small-insert library using MaSuRCA, before going through an extensive post-assembly pipeline. Estimated genome sizes within the montium group range from 155 Mb to 223 Mb (mean = 196 Mb). The absence of long-distance information during the assembly process resulted in fragmented assemblies, with the scaffold NG50s varying widely based on repeat content and sample heterozygosity (min = 18 kb, max = 390 kb, mean = 74 kb). The total scaffold length for most assemblies is also shorter than the estimated genome size, typically by 5–15%. However, subsequent analysis showed that our assemblies are highly complete. Despite large differences in contiguity, all assemblies contain at least 96% of known single-copy Dipteran genes (BUSCOs, n = 2,799). Similarly, by aligning our assemblies to the D. melanogaster genome and remapping coordinates for a large set of transcriptional enhancers (n = 3,457), we showed that each montium assembly contains orthologs for at least 91% of D. melanogaster enhancers. Importantly, the genic and enhancer contents of our assemblies are comparable to that of far more contiguous Drosophila assemblies. The alignment of our own D. serrata assembly to a previously published PacBio D. serrata assembly also showed that our longest scaffolds (up to 1 Mb) are free of large-scale misassemblies. Our genome assemblies are a valuable resource that can be used to further resolve the montium group phylogeny; study the evolution of protein-coding genes and cis-regulatory sequences; and determine the genetic basis of ecological and behavioral adaptations.

Whole Genome Sequences of 23 Species from the Drosophila montium Species Group (Diptera: Drosophilidae): A Resource for Testing Evolutionary Hypotheses

G3 Genes|Genome|Genetics ◽

10.1534/g3.119.400959 ◽

2020 ◽

Vol 10 (5) ◽

pp. 1443-1455 ◽

Cited By ~ 2

Author(s):

Michael J. Bronski ◽

Ciera C. Martinez ◽

Holli A. Weld ◽

Michael B. Eisen

Keyword(s):

Large Scale ◽

Single Copy ◽

Species Group ◽

Comparative Genomic ◽

Regulatory Sequences ◽

Large Set ◽

Long Distance ◽

Protein Coding ◽

Distance Information ◽

Repeat Content

Large groups of species with well-defined phylogenies are excellent systems for testing evolutionary hypotheses. In this paper, we describe the creation of a comparative genomic resource consisting of 23 genomes from the species-rich Drosophila montium species group, 22 of which are presented here for the first time. The montium group is well-positioned for clade genomics. Within the montium clade, evolutionary distances are such that large numbers of sequences can be accurately aligned while also recovering strong signals of divergence; and the distance between the montium group and D. melanogaster is short enough so that orthologous sequence can be readily identified. All genomes were assembled from a single, small-insert library using MaSuRCA, before going through an extensive post-assembly pipeline. Estimated genome sizes within the montium group range from 155 Mb to 223 Mb (mean = 196 Mb). The absence of long-distance information during the assembly process resulted in fragmented assemblies, with the scaffold NG50s varying widely based on repeat content and sample heterozygosity (min = 18 kb, max = 390 kb, mean = 74 kb). The total scaffold length for most assemblies is also shorter than the estimated genome size, typically by 5–15%. However, subsequent analysis showed that our assemblies are highly complete. Despite large differences in contiguity, all assemblies contain at least 96% of known single-copy Dipteran genes (BUSCOs, n = 2,799). Similarly, by aligning our assemblies to the D. melanogaster genome and remapping coordinates for a large set of transcriptional enhancers (n = 3,457), we showed that each montium assembly contains orthologs for at least 91% of D. melanogaster enhancers. Importantly, the genic and enhancer contents of our assemblies are comparable to that of far more contiguous Drosophila assemblies. The alignment of our own D. serrata assembly to a previously published PacBio D. serrata assembly also showed that our longest scaffolds (up to 1 Mb) are free of large-scale misassemblies. Our genome assemblies are a valuable resource that can be used to further resolve the montium group phylogeny; study the evolution of protein-coding genes and cis-regulatory sequences; and determine the genetic basis of ecological and behavioral adaptations.
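
The scaffold NG50 quoted above can be illustrated with a minimal sketch. It assumes the common definition (the length of the scaffold at which the cumulative length of scaffolds, sorted longest first, reaches half of the estimated genome size); the function name `ng50` and the toy lengths are hypothetical and are not taken from the paper's pipeline.

```python
def ng50(scaffold_lengths, genome_size):
    """Return the NG50 under the assumed definition: the length of the
    scaffold at which scaffolds, taken longest first, cumulatively reach
    half of the estimated genome size.
    Returns None if the assembly never reaches that threshold."""
    half = genome_size / 2
    running = 0
    for length in sorted(scaffold_lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return None


# Toy, made-up scaffold lengths (arbitrary units), for illustration only.
lengths = [400, 300, 200, 100, 50, 50]

# Using the estimated genome size gives NG50; using the total assembly
# length (sum of scaffolds) in its place gives the ordinary N50.
ng50_value = ng50(lengths, genome_size=1600)
n50_value = ng50(lengths, genome_size=sum(lengths))
print(ng50_value, n50_value)
```

Because the total scaffold length is reported to be shorter than the estimated genome size for most assemblies, NG50 computed this way is never larger than N50 for the same scaffold set, which makes it the more conservative contiguity measure.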

Evolution of Rosaceae Plastomes Highlights Unique Cerasus Diversification and Independent Origins of Fruiting Cherry

Frontiers in Plant Science ◽

10.3389/fpls.2021.736053 ◽

2021 ◽

Vol 12 ◽

Author(s):

Jing Zhang ◽

Yan Wang ◽

Tao Chen ◽

Qing Chen ◽

Lei Wang ◽

...

Keyword(s):

Single Copy ◽

Genomic Variation ◽

Molecular Dating ◽

Divergent Evolution ◽

Comparative Genomic ◽

Evolutionary Patterns ◽

Protein Coding ◽

Longmenshan Fault ◽

Plastid Protein ◽

Phylogenomic Analyses

Rosaceae comprises numerous types of economically important fruits, ornamentals, and timber. The lack of plastome characteristics has blocked our understanding of the evolution of plastome and plastid genes of Rosaceae crops. Using comparative genomics and phylogenomics, we analyzed 121 Rosaceae plastomes of 54 taxa from 13 genera, predominantly including Cerasus (true cherry) and its relatives. To our knowledge, we generated the first comprehensive map of genomic variation across Rosaceae plastomes. Contraction/expansion of inverted repeat regions and sequence losses of the two single-copy regions underlie large genomic variations in size among Rosaceae plastomes. Plastid protein-coding genes were characterized with a high proportion (over 50%) of synonymous variants and insertion-deletions with multiple triplets. Five photosynthesis-related genes were specially selected in perennial woody trees. Comparative genomic analyses implied divergent evolutionary patterns between pomaceous and drupaceous trees. Across all examined plastomes, unique and divergent evolution was detected in Cerasus plastomes. Phylogenomic analyses and molecular dating highlighted the relatively distant phylogenetic relationship between Cerasus and relatives (Microcerasus, Amygdalus, Prunus, and Armeniaca), which strongly supported treating the monophyletic true cherry group as a separate genus excluding dwarf cherry. High genetic differentiation and distinct phylogenetic relationships implied independent origins and domestication between fruiting cherries, particularly between Prunus pseudocerasus (Cerasus pseudocerasus) and P. avium (C. avium). Well-resolved maternal phylogeny suggested that cultivated P. pseudocerasus originated from Longmenshan Fault zone, the eastern edge of Himalaya-Hengduan Mountains, where it was subjected to frequent genomic introgression between its presumed wild ancestors and relatives.

Functional Characterization of Enhancer Evolution in the Primate Lineage

10.1101/283168 ◽

2018 ◽

Cited By ~ 2

Author(s):

Jason C. Klein ◽

Aidan Keith ◽

Vikram Agarwal ◽

Timothy Durham ◽

Jay Shendure

Keyword(s):

Large Scale ◽

Morphological Evolution ◽

Functional Divergence ◽

Functional Characterization ◽

Comparative Genomic ◽

Regulatory Sequences ◽

Primate Phylogeny ◽

Cytosine Deamination ◽

Functional Changes ◽

Ancestral Sequences

Background: Enhancers play an important role in morphological evolution and speciation by controlling the spatiotemporal expression of genes. Due to technological limitations, previous efforts to understand the evolution of enhancers in primates have typically studied many enhancers at low resolution, or single enhancers at high resolution. Although comparative genomic studies reveal large-scale turnover of enhancers, a specific understanding of the molecular steps by which mammalian or primate enhancers evolve remains elusive. Results: We identified candidate hominoid-specific liver enhancers from H3K27ac ChIP-seq data. After locating orthologs in 11 primates spanning ∼40 million years, we synthesized all orthologs as well as computational reconstructions of 9 ancestral sequences for 348 “active tiles” of 233 putative enhancers. We concurrently tested all sequences (20 per tile) for regulatory activity with STARR-seq in HepG2 cells, with the goal of characterizing the evolutionary-functional trajectories of each enhancer. We observe groups of enhancer tiles with coherent trajectories, most of which can be explained by one or two mutational events per tile. We quantify the correlation between the number of mutations along a branch and the magnitude of change in functional activity. Finally, we identify 57 mutations that correlate with functional changes; these are enriched for cytosine deamination events within CpGs, compared to background events. Conclusions: We characterized the evolutionary-functional trajectories of hundreds of liver enhancers throughout the primate phylogeny. We observe subsets of regulatory sequences that appear to have gained or lost activity at various positions in the primate phylogeny. We use these data to quantify the relationship between sequence and functional divergence, and to identify CpG deamination as a potentially important force in driving changes in enhancer activity during primate evolution.

Complete Chloroplast Genomes of 14 Mangroves: Phylogenetic and Comparative Genomic Analyses

BioMed Research International ◽

10.1155/2020/8731857 ◽

2020 ◽

Vol 2020 ◽

pp. 1-13

Author(s):

Chengcheng Shi ◽

Kai Han ◽

Liangwei Li ◽

Inge Seim ◽

Simon Ming-Yuen Lee ◽

...

Keyword(s):

Single Copy ◽

Avicennia Marina ◽

Comparative Genomic ◽

Mangrove Species ◽

Rhizophora Stylosa ◽

Protein Coding ◽

Excoecaria Agallocha ◽

Thespesia Populnea ◽

Chloroplast Genomes ◽

Genome Features

Mangroves are a group of plant species that occupy the coastal intertidal zone and are major components of this ecologically important ecosystem. Mangroves belong to about twenty diverse families. Here, we sequenced and assembled chloroplast genomes of 14 mangrove species from eight families spanning five rosid orders and one asterid order: Fabales (Pongamia pinnata), Lamiales (Avicennia marina), Malpighiales (Excoecaria agallocha, Bruguiera sexangula, Kandelia obovata, Rhizophora stylosa, and Ceriops tagal), Malvales (Hibiscus tiliaceus, Heritiera littoralis, and Thespesia populnea), Myrtales (Laguncularia racemosa, Sonneratia ovata, and Pemphis acidula), and Sapindales (Xylocarpus moluccensis). These chloroplast genomes range from 149 kb to 168 kb in length. A conserved structure of two inverted repeats (IRa and IRb, ~25.8 kb), one large single-copy region (LSC, ~89.0 kb), and one short single-copy region (SSC, ~18.9 kb) as well as ~130 genes (85 protein-coding, 37 tRNAs, and 8 rRNAs) was observed. We found the lowest divergence in the IR regions among the four regions. We also identified simple sequence repeats (SSRs), which were found to be variable in numbers. Most chloroplast genes are highly conserved, with only four genes under positive selection or relaxed pressure. Combined with publicly available chloroplast genomes, we carried out phylogenetic analysis and confirmed the previously reported phylogeny within rosids, including the positioning of obscure families in Malpighiales. Our study reports 14 mangrove chloroplast genomes and illustrates their genome features and evolution.
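
As a quick consistency check on the region sizes quoted above, the quadripartite structure implies a total length of two IR copies plus the LSC and the SSC. The sketch below uses only the approximate ("~") values from the abstract; the variable names are illustrative, and per-species values will differ.

```python
# Approximate region sizes (kb) as quoted in the abstract.
ir_kb = 25.8    # each inverted repeat (IRa and IRb)
lsc_kb = 89.0   # large single-copy region
ssc_kb = 18.9   # short single-copy region

# A quadripartite plastome contains two IR copies plus the LSC and SSC.
total_kb = 2 * ir_kb + lsc_kb + ssc_kb

# The abstract reports genome lengths between 149 kb and 168 kb.
within_reported_range = 149 <= total_kb <= 168
print(f"approximate total: {total_kb:.1f} kb, within range: {within_reported_range}")
```

The sum comes to about 159.5 kb, which falls inside the reported 149–168 kb range.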

Complete plastomes of six species of Wikstroemia (Thymelaeaceae) reveal paraphyly with the monotypic genus Stellera

Scientific Reports ◽

10.1038/s41598-021-93057-3 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Liefen He ◽

Yonghong Zhang ◽

Shiou Yih Lee

Keyword(s):

Genome Structure ◽

Sequence Divergence ◽

Single Copy ◽

rRNA Genes ◽

Comparative Genomic ◽

tRNA Genes ◽

Protein Coding ◽

Current State ◽

Taxonomic Implications ◽

Stellera Chamaejasme

Abstract: Wikstroemia (Thymelaeaceae) is a diverse genus that extends from Asia to Australia and has been recorded on the Hawaiian Islands. Despite its medicinal properties and resource utilization in pulp production, genetic studies of the species in this important genus have been neglected. In this study, the plastome sequences of six species of Wikstroemia were sequenced and analysed. The plastomes ranged in size between 172,610 bp (W. micrantha) and 173,697 bp (W. alternifolia) and exhibited a typical genome structure consisting of a pair of inverted repeat (IR) regions separated by a large single-copy (LSC) region and a small single-copy (SSC) region. The six plastomes were similar in the 138 or 139 genes predicted, which consisted of 92 or 93 protein-coding genes, 38 tRNA genes, and 8 rRNA genes. The overall GC contents were identical (36.7%). Comparative genomic analyses were conducted with the inclusion of two additional published species of Wikstroemia in which the sequence divergence and expansion of IRs in the plastomes were determined. When compared to the coding sequences (CDSs) of Aquilaria sinensis, five genes, namely, rpl2, rps7, rps18, ycf1 and ycf2, indicated positive selection in W. capitata. The plastome-based phylogenetic analysis inferred that Wikstroemia in its current state is paraphyletic to Stellera chamaejasme, while the ITS-based tree analyses could not properly resolve the phylogenetic relationship between Stellera and Wikstroemia. This finding rekindled interest in the proposal to synonymize Stellera with Wikstroemia, which was previously proposed but rejected due to taxonomic conflicts. Nevertheless, this study provides valuable genomic information to aid in the taxonomic implications and phylogenomic reconstruction of Thymelaeaceae.

Comparative Chloroplast Genomes of Four Lycoris Species (Amaryllidaceae) Provides New Insight into Interspecific Relationship and Phylogeny

Biology ◽

10.3390/biology10080715 ◽

2021 ◽

Vol 10 (8) ◽

pp. 715

Author(s):

Fengjiao Zhang ◽

Ning Wang ◽

Guanghao Cheng ◽

Xiaochun Shu ◽

Tao Wang ◽

...

Keyword(s):

Large Scale ◽

Phylogenetic Analyses ◽

GC Content ◽

Natural Hybridization ◽

Comparative Genomic ◽

Protein Coding ◽

Chloroplast Genomes ◽

A Genome ◽

Cp Genome ◽

Conserved Gene

The genus Lycoris (Amaryllidaceae) consists of about 20 species and is endemic to East Asia. Although Lycoris species are of great horticultural and medical importance, challenges in accurate species identification persist due to frequent natural hybridization and large-scale intraspecific variation. In this study, we sequenced chloroplast genomes of four Lycoris species and retrieved seven published chloroplast (cp) genome sequences in this genus for comparative genomic and phylogenetic analyses. The cp genomes of the four newly sequenced species were 158,405–158,498 bp long, with the same GC content of 37.8%. The genomes exhibited the typical quadripartite structure with conserved gene order and content. A total of 113 genes (20 duplicated) were identified, including 79 protein-coding genes (PCGs), 30 tRNAs, and 4 rRNAs. Phylogenetic analysis showed that the 11 species clustered into three main groups, with L. sprengeri located at the base of Lycoris. L. radiata was suggested to be the female donor of L. incarnata, L. shaanxiensis, and L. squamigera. L. straminea and L. houdyshelii may be derived from L. anhuiensis, L. chinensis, or L. longituba. These results not only offer a genome-scale platform for identification and utilization of Lycoris but also provide a phylogenomic framework for future studies in this genus.

Phylogenomics and the evolution of hemipteroid insects

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1815820115 ◽

2018 ◽

Vol 115 (50) ◽

pp. 12775-12780 ◽

Cited By ~ 76

Author(s):

Kevin P. Johnson ◽

Christopher H. Dietrich ◽

Frank Friedrich ◽

Rolf G. Beutel ◽

Benjamin Wipfler ◽

...

Keyword(s):

Large Scale ◽

Phylogenetic Analyses ◽

Sister Group ◽

Single Copy ◽

Molecular Dating ◽

Protein Coding ◽

Fossil Calibration ◽

Large Scale Analysis ◽

Phylogenetic Framework ◽

Phylogenomic Analyses

Hemipteroid insects (Paraneoptera), with over 10% of all known insect diversity, are a major component of terrestrial and aquatic ecosystems. Previous phylogenetic analyses have not consistently resolved the relationships among major hemipteroid lineages. We provide maximum likelihood-based phylogenomic analyses of a taxonomically comprehensive dataset comprising sequences of 2,395 single-copy, protein-coding genes for 193 samples of hemipteroid insects and outgroups. These analyses yield a well-supported phylogeny for hemipteroid insects. Monophyly of each of the three hemipteroid orders (Psocodea, Thysanoptera, and Hemiptera) is strongly supported, as are most relationships among suborders and families. Thysanoptera (thrips) is strongly supported as sister to Hemiptera. However, as in a recent large-scale analysis sampling all insect orders, trees from our data matrices support Psocodea (bark lice and parasitic lice) as the sister group to the holometabolous insects (those with complete metamorphosis). In contrast, four-cluster likelihood mapping of these data does not support this result. A molecular dating analysis using 23 fossil calibration points suggests hemipteroid insects began diversifying before the Carboniferous, over 365 million years ago. We also explore implications for understanding the timing of diversification, the evolution of morphological traits, and the evolution of mitochondrial genome organization. These results provide a phylogenetic framework for future studies of the group.

Going the Distance

10.23943/princeton/9780691150772.001.0001 ◽

2020 ◽

Author(s):

Ron Harris

Keyword(s):

Large Scale ◽

Western Europe ◽

Family Firms ◽

Global Trade ◽

Long Distance ◽

Business Corporation ◽

Modern Economy ◽

Key Factor ◽

Silk Route ◽

Passive Investors

Before the seventeenth century, trade across Eurasia was mostly conducted in short segments along the Silk Route and Indian Ocean. Business was organized in family firms, merchant networks, and state-owned enterprises, and dominated by Chinese, Indian, and Arabic traders. However, around 1600 the first two joint-stock corporations, the English and Dutch East India Companies, were established. This book tells the story of overland and maritime trade without Europeans, of European Cape Route trade without corporations, and of how new, large-scale, and impersonal organizations arose in Europe to control long-distance trade for more than three centuries. It shows that by 1700, the scene and methods for global trade had dramatically changed: Dutch and English merchants shepherded goods directly from China and India to northwestern Europe. To understand this transformation, the book compares the organizational forms used in four major regions: China, India, the Middle East, and Western Europe. The English and Dutch were the last to leap into Eurasian trade, and they innovated in order to compete. They raised capital from passive investors through impersonal stock markets and their joint-stock corporations deployed more capital, ships, and agents to deliver goods from their origins to consumers. The book explores the history behind a cornerstone of the modern economy, and how this organizational revolution contributed to the formation of global trade and the creation of the business corporation as a key factor in Europe's economic rise.

Adsorption Isotherm Predictions for Multiple Molecules in MOFs Using the Same Deep Learning Model

10.26434/chemrxiv.9894224.v1 ◽

2019 ◽

Author(s):

Ryther Anderson ◽

Achay Biong ◽

Diego Gómez-Gualdrón

Keyword(s):

Neural Network ◽

Machine Learning ◽

Molecular Simulation ◽

Large Scale ◽

Learning Model ◽

Operating Conditions ◽

Small Subset ◽

Screening Methods ◽

Large Set ◽

Metal Organic

Tailoring the structure and chemistry of metal-organic frameworks (MOFs) enables the manipulation of their adsorption properties to suit specific energy and environmental applications. As there are millions of possible MOFs (with tens of thousands already synthesized), molecular simulation, such as grand canonical Monte Carlo (GCMC), has frequently been used to rapidly evaluate the adsorption performance of a large set of MOFs. This allows subsequent experiments to focus only on a small subset of the most promising MOFs. In many instances, however, even molecular simulation becomes prohibitively time-consuming, underscoring the need for alternative screening methods, such as machine learning, to precede molecular simulation efforts. In this study, as a proof of concept, we trained a neural network as the first example of a machine learning model capable of predicting full adsorption isotherms of different molecules not included in the training of the model. To achieve this, we trained our neural network only on alchemical species, represented only by their geometry and force field parameters, and used this neural network to predict the loadings of real adsorbates. We focused on predicting room temperature adsorption of small (one- and two-atom) molecules relevant to chemical separations, namely argon, krypton, xenon, methane, ethane, and nitrogen. However, we also observed surprisingly promising predictions for more complex molecules, whose properties are outside the range spanned by the alchemical adsorbates. Prediction accuracies suitable for large-scale screening were achieved using simple MOF (e.g. geometric properties and chemical moieties) and adsorbate (e.g. forcefield parameters and geometry) descriptors. Our results illustrate a new philosophy of training that opens the path towards development of machine learning models that can predict the adsorption loading of any new adsorbate at any new operating conditions in any new MOF.

Mutational patterns and clonal evolution from diagnosis to relapse in pediatric acute lymphoblastic leukemia

Scientific Reports ◽

10.1038/s41598-021-95109-0 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Shumaila Sayyab ◽

Anders Lundmark ◽

Malin Larsson ◽

Markus Ringnér ◽

Sara Nystedt ◽

...

Keyword(s):

Acute Lymphoblastic Leukemia ◽

Large Scale ◽

Somatic Mutations ◽

Lymphoblastic Leukemia ◽

Clonal Evolution ◽

Point Mutations ◽

Driver Genes ◽

Protein Coding ◽

Pediatric Acute Lymphoblastic Leukemia ◽

Evolutionary Trajectories

Abstract: The mechanisms driving clonal heterogeneity and evolution in relapsed pediatric acute lymphoblastic leukemia (ALL) are not fully understood. We performed whole genome sequencing of samples collected at diagnosis, relapse(s) and remission from 29 Nordic patients. Somatic point mutations and large-scale structural variants were called using individually matched remission samples as controls, and allelic expression of the mutations was assessed in ALL cells using RNA-sequencing. We observed an increased burden of somatic mutations at relapse, compared to diagnosis, and at second relapse compared to first relapse. In addition to 29 known ALL driver genes, of which nine genes carried recurrent protein-coding mutations in our sample set, we identified putative non-protein coding mutations in regulatory regions of seven additional genes that have not previously been described in ALL. Cluster analysis of hundreds of somatic mutations per sample revealed three distinct evolutionary trajectories during ALL progression from diagnosis to relapse. The evolutionary trajectories provide insight into the mutational mechanisms leading to relapse in ALL and could offer biomarkers for improved risk prediction in individual patients.
