A high-quality genome assembly of the North American Song Sparrow, Melospiza melodia

2019 ◽  
Author(s):  
Swarnali Louha ◽  
David A. Ray ◽  
Kevin Winker ◽  
Travis Glenn

The song sparrow, Melospiza melodia, is one of the most widely distributed species of songbirds found in North America. It has been used in a wide range of behavioral and ecological studies. This species’ pronounced morphological and behavioral diversity across populations makes it a favorable candidate in several areas of biomedical research. We have generated a high-quality de novo genome assembly of M. melodia using Illumina short read sequences from genomic and in vitro proximity-ligation libraries. The assembled genome is 978.3 Mb, with a coverage of 24.9×, N50 scaffold size of 5.6 Mb and N50 contig size of 31.7 Kb. Genes within our genome assembly are largely complete, with 87.5% full-length genes present out of a set of 4,915 universal single-copy orthologs present in most avian genomes. We annotated our genome assembly and constructed 15,086 gene models, a majority of which have high homology to related birds, Taeniopygia guttata and Junco hyemalis. In total, 83% of the annotated genes are assigned putative functions. Furthermore, only ~7% of the genome is found to be repetitive; these regions and other non-coding functional regions are also identified. The high-quality M. melodia genome assembly and annotations we report will serve as a valuable resource for facilitating studies on genome structure and evolution that can contribute to biomedical research and serve as a reference in population genomic and comparative genomic studies of closely related species.

2020 ◽  
Vol 10 (4) ◽  
pp. 1159-1166 ◽  
Author(s):  
Swarnali Louha ◽  
David A. Ray ◽  
Kevin Winker ◽  
Travis C. Glenn

The song sparrow, Melospiza melodia, is one of the most widely distributed species of songbirds found in North America. It has been used in a wide range of behavioral and ecological studies. This species’ pronounced morphological and behavioral diversity across populations makes it a favorable candidate in several areas of biomedical research. We have generated a high-quality de novo genome assembly of M. melodia using Illumina short read sequences from genomic and in vitro proximity-ligation libraries. The assembled genome is 978.3 Mb, with a physical coverage of 24.9×, N50 scaffold size of 5.6 Mb and N50 contig size of 31.7 Kb. Our genome assembly is highly complete, with 87.5% full-length genes present out of a set of 4,915 universal single-copy orthologs present in most avian genomes. We annotated our genome assembly and constructed 15,086 gene models, a majority of which have high homology to related birds, Taeniopygia guttata and Junco hyemalis. In total, 83% of the annotated genes are assigned putative functions. Furthermore, only ∼7% of the genome is found to be repetitive; these regions and other non-coding functional regions are also identified. The high-quality M. melodia genome assembly and annotations we report will serve as a valuable resource for facilitating studies on genome structure and evolution that can contribute to biomedical research and serve as a reference in population genomic and comparative genomic studies of closely related species.
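
The N50 values quoted in this and the following abstracts summarize assembly contiguity: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The sketch below illustrates that standard definition only. The function name `n50` and the toy scaffold lengths are assumptions made for illustration, and none of it is taken from the authors' pipelines.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Illustrative helper (hypothetical name), not code from any of the
    listed papers.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length until
    # at least half of the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy scaffold lengths (made-up numbers, arbitrary units).
toy_scaffolds = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_scaffolds)
```

With the toy list above, the three longest scaffolds (10, 9 and 8) sum to 27, which is exactly half of the total of 54, so the function returns 8. Applied to contig lengths instead of scaffold lengths, the same function would give a contig N50.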


Gigabyte ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Julia Voelker ◽  
Mervyn Shepherd ◽  
Ramil Mauleon

The economically important Melaleuca alternifolia (tea tree) is the source of a terpene-rich essential oil with therapeutic and cosmetic uses around the world. Tea tree has been cultivated and bred in Australia since the 1990s. It has been extensively studied for the genetics and biochemistry of terpene biosynthesis. Here, we report a high-quality de novo genome assembly using Pacific Biosciences and Illumina sequencing. The genome was assembled into 3128 scaffolds with a total length of 362 Mb (N50 = 1.9 Mb), with significantly higher contiguity than a previous assembly (N50 = 8.7 Kb). Using a homology-based, RNA-seq evidence-based and ab initio prediction approach, 37,226 protein-coding genes were predicted. Genome assembly and annotation exhibited high completeness scores of 98.1% and 89.4%, respectively. Sequence contiguity was sufficient to reveal extensive gene order conservation and chromosomal rearrangements in alignments with Eucalyptus grandis and Corymbia citriodora genomes. This new genome advances currently available resources to investigate the genome structure and gene family evolution of M. alternifolia. It will enable further comparative genomic studies in Myrtaceae to elucidate the genetic foundations of economically valuable traits in this crop.


2019 ◽  
Author(s):  
Arnab Ghosh ◽  
Matthew G. Johnson ◽  
Austin B. Osmanski ◽  
Swarnali Louha ◽  
Natalia J. Bayona-Vásquez ◽  
...  

Crocodilians are an economically, culturally, and biologically important group. To improve researchers’ ability to study genome structure, evolution, and gene regulation in the clade, we generated a high-quality de novo genome assembly of the saltwater crocodile, Crocodylus porosus, from Illumina short read data from genomic libraries and in vitro proximity-ligation libraries. The assembled genome is 2,123.5 Mb, with N50 scaffold size of 17.7 Mb and N90 scaffold size of 3.8 Mb. We then annotated this new assembly, increasing the number of annotated genes by 74%. In total, 96% of 23,242 annotated genes were associated with a functional protein domain. Furthermore, multiple non-coding functional regions and mappable genetic markers were identified. Upon analysis and overlapping the results of branch length estimation and site selection tests for detecting potential selection, we found 16 putative genes under positive selection in crocodilians, ten in C. porosus and six in Alligator mississippiensis. The annotated C. porosus genome will serve as an important platform for osmoregulatory, physiological and sex determination studies, as well as an important reference in investigating the phylogenetic relationships of crocodilians, birds, and other tetrapods.


2021 ◽  
Author(s):  
Xinxin Yi ◽  
Jing Liu ◽  
Shengcai Chen ◽  
Hao Wu ◽  
Min Liu ◽  
...  

Cultivated soybean (Glycine max) is an important source of protein and oil. Many elite cultivars with different traits have been developed for different conditions. Each soybean strain has its own genetic diversity, and the availability of more high-quality soybean genomes can enhance comparative genomic analysis for identifying the genetic underpinnings of its unique traits. In this study, we constructed a high-quality de novo assembly of an elite soybean cultivar Jidou 17 (JD17) with chromosome contiguity and high accuracy. We annotated 52,840 gene models and reconstructed 74,054 high-quality full-length transcripts. We performed a genome-wide comparative analysis based on the reference genome of JD17 with three published soybeans (WM82, ZH13 and W05), which identified five large inversions and two large translocations specific to JD17, 20,984–46,912 PAVs spanning 13.1–46.9 Mb in size, and 5–53 large PAV clusters larger than 500 kb. Between JD17 and these three genomes, 1,695,741–3,664,629 SNPs and 446,689–800,489 indels were identified and annotated. Symbiotic nitrogen fixation (SNF) genes were identified and the effects from these variants were further evaluated. It was found that the coding sequences of 9 nitrogen fixation-related genes were greatly affected. The high-quality genome assembly of JD17 can serve as a valuable reference for soybean functional genomics research.


2019 ◽  
Vol 12 (1) ◽  
pp. 3635-3646 ◽  
Author(s):  
Arnab Ghosh ◽  
Matthew G Johnson ◽  
Austin B Osmanski ◽  
Swarnali Louha ◽  
Natalia J Bayona-Vásquez ◽  
...  

Crocodilians are an economically, culturally, and biologically important group. To improve researchers’ ability to study genome structure, evolution, and gene regulation in the clade, we generated a high-quality de novo genome assembly of the saltwater crocodile, Crocodylus porosus, from Illumina short read data from genomic libraries and in vitro proximity-ligation libraries. The assembled genome is 2,123.5 Mb, with N50 scaffold size of 17.7 Mb and N90 scaffold size of 3.8 Mb. We then annotated this new assembly, increasing the number of annotated genes by 74%. In total, 96% of 23,242 annotated genes were associated with a functional protein domain. Furthermore, multiple noncoding functional regions and mappable genetic markers were identified. Upon analysis and overlapping the results of branch length estimation and site selection tests for detecting potential selection, we found 16 putative genes under positive selection in crocodilians, 10 in C. porosus and 6 in Alligator mississippiensis. The annotated C. porosus genome will serve as an important platform for osmoregulatory, physiological, and sex determination studies, as well as an important reference in investigating the phylogenetic relationships of crocodilians, birds, and other tetrapods.


2020 ◽  
Author(s):  
Xinxin Yi ◽  
Jing Liu ◽  
Shengcai Chen ◽  
Hao Wu ◽  
Min Liu ◽  
...  

Background: Cultivated soybean (Glycine max) is an important source of protein and oil. Each soybean strain has its own genetic diversity, and the availability of more soybean genomes may enhance comparative genomic analysis of soybean. Results: In this study, we constructed a high-quality de novo assembly of an elite soybean cultivar Jidou 17 (JD17) with high contiguity, completeness, and accuracy. We annotated 59,629 gene models and reconstructed 235,109 high-quality full-length transcripts. We have molecularly characterized the genotypes of some important agronomic traits of JD17 by taking advantage of these newly established genomic resources. Conclusions: We reported a high-quality genome and annotations of a wide range of cultivars, and used them to analyze the genotypes of genes related to important agronomic traits of soybean in JD17. We have demonstrated that a high-quality genome assembly can serve as a valuable reference for the soybean genomics and breeding research community.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Fei Chen ◽  
Liyao Su ◽  
Shuaiya Hu ◽  
Jia-Yu Xue ◽  
Hui Liu ◽  
...  

Rosa rugosa, commonly known as rugged rose, is a perennial ornamental shrub. It produces beautiful flowers with a mild fragrance and colorful seed pods. Unlike many other cultivated roses, R. rugosa adapts to a wide range of habitat types and harsh environmental conditions such as salinity, alkalinity, shade, drought, high humidity, and frigid temperatures. Here, we produced and analyzed a high-quality genome sequence for R. rugosa to understand its ecology, floral characteristics and evolution. PacBio HiFi reads were initially used to construct the draft genome of R. rugosa, and then Hi-C sequencing was applied to assemble the contigs into 7 chromosomes. We obtained a 382.6 Mb genome encoding 39,704 protein-coding genes. The genome of R. rugosa appears to be conserved with no additional whole-genome duplication after the gamma whole-genome triplication (WGT), which occurred ~100 million years ago in the ancestor of core eudicots. Based on a comparative analysis of the high-quality genome assembly of R. rugosa and other high-quality Rosaceae genomes, we found a unique large inverted segment in the Chinese rose R. chinensis and a retroposition in strawberry caused by post-WGT events. We also found that floral development- and stress response signaling-related gene modules were retained after the WGT. Two MADS-box genes involved in floral development and the stress-related transcription factors DREB2A-INTERACTING PROTEIN 2 (DRIP2) and PEPTIDE TRANSPORTER 3 (PTR3) were found to be positively selected in evolution, which may have contributed to the unique ability of this plant to adapt to harsh environments. In summary, the high-quality genome sequence of R. rugosa provides a map for genetic studies and molecular breeding of this plant and enables comparative genomic studies of Rosa in the near future.


Science ◽  
2018 ◽  
Vol 362 (6415) ◽  
pp. 705-709 ◽  
Author(s):  
Hao Shen ◽  
Jorge A. Fallas ◽  
Eric Lynch ◽  
William Sheffler ◽  
Bradley Parry ◽  
...  

We describe a general computational approach to designing self-assembling helical filaments from monomeric proteins and use this approach to design proteins that assemble into micrometer-scale filaments with a wide range of geometries in vivo and in vitro. Cryo–electron microscopy structures of six designs are close to the computational design models. The filament building blocks are idealized repeat proteins, and thus the diameter of the filaments can be systematically tuned by varying the number of repeat units. The assembly and disassembly of the filaments can be controlled by engineered anchor and capping units built from monomers lacking one of the interaction surfaces. The ability to generate dynamic, highly ordered structures that span micrometers from protein monomers opens up possibilities for the fabrication of new multiscale metamaterials.


Author(s):  
Valentina Peona ◽  
Mozes P.K. Blom ◽  
Luohao Xu ◽  
Reto Burri ◽  
Shawn Sullivan ◽  
...  

Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies have opened up a whole new world of genomic biodiversity. Although these technologies generate high-quality genome assemblies, some genomic regions remain difficult to assemble, like repetitive elements and GC-rich regions (genomic “dark matter”). In this study, we compare the efficiency of currently used sequencing technologies (short/linked/long reads and proximity ligation maps) and combinations thereof in assembling genomic dark matter starting from the same sample. By adopting different de novo assembly strategies, we were able to compare each individual draft assembly to a curated multiplatform one and identify the nature of the previously missing dark matter with a particular focus on transposable elements, multi-copy MHC genes, and GC-rich regions. Thanks to this multiplatform approach, we demonstrate the feasibility of producing a high-quality chromosome-level assembly for a non-model organism (paradise crow) for which only suboptimal samples are available. Our approach was able to reconstruct complex chromosomes like the repeat-rich W sex chromosome and several GC-rich microchromosomes. Telomere-to-telomere assemblies are not a reality yet for most organisms, but by leveraging technology choice it is possible to minimize genome assembly gaps for downstream analysis. We provide a roadmap to tailor sequencing projects around the completeness of both the coding and non-coding parts of the genomes.


2019 ◽  
Author(s):  
Kenta Shirasawa ◽  
Akifumi Azuma ◽  
Fumiya Taniguchi ◽  
Toshiya Yamamoto ◽  
Akihiko Sato ◽  
...  

This study presents the first genome sequence of an interspecific grape hybrid, ‘Shine Muscat’ (Vitis labruscana × V. vinifera), an elite table grape cultivar bred in Japan. The complexity of the genome structure, arising from the interspecific hybridization, necessitated the use of a sophisticated genome assembly pipeline with short-read genome sequence data. The resultant genome assemblies consisted of two types of sequences: a haplotype-phased sequence of the highly heterozygous genomes and an unphased sequence representing a “haploid” genome. The unphased sequences spanned 490.1 Mb in length, 99.4% of the estimated genome size, with 8,696 scaffold sequences with an N50 length of 13.2 Mb. The phased sequences had 15,650 scaffolds spanning 1.0 Gb with N50 of 4.2 Mb. The two sequences comprised 94.7% and 96.3% of the core eukaryotic genes, indicating that the entire genome of ‘Shine Muscat’ was represented. Examination of genome structures revealed possible genome rearrangements between the genomes of ‘Shine Muscat’ and a V. vinifera line. Furthermore, full-length transcriptome sequencing analysis revealed 13,947 gene loci on the ‘Shine Muscat’ genome, from which 26,199 transcript isoforms were transcribed. These genome resources provide new insights that could help cultivation and breeding strategies produce more high-quality table grapes such as ‘Shine Muscat’.

