Chromosome-Level Genome Assemblies Expand Capabilities of Genomics for Conservation Biology

Genome assemblies are becoming an increasingly important tool for understanding genetic diversity in threatened species. Unfortunately, because budgets in conservation biology are typically limited, genome assemblies of threatened species, when available, tend to be highly fragmented, consisting of tens of thousands of scaffolds not assigned to chromosomal locations. The recent advent of high-throughput chromosome conformation capture (Hi-C) enables more contiguous assemblies, with scaffolds spanning the length of entire chromosomes, for little additional cost. These inexpensive contiguous assemblies can be generated by Hi-C scaffolding of existing short-read draft assemblies in which the N50 of the draft contigs is larger than 0.1% of the estimated genome size. They can greatly improve analyses and facilitate visualization of genome-wide features, including the distribution of genetic diversity in markers along chromosomes or chromosome-length scaffolds. We compared the distribution of genetic diversity along chromosomes of eight mammalian species, including six listed as threatened by the IUCN, for which both draft genome assemblies and newer chromosome-level assemblies were available. The chromosome-level assemblies showed marked improvement in the localization and visualization of genetic diversity, especially where low heterozygosity was not uniformly distributed across the genomes of threatened species.
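
The contiguity criterion stated in the abstract (draft contig N50 larger than 0.1% of the estimated genome size) can be checked with a few lines of Python. This is only a minimal sketch: the function names, the toy contig lengths and the 2.5 Gb genome size are illustrative assumptions, not values from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


def meets_n50_criterion(lengths, genome_size, min_fraction=0.001):
    """Check whether contig N50 exceeds 0.1% of the estimated genome size."""
    return n50(lengths) > min_fraction * genome_size


# Toy example (hypothetical numbers, not data from the paper).
contigs = [4_000_000, 3_000_000, 2_000_000, 500_000, 500_000]
print(n50(contigs))                                          # 3000000
print(meets_n50_criterion(contigs, genome_size=2_500_000_000))  # True
```

In practice the contig lengths would be read from the draft assembly (for example from a FASTA index), and the genome size estimate would come from an independent source such as k-mer analysis or flow cytometry.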

Chromosome-Level Genome Assemblies: Expanded Capabilities for Conservation Biology Research

Proceedings ◽

10.3390/iecge-07149 ◽

2020 ◽

Vol 76 (1) ◽

pp. 10

Author(s):

Azamat Totikov ◽

Andrey Tomarovsky ◽

Lorena Derezanin ◽

Olga Dudchenko ◽

Erez Lieberman-Aiden ◽

...

Keyword(s):

Genetic Diversity ◽

High Throughput ◽

Conservation Biology ◽

Threatened Species ◽

Draft Genome ◽

Chromosome Conformation ◽

Genome Wide ◽

Genome Assemblies ◽

Biology Research ◽

Chromosome Level

Genome assemblies are becoming increasingly important for understanding genetic diversity in threatened species. However, because of limited budgets in conservation biology, genome assemblies, when available, tend to be highly fragmented, with tens of thousands of scaffolds. The recent advent of high-throughput chromosome conformation capture (Hi-C) makes it possible to generate more contiguous assemblies containing scaffolds that span the length of entire chromosomes. Such assemblies greatly facilitate analysis and visualization of genome-wide features. We compared genetic diversity in seven threatened species for which both draft genome assemblies and newer chromosome-level assemblies were available. Chromosome-level assemblies allowed better estimation and localization of genetic diversity and, especially, better visualization of low-heterozygosity regions in the genomes.
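
One common way to localize low-heterozygosity regions along a chromosome-length scaffold is to count heterozygous sites in fixed, non-overlapping windows. The sketch below assumes 1-based positions of heterozygous sites for a single scaffold are already available (for example, extracted from a variant call file); the function name, the 1 Mb default window and the toy positions are assumptions for illustration and do not reproduce the authors' pipeline.

```python
from collections import Counter


def windowed_heterozygosity(het_positions, chrom_length, window=1_000_000):
    """Heterozygous sites per kb in non-overlapping windows along one scaffold.

    het_positions: iterable of 1-based positions of heterozygous sites.
    chrom_length:  scaffold length in bp.
    window:        window size in bp (the last window may be shorter).
    """
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = Counter((pos - 1) // window for pos in het_positions)
    densities = []
    for i in range(n_windows):
        start = i * window
        end = min(start + window, chrom_length)
        window_kb = (end - start) / 1000
        densities.append(counts.get(i, 0) / window_kb)
    return densities


# Toy example: a 2.5 Mb scaffold with five heterozygous sites.
sites = [10, 500_000, 999_999, 1_000_001, 2_400_000]
print(windowed_heterozygosity(sites, chrom_length=2_500_000))
```

With a fragmented draft assembly, most scaffolds are shorter than a single window, which is one reason such density tracks only become informative once scaffolds approach chromosome length.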

Sea anemone genomes reveal ancestral metazoan chromosomal macrosynteny

10.21203/rs.3.rs-796229/v1 ◽

2021 ◽

Author(s):

Ulrich Technau ◽

Sophia Robb ◽

Grigory Genikhovich ◽

Juan Montenegro ◽

Witney Fropf ◽

...

Keyword(s):

Gene Regulation ◽

Draft Genome ◽

Gene Clusters ◽

Sea Anemone ◽

Gene Repertoire ◽

Sea Anemones ◽

Animal Evolution ◽

Topologically Associated Domains ◽

Genome Assemblies ◽

Chromosome Level

Abstract Draft genome sequences of non-bilaterian species have provided important insights into the evolution of the metazoan gene repertoire. However, there is little information about the evolution of gene clusters, genome architectures and karyotypes during animal evolution. Here we report chromosome-level genome assemblies of two related anthozoan cnidarians, the sea anemones Nematostella vectensis and Scolanthus callimorphus. We find a robust set of 15 chromosomes with a clear one-to-one correspondence of the chromosomes between the two species. We show that, in contrast to Bilateria, Hox and NK clusters of investigated cnidarians are disintegrated, indicating that microsynteny conservation is largely lost. In line with that, we find no evidence for topologically associated domains, suggesting a fundamental difference in long-range gene regulation compared to vertebrates. However, both sea anemone genomes show remarkable chromosomal conservation with other cnidarians, several bilaterians and the sponge Ephydatia muelleri, allowing us to reconstruct the putative cnidarian and metazoan chromosomes, consisting of 19 and 16 ancestral linkage groups, respectively. These data suggest that large parts of the ancestral metazoan genome have been retained in chromosomes of some extant lineages, yet higher-order gene regulation may have evolved only after the cnidarian-bilaterian split.
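
A one-to-one chromosome correspondence of the kind described above can be illustrated by counting shared orthologs per chromosome pair and keeping reciprocal best matches. The following is a simplified sketch, not the method used in the study: the function name and the chromosome labels (`Nv1`, `Sc1`, ...) are hypothetical, and ties are resolved in favour of the pair encountered first.

```python
from collections import Counter


def reciprocal_best_chromosomes(ortholog_pairs):
    """Map chromosomes of species A to species B by reciprocal best ortholog count.

    ortholog_pairs: iterable of (chrom_in_A, chrom_in_B), one tuple per
                    one-to-one ortholog pair.
    Returns {chrom_in_A: chrom_in_B} for reciprocal best matches only.
    """
    counts = Counter(ortholog_pairs)
    best_for_a = {}  # chrom in A -> (best chrom in B, ortholog count)
    best_for_b = {}  # chrom in B -> (best chrom in A, ortholog count)
    for (a, b), n in counts.items():
        if n > best_for_a.get(a, (None, 0))[1]:
            best_for_a[a] = (b, n)
        if n > best_for_b.get(b, (None, 0))[1]:
            best_for_b[b] = (a, n)
    return {a: b for a, (b, _) in best_for_a.items() if best_for_b[b][0] == a}


# Toy example with hypothetical chromosome names.
pairs = ([("Nv1", "Sc1")] * 5 + [("Nv1", "Sc2")]
         + [("Nv2", "Sc2")] * 4 + [("Nv2", "Sc1")])
print(reciprocal_best_chromosomes(pairs))  # {'Nv1': 'Sc1', 'Nv2': 'Sc2'}
```

A real macrosynteny analysis would additionally test whether the ortholog counts are significantly enriched relative to chance, which this sketch omits.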

Practical guide for obtaining and validating chromosome-scale genome assemblies with Hi-C scaffolding

10.22541/au.160968142.22103947/v1 ◽

2021 ◽

Author(s):

Kazuaki Yamaguchi ◽

Mitsutaka Kadota ◽

Osamu Nishimura ◽

Yuta Ohishi ◽

Yuki Naito ◽

...

Keyword(s):

Single Copy ◽

Test Case ◽

Range Interaction ◽

Chromosome Conformation ◽

Genome Sequence Assembly ◽

Genome Wide ◽

Reptile Species ◽

Genome Assemblies ◽

Massive Information ◽

Completeness Assessment

Recent ecological studies have been fueled by the wealth of information available from chromosome-scale genome sequences, even for species whose genetic linkage was previously inaccessible. This was enabled mainly by the application of Hi-C, a method for genome-wide chromosome conformation capture that was originally developed for investigating long-range chromatin interactions. Genome scaffolding with Hi-C data is highly resource-demanding: it involves elaborate laboratory steps for sequencing sample preparation, construction of a primary genome sequence assembly as input, and computation for scaffolding with the Hi-C data, followed by careful validation. This article summarizes existing solutions for these steps and provides a test case of their application to a reptile species, the Madagascar ground gecko (Paroedura picta). Among the metrics frequently used for evaluating scaffolding results, we investigate the validity of completeness assessment using single-copy reference orthologs and report problems with the widely used pipeline BUSCO.

Comparative analysis of corrected tiger genome provides clues to its neuronal evolution

Scientific Reports ◽

10.1038/s41598-019-54838-z ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 3

Author(s):

Parul Mittal ◽

Shubham K. Jaiswal ◽

Nagarjun Vijay ◽

Rituja Saxena ◽

Vineet K. Sharma

Keyword(s):

Neuronal Development ◽

Mammalian Species ◽

Draft Genome ◽

Evolutionary Analysis ◽

Coding Region ◽

Genetic Changes ◽

Gene Sets ◽

Development Processes ◽

Neuronal Functions ◽

Genome Assemblies

Abstract The availability of completed and draft genome assemblies of tiger, leopard, and other felids provides an opportunity to gain comparative insights into their unique evolutionary adaptations. However, genome-wide comparative analyses are susceptible to errors in genome sequences and thus require accurate genome assemblies for reliable evolutionary insights. In this study, while analyzing the tiger genome, we found almost one million erroneous substitutions in the coding and non-coding regions of the genome, affecting 4,472 genes and hence biasing the current understanding of tiger evolution. Moreover, these errors produced several misleading observations in previous studies. Thus, to gain insights into tiger evolution, we corrected the erroneous bases in the genome assembly and gene set of tiger using the ‘SeqBug’ approach developed in this study. We sequenced the first Bengal tiger genome and transcriptome from India to validate these corrections. A comprehensive evolutionary analysis was performed using 10,920 orthologs from nine mammalian species, including the corrected gene sets of tiger and leopard, and using five different methods at three hierarchical levels, i.e. felids, Panthera, and tiger. The unique genetic changes in tiger revealed that the genes showing signatures of adaptation in tiger were enriched in development and neuronal functioning. Specifically, the genes belonging to the Notch signalling pathway, which is among the most conserved pathways involved in embryonic and neuronal development, were found to have significantly diverged in tiger in comparison to the other mammals. Our findings suggest a role of adaptive evolution in neuronal functions and development processes, which correlates well with the presence of exceptional traits such as sensory perception, strong neuro-muscular coordination, and hypercarnivorous behaviour in tiger.

Whole-Genome Assemblies for Three Yersinia pestis Strains Isolated in Erenhot, China

Microbiology Resource Announcements ◽

10.1128/mra.01084-20 ◽

2020 ◽

Vol 9 (45) ◽

Author(s):

Jing Wang ◽

Xifeng Yang ◽

Hongyuan Zheng ◽

Li Tian ◽

Qi Shi ◽

...

Keyword(s):

Genetic Diversity ◽

Yersinia Pestis ◽

Draft Genome ◽

Meriones Unguiculatus ◽

Whole Genome ◽

Genome Sequences ◽

Content Type ◽

Genome Assemblies

ABSTRACT To explore the genetic diversity of Yersinia pestis strains in Erenhot, China, and their relationship with Mongolian strains, we collected and sequenced three Y. pestis strains from Erenhot, China, in 2018. Here, we report the draft genome sequences of three Y. pestis bv. Medievalis strains belonging to the 2.MED phylogroup that were circulating in Meriones unguiculatus populations.

Genome Wide SSR Development and Their Application in Genetic Diversity Analysis in Wax Gourd

10.21203/rs.3.rs-147921/v1 ◽

2021 ◽

Author(s):

Qianmei Hu ◽

Haiping Wang ◽

Biao Jiang ◽

Huayu Zhu ◽

Xiaoming He ◽

...

Keyword(s):

Genetic Diversity ◽

Population Structure ◽

Ssr Markers ◽

Molecular Mapping ◽

Draft Genome ◽

Diversity Analysis ◽

Genetic Diversity Analysis ◽

Genome Wide ◽

Benincasa Hispida ◽

Wax Gourd

Abstract Background: Wax gourd (Benincasa hispida Cong., 2n=2x=24) is one of the most important winter vegetables of the Cucurbitaceae family. Only limited markers are available for this crop, and the draft genome of wax gourd provides a powerful tool for SSR marker development. Results: In this study, we developed genome-wide SSR markers from the wax gourd genome and characterized the distribution and frequency of different motifs and repeats. A total of 52,431 microsatellites were identified in the wax gourd genome, from which 39,319 SSR markers were developed. To test transferability to wax gourd, 1,152 SSR markers from other cucurbits (cucumber, melon, watermelon and pumpkin) were selected. Of these, 580 were transferable to wax gourd, and 42 of them were polymorphic in 11 tested wax gourd accessions. In addition, 11 good polymorphic transferable SSR markers and 21 wax gourd SSR markers were selected to investigate the genetic diversity and population structure of 129 wax gourd accessions. These 32 SSR markers detected 112 alleles. Population structure analysis divided the 129 wax gourd accessions into two main populations, and the genetic diversity analysis separated them into two clusters. Conclusions: The large number of wax gourd SSR markers developed in this study provides a valuable resource for genetic linkage map construction, molecular mapping, and marker-assisted selection (MAS) in wax gourd.

Genome-wide characterization and analysis of microsatellite sequences in camelid species

Mammal Research ◽

10.1007/s13364-019-00458-x ◽

2019 ◽

Vol 65 (2) ◽

pp. 359-373

Author(s):

Manee M. Manee ◽

Abdulmalek T. Algarni ◽

Sultan N. Alharbi ◽

Badr M. Al-Shomrani ◽

Mohanad A. Ibrahim ◽

...

Keyword(s):

Draft Genome ◽

Gc Content ◽

Paternity Testing ◽

Breeding Programs ◽

Coding Regions ◽

Systemic Analysis ◽

Genome Wide ◽

Vicugna Pacos ◽

Arabian Camel ◽

Genome Assemblies

Abstract Microsatellites, or simple sequence repeats (SSRs), are among the most widely used genetic markers in research, with applications in fields such as genetic conservation, paternity testing, and molecular breeding. Although ordered draft genome assemblies of camels have been announced, including one for the Arabian camel, systematic analysis of camel SSRs is still limited. The identification and development of informative and robust molecular SSR markers are essential for marker-assisted breeding programs and paternity testing. Here we searched for and compared perfect SSRs with 1–6 bp nucleotide motifs to characterize microsatellites in draft genome sequences of the Camelidae. We analyzed and compared the occurrence, relative abundance, relative density, and guanine-cytosine (GC) content in four taxonomically different camelid species: Camelus dromedarius, C. bactrianus, C. ferus, and Vicugna pacos. A total of 546762, 544494, 547974, and 437815 SSRs were mined, respectively. Mononucleotide SSRs were the most frequent in the four genomes, followed in descending order by di-, tetra-, tri-, penta-, and hexanucleotide SSRs. GC content was highest in dinucleotide SSRs and lowest in mononucleotide SSRs. Our results provide further evidence that SSRs are more abundant in noncoding regions than in coding regions. Similar distributions of microsatellites were found in all four species, which indicates that the pattern of microsatellites is conserved in the family Camelidae.
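
Mining perfect SSRs with 1–6 bp motifs and summarizing them can be sketched with regular expressions. Several things here are assumptions rather than details from the abstract: the minimum repeat numbers in `MIN_REPEATS`, the definitions of relative abundance (SSRs per Mb of sequence) and relative density (SSR base pairs per Mb), and all function names. Dedicated SSR-mining tools handle overlapping and compound repeats more carefully than this sketch does.

```python
import re

# Assumed minimum number of repeat units per motif length (not from the paper).
MIN_REPEATS = {1: 12, 2: 7, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_perfect_ssrs(seq):
    """Return sorted (start, end, motif, n_repeats) tuples, 0-based half-open."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # A motif of `size` bases followed by at least min_rep - 1 exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. 'AA' runs already found as 'A'
                hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // size))
    return sorted(hits)


def summarize_ssrs(seq):
    """Count, per-Mb abundance, per-Mb density and GC fraction of SSRs in seq."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    hits = find_perfect_ssrs(seq)
    mb = len(seq) / 1_000_000
    ssr_bp = sum(end - start for start, end, _, _ in hits)
    gc_bp = sum(seq[s:e].count("G") + seq[s:e].count("C") for s, e, _, _ in hits)
    return {
        "count": len(hits),
        "abundance_per_mb": len(hits) / mb,
        "density_bp_per_mb": ssr_bp / mb,
        "gc_fraction": gc_bp / ssr_bp if ssr_bp else 0.0,
    }


# Toy sequence containing one mono-, one di- and one trinucleotide SSR.
toy = "GGCT" + "A" * 12 + "CCG" + "AT" * 7 + "GTT" + "AGC" * 5 + "T"
for hit in find_perfect_ssrs(toy):
    print(hit)
print(summarize_ssrs(toy)["count"])  # 3
```

Because the thresholds strongly affect the counts, totals from a sketch like this are comparable across species only when the same thresholds are applied to every genome.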

HiTea: a computational pipeline to identify non-reference transposable element insertions in Hi-C data

Bioinformatics ◽

10.1093/bioinformatics/btaa923 ◽

2020 ◽

Author(s):

Dhawal Jain ◽

Chong Chu ◽

Burak Han Alver ◽

Soohyun Lee ◽

Eunjung Alice Lee ◽

...

Keyword(s):

Transposable Element ◽

Large Scale ◽

Human Cell Line ◽

Chromosome Length ◽

Supplementary Information ◽

Structural Variations ◽

Range Interaction ◽

Genome Wide ◽

Common Technique ◽

Genome Assemblies

ABSTRACT Hi-C is a common technique for assessing 3D chromatin conformation. Recent studies have shown that long-range interaction information in Hi-C data can be used to generate chromosome-length genome assemblies and identify large-scale structural variations. Here, we demonstrate the use of Hi-C data in detecting mobile transposable element (TE) insertions genome-wide. Our pipeline Hi-C-based TE analyzer (HiTea) capitalizes on clipped Hi-C reads and is aided by a high proportion of discordant read pairs in Hi-C data to detect insertions of three major families of active human TEs. Despite the uneven genome coverage in Hi-C data, HiTea is competitive with the existing callers based on whole-genome sequencing (WGS) data and can supplement the WGS-based characterization of the TE-insertion landscape. We employ the pipeline to identify TE-insertions from human cell-line Hi-C samples. Availability and implementation: HiTea is available at https://github.com/parklab/HiTea and as a Docker image. Supplementary information: Supplementary data are available at Bioinformatics online.
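
Clipped reads, which the abstract names as one of the signals used, can be recognized from the CIGAR string of a SAM alignment record, where `S` marks soft-clipped bases at either end of a read. The sketch below only illustrates that idea; it is not HiTea's implementation, and the function name and example CIGAR strings are made up.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar):
    """Return (left, right) soft-clip lengths for a SAM CIGAR string.

    Hard clips (H) are ignored, so '5H10S80M' reports a left clip of 10.
    An unavailable CIGAR ('*') yields (0, 0).
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return (0, 0)
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return (left, right)


print(soft_clips("30S70M"))      # (30, 0)
print(soft_clips("5H10S80M5S"))  # (10, 5)
```

In Hi-C data many clipped reads arise from ligation junctions rather than from insertions, so a caller would need further evidence before interpreting a clip as a TE insertion breakpoint.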

Scaffolding of long read assemblies using long range contact information

10.1101/083964 ◽

2016 ◽

Cited By ~ 1

Author(s):

Jay Ghurye ◽

Mihai Pop ◽

Sergey Koren ◽

Chen-Shan Chin

Keyword(s):

De Novo ◽

Chromatin Interaction ◽

Interaction Data ◽

Computationally Efficient ◽

Short Read ◽

Contact Information ◽

Genome Wide ◽

Long Read ◽

Genome Assemblies ◽

Chromosome Level

Abstract Motivation: Long-read technologies have revolutionized de novo genome assembly by generating contigs that are orders of magnitude longer than those of short-read assemblies. Although assembly contiguity has increased, contigs usually still do not span a chromosome or a chromosome arm, leaving the chromosome-level assembly unfinished. To address this problem, we develop a scalable and computationally efficient scaffolding method that can greatly boost the contiguity of an assembly using genome-wide chromatin interaction data such as Hi-C. In particular, we demonstrate an algorithm that uses Hi-C data for longer-range scaffolding of de novo long-read genome assemblies. Results: We tested our method on two long-read assemblies of different organisms. We compared it with a previously developed method and show that our approach performs better in terms of scaffolding accuracy. Availability: The software is available for free use and can be downloaded from here: https://github.com/machinegun/[email protected]
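
The general idea of ordering contigs with long-range contact information can be illustrated with a deliberately simplified greedy procedure: score each contig pair by its Hi-C link count normalized by the product of the contig lengths, then join the best-scoring pairs into linear chains. This is not the algorithm described in the abstract. Orientation of contigs is ignored, the normalization is an assumption, and all names and numbers are hypothetical.

```python
def greedy_contig_order(contig_lengths, link_counts):
    """Chain contigs into scaffolds (ordering only) from pairwise Hi-C links.

    contig_lengths: {contig_name: length_bp}
    link_counts:    {(contig_a, contig_b): number_of_hic_read_pairs}
    Returns a list of scaffolds, each a list of contig names in order.
    """
    scored = sorted(
        ((n / (contig_lengths[a] * contig_lengths[b]), a, b)
         for (a, b), n in link_counts.items()),
        reverse=True,
    )

    # Union-find keeps joins from closing a chain into a cycle.
    parent = {c: c for c in contig_lengths}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    neighbours = {c: [] for c in contig_lengths}
    for _, a, b in scored:
        # Each contig may have at most two neighbours in a linear chain.
        if len(neighbours[a]) < 2 and len(neighbours[b]) < 2 and find(a) != find(b):
            neighbours[a].append(b)
            neighbours[b].append(a)
            parent[find(a)] = find(b)

    # Walk every chain starting from one of its end contigs.
    scaffolds, seen = [], set()
    for start in sorted(contig_lengths):
        if start in seen or len(neighbours[start]) == 2:
            continue
        path, prev, cur = [], None, start
        while cur is not None:
            path.append(cur)
            seen.add(cur)
            nxt = [n for n in neighbours[cur] if n != prev]
            prev, cur = cur, (nxt[0] if nxt else None)
        scaffolds.append(path)
    return scaffolds


# Toy example: three linked contigs and one contig without Hi-C links.
lengths = {"c1": 1000, "c2": 1000, "c3": 1000, "c4": 1000}
links = {("c1", "c2"): 50, ("c2", "c3"): 40, ("c1", "c3"): 5}
print(greedy_contig_order(lengths, links))  # [['c1', 'c2', 'c3'], ['c4']]
```

Production scaffolders also decide contig orientation and detect misassemblies, both of which this sketch leaves out.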

HiTea: a computational pipeline to identify non-reference transposable element insertions in Hi-C data

10.1101/2020.04.27.060145 ◽

2020 ◽

Author(s):

Dhawal Jain ◽

Chong Chu ◽

Burak Han Alver ◽

Soohyun Lee ◽

Eunjung Alice Lee ◽

...

Keyword(s):

Transposable Element ◽

Large Scale ◽

Three Dimensional ◽

Human Cell Line ◽

Chromosome Length ◽

Structural Variations ◽

Range Interaction ◽

Genome Wide ◽

Common Technique ◽

Genome Assemblies

Abstract Hi-C is a common technique for assessing three-dimensional chromatin conformation. Recent studies have shown that long-range interaction information in Hi-C data can be used to generate chromosome-length genome assemblies and identify large-scale structural variations. Here, we demonstrate the use of Hi-C data in detecting mobile transposable element (TE) insertions genome-wide. Our pipeline HiTea (Hi-C based Transposable element analyzer) capitalizes on clipped Hi-C reads and is aided by a high proportion of discordant read pairs in Hi-C data to detect insertions of three major families of active human TEs. Despite the uneven genome coverage in Hi-C data, HiTea is competitive with the existing callers based on whole genome sequencing (WGS) data and can supplement the WGS-based characterization of the TE insertion landscape. We employ the pipeline to identify TE insertions from human cell-line Hi-C samples. HiTea is available at https://github.com/parklab/HiTea and as a Docker image.
