Cytogenetic Mapping of 35 New Markers in the Alpaca (Vicugna pacos)

The alpaca is a camelid species of broad economic, biological and biomedical interest, and an essential part of the cultural and historical heritage of Peru. Recently, efforts have been made to improve knowledge of the alpaca genome, and its genetics and cytogenetics, to develop molecular tools for selection and breeding. Here, we report cytogenetic mapping of 35 new markers to 19 alpaca autosomes and the X chromosome. Twenty-eight markers represent alpaca SNPs, of which 17 are located inside or near protein-coding genes, two are in ncRNA genes and nine are intergenic. The remaining seven markers correspond to candidate genes for fiber characteristics (BMP4, COL1A2, GLI1, SFRP4), coat color (TYR) and development (CHD7, PAX7). The results take the tally of cytogenetically mapped markers in alpaca to 281, covering all 36 autosomes and the sex chromosomes. The new map assignments overall agree with human–camelid conserved synteny data, except for mapping BMP4 to VPA3, suggesting a hitherto unknown homology with HSA14. The findings validate, refine and correct the current alpaca assembly VicPac3.1 by anchoring unassigned sequence scaffolds, and ordering and orienting assigned scaffolds. The study contributes to the improvement of the alpaca reference genome and advances camelid molecular cytogenetics.

A Chromosome-Scale Genome Assembly Resource for Myriosclerotinia sulcatula Infecting Sedge Grass (Carex sp.)

Molecular Plant-Microbe Interactions ◽

10.1094/mpmi-03-20-0060-a ◽

2020 ◽

Vol 33 (7) ◽

pp. 880-883

Author(s):

Stefan Kusch ◽

Heba M. M. Ibrahim ◽

Catherine Zanchetta ◽

Celine Lopez-Roques ◽

Cecile Donnadieu ◽

...

Keyword(s):

Host Range ◽

Sclerotinia Sclerotiorum ◽

Genome Assembly ◽

Plant Pathogens ◽

Reference Genome ◽

Close Relative ◽

High Quality ◽

Protein Coding ◽

Protein Coding Genes ◽

Reference Genome Assembly

The fungus Myriosclerotinia sulcatula is a close relative of the notorious polyphagous plant pathogens Botrytis cinerea and Sclerotinia sclerotiorum but exhibits a host range restricted to plants from the Carex genus (Cyperaceae family). To date, there are no genomic resources available for fungi in the Myriosclerotinia genus. Here, we present a chromosome-scale reference genome assembly for M. sulcatula. The assembly contains 24 contigs with a total length of 43.53 Mbp, with scaffold N50 of 2,649.7 kbp and N90 of 1,133.1 kbp. BRAKER-predicted gene models were manually curated using WebApollo, resulting in 11,275 protein-coding genes that we functionally annotated. We provide a high-quality reference genome assembly and annotation for M. sulcatula as a resource for studying evolution and pathogenicity in fungi from the Sclerotiniaceae family.

The sequencing and de novo assembly of the Larimichthys crocea genome using PacBio and Hi-C technologies

Scientific Data ◽

10.1038/s41597-019-0194-3 ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 8

Author(s):

Baohua Chen ◽

Zhixiong Zhou ◽

Qiaozhen Ke ◽

Yidi Wu ◽

Huaqiang Bai ◽

...

Keyword(s):

Marine Fish ◽

Single Molecule ◽

Large Scale ◽

Reference Genome ◽

De Novo ◽

Larimichthys Crocea ◽

Chromosome Conformation ◽

Protein Coding ◽

Total Length ◽

Chromosome Level

Abstract Larimichthys crocea is an endemic marine fish in East Asia that belongs to Sciaenidae in Perciformes. L. crocea has now been recognized as an “iconic” marine fish species in China because it is not only a popular food fish but also a representative victim of overfishing that still provides high-value fish products supported by the modern large-scale mariculture industry. Here, we report a chromosome-level reference genome of L. crocea generated by employing the PacBio single molecule sequencing technique (SMRT) and high-throughput chromosome conformation capture (Hi-C) technologies. The genome sequences were assembled into 1,591 contigs with a total length of 723.86 Mb and a contig N50 length of 2.83 Mb. After chromosome-level scaffolding, 24 scaffolds were constructed with a total length of 668.67 Mb (92.48% of the total length). Genome annotation identified 23,657 protein-coding genes and 7,262 ncRNAs. This highly accurate, chromosome-level reference genome of L. crocea provides an essential genome resource to support the development of genome-scale selective breeding and restocking strategies of L. crocea.

Recovery of non-reference sequences missing from the human reference genome

BMC Genomics ◽

10.1186/s12864-019-6107-1 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 3

Author(s):

Ran Li ◽

Xiaomeng Tian ◽

Peng Yang ◽

Yingzhi Fan ◽

Ming Li ◽

...

Keyword(s):

Human Genome ◽

Tandem Repeats ◽

Reference Genome ◽

De Novo ◽

Precise Location ◽

Protein Coding ◽

Human Reference Genome ◽

Mhc Haplotype ◽

Reference Sequences ◽

Flanking Regions

Abstract Background The non-reference sequences (NRS) represent structural variations in the human genome with potential functional significance. However, besides the known insertions, it is currently unknown whether other types of structural variations with NRS exist. Results Here, we compared 31 human de novo assemblies with the current reference genome to identify the NRS and their locations. We resolved the precise location of 6113 NRS adding up to 12.8 Mb. Besides 1571 insertions, we detected 3041 alternate alleles, which were defined as having less than 90% (or no) identity with the reference alleles. These alternate alleles overlapped with 1143 protein-coding genes including a putative novel MHC haplotype. Further, we demonstrated that the alternate alleles and their flanking regions had a high content of tandem repeats, indicating that their origin was associated with tandem repeats. Conclusions Our study detected a large number of NRS, including many alternate alleles that were previously uncharacterized. We suggest that the origin of alternate alleles is associated with tandem repeats. Our results enrich the spectrum of genetic variations in the human genome.

GENCODE 2021

Nucleic Acids Research ◽

10.1093/nar/gkaa1087 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D916-D923

Author(s):

Adam Frankish ◽

Mark Diekhans ◽

Irwin Jungreis ◽

Julien Lagarde ◽

Jane E Loveland ◽

...

Keyword(s):

Reference Genome ◽

Ucsc Genome Browser ◽

Primary Data ◽

Protein Coding ◽

Bioinformatic Tools ◽

Automated Annotation ◽

First Pass ◽

Mouse Reference Genome ◽

Human And Mouse

Abstract The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.

De novo assembly of the cattle reference genome with single-molecule sequencing

GigaScience ◽

10.1093/gigascience/giaa021 ◽

2020 ◽

Vol 9 (3) ◽

Cited By ~ 35

Author(s):

Benjamin D Rosen ◽

Derek M Bickhart ◽

Robert D Schnabel ◽

Sergey Koren ◽

Christine G Elsik ◽

...

Keyword(s):

Single Molecule ◽

De Novo Assembly ◽

Reference Genome ◽

De Novo ◽

Bos Taurus ◽

Future Research ◽

Protein Coding ◽

Single Molecule Sequencing ◽

Assembly Accuracy ◽

Genomic Tools

Abstract Background Major advances in selection progress for cattle have been made following the introduction of genomic tools over the past 10–12 years. These tools depend upon the Bos taurus reference genome (UMD3.1.1), which was created using now-outdated technologies and is hindered by a variety of deficiencies and inaccuracies. Results We present the new reference genome for cattle, ARS-UCD1.2, based on the same animal as the original to facilitate transfer and interpretation of results obtained from the earlier version, but applying a combination of modern technologies in a de novo assembly to increase continuity, accuracy, and completeness. The assembly includes 2.7 Gb and is >250× more continuous than the original assembly, with contig N50 >25 Mb and L50 of 32. We also greatly expanded supporting RNA-based data for annotation that identifies 30,396 total genes (21,039 protein coding). The new reference assembly is accessible in annotated form for public use. Conclusions We demonstrate that improved continuity of assembled sequence warrants the adoption of ARS-UCD1.2 as the new cattle reference genome and that increased assembly accuracy will benefit future research on this species.
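
The contiguity figures quoted in the abstract above (contig N50 and L50) follow a standard definition: sort the contigs from longest to shortest and accumulate lengths until half of the total assembly size is reached; the length of the contig that crosses that point is the N50, and the number of contigs needed to get there is the L50. A minimal sketch of that calculation is given below. The function name `n50_l50` and the toy contig lengths are illustrative assumptions, not taken from the paper or from any assembly tool.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths.

    Hypothetical helper for illustration only; lengths are in any
    consistent unit (e.g. base pairs).
    """
    if not lengths:
        raise ValueError("at least one contig length is required")

    # Longest contigs first, so the running sum reaches half the
    # assembly size with as few contigs as possible.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2

    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # `length` is the N50; `count` is the L50.
            return length, count


# Toy example with made-up lengths (not data from the assembly):
# total = 30, half = 15; 10 + 8 = 18 crosses the halfway point.
print(n50_l50([10, 8, 6, 4, 2]))
```

Under this definition, a higher N50 together with a lower L50 indicates that fewer, longer contigs account for half of the assembly.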

Liftoff: accurate mapping of gene annotations

Bioinformatics ◽

10.1093/bioinformatics/btaa1016 ◽

2020 ◽

Author(s):

Alaina Shumate ◽

Steven L Salzberg

Keyword(s):

Reference Genome ◽

Supplementary Information ◽

Closely Related Species ◽

Protein Coding ◽

Human Reference Genome ◽

Sequence Identity ◽

Gene Annotations ◽

Genome Assemblies ◽

Average Sequence Identity ◽

High Quality Genome

Abstract Motivation Improvements in DNA sequencing technology and computational methods have led to a substantial increase in the creation of high-quality genome assemblies of many species. To understand the biology of these genomes, annotation of gene features and other functional elements is essential; however for most species, only the reference genome is well-annotated. Results One strategy to annotate new or improved genome assemblies is to map or ‘lift over’ the genes from a previously-annotated reference genome. Here we describe Liftoff, a new genome annotation lift-over tool capable of mapping genes between two assemblies of the same or closely-related species. Liftoff aligns genes from a reference genome to a target genome and finds the mapping that maximizes sequence identity while preserving the structure of each exon, transcript, and gene. We show that Liftoff can accurately map 99.9% of genes between two versions of the human reference genome with an average sequence identity >99.9%. We also show that Liftoff can map genes across species by successfully lifting over 98.3% of human protein-coding genes to a chimpanzee genome assembly with 98.2% sequence identity. Availability and Implementation Liftoff can be installed via bioconda and PyPI. Additionally, the source code for Liftoff is available at https://github.com/agshumate/Liftoff Supplementary information Supplementary data are available at Bioinformatics online.

A mini-atlas of gene expression for the domestic goat (Capra hircus) reveals transcriptional differences in immune signatures between sheep and goats

10.1101/711127 ◽

2019 ◽

Cited By ~ 2

Author(s):

Charity Muriuki ◽

Stephen J. Bush ◽

Mazdak Salavati ◽

Mary E.B. McCulloch ◽

Zofia M. Lisowski ◽

...

Keyword(s):

Gene Expression ◽

Reference Genome ◽

Expression Profiles ◽

Expression Patterns ◽

Small Ruminants ◽

Capra Hircus ◽

Specific Cell ◽

Domestic Goat ◽

Protein Coding ◽

Genomic Resources

Abstract Goats (Capra hircus) are an economically important livestock species providing meat and milk across the globe. They are of particular importance in tropical agri-systems contributing to sustainable agriculture, alleviation of poverty, social cohesion and utilisation of marginal grazing. There are excellent genetic and genomic resources available for goats, including a highly contiguous reference genome (ARS1). However, gene expression information is limited in comparison to other ruminants. To support functional annotation of the genome and comparative transcriptomics we created a mini-atlas of gene expression for the domestic goat. RNA-Seq analysis of 22 transcriptionally rich tissues and cell-types detected the majority (90%) of predicted protein-coding transcripts and assigned informative gene names to more than 1000 previously unannotated protein-coding genes in the current reference genome for goat (ARS1). Using network-based cluster analysis we grouped genes according to their expression patterns and assigned those groups of co-expressed genes to specific cell populations or pathways. We describe clusters of genes expressed in the gastro-intestinal tract and provide the expression profiles across tissues of a subset of genes associated with functional traits. Comparative analysis of the goat atlas with the larger sheep gene expression atlas dataset revealed transcriptional differences between the two species in macrophage-associated signatures. The goat transcriptomic resource complements the large gene expression dataset we have generated for sheep and contributes to the available genomic resources for interpretation of the relationship between genotype and phenotype in small ruminants.

Chromosome-scale assembly of wild barley accession ‘OUH602’

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab244 ◽

2021 ◽

Author(s):

Kazuhiro Sato ◽

Martin Mascher ◽

Axel Himmelbach ◽

Georg Haberer ◽

Manuel Spannagl ◽

...

Keyword(s):

Reference Genome ◽

Wild Barley ◽

Artificial Chromosome ◽

Gene Identification ◽

Barley Accession ◽

Genomic Information ◽

Protein Coding ◽

Fertile Crescent ◽

Tropical Areas ◽

Chromosome Level

Abstract Barley (Hordeum vulgare) was domesticated from its wild ancestral form ca. 10,000 years ago in the Fertile Crescent and is widely cultivated throughout the world, except in tropical areas. The genome size of both cultivated barley and its conspecific wild ancestor is approximately 5 Gb. High-quality chromosome-level assemblies of 19 cultivated and one wild barley genotype were recently established by pan-genome analysis. Here, we release another equivalent short-read assembly of the wild barley accession ‘OUH602’. A series of genetic and genomic resources were developed for this genotype in prior studies. Our assembly contains more than 4.4 Gb of sequence, with a scaffold N50 value of over 10 Mb. The haplotype shows high collinearity with the most recently updated barley reference genome, ‘Morex’ V3, with some inversions. Gene projections based on ‘Morex’ gene models revealed 46,807 protein-coding sequences and 43,375 protein-coding genes. Alignments to publicly available sequences of bacterial artificial chromosome (BAC) clones of ‘OUH602’ confirm the high accuracy of the assembly. Since more loci of interest have been identified in ‘OUH602’, the release of this assembly, with detailed genomic information, should accelerate gene identification and the utilization of this key wild barley accession.

Assembly and Annotation of an Ashkenazi Human Reference Genome

10.1101/2020.03.18.997395 ◽

2020 ◽

Cited By ~ 2

Author(s):

Alaina Shumate ◽

Aleksey V. Zimin ◽

Rachel M. Sherman ◽

Daniela Puiu ◽

Justin M. Wagner ◽

...

Keyword(s):

Dna Sequences ◽

Reference Genome ◽

Gene Families ◽

Gene Content ◽

Specific Reference ◽

Protein Coding ◽

Human Reference Genome ◽

Protein Coding Genes ◽

Reference Genomes ◽

Similar Gene

Abstract Here we describe the assembly and annotation of the genome of an Ashkenazi individual and the creation of a new, population-specific human reference genome. This genome is more contiguous and more complete than GRCh38, the latest version of the human reference genome, and is annotated with highly similar gene content. The Ashkenazi reference genome, Ash1, contains 2,973,118,650 nucleotides as compared to 2,937,639,212 in GRCh38. Annotation identified 20,157 protein-coding genes, of which 19,563 are >99% identical to their counterparts on GRCh38. Most of the remaining genes have small differences. 40 of the protein-coding genes in GRCh38 are missing from Ash1; however, all of these genes are members of multi-gene families for which Ash1 contains other copies. 11 genes appear on different chromosomes from their homologs in GRCh38. Alignment of DNA sequences from an unrelated Ashkenazi individual to Ash1 identified ~1 million fewer homozygous SNPs than alignment of those same sequences to the more-distant GRCh38 genome, illustrating one of the benefits of population-specific reference genomes.

A Reference Genome Sequence for Giant Sequoia

G3 Genes|Genome|Genetics ◽

10.1534/g3.120.401612 ◽

2020 ◽

Vol 10 (11) ◽

pp. 3907-3919

Author(s):

Alison D. Scott ◽

Aleksey V. Zimin ◽

Daniela Puiu ◽

Rachael Workman ◽

Monica Britton ◽

...

Keyword(s):

Sierra Nevada ◽

Genome Sequence ◽

Reference Genome ◽

Chromosome Conformation ◽

Protein Coding ◽

Reference Genome Sequence ◽

Oxford Nanopore ◽

Sierra Nevada Mountains ◽

Genomic Tools ◽

Giant Sequoia

The giant sequoia (Sequoiadendron giganteum) of California are massive, long-lived trees that grow along the U.S. Sierra Nevada mountains. Genomic data are limited in giant sequoia and producing a reference genome sequence has been an important goal to allow marker development for restoration and management. Using deep-coverage Illumina and Oxford Nanopore sequencing, combined with Dovetail chromosome conformation capture libraries, the genome was assembled into eleven chromosome-scale scaffolds containing 8.125 Gbp of sequence. Iso-Seq transcripts, assembled from three distinct tissues, were used as evidence to annotate a total of 41,632 protein-coding genes. The genome was found to contain, distributed unevenly across all 11 chromosomes and in 63 orthogroups, over 900 complete or partial predicted NLR genes, of which 375 are supported by annotation derived from protein evidence and gene modeling. This giant sequoia reference genome sequence represents the first genome sequenced in the Cupressaceae family, and lays a foundation for using genomic tools to aid in giant sequoia conservation and management.
