EquCab3, an Updated Reference Genome for the Domestic Horse

10.1101/306928 ◽

2018 ◽

Cited By ~ 9

Author(s):

Theodore S. Kalbfleisch ◽

Edward S. Rice ◽

Michael S. DePriest ◽

Brian P. Walenz ◽

Matthew S. Hestand ◽

...

Keyword(s):

Reference Genome ◽

Reference Sequence ◽

Large Animal ◽

Domestic Horse ◽

Sequencing Technology ◽

Proximity Ligation ◽

Genomics Research ◽

Long Read ◽

Solid Foundation ◽

Work Done

Abstract. EquCab2, a high-quality reference genome for the domestic horse, was released in 2007. Since then, it has served as the foundation for nearly all genomic work done in equids. Recent advances in genomic sequencing technology and computational assembly methods have allowed scientists to improve reference assemblies of large animal and plant genomes in terms of contiguity and composition. In 2014, the equine genomics research community began a project to improve the reference sequence for the horse, building upon the solid foundation of EquCab2 and incorporating new short-read data, long-read data, and proximity ligation data. The result, EquCab3, is presented here. The count of non-N bases in the incorporated chromosomes is improved from 2.33 Gb in EquCab2 to 2.41 Gb in EquCab3. Contiguity has also been improved nearly 40-fold, with a contig N50 of 4.5 Mb, and scaffold contiguity has been enhanced to the point where all but one of the 32 chromosomes consist of a single scaffold.
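The two statistics quoted in this abstract (non-N base count and contig N50) can be illustrated with a minimal Python sketch. N50 is the length L such that contigs of length at least L together cover at least half of the assembled bases. The function names `n50` and `non_n_bases` and the toy inputs are illustrative assumptions, not part of the EquCab3 pipeline or data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def non_n_bases(sequence):
    """Count bases that are not the gap/unknown placeholder 'N'."""
    return sum(1 for base in sequence if base not in "Nn")


# Toy values invented for illustration only (not EquCab3 data).
toy_contig_lengths = [100, 200, 300, 400, 500]
toy_sequence = "ACGTNNnA"

print(n50(toy_contig_lengths))    # 400
print(non_n_bases(toy_sequence))  # 5
```

With the toy lengths, the two longest contigs (500 + 400 = 900) cover at least half of the 1,500 total bases, so the N50 is 400.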

QAlign: Aligning nanopore reads accurately using current-level modeling

10.1101/862813 ◽

2019 ◽

Author(s):

Dhaivat Joshi ◽

Shunfu Mao ◽

Sreeram Kannan ◽

Suhas Diggavi

Keyword(s):

Reference Genome ◽

Genomic Analysis ◽

Vital Role ◽

High Error Rate ◽

Sequencing Technology ◽

Long Reads ◽

A Genome ◽

Long Read ◽

Nanopore Sequencer ◽

Sequencing Process

Abstract. Motivation: Efficient and accurate alignment of DNA/RNA sequence reads to each other or to a reference genome/transcriptome is an important problem in genomic analysis. Nanopore sequencing has emerged as a major sequencing technology and many long-read aligners have been designed for aligning nanopore reads. However, the high error rate makes accurate and efficient alignment difficult. Utilizing the noise and error characteristics inherent in the sequencing process properly can play a vital role in constructing a robust aligner. In this paper, we design QAlign, a pre-processor that can be used with any long-read aligner for aligning long reads to a genome/transcriptome or to other long reads. The key idea in QAlign is to convert the nucleotide reads into discretized current levels that capture the error modes of the nanopore sequencer before running them through a sequence aligner. Results: We show that QAlign is able to improve alignment rates from around 80% up to 90% with nanopore reads when aligning to the genome. We also show that QAlign improves the average overlap quality by 9.2%, 2.5% and 10.8% in three real datasets for read-to-read alignment. Read-to-transcriptome alignment rates are improved from 51.6% to 75.4% and 82.6% to 90% in two real datasets. Availability: https://github.com/joshidhaivat/QAlign.git
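The key idea described above, converting a nucleotide read into discretized current levels, can be sketched roughly as follows. This is not QAlign's implementation: the 2-mer current table `TOY_MODEL`, the thresholds, and the function name `quantize_read` are hypothetical values invented for illustration, whereas a real pore model uses measured currents for longer k-mers.

```python
from bisect import bisect_right

BASES = "ACGT"

# HYPOTHETICAL toy pore model: an invented mean current (arbitrary units)
# for every 2-mer. Real models are measured, not generated by a formula.
TOY_MODEL = {
    a + b: 60.0 + 10.0 * i + 2.5 * j
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
}


def quantize_read(read, model=TOY_MODEL, k=2, thresholds=(75.0, 90.0)):
    """Map an uppercase A/C/G/T read to a string of discretized levels.

    Each overlapping k-mer is looked up in the model, and its current is
    binned by the thresholds: two thresholds give three levels, 0 to 2.
    """
    levels = []
    for start in range(len(read) - k + 1):
        current = model[read[start:start + k]]
        # bisect_right returns how many thresholds are <= current,
        # which is the index of the bin the current falls into.
        levels.append(str(bisect_right(thresholds, current)))
    return "".join(levels)


print(quantize_read("ACGT"))  # 011
print(quantize_read("TTAA"))  # 220
```

In such a scheme, both sequences being compared would presumably be quantized in the same way before being handed to an existing aligner, so k-mers that produce similar currents (and are therefore easily confused by the sequencer) map to the same symbol.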

An improved pig reference genome sequence to enable pig genetics and genomics research

GigaScience ◽

10.1093/gigascience/giaa051 ◽

2020 ◽

Vol 9 (6) ◽

Cited By ~ 12

Author(s):

Amanda Warr ◽

Nabeel Affara ◽

Bronwen Aken ◽

Hamid Beiki ◽

Derek M Bickhart ◽

...

Keyword(s):

Reference Genome ◽

Genomic Research ◽

Biomedical Model ◽

Model Species ◽

Domestic Pig ◽

Genomics Research ◽

Long Read ◽

Genetics And Genomics ◽

Genome Assemblies ◽

Chromosome Level

Abstract. Background: The domestic pig (Sus scrofa) is important both as a food source and as a biomedical model given its similarity in size, anatomy, physiology, metabolism, pathology, and pharmacology to humans. The draft reference genome (Sscrofa10.2) of a purebred Duroc female pig established using older clone-based sequencing methods was incomplete, and unresolved redundancies, short-range order and orientation errors, and associated misassembled genes limited its utility. Results: We present 2 annotated highly contiguous chromosome-level genome assemblies created with more recent long-read technologies and a whole-genome shotgun strategy, 1 for the same Duroc female (Sscrofa11.1) and 1 for an outbred, composite-breed male (USMARCv1.0). Both assemblies are of substantially higher (>90-fold) continuity and accuracy than Sscrofa10.2. Conclusions: These highly contiguous assemblies plus annotation of a further 11 short-read assemblies provide an unprecedented view of the genetic make-up of this important agricultural and biomedical model species. We propose that the improved Duroc assembly (Sscrofa11.1) become the reference genome for genomic research in pigs.

Rapid, raw-read reference and identification (R4IDs): A flexible platform for rapid generic species ID using long-read sequencing technology

10.1101/281048 ◽

2018 ◽

Cited By ~ 2

Author(s):

Joe Parker ◽

Andrew Helmstetter ◽

James Crowe ◽

John Iacona ◽

Dion Devey ◽

...

Keyword(s):

Dna Sequencing ◽

Species Identification ◽

Sequence Data ◽

Vascular Plant ◽

Reference Sequence ◽

Read Length ◽

Reference Database ◽

Sequencing Technology ◽

Long Read ◽

Suitable Reference

Abstract. The versatility of the current DNA sequencing platforms and the development of portable nanopore sequencers mean that it has never been easier to collect genetic data for unknown sample ID. DNA barcoding and meta-barcoding have become increasingly popular and barcode databases continue to grow at an impressive rate. However, the number of canonical genome assemblies (reference or draft) that are publicly available is relatively tiny, hindering the more widespread use of genome-scale DNA sequencing technology for accurate species identification and discovery. Here, we show that rapid raw-read reference datasets, or R4IDs for short, generated in a matter of hours on the Oxford Nanopore MinION, can bridge this gap and accelerate the generation of usable reference sequence data. By exploiting the long read length of this technology, shotgun genomic sequencing of a small portion of an organism’s genome can act as a suitable reference database despite the low sequencing coverage. These R4IDs can then be used for accurate species identification with minimal amounts of re-sequencing effort (1000s of reads). We demonstrated the capabilities of this approach with six vascular plant species for which we created R4IDs in the laboratory and then re-sequenced live at the Kew Science Festival 2016. We further validated our method using simulations to determine the broader applicability of the approach. Our data analysis pipeline has been made available as a Dockerised workflow for simple, scalable deployment for a range of uses.

An improved pig reference genome sequence to enable pig genetics and genomics research

10.1101/668921 ◽

2019 ◽

Cited By ~ 16

Author(s):

Amanda Warr ◽

Nabeel Affara ◽

Bronwen Aken ◽

H. Beiki ◽

Derek M. Bickhart ◽

...

Keyword(s):

Reference Genome ◽

Genomic Research ◽

Biomedical Model ◽

Model Species ◽

Domestic Pig ◽

Genomics Research ◽

Long Read ◽

Genetics And Genomics ◽

Genome Assemblies ◽

Chromosome Level

Abstract. The domestic pig (Sus scrofa) is important both as a food source and as a biomedical model with high anatomical and immunological similarity to humans. The draft reference genome (Sscrofa10.2) of a purebred Duroc female pig established using older clone-based sequencing methods was incomplete, and unresolved redundancies, short-range order and orientation errors, and associated misassembled genes limited its utility. We present two annotated highly contiguous chromosome-level genome assemblies created with more recent long-read technologies and a whole-genome shotgun strategy, one for the same Duroc female (Sscrofa11.1) and one for an outbred, composite-breed male (USMARCv1.0). Both assemblies are of substantially higher (>90-fold) continuity and accuracy than Sscrofa10.2. These highly contiguous assemblies plus annotation of a further 11 short-read assemblies provide an unprecedented view of the genetic make-up of this important agricultural and biomedical model species. We propose that the improved Duroc assembly (Sscrofa11.1) become the reference genome for genomic research in pigs.

QAlign: aligning nanopore reads accurately using current-level modeling

Bioinformatics ◽

10.1093/bioinformatics/btaa875 ◽

2020 ◽

Author(s):

Dhaivat Joshi ◽

Shunfu Mao ◽

Sreeram Kannan ◽

Suhas Diggavi

Keyword(s):

Reference Genome ◽

Genomic Analysis ◽

Vital Role ◽

Supplementary Information ◽

Sequencing Technology ◽

Long Reads ◽

A Genome ◽

Long Read ◽

Nanopore Sequencer ◽

Sequencing Process

Abstract. Motivation: Efficient and accurate alignment of DNA/RNA sequence reads to each other or to a reference genome/transcriptome is an important problem in genomic analysis. Nanopore sequencing has emerged as a major sequencing technology and many long-read aligners have been designed for aligning nanopore reads. However, the high error rate makes accurate and efficient alignment difficult. Utilizing the noise and error characteristics inherent in the sequencing process properly can play a vital role in constructing a robust aligner. In this article, we design QAlign, a pre-processor that can be used with any long-read aligner for aligning long reads to a genome/transcriptome or to other long reads. The key idea in QAlign is to convert the nucleotide reads into discretized current levels that capture the error modes of the nanopore sequencer before running it through a sequence aligner. Results: We show that QAlign is able to improve alignment rates from around 80% up to 90% with nanopore reads when aligning to the genome. We also show that QAlign improves the average overlap quality by 9.2, 2.5 and 10.8% in three real datasets for read-to-read alignment. Read-to-transcriptome alignment rates are improved from 51.6% to 75.4% and 82.6% to 90% in two real datasets. Availability and implementation: https://github.com/joshidhaivat/QAlign.git. Supplementary information: Supplementary data are available at Bioinformatics online.

A chromosome-scale reference genome for Giardia intestinalis WB

Scientific Data ◽

10.1038/s41597-020-0377-y ◽

2020 ◽

Vol 7 (1) ◽

Cited By ~ 6

Author(s):

Feifei Xu ◽

Aaron Jex ◽

Staffan G. Svärd

Keyword(s):

Reference Genome ◽

Gene Families ◽

Giardia Intestinalis ◽

Valuable Resource ◽

Sequencing Technology ◽

Chromosomal Structure ◽

Long Read ◽

Optical Maps

Abstract. Giardia intestinalis is a protist causing diarrhea in humans. The first G. intestinalis genome, from the WB isolate, was published more than ten years ago, and has been widely used as the reference genome for Giardia research. However, the genome is fragmented, thus hindering research at the chromosomal level. We re-sequenced the Giardia genome with PacBio long-read sequencing technology and obtained a new reference genome, which was assembled into near-complete chromosomes with only four internal gaps at long repeats. This new genome is not only more complete but also better annotated at both structural and functional levels, providing more details about gene families, gene organizations and chromosomal structure. This near-complete reference genome will be a valuable resource for the Giardia community and protist research. It also showcases how a fragmented genome can be improved with long-read sequencing technology completed with optical maps.

Faculty Opinions recommendation of MinION-based long-read sequencing and assembly extends the Caenorhabditis elegans reference genome.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.732346961.793543720 ◽

2018 ◽

Author(s):

Charles Baer

Keyword(s):

Caenorhabditis Elegans ◽

Reference Genome ◽

Long Read

The chromosome-level reference genome assembly for Dendrobium officinale and its utility of functional genomics research and molecular breeding study

Acta Pharmaceutica Sinica B ◽

10.1016/j.apsb.2021.01.019 ◽

2021 ◽

Author(s):

Zhitao Niu ◽

Fei Zhu ◽

Yajuan Fan ◽

Chao Li ◽

Benhou Zhang ◽

...

Keyword(s):

Functional Genomics ◽

Genome Assembly ◽

Molecular Breeding ◽

Reference Genome ◽

Dendrobium Officinale ◽

Reference Genome Assembly ◽

Genomics Research ◽

Chromosome Level

Author Correction: Improved reference genome for the domestic horse increases assembly contiguity and composition

Communications Biology ◽

10.1038/s42003-019-0591-3 ◽

2019 ◽

Vol 2 (1) ◽

Author(s):

Theodore S. Kalbfleisch ◽

Edward S. Rice ◽

Michael S. DePriest ◽

Brian P. Walenz ◽

Matthew S. Hestand ◽

...

Keyword(s):

Reference Genome ◽

Domestic Horse

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

Cloning, expression, and analysis of the group 2 allergen from Dermatophagoides farinae from China

Anais da Academia Brasileira de Ciências ◽

10.1590/s0001-37652010000400017 ◽

2010 ◽

Vol 82 (4) ◽

pp. 941-951 ◽

Cited By ~ 2

Author(s):

Cui Yu-bao ◽

Ying Zhou ◽

Shi Weihong ◽

Ma Guifang ◽

Li Yang ◽

...

Keyword(s):

Large Scale ◽

Alpha Helix ◽

Random Coil ◽

Reference Sequence ◽

Scale Production ◽

Dermatophagoides Farinae ◽

E Coli ◽

Large Scale Production ◽

Solid Foundation ◽

Group 2

To obtain the recombinant group 2 allergen product of Dermatophagoides farinae (Der f 2), the Der f 2 gene was synthesized by RT-PCR. The full-length cDNA comprised 441 nucleotides and was 99.3% identical to the reference sequence (GenBank AB195580). The cDNA was bound to vector pET28a to construct plasmid pET28a(+)-Der f 2, which was transformed into E. coli BL21 and induced by IPTG. SDS-PAGE showed a specific band of about 14 kDa in the whole cell lysate. As estimated by chromatography, about 3.86 mg of the recombinant product was obtained, which conjugated with serum IgE from asthmatic children. The protein had a signal peptide of 17 amino acids. Its secondary structure comprised an alpha helix (19.86%), an extended strand (30.82%), and a random coil (49.32%). The subcellular localization of this allergen was predicted to be at mitochondria. Furthermore, its function was shown to be associated with an MD-2-related lipid-recognition (ML) domain. The results of this study provide a solid foundation for large-scale production of the allergen for clinical diagnosis and treatment of allergic disorders.
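The figures reported in this abstract can be cross-checked with simple arithmetic, sketched below in Python. The sketch assumes that the 441 nucleotides include a terminal stop codon, so that the encoded protein has 146 residues. That assumption is not stated in the abstract, but under it each reported secondary-structure percentage corresponds to a whole number of residues. All variable names are illustrative.

```python
CDNA_LENGTH_NT = 441       # from the abstract
IDENTITY = 0.993           # 99.3% identity to the reference sequence
SIGNAL_PEPTIDE_AA = 17     # from the abstract

# ASSUMPTION (not stated in the abstract): the final codon is a stop codon.
protein_length = CDNA_LENGTH_NT // 3 - 1             # 146 residues
mature_length = protein_length - SIGNAL_PEPTIDE_AA   # 129 residues

# Approximate number of nucleotide differences implied by 99.3% identity.
mismatches = round(CDNA_LENGTH_NT * (1 - IDENTITY))  # 3

# Secondary-structure percentages as reported in the abstract.
reported_pct = {
    "alpha helix": 19.86,
    "extended strand": 30.82,
    "random coil": 49.32,
}

# Convert each percentage into a residue count under the length assumption.
residue_counts = {
    name: round(pct / 100 * protein_length)
    for name, pct in reported_pct.items()
}

print(protein_length, mature_length, mismatches)
print(residue_counts)
print(sum(residue_counts.values()) == protein_length)
```

Under this assumption the three classes correspond to 29, 45, and 72 residues, which sum to 146, and the percentages themselves sum to 100%.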