Identification of high-efficiency 3’GG gRNA motifs in indexed FASTA files with ngg2

Mapping Intimacies ◽

10.7287/peerj.preprints.969v1 ◽

2015 ◽

Author(s):

Elisha D Roberson

Keyword(s):

High Efficiency ◽

Homo Sapiens ◽

Proof Of Concept ◽

Model Species ◽

Protein Coding ◽

Protein Coding Genes ◽

Genome Modification ◽

Starting Point ◽

Command Line Tool ◽

Reference Genomes

CRISPR/Cas9 is emerging as one of the most-used methods of genome modification in organisms ranging from bacteria to human cells. However, the efficiency of editing varies tremendously site-to-site. A recent report identified a novel motif, called the 3’GG motif, which substantially increases the efficiency of editing at all sites tested. Furthermore, they highlighted that previously published gRNAs with high editing efficiency also had this motif. I designed a Python command-line tool, ngg2, to identify 3’GG gRNA sites from indexed FASTA files. As a proof-of-concept, I screened for these motifs in six genomes: Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Danio rerio, Mus musculus, and Homo sapiens. I identified more than 24 million single match 3’GG motifs in these reference genomes. Greater than 87% of all protein coding genes in the six reference genomes had at least one overlapping unique 3’GG gRNA site. In particular, more than 96% of mouse and 99% of human protein coding genes have at least one unique, overlapping 3’GG gRNA. These identified sites can be used as a starting point in gRNA design, and the ngg2 tool provides an important ability to identify high-efficiency editing sites in non-model species.
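The kind of scan the abstract describes can be illustrated with a minimal Python sketch. This is not the ngg2 implementation. It assumes the 3’GG motif is an 18-base stretch followed by GG (a 20-nt protospacer) directly upstream of an NGG PAM, and it treats a "single match" site as one whose protospacer occurs exactly once among the sites found. The function names and patterns are hypothetical.

```python
import re
from collections import Counter

# Assumption: protospacer = 18 arbitrary bases + GG, followed by an NGG PAM.
# Lookaheads are used so that overlapping sites are all reported.
SENSE = re.compile(r"(?=([ACGT]{18}GG)[ACGT]GG)")
# Same site seen on the opposite strand: CCN (PAM) + CC + 18 bases.
ANTISENSE = re.compile(r"(?=CC[ACGT](CC[ACGT]{18}))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_3gg_sites(seq):
    """Return sorted (start, strand, protospacer) tuples; start is 0-based."""
    seq = seq.upper()
    sites = []
    for m in SENSE.finditer(seq):
        sites.append((m.start(), "+", m.group(1)))
    for m in ANTISENSE.finditer(seq):
        # Report the protospacer in gRNA orientation (5' to 3').
        sites.append((m.start(), "-", reverse_complement(m.group(1))))
    return sorted(sites)


def unique_sites(sites):
    """Keep only sites whose protospacer was found exactly once (assumed meaning of 'single match')."""
    counts = Counter(protospacer for _, _, protospacer in sites)
    return [site for site in sites if counts[site[2]] == 1]
```

For whole genomes, the same patterns could be applied chromosome by chromosome to sequences read from an indexed FASTA file. The sketch skips bases other than A, C, G, and T and does not check for near matches elsewhere in the genome.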


Identification of high-efficiency 3′GG gRNA motifs in indexed FASTA files with ngg2

PeerJ Computer Science ◽

10.7717/peerj-cs.33 ◽

2015 ◽

Vol 1 ◽

pp. e33 ◽

Cited By ~ 2

Author(s):

Elisha D. Roberson

Keyword(s):

High Efficiency ◽

Homo Sapiens ◽

Model Organisms ◽

Proof Of Concept ◽

Protein Coding ◽

C Elegans ◽

Protein Coding Genes ◽

Starting Point ◽

Command Line Tool ◽

Reference Genomes

CRISPR/Cas9 is emerging as one of the most-used methods of genome modification in organisms ranging from bacteria to human cells. However, the efficiency of editing varies tremendously site-to-site. A recent report identified a novel motif, called the 3′GG motif, which substantially increases the efficiency of editing at all sites tested in C. elegans. Furthermore, they highlighted that previously published gRNAs with high editing efficiency also had this motif. I designed a Python command-line tool, ngg2, to identify 3′GG gRNA sites from indexed FASTA files. As a proof-of-concept, I screened for these motifs in six model genomes: Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Danio rerio, Mus musculus, and Homo sapiens. I also scanned the genomes of pig (Sus scrofa) and African elephant (Loxodonta africana) to demonstrate the utility in non-model organisms. I identified more than 60 million single match 3′GG motifs in these genomes. Greater than 61% of all protein coding genes in the reference genomes had at least one unique 3′GG gRNA site overlapping an exon. In particular, more than 96% of mouse and 93% of human protein coding genes have at least one unique, overlapping 3′GG gRNA. These identified sites can be used as a starting point in gRNA selection, and the ngg2 tool provides an important ability to identify 3′GG editing sites in any species with an available genome sequence.


Identification of high-efficiency 3’GG gRNA motifs in indexed FASTA files with ngg2

10.7287/peerj.preprints.969v2 ◽

2015 ◽

Author(s):

Elisha D Roberson

Keyword(s):

High Efficiency ◽

Homo Sapiens ◽

Model Organisms ◽

Proof Of Concept ◽

Protein Coding ◽

C Elegans ◽

Protein Coding Genes ◽

Starting Point ◽

Command Line Tool ◽

Reference Genomes

CRISPR/Cas9 is emerging as one of the most-used methods of genome modification in organisms ranging from bacteria to human cells. However, the efficiency of editing varies tremendously site-to-site. A recent report identified a novel motif, called the 3’GG motif, which substantially increases the efficiency of editing at all sites tested in C. elegans. Furthermore, they highlighted that previously published gRNAs with high editing efficiency also had this motif. I designed a Python command-line tool, ngg2, to identify 3’GG gRNA sites from indexed FASTA files. As a proof-of-concept, I screened for these motifs in six model genomes: Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Danio rerio, Mus musculus, and Homo sapiens. I also scanned the genomes of pig (Sus scrofa) and African elephant (Loxodonta africana) to demonstrate the utility in non-model organisms. I identified more than 60 million single match 3’GG motifs in these genomes. Greater than 61% of all protein coding genes in the reference genomes had at least one unique 3’GG gRNA site overlapping an exon. In particular, more than 96% of mouse and 93% of human protein coding genes have at least one unique, overlapping 3’GG gRNA. These identified sites can be used as a starting point in gRNA selection, and the ngg2 tool provides an important ability to identify 3’GG editing sites in any species with an available genome sequence.


Identification of high-efficiency 3’GG gRNA motifs in indexed FASTA files with ngg2

10.7287/peerj.preprints.969 ◽

2015 ◽

Author(s):

Elisha D Roberson

Keyword(s):

High Efficiency ◽

Homo Sapiens ◽

Model Organisms ◽

Proof Of Concept ◽

Protein Coding ◽

C Elegans ◽

Protein Coding Genes ◽

Starting Point ◽

Command Line Tool ◽

Reference Genomes

CRISPR/Cas9 is emerging as one of the most-used methods of genome modification in organisms ranging from bacteria to human cells. However, the efficiency of editing varies tremendously site-to-site. A recent report identified a novel motif, called the 3’GG motif, which substantially increases the efficiency of editing at all sites tested in C. elegans. Furthermore, they highlighted that previously published gRNAs with high editing efficiency also had this motif. I designed a Python command-line tool, ngg2, to identify 3’GG gRNA sites from indexed FASTA files. As a proof-of-concept, I screened for these motifs in six model genomes: Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Danio rerio, Mus musculus, and Homo sapiens. I also scanned the genomes of pig (Sus scrofa) and African elephant (Loxodonta africana) to demonstrate the utility in non-model organisms. I identified more than 60 million single match 3’GG motifs in these genomes. Greater than 61% of all protein coding genes in the reference genomes had at least one unique 3’GG gRNA site overlapping an exon. In particular, more than 96% of mouse and 93% of human protein coding genes have at least one unique, overlapping 3’GG gRNA. These identified sites can be used as a starting point in gRNA selection, and the ngg2 tool provides an important ability to identify 3’GG editing sites in any species with an available genome sequence.


A draft genome assembly of the eastern banjo frog Limnodynastes dumerilii dumerilii (Anura: Limnodynastidae)

10.1101/2020.03.03.971721 ◽

2020 ◽

Cited By ~ 1

Author(s):

Qiye Li ◽

Qunfei Guo ◽

Yang Zhou ◽

Huishuang Tan ◽

Terry Bertozzi ◽

...

Keyword(s):

Genome Assembly ◽

Draft Genome ◽

Protein Coding ◽

Large Genome ◽

Draft Genome Assembly ◽

Protein Coding Genes ◽

Repeat Content ◽

Australian Continent ◽

Large Genome Size ◽

Reference Genomes

Amphibian genomes are usually challenging to assemble due to large genome size and high repeat content. The Limnodynastidae is a family of frogs native to Australia, Tasmania and New Guinea. As an anuran lineage that successfully diversified on the Australian continent, it represents an important lineage in the amphibian tree of life but lacks reference genomes. Here we sequenced and annotated the genome of the eastern banjo frog Limnodynastes dumerilii dumerilii to fill this gap. The total length of the genome assembly is 2.38 Gb with a scaffold N50 of 285.9 kb. We identified 1.21 Gb of non-redundant sequences as repetitive elements and annotated 24,548 protein-coding genes in the assembly. BUSCO assessment indicated that more than 94% of the expected vertebrate genes were present in the genome assembly and the gene set. We anticipate that this annotated genome assembly will advance the future study of anuran phylogeny and amphibian genome evolution.
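For readers unfamiliar with the scaffold N50 statistic quoted above, the following minimal Python sketch shows the standard definition: the length of the scaffold at which the running total, taken from the longest scaffold downward, first reaches half of the assembly length. The function name is hypothetical, and nothing here reflects the authors' own pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold to the shortest.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first scaffold that brings the running total to half the assembly.
        if 2 * running >= total:
            return length
    raise ValueError("no lengths given")
```

With lengths 2, 3, 4, 5, and 6 (total 20), the two longest scaffolds sum to 11, which passes the halfway mark of 10, so the N50 is 5.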


Assembly and Annotation of an Ashkenazi Human Reference Genome

10.1101/2020.03.18.997395 ◽

2020 ◽

Cited By ~ 2

Author(s):

Alaina Shumate ◽

Aleksey V. Zimin ◽

Rachel M. Sherman ◽

Daniela Puiu ◽

Justin M. Wagner ◽

...

Keyword(s):

Dna Sequences ◽

Reference Genome ◽

Gene Families ◽

Gene Content ◽

Specific Reference ◽

Protein Coding ◽

Human Reference Genome ◽

Protein Coding Genes ◽

Reference Genomes ◽

Similar Gene

Here we describe the assembly and annotation of the genome of an Ashkenazi individual and the creation of a new, population-specific human reference genome. This genome is more contiguous and more complete than GRCh38, the latest version of the human reference genome, and is annotated with highly similar gene content. The Ashkenazi reference genome, Ash1, contains 2,973,118,650 nucleotides as compared to 2,937,639,212 in GRCh38. Annotation identified 20,157 protein-coding genes, of which 19,563 are >99% identical to their counterparts on GRCh38. Most of the remaining genes have small differences. 40 of the protein-coding genes in GRCh38 are missing from Ash1; however, all of these genes are members of multi-gene families for which Ash1 contains other copies. 11 genes appear on different chromosomes from their homologs in GRCh38. Alignment of DNA sequences from an unrelated Ashkenazi individual to Ash1 identified ~1 million fewer homozygous SNPs than alignment of those same sequences to the more-distant GRCh38 genome, illustrating one of the benefits of population-specific reference genomes.


Whole Genome Sequence of the Commercially Relevant Mushroom Strain Agaricus bisporus var. bisporus ARP23

G3 Genes|Genome|Genetics ◽

10.1534/g3.119.400563 ◽

2019 ◽

Vol 9 (10) ◽

pp. 3057-3066 ◽

Cited By ~ 2

Author(s):

Eoin O’Connor ◽

Jamie McGowan ◽

Charley G. P. McCarthy ◽

Aniça Amini ◽

Helen Grogan ◽

...

Keyword(s):

Genome Sequence ◽

Agaricus Bisporus ◽

Genomic Analysis ◽

Whole Genome Sequence ◽

Comparative Genomic ◽

Whole Genome ◽

Protein Coding ◽

Single Strain ◽

Protein Coding Genes ◽

Starting Point

Agaricus bisporus is an extensively cultivated edible mushroom. Demand for cultivation is continuously growing, and difficulties associated with breeding programs now mean that strains are effectively considered a monoculture. While commercial growing practices are highly efficient and tightly controlled, the over-use of a single strain has led to a variety of disease outbreaks from a range of pathogens including bacteria, fungi and viruses. To address this, the Agaricus Resource Program (ARP) was set up to collect wild isolates from diverse geographical locations through a bounty-driven scheme to create a repository of wild Agaricus germplasm. One of the strains collected, Agaricus bisporus var. bisporus ARP23, has been crossed extensively with white commercial varieties, leading to the generation of a novel hybrid with a dark brown pileus commonly referred to as ‘Heirloom’. Heirloom has been successfully implemented into commercial mushroom cultivation. In this study the whole genome of Agaricus bisporus var. bisporus ARP23 was sequenced and assembled with Illumina and PacBio sequencing technology. The final genome was found to be 33.49 Mb in length and to have significant levels of synteny to other sequenced Agaricus bisporus strains. Overall, 13,030 putative protein coding genes were located and annotated. Relative to the other A. bisporus genomes that are currently available, Agaricus bisporus var. bisporus ARP23 is the largest A. bisporus strain in terms of gene number and genetic content sequenced to date. Comparative genomic analysis shows that the A. bisporus mating loci are unifactorial and, unsurprisingly, highly conserved between strains. The lignocellulolytic gene content of all A. bisporus strains compared is also very similar. Our results show that the pangenome structure of A. bisporus is quite diverse, with 60–70% of the total protein coding genes per strain considered as being orthologous and syntenically conserved. These analyses and the genome sequence described herein are the starting point for more detailed molecular analyses into the growth and phenotypical responses of Agaricus bisporus var. bisporus ARP23 when challenged with economically important mycoviruses.


A draft genome assembly of the eastern banjo frog Limnodynastes dumerilii dumerilii (Anura: Limnodynastidae)

Gigabyte ◽

10.46471/gigabyte.2 ◽

2020 ◽

Vol 2020 ◽

pp. 1-13

Author(s):

Qiye Li ◽

Qunfei Guo ◽

Yang Zhou ◽

Huishuang Tan ◽

Terry Bertozzi ◽

...

Keyword(s):

Genome Assembly ◽

Draft Genome ◽

Protein Coding ◽

Large Genome ◽

Draft Genome Assembly ◽

Protein Coding Genes ◽

Repeat Content ◽

Australian Continent ◽

Large Genome Size ◽

Reference Genomes

Amphibian genomes are usually challenging to assemble due to their large genome size and high repeat content. The Limnodynastidae is a family of frogs native to Australia, Tasmania and New Guinea. As an anuran lineage that successfully diversified on the Australian continent, it represents an important lineage in the amphibian tree of life but lacks reference genomes. Here we sequenced and annotated the genome of the eastern banjo frog Limnodynastes dumerilii dumerilii to fill this gap. The total length of the genome assembly is 2.38 Gb with a scaffold N50 of 285.9 kb. We identified 1.21 Gb of non-redundant sequences as repetitive elements and annotated 24,548 protein-coding genes in the assembly. BUSCO assessment indicated that more than 94% of the expected vertebrate genes were present in the genome assembly and the gene set. We anticipate that this annotated genome assembly will advance the future study of anuran phylogeny and amphibian genome evolution.


Protein-Coding Genes in Euarchontoglires with Pseudogene Homologs in Humans

Life ◽

10.3390/life10090192 ◽

2020 ◽

Vol 10 (9) ◽

pp. 192

Author(s):

Lev I. Rubanov ◽

Oleg A. Zverkov ◽

Gregory A. Shilovsky ◽

Alexandr V. Seliverstov ◽

Vassily A. Lyubetsky

Keyword(s):

Immune System ◽

Nonhuman Primates ◽

Computer Software ◽

Current Evidence ◽

Model Species ◽

Protein Coding ◽

Protein Coding Genes ◽

Fast Computer ◽

Per Gene ◽

Local Synteny

An original bioinformatics technique is developed to identify the protein-coding genes in rodents, lagomorphs and nonhuman primates that are pseudogenized in humans. The method is based on per-gene verification of local synteny, similarity of exon-intron structures and orthology in a set of genomes. It is applicable to any genome set, even with the number of genomes exceeding 100, and is efficiently implemented using fast computer software. Only 50 evolutionarily recent human pseudogenes were predicted. Their functional homologs in model species are often associated with the immune system or digestion and are mainly expressed in the testes. According to current evidence, knockout of most of these genes leads to an abnormal phenotype. Some genes were pseudogenized or lost independently in human and nonhuman hominoids.


PWAS: proteome-wide association study—linking genes and phenotypes by functional variation in proteins

Genome Biology ◽

10.1186/s13059-020-02089-x ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Nadav Brandes ◽

Nathan Linial ◽

Michal Linial

Keyword(s):

Association Study ◽

Protein Function ◽

Probabilistic Models ◽

Command Line ◽

Functional Variation ◽

Protein Coding ◽

New Associations ◽

Protein Coding Genes ◽

Complex Modes ◽

Command Line Tool

We introduce Proteome-Wide Association Study (PWAS), a new method for detecting gene-phenotype associations mediated by protein function alterations. PWAS aggregates the signal of all variants jointly affecting a protein-coding gene and assesses their overall impact on the protein’s function using machine learning and probabilistic models. Subsequently, it tests whether the gene exhibits functional variability between individuals that correlates with the phenotype of interest. PWAS can capture complex modes of heritability, including recessive inheritance. A comparison with GWAS and other existing methods proves its capacity to recover causal protein-coding genes and highlight new associations. PWAS is available as a command-line tool.


Integrated modeling of protein-coding genes in the Manduca sexta genome using RNA-seq data from the biochemical model insect

10.1603/ice.2016.110841 ◽

2016 ◽

Cited By ~ 1

Author(s):

Xiaolong Cao

Keyword(s):

Integrated Modeling ◽

Rna Seq ◽

Protein Coding ◽

Protein Coding Genes ◽

Biochemical Model
