IsoCon: Deciphering highly similar multigene family transcripts from Iso-Seq data

10.1101/246066 ◽

2018 ◽

Author(s):

Kristoffer Sahlin ◽

Marta Tomaszkiewicz ◽

Kateryna D. Makova ◽

Paul Medvedev

Keyword(s):

De Novo ◽

Gene Families ◽

Computational Techniques ◽

Nucleotide Level ◽

Statistical Framework ◽

Novel Approach ◽

Gene Copies ◽

Multicopy Gene ◽

Novel Isoforms ◽

Similar Gene

Abstract A significant portion of genes in vertebrate genomes belongs to multigene families, with each family containing several gene copies whose presence/absence can be highly variable across individuals. For example, each Y chromosome ampliconic gene family harbors several nearly identical (up to 99.99%) gene copies. Existing de novo techniques for assaying the sequences of such highly similar gene families fall short of reconstructing end-to-end transcripts with nucleotide-level precision or assigning them to their respective gene copies. We present IsoCon, a novel approach that combines experimental and computational techniques that leverage the power of long PacBio Iso-Seq reads to determine the full-length transcripts of highly similar multicopy gene families. IsoCon uses a cautiously iterative process to correct errors, followed by a statistical framework that allows it to distinguish errors from true variants with high precision. IsoCon outperforms existing methods for transcriptome analysis of Y ampliconic gene families in both simulated and real human data and is able to detect rare transcripts that differ by as little as one base pair from much more abundant transcripts. IsoCon has allowed us to detect an unprecedented number of novel isoforms, as well as to derive estimates of the number of gene copies in human Y ampliconic gene families.

A Plug-and-Play Approach for the De Novo Generation of Dually Functionalised Bispecifics

10.26434/chemrxiv.8068184.v1 ◽

2019 ◽

Author(s):

Antoine Maruani ◽

Peter A. Szijj ◽

Calise Bahou ◽

João C. F. Nogueira ◽

Stephen Caddick ◽

...

Keyword(s):

De Novo ◽

Therapeutic Index ◽

Antibody Fragments ◽

Bispecific Antibodies ◽

Full Potential ◽

Mechanisms Of Resistance ◽

Large Excess ◽

Chemical Methods ◽

New Class ◽

Novel Approach

Diseases are multifactorial, with redundancies and synergies between various pathways. However, most of the antibody-based therapeutics in clinical trials and on the market interact with only one target thus limiting their efficacy. The targeting of multiple epitopes could improve the therapeutic index of treatment and counteract mechanisms of resistance. To this effect, a new class of therapeutics emerged: bispecific antibodies.

Bispecific formation using chemical methods is rare and low yielding and/or requires a large excess of one of the two proteins to avoid homodimerisation. In order for chemically prepared bispecifics to deliver their full potential, high-yielding, modular and reliable cross-linking technologies are required. Herein, we describe a novel approach not only for the rapid and high-yielding chemical generation of bispecific antibodies from native antibody fragments, but also for the site-specific dual functionalisation of the resulting bioconjugates. Based on orthogonal clickable functional groups, this strategy enables the assembly of functionalised bispecifics with controlled loading in a modular and convergent manner.

Dissecting the chromosome-level genome of the Asian Clam (Corbicula fluminea)

Scientific Reports ◽

10.1038/s41598-021-94545-2 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Tongqing Zhang ◽

Jiawen Yin ◽

Shengkai Tang ◽

Daming Li ◽

Xiankun Gu ◽

...

Keyword(s):

De Novo ◽

Corbicula Fluminea ◽

Gene Families ◽

Ruditapes Philippinarum ◽

Genomic Sequences ◽

Future Research ◽

Asian Clam ◽

Sequencing Technologies ◽

East And Southeast Asia ◽

Clam Corbicula

Abstract The Asian Clam (Corbicula fluminea) is a valuable commercial and medicinal bivalve, which is widely distributed in East and Southeast Asia. As a natural nutrient source, the clam is rich in protein, amino acids, and microelements. The genome of C. fluminea has not yet been characterized; therefore, genome-assisted breeding and improvements cannot yet be implemented. In this work, we present a de novo chromosome-scale genome assembly of C. fluminea using PacBio and Hi-C sequencing technologies. The assembled genome comprised 4,728 contigs, with a contig N50 of 521.06 Kb, and 1,215 scaffolds with a scaffold N50 of 70.62 Mb. More than 1.51 Gb (99.17%) of genomic sequences were anchored to 18 chromosomes, of which 1.40 Gb (92.81%) of genomic sequences were ordered and oriented. The genome contains 38,841 coding genes, 32,591 (83.91%) of which were annotated in at least one functional database. Compared with related species, C. fluminea had 851 expanded gene families and 191 contracted gene families. The phylogenetic tree showed that C. fluminea diverged from Ruditapes philippinarum ~228.89 million years ago (Mya), and the genomes of C. fluminea and R. philippinarum shared 244 syntenic blocks. Additionally, we identified 2 MITF members and 99 NLRP members in the C. fluminea genome. The high-quality and chromosomal Asian Clam genome will be a valuable resource for a range of development and breeding studies of C. fluminea in future research.
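
The contig and scaffold N50 values quoted in assembly abstracts such as the one above are contiguity statistics: N50 is the length L such that sequences of length at least L together cover at least half of the total assembly, and L50 is the number of sequences needed to reach that point. The following is a minimal sketch of that calculation; the function name `n50_l50` and the example lengths are illustrative assumptions and are not taken from any of the listed papers.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths.

    N50: length of the sequence at which the running total, taken from
    longest to shortest, first reaches half of the summed length.
    L50: how many sequences were needed to get there.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Hypothetical toy assembly (lengths in arbitrary units, not real data).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
n50, l50 = n50_l50(toy_lengths)
print(f"N50 = {n50}, L50 = {l50}")
```

A higher N50 together with a lower L50 indicates that more of the assembly sits in fewer, longer sequences, which is why both figures are commonly reported side by side.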

Identification and Expression Analysis of the Genes Involved in the Raffinose Family Oligosaccharides Pathway of Phaseolus vulgaris and Glycine max

Plants ◽

10.3390/plants10071465 ◽

2021 ◽

Vol 10 (7) ◽

pp. 1465

Author(s):

Ramon de Koning ◽

Raphaël Kiekens ◽

Mary Esther Muyoka Toili ◽

Geert Angenon

Keyword(s):

Common Bean ◽

Seed Development ◽

Expression Analysis ◽

De Novo ◽

Expression Patterns ◽

Gene Families ◽

Rna Seq ◽

Raffinose Family Oligosaccharides ◽

Specific Expression ◽

Raffinose Synthase

Raffinose family oligosaccharides (RFO) play an important role in plants but are also considered to be antinutritional factors. A profound understanding of the galactinol and RFO biosynthetic gene families and the expression patterns of the individual genes is a prerequisite for the sustainable reduction of the RFO content in the seeds, without compromising normal plant development and functioning. In this paper, an overview of the annotation and genetic structure of all galactinol- and RFO biosynthesis genes is given for soybean and common bean. In common bean, three galactinol synthase genes, two raffinose synthase genes and one stachyose synthase gene were identified for the first time. To discover the expression patterns of these genes in different tissues, two expression atlases have been created through re-analysis of publicly available RNA-seq data. De novo expression analysis through an RNA-seq study during seed development of three varieties of common bean gave more insight into the expression patterns of these genes during the seed development. The results of the expression analysis suggest that different classes of galactinol- and RFO synthase genes have tissue-specific expression patterns in soybean and common bean. With the obtained knowledge, important galactinol- and RFO synthase genes that specifically play a key role in the accumulation of RFOs in the seeds are identified. These candidate genes may play a pivotal role in reducing the RFO content in the seeds of important legumes which could improve the nutritional quality of these beans and would solve the discomforts associated with their consumption.

The reference genome and transcriptome of the limestone langur, Trachypithecus leucocephalus, reveal expansion of genes related to alkali tolerance

BMC Biology ◽

10.1186/s12915-021-00998-2 ◽

2021 ◽

Vol 19 (1) ◽

Author(s):

Tengcheng Que ◽

Huifeng Wang ◽

Weifei Yang ◽

Jianbao Wu ◽

Chenyang Hou ◽

...

Keyword(s):

De Novo ◽

Great Majority ◽

Gene Families ◽

Functional Enrichment ◽

Mineral Absorption ◽

Iron Storage ◽

Trachypithecus Leucocephalus ◽

Karst Environment ◽

Alkali Tolerance ◽

Almost All

Abstract Background: Trachypithecus leucocephalus, the white-headed langur, is a critically endangered primate that is endemic to the karst mountains in the southern Guangxi province of China. Studying the genomic and transcriptomic mechanisms underlying its local adaptation could help explain its persistence within a highly specialized ecological niche. Results: In this study, we used PacBio sequencing and optical assembly and Hi-C analysis to create a high-quality de novo assembly of the T. leucocephalus genome. Annotation and functional enrichment revealed many genes involved in metabolism, transport, and homeostasis, and almost all of the positively selected genes were related to mineral ion binding. The transcriptomes of 12 tissues from three T. leucocephalus individuals showed that the great majority of genes involved in mineral absorption and calcium signaling were expressed, and their gene families were significantly expanded. For example, FTH1 primarily functions in iron storage and had 20 expanded copies. Conclusions: These results increase our understanding of the evolution of alkali tolerance and other traits necessary for the persistence of T. leucocephalus within an ecologically unique limestone karst environment.

Genome sequence, transcriptome, and annotation of rodent malaria parasite Plasmodium yoelii nigeriensis N67

BMC Genomics ◽

10.1186/s12864-021-07555-9 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Cui Zhang ◽

Cihan Oguz ◽

Sue Huse ◽

Lu Xia ◽

Jian Wu ◽

...

Keyword(s):

Dna Sequences ◽

Malaria Parasite ◽

Vaccine Development ◽

De Novo ◽

Gene Families ◽

Plasmodium Yoelii ◽

Control Measures ◽

Effective Control ◽

Malaria Parasites ◽

Rodent Malaria

Abstract Background: Rodent malaria parasites are important models for studying host-malaria parasite interactions such as host immune response, mechanisms of parasite evasion of host killing, and vaccine development. One of the rodent malaria parasites is Plasmodium yoelii, and multiple P. yoelii strains or subspecies that cause different disease phenotypes have been widely employed in various studies. The genomes and transcriptomes of several P. yoelii strains have been analyzed and annotated, including the lethal strains of P. y. yoelii YM (or 17XL) and non-lethal strains of P. y. yoelii 17XNL/17X. Genomic DNA sequences and cDNA reads from another subspecies P. y. nigeriensis N67 have been reported for studies of genetic polymorphisms and parasite response to drugs, but its genome has not been assembled and annotated. Results: We performed genome sequencing of the N67 parasite using the PacBio long-read sequencing technology, de novo assembled its genome and transcriptome, and predicted 5383 genes with high overall annotation quality. Comparison of the annotated genome of the N67 parasite with those of YM and 17X parasites revealed a set of genes with N67-specific orthology, expansion of gene families, particularly the homologs of the Plasmodium chabaudi erythrocyte membrane antigen, large numbers of SNPs and indels, and proteins predicted to interact with host immune responses based on their functional domains. Conclusions: The genomes of N67 and 17X parasites are highly diverse, having approximately one polymorphic site per 50 base pairs of DNA. The annotated N67 genome and transcriptome provide searchable databases for fast retrieval of genes and proteins, which will greatly facilitate our efforts in studying the parasite biology and gene function and in developing effective control measures against malaria.

De Novo Genome Assembly of Limpet Bathyacmaea lactea (Gastropoda: Pectinodontidae): The First Reference Genome of a Deep-Sea Gastropod Endemic to Cold Seeps

Genome Biology and Evolution ◽

10.1093/gbe/evaa100 ◽

2020 ◽

Vol 12 (6) ◽

pp. 905-910 ◽

Cited By ~ 2

Author(s):

Ruoyu Liu ◽

Kun Wang ◽

Jun Liu ◽

Wenjie Xu ◽

Yang Zhou ◽

...

Keyword(s):

Deep Sea ◽

Metal Ion ◽

De Novo ◽

Demographic History ◽

Gene Families ◽

Phylogenetic Position ◽

Cold Seeps ◽

Nitrogen And Phosphorus ◽

De Novo Genome Assembly ◽

A Genome

Abstract Cold seeps, characterized by methane, hydrogen sulfide, and other hydrocarbon chemicals, foster one of the most widespread chemosynthetic ecosystems in the deep sea that are densely populated by specialized benthos. However, scarce genomic resources severely limit our knowledge about the origin and adaptation of life in this unique ecosystem. Here, we present a genome of a deep-sea limpet Bathyacmaea lactea, a common species associated with the dominant mussel beds in cold seeps. We generated 54.6 gigabases (Gb) of Nanopore reads and 77.9 Gb of BGI-seq raw reads. Assembly yielded a 754.3-Mb genome for B. lactea, with 3,720 contigs and a contig N50 of 1.57 Mb, covering 94.3% of metazoan Benchmarking Universal Single-Copy Orthologs. In total, 23,574 protein-coding genes and 463.4 Mb of repetitive elements were identified. We analyzed the phylogenetic position, substitution rate, demographic history, and TE activity of B. lactea. We also identified 80 expanded gene families and 87 rapidly evolving Gene Ontology categories in the B. lactea genome. Many of these genes were associated with heterocyclic compound metabolism, membrane-bounded organelle, metal ion binding, and nitrogen and phosphorus metabolism. The high-quality assembly and in-depth characterization suggest the B. lactea genome will serve as an essential resource for understanding the origin and adaptation of life in the cold seeps.

Unique structure and positive selection promote the rapid divergence of Drosophila Y chromosomes

10.1101/2021.08.16.456461 ◽

2021 ◽

Author(s):

Ching-Ho Chang ◽

Lauren E. Gregory ◽

Kathleen E. Gordon ◽

Colin D. Meiklejohn ◽

Amanda M. Larracuente

Keyword(s):

Positive Selection ◽

Y Chromosome ◽

Related Species ◽

De Novo ◽

Gene Families ◽

Chromosome Organization ◽

End Joining ◽

Sexual Antagonism ◽

Closely Related Species ◽

Y Chromosomes

Abstract Y chromosomes across diverse species convergently evolve a gene-poor, heterochromatic organization enriched for duplicated genes, LTR retrotransposable elements, and satellite DNA. Sexual antagonism and a loss of recombination play major roles in the degeneration of young Y chromosomes. However, the processes shaping the evolution of mature, already degenerated Y chromosomes are less well understood. Because Y chromosomes evolve rapidly, comparisons between closely related species are particularly useful. We generated de novo long-read assemblies complemented with cytological validation to reveal Y chromosome organization in three closely related species of the Drosophila simulans complex, which diverged only 250,000 years ago and share >98% sequence identity. We find these Y chromosomes are divergent in their organization and repetitive DNA composition and discover new Y-linked gene families whose evolution is driven by both positive selection and gene conversion. These Y chromosomes are also enriched for large deletions, suggesting that the repair of double-strand breaks on Y chromosomes may be biased toward microhomology-mediated end joining over canonical non-homologous end joining. We propose that this repair mechanism generally contributes to the convergent evolution of Y chromosome organization.

CStone: A de novo transcriptome assembler for short-read data that identifies non-chimeric contigs based on underlying graph structure

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009631 ◽

2021 ◽

Vol 17 (11) ◽

pp. e1009631

Author(s):

Raquel Linheiro ◽

John Archer

Keyword(s):

De Novo ◽

Simulated Data ◽

Real Data ◽

Gene Families ◽

Classification Systems ◽

Whole Body ◽

Cdna Libraries ◽

Sequence Information ◽

Rna Seq ◽

High Quality

With the exponential growth of sequence information stored over the last decade, including that of de novo assembled contigs from RNA-Seq experiments, quantification of chimeric sequences has become essential when assembling read data. In transcriptomics, de novo assembled chimeras can closely resemble underlying transcripts, but patterns such as those seen between co-evolving sites, or mapped read counts, become obscured. We have created a de Bruijn-based de novo assembler for RNA-Seq data that utilizes a classification system to describe the complexity of underlying graphs from which contigs are created. Each contig is labelled with one of three levels, indicating whether or not ambiguous paths exist. A by-product of this is information on the range of complexity of the underlying gene families present. As a demonstration of CStone's ability to assemble high-quality contigs, and to label them in this manner, both simulated and real data were used. For simulated data, ten million read pairs were generated from cDNA libraries representing four species, Drosophila melanogaster, Panthera pardus, Rattus norvegicus and Serinus canaria. These were assembled using CStone, Trinity and rnaSPAdes; the latter two being high-quality, well-established de novo assemblers. For real data, two RNA-Seq datasets, each consisting of ≈30 million read pairs, representing two adult D. melanogaster whole-body samples were used. The contigs that CStone produced were comparable in quality to those of Trinity and rnaSPAdes in terms of length, sequence identity of aligned regions and the range of cDNA transcripts represented, whilst providing additional information on chimerism. Here we describe the details of CStone's assembly and classification process, and propose that similar classification systems can be incorporated into other de novo assembly tools.
Within a related side study, we explore the effects that chimeras within reference sets have on the identification of differentially expressed genes. CStone is available at: https://sourceforge.net/projects/cstone/.
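
The idea of labelling a contig by whether its underlying de Bruijn graph contains ambiguous paths can be illustrated with a small sketch. This is not CStone's code and does not reproduce its three-level classification, whose definitions are not given in the abstract; it only assumes, for illustration, that a graph is "ambiguous" when some node has more than one successor or more than one predecessor. The function names `build_de_bruijn` and `has_ambiguous_paths` and the toy reads are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    successors = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            successors[kmer[:-1]].add(kmer[1:])
    return dict(successors)


def has_ambiguous_paths(successors):
    """True if any node branches, i.e. has more than one successor or predecessor.

    Assumed simplification: a two-way check, not CStone's three levels.
    """
    predecessors = defaultdict(set)
    for node, nexts in successors.items():
        for nxt in nexts:
            predecessors[nxt].add(node)
    branching_out = any(len(nexts) > 1 for nexts in successors.values())
    branching_in = any(len(prevs) > 1 for prevs in predecessors.values())
    return branching_out or branching_in


# Toy reads (hypothetical): one linear transcript, then two near-identical copies.
linear = build_de_bruijn(["ACGTAC"], k=4)
branched = build_de_bruijn(["ACGTT", "ACGTA"], k=4)
print(has_ambiguous_paths(linear), has_ambiguous_paths(branched))
```

In the second toy case the two reads share a prefix and then diverge, so one node gains two successors; this is the kind of branching that near-identical gene family members introduce and that can lead an assembler to stitch together a chimeric contig.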

Discovery of a New TLR Gene and Gene Expansion Event through Improved Desert Tortoise Genome Assembly with Chromosome-Scale Scaffolds

Genome Biology and Evolution ◽

10.1093/gbe/evaa016 ◽

2020 ◽

Vol 12 (2) ◽

pp. 3917-3925

Author(s):

Greer A Dolby ◽

Matheo Morales ◽

Timothy H Webster ◽

Dale F DeNardo ◽

Melissa A Wilson ◽

...

Keyword(s):

Genome Assembly ◽

Mojave Desert ◽

De Novo ◽

Stop Codon ◽

Draft Genome ◽

Gene Families ◽

Sea Turtle ◽

Desert Tortoise ◽

Draft Genome Assembly ◽

Gene Expansion

Abstract Toll-like receptors (TLRs) are a complex family of innate immune genes that are well characterized in mammals and birds but less well understood in nonavian sauropsids (reptiles). The advent of highly contiguous draft genomes of nonmodel organisms enables study of such gene families through analysis of synteny and sequence identity. Here, we analyze TLR genes from the genomes of 22 tetrapod species. Findings reveal a TLR8 gene expansion in crocodilians and turtles (TLR8B), and a second duplication (TLR8C) specifically within turtles, followed by pseudogenization of that gene in the nonfreshwater species (desert tortoise and green sea turtle). Additionally, the Mojave desert tortoise (Gopherus agassizii) has a stop codon in TLR8B (TLR8-1) that is polymorphic among conspecifics. Revised orthology further reveals a new TLR homolog, TLR21-like, which is exclusive to lizards, snakes, turtles, and crocodilians. These analyses were made possible by a new draft genome assembly of the desert tortoise (gopAga2.0), which used chromatin-based assembly to yield draft chromosomal scaffolds (L50 = 26 scaffolds, N50 = 28.36 Mb, longest scaffold = 107 Mb) and an enhanced de novo genome annotation with 25,469 genes. Our three-step approach to orthology curation and comparative analysis of TLR genes shows what new insights are possible using genome assemblies with chromosome-scale scaffolds that permit integration of synteny conservation data.

Transcriptome Analysis of Young Ovaries Reveals Candidate Genes Involved in Gamete Formation in Lantana camara

Plants ◽

10.3390/plants8080263 ◽

2019 ◽

Vol 8 (8) ◽

pp. 263 ◽

Cited By ~ 1

Author(s):

Ze Peng ◽

Krishna Bhattarai ◽

Saroj Parajuli ◽

Zhe Cao ◽

Zhanao Deng

Keyword(s):

Candidate Genes ◽

De Novo ◽

Transcriptome Assembly ◽

Gene Families ◽

Unreduced Gametes ◽

Lantana Camara ◽

Unreduced Gamete ◽

Genomic Resources ◽

Gamete Formation ◽

Gamete Production

Lantana (Lantana camara L., Verbenaceae) is an important ornamental crop, yet can be a highly invasive species. The formation of unreduced female gametes (UFGs) is a major factor contributing to its invasiveness and has severely hindered the development of sterile cultivars. To enrich the genomic resources and gain insight into the genetic mechanisms of UFG formation in lantana, we investigated the transcriptomes of young ovaries of two lantana genotypes, GDGHOP-36 (GGO), producing 100% UFGs, and a cultivar Landmark White Lantana (LWL), not producing UFGs. The de novo transcriptome assembly resulted in a total of 90,641 unique transcript sequences with an N50 of 1692 bp, among which, 29,383 sequences contained full-length coding sequences (CDS). There were 214 transcripts associated with the biological processes of gamete production and 10 gene families orthologous to genes known to control unreduced gamete production in Arabidopsis. We identified 925 transcription factor (TF)-encoding sequences, 91 nucleotide-binding site (NBS)-containing genes, and gene families related to drought/salt tolerance and allelopathy. These genomic resources and candidate genes involved in gamete formation will be valuable for developing new tools to control the invasiveness in L. camara, protect native lantana species, and understand the formation of unreduced gametes in plants.