Towards Spider Glue: Long-read scaffolding for extreme length and repetitious silk family genes AgSp1 and AgSp2 with insights into functional adaptation

The aggregate gland glycoprotein glue coating the prey-capture threads of orb-weaving and cobweb-weaving spider webs is composed of silk protein spidroins (spider fibroins) encoded by two members of the silk gene family. It functions to retain prey that make contact with the web, but differs from solid silk fibers in that it is a viscoelastic, amorphous, wet adhesive that is responsive to environmental conditions. Most spidroins are extremely large, highly repetitive genes that are impossible to sequence using only short-read technology. We sequenced for the first time the complete genomic Aggregate Spidroin 1 (AgSp1) and Aggregate Spidroin 2 (AgSp2) glue genes of Argiope trifasciata by using error-prone long reads as a scaffold for high-accuracy short reads. The massive coding sequences are 42,270 bp (AgSp1) and 20,526 bp (AgSp2) in length, the largest silk genes currently described. The majority of the predicted amino acid sequence of AgSp1 consists of two similar but distinct motifs that are repeated ~40 times each, while AgSp2 contains ~48 repetitions of an AgSp1-similar motif, interspersed with regions high in glutamine. Comparisons of AgSp repetitive motifs from orb web and cobweb spiders show regions of strict conservation followed by striking diversification. Glues from these two spider families have evolved contrasting material properties in adhesion, extensibility, and elasticity, which we link to mechanisms established for related silk genes in the same family. Full-length aggregate spidroin sequences from diverse species with differing material characteristics will provide insights for designing tunable bio-inspired adhesives for a variety of unique purposes.


Telomere-to-telomere gapless chromosomes of banana using nanopore sequencing

Communications Biology ◽

10.1038/s42003-021-02559-3 ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Caroline Belser ◽

Franc-Christophe Baurens ◽

Benjamin Noel ◽

Guillaume Martin ◽

Corinne Cruaud ◽

...

Keyword(s):

Musa Acuminata ◽

Genetic Maps ◽

Nanopore Sequencing ◽

Genome Coverage ◽

Long Reads ◽

Oxford Nanopore ◽

A Genome ◽

Long Read ◽

Genome Assemblies ◽

First Time

Abstract Long-read technologies hold the promise of more complete genome assemblies that are also easier to produce. Coupled with long-range technologies, they can reveal the architecture of complex regions, like centromeres or rDNA clusters. These technologies also make it possible to determine the complete organization of chromosomes, which previously remained difficult even when using genetic maps. However, generating a gapless and telomere-to-telomere assembly is still not trivial, and requires a combination of several technologies and the choice of suitable software. Here, we report a chromosome-scale assembly of a banana genome (Musa acuminata) generated using Oxford Nanopore long reads. We generated a genome coverage of 177X from a single PromethION flowcell, including nearly 17X from reads longer than 75 kbp. Of the 11 chromosomes, 5 were entirely reconstructed in a single contig from telomere to telomere, revealing for the first time the content of complex regions like centromeres or clusters of paralogous genes.
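The coverage figures quoted in this abstract (total depth, and depth contributed by reads above a length cutoff) follow from a simple ratio of sequenced bases to genome size. The sketch below is only an illustration of that arithmetic: the function name `coverage_depth`, the toy read lengths, and the toy genome size are all assumptions, not values or tools from the paper.

```python
from typing import Iterable


def coverage_depth(read_lengths: Iterable[int], genome_size: int,
                   min_length: int = 0) -> float:
    """Mean sequencing depth contributed by reads of at least `min_length` bases.

    Depth is the number of sequenced bases divided by the genome size.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    total_bases = sum(n for n in read_lengths if n >= min_length)
    return total_bases / genome_size


# Toy data (hypothetical): five reads over a 50 kbp "genome".
reads = [80_000, 20_000, 100_000, 5_000, 45_000]
genome = 50_000

total_depth = coverage_depth(reads, genome)                     # all reads
long_depth = coverage_depth(reads, genome, min_length=75_000)   # reads >= 75 kbp only

print(f"total depth: {total_depth:.1f}X")      # total depth: 5.0X
print(f"depth from long reads: {long_depth:.1f}X")  # depth from long reads: 3.6X
```

Applied to real data, the same ratio would be computed from the read-length distribution of the sequencing run and the estimated genome size.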


SILK MEDIATED DEFENSE BY AN ORB WEB SPIDER AGAINST PREDATORY MUD-DAUBER WASPS

Behaviour ◽

10.1163/15685390151074357 ◽

2001 ◽

Vol 138 (2) ◽

pp. 155-171 ◽

Cited By ~ 47

Author(s):

Todd Blackledge ◽

John Wenzel

Keyword(s):

Web Sites ◽

Predation Pressure ◽

Spider Webs ◽

Web Spider ◽

Orb Web ◽

Argiope Trifasciata

Abstract Stabilimenta are zigzag and spiral designs of seemingly conspicuous silk included at the centers of many spider webs. We examined the association of stabilimenta with the ability of spiders to defend themselves against predatory mud-dauber wasps. We found that Argiope trifasciata (Araneae, Araneidae) were significantly more likely to survive attacks by Chalybion caeruleum and Sceliphron caementarium (Hymenoptera, Sphecidae) when spiders included stabilimenta in webs. This association could not be explained by factors such as differences in sizes or conditions of spiders nor locations of webs. We suggest that stabilimenta may function to delay pursuit of spiders as they drop from webs by physically blocking wasps, camouflaging spiders or distracting attacking wasps. Stabilimenta may function in a role very similar to the retreats built by many other genera of spiders and appear to be an adaptation to reduce the predation pressure faced by spiders that have evolved foraging habits at highly exposed diurnal web sites.


Telomere-to-telomere gapless chromosomes of banana using nanopore sequencing

10.1101/2021.04.16.440017 ◽

2021 ◽

Author(s):

Caroline Belser ◽

Franc-Christophe Baurens ◽

Benjamin Noel ◽

Guillaume Martin ◽

Corinne Cruaud ◽

...

Keyword(s):

Musa Acuminata ◽

Genetic Maps ◽

Nanopore Sequencing ◽

Genome Coverage ◽

Long Reads ◽

Oxford Nanopore ◽

A Genome ◽

Long Read ◽

Genome Assemblies ◽

First Time

Abstract Long-read technologies hold the promise of more complete genome assemblies that are also easier to produce. Coupled with long-range technologies, they can reveal the architecture of complex regions, like centromeres or rDNA clusters. These technologies also make it possible to determine the complete organization of chromosomes, which previously remained difficult even when using genetic maps. However, generating a gapless and telomere-to-telomere assembly is still not trivial, and requires a combination of several technologies and the choice of suitable software. Here, we report a chromosome-scale assembly of a banana genome (Musa acuminata) generated using Oxford Nanopore long reads. We generated a genome coverage of 177X from a single PromethION flowcell, including nearly 17X from reads longer than 75 kb. Of the 11 chromosomes, 5 were entirely reconstructed in a single contig from telomere to telomere, revealing for the first time the content of complex regions like centromeres or clusters of paralogous genes.


Spidroin profiling of cribellate spiders provides insight into the evolution of spider prey capture strategies

Scientific Reports ◽

10.1038/s41598-020-72888-6 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Nobuaki Kono ◽

Hiroyuki Nakamura ◽

Masaru Mori ◽

Masaru Tomita ◽

Kazuharu Arakawa

Keyword(s):

Driving Force ◽

Prey Capture ◽

Transcriptome Assembly ◽

Morphological Characteristics ◽

Silk Proteins ◽

Monophyletic Origin ◽

Long Read ◽

Orb Web ◽

Gene Architecture ◽

Insight Into

Abstract Orb-weaving spiders have two main methods of prey capture: cribellate spiders use dry, sticky capture threads, and ecribellate spiders use viscid glue droplets. Predation behaviour is a major evolutionary driving force, and an important question for spider phylogeny is whether the cribellate and ecribellate spiders each evolved the orb architecture independently or both strategies were derived from an ancient orb web. These hypotheses have been discussed based on behavioural and morphological characteristics, with little discussion from the perspective of the molecular materials of the orb web, since there is little information about cribellate spider-associated spidroin genes. Here, we present in detail a spidroin catalogue of six uloborid species of cribellate orb-weaving spiders, including cribellate and pseudoflagelliform spidroins, with transcriptome assembly complemented with long-read sequencing, where silk composition is confirmed by proteomics. Comparative analysis across families (Araneidae and Uloboridae) shows that the gene architecture, repetitive domains, and amino acid frequencies of the silk proteins constituting the orb web are similar among orb-weaving spiders regardless of the prey capture strategy. Notably, the fact that there is a difference only in the prey capture thread proteins strongly supports the monophyletic origin of the orb web.


Ultra-accurate microbial amplicon sequencing with synthetic long reads

Microbiome ◽

10.1186/s40168-021-01072-3 ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

Benjamin J. Callahan ◽

Dmitry Grinevich ◽

Siddhartha Thakur ◽

Michael A. Balamotis ◽

Tuval Ben Yehezkel

Keyword(s):

Microbial Community ◽

16S rRNA ◽

Amplicon Sequencing ◽

Species Level ◽

Full Length ◽

16S rRNA Genes ◽

rRNA Genes ◽

Strain Identification ◽

Long Reads ◽

Long Read

Abstract Background Out of the many pathogenic bacterial species that are known, only a fraction are readily identifiable directly from a complex microbial community using standard next generation DNA sequencing. Long-read sequencing offers the potential to identify a wider range of species and to differentiate between strains within a species, but attaining sufficient accuracy in complex metagenomes remains a challenge. Methods Here, we describe and analytically validate LoopSeq, a commercially available synthetic long-read (SLR) sequencing technology that generates highly accurate long reads from standard short reads. Results LoopSeq reads are sufficiently long and accurate to identify microbial genes and species directly from complex samples. LoopSeq perfectly recovered the full diversity of 16S rRNA genes from known strains in a synthetic microbial community. Full-length LoopSeq reads had a per-base error rate of 0.005%, which exceeds the accuracy reported for other long-read sequencing technologies. 18S-ITS and genomic sequencing of fungal and bacterial isolates confirmed that LoopSeq sequencing maintains that accuracy for reads up to 6 kb in length. LoopSeq full-length 16S rRNA reads could accurately classify organisms down to the species level in rinsate from retail meat samples, and could differentiate strains within species identified by the CDC as potential foodborne pathogens. Conclusions The order-of-magnitude improvement in length and accuracy over standard Illumina amplicon sequencing achieved with LoopSeq enables accurate species-level and strain identification from complex- to low-biomass microbiome samples. The ability to generate accurate and long microbiome sequencing reads using standard short read sequencers will accelerate the building of quality microbial sequence databases and removes a significant hurdle on the path to precision microbial genomics.
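To put the reported per-base error rate of 0.005% in more familiar terms, it can be converted to a Phred-scaled quality score using the standard relation Q = -10 * log10(p). The sketch below is illustrative only: the function names are hypothetical, and the 1,500 bp read length is an assumed approximate size for a full-length 16S rRNA gene, not a figure taken from the abstract.

```python
import math


def phred_quality(error_rate: float) -> float:
    """Convert a per-base error probability to a Phred quality score."""
    if not 0 < error_rate < 1:
        raise ValueError("error_rate must be between 0 and 1 (exclusive)")
    return -10 * math.log10(error_rate)


def expected_errors(error_rate: float, read_length: int) -> float:
    """Expected number of erroneous bases in a read of the given length."""
    return error_rate * read_length


rate = 0.005 / 100      # 0.005% expressed as a probability
read_length = 1_500     # assumed approximate 16S rRNA gene length (bp)

print(f"Phred quality: Q{phred_quality(rate):.1f}")   # Phred quality: Q43.0
print(f"expected errors per read: {expected_errors(rate, read_length):.3f}")  # 0.075
```

Under these assumptions, fewer than one in ten full-length reads would be expected to contain an error, which is the level of accuracy needed to separate strains that differ by only a few bases.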


Evaluating the accuracy of Listeria monocytogenes assemblies from quasimetagenomic samples using long and short reads

BMC Genomics ◽

10.1186/s12864-021-07702-2 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Seth Commichaux ◽

Kiran Javkar ◽

Padmini Ramachandran ◽

Niranjan Nagarajan ◽

Denis Bertrand ◽

...

Keyword(s):

Public Health ◽

Public Health Response ◽

High Quality ◽

Short Read ◽

Short Reads ◽

The Core ◽

Long Reads ◽

Health Response ◽

Long Read ◽

Core Genes

Abstract Background Whole genome sequencing of cultured pathogens is the state of the art public health response for the bioinformatic source tracking of illness outbreaks. Quasimetagenomics can substantially reduce the amount of culturing needed before a high quality genome can be recovered. Highly accurate short read data is analyzed for single nucleotide polymorphisms and multi-locus sequence types to differentiate strains but cannot span many genomic repeats, resulting in highly fragmented assemblies. Long reads can span repeats, resulting in much more contiguous assemblies, but have lower accuracy than short reads. Results We evaluated the accuracy of Listeria monocytogenes assemblies from enrichments (quasimetagenomes) of naturally-contaminated ice cream using long read (Oxford Nanopore) and short read (Illumina) sequencing data. Accuracy of ten assembly approaches, over a range of sequencing depths, was evaluated by comparing sequence similarity of genes in assemblies to a complete reference genome. Long read assemblies reconstructed a circularized genome as well as a 71 kbp plasmid after 24 h of enrichment; however, high error rates prevented high fidelity gene assembly, even at 150X depth of coverage. Short read assemblies accurately reconstructed the core genes after 28 h of enrichment but produced highly fragmented genomes. Hybrid approaches demonstrated promising results but had biases based upon the initial assembly strategy. Short read assemblies scaffolded with long reads accurately assembled the core genes after just 24 h of enrichment, but were highly fragmented. Long read assemblies polished with short reads reconstructed a circularized genome and plasmid and assembled all the genes after 24 h enrichment but with less fidelity for the core genes than the short read assemblies. 
Conclusion The integration of long and short read sequencing of quasimetagenomes expedited the reconstruction of a high quality pathogen genome compared to either platform alone. A new and more complete level of information about genome structure, gene order and mobile elements can be added to the public health response by incorporating long read analyses with the standard short read WGS outbreak response.


Survey of the Bradysia odoriphaga Transcriptome Using PacBio Single-Molecule Long-Read Sequencing

Genes ◽

10.3390/genes10060481 ◽

2019 ◽

Vol 10 (6) ◽

pp. 481 ◽

Cited By ~ 1

Author(s):

Chen ◽

Lin ◽

Xie ◽

Zhong ◽

Zhang ◽

...

Keyword(s):

Insecticide Resistance ◽

Single Molecule ◽

Functional Categories ◽

Genetic Studies ◽

Sequencing Technologies ◽

Clusters Of Orthologous Groups ◽

Long Read ◽

Main Gene ◽

First Time ◽

Main Factor

The damage caused by Bradysia odoriphaga is the main factor threatening the production of vegetables in the Liliaceae family. However, few genetic studies of B. odoriphaga have been conducted because of a lack of genomic resources. Many long-read sequencing technologies have been developed in the last decade; therefore, in this study, the transcriptome including all developmental stages of B. odoriphaga was sequenced for the first time by PacBio single-molecule long-read sequencing. Here, 39,129 isoforms were generated, and 35,645 were found to have annotation results when checked against sequences available in different databases. Overall, 18,473 isoforms were distributed in 25 Clusters of Orthologous Groups, and 11,880 isoforms were categorized into 60 functional groups that belonged to the three main Gene Ontology classifications. Moreover, 30,610 isoforms were assigned to 44 functional categories belonging to six main Kyoto Encyclopedia of Genes and Genomes functional categories. Coding DNA sequence (CDS) prediction showed that 36,419 out of 39,129 isoforms were predicted to have CDS, and 4,319 simple sequence repeats were detected in total. Finally, 266 insecticide resistance and metabolism-related isoforms were identified as candidate genes for further investigation of insecticide resistance and metabolism in B. odoriphaga.


Hapo-G, haplotype-aware polishing of genome assemblies with accurate reads

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab034 ◽

2021 ◽

Vol 3 (2) ◽

Author(s):

Jean-Marc Aury ◽

Benjamin Istace

Keyword(s):

Single Molecule ◽

Direct Consequence ◽

High Quality ◽

Sequencing Errors ◽

Coding Regions ◽

Sequencing Technologies ◽

Long Reads ◽

Oxford Nanopore ◽

Long Read ◽

Genome Assemblies

Abstract Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and then, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture of the two haplotypes and can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; generally, they inspect nucleotides one by one and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes and they typically switch from one haplotype to another. Herein we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long reads) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.
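The point about sequencing errors disrupting the coding frame can be illustrated with a toy sequence. The sketch below is not Hapo-G and does not perform any polishing; it only shows, under the assumption of a made-up 21-base coding sequence and a hypothetical helper `first_stop_codon`, how a single-base deletion in a short homopolymer shifts the reading frame and exposes a premature stop codon.

```python
from typing import Optional

# Standard stop codons in the DNA alphabet.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(cds: str) -> Optional[int]:
    """Return the index (in codons) of the first in-frame stop codon, or None.

    The sequence is read in frame 0, three bases at a time; a trailing
    incomplete codon is ignored.
    """
    cds = cds.upper()
    for codon_index in range(len(cds) // 3):
        codon = cds[3 * codon_index: 3 * codon_index + 3]
        if codon in STOP_CODONS:
            return codon_index
    return None


# Hypothetical reference CDS: ATG GGC ATG ACA AAG GTT TAA
reference = "ATGGGCATGACAAAGGTTTAA"

# Simulated consensus error: one G dropped from the "GGG" run (index 3),
# giving ATG GCA TGA CAA AGG TTT AA
with_deletion = reference[:3] + reference[4:]

print(first_stop_codon(reference))      # 6 (the genuine stop at the end)
print(first_stop_codon(with_deletion))  # 2 (premature stop caused by the frameshift)
```

A polisher that restores the missing base would recover the original frame; in a heterozygous diploid, the added difficulty described in the abstract is choosing corrections that stay consistent with a single haplotype.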


Short and long-read genome sequencing methodologies for somatic variant detection; genomic analysis of a patient with diffuse large B-cell lymphoma

Scientific Reports ◽

10.1038/s41598-021-85354-8 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Hannah E. Roberts ◽

Maria Lopopolo ◽

Alistair T. Pagnamenta ◽

Eshita Sharma ◽

Duncan Parkes ◽

...

Keyword(s):

B Cell ◽

Genome Sequencing ◽

Cell Lymphoma ◽

B Cell Lymphoma ◽

Somatic Variation ◽

Single Nucleotide Variants ◽

Germline Variants ◽

Specificity And Sensitivity ◽

Long Reads ◽

Long Read

Abstract Recent advances in throughput and accuracy mean that the Oxford Nanopore Technologies PromethION platform is now a viable solution for genome sequencing. Much of the validation of bioinformatic tools for this long-read data has focussed on calling germline variants (including structural variants). Somatic variants are outnumbered many-fold by germline variants and their detection is further complicated by the effects of tumour purity/subclonality. Here, we evaluate the extent to which Nanopore sequencing enables detection and analysis of somatic variation. We do this through sequencing tumour and germline genomes for a patient with diffuse B-cell lymphoma and comparing results with 150 bp short-read sequencing of the same samples. Calling germline single nucleotide variants (SNVs) from specific chromosomes of the long-read data achieved good specificity and sensitivity. However, results of somatic SNV calling highlight the need for the development of specialised joint calling algorithms. We find the comparative genome-wide performance of different tools varies significantly between structural variant types, and suggest long reads are especially advantageous for calling large somatic deletions and duplications. Finally, we highlight the utility of long reads for phasing clinically relevant variants, confirming that a somatic 1.6 Mb deletion and a p.(Arg249Met) mutation involving TP53 are oriented in trans.


Long read sequencing reveals sequential complex rearrangements driven by Hepatitis B virus integration

10.1101/2021.12.09.471697 ◽

2021 ◽

Author(s):

Songbo Wang ◽

Jiadong Lin ◽

Xiaofei Yang ◽

Zihang Li ◽

Tun Xu ◽

...

Keyword(s):

Hepatitis B ◽

Clinical Samples ◽

Metabolic Dysfunction ◽

Cellular Functions ◽

Human Genomes ◽

Long Reads ◽

B Virus ◽

Long Read ◽

Genetic Structures ◽

Virus Integration

Integration of hepatitis B virus (HBV) into the human genome disrupts genetic structures and cellular functions. Here, we conducted multiplatform long-read sequencing on two cell lines and five clinical samples of HBV-induced hepatocellular carcinoma (HCC). We resolved two types of complex genome rearrangements induced by viral integration and established a Time-phased Integration and Rearrangement Model (TIRM) to depict their formation process by differentiating inserted HBV copies with HiFi long reads. We showed that the two complex types were initiated from focal replacements and that the fragile virus-human junctions triggered subsequent rearrangements. We further revealed that these rearrangements promoted a prevalent loss of heterozygosity at chr4q, accounting for 19.5% of HCC samples in the ICGC cohort and contributing to immune and metabolic dysfunction. Overall, our long-read-based analysis reveals a novel sequential rearrangement process driven by HBV integration, hinting at its structural and functional implications for human genomes.
