Long-read sequence capture of the hemoglobin gene clusters across species

Combining high-throughput sequencing with targeted sequence capture has become an attractive tool to study specific genomic regions of interest. Most studies have so far focused on the exome using short-read technology. These approaches are not designed to capture intergenic regions needed to reconstruct genomic organization, including regulatory regions and gene synteny. Here, we demonstrate the power of combining targeted sequence capture with long-read sequencing technology for comparative genomic analyses of the hemoglobin (Hb) gene clusters across eight species separated by up to 70 million years. Guided by the reference genome assembly of the Atlantic cod (Gadus morhua) together with genome information from draft assemblies of selected codfishes, we designed probes covering the two Hb gene clusters. Use of custom-made barcodes combined with PacBio RSII sequencing led to highly continuous assemblies of the LA (~100 kb) and MN (~200 kb) clusters, which include syntenic regions of coding and intergenic sequences. Our results revealed an overall conserved genetic organization and synteny of the Hb genes within this lineage, yet with several lineage-specific gene duplications. Moreover, for some of the species examined, we identified amino acid substitutions at two sites in the Hbb1 gene as well as length polymorphisms in its regulatory region, which have previously been linked to temperature adaptation in Atlantic cod populations. This study highlights the use of targeted long-read capture as a versatile approach for comparative genomic studies by generating a cross-species genomic resource elucidating the evolutionary history of the Hb gene family across the highly divergent group of codfishes.

Long‐read sequence capture of the haemoglobin gene clusters across codfish species

Molecular Ecology Resources ◽ 10.1111/1755-0998.12955 ◽ 2018 ◽ Vol 19 (1) ◽ pp. 245-259 ◽ Cited By ~ 4

Author(s): Siv Nam Khang Hoff ◽ Helle T. Baalsrud ◽ Ave Tooming‐Klunderud ◽ Morten Skage ◽ Todd Richmond ◽ ...

Keyword(s): Gene Clusters ◽ Sequence Capture ◽ Long Read

Categorization of Orthologous Gene Clusters in 92 Ascomycota Genomes Reveals Functions Important for Phytopathogenicity

Journal of Fungi ◽ 10.3390/jof7050337 ◽ 2021 ◽ Vol 7 (5) ◽ pp. 337

Author(s): Daniel Peterson ◽ Tang Li ◽ Ana M. Calvo ◽ Yanbin Yin

Keyword(s): Comparative Genomics ◽ Orthologous Gene ◽ Gene Clusters ◽ Phytopathogenic Fungi ◽ Secreted Proteins ◽ Economic Losses ◽ Comparative Genomic ◽ Signal Peptides ◽ Comparative Genomics Analysis

Phytopathogenic Ascomycota are responsible for substantial economic losses each year, destroying valuable crops. The present study aims to provide new insights into phytopathogenicity in Ascomycota from a comparative genomic perspective. This has been achieved by categorizing orthologous gene groups (orthogroups) from 68 phytopathogenic and 24 non-phytopathogenic Ascomycota genomes into three classes: core, (pathogen or non-pathogen) group-specific, and genome-specific accessory orthogroups. We found that (i) ~20% of orthogroups are group-specific or accessory in the 92 Ascomycota genomes, (ii) phytopathogenicity is not phylogenetically determined, (iii) group-specific orthogroups have more enriched functional terms than accessory orthogroups, a trend that is particularly evident in phytopathogenic fungi, (iv) secreted proteins with signal peptides and horizontal gene transfers (HGTs) are the two functional terms that show the highest occurrence and significance in group-specific orthogroups, and (v) a number of other functional terms also show higher significance and occurrence in group-specific orthogroups. Overall, our comparative genomics analysis identified positive enrichment among orthogroup classes and points to the genomic characteristics that make an Ascomycete phytopathogenic. We conclude that genes shared by multiple phytopathogenic genomes are more important for phytopathogenicity than those that are unique to each genome.
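
The three-way categorization described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact class definitions, so the rules below are assumptions made for illustration only: "core" means present in every genome, "group-specific" means present in two or more genomes of only one group, and "genome-specific accessory" means present in a single genome. The function name `classify_orthogroup`, the genome labels, and the "other" fallback class are hypothetical and do not come from the paper.

```python
from typing import Dict, Set


def classify_orthogroup(
    members: Set[str], pathogens: Set[str], non_pathogens: Set[str]
) -> str:
    """Classify one orthogroup by the set of genomes that contain it.

    The class definitions are illustrative assumptions, not the
    published method.
    """
    all_genomes = pathogens | non_pathogens
    if not members or not members <= all_genomes:
        raise ValueError("members must be a non-empty subset of the known genomes")
    if members == all_genomes:
        return "core"  # assumed: present in every genome
    if len(members) == 1:
        return "genome-specific accessory"  # assumed: unique to one genome
    if members <= pathogens:
        return "pathogen-specific"  # shared by pathogens only
    if members <= non_pathogens:
        return "non-pathogen-specific"  # shared by non-pathogens only
    return "other"  # spans both groups without being universal


# Toy presence data: orthogroup ID -> genomes in which it occurs (made up).
pathogens = {"P1", "P2", "P3"}
non_pathogens = {"N1", "N2"}
orthogroups: Dict[str, Set[str]] = {
    "OG1": {"P1", "P2", "P3", "N1", "N2"},
    "OG2": {"P1", "P2"},
    "OG3": {"N1", "N2"},
    "OG4": {"P3"},
    "OG5": {"P1", "N1"},
}
classes = {
    og: classify_orthogroup(genomes, pathogens, non_pathogens)
    for og, genomes in orthogroups.items()
}
```

Under these assumed rules, an orthogroup found in some genomes of both groups but not in all of them falls into "other"; how the study itself treats such cases is not stated in the abstract.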

Verification of CRISPR editing and finding transgenic inserts by Xdrop™ Indirect sequence capture followed by short- and long-read sequencing

Methods ◽ 10.1016/j.ymeth.2021.02.003 ◽ 2021

Author(s): Blondal Thorarinn ◽ Gamba Cristina ◽ Jagd Lea Møller ◽ Su Ling ◽ Demirov Dimiter ◽ ...

Keyword(s): Sequence Capture ◽ Long Read

Genome Reduction and Secondary Metabolism of the Marine Sponge-Associated Cyanobacterium Leptothoe

Marine Drugs ◽ 10.3390/md19060298 ◽ 2021 ◽ Vol 19 (6) ◽ pp. 298

Author(s): Despoina Konstantinou ◽ Rafael V. Popin ◽ David P. Fewer ◽ Kaarina Sivonen ◽ Spyros Gkelis

Keyword(s): Natural Products ◽ Microbial Communities ◽ Genomic Analysis ◽ Gene Clusters ◽ Marine Sponges ◽ Genome Reduction ◽ Extracellular Polysaccharides ◽ Comparative Genomic ◽ Symbiotic Relationships ◽ Genes Encoding

Sponges form symbiotic relationships with diverse and abundant microbial communities. Cyanobacteria are among the most important members of the microbial communities that are associated with sponges. Here, we performed a genus-wide comparative genomic analysis of the newly described marine benthic cyanobacterial genus Leptothoe (Synechococcales). We obtained draft genomes from Le. kymatousa TAU-MAC 1615 and Le. spongobia TAU-MAC 1115, isolated from marine sponges. We identified five additional Leptothoe genomes, host-associated or free-living, using a phylogenomic approach, and the comparison of all genomes showed that the sponge-associated strains display features of a symbiotic lifestyle. Le. kymatousa and Le. spongobia have undergone genome reduction; they harbored considerably fewer genes encoding (i) cofactors, vitamins, prosthetic groups, pigments, proteins, and amino acid biosynthesis; (ii) DNA repair; (iii) antioxidant enzymes; and (iv) biosynthesis of capsular and extracellular polysaccharides. They have also lost several genes related to chemotaxis and motility. Eukaryotic-like proteins, such as ankyrin repeats, playing important roles in sponge-symbiont interactions, were identified in sponge-associated Leptothoe genomes. The sponge-associated Leptothoe strains harbored biosynthetic gene clusters encoding novel natural products despite genome reduction. Comparisons of the biosynthetic capacities of Leptothoe with chemically rich cyanobacteria revealed that Leptothoe is another promising marine cyanobacterium for the biosynthesis of novel natural products.

Genomic Characteristics and Comparative Genomics Analysis of Two Chinese Corynespora cassiicola Strains Causing Corynespora Leaf Fall (CLF) Disease

Journal of Fungi ◽ 10.3390/jof7060485 ◽ 2021 ◽ Vol 7 (6) ◽ pp. 485

Author(s): Boxun Li ◽ Yang Yang ◽ Jimiao Cai ◽ Xianbao Liu ◽ Tao Shi ◽ ...

Keyword(s): Comparative Genomics ◽ Single Molecule ◽ Rubber Tree ◽ Gene Families ◽ Gene Clusters ◽ The Philippines ◽ Tree Plantations ◽ Comparative Genomic ◽ Corynespora Cassiicola ◽ Leaf Fall

Rubber tree Corynespora leaf fall (CLF) disease, caused by the fungus Corynespora cassiicola, is one of the most damaging diseases in rubber tree plantations in Asia and Africa, and this disease also threatens rubber nurseries and young rubber plantations in China. C. cassiicola isolates display high genetic diversity, and virulence profiles vary significantly depending on cultivar. Although one phytotoxin (cassiicolin) has been identified, it cannot fully explain the diversity in pathogenicity between C. cassiicola species, and some virulent C. cassiicola strains do not contain the cassiicolin gene. In the present study, we report high-quality gapless genome sequences, obtained using short-read sequencing and single-molecule long-read sequencing, of two Chinese C. cassiicola virulent strains. Comparative genomics of gene families in these two strains and a virulent CCP strain from the Philippines showed that all three strains experienced different selective pressures, and metabolism-related gene families vary between the strains. Secreted protein analysis indicated that the quantities of secreted cell wall-degrading enzymes were correlated with pathogenesis, and the most aggressive CCP strain (cassiicolin toxin type 1) encoded 27.34% and 39.74% more secreted carbohydrate-active enzymes (CAZymes) than Chinese strains YN49 and CC01, respectively, both of which can only infect rubber tree saplings. The results of antiSMASH analysis showed that all three strains encode ~60 secondary metabolite biosynthesis gene clusters (SM BGCs). Phylogenomic and domain structure analyses of core synthesis genes, together with synteny analysis of polyketide synthase (PKS) and non-ribosomal peptide synthetase (NRPS) gene clusters, revealed diversity in the distribution of SM BGCs between strains, as well as SM polymorphisms, which may play an important role in pathogenic progress. The results expand our understanding of the C. cassiicola genome. Further comparative genomic analysis indicates that secreted CAZymes and SMs may influence pathogenicity in rubber tree plantations. The findings facilitate future exploration of the molecular pathogenic mechanism of C. cassiicola.

Biosynthetic potential of uncultured Antarctic soil bacteria revealed through long-read metagenomic sequencing

The ISME Journal ◽ 10.1038/s41396-021-01052-3 ◽ 2021

Author(s): Valentin Waschulin ◽ Chiara Borsetto ◽ Robert James ◽ Kevin K. Newsham ◽ Stefano Donadio ◽ ...

Keyword(s): Genome Mining ◽ Gene Clusters ◽ Biosynthetic Gene Cluster ◽ Full Length ◽ Metagenomic Sequencing ◽ Short Read ◽ Short Read Sequencing ◽ Rich Diversity ◽ Long Read ◽ The Rich

The growing problem of antibiotic resistance has led to the exploration of uncultured bacteria as potential sources of new antimicrobials. PCR amplicon analyses and short-read sequencing studies of samples from different environments have reported evidence of high biosynthetic gene cluster (BGC) diversity in metagenomes, indicating their potential for producing novel and useful compounds. However, recovering full-length BGC sequences from uncultivated bacteria remains a challenge due to the technological constraints of short-read sequencing, thus making assessment of BGC diversity difficult. Here, long-read sequencing and genome mining were used to recover >1400 mostly full-length BGCs that demonstrate the rich diversity of BGCs from uncultivated lineages present in soil from Mars Oasis, Antarctica. A large number of highly divergent BGCs were found not only in the phyla Acidobacteriota, Verrucomicrobiota and Gemmatimonadota but also in the actinobacterial classes Acidimicrobiia and Thermoleophilia and the gammaproteobacterial order UBA7966. The latter furthermore contained a potential novel family of RiPPs. Our findings underline the biosynthetic potential of underexplored phyla as well as unexplored lineages within seemingly well-studied producer phyla. They also showcase long-read metagenomic sequencing as a promising way to access the untapped genetic reservoir of specialised metabolite gene clusters of the uncultured majority of microbes.

Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network

Nature Communications ◽ 10.1038/s41467-021-23143-7 ◽ 2021 ◽ Vol 12 (1)

Author(s): Mathys Grapotte ◽ Manu Saraswat ◽ Chloé Bessière ◽ Christophe Menichelli ◽ Jordan A. Ramilowski ◽ ...

Keyword(s): Transcription Initiation ◽ Tandem Repeats ◽ Specific Gene ◽ RNA Seq ◽ Transcription Start Sites ◽ Long Read ◽ Cap Analysis ◽ DNA Tandem Repeats ◽ Short Tandem ◽ STR Polymorphism

Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology that combines cap trapping and long-read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveil the importance of STR surrounding sequences not only to distinguish STR classes, but also to predict the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation level, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism.

The B-type lamin is required for somatic repression of testis-specific gene clusters

Proceedings of the National Academy of Sciences ◽ 10.1073/pnas.0811933106 ◽ 2009 ◽ Vol 106 (9) ◽ pp. 3282-3287 ◽ Cited By ~ 87

Author(s): Y. Y. Shevelyov ◽ S. A. Lavrov ◽ L. M. Mikhaylova ◽ I. D. Nurminsky ◽ R. J. Kulathinal ◽ ...

Keyword(s): Gene Clusters ◽ Specific Gene

Use of targeted sequence capture and high-throughput sequencing identifies a novel PKD1 mutation involved in adult polycystic kidney disease

Gene ◽ 10.1016/j.gene.2017.08.040 ◽ 2017 ◽ Vol 634 ◽ pp. 1-4 ◽ Cited By ~ 1

Author(s): Yan-Kun Sha ◽ Yan-Wei Sha ◽ Li-Bin Mei ◽ Xian-Jing Huang ◽ Xu Wang ◽ ...

Keyword(s): Kidney Disease ◽ Polycystic Kidney Disease ◽ High Throughput ◽ High Throughput Sequencing ◽ Polycystic Kidney ◽ Adult Polycystic Kidney Disease ◽ Sequence Capture ◽ Targeted Sequence Capture

Native molecule sequencing by nano-ID reveals synthesis and stability of RNA isoforms

10.1101/601856 ◽ 2019 ◽ Cited By ~ 3

Author(s): Kerstin C. Maier ◽ Saskia Gressel ◽ Patrick Cramer ◽ Björn Schwalb

Keyword(s): RNA Stability ◽ Tail Length ◽ RNA Metabolism ◽ Synthesis Rate ◽ Specific Gene ◽ Nanopore Sequencing ◽ RNA Molecules ◽ Eukaryotic Genes ◽ RNA Labeling ◽ Long Read

Eukaryotic genes often generate a variety of RNA isoforms that can lead to functionally distinct protein variants. The synthesis and stability of RNA isoforms are, however, poorly characterized. The reason for this is that current methods to quantify RNA metabolism use ‘short-read’ sequencing that cannot detect RNA isoforms. Here we present nanopore sequencing-based Isoform Dynamics (nano-ID), a method that detects newly synthesized RNA isoforms and monitors isoform metabolism. nano-ID combines metabolic RNA labeling, ‘long-read’ nanopore sequencing of native RNA molecules and machine learning. Application of nano-ID to the heat shock response in human cells reveals that many RNA isoforms change their synthesis rate, stability, and splicing pattern. nano-ID also shows that the metabolism of individual RNA isoforms differs strongly from that estimated for the combined RNA signal at a specific gene locus. Although combined RNA stability correlates with poly(A)-tail length, individual RNA isoforms can deviate significantly. nano-ID enables studies of RNA metabolism on the level of single RNA molecules and isoforms in different cell states and conditions.
