De Novo Assembly and Species-Specific Marker Development as a Useful Tool for the Identification of Scutellaria L. Species

Scutellaria L. (family Lamiaceae) includes approximately 470 species found in most parts of the world and is commonly known as skullcaps. Scutellaria L. is a medicinal herb used as a folk remedy in Korea and East Asia, but it is difficult to identify and classify various subspecies by morphological methods. Since Scutellaria L. has not been studied genetically, to expand the knowledge of species in the genus Scutellaria L., de novo whole-genome assembly was performed in Scutellaria indica var. tsusimensis (H. Hara) Ohwi using the Illumina sequencing platform. We aimed to develop a molecular method that could be used to classify S.indica var. tsusimensis (H. Hara) Ohwi, S. indica L. and three other Scutellaria L. species. The assembly results for S.indica var. tsusimensis (H. Hara) Ohwi revealed a genome size of 318,741,328 bp and a scaffold N50 of 78,430. The assembly contained 92.08% of the conserved BUSCO core gene set and was estimated to cover 94.65% of the genome. The obtained genes were compared with previously registered Scutellaria nucleotide sequences and similar regions using the NCBI BLAST service, and a total of 279 similar nucleotide sequences were detected. By selecting the 279 similar nucleotide sequences and nine chloroplast DNA barcode genes, primers were prepared so that the size of the PCR product was 100 to 1000 bp. As a result, a species-specific primer set capable of distinguishing five species of Scutellaria L. was developed.

Download Full-text

Species-Specific Marker Discovery in Tilapia

Scientific Reports ◽

10.1038/s41598-019-48339-2 ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 9

Author(s):

Mochamad Syaifudin ◽

Michaël Bekaert ◽

John B. Taggart ◽

Kerry L. Bartie ◽

Stefanie Wehner ◽

...

Keyword(s):

Phylogenetic Trees ◽

Restriction Site ◽

De Novo ◽

Snp Markers ◽

Specific Marker ◽

Nucleotide Polymorphism ◽

Single Nucleotide ◽

Species Analysis ◽

Species Specific ◽

Marker Discovery

Abstract Tilapias (family Cichlidae) are of importance in aquaculture and fisheries. Hybridisation and introgression are common within tilapia genera but are difficult to analyse due to limited numbers of species-specific genetic markers. We tested the potential of double digested restriction-site associated DNA (ddRAD) sequencing for discovering single nucleotide polymorphism (SNP) markers to distinguish between 10 tilapia species. Analysis of ddRAD data revealed 1,371 shared SNPs in the de novo-based analysis and 1,204 SNPs in the reference-based analysis. Phylogenetic trees based on these two analyses were very similar. A total of 57 species-specific SNP markers were found among the samples analysed of the 10 tilapia species. Another set of 62 species-specific SNP markers was identified from a subset of four species which have often been involved in hybridisation in aquaculture: 13 for Oreochromis niloticus, 23 for O. aureus, 12 for O. mossambicus and 14 for O. u. hornorum. A panel of 24 SNPs was selected to distinguish among these four species and validated using 91 individuals. Larger numbers of SNP markers were found that could distinguish between the pairs of species within this subset. This technique offers potential for the investigation of hybridisation and introgression among tilapia species in aquaculture and in wild populations.

Download Full-text

DNA barcode based species-specific marker for Ocimum tenuiflorum and its applicability in quantification of adulteration in herbal formulations using qPCR

Journal of Herbal Medicine ◽

10.1016/j.hermed.2020.100376 ◽

2020 ◽

Vol 23 ◽

pp. 100376

Author(s):

Amit Kumar ◽

Vereena Rodrigues ◽

Kuppusamy Baskaran ◽

Ashutosh K. Shukla ◽

Velusamy Sundaresan

Keyword(s):

Dna Barcode ◽

Specific Marker ◽

Ocimum Tenuiflorum ◽

Species Specific ◽

Herbal Formulations

Download Full-text

The sequence and de novo assembly of Oxygymnocypris stewartii genome

Scientific Data ◽

10.1038/sdata.2019.9 ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 11

Author(s):

Hai-Ping Liu ◽

Shi-Jun Xiao ◽

Nan Wu ◽

Di Wang ◽

Yan-Chao Liu ◽

...

Keyword(s):

Tibetan Plateau ◽

De Novo ◽

The Tibetan Plateau ◽

Sequencing Data ◽

Protein Coding ◽

Sequencing Platform ◽

A Genome ◽

Altitude Adaptation ◽

Oxygymnocypris Stewartii ◽

Local Ecology

Abstract Animal genomes in the Qinghai-Tibetan Plateau provide valuable resources for scientists to understand the molecular mechanism of environmental adaptation. Tibetan fish species play essential roles in the local ecology; however, the genomic information for native fishes was still insufficient. Oxygymnocypris stewartii, belonging to Oxygymnocypris genus, Schizothoracinae subfamily, is a native fish in the Tibetan plateau living within the elevation from roughly 3,000 m to 4,200 m. In this report, PacBio and Illumina sequencing platform were used to generate ~385.3 Gb genomic sequencing data. A genome of about 1,849.2 Mb was obtained with a contig N50 length of 257.1 kb. More than 44.5% of the genome were identified as repetitive elements, and 46,400 protein-coding genes were annotated in the genome. The assembled genome can be used as a reference for future population genetic studies of O. stewartii and will improve our understanding of high altitude adaptation of fishes in the Qinghai-Tibetan Plateau.

Download Full-text

Hybrid de novo genome-reassembly reveals new insights on pathways and pathogenicity determinants in rice blast pathogen Magnaporthe oryzae RMg_Dl

Scientific Reports ◽

10.1038/s41598-021-01980-2 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Bhaskar Reddy ◽

Aundy Kumar ◽

Sahil Mehta ◽

Neelam Sheoran ◽

Viswanathan Chinnusamy ◽

...

Keyword(s):

Magnaporthe Oryzae ◽

De Novo ◽

Molecular Taxonomy ◽

Pectate Lyase ◽

Mapk Signaling ◽

Blast Disease ◽

Upstream Region ◽

Specific Marker ◽

Cell Wall Degrading Enzymes ◽

A Genome

AbstractBlast disease incited by Magnaporthe oryzae is a major threat to sustain rice production in all rice growing nations. The pathogen is widely distributed in all rice paddies and displays rapid aerial transmissions, and seed-borne latent infection. In order to understand the genetic variability, host specificity, and molecular basis of the pathogenicity-associated traits, the whole genome of rice infecting Magnaporthe oryzae (Strain RMg_Dl) was sequenced using the Illumina and PacBio (RSII compatible) platforms. The high-throughput hybrid assembly of short and long reads resulted in a total of 375 scaffolds with a genome size of 42.43 Mb. Furthermore, comparative genome analysis revealed 99% average nucleotide identity (ANI) with other oryzae genomes and 83% against M. grisea, and 73% against M. poe genomes. The gene calling identified 10,553 genes with 10,539 protein-coding sequences. Among the detected transposable elements, the LTR/Gypsy and Type LINE showed high occurrence. The InterProScan of predicted protein sequences revealed that 97% protein family (PFAM), 98% superfamily, and 95% CDD were shared among RMg_Dl and reference 70-15 genome, respectively. Additionally, 550 CAZymes with high GH family content/distribution and cell wall degrading enzymes (CWDE) such endoglucanase, beta-glucosidase, and pectate lyase were also deciphered in RMg_Dl. The prevalence of virulence factors determination revealed that 51 different VFs were found in the genome. The biochemical pathway such as starch and sucrose metabolism, mTOR signaling, cAMP signaling, MAPK signaling pathways related genes were identified in the genome. The 49,065 SNPs, 3267 insertions and 3611 deletions were detected, and majority of these varinats were located on downstream and upstream region. Taken together, the generated information will be useful to develop a specific marker for diagnosis, pathogen surveillance and tracking, molecular taxonomy, and species delineation which ultimately leads to device improved management strategies for blast disease.

Download Full-text

An Unbiased Predictive Model to Detect DNA Methylation Propensity of CpG Islands in the Human Genome

Current Bioinformatics ◽

10.2174/1574893615999200724145835 ◽

2020 ◽

Vol 15 ◽

Author(s):

Dicle Yalcin ◽

Hasan H. Otu

Keyword(s):

Model Building ◽

De Novo ◽

Cpg Islands ◽

Treatment Strategies ◽

Area Under The Curve ◽

Global Methylation ◽

Sequence Features ◽

A Genome ◽

Combined Features ◽

Epigenetic Repression

Background: Epigenetic repression mechanisms play an important role in gene regulation, specifically in cancer development. In many cases, a CpG island’s (CGI) susceptibility or resistance to methylation are shown to be contributed by local DNA sequence features. Objective: To develop unbiased machine learning models–individually and combined for different biological features–that predict the methylation propensity of a CGI. Methods: We developed our model consisting of CGI sequence features on a dataset of 75 sequences (28 prone, 47 resistant) representing a genome-wide methylation structure. We tested our model on two independent datasets that are chromosome (132 sequences) and disease (70 sequences) specific. Results: We provided improvements in prediction accuracy over previous models. Our results indicate that combined features better predict the methylation propensity of a CGI (area under the curve (AUC) ~0.81). Our global methylation classifier performs well on independent datasets reaching an AUC of ~0.82 for the complete model and an AUC of ~0.88 for the model using select sequences that better represent their classes in the training set. We report certain de novo motifs and transcription factor binding site (TFBS) motifs that are consistently better in separating prone and resistant CGIs. Conclusion: Predictive models for the methylation propensity of CGIs lead to a better understanding of disease mechanisms and can be used to classify genes based on their tendency to contain methylation prone CGIs, which may lead to preventative treatment strategies. MATLAB and Python™ scripts used for model building, prediction, and downstream analyses are available at https://github.com/dicleyalcin/methylProp_predictor.

Download Full-text

Genomic Characterization Provides an Insight into the Pathogenicity of the Poplar Canker Bacterium Lonsdalea populi

Genes ◽

10.3390/genes12020246 ◽

2021 ◽

Vol 12 (2) ◽

pp. 246

Author(s):

Xiaomeng Chen ◽

Rui Li ◽

Yonglin Wang ◽

Aining Li

Keyword(s):

Genome Sequence ◽

Extracellular Enzymes ◽

De Novo ◽

Whole Genome Sequence ◽

Hybrid Poplars ◽

A Genome ◽

Conserved Genes ◽

Genomic Characterization ◽

Molecular Bases ◽

Insight Into

An emerging poplar canker caused by the gram-negative bacterium, Lonsdalea populi, has led to high mortality of hybrid poplars Populus × euramericana in China and Europe. The molecular bases of pathogenicity and bark adaptation of L. populi have become a focus of recent research. This study revealed the whole genome sequence and identified putative virulence factors of L. populi. A high-quality L. populi genome sequence was assembled de novo, with a genome size of 3,859,707 bp, containing approximately 3434 genes and 107 RNAs (75 tRNA, 22 rRNA, and 10 ncRNA). The L. populi genome contained 380 virulence-associated genes, mainly encoding for adhesion, extracellular enzymes, secretory systems, and two-component transduction systems. The genome had 110 carbohydrate-active enzyme (CAZy)-coding genes and putative secreted proteins. The antibiotic-resistance database annotation listed that L. populi was resistant to penicillin, fluoroquinolone, and kasugamycin. Analysis of comparative genomics found that L. populi exhibited the highest homology with the L. britannica genome and L. populi encompassed 1905 specific genes, 1769 dispensable genes, and 1381 conserved genes, suggesting high evolutionary diversity and genomic plasticity. Moreover, the pan genome analysis revealed that the N-5-1 genome is an open genome. These findings provide important resources for understanding the molecular basis of the pathogenicity and biology of L. populi and the poplar-bacterium interaction.

Download Full-text

In Search of Species-Specific SNPs in a Non-Model Animal (European Bison (Bison bonasus))—Comparison of De Novo and Reference-Based Integrated Pipeline of STACKS Using Genotyping-by-Sequencing (GBS) Data

Animals ◽

10.3390/ani11082226 ◽

2021 ◽

Vol 11 (8) ◽

pp. 2226

Author(s):

Sazia Kunvar ◽

Sylwia Czarnomska ◽

Cino Pertoldi ◽

Małgorzata Tokarska

Keyword(s):

Reference Genome ◽

De Novo ◽

Bos Taurus ◽

Model Organism ◽

Genotyping By Sequencing ◽

Model Organisms ◽

European Bison ◽

Model Animal ◽

Pcr Duplicates ◽

Species Specific

The European bison is a non-model organism; thus, most of its genetic and genomic analyses have been performed using cattle-specific resources, such as BovineSNP50 BeadChip or Illumina Bovine 800 K HD Bead Chip. The problem with non-specific tools is the potential loss of evolutionary diversified information (ascertainment bias) and species-specific markers. Here, we have used a genotyping-by-sequencing (GBS) approach for genotyping 256 samples from the European bison population in Bialowieza Forest (Poland) and performed an analysis using two integrated pipelines of the STACKS software: one is de novo (without reference genome) and the other is a reference pipeline (with reference genome). Moreover, we used a reference pipeline with two different genomes, i.e., Bos taurus and European bison. Genotyping by sequencing (GBS) is a useful tool for SNP genotyping in non-model organisms due to its cost effectiveness. Our results support GBS with a reference pipeline without PCR duplicates as a powerful approach for studying the population structure and genotyping data of non-model organisms. We found more polymorphic markers in the reference pipeline in comparison to the de novo pipeline. The decreased number of SNPs from the de novo pipeline could be due to the extremely low level of heterozygosity in European bison. It has been confirmed that all the de novo/Bos taurus and Bos taurus reference pipeline obtained SNPs were unique and not included in 800 K BovineHD BeadChip.

Download Full-text

riboCIRC: a comprehensive database of translatable circRNAs

Genome Biology ◽

10.1186/s13059-021-02300-7 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Huihui Li ◽

Mingzhe Xie ◽

Yan Wang ◽

Ludong Yang ◽

Zhi Xie ◽

...

Keyword(s):

De Novo ◽

Species Conservation ◽

Structure And Function ◽

Research Community ◽

Genome Browser ◽

Valuable Resource ◽

Sequence Structure ◽

A Genome ◽

Context Specific ◽

And Function

AbstractriboCIRC is a translatome data-oriented circRNA database specifically designed for hosting, exploring, analyzing, and visualizing translatable circRNAs from multi-species. The database provides a comprehensive repository of computationally predicted ribosome-associated circRNAs; a manually curated collection of experimentally verified translated circRNAs; an evaluation of cross-species conservation of translatable circRNAs; a systematic de novo annotation of putative circRNA-encoded peptides, including sequence, structure, and function; and a genome browser to visualize the context-specific occupant footprints of circRNAs. It represents a valuable resource for the circRNA research community and is publicly available at http://www.ribocirc.com.

Download Full-text

De Novo Genome Assembly of Limpet Bathyacmaea lactea (Gastropoda: Pectinodontidae): The First Reference Genome of a Deep-Sea Gastropod Endemic to Cold Seeps

Genome Biology and Evolution ◽

10.1093/gbe/evaa100 ◽

2020 ◽

Vol 12 (6) ◽

pp. 905-910 ◽

Cited By ~ 2

Author(s):

Ruoyu Liu ◽

Kun Wang ◽

Jun Liu ◽

Wenjie Xu ◽

Yang Zhou ◽

...

Keyword(s):

Deep Sea ◽

Metal Ion ◽

De Novo ◽

Demographic History ◽

Gene Families ◽

Phylogenetic Position ◽

Cold Seeps ◽

Nitrogen And Phosphorus ◽

De Novo Genome Assembly ◽

A Genome

Abstract Cold seeps, characterized by the methane, hydrogen sulfide, and other hydrocarbon chemicals, foster one of the most widespread chemosynthetic ecosystems in deep sea that are densely populated by specialized benthos. However, scarce genomic resources severely limit our knowledge about the origin and adaptation of life in this unique ecosystem. Here, we present a genome of a deep-sea limpet Bathyacmaea lactea, a common species associated with the dominant mussel beds in cold seeps. We yielded 54.6 gigabases (Gb) of Nanopore reads and 77.9-Gb BGI-seq raw reads, respectively. Assembly harvested a 754.3-Mb genome for B. lactea, with 3,720 contigs and a contig N50 of 1.57 Mb, covering 94.3% of metazoan Benchmarking Universal Single-Copy Orthologs. In total, 23,574 protein-coding genes and 463.4 Mb of repetitive elements were identified. We analyzed the phylogenetic position, substitution rate, demographic history, and TE activity of B. lactea. We also identified 80 expanded gene families and 87 rapidly evolving Gene Ontology categories in the B. lactea genome. Many of these genes were associated with heterocyclic compound metabolism, membrane-bounded organelle, metal ion binding, and nitrogen and phosphorus metabolism. The high-quality assembly and in-depth characterization suggest the B. lactea genome will serve as an essential resource for understanding the origin and adaptation of life in the cold seeps.

Download Full-text