Application of genotyping-by-sequencing data on inferring the phylogeny of Curcuma (Zingiberaceae) from China

Abstract Background: Genotyping-by-sequencing (GBS), a next-generation sequencing method, has been applied to large-scale genotyping in plants, including groups with poor morphological differentiation and low genetic divergence among species. Curcuma is a genus of considerable medicinal and culinary importance. Resolving its phylogenetic relationships and delimiting its species remain challenging because of limited morphological differentiation and the lack of a reference genome. Results: High-throughput genomic sequence data obtained through GBS protocols were used to investigate the relationships among 8 species of Curcuma represented by 60 samples in total. Using the ipyrad software, 437,061 loci and 997,988 filtered SNPs were produced without reliance on a reference genome. After quality control (QC) of the filtered SNPs, 1,295 high-quality SNPs were used to clarify the phylogenetic relationships among Curcuma species. Based on these data, a supermatrix approach was used to infer the phylogenetic trees and relationships. Conclusions: The varying degrees of support can be explained, as can the diversification events for Chinese Curcuma. The diversification events suggest that the third intense uplift of the Qinghai–Tibet Plateau (QTP) and the formation of the Hengduan Mountains may have accelerated interspecific divergence of Curcuma in China. The PCA was consistent with the topology of the phylogenetic tree. The genetic structure analysis revealed that extensive hybridization may exist in Chinese Curcuma. GBS is therefore a promising approach for future phylogenetic and systematic studies.
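The abstract does not state which QC criteria reduced the filtered SNPs to the high-quality set. As a rough illustration only, the sketch below applies two filters that are common in SNP studies: a maximum missing-call rate and a minimum minor allele frequency (MAF). The function name `filter_snps`, the thresholds, and the 0/1/2 genotype encoding are all assumptions made for this example, not details taken from the study or from ipyrad.

```python
from typing import List, Optional, Sequence

# Hypothetical encoding: each site is a list of per-sample genotype calls,
# where 0, 1, 2 is the count of the alternate allele and None is a missing call.
Genotype = Optional[int]


def filter_snps(
    genotypes: Sequence[Sequence[Genotype]],
    max_missing: float = 0.2,   # assumed threshold, not from the source
    min_maf: float = 0.05,      # assumed threshold, not from the source
) -> List[int]:
    """Return the indices of sites that pass both QC filters."""
    kept = []
    for site_index, calls in enumerate(genotypes):
        if not calls:
            continue
        observed = [g for g in calls if g is not None]
        missing_rate = (len(calls) - len(observed)) / len(calls)
        if not observed or missing_rate > max_missing:
            continue
        # Diploid samples carry two alleles each.
        alt_freq = sum(observed) / (2 * len(observed))
        maf = min(alt_freq, 1 - alt_freq)
        if maf >= min_maf:
            kept.append(site_index)
    return kept


example = [
    [0, 1, 2, 1, 0],           # polymorphic, no missing calls
    [0, 0, 0, 0, 0],           # monomorphic, fails the MAF filter
    [None, None, 1, 2, 0],     # 40% missing, fails the missingness filter
    [0, 1, None, 1, 2],        # 20% missing, polymorphic
]
print(filter_snps(example))
```

Real pipelines usually add further checks (read depth, heterozygosity, linkage pruning), so this should be read as a minimal sketch of the idea rather than a reconstruction of the authors' procedure.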

Phylogenomics of orchids and their mycorrhizal fungi : trees, diversity, and the pursuit of symbiosis

10.32469/10355/72205 ◽

2019 ◽

Author(s):

Sarah Unruh

Keyword(s):

Mycorrhizal Fungi ◽

Phylogenetic Trees ◽

High Throughput Sequencing ◽

Genomic Sequence ◽

Sequence Data ◽

Mycorrhizal Symbiosis ◽

Sequencing Data ◽

Phylogenetic Structure ◽

University Of Missouri ◽

Fungal Symbiosis

[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT REQUEST OF AUTHOR.] Phylogenetic trees show us how organisms are related and provide frameworks for studying and testing evolutionary hypotheses. To better understand the evolution of orchids and their mycorrhizal fungi, I used high-throughput sequencing data and bioinformatic analyses to build phylogenetic hypotheses. In Chapter 2, I used transcriptome sequences both to build a phylogeny of the slipper orchid genera and to confirm the placement of a polyploidy event at the base of the orchid family. Polyploidy is hypothesized to be a strong driver of evolution and a source of unique traits, so confirming this event brings us closer to explaining extant orchid diversity. The list of orthologous genes generated from this study will provide a less expensive and more powerful method for researchers examining the evolutionary relationships in Orchidaceae. In Chapter 3, I generated genomic sequence data for 32 fungal isolates that were collected from orchids across North America. I inferred the first multi-locus nuclear phylogenetic tree for these fungal clades. The phylogenetic structure of these fungi will improve the taxonomy of these clades by providing evidence for new species and for revising problematic species designations. A robust taxonomy is necessary for studying the role of fungi in the orchid mycorrhizal symbiosis. In Chapter 4, I summarize my work and outline the future directions of my lab at Illinois College, including addressing the remaining aims of my Community Sequencing Proposal with the Joint Genome Institute by analyzing the 15 fungal reference genomes I generated during my PhD. Together these chapters are the start of a life-long research project into the evolution and function of the orchid/fungal symbiosis.

A target enrichment probe set for resolving the flagellate plant tree of life

10.1101/2020.05.29.124081 ◽

2020 ◽

Cited By ~ 2

Author(s):

Jesse W. Breinholt ◽

Sarah B. Carey ◽

George P. Tiley ◽

E. Christine Davis ◽

Lorena Endara ◽

...

Keyword(s):

Phylogenetic Relationships ◽

Phylogenetic Trees ◽

Large Scale ◽

Sequence Data ◽

Low Cost ◽

Tree Of Life ◽

Target Enrichment ◽

Sequencing Technologies ◽

Flanking Regions ◽

Probe Set

Abstract Premise of the study: New sequencing technologies enable the possibility of generating large-scale molecular datasets for constructing the plant tree of life. We describe a new probe set for target enrichment sequencing to generate nuclear sequence data to build phylogenetic trees with any flagellate plants, comprising hornworts, liverworts, mosses, lycophytes, ferns, and gymnosperms. Methods and Results: We leveraged existing transcriptome and genome sequence data to design a set of 56,989 probes for target enrichment sequencing of 451 nuclear exons and non-coding flanking regions across flagellate plant lineages. We describe the performance of target enrichment using the probe set across flagellate plants and demonstrate the potential of the data to resolve relationships among both ancient and closely related taxa. Conclusions: A target enrichment approach using the new probe set provides a relatively low-cost solution to obtain large-scale nuclear sequence data for inferring phylogenetic relationships across flagellate plants.

EdClust: A heuristic sequence clustering method with higher sensitivity

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720021500360 ◽

2021 ◽

Author(s):

Ming Cao ◽

Qinke Peng ◽

Ze-Gang Wei ◽

Fei Liu ◽

Yi-Fan Hou

Keyword(s):

Large Scale ◽

Sequence Data ◽

Clustering Algorithms ◽

Clustering Methods ◽

Sequencing Data ◽

Clustering Method ◽

Cluster Number ◽

Sequence Clustering ◽

Downstream Analysis ◽

Heuristic Clustering

The development of high-throughput technologies has produced increasing amounts of sequence data and an increasing need for efficient clustering algorithms that can process massive volumes of sequencing data for downstream analysis. Heuristic clustering methods are widely applied for sequence clustering because of their low computational complexity. Although numerous heuristic clustering methods have been developed, they suffer from two limitations: overestimation of inferred clusters and low clustering sensitivity. To address these issues, we present a new sequence clustering method (edClust) based on Edlib, a C/C++ library for fast, exact semi-global sequence alignment, to group similar sequences. The new method edClust was tested on three large-scale sequence databases, and we compared edClust to several classic heuristic clustering methods, such as UCLUST, CD-HIT, and VSEARCH. Evaluations based on the metrics of cluster number and seed sensitivity (SS) demonstrate that edClust can produce fewer clusters than other methods and that its SS is higher than that of other methods. The source code of edClust is available from https://github.com/zhang134/EdClust.git under the GNU GPL license.
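To make the general idea of heuristic (greedy, seed-based) sequence clustering concrete, here is a minimal sketch. It is not the edClust algorithm: it uses a plain Levenshtein distance written in pure Python in place of Edlib's semi-global alignment, and the function names, the identity formula, and the default threshold are assumptions chosen for illustration.

```python
from typing import Dict, List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a two-row dynamic programming table."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]


def greedy_cluster(sequences: Sequence[str],
                   identity_threshold: float = 0.8) -> Dict[int, List[int]]:
    """Assign each sequence to the first seed it matches, else make it a seed.

    Sequences are visited longest first; the returned dict maps each seed's
    index to the indices of its cluster members (the seed included).
    """
    order = sorted(range(len(sequences)), key=lambda i: -len(sequences[i]))
    clusters: Dict[int, List[int]] = {}
    for idx in order:
        seq = sequences[idx]
        for seed_idx in clusters:
            seed = sequences[seed_idx]
            longest = max(len(seq), len(seed)) or 1
            identity = 1 - edit_distance(seq, seed) / longest
            if identity >= identity_threshold:
                clusters[seed_idx].append(idx)
                break
        else:
            clusters[idx] = [idx]  # no seed was similar enough: new cluster
    return clusters


reads = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "TTTTGGGGCA", "ACGTACGTAC"]
print(greedy_cluster(reads))
```

Because each sequence is compared only with existing seeds and joins the first one that passes the threshold, the result depends on input order and on the threshold. That order dependence is the usual price of the low computational cost the abstract attributes to heuristic methods.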

Large‐scale genomic sequence data resolve the deepest divergences in the legume phylogeny and support a near‐simultaneous evolutionary origin of all six subfamilies

New Phytologist ◽

10.1111/nph.16290 ◽

2019 ◽

Vol 225 (3) ◽

pp. 1355-1369 ◽

Cited By ~ 12

Author(s):

Erik J. M. Koenen ◽

Dario I. Ojeda ◽

Royce Steeves ◽

Jérémy Migliore ◽

Freek T. Bakker ◽

...

Keyword(s):

Large Scale ◽

Genomic Sequence ◽

Sequence Data ◽

Evolutionary Origin

GIbPSs: a toolkit for fast and accurate analyses of genotyping-by-sequencing data without a reference genome

Molecular Ecology Resources ◽

10.1111/1755-0998.12510 ◽

2016 ◽

Vol 16 (4) ◽

pp. 979-990 ◽

Cited By ~ 12

Author(s):

A. Hapke ◽

D. Thiele

Keyword(s):

Reference Genome ◽

Genotyping By Sequencing ◽

Sequencing Data

gplas: a comprehensive tool for plasmid analysis using short-read graphs

Bioinformatics ◽

10.1093/bioinformatics/btaa233 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3874-3876 ◽

Cited By ~ 1

Author(s):

Sergio Arredondo-Alonso ◽

Martin Bootsma ◽

Yaïr Hein ◽

Malbert R C Rogers ◽

Jukka Corander ◽

...

Keyword(s):

Large Scale ◽

Sequence Data ◽

Bacterial Genome ◽

Workflow Management ◽

Supplementary Information ◽

Whole Genome Sequencing Data ◽

Network Partitioning ◽

Sequencing Data ◽

Genetic Traits ◽

Short Read

Abstract Summary: Plasmids can horizontally transmit genetic traits, enabling rapid bacterial adaptation to new environments and hosts. Short-read whole-genome sequencing data are often applied to large-scale bacterial comparative genomics projects but the reconstruction of plasmids from these data is facing severe limitations, such as the inability to distinguish plasmids from each other in a bacterial genome. We developed gplas, a new approach to reliably separate plasmid contigs into discrete components using sequence composition, coverage, assembly graph information and network partitioning based on a pruned network of plasmid unitigs. Gplas facilitates the analysis of large numbers of bacterial isolates and allows a detailed analysis of plasmid epidemiology based solely on short-read sequence data. Availability and implementation: Gplas is written in R and Bash and uses a Snakemake pipeline as a workflow management system. Gplas is available under the GNU General Public License v3.0 at https://gitlab.com/sirarredondo/gplas.git. Supplementary information: Supplementary data are available at Bioinformatics online.

Coalescent-Based Analyses of Genomic Sequence Data Provide a Robust Resolution of Phylogenetic Relationships among Major Groups of Gibbons

Molecular Biology and Evolution ◽

10.1093/molbev/msx277 ◽

2017 ◽

Vol 35 (1) ◽

pp. 159-179 ◽

Cited By ~ 27

Author(s):

Cheng-Min Shi ◽

Ziheng Yang

Keyword(s):

Phylogenetic Relationships ◽

Genomic Sequence ◽

Sequence Data

Agro-morphological, yield, and genotyping-by-sequencing data of selected wheat germplasm

10.1101/2020.07.18.209882 ◽

2020 ◽

Author(s):

Madiha Islam ◽

Abdullah ◽

Bibi Zubaida ◽

Nosheen Shafqat ◽

Rabia Masood ◽

...

Keyword(s):

Triticum Aestivum ◽

Sequence Data ◽

Genotyping By Sequencing ◽

Wheat Breeding ◽

Sequencing Data ◽

Illumina Hiseq ◽

Yield Data ◽

Single Nucleotide ◽

Breeding Programs ◽

Short Reads

Abstract Wheat (Triticum aestivum) is the most important staple food in Pakistan. Knowledge of its genetic diversity is critical for designing effective crop breeding programs. Here we report agro-morphological and yield data for 112 genotypes (including 7 duplicates) of wheat (Triticum aestivum) cultivars, advanced lines, landraces and wild relatives, collected from several research institutes and breeders across Pakistan. We also report genotyping-by-sequencing (GBS) data for a selected subset of 52 genotypes. Sequencing was performed on the Illumina HiSeq 2500 platform using the PE150 run. Data generated per sample ranged from 1.01 to 2.5 Gb; 90% of the short reads exhibited quality scores above 99.9%. The TGACv1 wheat genome was used as a reference to map short reads from individual genotypes and to filter single nucleotide polymorphic loci (SNPs). On average, 364,074 ± 54,479 SNPs per genotype were recorded. The sequencing data have been submitted to the SRA database of NCBI (accession number SRP179096). The agro-morphological and yield data, along with the sequence data and SNPs, will be invaluable resources for wheat breeding programs in the future.

Genotyping-by-sequencing enables linkage mapping in three octoploid cultivated strawberry families

10.7287/peerj.preprints.2975v1 ◽

2017 ◽

Author(s):

Kelly J Vining ◽

Natalia Salinas ◽

Jacob A Tennessen ◽

Jason D Zurn ◽

Daniel James Sargent ◽

...

Keyword(s):

Reference Genome ◽

Sequence Data ◽

Genotyping By Sequencing ◽

Nucleotide Polymorphisms ◽

Linkage Groups ◽

Single Nucleotide ◽

Ancestral Species ◽

Polymorphic Snps ◽

Genome Wide ◽

Diploid Ancestor

With the goal of evaluating genotyping-by-sequencing (GBS) in a species with a complex octoploid genome, GBS was used to survey genome-wide single-nucleotide polymorphisms (SNPs) in three biparental strawberry (Fragaria ×ananassa) populations. GBS sequence data were aligned to the F. vesca ‘Fvb’ reference genome in order to call SNPs. Numbers of polymorphic SNPs per population ranged from 1,163 to 3,190. Linkage maps consisting of 30-65 linkage groups were produced from the SNP sets derived from each parent. The linkage groups covered 99% of the Fvb reference genome, with three to seven linkage groups from a given parent aligned to any particular chromosome. A phylogenetic analysis performed using the POLiMAPS pipeline revealed linkage groups that were most similar to ancestral species F. vesca for each chromosome. Linkage groups that were most similar to a second ancestral species, F. iinumae, were only resolved for Fvb 4. The quantity of missing data and heterogeneity in genome coverage inherent in GBS complicated the analysis, but POLiMAPS resolved F. ×ananassa chromosomal regions derived from diploid ancestor F. vesca.

Searching more genomic sequence with less memory for fast and accurate metagenomic profiling

10.1101/036681 ◽

2016 ◽

Author(s):

Shea N Gardner ◽

Sasha K Ames ◽

Maya B Gokhale ◽

Tom R Slezak ◽

Jonathan Allen

Keyword(s):

Large Scale ◽

Genomic Sequence ◽

Sequence Data ◽

Low Cost ◽

False Negative ◽

Human Microbiome ◽

Human Microbiome Project ◽

Metagenomic Data ◽

Reference Database ◽

Metagenomic Sequence

Software for rapid, accurate, and comprehensive microbial profiling of metagenomic sequence data on a desktop will play an important role in large-scale clinical use of metagenomic data. Here we describe LMAT-ML (Livermore Metagenomics Analysis Toolkit-Marker Library), which can be run with 24 GB of DRAM, an amount available on many clusters, or with 16 GB of DRAM plus a 24 GB low-cost commodity flash drive (NVRAM), a cost-effective alternative for desktop or laptop users. We compared results from LMAT with five other rapid, low-memory tools for metagenome analysis for 131 Human Microbiome Project samples, and assessed discordant calls with BLAST. All the tools except LMAT-ML reported overly specific or incorrect species and strain resolution of reads that were in fact much more widely conserved across species, genera, and even families. Several of the tools misclassified reads from synthetic or vector sequence as microbial, or human reads as viral. We attribute the high numbers of false positive and false negative calls to a limited reference database with inadequate representation of known diversity. Our comparisons with real-world samples show that LMAT-ML is the only tool tested that classifies the majority of reads, and does so with high accuracy.
