Genome sequences of horticultural plants: past, present, and future

2019
Vol 6 (1)
Author(s):
Fei Chen
Yunfeng Song
Xiaojiang Li
Junhao Chen
Lan Mo
...  

Abstract Horticultural plants play various and critical roles for humans by providing fruits, vegetables, materials for beverages, and herbal medicines and by acting as ornamentals. They have also shaped human art, culture, and environments and thereby have influenced the lifestyles of humans. With the advent of sequencing technologies, there has been a dramatic increase in the number of sequenced genomes of horticultural plant species in the past decade. The genomes of horticultural plants are highly diverse and complex, often with a high degree of heterozygosity and a high ploidy due to their long and complex history of evolution and domestication. Here we summarize the advances in the genome sequencing of horticultural plants, the reconstruction of pan-genomes, and the development of horticultural genome databases. We also discuss past, present, and future studies related to genome sequencing, data storage, data quality, data sharing, and data visualization to provide practical guidance for genomic studies of horticultural plants. Finally, we propose a horticultural plant genome project as well as the roadmap and technical details toward three goals of the project.

GigaScience
2020
Vol 9 (12)
Author(s):
Valentine Murigneux
Subash Kumar Rai
Agnelo Furtado
Timothy J C Bruxner
Wei Tian
...  

Abstract Background Sequencing technologies have advanced to the point where it is possible to generate high-accuracy, haplotype-resolved, chromosome-scale assemblies. Several long-read sequencing technologies are available, and a growing number of algorithms have been developed to assemble the reads generated by those technologies. When starting a new genome project, it is therefore challenging to select the most cost-effective sequencing technology, as well as the most appropriate software for assembly and polishing. It is thus important to benchmark different approaches applied to the same sample. Results Here, we report a comparison of 3 long-read sequencing technologies applied to the de novo assembly of a plant genome, Macadamia jansenii. We have generated sequencing data using Pacific Biosciences (Sequel I), Oxford Nanopore Technologies (PromethION), and BGI (single-tube Long Fragment Read) technologies for the same sample. Several assemblers were benchmarked in the assembly of Pacific Biosciences and Nanopore reads. Results obtained from combining long-read technologies or short-read and long-read technologies are also presented. The assemblies were compared for contiguity, base accuracy, and completeness, as well as sequencing costs and DNA material requirements. Conclusions The 3 long-read technologies produced highly contiguous and complete genome assemblies of M. jansenii. At the time of sequencing, the cost associated with each method was significantly different, but continuous improvements in technologies have resulted in greater accuracy, increased throughput, and reduced costs. We propose updating this comparison regularly with reports on significant iterations of the sequencing technologies.


Author(s):  
Valentine Murigneux
Subash Kumar Rai
Agnelo Furtado
Timothy J.C. Bruxner
Wei Tian
...  

Abstract Sequencing technologies have advanced to the point where it is possible to generate high-accuracy, haplotype-resolved, chromosome-scale assemblies. Several long-read sequencing technologies are available on the market, and a growing number of algorithms have been developed in recent years to assemble the reads generated by those technologies. When starting a new genome project, it is therefore challenging to select the most cost-effective sequencing technology as well as the most appropriate software for assembly and polishing. For this reason, it is important to benchmark different approaches applied to the same sample. Here, we report a comparison of three long-read sequencing technologies applied to the de novo assembly of a plant genome, Macadamia jansenii. We have generated sequencing data using Pacific Biosciences (Sequel I), Oxford Nanopore Technologies (PromethION) and BGI (single-tube Long Fragment Read) technologies for the same sample. Several assemblers were benchmarked in the assembly of PacBio and Nanopore reads. Results obtained from combining long-read technologies or short-read and long-read technologies are also presented. The assemblies were compared for contiguity, accuracy and completeness as well as sequencing costs and DNA material requirements. Overall, the three long-read technologies produced highly contiguous and complete genome assemblies of Macadamia jansenii. At the time of sequencing, the cost associated with each method was significantly different, but continuous improvements in technologies have resulted in greater accuracy, increased throughput and reduced costs. We propose updating this comparison regularly with reports on significant iterations of the sequencing technologies.


Author(s):  
Giulio Caravagna

Abstract Cancers progress through the accumulation of somatic mutations which accrue during tumour evolution, allowing some cells to proliferate in an uncontrolled fashion. This growth process is intimately related to latent evolutionary forces moulding the genetic and epigenetic composition of tumour subpopulations. Understanding cancer therefore requires understanding these selective pressures. The widespread adoption of next-generation sequencing technologies opens up the possibility of measuring molecular profiles of cancers at multiple resolutions, across one or multiple patients. In this review we discuss how cancer genome sequencing data from a single tumour can be used to understand these evolutionary forces, overviewing mathematical models and inferential methods adopted in the field of cancer evolution.


2016
Vol 14 (1)
pp. 1-13
Author(s):  
Lê Thị Thu Hiền ◽  
Hugo De Boer ◽  
Vincent Manzanilla ◽  
Hà Văn Huân ◽  
Nông Văn Hải

Advances in genome sequencing technologies have created a new genomic era of life sciences research worldwide in which a number of modern and sophisticated techniques and tools have been developed and employed. Many countries have invested in plant genome sequencing as part of a sustainable development strategy. Each year, the number of plant genomes and transcriptomes sequenced has increased. The results obtained offer opportunities for fundamental and applied research and provide valuable data for the identification of genes or molecular markers linked to traits that are important for selection, cultivation, and/or production. In Vietnam, partial or complete genome sequencing of crops has recently been conducted, primarily as part of international collaborative projects. The genus Panax L. (Araliaceae family) comprises several species of commercial value with narrow distributions, such as P. bipinnatifidus Seem., P. stipuleanatus H.T.Tsai & K.M.Feng, and Panax vietnamensis Ha et Grushv. Despite their very important roles in traditional medicine, understanding of their genetic characteristics is still limited. Molecular studies on the genus have, so far, only evaluated limited markers for phylogenetic analysis. Therefore, genome sequencing of these important herbal plants is needed to understand their genetic characteristics, their evolutionary history, and the genes and biochemical pathways contributing to medicinally important metabolites. This review summarizes the related genome sequencing technologies, including the most recent advances in the last decade, and their applications in genome and transcriptome sequencing of plants in general and of the genus Panax L. in particular.


2020
Vol 36 (Supplement_1)
pp. i186-i193
Author(s):  
Matthew A Myers ◽  
Simone Zaccaria ◽  
Benjamin J Raphael

Abstract Motivation Recent single-cell DNA sequencing technologies enable whole-genome sequencing of hundreds to thousands of individual cells. However, these technologies have ultra-low sequencing coverage (<0.5× per cell), which has limited their use to the analysis of large copy-number aberrations (CNAs) in individual cells. While CNAs are useful markers in cancer studies, single-nucleotide mutations are equally important, both in cancer studies and in other applications. However, ultra-low coverage sequencing yields single-nucleotide mutation data that are too sparse for current single-cell analysis methods. Results We introduce SBMClone, a method to infer clusters of cells, or clones, that share groups of somatic single-nucleotide mutations. SBMClone uses a stochastic block model to overcome sparsity in ultra-low coverage single-cell sequencing data, and we show that SBMClone accurately infers the true clonal composition on simulated datasets with coverage as low as 0.2×. We applied SBMClone to single-cell whole-genome sequencing data from two breast cancer patients obtained using two different sequencing technologies. On the first patient, sequenced using the 10X Genomics CNV solution with sequencing coverage ≈0.03×, SBMClone recovers the major clonal composition when incorporating a small amount of additional information. On the second patient, where pre- and post-treatment tumor samples were sequenced using DOP-PCR with sequencing coverage ≈0.5×, SBMClone shows that tumor cells are present in the post-treatment sample, contrary to published analysis of this dataset. Availability and implementation SBMClone is available on the GitHub repository https://github.com/raphael-group/SBMClone. Supplementary information Supplementary data are available at Bioinformatics online.


2019
Vol 23 (1)
pp. 38-48
Author(s):  
M. K. Bragina ◽  
D. A. Afonnikov ◽  
E. A. Salina

Since the first plant genome, that of Arabidopsis thaliana, was sequenced and published, genome sequencing technologies have undergone significant changes. New algorithms, sequencing technologies and bioinformatic approaches were adopted to obtain genome, transcriptome and exome sequences for model and crop species, which have permitted deep inferences into plant biology. As a result of improved genome assembly and analysis methods, genome sequencing costs have plummeted and the number of high-quality plant genome sequences is constantly growing. Consequently, more than 300 plant genome sequences have been published over the past twenty years. Although many of the published genomes are considered incomplete, they have proved to be a valuable tool for identifying genes involved in the formation of economically valuable plant traits, for marker-assisted and genomic selection, and for comparative analysis of plant genomes in order to determine the basic patterns of origin of various plant species. Since high coverage and resolution of a genome sequence are not enough to detect all changes in complex samples, targeted sequencing, which consists of the isolation and sequencing of a specific region of the genome, has begun to develop. Targeted sequencing has a higher detection power (the ability to identify new differences/variants) and resolution (down to a single base). In addition, exome sequencing (the sequencing of only the protein-coding regions of genes) is being actively developed, which allows for the sequencing of non-expressed alleles and genes that cannot be found with RNA-seq. In this review, the development of sequencing technologies and the construction of “reference” genomes of plants are analyzed, and methods of targeted sequencing based on the use of a reference DNA sequence are compared.


2015
Author(s):
Justin M Zook
David Catoe
Jennifer McDaniel
Lindsay Vang
Noah Spies
...  

The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST), is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode™ WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly.


Rice
2020
Vol 13 (1)
Author(s):
Veronica Roman-Reyna
Dale Pinili
Frances N. Borja
Ian L. Quibod
Simon C. Groen
...  

Abstract Background Crop microbial communities are shaped by interactions between the host, microbes and the environment; however, their relative contribution is beginning to be understood. Here, we explore these interactions in the leaf bacterial community across 3024 rice accessions. Findings By using unmapped DNA sequencing reads as microbial reads, we characterized the structure of the rice bacterial microbiome. We identified central bacterial taxa that emerge as microbial “hubs” and may have an influence on the network of host-microbe interactions. We found regions in the rice genome that might control the assembly of these microbial hubs. To our knowledge, this is one of the first studies that uses raw data from plant genome sequencing projects to characterize leaf bacterial communities. Conclusion We showed that the structure of the rice leaf microbiome is modulated by multiple interactions among host, microbes, and environment. Our data provide insight into the factors influencing microbial assemblage in the rice leaf and also open the door for future initiatives to modulate rice consortia for crop improvement efforts.


2020
Author(s):
Veronica Roman-Reyna
Dale Pinili
Frances Nikki Borja
Ian Lorenzo Quibod
Simon C. Groen
...  

Abstract Background: Crop microbial communities are shaped by interactions between the host, microbes and the environment; however, their relative contribution is beginning to be understood. Here, we explore these interactions in the leaf bacterial community across 3,024 rice accessions. Findings: By using unmapped DNA sequencing reads as microbial reads, we characterized the structure of the rice bacterial microbiome. We identified central bacterial taxa that emerge as microbial “hubs” and may have an influence on the network of host-microbe interactions. We found regions in the rice genome that might control the assembly of these microbial hubs. To our knowledge, this is one of the first studies that uses raw data from plant genome sequencing projects to characterize leaf bacterial communities. Conclusion: We showed that the structure of the rice leaf microbiome is modulated by multiple interactions among host, microbes, and environment. Our data provide insight into the factors influencing microbial assemblage in the rice leaf and also open the door for future initiatives to modulate rice consortia for crop improvement efforts.


2017
Author(s):
Mark J.P. Chaisson
Ashley D. Sanders
Xuefang Zhao
Ankit Malhotra
David Porubsky
...  

Abstract The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, and strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three human parent–child trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50 bp) and 27,622 SVs (≥50 bp) per human genome. We also discover 156 inversions per genome, most of which previously escaped detection. Fifty-eight of the inversions we discovered intersect with the critical regions of recurrent microdeletion and microduplication syndromes. Taken together, our SV callsets represent a sevenfold increase in SV detection compared to most standard high-throughput sequencing studies, including those from the 1000 Genomes Project. The method and the dataset serve as a gold standard for the scientific community and we make specific recommendations for maximizing structural variation sensitivity for future large-scale genome sequencing studies.

