Genome improvement and genetic map construction for Aethionema arabicum, the first divergent branch in the Brassicaceae family

2019 ◽  
Author(s):  
Thu-Phuong Nguyen ◽  
Cornelia Mühlich ◽  
Setareh Mohammadin ◽  
Erik van den Bergh ◽  
Adrian E. Platts ◽  
...  

Abstract Background: The genus Aethionema is a sister-group to the core-group of the Brassicaceae family that includes Arabidopsis thaliana and the Brassica crops. Thus, Aethionema is phylogenetically well-placed for the investigation and understanding of genome and trait evolution across the family. We aimed to improve the quality of the draft reference genome of the annual species Aethionema arabicum. Secondly, we constructed the first Ae. arabicum genetic map. The improved reference genome and genetic map enabled the development of each other. Results: We started with the initially published genome (version 2.5). PacBio and MinION sequencing together with genetic map v2.5 were incorporated to produce the new reference genome v3.0. The improved genome contains 203 Mb of sequence, with approximately 94% of the assembly made up of called bases, assembled into 2,883 scaffolds. The N50 (10.3 Mb) represents an 80-fold increase over the initial genome release. We generated a Recombinant Inbred Line (RIL) population derived from two ecotypes: Cyprus and Turkey (the reference genotype). Using a Genotyping by Sequencing (GBS) approach, we generated a high-density genetic map with 749 SNPs (v2.5) and then 632 SNPs (v3.0). The genetic map and reference genome were integrated, thus greatly improving the scaffolding of the reference genome into 11 linkage groups. Conclusions: We show that long-read sequencing data and genetics are complementary, resulting in an improved genome assembly in Ae. arabicum. They will facilitate comparative genetic mapping work for the Brassicaceae family and are also valuable resources to investigate a wide range of life history traits in Aethionema.
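For readers unfamiliar with the N50 statistic quoted in the abstract above, the following is a minimal Python sketch of how it is commonly computed. The function name `n50` and the toy scaffold lengths are illustrative assumptions; they are not taken from the paper or its assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of sequence lengths.

    N50 is the length L of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first sequence that pushes coverage to >= 50%.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical scaffold lengths (arbitrary units), for illustration only.
toy_scaffolds = [10, 8, 6, 4, 2]
print(n50(toy_scaffolds))
```

Because N50 depends only on the length distribution, it rises when a genetic map or long reads join short scaffolds into longer ones, even if the total amount of assembled sequence is unchanged.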

2020 ◽  
Author(s):  
Kyle Fletcher ◽  
Lin Zhang ◽  
Juliana Gil ◽  
Rongkui Han ◽  
Keri Cavanaugh ◽  
...  

Abstract Background: Genetic maps are an important resource for validation of genome assemblies, trait discovery, and breeding. Next generation sequencing has enabled production of high-density genetic maps constructed with 10,000s of markers. Most current approaches require a genome assembly to identify markers. Our Assembly Free Linkage Analysis Pipeline (AFLAP) removes this requirement by using uniquely segregating k-mers as markers to rapidly construct a genotype table and perform subsequent linkage analysis. This avoids potential biases including preferential read alignment and variant calling. Results: The performance of AFLAP was determined in simulations and contrasted to a conventional workflow. We tested AFLAP using 100 F2 individuals of Arabidopsis thaliana, sequenced to low coverage. Genetic maps generated using k-mers contained over 130,000 markers that were concordant with the genomic assembly. The utility of AFLAP was then demonstrated by generating an accurate genetic map using genotyping-by-sequencing data of 235 recombinant inbred lines of Lactuca spp. AFLAP was then applied to 83 F1 individuals of the oomycete Bremia lactucae, sequenced to >5x coverage. The genetic map contained over 90,000 markers ordered in 19 large linkage groups. This genetic map was used to fragment, order, orient, and scaffold the genome, resulting in a much-improved reference assembly. Conclusions: AFLAP can be used to generate high density linkage maps and improve genome assemblies of any organism when a mapping population is available using whole genome sequencing or genotyping-by-sequencing data. Genetic maps produced for B. lactucae were accurately aligned to the genome and guided significant improvements of the reference assembly.
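The idea of using parent-specific k-mers as assembly-free markers can be illustrated with a toy Python sketch. This is not AFLAP's implementation: the function names, the presence/absence scoring, and the example reads are all assumptions made for illustration, and the sketch ignores reverse complements, sequencing errors, and k-mer count thresholds that a real pipeline would need to handle.

```python
from typing import Dict, Iterable, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def read_kmers(reads: Iterable[str], k: int) -> Set[str]:
    """Union of the k-mers found in a collection of reads."""
    found: Set[str] = set()
    for read in reads:
        found |= kmers(read, k)
    return found


def parent_specific_markers(parent_a: Iterable[str],
                            parent_b: Iterable[str],
                            k: int) -> List[str]:
    """k-mers seen in parent A but not in parent B, in sorted order."""
    return sorted(read_kmers(parent_a, k) - read_kmers(parent_b, k))


def genotype_table(markers: List[str],
                   progeny: Dict[str, List[str]],
                   k: int) -> Dict[str, List[int]]:
    """Score each marker as present (1) or absent (0) in each progeny."""
    table: Dict[str, List[int]] = {}
    for name, reads in progeny.items():
        present = read_kmers(reads, k)
        table[name] = [1 if m in present else 0 for m in markers]
    return table


# Hypothetical toy reads; the parents differ at a single base.
markers = parent_specific_markers(["ACGTAC"], ["ACGTTC"], k=4)
table = genotype_table(markers, {"p1": ["CGTAC"], "p2": ["CGTTC"]}, k=4)
print(markers)  # ['CGTA', 'GTAC']
print(table)    # {'p1': [1, 1], 'p2': [0, 0]}
```

Each row of the resulting table is a presence/absence genotype vector, which is the kind of input a linkage analysis can group and order without first aligning reads to an assembly.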


2020 ◽  
Vol 10 (8) ◽  
pp. 2801-2809 ◽  
Author(s):  
Tingting Zhao ◽  
Zhongqu Duan ◽  
Georgi Z. Genchev ◽  
Hui Lu

Despite continuous updates of the human reference genome, there are still hundreds of unresolved gaps which account for about 5% of the total sequence length. Given the availability of whole genome de novo assemblies, especially those derived from long-read sequencing data, gap-closing sequences can be determined. By comparing 17 de novo long-read sequencing assemblies with the human reference genome, we identified a total of 1,125 gap-closing sequences for 132 (16.9% of 783) gaps and added up to 2.2 Mb of novel sequence to the human reference genome. More than 90% of the non-redundant sequences could be verified by unmapped reads from the Simons Genome Diversity Project dataset. In addition, 15.6% of the non-reference sequences were found in at least one of four non-human primate genomes. We further demonstrated that the non-redundant sequences had a high content of simple repeats and satellite sequences. Moreover, 43 (32.6%) of the 132 closed gaps were shown to be polymorphic; such sequences may play an important biological role and can be useful in the investigation of human genetic diversity.


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Enhua Xia ◽  
Fangdong Li ◽  
Wei Tong ◽  
Hua Yang ◽  
Songbo Wang ◽  
...  

2020 ◽  
Vol 10 (4) ◽  
pp. 1297-1308 ◽  
Author(s):  
Raju Chaudhary ◽  
Chu Shin Koh ◽  
Sateesh Kagale ◽  
Lily Tang ◽  
Siu Wah Wu ◽  
...  

Camelina sativa (L.) Crantz, an oilseed crop of the Brassicaceae family, is gaining attention due to its potential as a source of high-value oil for food, feed or fuel. The hexaploid domesticated C. sativa has limited genetic diversity, encouraging the exploration of related species for novel allelic variation for traits of interest. The current study utilized genotyping by sequencing to characterize 193 Camelina accessions belonging to seven different species collected primarily from the Ukrainian-Russian region and Eastern Europe. Population analyses among Camelina accessions with a 2n = 40 karyotype identified three subpopulations, two composed of domesticated C. sativa and one of C. microcarpa species. Winter type Camelina lines were identified as admixtures of C. sativa and C. microcarpa. Eighteen genotypes of related C. microcarpa unexpectedly shared only two subgenomes with C. sativa, suggesting a novel or cryptic sub-species of C. microcarpa with 19 haploid chromosomes. One C. microcarpa accession (2n = 26) was found to comprise the first two subgenomes of C. sativa, suggesting a tetraploid structure. The defined chromosome series among C. microcarpa germplasm, including the newly designated C. neglecta diploid née C. microcarpa, suggested an evolutionary trajectory for the formation of the C. sativa hexaploid genome and re-defined the underlying subgenome structure of the reference genome.


2020 ◽  
Author(s):  
Bo Wang ◽  
Houlin Yu ◽  
Yanyan Jia ◽  
Quanbin Dong ◽  
Christian Steinberg ◽  
...  

Abstract: Here, we report a chromosome-level genome assembly of Fusarium oxysporum strain Fo47 (12 pseudomolecules; contig N50: 4.52 Mb), generated using a combination of PacBio long-read, Illumina paired-end and Hi-C sequencing data. Although F. oxysporum causes vascular wilt to over 100 plant species, the strain Fo47 is classified as an endophyte and widely used as a biocontrol agent for plant disease control. The Fo47 genome carries a single accessory chromosome of 4.23 Mb, compared to the reference genome of F. oxysporum f.sp. lycopersici strain Fol4287. The high-quality assembly and annotation of the Fo47 genome will be a valuable resource for studying the mechanisms underlying the endophytic interactions between F. oxysporum and plants, as well as deciphering the genome evolution of the F. oxysporum species complex.


2021 ◽  
Vol 7 (3) ◽  
Author(s):  
David R. Greig ◽  
Claire Jenkins ◽  
Saheer E. Gharbia ◽  
Timothy J. Dallman

Compared to short-read sequencing data, long-read sequencing facilitates single contiguous de novo assemblies and characterization of the prophage region of the genome. Here, we describe our methodological approach to using Oxford Nanopore Technology (ONT) sequencing data to quantify genetic relatedness and to look for microevolutionary events in the core and accessory genomes to assess the within-outbreak variation of four genetically and epidemiologically linked isolates. Analysis of both Illumina and ONT sequencing data detected one SNP between the four sequences of the outbreak isolates. The variant calling procedure highlighted the importance of masking homologous sequences in the reference genome regardless of the sequencing technology used. Variant calling also highlighted the systemic errors in ONT base-calling and ambiguous mapping of Illumina reads that results in variations in the genetic distance when comparing one technology to the other. The prophage component of the outbreak strain was analysed, and nine of the 16 prophages showed some similarity to the prophage in the Sakai reference genome, including the stx2a-encoding phage. Prophage comparison between the outbreak isolates identified minor genome rearrangements in one of the isolates, including an inversion and a deletion event. The ability to characterize the accessory genome in this way is the first step to understanding the significance of these microevolutionary events and their impact on the evolutionary history, virulence and potentially the likely source and transmission of this zoonotic, foodborne pathogen.


2020 ◽  
Vol 33 (9) ◽  
pp. 1108-1111 ◽  
Author(s):  
Bo Wang ◽  
Houlin Yu ◽  
Yanyan Jia ◽  
Quanbin Dong ◽  
Christian Steinberg ◽  
...  

Here, we report a chromosome-level genome assembly of Fusarium oxysporum Fo47 (12 pseudomolecules; contig N50: 4.52 Mb), generated using a combination of PacBio long-read, Illumina paired end, and high-throughput chromosome conformation capture sequencing data. Although F. oxysporum causes vascular wilt to over 100 plant species, the strain Fo47 is classified as an endophyte and is widely used as a biocontrol agent for plant disease control. The Fo47 genome carries a single accessory chromosome of 4.23 Mb, compared with the reference genome of F. oxysporum f. sp. lycopersici Fol4287. The high-quality assembly and annotation of the Fo47 genome will be a valuable resource for studying the mechanisms underlying the endophytic interactions between F. oxysporum and plants as well as for deciphering the genome evolution of the F. oxysporum species complex.


2020 ◽  
Author(s):  
Mohamed Awad ◽  
Xiangchao Gan

Abstract: High-quality genome assembly has wide applications in genetics and medical studies. However, it is still very challenging to achieve gap-free chromosome-scale assemblies using current workflows for long-read platforms. Here we propose GALA (Gap-free long-read assembler), a chromosome-by-chromosome assembly method implemented through a multi-layer computer graph that identifies mis-assemblies within preliminary assemblies or chimeric raw reads and partitions the data into chromosome-scale linkage groups. The subsequent independent assembly of each linkage group generates a gap-free assembly free from the mis-assembly errors which usually hamper existing workflows. This flexible framework also allows us to integrate data from various technologies, such as Hi-C, genetic maps, a reference genome and even motif analyses, to generate gap-free chromosome-scale assemblies. We de novo assembled the C. elegans and A. thaliana genomes using combined PacBio and Nanopore sequencing data from publicly available datasets. We also demonstrated the new method’s applicability with a gap-free assembly of a human genome with the help of a reference genome. In addition, GALA showed promising performance for PacBio high-fidelity long reads. Thus, our method enables straightforward assembly of genomes with multiple data sources and overcomes barriers that at present restrict the application of de novo genome assembly technology.


2019 ◽  
Author(s):  
Heng Liang ◽  
Yan Zhang ◽  
Jiabing Deng ◽  
Gang Gao ◽  
Chunbang Ding ◽  
...  

Abstract Background: Genotyping-by-sequencing (GBS), a next-generation sequencing approach, has been applied to large-scale genotyping in plants that show poor morphological differentiation and low genetic divergence among species. Curcuma is a genus of significant medicinal and edible value. Resolving phylogenetic relationships and delimiting species remain challenging because of poor morphological differentiation and the lack of a reference genome. Result: High-throughput genomic sequence data obtained through GBS protocols were used to investigate the relationships among 8 species of Curcuma, with 60 samples in total. Using the ipyrad software, 437,061 loci and 997,988 filtered SNPs were produced without reliance upon a reference genome. After quality control (QC) of the filtered SNPs, 1,295 high-quality SNPs were used to clarify the phylogenetic relationships among Curcuma species. Based on these data, a supermatrix approach was used to infer the phylogenetic trees and relationships. Conclusions: The varying degrees of support can be explained, as can the diversification events for Chinese Curcuma. The diversification events showed that the third intense uplift of the Qinghai–Tibet Plateau (QTP) and the formation of the Hengduan Mountains may have sped up Curcuma interspecific divergence in China. The PCA was consistent with the topology of the phylogenetic tree. The genetic structure analysis revealed that extensive hybridization may exist in Chinese Curcuma. Additionally, GBS is a promising approach for future phylogenetic and systematic studies.

