Pan-genome Analysis in Sorghum Highlights the Extent of Genomic Variation and Sugarcane Aphid Resistance Genes

2021 ◽  
Author(s):  
Bo Wang ◽  
Yinping Jiao ◽  
Kapeel Chougule ◽  
Andrew Olson ◽  
Jian Huang ◽  
...  

ABSTRACT Sorghum bicolor, one of the most important grass crops around the world, harbors a high degree of genetic diversity. We constructed chromosome-level genome assemblies for two important sorghum inbred lines, Tx2783 and RTx436. The final high-quality reference assemblies consist of 19 and 18 scaffolds, respectively, with contig N50 values of 25.6 and 20.3 Mb. Genes were annotated using evidence-based and de novo gene predictors, and RAMPAGE data demonstrate that transcription start sites were effectively captured. Together with other public sorghum genomes, BTx623, RTx430, and Rio, extensive structural variations (SVs) of various sizes were characterized using Tx2783 as a reference. Genome-wide scanning for disease resistance (R) genes revealed high levels of diversity among these five sorghum accessions. To characterize sugarcane aphid (SCA) resistance in Tx2783, we mapped the resistance region on chromosome 6 using a recombinant inbred line (RIL) population and found an SV of 191 kb containing a cluster of R genes in Tx2783. Using Tx2783 as a backbone, along with the SVs, we constructed a pan-genome to support alignment of resequencing data from 62 sorghum accessions, and then identified core and dispensable genes using this population. This study provides the first overview of the extent of genomic structural variations and R genes in the sorghum population, and reveals potential targets for breeding of SCA resistance.
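The abstract does not say how core and dispensable genes were called. As a minimal sketch, assuming a gene counts as core when it is detected in every accession and as dispensable otherwise, the classification could look like the following. The function name, the `core_fraction` parameter, and the toy gene identifiers are hypothetical and are not taken from the study.

```python
from typing import Dict, Set, Tuple


def classify_genes(
    presence: Dict[str, Set[str]], core_fraction: float = 1.0
) -> Tuple[Set[str], Set[str]]:
    """Split genes into core and dispensable sets.

    presence maps an accession name to the set of genes detected in it.
    core_fraction is an assumed threshold: the share of accessions that
    must carry a gene for it to be called core (1.0 = all accessions).
    """
    n_accessions = len(presence)
    counts: Dict[str, int] = {}
    for genes in presence.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1

    core = {g for g, c in counts.items() if c >= core_fraction * n_accessions}
    dispensable = set(counts) - core
    return core, dispensable


# Toy input with three accessions (gene names are placeholders).
toy = {
    "acc1": {"geneA", "geneB"},
    "acc2": {"geneA", "geneC"},
    "acc3": {"geneA", "geneB"},
}
core, dispensable = classify_genes(toy)
```

Lowering `core_fraction` slightly (for example to 0.95) is a common way to tolerate missed calls in low-coverage resequencing data, but whether the authors did so is not stated.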

2020 ◽  
Author(s):  
Nicholas C Palmateer ◽  
Kyle Tretina ◽  
Joshua Orvis ◽  
Olukemi O Ifeonu ◽  
Jonathan Crabtree ◽  
...  

Abstract Theileria parva is an economically important, intracellular, tick-transmitted parasite of cattle. A live vaccine against the parasite is effective against challenge from cattle-transmissible T. parva but not against genotypes originating from the African Cape buffalo, a major wildlife reservoir, prompting the need to characterize genome-wide variation within and between cattle- and buffalo-associated T. parva populations. Here, we describe a capture-based target enrichment approach that enables, for the first time, de novo assembly of nearly complete T. parva genomes derived from infected host cell lines. This approach has exceptionally high specificity and sensitivity and is successful for both cattle- and buffalo-derived T. parva parasites. De novo genome assemblies generated for cattle genotypes differ from the reference by ∼54K single nucleotide polymorphisms (SNPs) throughout the 8.31 Mb genome, an average of 6.5 SNPs/kb. We report the first buffalo-derived T. parva genome, which is larger than the genome from the reference, cattle-derived, Muguga strain. The average non-synonymous nucleotide diversity (πN) per gene, between buffalo-derived T. parva and the Muguga strain, was 1.3%. This remarkably high level of genetic divergence is supported by an average FST, genome-wide, of 0.44, reflecting a degree of genetic differentiation between cattle- and buffalo-derived T. parva parasites more commonly seen between, rather than within, species, with clear implications for vaccine development. The DNA capture approach used provides clear advantages over alternative T. parva DNA enrichment methods used previously and enables in-depth comparative genomics in this apicomplexan parasite.


2019 ◽  
Author(s):  
Jennafer A. P. Hamlin ◽  
Guilherme Dias ◽  
Casey M. Bergman ◽  
Douda Bensasson

ABSTRACT Although normally a harmless commensal, Candida albicans has the potential to generate a wide range of infections including systemic candidaemia, making it the most common cause of bloodstream infections worldwide with a high rate of mortality. C. albicans has long been considered an obligate commensal; however, recent studies suggest it can live outside animal hosts. Here, we have generated PacBio sequencing and phased genome assemblies for three C. albicans strains from oak trees in the United Kingdom (NCYC 4144, NCYC 4145, and NCYC 4146). Our results provide phased de novo diploid assemblies for C. albicans and provide a framework to study patterns of genomic variation within and among strains of an important fungal pathogen.


Blood ◽  
2011 ◽  
Vol 118 (21) ◽  
pp. 401-401
Author(s):  
Cai Chen ◽  
Christoph Bartenhagen ◽  
Michael Gombert ◽  
Vera Okpanyi ◽  
Vera Binder ◽  
...  

Abstract 401. Introduction: High hyperdiploidy (51–67 chromosomes) is the most frequent numerical cytogenetic alteration found in pediatric B-cell precursor acute lymphoblastic leukemia (ALL), occurring in 25–30% of patients. It is characterized by nonrandom gains of chromosomes X, 4, 6, 10, 14, 17, 18, or 21. Children suffering from high hyperdiploid ALL have a good prognosis; nevertheless, in 15–20% of cases the disease will recur. The mechanisms involved in the pathogenesis of primary and relapsing high hyperdiploid ALL are poorly understood. In some cases, IGH rearrangements arise in utero, indicating an early formation of pre-leukemic clones. However, the cellular origin of these pre-leukemic clones, as well as the molecular mechanism underlying the formation of high hyperdiploid cells, remains to be determined. Further genetic changes assisting in the development of ALL and recurrent disease are still unknown. Objective: Using massively parallel, genome-wide next-generation sequencing (Illumina/Solexa), we aimed to identify specific cytogenetic structural variations (SVs) of high hyperdiploid ALL and possible clonal relationships between paired diagnostic and relapse ALL samples. Method: Paired-end sequencing libraries were generated from genomic DNA of diagnostic and relapse leukemic samples as well as germline DNA from the same patient. Libraries of two patients and one high hyperdiploid ALL cell line (MHH-CALL-2) with insert sizes of 350–400 bp were sequenced with paired-end reads of 36 bp (Genome Analyzer IIx) or 51 bp (HiSeq 2000). Raw sequencing data were aligned to the human reference genome hg19 (GRCh37) with the Burrows-Wheeler Aligner (BWA), and duplicate reads were removed. Copy number variants (CNVs), deletions, intrachromosomal inversions and interchromosomal translocations were analyzed by FREEC and GASV. After subtraction of germline SVs, putative leukemia-specific SVs were obtained. These were validated by PCR performed on genomic DNA. Specific breakpoints of SVs at single-base resolution were identified by capillary sequencing of the PCR products. Results: Sequencing of different libraries yielded 95–279 million unique reads that mapped with both ends to the reference genome. Sequence coverages of 57–87% and fragment coverages of 4.9–12.3x were achieved (Table 1). CNV profiles with 10 kb resolution were generated. A comparison of the CNVs of diagnosis and relapse ALL samples demonstrated a high degree of conformity, with only a few additional alterations present mainly, but not exclusively, in the relapse samples. In one of the patients, a large gain of chromosome 1q was observed only in the relapse sample (Figure 1). SV analysis of all samples resulted in a total of 375 intragenic deletions, 16 intergenic inversions and 83 translocations (Table 1). PCR validation identified 2 previously unknown somatic translocations in the MHH-CALL-2 cell line, involving chromosomes 3 and 7 as well as chromosomes 15 and 18. Furthermore, 6 novel translocations present at diagnosis and relapse could be validated in patient samples. These involved chromosomes 3, 11, 12 and 20. One unique new relapse-specific translocation, t(4;7), was identified. Conclusion: Paired-end sequencing of leukemia samples and matched non-tumor material provides a robust tool for the discovery of genome-wide structural rearrangements. The high degree of conformity of CNVs and SVs detected in paired diagnosis/relapse samples indicates a common origin and a close relationship of the leukemic clones at diagnosis and relapse. The observation of a few additional alterations in both diagnostic and relapse samples suggests the presence of different subclones at the time of diagnosis and the evolution of the relapse clone from either the diagnostic clone or a minor subclone. Disclosures: No relevant conflicts of interest to declare.
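The germline subtraction step described in the Method section can be illustrated with a short sketch. This is not the FREEC or GASV implementation. It assumes each SV call is reduced to a type plus two breakpoints, and that a tumor call is treated as germline when a germline call of the same type lies within a tolerance window at both breakpoints. The `SV` record, the function names, and the 500 bp default tolerance are all hypothetical choices made for illustration.

```python
from typing import List, NamedTuple


class SV(NamedTuple):
    """A simplified structural-variant call (hypothetical record layout)."""
    kind: str    # e.g. "DEL", "INV", "TRA"
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def same_event(a: SV, b: SV, tol: int) -> bool:
    """True if two calls share type and chromosomes and both breakpoints agree within tol bp."""
    return (
        a.kind == b.kind
        and a.chrom1 == b.chrom1
        and a.chrom2 == b.chrom2
        and abs(a.pos1 - b.pos1) <= tol
        and abs(a.pos2 - b.pos2) <= tol
    )


def subtract_germline(tumor: List[SV], germline: List[SV], tol: int = 500) -> List[SV]:
    """Keep tumor calls with no matching germline call (putative somatic SVs)."""
    return [sv for sv in tumor if not any(same_event(sv, g, tol) for g in germline)]


# Toy calls; coordinates are invented for illustration only.
germline_calls = [SV("DEL", "chr1", 10_000, "chr1", 12_000)]
tumor_calls = [
    SV("DEL", "chr1", 10_120, "chr1", 11_950),   # matches the germline deletion
    SV("TRA", "chr4", 55_000, "chr7", 902_000),  # no germline counterpart
]
somatic = subtract_germline(tumor_calls, germline_calls)
```

The pairwise comparison is quadratic in the number of calls, which is acceptable for a few hundred SVs per sample; larger call sets would usually be indexed by chromosome first.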


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Michael Alonge ◽  
Sebastian Soyk ◽  
Srividya Ramakrishnan ◽  
Xingang Wang ◽  
Sara Goodwin ◽  
...  

Abstract We present RaGOO, a reference-guided contig ordering and orienting tool that leverages the speed and sensitivity of Minimap2 to accurately achieve chromosome-scale assemblies in minutes. After the pseudomolecules are constructed, RaGOO identifies structural variants, including those spanning sequencing gaps. We show that RaGOO accurately orders and orients 3 de novo tomato genome assemblies, including the widely used M82 reference cultivar. We then demonstrate the scalability and utility of RaGOO with a pan-genome analysis of 103 Arabidopsis thaliana accessions by examining the structural variants detected in the newly assembled pseudomolecules. RaGOO is available open source at https://github.com/malonge/RaGOO.
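The general idea of reference-guided ordering and orienting can be sketched as follows. This is a deliberate simplification and not RaGOO's actual algorithm: it assumes each contig is assigned to the reference chromosome it covers most, oriented by the strand carrying most of its aligned bases, and ordered by its leftmost alignment start. The `Aln` record and the `order_and_orient` function are hypothetical names. In practice the alignments would come from a tool such as Minimap2.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Aln(NamedTuple):
    """One contig-to-reference alignment (hypothetical, simplified record)."""
    contig: str
    ref_chrom: str
    ref_start: int
    ref_end: int
    strand: str  # "+" or "-"


def order_and_orient(alns: List[Aln]) -> Dict[str, List[Tuple[str, str]]]:
    """Return, per reference chromosome, contigs in order with their orientation."""
    by_contig: Dict[str, List[Aln]] = defaultdict(list)
    for aln in alns:
        by_contig[aln.contig].append(aln)

    placed: Dict[str, List[Tuple[int, str, str]]] = defaultdict(list)
    for contig, hits in by_contig.items():
        # Assign the contig to the chromosome with the most aligned bases.
        covered: Dict[str, int] = defaultdict(int)
        for h in hits:
            covered[h.ref_chrom] += h.ref_end - h.ref_start
        chrom = max(covered, key=covered.get)

        # Orient by the strand that carries most aligned bases on that chromosome.
        on_chrom = [h for h in hits if h.ref_chrom == chrom]
        plus = sum(h.ref_end - h.ref_start for h in on_chrom if h.strand == "+")
        strand = "+" if plus >= covered[chrom] - plus else "-"

        # Order by the leftmost alignment start (a simplifying assumption).
        anchor = min(h.ref_start for h in on_chrom)
        placed[chrom].append((anchor, contig, strand))

    return {
        chrom: [(contig, strand) for _, contig, strand in sorted(items)]
        for chrom, items in placed.items()
    }


# Toy alignments; coordinates are invented for illustration only.
toy_alns = [
    Aln("c1", "chr1", 5_000, 9_000, "-"),
    Aln("c2", "chr1", 100, 4_000, "+"),
    Aln("c1", "chr2", 0, 500, "+"),
    Aln("c3", "chr2", 1_000, 3_000, "+"),
]
layout = order_and_orient(toy_alns)
```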


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Jeffrey Shih-Chieh Chu ◽  
Bo Peng ◽  
Kuanqiang Tang ◽  
Xingxing Yi ◽  
Huangkai Zhou ◽  
...  

Abstract Comparative analysis of multiple reference genomes representing diverse genetic backgrounds is critical for understanding the role of key alleles important in domestication and genetic breeding of important crops such as soybean. To enrich the genetic resources for soybean, we describe the generation, technical assessment, and preliminary genomic variation analysis of eight de novo reference-grade soybean genome assemblies from wild and cultivated accessions. These resources represent soybeans cultured at different latitudes and exhibiting different agronomical traits. Of these eight soybeans, five are from new accessions that have not been sequenced before. We demonstrate the usage of these genomes to identify small and large genomic variations affecting known genes as well as screening for genic PAV regions for identifying candidates for further functional studies.


Author(s):  
Dhawal Jain ◽  
Chong Chu ◽  
Burak Han Alver ◽  
Soohyun Lee ◽  
Eunjung Alice Lee ◽  
...  

ABSTRACT Hi-C is a common technique for assessing 3D chromatin conformation. Recent studies have shown that long-range interaction information in Hi-C data can be used to generate chromosome-length genome assemblies and identify large-scale structural variations. Here, we demonstrate the use of Hi-C data in detecting mobile transposable element (TE) insertions genome-wide. Our pipeline Hi-C-based TE analyzer (HiTea) capitalizes on clipped Hi-C reads and is aided by a high proportion of discordant read pairs in Hi-C data to detect insertions of three major families of active human TEs. Despite the uneven genome coverage in Hi-C data, HiTea is competitive with the existing callers based on whole-genome sequencing (WGS) data and can supplement the WGS-based characterization of the TE-insertion landscape. We employ the pipeline to identify TE-insertions from human cell-line Hi-C samples. Availability and implementation: HiTea is available at https://github.com/parklab/HiTea and as a Docker image. Supplementary information: Supplementary data are available at Bioinformatics online.


2016 ◽  
Author(s):  
Jay Ghurye ◽  
Mihai Pop ◽  
Sergey Koren ◽  
Chen-Shan Chin

Abstract Motivation: Long read technologies have revolutionized de novo genome assembly by generating contigs orders of magnitude longer than those of short read assemblies. Although assembly contiguity has increased, contigs still do not span a chromosome or a chromosome arm, resulting in an unfinished chromosome-level assembly. To address this problem, we develop a scalable and computationally efficient scaffolding method that can substantially boost the contiguity of the assembly using genome-wide chromatin interaction data such as Hi-C. In particular, we demonstrate an algorithm that uses Hi-C data for longer-range scaffolding of de novo long read genome assemblies. Results: We tested our methods on two long read assemblies of different organisms. We compared our method with a previously developed method and show that our approach performs better in terms of scaffolding accuracy. Availability: The software is available for free use and can be downloaded from here: https://github.com/machinegun/[email protected]


2020 ◽  
Author(s):  
Dhawal Jain ◽  
Chong Chu ◽  
Burak Han Alver ◽  
Soohyun Lee ◽  
Eunjung Alice Lee ◽  
...  

Abstract Hi-C is a common technique for assessing three-dimensional chromatin conformation. Recent studies have shown that long-range interaction information in Hi-C data can be used to generate chromosome-length genome assemblies and identify large-scale structural variations. Here, we demonstrate the use of Hi-C data in detecting mobile transposable element (TE) insertions genome-wide. Our pipeline HiTea (Hi-C based Transposable element analyzer) capitalizes on clipped Hi-C reads and is aided by a high proportion of discordant read pairs in Hi-C data to detect insertions of three major families of active human TEs. Despite the uneven genome coverage in Hi-C data, HiTea is competitive with the existing callers based on whole genome sequencing (WGS) data and can supplement the WGS-based characterization of the TE insertion landscape. We employ the pipeline to identify TE insertions from human cell-line Hi-C samples. HiTea is available at https://github.com/parklab/HiTea and as a Docker image.


2020 ◽  
Vol 14 (10) ◽  
pp. e0008781
Author(s):  
Nicholas C. Palmateer ◽  
Kyle Tretina ◽  
Joshua Orvis ◽  
Olukemi O. Ifeonu ◽  
Jonathan Crabtree ◽  
...  

Theileria parva is an economically important, intracellular, tick-transmitted parasite of cattle. A live vaccine against the parasite is effective against challenge from cattle-transmissible T. parva but not against genotypes originating from the African Cape buffalo, a major wildlife reservoir, prompting the need to characterize genome-wide variation within and between cattle- and buffalo-associated T. parva populations. Here, we describe a capture-based target enrichment approach that enables, for the first time, de novo assembly of nearly complete T. parva genomes derived from infected host cell lines. This approach has exceptionally high specificity and sensitivity and is successful for both cattle- and buffalo-derived T. parva parasites. De novo genome assemblies generated for cattle genotypes differ from the reference by ~54K single nucleotide polymorphisms (SNPs) throughout the 8.31 Mb genome, an average of 6.5 SNPs/kb. We report the first buffalo-derived T. parva genome, which is ~20 kb larger than the genome from the reference, cattle-derived, Muguga strain, and contains 25 new potential genes. The average non-synonymous nucleotide diversity (πN) per gene, between buffalo-derived T. parva and the Muguga strain, was 1.3%. This remarkably high level of genetic divergence is supported by an average Wright’s fixation index (FST), genome-wide, of 0.44, reflecting a degree of genetic differentiation between cattle- and buffalo-derived T. parva parasites more commonly seen between, rather than within, species. These findings present clear implications for vaccine development, further demonstrated by the ability to assemble nearly all known antigens in the buffalo-derived strain, which will be critical in design of next generation vaccines. The DNA capture approach used provides a clear advantage in specificity over alternative T. parva DNA enrichment methods used previously, such as those that utilize schizont purification, is less labor intensive, and enables in-depth comparative genomics in this apicomplexan parasite.
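The reported SNP density follows directly from the two figures given in the abstract. The short check below takes "~54K" as exactly 54,000 SNPs and "8.31 Mb" as 8,310,000 bp, which is an approximation because the abstract only gives rounded values. The function name is hypothetical.

```python
def snps_per_kb(snp_count: int, genome_bp: int) -> float:
    """Average number of SNPs per kilobase of genome."""
    return snp_count / (genome_bp / 1_000)


# Rounded inputs from the abstract: ~54K SNPs across an 8.31 Mb genome.
density = snps_per_kb(54_000, 8_310_000)
print(f"{density:.1f} SNPs/kb")  # prints "6.5 SNPs/kb"
```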


2020 ◽  
Vol 10 (8) ◽  
pp. 2585-2592
Author(s):  
Sam D. Heraghty ◽  
John M. Sutton ◽  
Meaghan L. Pimsler ◽  
Janna L. Fierst ◽  
James P. Strange ◽  
...  

Bumble bees are ecologically and economically important insect pollinators. Three abundant and widespread species in western North America, Bombus bifarius, Bombus vancouverensis, and Bombus vosnesenskii, have been the focus of substantial research relating to diverse aspects of bumble bee ecology and evolutionary biology. We present de novo genome assemblies for each of the three species using hybrid assembly of Illumina and Oxford Nanopore Technologies sequences. All three assemblies are of high quality with large N50s (> 2.2 Mb), BUSCO scores indicating > 98% complete genes, and annotations producing 13,325 – 13,687 genes, comparing favorably with other bee genomes. Analysis of synteny against the most complete bumble bee genome, Bombus terrestris, reveals a high degree of collinearity. These genomes should provide a valuable resource for addressing questions relating to functional genomics and evolutionary biology in these species.
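For readers unfamiliar with the N50 statistic quoted above, a minimal sketch of the standard definition follows: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name and the toy lengths are illustrative and are not taken from the paper.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first sequence where the cumulative length
        # reaches half of the assembly.
        if 2 * running >= total:
            return length
    raise ValueError("n50() requires at least one sequence length")


# Toy lengths (arbitrary units): the total is 32, and 8 + 8 reaches half of it.
example_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is why the abstract pairs it with BUSCO completeness and synteny against Bombus terrestris.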

