Whole genome resequencing of a laboratory-adapted Drosophila melanogaster

F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 2644
Author(s):  
William P. Gilks ◽  
Tanya M. Pennell ◽  
Ilona Flis ◽  
Matthew T. Webster ◽  
Edward H. Morrow

As part of a study into the molecular genetics of sexually dimorphic complex traits, we used high-throughput sequencing to obtain data on genomic variation in an outbred laboratory-adapted fruit fly (Drosophila melanogaster) population. We successfully resequenced the whole genome of 220 hemiclonal females that were heterozygous for the same Berkeley reference line genome (BDGP6/dm6) and a unique haplotype from the outbred base population (LHM). The use of a static and known genetic background enabled us to obtain sequences from whole-genome phased haplotypes. We used a BWA-Picard-GATK pipeline for mapping sequence reads to the dm6 reference genome assembly, at a median depth of coverage of 31X, and have made the resulting data publicly available in the NCBI Short Read Archive (Accession number SRP058502). We used Haplotype Caller to discover and genotype 1,726,931 small genomic variants (SNPs and indels, <200 bp). Additionally, we detected and genotyped 167 large structural variants (1-100 kb in size) using GenomeStrip/2.0. Sequence and genotype data are publicly available at the corresponding NCBI databases: Short Read Archive, dbSNP and dbVar (BioProject PRJNA282591). We have also released the unfiltered genotype data, and the code and logs for data processing and summary statistics (https://zenodo.org/communities/sussex_drosophila_sequencing/).


2016 ◽  
Author(s):  
William P. Gilks ◽  
Tanya M. Pennell ◽  
Ilona Flis ◽  
Matthew T. Webster ◽  
Edward H. Morrow

Abstract: As part of a study into the molecular genetics of sexually dimorphic complex traits, we used next-generation sequencing to obtain data on genomic variation in an outbred laboratory-adapted fruit fly (Drosophila melanogaster) population. We successfully resequenced the whole genome of 2 females from the Berkeley reference line (BDGP6/dm6), and of 220 hemiclonal females that were heterozygous for the same reference line genome and a unique haplotype from the outbred base population (LHM). The use of a static and known genetic background enabled us to obtain sequences from whole-genome phased haplotypes. We used a BWA-Picard-GATK pipeline for mapping sequence reads to the dm6 reference genome assembly, at a median depth of coverage of 31X, and have made the resulting data publicly available in the NCBI Short Read Archive (BioProject PRJNA282591). Haplotype Caller discovered and genotyped 1,726,931 genetic variants (SNPs and indels, <200 bp). Additionally, we used GenomeStrip/2.0 to discover and genotype 167 large structural variants (1-100 kb in size). Sequence data and quality-filtered genotype data are publicly available at NCBI (Short Read Archive, dbSNP and dbVar). We have also released the unfiltered genotype data, and the code and logs for data processing, summary statistics, and graphs, via the research data repository Zenodo (https://zenodo.org/, 'Sussex Drosophila Sequencing' community).


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 215
Author(s):  
Harold B. White ◽  
Stacy Pirro

The genus Magicicada (Hemiptera: Cicadidae) includes the periodical cicadas of Eastern North America. Spending the majority of their long lives underground, the cicadas emerge every 13 or 17 years to spend 4-6 weeks as adults and mate. We present the whole genome sequences of two species of 17-year cicadas, Magicicada septendecim and Magicicada septendecula. The reads were assembled by a de novo method followed by alignments to related species. Annotation was performed by GeneMark-ES. The raw and assembled data are available via the NCBI Short Read Archive and Assembly databases.


2018 ◽  
Author(s):  
Tom Hill ◽  
Andrea J. Betancourt

Abstract: While the horizontal transfer of a parasitic element can be potentially catastrophic, it is increasingly recognized as a common occurrence. The horizontal exchange, or lack of exchange, of TE content between species results in different levels of divergence among a species group in the mobile component of their genomes. Here, we examine differences in the TE content of the Drosophila pseudoobscura species group. We identify several putative horizontal transfer events, and examine the role that horizontal transfer plays in the spread of TE families to new species and the homogenization of TE content in these species. Despite rampant exchange of TE families between species, we find that TE content differs hugely across the group, likely due to differing activity of each TE family and differing suppression of TEs due to divergence in Y chromosome size and its resulting effects on TE regulation. Overall, we show that TE content is highly dynamic in this species group, and that it plays a large role in shaping the differences seen between species. Data availability: All data used in this study (summarized in table S1) are freely available online through the NCBI short read archive (NCBI SRA: ERR127385, SRR330416, SRR330418, SRR1925723, SRR330426, SRR330420, SRR330423, SRR617430-74). All genomes used are available through either flybase.org or popoolation.at.


2021 ◽  
Author(s):  
Víctor García-Olivares ◽  
Adrián Muñoz-Barrera ◽  
José Miguel Lorenzo-Salazar ◽  
Carlos Zaragoza-Trello ◽  
Luis A. Rubio-Rodríguez ◽  
...  

Abstract: The mitochondrial genome (mtDNA) is of interest for a range of fields including evolutionary, forensic, and medical genetics. Human mitogenomes can be classified into evolutionarily related haplogroups that provide ancestral information and pedigree relationships. Because of this and the advent of high-throughput sequencing (HTS) technology, there is a diversity of bioinformatic tools for haplogroup classification. We present a benchmarking of the 11 most salient tools for human mtDNA classification using empirical whole-genome (WGS) and whole-exome (WES) short-read sequencing data from 36 unrelated donors. In addition, because of its relevance, we also assess the best performing tool on third-generation long noisy read WGS data obtained with nanopore technology for a subset of the donors. We found that, for short-read WGS, most of the tools exhibit high accuracy for haplogroup classification irrespective of the input file used for the analysis. However, for short-read WES, Haplocheck and MixEmt were the most accurate tools. Based on the performance shown for WGS and WES, and the accompanying qualitative assessment, Haplocheck stands out as the most complete tool. For third-generation HTS data, we also showed that Haplocheck was able to accurately retrieve mtDNA haplogroups for all samples assessed, although only after following assembly-based approaches (based on either a reference-based assembly or a hybrid de novo assembly). Taken together, our results provide guidance for researchers to select the most suitable tool to conduct mtDNA analyses from HTS data.


2021 ◽  
Author(s):  
Julie M Behr ◽  
Xiaotong Yao ◽  
Kevin Hadi ◽  
Huasong Tian ◽  
Aditya Deshpande ◽  
...  

Recent pan-cancer studies have delineated patterns of structural genomic variation across thousands of tumor whole genome sequences. It is not known to what extent the shortcomings of short read (≤ 150 bp) whole genome sequencing (WGS) used for structural variant analysis have limited our understanding of cancer genome structure. To formally address this, we introduce the concept of "loose ends" - copy number alterations that cannot be mapped to a rearrangement by WGS but can be indirectly detected through the analysis of junction-balanced genome graphs. Analyzing 2,319 pan-cancer WGS cases across 31 tumor types, we found loose ends were enriched in reference repeats and fusions of the mappable genome to repetitive or foreign sequences. Among these we found genomic footprints of neotelomeres, which were surprisingly enriched in cancers with low telomerase expression and alternate lengthening of telomeres phenotype. Our results also provide a rigorous upper bound on the role of non-allelic homologous recombination (NAHR) in large-scale cancer structural variation, while nominating INO80, FANCA, and ARID1A as positive modulators of somatic NAHR. Taken together, we estimate that short read WGS maps >97% of all large-scale (>10 kbp) cancer structural variation; the rest represent loose ends that require long molecule profiling to unambiguously resolve. Our results have broad relevance for future research and clinical applications of short read WGS and delineate precise directions where long molecule studies might provide transformative insight into cancer genome structure.


2018 ◽  
Author(s):  
Alba Sanchis-Juan ◽  
Jonathan Stephens ◽  
Courtney E French ◽  
Nicholas Gleadall ◽  
Karyn Mégy ◽  
...  

Abstract: Complex structural variants (cxSVs) are genomic rearrangements comprising multiple structural variants, typically involving three or more breakpoint junctions. They contribute to human genomic variation and can cause Mendelian disease; however, they are not typically considered during genetic testing. Here, we investigate the role of cxSVs in Mendelian disease using short-read whole genome sequencing (WGS) data from 1,324 individuals with neurodevelopmental or retinal disorders from the NIHR BioResource project. We present four cases of individuals with a cxSV affecting Mendelian disease-associated genes. Three of the cxSVs are pathogenic: a de novo duplication-inversion-inversion-deletion affecting ARID1B in an individual with Coffin-Siris syndrome, a deletion-inversion-duplication affecting HNRNPU in an individual with intellectual disability and seizures, and a homozygous deletion-inversion-deletion affecting CEP78 in an individual with cone-rod dystrophy. Additionally, we identified a de novo duplication-inversion-duplication overlapping CDKL5 in an individual with neonatal hypoxic-ischaemic encephalopathy. Long-read sequencing technology used to resolve the breakpoints demonstrated the presence of both a disrupted and an intact copy of CDKL5 on the same allele; therefore, it was classified as a variant of uncertain significance. Analysis of sequence flanking all breakpoint junctions in all the cxSVs revealed both microhomology and longer repetitive sequences, suggesting both replication-based and homology-based processes. Accurate resolution of cxSVs is essential for clinical interpretation, and here we demonstrate that long-read WGS is a powerful technology by which to achieve this. Our results show cxSVs are an important although rare cause of Mendelian disease, and we therefore recommend their consideration during research and clinical investigations.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Xiaoting Xia ◽  
Shunjin Zhang ◽  
Huaju Zhang ◽  
Zijing Zhang ◽  
Ningbo Chen ◽  
...  

Abstract: Background: Native cattle breeds are an important source of genetic variation because they might carry alleles that enable them to adapt to the local environment and tough feeding conditions. Jiaxian Red, a Chinese native cattle breed, is reported to have originated from crossbreeding between taurine and indicine cattle; their history as a draft and meat animal dates back at least 30 years. Using whole-genome sequencing (WGS) data of 30 animals from the core breeding farm, we investigated the genetic diversity, population structure and genomic regions under selection of Jiaxian Red cattle. Furthermore, we used 131 published genomes of world-wide cattle to characterize the genomic variation of Jiaxian Red cattle. Results: The population structure analysis revealed that Jiaxian Red cattle harboured ancestry from East Asian taurine (0.493), Chinese indicine (0.379), European taurine (0.095) and Indian indicine (0.033). Three methods (nucleotide diversity, linkage disequilibrium decay and runs of homozygosity) implied relatively high genomic diversity in Jiaxian Red cattle. We used θπ, CLR, FST and XP-EHH methods to look for candidate signatures of positive selection in Jiaxian Red cattle. A total of 171 (θπ and CLR) and 17 (FST and XP-EHH) shared genes were identified using different detection strategies. Functional annotation analysis revealed that these genes are potentially responsible for growth and feed efficiency (CCSER1), meat quality traits (ROCK2, PPP1R12A, CYB5R4, EYA3, PHACTR1), fertility (RFX4, SRD5A2) and immune system response (SLAMF1, CD84 and SLAMF6). Conclusion: We provide a comprehensive overview of sequence variations in Jiaxian Red cattle genomes. Selection signatures were detected in genomic regions that are possibly related to economically important traits in Jiaxian Red cattle. We observed a high level of genomic diversity and low inbreeding in Jiaxian Red cattle. These results provide a basis for further resource protection and breeding improvement of this breed.

