Whole genome resequencing of a laboratory-adapted Drosophila melanogaster population sample

AbstractAs part of a study into the molecular genetics of sexually dimorphic complex traits, we used next-generation sequencing to obtain data on genomic variation in an outbred laboratory-adapted fruit fly (Drosophila melanogaster) population. We successfully resequenced the whole genome of 2 females from the Berkeley reference line (BDGP6/dm6), and 220 hemiclonal females that were heterozygous for the same reference line genome, and a unique haplotype from the outbred base population (LHM). The use of a static and known genetic background enabled us to obtain sequences from whole-genome phased haplotypes. We used a BWA-Picard-GATK pipeline for mapping sequence reads to the dm6 reference genome assembly, at a median depth-of coverage of 31X, and have made the resulting data publicly-available in the NCBI Short Read Archive (BioProject PRJNA282591). Haplotype Caller discovered and genotyped 1,726,931 genetic variants (SNPs and indels, <200bp). Additionally, we used GenomeStrip/2.0 to discover and genotype 167 large structural variants (1-100Kb in size). Sequence data and quality-filtered genotype data are publicly-available at NCBI (Short Read Archive, dbSNP and dbVar). We have also released the unfiltered genotype data, and the code and logs for data processing, summary statistics, and graphs, via the research data repository, Zenodo, (https://zenodo.org/, ’Sussex Drosophila Sequencing’ community).

Download Full-text

Whole genome resequencing of a laboratory-adapted Drosophila melanogaster population sample

F1000Research ◽

10.12688/f1000research.9912.1 ◽

2016 ◽

Vol 5 ◽

pp. 2644 ◽

Cited By ~ 1

Author(s):

William P. Gilks ◽

Tanya M. Pennell ◽

Ilona Flis ◽

Matthew T. Webster ◽

Edward H. Morrow

Keyword(s):

Drosophila Melanogaster ◽

Complex Traits ◽

Population Sample ◽

Genomic Variation ◽

Genotype Data ◽

Whole Genome ◽

Unique Haplotype ◽

Short Read ◽

Short Read Archive ◽

Ncbi Short Read Archive

As part of a study into the molecular genetics of sexually dimorphic complex traits, we used next-generation sequencing to obtain data on genomic variation in an outbred laboratory-adapted fruit fly (Drosophila melanogaster) population. We successfully resequenced the whole genome of 220 hemiclonal females that were heterozygous for the same Berkeley reference line genome (BDGP6/dm6), and a unique haplotype from the outbred base population (LHM). The use of a static and known genetic background enabled us to obtain sequences from whole genome phased haplotypes. We used a BWA-Picard-GATK pipeline for mapping sequence reads to the dm6 reference genome assembly, at a median depth of coverage of 31X, and have made the resulting data publicly-available in the NCBI Short Read Archive (Accession number SRP058502). We used Haplotype Caller to discover and genotype 1,726,931 small genomic variants (SNPs and indels, <200bp). Additionally we detected and genotyped 167 large structural variants (1-100Kb in size) using GenomeStrip/2.0. Sequence and genotype data are publicly-available at the corresponding NCBI databases: Short Read Archive, dbSNP and dbVar (BioProject PRJNA282591). We have also released the unfiltered genotype data, and the code and logs for data processing and summary statistics (https://zenodo.org/communities/sussex_drosophila_sequencing/).

Download Full-text

Whole genome resequencing of a laboratory-adapted Drosophila melanogaster population sample

F1000Research ◽

10.12688/f1000research.9912.3 ◽

2016 ◽

Vol 5 ◽

pp. 2644 ◽

Cited By ~ 1

Author(s):

William P. Gilks ◽

Tanya M. Pennell ◽

Ilona Flis ◽

Matthew T. Webster ◽

Edward H. Morrow

Keyword(s):

Drosophila Melanogaster ◽

Complex Traits ◽

High Throughput Sequencing ◽

Population Sample ◽

Genomic Variation ◽

Genotype Data ◽

Whole Genome ◽

Short Read ◽

Short Read Archive ◽

Ncbi Short Read Archive

As part of a study into the molecular genetics of sexually dimorphic complex traits, we used high-throughput sequencing to obtain data on genomic variation in an outbred laboratory-adapted fruit fly (Drosophila melanogaster) population. We successfully resequenced the whole genome of 220 hemiclonal females that were heterozygous for the same Berkeley reference line genome (BDGP6/dm6), and a unique haplotype from the outbred base population (LHM). The use of a static and known genetic background enabled us to obtain sequences from whole-genome phased haplotypes. We used a BWA-Picard-GATK pipeline for mapping sequence reads to the dm6 reference genome assembly, at a median depth-of coverage of 31X, and have made the resulting data publicly-available in the NCBI Short Read Archive (Accession number SRP058502). We used Haplotype Caller to discover and genotype 1,726,931 small genomic variants (SNPs and indels, <200bp). Additionally we detected and genotyped 167 large structural variants (1-100Kb in size) using GenomeStrip/2.0. Sequence and genotype data are publicly-available at the corresponding NCBI databases: Short Read Archive, dbSNP and dbVar (BioProject PRJNA282591). We have also released the unfiltered genotype data, and the code and logs for data processing and summary statistics (https://zenodo.org/communities/sussex_drosophila_sequencing/).

Download Full-text

Whole genome resequencing of a laboratory-adapted Drosophila melanogaster

F1000Research ◽

10.12688/f1000research.9912.2 ◽

2016 ◽

Vol 5 ◽

pp. 2644

Author(s):

William P. Gilks ◽

Tanya M. Pennell ◽

Ilona Flis ◽

Matthew T. Webster ◽

Edward H. Morrow

Keyword(s):

Drosophila Melanogaster ◽

Complex Traits ◽

High Throughput Sequencing ◽

Genomic Variation ◽

Genotype Data ◽

Whole Genome ◽

Unique Haplotype ◽

Short Read ◽

Short Read Archive ◽

Ncbi Short Read Archive

Download Full-text

The complete genome sequences of two species of seventeen-year cicadas: Magicicada septendecim and Magicicada septendecula

F1000Research ◽

10.12688/f1000research.27309.1 ◽

2021 ◽

Vol 10 ◽

pp. 215

Author(s):

Harold B. White ◽

Stacy Pirro

Keyword(s):

North America ◽

Related Species ◽

De Novo ◽

Eastern North America ◽

Whole Genome ◽

Genome Sequences ◽

Short Read ◽

Short Read Archive ◽

Periodical Cicadas ◽

Ncbi Short Read Archive

The genus Magicicada (Hemiptera: Cicadidae) includes the periodical cicadas of Eastern North America. Spending the majority of their long lives underground, the adult cicadas emerge every 13 or 17 years to spend 4-6 weeks as adult to mate. We present the whole genome sequences of two species of 17-year cicadas, Magicicada septendecim and Magicicada septendecula. The reads were assembled by a de novo method followed by alignments to related species. Annotation was performed by GeneMark-ES. The raw and assembled data is available via NCBI Short Read Archive and Assembly databases.

Download Full-text

Extensive horizontal exchange of transposable elements in the Drosophila pseudoobscura group

10.1101/284117 ◽

2018 ◽

Author(s):

Tom Hill ◽

Andrea J. Betancourt

Keyword(s):

Horizontal Transfer ◽

Species Group ◽

Data Availability ◽

Drosophila Pseudoobscura ◽

Chromosome Size ◽

Short Read ◽

Short Read Archive ◽

Ncbi Short Read Archive ◽

Mobile Component ◽

Different Levels

AbstractWhile the horizontal transfer of a parasitic element can be a potentially catastrophic, it is increasingly recognized as a common occurrence. The horizontal exchange, or lack of exchange, of TE content between species results in different levels of divergence among a species group in the mobile component of their genomes. Here, we examine differences in the TE content of the Drosophila pseudoobscura species group. We identify several putative horizontal transfer events, and examine the role that horizontal transfer plays in the spread of TE families to new species and the homogenization of TE content in these species. Despite rampant exchange of TE families between species, we find that both TE content differs hugely across the group, likely due to differing activity of each TE family and differing suppression of TEs due to divergence in Y chromosome size, and its resulting effects of TE regulation. Overall, we show that TE content is highly dynamic in this species group, and that it plays a large role in shaping the differences seen between species.Data availabilityAll data used in this study (summarized in table S1) is freely available online through the NCBI short read archive (NCBI SRA: ERR127385, SRR330416, SRR330418, SRR1925723, SRR330426, SRR330420, SRR330423, SRR617430-74). All genomes used are either available through flybase.org or popoolation.at.

Download Full-text

Loose ends in cancer genome structure

10.1101/2021.05.26.445837 ◽

2021 ◽

Author(s):

Julie M Behr ◽

Xiaotong Yao ◽

Kevin Hadi ◽

Huasong Tian ◽

Aditya Deshpande ◽

...

Keyword(s):

Large Scale ◽

Structural Variation ◽

Genome Structure ◽

Cancer Genome ◽

Genomic Variation ◽

Future Research ◽

Whole Genome ◽

Structural Genomic ◽

Short Read ◽

Pan Cancer

Recent pan-cancer studies have delineated patterns of structural genomic variation across thousands of tumor whole genome sequences. It is not known to what extent the shortcomings of short read (≤ 150 bp) whole genome sequencing (WGS) used for structural variant analysis has limited our understanding of cancer genome structure. To formally address this, we introduce the concept of "loose ends" - copy number alterations that cannot be mapped to a rearrangement by WGS but can be indirectly detected through the analysis of junction-balanced genome graphs. Analyzing 2,319 pan-cancer WGS cases across 31 tumor types, we found loose ends were enriched in reference repeats and fusions of the mappable genome to repetitive or foreign sequences. Among these we found genomic footprints of neotelomeres, which were surprisingly enriched in cancers with low telomerase expression and alternate lengthening of telomeres phenotype. Our results also provide a rigorous upper bound on the role of non-allelic homologous recombination (NAHR) in large-scale cancer structural variation, while nominating INO80, FANCA, and ARID1A as positive modulators of somatic NAHR. Taken together, we estimate that short read WGS maps >97% of all large-scale (>10 kbp) cancer structural variation; the rest represent loose ends that require long molecule profiling to unambiguously resolve. Our results have broad relevance for future research and clinical applications of short read WGS and delineate precise directions where long molecule studies might provide transformative insight into cancer genome structure.

Download Full-text

Complex Structural Variants Resolved by Short-Read and Long-Read Whole Genome Sequencing in Mendelian Disorders

10.1101/281683 ◽

2018 ◽

Cited By ~ 2

Author(s):

Alba Sanchis-Juan ◽

Jonathan Stephens ◽

Courtney E French ◽

Nicholas Gleadall ◽

Karyn Mégy ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

De Novo ◽

Genomic Variation ◽

Mendelian Disease ◽

Whole Genome ◽

Structural Variants ◽

Short Read ◽

Long Read ◽

Complex Structural

AbstractComplex structural variants (cxSVs) are genomic rearrangements comprising multiple structural variants, typically involving three or more breakpoint junctions. They contribute to human genomic variation and can cause Mendelian disease, however they are not typically considered during genetic testing. Here, we investigate the role of cxSVs in Mendelian disease using short-read whole genome sequencing (WGS) data from 1,324 individuals with neurodevelopmental or retinal disorders from the NIHR BioResource project. We present four cases of individuals with a cxSV affecting Mendelian disease-associated genes. Three of the cxSVs are pathogenic: a de novo duplication-inversion-inversion-deletion affecting ARID1B in an individual with Coffin-Siris syndrome, a deletion-inversion-duplication affecting HNRNPU in an individual with intellectual disability and seizures, and a homozygous deletion-inversion-deletion affecting CEP78 in an individual with cone-rod dystrophy. Additionally, we identified a de novo duplication-inversion-duplication overlapping CDKL5 in an individual with neonatal hypoxic-ischaemic encephalopathy. Long-read sequencing technology used to resolve the breakpoints demonstrated the presence of both a disrupted and an intact copy of CDKL5 on the same allele; therefore, it was classified as a variant of uncertain significance. Analysis of sequence flanking all breakpoint junctions in all the cxSVs revealed both microhomology and longer repetitive sequences, suggesting both replication and homology based processes. Accurate resolution of cxSVs is essential for clinical interpretation, and here we demonstrate that long-read WGS is a powerful technology by which to achieve this. Our results show cxSVs are an important although rare cause of Mendelian disease, and we therefore recommend their consideration during research and clinical investigations.

Download Full-text

Faculty Opinions recommendation of Systems genetics of complex traits in Drosophila melanogaster.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1157122.618146 ◽

2009 ◽

Author(s):

Andy Groves

Keyword(s):

Drosophila Melanogaster ◽

Complex Traits ◽

Systems Genetics

Download Full-text

Assessing genomic diversity and signatures of selection in Jiaxian Red cattle using whole-genome sequencing data

BMC Genomics ◽

10.1186/s12864-020-07340-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Xiaoting Xia ◽

Shunjin Zhang ◽

Huaju Zhang ◽

Zijing Zhang ◽

Ningbo Chen ◽

...

Keyword(s):

Population Structure ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Genomic Variation ◽

Genomic Diversity ◽

System Response ◽

Whole Genome ◽

Population Structure Analysis ◽

Native Cattle ◽

Genomic Regions

Abstract Background Native cattle breeds are an important source of genetic variation because they might carry alleles that enable them to adapt to local environment and tough feeding conditions. Jiaxian Red, a Chinese native cattle breed, is reported to have originated from crossbreeding between taurine and indicine cattle; their history as a draft and meat animal dates back at least 30 years. Using whole-genome sequencing (WGS) data of 30 animals from the core breeding farm, we investigated the genetic diversity, population structure and genomic regions under selection of Jiaxian Red cattle. Furthermore, we used 131 published genomes of world-wide cattle to characterize the genomic variation of Jiaxian Red cattle. Results The population structure analysis revealed that Jiaxian Red cattle harboured the ancestry with East Asian taurine (0.493), Chinese indicine (0.379), European taurine (0.095) and Indian indicine (0.033). Three methods (nucleotide diversity, linkage disequilibrium decay and runs of homozygosity) implied the relatively high genomic diversity in Jiaxian Red cattle. We used θπ, CLR, FST and XP-EHH methods to look for the candidate signatures of positive selection in Jiaxian Red cattle. A total number of 171 (θπ and CLR) and 17 (FST and XP-EHH) shared genes were identified using different detection strategies. Functional annotation analysis revealed that these genes are potentially responsible for growth and feed efficiency (CCSER1), meat quality traits (ROCK2, PPP1R12A, CYB5R4, EYA3, PHACTR1), fertility (RFX4, SRD5A2) and immune system response (SLAMF1, CD84 and SLAMF6). Conclusion We provide a comprehensive overview of sequence variations in Jiaxian Red cattle genomes. Selection signatures were detected in genomic regions that are possibly related to economically important traits in Jiaxian Red cattle. We observed a high level of genomic diversity and low inbreeding in Jiaxian Red cattle. These results provide a basis for further resource protection and breeding improvement of this breed.

Download Full-text

Clinical-grade whole-genome sequencing and 3′ transcriptome analysis of colorectal cancer patients

Genome Medicine ◽

10.1186/s13073-021-00852-8 ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Agata Stodolna ◽

Miao He ◽

Mahesh Vasipalli ◽

Zoya Kingsbury ◽

Jennifer Becq ◽

...

Keyword(s):

Colorectal Cancer ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Transcriptome Analysis ◽

Variant Calling ◽

Standard Of Care ◽

Genomic Variation ◽

Whole Genome ◽

Clinical Grade ◽

Pathway Gene

Abstract Background Clinical-grade whole-genome sequencing (cWGS) has the potential to become the standard of care within the clinic because of its breadth of coverage and lack of bias towards certain regions of the genome. Colorectal cancer presents a difficult treatment paradigm, with over 40% of patients presenting at diagnosis with metastatic disease. We hypothesised that cWGS coupled with 3′ transcriptome analysis would give new insights into colorectal cancer. Methods Patients underwent PCR-free whole-genome sequencing and alignment and variant calling using a standardised pipeline to output SNVs, indels, SVs and CNAs. Additional insights into the mutational signatures and tumour biology were gained by the use of 3′ RNA-seq. Results Fifty-four patients were studied in total. Driver analysis identified the Wnt pathway gene APC as the only consistently mutated driver in colorectal cancer. Alterations in the PI3K/mTOR pathways were seen as previously observed in CRC. Multiple private CNAs, SVs and gene fusions were unique to individual tumours. Approximately 30% of patients had a tumour mutational burden of > 10 mutations/Mb of DNA, suggesting suitability for immunotherapy. Conclusions Clinical whole-genome sequencing offers a potential avenue for the identification of private genomic variation that may confer sensitivity to targeted agents and offer patients new options for targeted therapies.

Download Full-text