Long-read Data Revealed Structural Diversity in Human Centromere Sequences

2019 ◽  
Author(s):  
Yuta Suzuki ◽  
Gene Myers ◽  
Shinichi Morishita

ABSTRACT Centromeres invariably serve as the loci of kinetochore assembly in all eukaryotic cells, but their underlying DNA sequences evolve rapidly. Human centromeres are characterized by their extremely repetitive structures, i.e., higher-order repeats, rendering the region one of the most difficult parts of the genome to assess. Consequently, our understanding of centromere sequence variations across human populations is limited. Here, we analyzed chromosomes 11, 17, and X using long sequencing reads of two European and two Asian genomes, and our results show that human centromere sequences exhibit substantial structural diversity, harboring many novel variant higher-order repeats specific to individuals, while frequent single-nucleotide variants are largely conserved. Our findings add another dimension to our knowledge of centromeres, challenging the notion of stable human centromeres. The discovery of such diversity prompts further deep sequencing of human populations to understand the true nature of sequence evolution in human centromeres.


2020 ◽  
Vol 6 (50) ◽  
pp. eabd9230
Author(s):  
Yuta Suzuki ◽  
Eugene W. Myers ◽  
Shinichi Morishita

Our understanding of centromere sequence variation across human populations is limited by its extremely long nested repeat structures called higher-order repeats that are challenging to sequence. Here, we analyzed chromosomes 11, 17, and X using long-read sequencing data for 36 individuals from diverse populations including a Han Chinese trio and 21 Japanese. We revealed substantial structural diversity with many previously unidentified variant higher-order repeats specific to individuals characterizing rapid, haplotype-specific evolution of human centromeric arrays, while frequent single-nucleotide variants are largely conserved. We found a characteristic pattern shared among prevalent variants in human and chimpanzee. Our findings pave the way for studying sequence evolution in human and primate centromeres.



Science ◽  
2019 ◽  
Vol 366 (6463) ◽  
pp. eaax2083 ◽  
Author(s):  
PingHsun Hsieh ◽  
Mitchell R. Vollger ◽  
Vy Dang ◽  
David Porubsky ◽  
Carl Baker ◽  
...  

Copy number variants (CNVs) are subject to stronger selective pressure than single-nucleotide variants, but their roles in archaic introgression and adaptation have not been systematically investigated. We show that stratified CNVs are significantly associated with signatures of positive selection in Melanesians and provide evidence for adaptive introgression of large CNVs at chromosomes 16p11.2 and 8p21.3 from Denisovans and Neanderthals, respectively. Using long-read sequence data, we reconstruct the structure and complex evolutionary history of these polymorphisms and show that both encode positively selected genes absent from most human populations. Our results collectively suggest that large CNVs originating in archaic hominins and introgressed into modern humans have played an important role in local population adaptation and represent an insufficiently studied source of large-scale genetic variation.



2019 ◽  
Author(s):  
Alexander Kozik ◽  
Beth A. Rowan ◽  
Dean Lavelle ◽  
Lidija Berke ◽  
M. Eric Schranz ◽  
...  

ABSTRACT Plant mitochondrial genomes are usually assembled and displayed as circular maps based on the widely held assumption that circular genome molecules are the primary form of mitochondrial DNA, despite evidence to the contrary. Many plant mitochondrial genomes have one or more pairs of large repeats that can act as sites for inter- or intramolecular recombination, leading to multiple alternative genomic arrangements (isoforms). Most mitochondrial genomes have been assembled using methods that were unable to capture the complete spectrum of isoforms within a species, leading to an incomplete inference of their structure and recombinational activity. To document and investigate underlying reasons for structural diversity in plant mitochondrial DNA, we used long-read (PacBio) and short-read (Illumina) sequencing data to assemble and compare mitochondrial genomes of domesticated (Lactuca sativa) and wild (L. saligna and L. serriola) lettuce species. This allowed us to characterize a comprehensive, complex set of isoforms within each species and to compare genome structures between species. Physical analysis of L. sativa mtDNA molecules by fluorescence microscopy revealed a variety of linear, branched linear, and circular structures. The mitochondrial genomes for L. sativa and L. serriola were identical in sequence and arrangement, and differed substantially from L. saligna, indicating that the mitochondrial genome structure did not change during domestication. From the isoforms evident in our data, we inferred that recombination occurs at repeats of all sizes at variable frequencies. The differences in genome structure between L. saligna and the two other lettuce species can be largely explained by rare recombination events that rearrange the structure. 
Our data demonstrate that representations of plant mitochondrial DNA as simple, genome-sized circular molecules are not accurate descriptions of their true nature and that in reality plant mitochondrial DNA is a complex, dynamic mixture of forms. Data Availability: BioProject: Organellar genomes of cultivated and wild lettuce (Lactuca) varieties PRJNA508811 https://www.ncbi.nlm.nih.gov/bioproject/508811 and other accessions as indicated through the text and supplemental data. Funding: NSF grant MCB-1413152 to ACC and support from UC Davis to RWM.



Author(s):  
Adrien Oliva ◽  
Raymond Tobler ◽  
Alan Cooper ◽  
Bastien Llamas ◽  
Yassine Souilmi

Abstract The current standard practice for assembling individual genomes involves mapping millions of short DNA sequences (also known as DNA ‘reads’) against a pre-constructed reference genome. Mapping vast amounts of short reads in a timely manner is a computationally challenging task that inevitably produces artefacts, including biases against alleles not found in the reference genome. This reference bias and other mapping artefacts are expected to be exacerbated in ancient DNA (aDNA) studies, which rely on the analysis of low quantities of damaged and very short DNA fragments (~30–80 bp). Nevertheless, the current gold-standard mapping strategies for aDNA studies have effectively remained unchanged for nearly a decade, during which time new software has emerged. In this study, we used simulated aDNA reads from three different human populations to benchmark the performance of 30 distinct mapping strategies implemented across four read-mapping programs (BWA-aln, BWA-mem, NovoAlign and Bowtie2) and quantified the impact of reference bias in downstream population genetic analyses. We show that specific NovoAlign, BWA-aln and BWA-mem parameterizations achieve high mapping precision with low levels of reference bias, particularly after filtering out reads with low mapping qualities. However, unbiased NovoAlign results required the use of an IUPAC reference genome. While relevant only to aDNA projects where reference population data are available, the benefit of using an IUPAC reference demonstrates the value of incorporating population genetic information into the aDNA mapping process, echoing recent results based on graph genome representations.



2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Hannah E. Roberts ◽  
Maria Lopopolo ◽  
Alistair T. Pagnamenta ◽  
Eshita Sharma ◽  
Duncan Parkes ◽  
...  

Abstract Recent advances in throughput and accuracy mean that the Oxford Nanopore Technologies PromethION platform is now a viable solution for genome sequencing. Much of the validation of bioinformatic tools for this long-read data has focussed on calling germline variants (including structural variants). Somatic variants are outnumbered many-fold by germline variants and their detection is further complicated by the effects of tumour purity/subclonality. Here, we evaluate the extent to which Nanopore sequencing enables detection and analysis of somatic variation. We do this through sequencing tumour and germline genomes for a patient with diffuse B-cell lymphoma and comparing results with 150 bp short-read sequencing of the same samples. Calling germline single nucleotide variants (SNVs) from specific chromosomes of the long-read data achieved good specificity and sensitivity. However, results of somatic SNV calling highlight the need for the development of specialised joint calling algorithms. We find the comparative genome-wide performance of different tools varies significantly between structural variant types, and suggest long reads are especially advantageous for calling large somatic deletions and duplications. Finally, we highlight the utility of long reads for phasing clinically relevant variants, confirming that a somatic 1.6 Mb deletion and a p.(Arg249Met) mutation involving TP53 are oriented in trans.



2020 ◽  
Vol 41 (Supplement_2) ◽  
Author(s):  
K Sonoda ◽  
S Ohno ◽  
M Horie

Abstract Background Genome structural variants (SVs) have a larger effect on human genome functions than single nucleotide variants (SNVs). Although short-read sequencing (SRS) is currently the major next-generation sequencing method and has greatly helped to elucidate the genetic background of inherited diseases, it does not detect SVs accurately. Long-read sequencing (LRS) produces reads of tens to thousands of kilobases and detects the breakpoints of complex SVs. This study aimed to confirm a large deletion, which was suspected by SRS, using LRS by Oxford Nanopore technology (ONT). Methods Genomic libraries for SRS were prepared with HaloPlex. Targeted SRS was performed for 58 genes with MiSeq. Genomic libraries for LRS were prepared using the Ligation sequencing 1D kit SQK-LSK109 (ONT). Whole genome LRS was performed with GridION X5 and R9.4 flow cells (ONT). Results The patient was a five-month-old boy with atrial septal defect (ASD) and atrial tachycardia. Though SRS failed to identify any causative SNVs, the results with SureCall software (Agilent) suggested a deletion from exon 3 to exon 26 in MYH6, which encodes the α heavy chain of cardiac myosin. Variants in MYH6 are known to be associated with ASD. Because a deletion between MYH6 exon 26 and MYH7 exon 27 was reported as esv2748480 in the Database of Genomic Variants, we performed long-range PCR from MYH6 intron 26 to MYH7 exon 26 and found an abnormal 1.5-kb PCR product only in the case. Due to the high homology of MYH6 and MYH7, Sanger sequencing failed to detect the breakpoint. In LRS, 3 flow cells generated 3.8M base-called reads containing 42G bases with an N50 of 13K bases. We used NGMLR, a long-read mapper, to align the reads to the human reference genome (hg38). SVs were called by Sniffles, which detects all types of SVs. The deletion was found to range from chr14: 23390037 to 23419824 (see figure) and did not contain other SVs. 
There was no pathogenic SV in ACTC1, GATA4, TBX20 and TLL1, which are genes related to ASD in the Genetic Testing Registry. His mother also had ASD and harbored the same deletion. Conclusions This is the first report to identify a large deletion between MYH6 and MYH7 in a family with ASD. The combination of SRS and LRS is useful to detect SVs in patients who have suspected inherited diseases but carry no causative SNVs. Funding Acknowledgement Type of funding source: None
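The abstract above summarizes its long-read run by an N50 of 13K bases. As a minimal sketch of how that statistic is commonly defined (the largest length L such that reads of length at least L together contain at least half of all sequenced bases), the following Python function may help. The function name `n50` and the toy read lengths are illustrative assumptions and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of read (or contig) lengths.

    N50 is the length L such that reads of length >= L together
    account for at least half of the total number of bases.
    """
    if not lengths:
        raise ValueError("N50 is undefined for an empty collection")
    total = sum(lengths)
    running = 0
    # Walk from the longest read down, accumulating bases until
    # the running sum reaches half of the total.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy read lengths (not data from the abstract).
toy_reads = [2, 3, 4, 5, 6]
print(n50(toy_reads))
```

With the toy input, the total is 20 bases; the two longest reads (6 and 5) already cover 11 bases, so the sketch reports 5. A real run would pass the per-read lengths extracted from the base-called FASTQ files instead.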





2019 ◽  
Vol 490 (4) ◽  
pp. 5088-5102 ◽  
Author(s):  
M Mugrauer

ABSTRACT A new survey is presented, which explores the second data release of the ESA-Gaia mission, in order to search for stellar companions of exoplanet host stars, located at distances closer than about 500 pc around the Sun. In total, 176 binaries, 27 hierarchical triples, and one hierarchical quadruple system are detected among more than 1300 exoplanet host stars, whose multiplicity is investigated, yielding a multiplicity rate of the exoplanet host stars of at least about 15 per cent. The detected companions and the exoplanet host stars are equidistant and share a common proper motion, as is expected for gravitationally bound stellar systems, proven with their accurate Gaia astrometry. The companions exhibit masses in the range between about 0.078 and 1.4 M⊙ with a peak in their mass distribution between 0.15 and 0.3 M⊙. The companions are separated from the exoplanet host stars by about 20 up to 9100 au, but are found most frequently within a projected separation of 1000 au. While most of the detected companions are early M dwarfs, eight white dwarf companions of exoplanet host stars are also identified in this survey, whose true nature is revealed with their photometric properties. Hence, these degenerate companions and the exoplanet host stars form evolved stellar systems with exoplanets, which have survived (physically but also dynamically) the post-main-sequence evolution of their former primary star.



2021 ◽  
Vol 11 (8) ◽  
pp. 804
Author(s):  
Navid Neyshaburinezhad ◽  
Hengameh Ghasim ◽  
Mohammadreza Rouini ◽  
Youssef Daali ◽  
Yalda H. Ardakani

Genetic polymorphisms in cytochrome P450 genes can alter the metabolic activity of clinically important medicines. Thus, single nucleotide variants (SNVs) and copy number variations (CNVs) in CYP genes are leading factors of drug pharmacokinetics and toxicity and form pharmacogenetic biomarkers for drug dosing, efficacy, and safety. The distribution of cytochrome P450 alleles differs significantly between populations with important implications for personalized drug therapy and healthcare programs. To provide a meta-analysis of CYP allele polymorphisms with clinical importance, we brought together whole-genome and exome sequencing data from 800 unrelated individuals of the Iranian population (100 subjects from each of 8 major ethnic groups of Iran) and 63,269 unrelated individuals of five major human populations (EUR, AMR, AFR, EAS and SAS). By integrating these datasets with population-specific linkage information, we evolved the frequencies of 140 CYP haplotypes related to 9 important CYP450 isoenzymes (CYP1A2, CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, CYP2E1, CYP3A4 and CYP3A5) giving a large resource for major genetic determinants of drug metabolism. Furthermore, we evaluated the more frequent Iranian alleles and compared the dataset with the Caucasian race. Finally, the similarity of the Iranian population SNVs with other populations was investigated.



2021 ◽  
pp. gr.275325.121
Author(s):  
Rodrigo P. Baptista ◽  
Yiran Li ◽  
Adam Sateriale ◽  
Karen L. Brooks ◽  
Alan Tracey ◽  
...  

Cryptosporidiosis is a leading cause of waterborne diarrheal disease globally and an important contributor to mortality in infants and the immunosuppressed. Despite its importance, the Cryptosporidium community has only had access to a good, but incomplete, Cryptosporidium parvum IOWA reference genome sequence. Incomplete reference sequences hamper annotation, experimental design and interpretation. We have generated a new C. parvum IOWA genome assembly supported by PacBio and Oxford Nanopore long-read technologies and a new comparative and consistent genome annotation for three closely related species: C. parvum, Cryptosporidium hominis and Cryptosporidium tyzzeri. We made 1,926 C. parvum annotation updates based on experimental evidence. They include new transporters, ncRNAs, introns and altered gene structures. The new assembly and annotation revealed a complete Dnmt2 methylase ortholog. Comparative annotation between C. parvum, C. hominis and C. tyzzeri revealed that most "missing" orthologs are found, suggesting that the biological differences between the species must result from gene copy number variation, differences in gene regulation and single nucleotide variants (SNVs). Using the new assembly and annotation as reference, 190 genes are identified as evolving under positive selection, including many not detected previously. The new C. parvum IOWA reference genome assembly is larger, gap free and lacks ambiguous bases. This chromosomal assembly recovers all 16 chromosome ends, 13 of which are contiguously assembled. The three remaining chromosome ends are provisionally placed. These ends represent duplication of entire chromosome ends including subtelomeric regions, revealing a new level of genome plasticity that will both inform and impact future research.


