Discovery of photosynthesis genes through whole-genome sequencing of acetate-requiring mutants of Chlamydomonas reinhardtii

2021 ◽  
Author(s):  
Setsuko Wakao ◽  
Patrick M. Shih ◽  
Katharine Guan ◽  
Wendy Schackwitz ◽  
Joshua Ye ◽  
...  

Abstract: Large-scale mutant libraries have been indispensable for genetic studies, and the development of next-generation genome sequencing technologies has greatly advanced efforts to analyze mutants. In this work, we sequenced the genomes of 660 Chlamydomonas reinhardtii acetate-requiring mutants, part of a larger photosynthesis mutant collection previously generated by insertional mutagenesis with a linearized plasmid. We identified 554 insertion events from 509 mutants by mapping the plasmid insertion sites through paired-end sequences, in which one end aligned to the plasmid and the other to a chromosomal location. Nearly all (96%) of the events were associated with deletions, duplications, or more complex rearrangements of genomic DNA at the sites of plasmid insertion, and 1405 genes in total were affected. Functional annotations of these genes were enriched in those related to photosynthesis, signaling, and tetrapyrrole synthesis as would be expected from a library enriched for photosynthesis mutants. Systematic manual analysis of the disrupted genes for each mutant generated a list of 273 higher-confidence candidate photosynthesis genes, and we experimentally validated two genes that are essential for photoautotrophic growth, CrLPA3 and CrPSBP4. The inventory of candidate genes includes 55 genes from a phylogenomically defined set of conserved genes in green algae and plants. Altogether, 68 candidate genes encode proteins with previously characterized functions in photosynthesis in Chlamydomonas, land plants, and/or cyanobacteria, 15 genes encode proteins previously shown to have functions unrelated to photosynthesis, and 190 genes encode proteins without any functional annotation, signifying that our results connect a function related to photosynthesis to these previously unknown proteins. 
This mutant library, with genome sequences that reveal the molecular extent of the chromosomal lesions and resulting higher-confidence candidate genes, represents a rich resource for gene discovery and protein functional analysis in photosynthesis.
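The insertion-site mapping described above relies on discordant read pairs in which one mate aligns to the plasmid and the other to a chromosome. The Python sketch below illustrates that idea under simplifying assumptions: alignments are already reduced to hypothetical `ReadPair` records, the plasmid is a contig named `"plasmid"` in a combined reference (also an assumption), and chromosomal anchors within a hypothetical `window` of 1 kb are merged into one candidate junction. It is not the authors' pipeline, which the abstract does not describe.

```python
from collections import defaultdict
from dataclasses import dataclass

PLASMID = "plasmid"  # hypothetical contig name for the plasmid in the combined reference


@dataclass
class ReadPair:
    """Hypothetical, already-aligned read pair: reference name and position per mate."""
    ref1: str
    pos1: int
    ref2: str
    pos2: int


def junction_candidates(pairs, window=1000, min_support=2):
    """Cluster chromosomal anchors of pairs that have exactly one mate on the plasmid.

    Returns a list of (chrom, first_pos, last_pos, n_pairs) tuples.
    """
    anchors = defaultdict(list)
    for p in pairs:
        on1, on2 = p.ref1 == PLASMID, p.ref2 == PLASMID
        if on1 == on2:
            continue  # both mates on the plasmid, or both on chromosomes
        chrom, pos = (p.ref2, p.pos2) if on1 else (p.ref1, p.pos1)
        anchors[chrom].append(pos)

    clusters = []
    for chrom, positions in sorted(anchors.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev <= window:
                count += 1  # close to the previous anchor: same junction
            else:
                if count >= min_support:
                    clusters.append((chrom, start, prev, count))
                start, count = pos, 1  # begin a new cluster
            prev = pos
        if count >= min_support:
            clusters.append((chrom, start, prev, count))
    return clusters
```

For example, two pairs anchoring at chr1:5000 and chr1:5200 collapse into one candidate junction supported by two pairs, while a lone anchor elsewhere is dropped at the default `min_support=2`. Real data would also need mapping-quality filters and mate-orientation checks.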

PLoS Genetics ◽  
2021 ◽  
Vol 17 (9) ◽  
pp. e1009725
Author(s):  
Setsuko Wakao ◽  
Patrick M. Shih ◽  
Katharine Guan ◽  
Wendy Schackwitz ◽  
Joshua Ye ◽  
...  

Large-scale mutant libraries have been indispensable for genetic studies, and the development of next-generation genome sequencing technologies has greatly advanced efforts to analyze mutants. In this work, we sequenced the genomes of 660 Chlamydomonas reinhardtii acetate-requiring mutants, part of a larger photosynthesis mutant collection previously generated by insertional mutagenesis with a linearized plasmid. We identified 554 insertion events from 509 mutants by mapping the plasmid insertion sites through paired-end sequences, in which one end aligned to the plasmid and the other to a chromosomal location. Nearly all (96%) of the events were associated with deletions, duplications, or more complex rearrangements of genomic DNA at the sites of plasmid insertion; counting these together with deletions that were not associated with a plasmid insertion, 1470 genes were identified as affected. Functional annotations of these genes were enriched in those related to photosynthesis, signaling, and tetrapyrrole synthesis as would be expected from a library enriched for photosynthesis mutants. Systematic manual analysis of the disrupted genes for each mutant generated a list of 253 higher-confidence candidate photosynthesis genes, and we experimentally validated two genes that are essential for photoautotrophic growth, CrLPA3 and CrPSBP4. The inventory of candidate genes includes 53 genes from a phylogenomically defined set of conserved genes in green algae and plants. Altogether, 70 candidate genes encode proteins with previously characterized functions in photosynthesis in Chlamydomonas, land plants, and/or cyanobacteria, and 14 genes encode proteins previously shown to have functions unrelated to photosynthesis. Among the remaining 169 uncharacterized genes, 38 genes encode proteins without any functional annotation, signifying that our results connect a function related to photosynthesis to these previously unknown proteins. 
This mutant library, with genome sequences that reveal the molecular extent of the chromosomal lesions and resulting higher-confidence candidate genes, will aid in advancing gene discovery and protein functional analysis in photosynthesis.
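Counting affected genes, as in the 1470-gene figure above, amounts to intersecting lesion intervals (deletions and other rearrangements) with gene coordinates. A minimal sketch, assuming half-open 0-based coordinates and hypothetical gene IDs; it is not the authors' annotation workflow.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive (half-open)


def affected_genes(lesions, genes):
    """Return IDs of genes whose span overlaps at least one lesion interval.

    lesions: list of Interval (e.g. deletions mapped at insertion sites)
    genes:   dict gene_id -> Interval
    """
    hits = set()
    for gene_id, g in genes.items():
        for les in lesions:
            # Two half-open intervals on the same chromosome overlap
            # iff each one starts before the other ends.
            if g.chrom == les.chrom and g.start < les.end and les.start < g.end:
                hits.add(gene_id)
                break
    return hits
```

With genes at chr1:100-500 and chr1:600-900 and a deletion spanning chr1:450-650, both genes are reported as affected. The nested loop is proportional to genes × lesions; for a genome-wide annotation, an interval tree or a sorted sweep would scale better.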


2017 ◽  
Author(s):  
Robert J. Schaefer ◽  
Jean-Michel Michno ◽  
Joseph Jeffers ◽  
Owen Hoekenga ◽  
Brian Dilkes ◽  
...  

Abstract
Background: Genome-wide association studies (GWAS) have identified thousands of loci linked to hundreds of traits in many different species. However, because linkage disequilibrium implicates a broad region surrounding each identified locus, the causal genes often remain unknown. This problem is especially pronounced in non-human, non-model species, where functional annotations are sparse and little information is frequently available for prioritizing candidate genes.
Results: To address this issue, we developed a computational approach called Camoco (Co-Analysis of Molecular Components) that systematically integrates loci identified by GWAS with gene co-expression networks to prioritize putative causal genes. We applied Camoco to prioritize candidate genes from a large-scale GWAS examining the accumulation of 17 different elements in maize seeds. Camoco identified statistically significant subnetworks for the majority of traits examined, producing a prioritized list of high-confidence causal genes for several agronomically important maize traits. Two candidate genes identified by our approach were validated through analysis of mutant phenotypes. Strikingly, the performance of our approach depended strongly on the type of co-expression network used: a network built from expression variation across genetically diverse individuals in a relevant tissue context (in our case, maize roots) outperformed the alternatives.
Conclusions: Our study demonstrates that co-expression networks can provide a powerful basis for prioritizing candidate causal genes from GWAS loci, but suggests that the success of such strategies can depend heavily on the context of the gene expression data. Both the Camoco software and the lessons on integrating GWAS data with co-expression networks generalize to species beyond maize.
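The general strategy of integrating GWAS loci with a co-expression network can be illustrated in a simplified form: collect genes near each significant SNP, measure how strongly those genes are co-expressed with one another, and compare that with random gene sets of the same size. The sketch below is a hypothetical toy version of this idea, not Camoco's API or its exact statistics; the `flank` window is an assumed stand-in for the linkage-disequilibrium interval, and `density` is defined here simply as the mean edge weight among candidate gene pairs.

```python
import random
from itertools import combinations


def candidate_genes(snps, genes, flank):
    """Return IDs of genes lying within `flank` bp of any GWAS SNP.

    snps:  list of (chrom, pos) tuples for significant SNPs
    genes: dict gene_id -> (chrom, start, end)
    flank: hypothetical stand-in for the linkage-disequilibrium interval
    """
    hits = set()
    for gene_id, (chrom, start, end) in genes.items():
        for snp_chrom, snp_pos in snps:
            if chrom == snp_chrom and start - flank <= snp_pos <= end + flank:
                hits.add(gene_id)
                break
    return hits


def density(gene_set, edges):
    """Mean co-expression weight over all gene pairs; absent edges count as 0."""
    pairs = list(combinations(sorted(gene_set), 2))
    if not pairs:
        return 0.0
    return sum(edges.get(frozenset(pair), 0.0) for pair in pairs) / len(pairs)


def density_test(gene_set, all_genes, edges, n_perm=1000, seed=0):
    """Compare observed density with random gene sets of the same size.

    Returns (observed_density, empirical_p_value).
    """
    rng = random.Random(seed)
    observed = density(gene_set, edges)
    pool = sorted(all_genes)
    k = len(gene_set)
    as_dense = sum(
        density(rng.sample(pool, k), edges) >= observed for _ in range(n_perm)
    )
    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (as_dense + 1) / (n_perm + 1)
```

A small empirical p-value suggests that the genes near the GWAS hits form a tighter co-expression cluster than chance would produce. The abstract's point about network choice maps onto the `edges` argument: the same test can give different answers depending on which expression data were used to build the network.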


2017 ◽  
Author(s):  
Mark J.P. Chaisson ◽  
Ashley D. Sanders ◽  
Xuefang Zhao ◽  
Ankit Malhotra ◽  
David Porubsky ◽  
...  

Abstract: The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, and strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three human parent–child trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50 bp) and 27,622 SVs (≥50 bp) per human genome. We also discover 156 inversions per genome—most of which previously escaped detection. Fifty-eight of the inversions we discovered intersect with the critical regions of recurrent microdeletion and microduplication syndromes. Taken together, our SV callsets represent a sevenfold increase in SV detection compared to most standard high-throughput sequencing studies, including those from the 1000 Genomes Project. The method and the dataset serve as a gold standard for the scientific community and we make specific recommendations for maximizing structural variation sensitivity for future large-scale genome sequencing studies.
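The abstract separates indels from SVs by a 50 bp size cutoff. The sketch below applies that cutoff to sequence-resolved REF/ALT allele pairs. Using the net length change as the event size is an assumption of this example; balanced events such as the inversions mentioned above have no net length change, so they are reported as `"other"` here and would need separate handling.

```python
from collections import Counter

SV_MIN_BP = 50  # cutoff from the abstract: indels < 50 bp, SVs >= 50 bp


def variant_class(ref, alt):
    """Classify a sequence-resolved REF/ALT pair by its net length change."""
    delta = abs(len(alt) - len(ref))
    if delta == 0:
        # SNVs, MNVs, and balanced events (e.g. inversions) change no length,
        # so this simple size rule cannot place them.
        return "other"
    return "indel" if delta < SV_MIN_BP else "SV"


def tally(calls):
    """Count calls per class from an iterable of (ref, alt) allele pairs."""
    return Counter(variant_class(ref, alt) for ref, alt in calls)
```

For example, a 1 bp insertion and a 3 bp deletion count as indels, a 60 bp insertion counts as an SV, and a single-base substitution falls into `"other"`.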


2019 ◽  
Vol 20 (1) ◽  
pp. 413-432 ◽  
Author(s):  
Kenna R. Mills Shaw ◽  
Anirban Maitra

Since the discovery that DNA alterations initiate tumorigenesis, scientists and clinicians have been exploring ways to counter these changes with targeted therapeutics. The sequencing of tumor DNA was initially limited to highly actionable hot spots, areas of the genome that are frequently altered and have an approved matched therapy in a specific tumor type. Large-scale genome sequencing programs quickly developed technological improvements that enabled the deployment of whole-exome and whole-genome sequencing technologies at scale for pristine sample materials in research environments. However, the turning point for precision medicine in oncology came from innovations in clinical laboratories that improved turnaround time, depth of coverage, and the ability to reliably sequence archived, clinically available samples. Today, tumor genome sequencing no longer faces significant technical or financial hurdles, and the next opportunity for improvement lies in making optimal use of the technologies and data across many different tumor types.


2016 ◽  
Vol 94 (suppl_5) ◽  
pp. 146-146
Author(s):  
D. M. Bickhart ◽  
L. Xu ◽  
J. L. Hutchison ◽  
J. B. Cole ◽  
D. J. Null ◽  
...  

BIO-PROTOCOL ◽  
2015 ◽  
Vol 5 (24) ◽  
Author(s):  
Chia-Hong Tsai ◽  
Christoph Benning

2021 ◽  
Author(s):  
Parsoa Khorsand ◽  
Fereydoun Hormozdiari

Abstract: Large-scale catalogs of common genetic variants (including indels and structural variants) are being created using data from second- and third-generation whole-genome sequencing technologies. However, genotyping these variants in newly sequenced samples is a nontrivial task that requires extensive computational resources. Furthermore, current approaches are mostly limited to specific types of variants and are generally prone to errors and ambiguities when genotyping complex events. We propose an ultra-efficient approach for genotyping any type of structural variation that is not limited by the shortcomings and complexities of current mapping-based approaches. Our method, Nebula, uses changes in k-mer counts to predict the genotype of structural variants. We show that Nebula is not only an order of magnitude faster than mapping-based approaches for genotyping structural variants but also has accuracy comparable to state-of-the-art approaches. Furthermore, Nebula is a generic framework not limited to any specific type of event. Nebula is publicly available at https://github.com/Parsoa/Nebula.
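Nebula's core idea, as summarized above, is that the presence or absence of a structural variant changes the counts of k-mers specific to each allele. The following sketch is a deliberately simplified illustration of that principle, not Nebula's algorithm or code: it counts forward-strand k-mers unique to the reference or alternative haplotype and calls a genotype from the alternative allele's share of the signal, using hypothetical cutoffs of 0.25 and 0.75.

```python
from collections import Counter


def kmers(seq, k):
    """All overlapping k-mers of seq (forward strand only)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads, targets, k):
    """Count occurrences of the target k-mers across all reads."""
    targets = set(targets)
    counts = Counter()
    for read in reads:
        for km in kmers(read, k):
            if km in targets:
                counts[km] += 1
    return counts


def genotype(ref_kmers, alt_kmers, reads, k, low=0.25, high=0.75):
    """Call 0/0, 0/1 or 1/1 from allele-specific k-mer depths.

    ref_kmers: k-mers present only on the reference allele
    alt_kmers: k-mers present only on the alternative (SV) allele
    low/high:  hypothetical cutoffs on the alternative allele fraction
    """
    counts = count_kmers(reads, list(ref_kmers) + list(alt_kmers), k)
    # Mean count per allele-specific k-mer approximates that allele's depth.
    ref_depth = sum(counts[km] for km in ref_kmers) / max(len(ref_kmers), 1)
    alt_depth = sum(counts[km] for km in alt_kmers) / max(len(alt_kmers), 1)
    total = ref_depth + alt_depth
    if total == 0:
        return "./."  # no informative k-mers observed
    alt_frac = alt_depth / total
    if alt_frac < low:
        return "0/0"
    if alt_frac > high:
        return "1/1"
    return "0/1"
```

For a 5 bp deletion genotyped with k = 5, reads drawn only from the alternative haplotype yield `1/1`, reads drawn only from the reference yield `0/0`, and an equal mix yields `0/1`. A practical tool would also count canonical k-mers on both strands, discard k-mers that occur elsewhere in the genome, and model sequencing depth and error.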

