Biological insights from the whole genome analysis of human embryonic stem cells

2020 ◽  
Author(s):  
Florian T. Merkle ◽  
Sulagna Ghosh ◽  
Giulio Genovese ◽  
Robert E. Handsaker ◽  
Seva Kashin ◽  
...  

ABSTRACT There has not yet been a systematic analysis of hESC whole genomes at single-nucleotide resolution. We therefore performed whole genome sequencing (WGS) of 143 hESC lines and annotated their single nucleotide and structural genetic variants. We found that while a substantial fraction of hESC lines contained large deleterious structural variants, finer-scale structural and single nucleotide variants (SNVs) that are ascertainable only through WGS analyses were present in hESC genomes and human blood-derived genomes at similar frequencies. However, WGS did identify SNVs associated with cancer or other diseases that will likely alter cellular phenotypes and may compromise the safety of hESC-derived cellular products transplanted into humans. As a resource to enable reproducible hESC research and safer translation, we provide a user-friendly WGS data portal and a data-driven scheme for cell line maintenance and selection.

IN BRIEF Merkle and Ghosh et al. describe insights from the whole genome sequences of commonly used human embryonic stem cell (hESC) lines. Analyses of these sequences show that while hESC genomes had more large structural variants than humans do from genetic inheritance, hESCs did not have an observable excess of finer-scale variants. However, many hESC lines contained rare loss-of-function variants and combinations of common variants that may profoundly shape their biological phenotypes. Thus, genome sequencing data can be valuable to those selecting cell lines for a given biological or clinical application, and the sequences and analysis reported here should facilitate such choices.

HIGHLIGHTS
One third of the hESCs we analysed are siblings, and almost all are of European ancestry
Large structural variants are common in hESCs, but finer-scale variation is similar to that in human populations
Many strong-effect loss-of-function mutations and cancer-associated mutations are present in specific hESC lines
We provide user-friendly resources for rational hESC line selection based on genome sequence

BMC Genetics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Lucy Crooks ◽  
Johnathan Cooper-Knock ◽  
Paul R. Heath ◽  
Ahmed Bouhouche ◽  
Mostafa Elfahime ◽  
...  

Abstract Background Large-scale human sequencing projects have described around a hundred million single nucleotide variants (SNVs). These studies have predominantly involved individuals with European ancestry, despite the fact that genetic diversity is expected to be highest in Africa, where Homo sapiens evolved and has maintained a large population for the longest time. The African Genome Variation Project examined several African populations, but these were all located south of the Sahara. Morocco is on the northwest coast of Africa and mostly lies north of the Sahara, which makes it very attractive for studying genetic diversity. The ancestry of present-day Moroccans is unknown and may be substantially different from that of Africans found south of the Sahara. Recent genomic data of Taforalt individuals in Eastern Morocco revealed 15,000-year-old modern humans and suggested that North African individuals may be genetically distinct from previously studied African populations. Results We present SNVs discovered by whole genome sequencing (WGS) of three Moroccans. From a total of 5.9 million SNVs detected, over 200,000 were not identified by 1000G and were not in the extensive gnomAD database. We summarise the SNVs by genomic position, type of sequence, gene context and effect on proteins encoded by the sequence. Comparison of the overall genomic information of the Moroccan individuals with that of individuals from 1000G supports the Moroccan population being distinct from both sub-Saharan African and European populations. Conclusions We conclude that Moroccan samples are genetically distinct and lie in the middle of the previously observed cline between populations of European and African ancestry. WGS of Moroccan individuals can identify a large number of novel SNVs and aid in functional characterisation of the genome.


2020 ◽  
Author(s):  
Lucy Crooks ◽  
Johnathan Cooper-Knock ◽  
Paul R. Heath ◽  
Ahmed Bouhouche ◽  
Elmostafa El Fahime ◽  
...  

Abstract Background Large-scale human sequencing projects have described around a hundred million single nucleotide variants (SNVs), but they have predominantly focused on individuals with European ancestry, despite the fact that genetic diversity is expected to be highest in Africa, where Homo sapiens evolved and has maintained a large population for the longest time. The more recent African Genome Variation Project examined several African populations, but these were all located south of the Sahara. Morocco is on the northwest coast of Africa and mostly lies north of the Sahara, which makes it very attractive for studying genetic diversity. Recent genomic data of Taforalt individuals in Eastern Morocco revealed 15,000-year-old modern humans and showed that North African individuals are expected to differ genetically from previously studied African populations. Results We present single nucleotide variant (SNV) results from whole genome sequencing (WGS) of three Moroccans. From a total of 5.9 million SNVs detected, over 200,000 were not identified by 1000G. We provide a summary of the SNVs by genomic position, gene context and effect on protein coding. Comparison of genome-wide information of the Moroccan individuals to individuals from 1000G by principal component analysis revealed a substantial genomic distinction between the Moroccan population and sub-Saharan African populations. Conclusions We conclude that Moroccan samples lie in the middle of the previously observed cline between populations of European and African ancestry. WGS of Moroccan individuals can identify a large number of new SNVs and aid in functional characterisation of the genome.


Author(s):  
Emmanuel Lecorche ◽  
Côme Daniau ◽  
Kevin La ◽  
Faiza Mougari ◽  
Hanaa Benmansour ◽  
...  

Abstract Background Post-surgical infections due to Mycobacterium chimaera appeared as a novel nosocomial threat in 2015, with a worldwide outbreak due to contaminated heater-cooler units (HCUs) used in open chest surgery. We report the results of investigations conducted in France, including whole genome sequencing comparison of patient and HCU isolates. Methods We sought M. chimaera infection cases from 2010 onwards through national epidemiological investigations in healthcare facilities performing cardiopulmonary bypass, together with a survey on good practices and systematic heater-cooler unit microbial analyses. Clinical and HCU isolates were subjected to whole genome sequencing and analyzed with regard to the reference outbreak strain Zuerich-1. Results Only two clinical cases were shown to be related to the outbreak, although 23% (41/175) of heater-cooler units were declared positive for M. avium complex. Specific measures to prevent infection were applied in 89% (50/56) of healthcare facilities, although only 14% (8/56) of them followed the manufacturer maintenance recommendations. Whole genome sequencing comparison showed that the clinical isolates and 72% (26/36) of heater-cooler unit isolates belonged to the epidemic cluster. Within clinical isolates, 5 to 9 non-synonymous single nucleotide polymorphisms were observed, including an in vivo mutation in a putative efflux pump gene observed in a clinical isolate obtained from one patient under antimicrobial treatment. Conclusions Cases of post-surgical M. chimaera infections were declared to be rare in France, although heater-cooler units were contaminated as in other countries. Genomic analyses confirmed the connection to the outbreak and identified specific single nucleotide polymorphisms, including one suggesting fitness evolution in vivo.


iScience ◽  
2020 ◽  
Vol 23 (1) ◽  
pp. 100769 ◽  
Author(s):  
Basil B. Xavier ◽  
Mohamed Mysara ◽  
Mattia Bolzan ◽  
Bruno Ribeiro-Gonçalves ◽  
Blaise T.F. Alako ◽  
...  

2015 ◽  
Vol 117 (suppl_1) ◽  
Author(s):  
Matthew Wheeler ◽  
Daryl Waggott ◽  
Megan Grove ◽  
Frederick Dewey ◽  
Cuiping Pan ◽  
...  

Background: Technological advances have greatly reduced the cost of whole genome sequencing. For single individuals, clinical application is apparent, while exome sequencing in tens of thousands of people has allowed a more global view of genetic variation that can inform interpretation of specific variants in individuals. We hypothesized that genome sequencing of patients with monogenic cardiomyopathy would facilitate discovery of genetic modifiers of phenotype. Methods and Results: We identified 48 individuals diagnosed with cardiomyopathy and with putative mutations in MYH7, the gene encoding beta myosin heavy chain. We carried out whole genome sequencing and applied a newly developed analytical pipeline optimized for discovery of genes modifying severity of clinical presentation and outcomes. Using a combination of external priors and rare variant burden tests, we scored genes as potential modifiers. There were 96 genes that reached a modifier score of 6 out of 12 or better (9=2, 8=8, 7=17, 6=69). We identified NCKAP1, a gene that regulates actin filament dynamics, and CAMSAP1, a calmodulin-regulated gene that regulates microtubule dynamics, as top scoring modifiers of hypertrophic cardiomyopathy phenotypes (score=9), while LDB2, RYR2, FBN1 and ATP1A2 had modifier scores of 8. Of the top scoring genes, 21 out of 96 were identified as candidates a priori. Our candidate prioritization scheme identified the previously described modifiers of cardiomyopathy phenotype, FHOD3 and MYBPC3, as top scoring genes. We identified structural variants in 21 clinically sequenced cardiomyopathy-associated genes, 13 of which were at less than 10% frequency. Copy number variants in ILK and CSRP3 were nominally associated with ejection fraction (p=0.03), while 8 genes showed copy gains (GLA, FKTN, SGCD, TTN, SOS1, ANKRD1, VCL and NEBL). Structural variants were found in CSRP3, MYL3 and TNNC1, all of which have been implicated as causative for HCM.
Conclusion: Evaluation of the whole genome sequence, even in the case of putatively monogenic disease, leads to important diagnostic and scientific insights not revealed by panel-based sequencing.


2018 ◽  
Author(s):  
Maxime Garcia ◽  
Szilveszter Juhos ◽  
Malin Larsson ◽  
Pall I. Olason ◽  
Marcel Martin ◽  
...  

Abstract Summary Whole-genome sequencing (WGS) is a cornerstone of precision medicine, but portable and reproducible open-source workflows for WGS analyses of germline and somatic variants are lacking. We present Sarek, a modular, comprehensive, and easy-to-install workflow, combining a range of software for the identification and annotation of single-nucleotide variants (SNVs), insertion and deletion variants (indels), structural variants, tumor sample heterogeneity, and karyotyping from germline or paired tumor/normal samples. Sarek is implemented in a bioinformatics workflow language (Nextflow) with Docker and Singularity compatible containers, ensuring easy deployment and full reproducibility at any Linux-based compute cluster or cloud computing environment. Sarek supports the human reference genomes GRCh37 and GRCh38, and can readily be used both as a core production workflow at sequencing facilities and as a powerful stand-alone tool for individual research groups. Availability Source code and instructions for local installation are available at GitHub (https://github.com/SciLifeLab/Sarek) under the MIT open-source license, and we invite the research community to contribute additional functionality as a collaborative open-source development project.


Foods ◽  
2021 ◽  
Vol 10 (10) ◽  
pp. 2454
Author(s):  
Rebecca N. Bland ◽  
Jared D. Johnson ◽  
Joy G. Waite-Cusic ◽  
Alexandra J. Weisberg ◽  
Elizabeth R. Riutta ◽  
...  

Recent listeriosis outbreaks linked to fresh produce suggest the need to better understand and mitigate L. monocytogenes contamination in packing and processing environments. Using whole genome sequencing (WGS) and phenotype screening assays for sanitizer tolerance, we characterized 48 L. monocytogenes isolates previously recovered from environmental samples in five produce handling facilities. Within the studied population there were 10 sequence types (STs) and 16 cgMLST types (CTs). Pairwise single nucleotide polymorphism (SNP) distances ranged from 0 to 3047 SNPs within a CT, revealing closely and distantly related isolates indicative of both sporadic and continuous contamination events within the facility. Within Facility 1, we identified a closely related cluster (0–2 SNPs) of isolates belonging to clonal complex 37 (CC37; CT9492), with isolates recovered during sampling events 1 year apart and in various locations inside and outside the facility. The accessory genome of these CC37 isolates varied from 94 to 210 genes. Notable genetic elements and mutations amongst the isolates included the bcrABC cassette (2/48), associated with QAC tolerance; mutations in the actA gene on the Listeria pathogenicity island (LIPI) 1 (20/48); and the presence of LIPI-3 (21/48) and LIPI-4 (23/48). This work highlights the potential use of WGS in tracing the pathogen within a facility and understanding properties of L. monocytogenes in produce settings.
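To illustrate the kind of pairwise SNP comparison the abstract above reports, here is a minimal sketch that counts differing positions between aligned sequences and groups isolates whose distance falls at or below a cutoff. The isolate names, the toy sequences, the use of 2 SNPs as a clustering cutoff, and the single-linkage grouping are all assumptions made for illustration; the abstract does not describe the authors' actual pipeline.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count aligned positions where two sequences differ, ignoring gaps and Ns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"-", "N"}
    return sum(1 for x, y in zip(a, b) if x != y and x not in skip and y not in skip)


def pairwise_distances(seqs: dict) -> dict:
    """Return the SNP distance for every unordered pair of isolates."""
    return {
        (i, j): snp_distance(seqs[i], seqs[j])
        for i, j in combinations(sorted(seqs), 2)
    }


def cluster(seqs: dict, max_snps: int) -> list:
    """Single-linkage grouping: link any two isolates within max_snps of each other."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Follow parent pointers to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), dist in pairwise_distances(seqs).items():
        if dist <= max_snps:
            parent[find(i)] = find(j)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical toy alignment (not data from the study).
isolates = {
    "iso_A": "ACGTACGTAC",
    "iso_B": "ACGTACGTAT",
    "iso_C": "ACGAACGTAT",
    "iso_D": "TTGAAGGCAT",
}

if __name__ == "__main__":
    for pair, dist in pairwise_distances(isolates).items():
        print(pair, dist)
    print(cluster(isolates, max_snps=2))
```

In practice the input would be a core-genome alignment rather than short strings, and the cutoff would be chosen per organism and study design; single linkage is only one of several possible grouping rules.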


Blood ◽  
2019 ◽  
Vol 134 (Supplement_1) ◽  
pp. 3767-3767 ◽  
Author(s):  
Cody Ashby ◽  
Eileen M Boyle ◽  
Brian A Walker ◽  
Michael A Bauer ◽  
Katie Rose Ryan ◽  
...  

Background: Structural variants are key recurrent molecular features of myeloma (MM) with two types of complex rearrangement, chromoplexy and chromothripsis, having been described recently. The contribution of these to MM prognosis, rapid changes in clinical behavior and punctuated evolution is currently unknown as is the mechanism by which they deregulate gene function. Methods: We analyzed two sets of newly diagnosed MM data: 85 cases with phased whole genome sequencing; and 812 cases from CoMMpass where long-insert whole-genome sequencing was available. Patient derived xenografts from five MM cases were used to generate epigenetic maps for the histone marks, BRD4, MED1, H3K27Ac, H3K4me1, H3K4me3, H3K9me3, H3K36me3 and H3K27me3. Results: In the 10X data the median number of structural events per case was 25 (range 1 - 182); with a median of 14 intra-chromosomal events (range 1 - 179; P<0.001) and 7 inter-chromosomal events (range 0 - 29). Structural events were seen most frequently on chromosomes 14 (64%), 8 (53%), 1 (44%) and 6 (42%). Complex chromosomal rearrangements involving 3 or more chromosomal sites were seen in 46%, 4 or more sites in 20%, 5 or more in 10% and 6 or more in 5% of samples. There were significantly more structural events in the t(4;14) subgroup compared to the t(11;14) subgroup. Significantly more events were also seen in the bi-allelically inactivated TP53 cases. Using an elbow test defined cutoff, we identified cases with high structural variant load in 10% of cases. Chromoplexy called by "Chainfinder" was seen in 18% of cases. Chromothripsis called by "Shatterseek" was seen in 9% of cases. Cases with a high structural load alone were not associated with an adverse outcome whereas cases with chromoplexy or chromothripsis were associated with adverse PFS and OS, p=0.001. A new high-risk subgroup comprising approximately 5% of cases was identified with chromoplexy, chromothripsis and a high structural load. 
Gene set enrichment analysis of cases with chromoplexy and chromothripsis showed an excess of MYC, E2F and G2M targets, and a reduction in RAS signaling. Interferon α and γ responses, an excess of TP53 and a reduction in TRAF3 mutations were associated predominantly with chromothripsis. How chromoplexy and chromothripsis are tolerated by the cell is unknown, and the association with the cGAS/STING response is being explored further. To determine how chromoplexy may deregulate multiple genes, we identified the full spectrum of structural variants to the immunoglobulin (Ig) and non-Ig loci. A range of genes are deregulated by Ig loci, including MAP3K14 at a frequency of 2%, confirming the importance of non-canonical NF-κB signaling. A novel intra-chromosomal rearrangement to ZFP36L1 was upregulated in 10% of cases but was not prognostic. Gene upregulation by non-Ig super enhancers is frequent and targets include PAX5, GLI3, CD40, NFKB1, MAP3K14, LRRC37A, LIPG, PHLDA3, ZNF267, CENPF, SLC44A2, MIER1, SOX30, TMEM258, PPIL1, and BUB3. The topologically associating domains (TADs) containing super enhancers that bring about gene deregulation include TXNDC5, FOXO3, FCHSD2, SP2, FAM46C, CACNA1C, TLCD2 and PIK3C2G. These super enhancers frequently contain important MM genes, the coding sequences of which are disrupted by the rearrangement and could contribute to the clinical phenotype. Accurately reconstructing the structure of the complex rearrangements will allow us to identify the mechanism of gene deregulation and to distinguish between either gene stacking, receptor stacking or both. Conclusions: Upregulation of gene expression by super enhancer rearrangement is a major mechanism of gene deregulation in MM, and complex structural events contribute significantly to adverse prognosis by a range of mechanisms as well as simple gene overexpression.
Disclosures Boyle: Amgen, Abbvie, Janssen, Takeda, Celgene Corporation: Honoraria; Amgen, Janssen, Takeda, Celgene Corporation: Other: Travel expenses. Walker: Celgene: Research Funding. Thakurta: Celgene: Employment, Equity Ownership. Flynt: Celgene Corporation: Employment, Equity Ownership. Davies: Amgen, Celgene, Janssen, Oncopeptides, Roche, Takeda: Membership on an entity's Board of Directors or advisory committees, Other: Consultant/Advisor; Janssen, Celgene: Other: Research Grant, Research Funding. Morgan: Amgen, Roche, Abbvie, Takeda, Celgene, Janssen: Honoraria, Membership on an entity's Board of Directors or advisory committees; Celgene: Other: research grant, Research Funding.

