Computational identification of cell-specific variable regions in ChIP-seq data

2020 ◽  
Vol 48 (9) ◽  
pp. e53-e53
Author(s):  
Tommaso Andreani ◽  
Steffen Albrecht ◽  
Jean-Fred Fontaine ◽  
Miguel A Andrade-Navarro

Abstract Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is used to identify genome-wide DNA regions bound by proteins. Given one ChIP-seq experiment with replicates, binding sites not observed in all the replicates will usually be interpreted as noise and discarded. However, the recent discovery of high-occupancy target (HOT) regions suggests that there are regions where binding of multiple transcription factors can be identified. To investigate ChIP-seq variability, we developed a reproducibility score and a method that identifies cell-specific variable regions in ChIP-seq data by integrating replicated ChIP-seq experiments for multiple protein targets on a particular cell type. Using our method, we found variable regions in human cell lines K562, GM12878, HepG2, MCF-7 and in mouse embryonic stem cells (mESCs). These variable-occupancy target regions (VOTs) are CG dinucleotide rich, and show enrichment at promoters and R-loops. They overlap significantly with HOT regions, but are not blacklisted regions producing non-specific binding ChIP-seq peaks. Furthermore, in mESCs, VOTs are conserved among placental species suggesting that they could have a function important for this taxon. Our method can be useful to point to such regions along the genome in a given cell type of interest, to improve the downstream interpretative analysis before follow-up experiments.
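The abstract does not define the reproducibility score, so the following is only a minimal sketch of the underlying idea of separating peaks seen in every replicate from peaks seen in only some of them. The region labels, the `replicate_support` and `variable_regions` helpers, and the "fraction of replicates" measure are all illustrative assumptions, not the authors' method.

```python
from collections import Counter

def replicate_support(replicate_peaks):
    """Return {region: fraction of replicates in which the region was called}."""
    n = len(replicate_peaks)
    counts = Counter(region for peaks in replicate_peaks for region in set(peaks))
    return {region: c / n for region, c in counts.items()}

def variable_regions(replicate_peaks):
    """Regions seen in at least one replicate but not in all of them."""
    support = replicate_support(replicate_peaks)
    return sorted(r for r, s in support.items() if s < 1.0)

# Toy data (hypothetical): three replicates, regions identified by bin labels.
replicates = [
    {"chr1:1000", "chr1:2000", "chr2:500"},
    {"chr1:1000", "chr2:500"},
    {"chr1:1000", "chr1:2000", "chr3:700"},
]
support = replicate_support(replicates)
variable = variable_regions(replicates)
print(variable)
```

In a conventional pipeline the regions returned by `variable_regions` would be discarded as noise; the abstract argues that, when aggregated over many protein targets in one cell type, such regions can instead be informative.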

2019 ◽  
Author(s):  
Tommaso Andreani ◽  
Steffen Albrecht ◽  
Jean-Fred Fontaine ◽  
Miguel A. Andrade-Navarro

ABSTRACT Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is used to identify genome-wide DNA regions bound by proteins. Given one ChIP-seq experiment with replicates, binding sites not observed in all the replicates will usually be interpreted as noise and discarded. However, the recent discovery of high-occupancy target (HOT) regions suggests that there are regions where binding of multiple transcription factors can be identified. To investigate ChIP-seq variability, we developed a reproducibility score and a method that identifies cell-specific variable regions in ChIP-seq data by integrating replicated ChIP-seq experiments for multiple protein targets on a particular cell type. Using our method, we found variable regions in human cell lines K562, GM12878, HepG2, MCF-7, and in mouse embryonic stem cells (mESCs). These variable-occupancy target regions (VOTs) are CG dinucleotide rich, and show enrichment at promoters and R-loops. They overlap significantly with HOT regions, but are not blacklisted regions producing non-specific binding ChIP-seq peaks. Furthermore, in mESCs, VOTs are conserved among placental species suggesting that they could have a function important for this taxon. Our method can be useful to point to such regions along the genome in a given cell type of interest, to improve the downstream interpretative analysis before follow-up experiments.


2018 ◽  
Author(s):  
Bryce van de Geijn ◽  
Hilary Finucane ◽  
Steven Gazal ◽  
Farhad Hormozdiari ◽  
Tiffany Amariuta ◽  
...  

Abstract It is widely known that regulatory variation plays a major role in complex disease and that cell-type-specific binding of transcription factors (TF) is critical to gene regulation, but genomic annotations from directly measured TF binding information are not currently available for most cell-type-TF pairs. Here, we construct cell-type-specific TF binding annotations by intersecting sequence-based TF binding predictions with cell-type-specific chromatin data; this strategy addresses both the limitation that identical sequences may be bound or unbound depending on surrounding chromatin context, and the limitation that sequence-based predictions are generally not cell-type-specific. We evaluated different combinations of sequence-based TF predictions and chromatin data by partitioning the heritability of 49 diseases and complex traits (average N=320K) using stratified LD score regression with the baseline-LD model (which is not cell-type-specific). We determined that 100 bp windows around MotifMap sequence-based TF binding predictions intersected with a union of six cell-type-specific chromatin marks (imputed using ChromImpute) performed best, with a 58% increase in heritability enrichment compared to the chromatin marks alone (11.6x vs 7.3x; P = 9 × 10⁻¹⁴ for difference) and a 12% increase in cell-type-specific signal conditional on annotations from the baseline-LD model (P = 8 × 10⁻¹¹ for difference). Our results show that intersecting sequence-based TF predictions with cell-type-specific chromatin information can help refine genome-wide association signals.
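The annotation strategy described above (windows around predicted motif sites, intersected with a union of chromatin marks) can be sketched as plain interval arithmetic. This is a toy illustration on a single chromosome, not the authors' pipeline: the coordinates are made up, the helper names are hypothetical, and the window is assumed to be 100 bp in total, centred on the motif midpoint, which the abstract does not specify.

```python
def merge_intervals(intervals):
    """Union of half-open (start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def motif_windows(motif_midpoints, width=100):
    """Window of `width` bp centred on each motif midpoint (assumed definition)."""
    half = width // 2
    return [(max(0, mid - half), mid + half) for mid in motif_midpoints]

def intersect(a, b):
    """Overlapping segments between two sorted, merged interval lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start, end = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

# Hypothetical peaks for three chromatin marks in one cell type.
marks = [[(100, 400)], [(350, 600)], [(2000, 2100)]]
chromatin_union = merge_intervals([iv for mark in marks for iv in mark])

# Hypothetical sequence-based motif predictions (midpoint coordinates).
windows = merge_intervals(motif_windows([380, 1000, 2090]))

annotation = intersect(windows, chromatin_union)
print(annotation)  # [(330, 430), (2040, 2100)]
```

The motif at position 1000 is dropped because no chromatin mark covers it, which is how the intersection makes a sequence-only prediction cell-type-specific.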


2019 ◽  
Vol 29 (7) ◽  
pp. 1057-1067 ◽  
Author(s):  
Bryce van de Geijn ◽  
Hilary Finucane ◽  
Steven Gazal ◽  
Farhad Hormozdiari ◽  
Tiffany Amariuta ◽  
...  

Abstract Regulatory variation plays a major role in complex disease, and cell type-specific binding of transcription factors (TF) is critical to gene regulation. However, assessing the contribution of genetic variation in TF-binding sites to disease heritability is challenging, as binding is often cell type-specific and annotations from directly measured TF binding are not currently available for most cell type-TF pairs. We investigate approaches to annotate TF binding, including directly measured chromatin data and sequence-based predictions. We find that TF-binding annotations constructed by intersecting sequence-based TF-binding predictions with cell type-specific chromatin data explain a large fraction of heritability across a broad set of diseases and corresponding cell types; this strategy of constructing annotations addresses both the limitation that identical sequences may be bound or unbound depending on surrounding chromatin context and the limitation that sequence-based predictions are generally not cell type-specific. We partitioned the heritability of 49 diseases and complex traits using stratified linkage disequilibrium (LD) score regression with the baseline-LD model (which is not cell type-specific) plus the new annotations. We determined that 100 bp windows around MotifMap sequence-based TF-binding predictions intersected with a union of six cell type-specific chromatin marks (imputed using ChromImpute) performed best, with a 58% increase in heritability enrichment compared to the chromatin marks alone (11.6× vs. 7.3×, P = 9 × 10⁻¹⁴ for difference) and a 20% increase in cell type-specific signal conditional on annotations from the baseline-LD model (P = 8 × 10⁻¹¹ for difference). Our results show that TF-binding annotations explain substantial disease heritability and can help refine genome-wide association signals.
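The reported increase in heritability enrichment is a relative change between the two enrichment values quoted in the abstract. A quick check, using only those two rounded figures (the `relative_increase` helper is illustrative):

```python
def relative_increase(new, baseline):
    """Fractional increase of `new` over `baseline`."""
    return (new - baseline) / baseline

# Enrichment values as quoted in the abstract (rounded to one decimal).
increase = relative_increase(11.6, 7.3)
print(f"{increase:.1%}")
```

From the rounded inputs this gives roughly 59%, consistent with the reported 58%, which was presumably computed from unrounded enrichment estimates.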


2017 ◽  
Vol 14 (4) ◽  
pp. 393-402 ◽  
Author(s):  
Rajaraman Krishnan ◽  
Franz Hefti ◽  
Haim Tsubery ◽  
Michal Lulu ◽  
Ming Proschitsky ◽  
...  

Therapeutic strategies that target pathways of protein misfolding and the toxicity of intermediates along these pathways are mainly at discovery and early development stages, with the exception of monoclonal antibodies that have mainly failed to produce convincing clinical benefits in late stage trials. The clinical failures represent potentially critical lessons for future neurodegenerative disease drug development. More effective drugs may be achieved by pursuing the following two strategies. First, conformational targeting of aggregates of misfolded proteins, rather than less specific binding that includes monomer subunits, which vastly outnumber the toxic targets. Second, since neurodegenerative diseases frequently include more than one potential protein pathology, generic targeting of aggregates by shape might also be a crucial feature of a drug candidate. Incorporating both of these critical features into a viable drug candidate along with high affinity binding has not been achieved with small molecule approaches or with antibody fragments. Monoclonal antibodies developed so far are not broadly acting through conformational recognition. Using GAIM (General Amyloid Interaction Motif) represents a novel approach that incorporates high affinity conformational recognition for multiple protein assemblies, as well as recognition of an array of assemblies along the misfolding pathway between oligomers and fibers. A GAIM-Ig fusion, NPT088, is nearing clinical testing.


2021 ◽  
pp. jech-2020-214358
Author(s):  
Pekka Martikainen ◽  
Kaarina Korhonen ◽  
Aline Jelenkovic ◽  
Hannu Lahtinen ◽  
Aki Havulinna ◽  
...  

Background: Genetic vulnerability to coronary heart disease (CHD) is well established, but little is known about whether these effects are mediated or modified by equally well-established social determinants of CHD. We estimate the joint associations of the polygenetic risk score (PRS) for CHD and education on CHD events. Methods: The data are from the 1992, 1997, 2002, 2007 and 2012 surveys of the population-based FINRISK Study including measures of social, behavioural and metabolic factors and genome-wide genotypes (N=26 203). Follow-up of fatal and non-fatal incident CHD events (N=2063) was based on nationwide registers. Results: Allowing for age, sex, study year, region of residence, study batch and principal components, those in the highest quartile of PRS for CHD had strongly increased risk of CHD events compared with the lowest quartile (HR=2.26; 95% CI: 1.97 to 2.59); associations were also observed for low education (HR=1.58; 95% CI: 1.32 to 1.89). These effects were largely independent of each other. Adjustment for baseline smoking, alcohol use, body mass index, high-density lipoprotein (HDL) and total cholesterol, blood pressure and diabetes attenuated the PRS associations by 10% and the education associations by 50%. We do not find strong evidence of interactions between PRS and education. Conclusions: PRS and education predict CHD events, and these associations are independent of each other. Both can improve CHD prediction beyond behavioural risks. The results imply that observational studies that do not have information on genetic risk factors for CHD do not provide confounded estimates for the association between education and CHD.


2020 ◽  
Vol 79 (Suppl 1) ◽  
pp. 598.2-598
Author(s):  
E. Myasoedova ◽  
A. Athreya ◽  
C. S. Crowson ◽  
R. Weinshilboum ◽  
L. Wang ◽  
...  

Background: Methotrexate (MTX) is the most common anchor drug for rheumatoid arthritis (RA), but the risk of missing the opportunity for early effective treatment with alternative medications is substantial given the delayed onset of MTX action and 30-40% inadequate response rate. There is a compelling need to accurately predict MTX response prior to treatment initiation, which would allow patients at RA onset who are likely to respond to MTX to be identified effectively. Objectives: To test the ability of machine learning approaches with clinical and genomic biomarkers to predict MTX response with replications in independent samples. Methods: Age, sex, clinical, serological and genome-wide association study (GWAS) data on patients with early RA of European ancestry from 647 patients (336 recruited in United Kingdom [UK]; 307 recruited across Europe; 70% female; 72% rheumatoid factor [RF] positive; mean age 54 years; mean baseline Disease Activity Score with 28-joint count [DAS28] 5.65) of the PhArmacogenetics of Methotrexate in RA (PAMERA) consortium was used in this study. The genomics data comprised 160 genome-wide significant single nucleotide polymorphisms (SNPs) with p<1×10⁻⁵ associated with risk of RA and MTX metabolism. DAS28 score was available at baseline and 3-month follow-up visit. Response to MTX monotherapy at the dose of ≥15 mg/week was defined as good or moderate by the EULAR response criteria at the 3-month follow-up visit. Supervised machine-learning methods were trained with 5-repeats and 10-fold cross-validation using data from PAMERA’s 336 UK patients. Class imbalance (higher % of MTX responders) in training was accounted for by using simulated minority oversampling technique. Prediction performance was validated in PAMERA’s 307 European patients (not used in training). Results: Age, sex, RF positivity and baseline DAS28 data predicted MTX response with 58% accuracy in UK and European patients (p = 0.7). However, supervised machine-learning methods that combined demographics, RF positivity, baseline DAS28 and genomic SNPs predicted EULAR response at 3 months with area under the receiver operating curve (AUC) of 0.83 (p = 0.051) in UK patients, and achieved prediction accuracies (fraction of correctly predicted outcomes) of 76.2% (p = 0.054) in the European patients, with sensitivity of 72% and specificity of 77%. The addition of genomic data improved the predictive accuracies of MTX response by 19% and achieved cross-site replication. Baseline DAS28 scores and the following SNPs were among the top predictors of MTX response: rs12446816, rs13385025, rs113798271, and rs2372536. Conclusion: Pharmacogenomic biomarkers combined with DAS28 scores predicted MTX response in patients with early RA more reliably than using demographics and DAS28 scores alone. Using pharmacogenomics biomarkers for identification of MTX responders at early stages of RA may help to guide effective RA treatment choices, including timely escalation of RA therapies. Further studies on personalized prediction of response to MTX and other anti-rheumatic treatments are warranted to optimize control of RA disease and improve outcomes in patients with RA. Disclosure of Interests: Elena Myasoedova: None declared, Arjun Athreya: None declared, Cynthia S. Crowson Grant/research support from: Pfizer research grant, Richard Weinshilboum Shareholder of: co-founder and stockholder in OneOme, Liewei Wang: None declared, Eric Matteson Grant/research support from: Pfizer, Consultant of: Boehringer Ingelheim, Gilead, TympoBio, Arena Pharmaceuticals, Speakers bureau: Simply Speaking
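The performance measures quoted above (accuracy as the fraction of correctly predicted outcomes, sensitivity, specificity) all derive from a confusion matrix of predicted versus observed responders. A minimal sketch of how they relate, using hypothetical counts that are not taken from the study:

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),  # fraction predicted correctly
        "sensitivity": tp / (tp + fn),                # responders correctly identified
        "specificity": tn / (tn + fp),                # non-responders correctly identified
    }

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=40, fn=10, tn=30, fp=20)
print(metrics)  # {'accuracy': 0.7, 'sensitivity': 0.8, 'specificity': 0.6}
```

Because accuracy depends on the proportion of responders in the sample, it should be read alongside sensitivity and specificity when the classes are imbalanced, as they are here.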


2021 ◽  
Vol 9 (1) ◽  
pp. 6
Author(s):  
Narendra Pratap Singh ◽  
Bony De Kumar ◽  
Ariel Paulson ◽  
Mark E. Parrish ◽  
Carrie Scott ◽  
...  

Knowledge of the diverse DNA binding specificities of transcription factors is important for understanding their specific regulatory functions in animal development and evolution. We have examined the genome-wide binding properties of the mouse HOXB1 protein in embryonic stem cells differentiated into neural fates. Unexpectedly, only a small number of HOXB1 bound regions (7%) correlate with binding of the known HOX cofactors PBX and MEIS. In contrast, 22% of the HOXB1 binding peaks display co-occupancy with the transcriptional repressor REST. Analyses revealed that co-binding of HOXB1 with PBX correlates with active histone marks and high levels of expression, while co-occupancy with REST correlates with repressive histone marks and repression of the target genes. Analysis of HOXB1 bound regions uncovered enrichment of a novel 15 base pair HOXB1 binding motif HB1RE (HOXB1 response element). In vitro template binding assays showed that HOXB1, PBX1, and MEIS can bind to this motif. In vivo, this motif is sufficient to direct expression of a reporter gene, and over-expression of HOXB1 selectively represses this activity. Our analyses suggest that HOXB1 has evolved an association with REST in gene regulation and that the novel HB1RE motif contributes to HOXB1 function in part through a repressive role in gene expression.


Genes ◽  
2021 ◽  
Vol 12 (3) ◽  
pp. 441
Author(s):  
Fanny Pineau ◽  
Davide Caimmi ◽  
Sylvie Taviaux ◽  
Maurane Reveil ◽  
Laura Brosseau ◽  
...  

Cystic fibrosis (CF) is a chronic genetic disease that mainly affects the respiratory and gastrointestinal systems. No curative treatments are available, but the follow-up in specialized centers has greatly improved the patient life expectancy. Robust biomarkers are required to monitor the disease, guide treatments, stratify patients, and provide outcome measures in clinical trials. In the present study, we outline a strategy to select putative DNA methylation biomarkers of lung disease severity in cystic fibrosis patients. In the discovery step, we selected seven potential biomarkers using a genome-wide DNA methylation dataset that we generated in nasal epithelial samples from the MethylCF cohort. In the replication step, we assessed the same biomarkers using sputum cell samples from the MethylBiomark cohort. Of interest, DNA methylation at the cg11702988 site (ATP11A gene) positively correlated with lung function and BMI, and negatively correlated with lung disease severity, P. aeruginosa chronic infection, and the number of exacerbations. These results were replicated in prospective sputum samples collected at four time points within an 18-month period and longitudinally. To conclude, (i) we identified a DNA methylation biomarker that correlates with CF severity, (ii) we provided a method to easily assess this biomarker, and (iii) we carried out the first longitudinal analysis of DNA methylation in CF patients. This new epigenetic biomarker could be used to stratify CF patients in clinical trials.

