HyperChIP for identifying hypervariable signals across ChIP/ATAC-seq samples

2021 ◽  
Author(s):  
Haojie Chen ◽  
Shiqi Tu ◽  
Chongze Yuan ◽  
Feng Tian ◽  
Yijing Zhang ◽  
...  

As sequencing costs fall, studies that profile the chromatin landscape of tens or even hundreds of human individuals using ChIP/ATAC-seq techniques have become prevalent. Identifying genomic regions with hypervariable ChIP/ATAC-seq signals across given samples is essential for such studies. In particular, the hypervariable regions (HVRs) across tumors from different patients indicate their heterogeneity and can help reveal potential cancer subtypes and the associated epigenetic markers. We present HyperChIP as the first complete statistical tool for the task. HyperChIP ranks genomic regions by scaled variances that account for the mean-variance dependence, and it increases statistical power by diminishing the influence of true HVRs on model fitting. Applying it to a large pan-cancer ATAC-seq data set, we found that the identified HVRs not only provided a solid basis for uncovering the underlying similarity structure among the tumor samples, but also, when coupled with a motif-scanning analysis, led to the identification of transcription factors pertaining to that structure.
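The idea of ranking regions by a variance that is scaled against a mean-variance trend can be illustrated with a minimal sketch. This is not HyperChIP's actual model: the straight-line fit of log variance on mean, the function names `scaled_variances` and `rank_regions`, and the input layout are all assumptions made for illustration, and the sketch fits the trend on every region rather than down-weighting true HVRs as the abstract describes.

```python
import math
from statistics import mean, variance


def scaled_variances(signal):
    """Return observed/expected variance per region.

    signal: list of regions, each a list of per-sample log-scale signals
    (hypothetical layout). Every region needs a non-zero variance.
    """
    means = [mean(region) for region in signal]
    log_vars = [math.log(variance(region)) for region in signal]

    # Assumed trend: ordinary least-squares line log(variance) ~ a + b * mean.
    mx, my = mean(means), mean(log_vars)
    sxx = sum((x - mx) ** 2 for x in means)
    b = sum((x - mx) * (y - my) for x, y in zip(means, log_vars)) / sxx
    a = my - b * mx

    # Scaled variance = observed variance / variance expected from the trend.
    return [math.exp(lv - (a + b * m)) for m, lv in zip(means, log_vars)]


def rank_regions(signal):
    """Region indices ordered from most to least hypervariable."""
    sv = scaled_variances(signal)
    return sorted(range(len(signal)), key=lambda i: sv[i], reverse=True)
```

A region whose scaled variance is well above 1 varies more than regions of similar mean signal would suggest, which is the intuition behind using the scaled rather than the raw variance for ranking.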

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Chathura J. Gunasekara ◽  
Eilis Hannon ◽  
Harry MacKay ◽  
Cristian Coarfa ◽  
Andrew McQuillin ◽  
...  

Epigenetic dysregulation is thought to contribute to the etiology of schizophrenia (SZ), but the cell type-specificity of DNA methylation makes population-based epigenetic studies of SZ challenging. To train an SZ case–control classifier based on DNA methylation in blood, therefore, we focused on human genomic regions of systemic interindividual epigenetic variation (CoRSIVs), a subset of which are represented on the Illumina Human Methylation 450K (HM450) array. HM450 DNA methylation data on whole blood of 414 SZ cases and 433 non-psychiatric controls were used as training data for a classification algorithm with built-in feature selection, sparse partial least squares discriminant analysis (SPLS-DA); application of SPLS-DA to HM450 data has not been previously reported. Using the first two SPLS-DA dimensions we calculated a “risk distance” to identify individuals with the highest probability of SZ. The model was then evaluated on an independent HM450 data set on 353 SZ cases and 322 non-psychiatric controls. Our CoRSIV-based model classified 303 individuals as cases with a positive predictive value (PPV) of 80%, far surpassing the performance of a model based on polygenic risk score (PRS). Importantly, risk distance (based on CoRSIV methylation) was not associated with medication use, arguing against reverse causality. Risk distance and PRS were positively correlated (Pearson r = 0.28, P = 1.28 × 10⁻¹²), and mediational analysis suggested that genetic effects on SZ are partially mediated by altered methylation at CoRSIVs. Our results indicate two innate dimensions of SZ risk: one based on genetic, and the other on systemic epigenetic variants.
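For readers less familiar with the metric, the positive predictive value quoted above is the fraction of predicted cases that are true cases. The sketch below uses a hypothetical helper name, and the counts in the usage note are a back-of-envelope reading of the rounded figures in the abstract (303 predicted cases, PPV of about 80%), not numbers reported by the authors.

```python
def positive_predictive_value(true_positives, false_positives):
    """PPV = TP / (TP + FP), the share of predicted cases that are real cases."""
    predicted_positive = true_positives + false_positives
    if predicted_positive == 0:
        raise ValueError("PPV is undefined when nothing is predicted positive")
    return true_positives / predicted_positive
```

With 303 individuals classified as cases, a PPV near 80% corresponds to roughly 242 true cases and 61 false positives; `positive_predictive_value(242, 61)` evaluates to about 0.80.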


2021 ◽  
Vol 14 (3) ◽  
pp. 99
Author(s):  
Marc Peter Radke ◽  
Manuel Rupprecht

In this paper, we present a newly generated data set on real returns of households’ aggregated asset holdings, which adds additional and more sophisticated information to existing relevant datasets in the literature. To do this, we draw on various datasets from public and private sources and then transform and combine them in a consistent manner that allows for international comparative and intertemporal analyses. Based on this, we address two current debates on the development of household wealth in the euro area that have been triggered by the low-interest environment. The first debate refers to the development of real yields on household wealth from 2000 to 2018, whereas the second debate deals with the mean-variance efficiency of household portfolios. Contrary to widespread belief, we find that yields on total wealth, which were largely dominated by non-financial assets’ yields, were mostly positive, although they exhibit a declining trend. Moreover, on average, overall real yields were significantly lower after 2008. Referring to portfolio efficiency, we find that current portfolios seem to be comparatively close to mean-variance efficiency. If households were to optimize their portfolios despite limited room for improvement, holdings of equity and investment fund shares should be reduced, contradicting common recommendations of financial advisors.
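The notion of mean-variance efficiency used above rests on two quantities per portfolio: its expected return and its return variance. The sketch below shows how they are computed from asset weights, under assumptions that are not taken from the paper: the function names, the two-asset example, and the dominance check are purely illustrative.

```python
def portfolio_mean(weights, mean_returns):
    """Expected portfolio return: the weighted sum of asset mean returns."""
    return sum(w * m for w, m in zip(weights, mean_returns))


def portfolio_variance(weights, cov):
    """Portfolio variance w' * Sigma * w for a covariance matrix given as nested lists."""
    n = len(weights)
    return sum(weights[i] * cov[i][j] * weights[j]
               for i in range(n) for j in range(n))


def dominates(p, q):
    """True if portfolio p = (mean, variance) is no worse than q on both
    criteria and strictly better on at least one."""
    return p[0] >= q[0] and p[1] <= q[1] and (p[0] > q[0] or p[1] < q[1])
```

A portfolio is mean-variance efficient when no other feasible portfolio dominates it in this sense, that is, none offers at least the same expected return with lower variance or a higher expected return at the same variance.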


2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Zicheng Zhang ◽  
Congcong Yan ◽  
Ke Li ◽  
Siqi Bao ◽  
Lei Li ◽  
...  

The emerging field of long noncoding RNA (lncRNA)-immunity has provided a new perspective on cancer immunity and immunotherapies. The lncRNA modifiers of infiltrating immune cells in the tumor immune microenvironment (TIME) and their impact on tumor behavior and disease prognosis remain largely uncharacterized. In the present study, a systems immunology framework integrating the noncoding transcriptome and immunogenomics profiles of 9549 tumor samples across 30 solid cancer types was used, and 36 lncRNAs were identified as modifier candidates underlying immune cell infiltration in the TIME at the pan-cancer level. These TIME lncRNA modifiers (TIL-lncRNAs) were able to subclassify various tumors into three de novo pan-cancer subtypes characterized by distinct immunological features, biological behaviors, and disease prognoses. Finally, a TIL-lncRNA-derived immune state index (TISI) that was not only reflective of immunological and oncogenic states but also predictive of patients’ prognosis was proposed. Furthermore, the TISI provided additional prognostic value for existing tumor immunological and molecular subtypes. When applied to tumors from different clinical immunotherapy cohorts, the TISI was found to be significantly negatively correlated with immune-checkpoint genes and to have the ability to predict the effectiveness of immunotherapy. In conclusion, the present study provided comprehensive resources and insights for future functional and mechanistic studies on lncRNA-mediated cancer immunity and highlighted the potential of the clinical application of lncRNA-based immunotherapeutic strategies in precision immunotherapy.


Cells ◽  
2020 ◽  
Vol 10 (1) ◽  
pp. 45
Author(s):  
Darío Rocha ◽  
Iris A. García ◽  
Aldana González Montoro ◽  
Andrea Llera ◽  
Laura Prato ◽  
...  

Studying tissue-independent components of cancer and defining pan-cancer subtypes could be addressed using tissue-specific molecular signatures if classification errors are controlled. Since PAM50 is a well-known, United States Food and Drug Administration (FDA)-approved and commercially available breast cancer signature, we applied it with uncertainty assessment to classify tumor samples from over 33 cancer types, discarded unassigned samples, and studied the emerging tumor-agnostic molecular patterns. The percentage of unassigned samples ranged between 55.5% and 86.9% in non-breast tissues, and gene set analysis suggested that the remaining samples could be grouped into two classes (named C1 and C2) regardless of the tissue. The C2 class was more dedifferentiated, more proliferative, with higher centrosome amplification, and potentially more TP53 and RB1 mutations. We identified 28 gene sets and 95 genes mainly associated with cell-cycle progression, cell-cycle checkpoints, and DNA damage that were consistently exacerbated in the C2 class. In some cancer types, the C1/C2 classification was associated with survival and drug sensitivity, and modulated the prognostic meaning of the immune infiltrate. Our results suggest that PAM50 could be repurposed for a pan-cancer context when paired with uncertainty assessment, resulting in two classes with molecular, biological, and clinical implications.


2018 ◽  
Author(s):  
Allison A. Regier ◽  
Yossi Farjoun ◽  
David Larson ◽  
Olga Krasheninina ◽  
Hyun Min Kang ◽  
...  

Hundreds of thousands of human whole genome sequencing (WGS) datasets will be generated over the next few years to interrogate a broad range of traits, across diverse populations. These data are more valuable in aggregate: joint analysis of genomes from many sources increases sample size and statistical power for trait mapping, and will enable studies of genome biology, population genetics and genome function at unprecedented scale. A central challenge for joint analysis is that different WGS data processing and analysis pipelines cause substantial batch effects in combined datasets, necessitating computationally expensive reprocessing and harmonization prior to variant calling. This approach is no longer tenable given the scale of current studies and data volumes. Here, in a collaboration across multiple genome centers and NIH programs, we define WGS data processing standards that allow different groups to produce “functionally equivalent” (FE) results suitable for joint variant calling with minimal batch effects. Our approach promotes broad harmonization of upstream data processing steps, while allowing for diverse variant callers. Importantly, it allows each group to continue innovating on data processing pipelines, as long as results remain compatible. We present initial FE pipelines developed at five genome centers and show that they yield similar variant calling results – including single nucleotide (SNV), insertion/deletion (indel) and structural variation (SV) – and produce significantly less variability than sequencing replicates. Residual inter-pipeline variability is concentrated at low quality sites and repetitive genomic regions prone to stochastic effects. This work alleviates a key technical bottleneck for genome aggregation and helps lay the foundation for broad data sharing and community-wide “big-data” human genetics studies.


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e10681
Author(s):  
Jake Dickinson ◽  
Marcel de Matas ◽  
Paul A. Dickinson ◽  
Hitesh B. Mistry

Purpose To assess whether a model-based analysis increased statistical power over an analysis of final day volumes and provide insights into more efficient patient derived xenograft (PDX) study designs. Methods Tumour xenograft time-series data were extracted from a public PDX drug treatment database. For all 2-arm studies the percent tumour growth inhibition (TGI) at day 14, 21 and 28 was calculated. Treatment effect was analysed using an un-paired, two-tailed t-test (empirical) and a model-based analysis, likelihood ratio-test (LRT). In addition, a simulation study was performed to assess the difference in power between the two data-analysis approaches for PDX or standard cell-line derived xenografts (CDX). Results The model-based analysis had greater statistical power than the empirical approach within the PDX data-set. The model-based approach was able to detect TGI values as low as 25% whereas the empirical approach required at least 50% TGI. The simulation study confirmed the findings and highlighted that CDX studies require fewer animals than PDX studies which show the equivalent level of TGI. Conclusions The study adds to the growing literature showing that a model-based analysis of xenograft data improves statistical power over the common empirical approach. The analysis showed that a model-based approach, based on the first mathematical model of tumour growth, was able to detect a smaller effect size than the empirical approach that is common in such studies. A model-based analysis should allow studies to reduce animal use and experiment length while providing effective insights into compound anti-tumour activity.
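The empirical analysis described above can be sketched in a few lines. The abstract does not define TGI or say which unpaired t-test variant was used, so both choices here are assumptions: TGI is taken as one minus the ratio of mean treated to mean control volume on a given day, and the test statistic is Welch's (unequal-variance) form. The function names are hypothetical.

```python
import math
from statistics import mean, variance


def percent_tgi(control, treated):
    """Percent tumour growth inhibition on one day (assumed definition):
    100 * (1 - mean treated volume / mean control volume)."""
    return 100.0 * (1.0 - mean(treated) / mean(control))


def welch_t(control, treated):
    """Unpaired t statistic without assuming equal variances (Welch's form)."""
    standard_error = math.sqrt(variance(control) / len(control)
                               + variance(treated) / len(treated))
    return (mean(control) - mean(treated)) / standard_error
```

For example, control volumes of 1000, 1200 and 800 against treated volumes of 500, 600 and 400 give a TGI of 50%. Because this approach uses only the volumes from a single day, it discards the rest of the time series, which is the information a model-based analysis draws on.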


2019 ◽  
Author(s):  
John A. Lees ◽  
T. Tien Mai ◽  
Marco Galardini ◽  
Nicole E. Wheeler ◽  
Jukka Corander

Discovery of influential genetic variants and prediction of phenotypes such as antibiotic resistance are becoming routine tasks in bacterial genomics. Genome-wide association study (GWAS) methods can be applied to study bacterial populations, with a particular emphasis on alignment-free approaches, which are necessitated by the more plastic nature of bacterial genomes. Here we advance bacterial GWAS by introducing a computationally scalable joint modeling framework, where genetic variants covering the entire pangenome are compactly represented by unitigs, and the model fitting is achieved using elastic net penalization. In contrast to current leading GWAS approaches, which test each genotype-phenotype association separately for each variant, our joint modelling approach is shown to lead to increased statistical power while maintaining control of the false positive rate. Our inference procedure also delivers an estimate of the narrow-sense heritability, which is gaining considerable interest in studies of bacteria. Using an extensive set of state-of-the-art bacterial population genomic datasets we demonstrate that our approach performs accurate phenotype prediction, comparable to popular machine learning methods, while retaining both interpretability and computational efficiency. We expect that these advances will pave the way for the next generation of high-powered association and prediction studies for an increasing number of bacterial species.


Electroencephalogram (EEG) is one of the most commonly used tools for epilepsy detection. In this paper we present two methods for the diagnosis of epilepsy using machine learning techniques. EEG waveforms have five different kinds of frequency bands. Of these, only two, namely the theta and gamma bands, carry epileptic seizure information. Our model determines statistical features such as mean, variance, maximum, minimum, kurtosis, and skewness from the raw data set. This reduces the mathematical complexity and time consumption of the feature extraction method. It then uses a logistic regression model and a decision tree model to classify whether a person is epileptic or not. After the implementation of the machine learning models, parameters like accuracy, sensitivity, and recall have been found. The results for the same are analyzed in detail in this paper. Epileptic seizures cause severe damage to the brain, which affects the health of a person. Our key objective in this paper is to help in the early prediction and detection of epilepsy so that preventive interventions can be provided and precautionary measures taken to prevent the patient from suffering any severe damage.
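The feature-extraction step described above can be sketched as follows. The abstract does not specify the estimators, so this sketch assumes population (divide-by-n) moments and non-excess kurtosis; the function name `basic_features` and the dictionary layout are hypothetical, and the input is taken to be one window of raw EEG samples with non-zero variance.

```python
def basic_features(samples):
    """Six summary statistics for one window of raw signal values."""
    n = len(samples)
    mu = sum(samples) / n

    # Central moments with a divide-by-n (population) convention.
    m2 = sum((x - mu) ** 2 for x in samples) / n
    m3 = sum((x - mu) ** 3 for x in samples) / n
    m4 = sum((x - mu) ** 4 for x in samples) / n

    return {
        "mean": mu,
        "variance": m2,
        "max": max(samples),
        "min": min(samples),
        "skewness": m3 / m2 ** 1.5,   # 0 for a symmetric window
        "kurtosis": m4 / m2 ** 2,     # non-excess; about 3 for Gaussian data
    }
```

Each window is reduced to six numbers, and the resulting feature vectors are what a classifier such as logistic regression or a decision tree would be trained on.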


PEDIATRICS ◽  
1989 ◽  
Vol 83 (4) ◽  
pp. 634-634
Author(s):  
JOHN S. LOVERING

Dr. Mauro is obviously knowledgeable in the area of statistical analysis and raises a valid point regarding the importance of evaluating the likelihood of a type II error in studies with negative results. Although one does not wish to detract from the main point of a study with extensive details of the statistical analysis (two pages in this case), some readers may desire more mathematical information than values of mean, variance, t, and P, and do not wish to make their own calculations, to reassure themselves that a reasonable conclusion has been drawn by the authors and their statisticians.


2020 ◽  
pp. 609-623
Author(s):  
Arun Kumar Beerala ◽  
Gobinath R. ◽  
Shyamala G. ◽  
Siribommala Manvitha

Water is the most valuable natural resource for all living things and the ecosystem. Groundwater quality changes with changes in the ecosystem, industrialisation, urbanisation, and other factors. In the study, 60 samples were taken and analysed for various physico-chemical parameters. The sampling locations were recorded using a global positioning system (GPS), and samples were taken in two consecutive years for two different seasons, monsoon (Nov-Dec) and post-monsoon (Jan-Mar). In 2016-2017 and 2017-2018, pH, EC, and TDS were obtained in the field. Hardness and chloride were determined using the titration method. Nitrate and sulphate were determined using a spectrophotometer. Machine learning techniques were used to train the data set and to predict the unknown values. The dominant ions in the groundwater are Ca²⁺ and Mg²⁺ among the cations and Cl⁻, SO₄²⁻, and NO₃⁻ among the anions. The regression value for the training data set was found to be 0.90596, and for the entire network, it was found to be 0.81729. The best performance was observed as 0.0022605 at epoch 223.

