DNA methylation alterations have similar patterns in normal aging tissue and in cancer. In this study, we investigated breast tissue-specific age-related DNA methylation alterations and used those methylation sites to identify individuals with outlier phenotypes. The outlier phenotype is identified by unsupervised anomaly detection algorithms and is defined as individuals whose age-dependent DNA methylation levels in normal tissue deviate dramatically from the population mean.
We generated whole-genome DNA methylation profiles (GSE160233) on purified epithelial cells and used publicly available Infinium HumanMethylation 450K array datasets (TCGA, GSE88883, GSE69914, GSE101961, and GSE74214) for discovery and validation.
We found that hypermethylation in normal breast tissue is the best predictor of hypermethylation in cancer. Using unsupervised anomaly detection approaches, we found that about 10% of the individuals (39/427) across six DNA methylation datasets were outliers for DNA methylation. There were significantly more outlier samples in normal-adjacent to cancer tissue (24/139, 17.3%) than in normal samples (15/288, 5.2%). We also found significant differences between the DNA methylation-based predicted ages and the chronological ages among outliers and non-outliers. Accelerated outliers (older predicted age) were more frequent in normal-adjacent to cancer tissue (14/17, 82%) than in normal samples from individuals without cancer (3/17, 18%). Furthermore, in matched samples, the epigenome of the outliers in the pre-malignant tissue was as severely altered as in cancer.
A subset of patients with breast cancer has severely altered epigenomes, characterized by accelerated aging in their normal-appearing tissue. In the future, these DNA methylation sites should be studied further, for example in cell-free DNA, to determine their potential use as biomarkers for early detection of malignant transformation and for preventive intervention in breast cancer.
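The outlier definition and the notion of accelerated aging used above can be illustrated with a minimal sketch. It is not the study's pipeline: the abstract does not name the anomaly detection algorithms, so a plain per-site z-score rule stands in for them here. The function names, the `z_cut` and `min_fraction` thresholds, and the toy beta values are all assumptions made for illustration.

```python
from statistics import mean, stdev


def outlier_fractions(beta, z_cut=2.0):
    """Fraction of sites at which each sample deviates strongly from the mean.

    beta: dict mapping sample id -> list of methylation beta values,
          one per CpG site, in the same site order for every sample.
    z_cut: hypothetical |z| threshold (an assumption, not from the study).
    """
    samples = list(beta)
    n_sites = len(beta[samples[0]])
    # Per-site population mean and standard deviation across samples.
    mu = [mean([beta[s][j] for s in samples]) for j in range(n_sites)]
    sd = [stdev([beta[s][j] for s in samples]) for j in range(n_sites)]
    fractions = {}
    for s in samples:
        extreme = sum(
            1
            for j in range(n_sites)
            if sd[j] > 0 and abs(beta[s][j] - mu[j]) / sd[j] > z_cut
        )
        fractions[s] = extreme / n_sites
    return fractions


def flag_outliers(beta, z_cut=2.0, min_fraction=0.5):
    """Samples whose fraction of extreme sites reaches min_fraction (assumed rule)."""
    fractions = outlier_fractions(beta, z_cut)
    return [s for s, f in fractions.items() if f >= min_fraction]


def age_acceleration(predicted_age, chronological_age):
    """Positive values mean the methylation-predicted age exceeds the true age."""
    return predicted_age - chronological_age


# Toy data: nine similar samples and one strongly hypermethylated sample.
toy_beta = {
    f"s{i}": [0.20 + 0.01 * (i % 3), 0.30 + 0.01 * (i % 3), 0.10 + 0.01 * (i % 3)]
    for i in range(9)
}
toy_beta["s9"] = [0.80, 0.90, 0.70]
print(flag_outliers(toy_beta))
print(age_acceleration(62.5, 50.0) > 0)  # True would mark an "accelerated" sample
```

A mean-based z-score is sensitive to the outliers it is trying to find, so a real analysis would more likely use a robust or model-based detector; the sketch only makes the definition concrete.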
We address the problem of unsupervised anomaly detection for multivariate data. Traditional machine learning-based anomaly detection algorithms rely on specific assumptions about normal patterns and fail to model complex feature interactions and relations. Recent deep learning-based methods are promising for extracting representations from complex features. These methods train an auxiliary task, e.g., reconstruction or prediction, on normal samples. They further assume that anomalies perform poorly on the auxiliary task because they are never seen during model optimization. However, this assumption does not always hold in practice. Deep models may also perform the auxiliary task well on anomalous samples, so that anomalies go undetected. To effectively detect anomalies in multivariate data, this paper introduces a teacher-student distillation-based framework, Distillated Teacher-Student Network Ensemble (DTSNE). The teacher-student distillation paradigm is able to deal with high-dimensional complex features. In addition, an ensemble of student networks is better able to avoid generalizing the auxiliary task performance to anomalous samples. To validate the effectiveness of our model, we conduct extensive experiments on real-world datasets. Experimental results show superior performance of DTSNE over competing methods. Analysis and discussion of the behavior of our model are also provided in the experiment section.
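The general teacher-student idea described above can be sketched in a few lines. This is a toy illustration under strong assumptions, not the DTSNE architecture, which the abstract does not specify: the teacher is a fixed random `tanh` projection, each student is a linear least-squares model fit on a bootstrap resample of normal data, and the anomaly score is the mean squared teacher-student discrepancy averaged over the ensemble. All function names and parameters are hypothetical.

```python
import numpy as np


def make_teacher(n_features, n_outputs, rng):
    """Fixed random nonlinear map standing in for a trained teacher network."""
    W = rng.normal(size=(n_features, n_outputs))
    return lambda X: np.tanh(X @ W)


def _with_bias(X):
    # Append a constant column so the linear student has an intercept.
    return np.hstack([X, np.ones((len(X), 1))])


def fit_student(X, T):
    """Linear student: least-squares fit of the teacher outputs T on inputs X."""
    coef, *_ = np.linalg.lstsq(_with_bias(X), T, rcond=None)
    return coef


def fit_ensemble(X_normal, teacher, n_students, rng):
    """Train each student on a bootstrap resample of the normal data."""
    T = teacher(X_normal)
    students = []
    for _ in range(n_students):
        idx = rng.integers(0, len(X_normal), size=len(X_normal))
        students.append(fit_student(X_normal[idx], T[idx]))
    return students


def anomaly_score(X, teacher, students):
    """Per-sample teacher-student squared error, averaged over the ensemble."""
    T = teacher(X)
    errors = [np.mean((_with_bias(X) @ c - T) ** 2, axis=1) for c in students]
    return np.mean(errors, axis=0)


rng = np.random.default_rng(0)
X_train = rng.normal(scale=0.3, size=(500, 5))           # normal samples only
teacher = make_teacher(n_features=5, n_outputs=8, rng=rng)
students = fit_ensemble(X_train, teacher, n_students=5, rng=rng)

X_normal = rng.normal(scale=0.3, size=(50, 5))            # unseen normal samples
X_anomalous = rng.normal(loc=4.0, scale=0.3, size=(50, 5))  # shifted samples
print(anomaly_score(X_normal, teacher, students).mean())
print(anomaly_score(X_anomalous, teacher, students).mean())
```

The students only match the teacher where they were trained, so samples far from the normal data tend to receive larger scores. Linear students are used here purely to keep the sketch short; they sidestep, rather than demonstrate, the over-generalization problem that the ensemble is meant to address.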