Multiple change-points detection in high dimension

2019 ◽  
Vol 08 (04) ◽  
pp. 1950014 ◽  
Author(s):  
Yunlong Wang ◽  
Changliang Zou ◽  
Zhaojun Wang ◽  
Guosheng Yin

Change-point detection is an integral component of statistical modeling and estimation. For high-dimensional data, classical methods based on the Mahalanobis distance are typically inapplicable. We propose a novel test statistic that combines a modified Euclidean distance with an extreme statistic, and whose null distribution is asymptotically normal. The new method naturally strikes a balance between the detection abilities for dense and sparse changes, which gives it an edge that may allow it to outperform existing methods. Furthermore, the number of change-points is determined by a new Schwarz’s information criterion together with a pre-screening procedure, and the locations of the change-points can be estimated via the dynamic programming algorithm in conjunction with the intrinsic order structure of the objective function. Under some mild conditions, we show that the new method provides consistent estimation with an almost optimal rate. Simulation studies show that the proposed method performs satisfactorily in identifying multiple change-points in terms of power and estimation accuracy, and two real data examples are used for illustration.
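The abstract above locates change-points with dynamic programming. As a rough illustration of that idea only, the sketch below runs a textbook least-squares segmentation on a univariate series. It is not the authors' statistic or criterion: the function names (segment_cost, dp_change_points) are hypothetical, the cost is a plain within-segment sum of squares, and the number of change-points is assumed to be given rather than selected by an information criterion.

```python
from typing import List, Sequence


def segment_cost(prefix: List[float], prefix_sq: List[float], i: int, j: int) -> float:
    """Sum of squared deviations from the segment mean for x[i:j] (j exclusive)."""
    n = j - i
    s = prefix[j] - prefix[i]
    sq = prefix_sq[j] - prefix_sq[i]
    return sq - s * s / n


def dp_change_points(x: Sequence[float], n_changes: int) -> List[int]:
    """Return the start indices of segments 2..K+1 that minimise the total
    within-segment sum of squares, for a fixed number of change-points K.
    (Illustrative assumption: K is known in advance.)"""
    n = len(x)
    # Prefix sums make every segment cost an O(1) lookup.
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for t, v in enumerate(x):
        prefix[t + 1] = prefix[t] + v
        prefix_sq[t + 1] = prefix_sq[t] + v * v

    n_seg = n_changes + 1
    inf = float("inf")
    # best[k][j]: minimal cost of splitting x[:j] into exactly k segments.
    best = [[inf] * (n + 1) for _ in range(n_seg + 1)]
    # arg[k][j]: start index of the last segment in that optimal split.
    arg = [[0] * (n + 1) for _ in range(n_seg + 1)]
    best[0][0] = 0.0
    for k in range(1, n_seg + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1][i] + segment_cost(prefix, prefix_sq, i, j)
                if c < best[k][j]:
                    best[k][j] = c
                    arg[k][j] = i

    # Walk back through the stored split points.
    change_points = []
    j = n
    for k in range(n_seg, 1, -1):
        j = arg[k][j]
        change_points.append(j)
    return sorted(change_points)
```

The triple loop costs O(K n^2) time, which is the usual price of exact dynamic-programming segmentation; the pre-screening step mentioned in the abstract is presumably one way to shrink the candidate set, but that step is not modelled here.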

Author(s):  
Naser Sina ◽  
Vahid Esfahanian ◽  
Mohammad Reza Hairi Yazdi

Plug-in hybrid electric buses are a viable solution for improving fuel economy. In this framework, precise estimation of the optimal state-of-charge trajectory along the upcoming driving cycle appears to play a pivotal role in approaching the globally optimal fuel economy. This paper conducts a parametric study of the key factors affecting the estimation of the optimal state-of-charge trajectory, including trip information availability and trip segment distance, and provides a guideline for the design and implementation of predictive energy management systems. To accomplish this, the dynamic programming algorithm is employed to solve the optimal control problem for the sampled driving cycles on a particular bus route. A large database comprising the driving features of the cycles and the optimal solutions is developed and then used to construct a neural-network-based estimator of the optimal state-of-charge trajectory. The main results show promising performance of the proposed method, with about a 76% reduction in the root mean square error of the estimated trajectory compared to the linear state-of-charge trajectory assumption. Moreover, the robustness of the estimator is verified through simulation, and it is observed that an appropriate choice of trip segment distance is vital to improving the estimation accuracy, especially when the prediction of trip information is uncertain.


2016 ◽  
Vol 14 (04) ◽  
pp. 1643001 ◽  
Author(s):  
Jin Li ◽  
Chengzhen Xu ◽  
Lei Wang ◽  
Hong Liang ◽  
Weixing Feng ◽  
...  

Prediction of RNA secondary structures is an important problem in computational biology and bioinformatics, since RNA secondary structures are fundamental for functional analysis of RNA molecules. However, small RNA secondary structures are scarce and few algorithms have been specifically designed for predicting the secondary structures of small RNAs. Here we propose an algorithm named “PSRna” for predicting small-RNA secondary structures using reverse complementary folding and characteristic hairpin loops of small RNAs. Unlike traditional algorithms, which usually generate multi-branch loops and 5′-end self-folding, PSRna first estimates the maximum number of base pairs of RNA secondary structures using the dynamic programming algorithm, constructing a path matrix at the same time. Second, the backtracking paths are extracted from the path matrix using a backtracking algorithm, and each backtracking path represents a secondary structure. To improve accuracy, the predicted RNA secondary structures are filtered by their free energy, and only the secondary structure with the minimum free energy is identified as the candidate secondary structure. Our experiments on real data show that the proposed algorithm is superior to two popular methods, RNAfold and RNAstructure, in terms of sensitivity, specificity and Matthews correlation coefficient (MCC).
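The first step described above, maximising the number of base pairs by dynamic programming, can be illustrated with the classic Nussinov-style recursion. This is a minimal sketch of that general technique, not PSRna itself: the function name max_base_pairs, the min_loop parameter and the choice to allow G-U wobble pairs are assumptions made for the example, and no path matrix, backtracking or free-energy filtering is included.

```python
def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in seq (Nussinov-style DP).

    min_loop is the minimum number of unpaired bases enclosed by a pair
    (an illustrative assumption; hairpins need a few unpaired bases).
    """
    # Watson-Crick pairs plus G-U wobble pairs (assumed allowed here).
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j]: maximum number of pairs within seq[i..j] (inclusive).
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            # case 2: base j pairs with some k, leaving >= min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]
```

For the short hairpin-like sequence GGGAAAUCC the recursion pairs the three leading G bases with the trailing U, C and C around an AAA loop, giving three pairs.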


2016 ◽  
Author(s):  
Fengchao Yu ◽  
Ning Li ◽  
Weichuan Yu

In computational proteomics, identification of peptides with an unlimited number of post-translational modification (PTM) types is a challenging task. The computational cost increases exponentially with respect to the number of modifiable amino acids and linearly with respect to the number of potential PTM types at each amino acid. The problem becomes intractable very quickly if we want to enumerate all possible modification patterns. Existing tools (e.g., MS-Alignment, ProteinProspector, and MODa) avoid enumerating all possible modification patterns in a database search by using an alignment-based approach to localize and characterize modified amino acids. However, due to the large search space and the PTM localization issue, the sensitivity of these tools is low. This paper proposes a novel method named PIPI to achieve PTM-invariant peptide identification. PIPI first codes peptide sequences into Boolean vectors and converts experimental spectra into real-valued vectors. Then, it finds the top 10 peptide-coded vectors for each spectrum-coded vector. After that, PIPI uses a dynamic programming algorithm to localize and characterize modified amino acids. Simulations and real data experiments have shown that PIPI outperforms existing tools by identifying more peptide-spectrum matches (PSMs) and reporting fewer false positives. It also runs much faster than existing tools when the database is large.


2019 ◽  
Vol 47 (13) ◽  
pp. e77-e77
Author(s):  
Xinzhou Ge ◽  
Haowen Zhang ◽  
Lingjue Xie ◽  
Wei Vivian Li ◽  
Soo Bin Kwon ◽  
...  

The availability of genome-wide epigenomic datasets enables in-depth studies of epigenetic modifications and their relationships with chromatin structures and gene expression. Various alignment tools have been developed to align nucleotide or protein sequences in order to identify structurally similar regions. However, there are currently no alignment methods specifically designed for comparing multi-track epigenomic signals and detecting common patterns that may explain functional or evolutionary similarities. We propose a new local alignment algorithm, EpiAlign, designed to compare chromatin state sequences learned from multi-track epigenomic signals and to identify locally aligned chromatin regions. EpiAlign is a dynamic programming algorithm that novelly incorporates varying lengths and frequencies of chromatin states. We demonstrate the efficacy of EpiAlign through extensive simulations and studies on the real data from the NIH Roadmap Epigenomics project. EpiAlign is able to extract recurrent chromatin state patterns along a single epigenome, and many of these patterns carry cell-type-specific characteristics. EpiAlign can also detect common chromatin state patterns across multiple epigenomes, and it will serve as a useful tool to group and distinguish epigenomic samples based on genome-wide or local chromatin state patterns.
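To make the notion of local alignment by dynamic programming concrete, here is a minimal Smith-Waterman-style sketch applied to sequences of chromatin state labels. It is the standard textbook recursion with fixed scores, not EpiAlign: the function name local_alignment_score, the labels such as "E1" and the match, mismatch and gap values are all hypothetical, and the length- and frequency-dependent weighting described in the abstract is not implemented.

```python
from typing import Hashable, Sequence


def local_alignment_score(
    a: Sequence[Hashable],
    b: Sequence[Hashable],
    match: float = 2.0,
    mismatch: float = -1.0,
    gap: float = -1.0,
) -> float:
    """Best local alignment score between two label sequences
    (Smith-Waterman recursion with a linear gap penalty)."""
    cols = len(b) + 1
    prev = [0.0] * cols  # scores for the previous row of the DP table
    best = 0.0
    for i in range(1, len(a) + 1):
        cur = [0.0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Resetting to 0 lets an alignment start anywhere (the "local" part).
            cur[j] = max(0.0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

Keeping only two rows of the table is enough when just the score is needed; recovering the aligned regions themselves would require storing the full table and tracing back from the best cell.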


Entropy ◽  
2019 ◽  
Vol 21 (3) ◽  
pp. 281
Author(s):  
Shinpei Imori ◽  
Hidetoshi Shimodaira

Statistical inference is considered for variables of interest, called primary variables, when auxiliary variables are observed along with the primary variables. We consider the setting of incomplete data analysis, where some primary variables are not observed. Utilizing a parametric model of joint distribution of primary and auxiliary variables, it is possible to improve the estimation of parametric model for the primary variables when the auxiliary variables are closely related to the primary variables. However, the estimation accuracy reduces when the auxiliary variables are irrelevant to the primary variables. For selecting useful auxiliary variables, we formulate the problem as model selection, and propose an information criterion for predicting primary variables by leveraging auxiliary variables. The proposed information criterion is an asymptotically unbiased estimator of the Kullback–Leibler divergence for complete data of primary variables under some reasonable conditions. We also clarify an asymptotic equivalence between the proposed information criterion and a variant of leave-one-out cross validation. Performance of our method is demonstrated via a simulation study and a real data example.


2019 ◽  
Author(s):  
Dorcas Ofori-Boateng ◽  
Yulia R. Gel ◽  
Ivor Cribben

Identifying change points and/or anomalies in dynamic network structures has become increasingly popular across various domains, from neuroscience to telecommunication to finance. One of the particular objectives of the anomaly detection task from the neuroscience perspective is the reconstruction of the dynamic manner of brain region interactions. However, most statistical methods for detecting anomalies have the following unrealistic limitation for brain studies and beyond: network snapshots at different time points are assumed to be independent. To circumvent this limitation, we propose a distribution-free framework for anomaly detection in dynamic networks. First, we present each network snapshot of the data as a linear object and find its respective univariate characterization via local and global network topological summaries. Second, we adopt a change point detection method for (weakly) dependent time series based on efficient scores, and enhance the finite sample properties of the change point method by approximating the asymptotic distribution of the test statistic using the sieve bootstrap. We apply our method to simulated and real data, in particular two functional magnetic resonance imaging (fMRI) data sets and the Enron communication graph. We find that our new method delivers impressively accurate and realistic results in terms of identifying locations of true change points compared to the results reported by competing approaches. The new method promises to offer a deeper insight into the large-scale characterizations and functional dynamics of the brain and, more generally, into the intrinsic structure of complex dynamic networks.


2019 ◽  
Author(s):  
Xinzhou Ge ◽  
Haowen Zhang ◽  
Lingjue Xie ◽  
Wei Vivian Li ◽  
Soo Bin Kwon ◽  
...  

The availability of genome-wide epigenomic datasets enables in-depth studies of epigenetic modifications and their relationships with chromatin structures and gene expression. Various alignment tools have been developed to align nucleotide or protein sequences in order to identify structurally similar regions. However, there are currently no alignment methods specifically designed for comparing multi-track epigenomic signals and detecting common patterns that may explain functional or evolutionary similarities. We propose a new local alignment algorithm, EpiAlign, designed to compare chromatin state sequences learned from multi-track epigenomic signals and to identify locally aligned chromatin regions. EpiAlign is a dynamic programming algorithm that novelly incorporates varying lengths and frequencies of chromatin states. We demonstrate the efficacy of EpiAlign through extensive simulations and studies on the real data from the NIH Roadmap Epigenomics project. EpiAlign is able to extract recurrent chromatin state patterns along a single epigenome, and many of these patterns carry cell-type-specific characteristics. EpiAlign can also detect common chromatin state patterns across multiple epigenomes, and it will serve as a useful tool to group and distinguish epigenomic samples based on genome-wide or local chromatin state patterns.

