Frequent Closed Partial Orders Mining in Sequences

2013 ◽  
Vol 846-847 ◽  
pp. 1304-1307
Author(s):  
Ye Wang ◽  
Yan Jia ◽  
Lu Min Zhang

Mining partial orders from sequence data is an important data mining task with broad applications. As partial orders mining is an NP-hard problem, many efficient pruning algorithms have been proposed. In this paper, we improve a classical algorithm for discovering frequent closed partial orders (FCPO) from strings. For general sequences, we treat items that appear together as having an equal chance when calculating the detecting matrix used for pruning. Experimental evaluations on a real data set show that our algorithm can effectively mine FCPO from sequences.
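
To make the notion of a partial order supported by a set of strings concrete, the sketch below intersects the total orders of several strings and counts support. It is only an illustration under the assumption that each item occurs at most once per string; the function names are hypothetical, and it is neither the algorithm of the paper nor its detecting matrix.

```python
from itertools import combinations


def precedence_pairs(string):
    """Return all (a, b) pairs with a before b in the string.

    Assumes every item occurs at most once, so the string is a total order.
    """
    return set(combinations(string, 2))


def common_partial_order(strings):
    """Intersect the total orders of the given strings.

    The result is the largest set of precedence pairs that every
    given string respects.
    """
    orders = [precedence_pairs(s) for s in strings]
    return set.intersection(*orders) if orders else set()


def support(order, database):
    """Count the strings in the database that respect every pair in `order`."""
    return sum(1 for s in database if order <= precedence_pairs(s))


# Hypothetical toy database of strings over the items a, b, c, d.
database = ["abcd", "acbd", "abdc"]
shared = common_partial_order(database)
```

Here `shared` holds the precedence pairs common to all three strings, and `support(shared, database)` counts how many strings respect them. A frequent-pattern miner would additionally search over subsets of the database and prune candidates, which this sketch does not attempt.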

2012 ◽  
Vol 10 (01) ◽  
pp. 1240008 ◽  
Author(s):  
Sylvia Boyd ◽  
Maryam Haghighi

We provide a computationally realistic mathematical framework for the NP-hard problem of the multichromosomal breakpoint median for linear genomes that can be used in constructing phylogenies. A novel approach is provided that can handle signed, unsigned, and partially signed cases of the multichromosomal breakpoint median problem. Our method provides an avenue for incorporating biological assumptions (whenever available) such as the number of chromosomes in the ancestor, and thus it can be tailored to obtain a more biologically relevant picture of the median. We demonstrate the usefulness of our method by performing an empirical study on both simulated and real data with a comparison to other methods.


2012 ◽  
Vol 2012 ◽  
pp. 1-17
Author(s):  
Mingmin Zhu ◽  
Sanyang Liu

Learning Bayesian network (BN) structure from data is a typical NP-hard problem. However, almost all existing algorithms have very high complexity when the number of variables is large. To solve this problem, we present an algorithm that integrates a decomposition-based approach with a scoring-function-based approach for learning BN structures. First, the proposed algorithm decomposes the moral graph of the BN into its maximal prime subgraphs. Then it orients the local edges in each subgraph by greedy search with the K2 score. Finally, it combines the directed subgraphs to obtain the final BN structure. The theoretical and experimental results show that our algorithm can efficiently and accurately identify complex network structures from small data sets.
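
The K2-scoring greedy search mentioned above can be illustrated with a minimal sketch. It computes the standard K2 (Cooper-Herskovits) score in log form and greedily adds parents for a single node. The data layout, the function names, and the `max_parents` limit are assumptions made for illustration; the decomposition into maximal prime subgraphs is not shown, and this is not the authors' implementation.

```python
from collections import Counter
from math import lgamma


def k2_log_score(data, child, parents, arity):
    """Log K2 score of `child` given `parents`.

    data  : list of dicts mapping variable name -> discrete state
    arity : number of states r_i the child variable can take
    """
    n_ijk = Counter()  # count of child state k under parent configuration j
    n_ij = Counter()   # count of parent configuration j
    for row in data:
        config = tuple(row[p] for p in parents)
        n_ijk[(config, row[child])] += 1
        n_ij[config] += 1

    score = 0.0
    for total in n_ij.values():
        # log[(r_i - 1)! / (N_ij + r_i - 1)!], using lgamma(n + 1) = log(n!)
        score += lgamma(arity) - lgamma(total + arity)
    for count in n_ijk.values():
        score += lgamma(count + 1)  # log(N_ijk!)
    return score


def greedy_parents(data, child, candidates, arity, max_parents=2):
    """Add the parent that most improves the score until nothing improves."""
    parents = []
    best = k2_log_score(data, child, parents, arity)
    while len(parents) < max_parents:
        scored = [(k2_log_score(data, child, parents + [c], arity), c)
                  for c in candidates if c not in parents]
        if not scored:
            break
        top_score, top = max(scored)
        if top_score <= best:
            break
        parents.append(top)
        best = top_score
    return parents, best


# Hypothetical toy data: B copies A, while C is unrelated to both.
data = [{"A": a, "B": a, "C": c} for a in (0, 1) for c in (0, 1)] * 2
parents_of_b, score_of_b = greedy_parents(data, "B", ["A", "C"], arity=2)
```

Working in log space with `lgamma` avoids overflow from the factorials in the K2 formula when counts are large.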


2003 ◽  
Vol 13 (05) ◽  
pp. 1263-1274 ◽  
Author(s):  
TOMOMICHI NAKAMURA ◽  
KEVIN JUDD ◽  
ALISTAIR MEES

Many models of the dynamics of nonlinear time series have large numbers of parameters and tend to overfit. This paper discusses algorithms for selecting the best basis functions from a dictionary for a model of a time series. Selecting the optimal subset of basis functions is typically an NP-hard problem, which usually has to be solved by heuristic methods. In this paper, we propose a new heuristic that is a refinement of a previous one. We demonstrate it with applications to artificial and real data. The results indicate that the proposed method is able to obtain better models in most cases.


2019 ◽  
Author(s):  
Boris Almonacid

The problem of cell formation is an NP-hard problem, which consists of organising a group of machines and pieces into several cells. The machines are arranged in a fixed way inside the cells, and each machine performs some manufacturing operation that applies to different pieces or parts. The aim is to minimise the movements made by the pieces to reach the machines in the cells. For this problem, a data set has been organised using three manufacturing cells. Using this data set, an experiment has been carried out that focuses on obtaining the best solution using a global search solution within 6 days for each instance. The experiments obtained the general optimum value for a set of test instances.


2018 ◽  
Author(s):  
Nicola F. Müller ◽  
Huw A. Ogilvie ◽  
Chi Zhang ◽  
Alexei Drummond ◽  
Tanja Stadler

When populations become isolated, members of these populations can diverge genetically over time. This leads to genetic differences between individuals of these populations that increase over time if the isolation persists. This process can be counteracted when genes are exchanged between populations. In order to study speciation processes when gene flow is present, isolation-with-migration methods have been developed. These methods typically assume that the ranked topology of the species history is already known. However, this is often not the case, and the species tree is therefore of interest itself. Inferring it is currently possible only when assuming no gene flow. This assumption can lead to wrongly inferred speciation times and species tree topologies. Building on a recently introduced structured coalescent approach, we introduce a new method that allows inference of the species tree while explicitly modelling the flow of genes between coexisting species. By using Markov chain Monte Carlo sampling, we co-infer the species tree alongside evolutionary parameters of interest. By using simulations, we show that our newly introduced approach is able to reliably infer the species trees and parameters of the isolation-with-migration model from genetic sequence data. We then infer the species history of six great ape species, including gene flow after population isolation. By using this dataset, we are able to show that our new method is able to infer the correct species tree not only on simulated data but also on a real data set where the species history has already been well studied. In line with previous results, we find some support for gene flow between bonobos and common chimpanzees.


2019 ◽  
Vol XVI (2) ◽  
pp. 1-11
Author(s):  
Farrukh Jamal ◽  
Hesham Mohammed Reyad ◽  
Soha Othman Ahmed ◽  
Muhammad Akbar Ali Shah ◽  
Emrah Altun

A new three-parameter continuous model called the exponentiated half-logistic Lomax distribution is introduced in this paper. Basic mathematical properties of the proposed model are investigated, including raw and incomplete moments, skewness, kurtosis, generating functions, Rényi entropy, Lorenz, Bonferroni and Zenga curves, probability weighted moments, the stress-strength model, order statistics, and record statistics. The model parameters are estimated by maximum likelihood, and the behaviour of these estimates is examined in a simulation study. The applicability of the new model is illustrated by applying it to a real data set.


Author(s):  
Parisa Torkaman

The generalized inverted exponential distribution is introduced as a lifetime model with good statistical properties. In this paper, the estimation of the probability density function and the cumulative distribution function of this distribution is considered using five different estimation methods: uniformly minimum variance unbiased (UMVU), maximum likelihood (ML), least squares (LS), weighted least squares (WLS), and percentile (PC) estimators. The performance of these estimation procedures is compared by numerical simulations, based on the mean squared error (MSE). The simulation studies show that the UMVU estimator performs better than the others and that, when the sample size is large enough, the ML and UMVU estimators are almost equivalent and more efficient than LS, WLS, and PC. Finally, the results for a real data set are analyzed.


2019 ◽  
Vol 14 (2) ◽  
pp. 148-156
Author(s):  
Nighat Noureen ◽  
Sahar Fazal ◽  
Muhammad Abdul Qadir ◽  
Muhammad Tanvir Afzal

Background: Specific combinations of Histone Modifications (HMs) contributing to the histone code hypothesis lead to various biological functions. HM combinations have been utilized by various studies to divide the genome into different regions. These regions have been classified as chromatin states. Mostly, Hidden Markov Model (HMM) based techniques have been utilized for this purpose. In chromatin studies, data from Next Generation Sequencing (NGS) platforms are used. Chromatin states based on histone modification combinatorics are annotated by mapping them to functional regions of the genome. The numbers of states predicted so far by HMM tools have been justified biologically. Objective: The present study aimed at providing a computational scheme to identify the underlying hidden states in the data under consideration. Methods: We proposed a computational scheme, HCVS, based on a hierarchical clustering and visualization strategy in order to achieve the objective of the study. Results: We tested our proposed scheme on a real data set of nine cell types comprising nine chromatin marks. The approach successfully identified the state numbers for various possibilities. The results were also compared with one of the existing models and showed quite good correlation. Conclusion: The HCVS model not only helps in deciding the optimal state numbers for a particular data set but also justifies the results biologically, thereby correlating the computational and biological aspects.


2021 ◽  
Vol 13 (9) ◽  
pp. 1703
Author(s):  
He Yan ◽  
Chao Chen ◽  
Guodong Jin ◽  
Jindong Zhang ◽  
Xudong Wang ◽  
...  

The traditional method of constant false-alarm rate detection is based on the assumption of an echo statistical model. Against a background of sea clutter and other interference, its target recognition accuracy is very low and its false-alarm rate is high. Therefore, computer vision technology is widely discussed as a way to improve detection performance. However, the majority of studies have focused on synthetic aperture radar because of its high resolution. For defense radar, the detection performance is not satisfactory because of its low resolution. To this end, we propose a novel target detection method for coastal defense radar based on the faster region-based convolutional neural network (Faster R-CNN). The main processing steps are as follows: (1) the Faster R-CNN is selected as the sea-surface target detector because of its high target detection accuracy; (2) a modified Faster R-CNN based on the characteristics of sparsity and small target size in the data set is employed; and (3) soft non-maximum suppression is exploited to eliminate possibly overlapping detection boxes. Furthermore, detailed comparative experiments based on a real data set of coastal defense radar are performed. The mean average precision of the proposed method is improved by 10.86% compared with that of the original Faster R-CNN.
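
Step (3) can be illustrated with a minimal sketch of soft non-maximum suppression. Instead of deleting boxes that overlap a higher-scoring box, it decays their scores. The Gaussian decay, the `sigma` value, and the score threshold below are assumptions chosen for illustration; the abstract does not state which variant or parameters the authors used.

```python
from math import exp


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian soft-NMS: decay scores of overlapping boxes instead of deleting them.

    Returns a list of (box, score) pairs in the order they were selected.
    """
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        # Select the box with the highest current score.
        best_index = max(range(len(remaining)), key=lambda i: remaining[i][1])
        best_box, best_score = remaining.pop(best_index)
        kept.append((best_box, best_score))
        # Down-weight the rest by exp(-IoU^2 / sigma); drop near-zero scores.
        rescored = []
        for box, score in remaining:
            score *= exp(-(iou(best_box, box) ** 2) / sigma)
            if score >= score_threshold:
                rescored.append((box, score))
        remaining = rescored
    return kept


# Hypothetical detections: two overlapping boxes and one isolated box.
detections = soft_nms(
    [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)],
    [0.9, 0.8, 0.7],
)
```

With these inputs the overlapping second box is kept with a reduced score rather than removed, which is the behaviour that distinguishes soft-NMS from hard NMS when targets lie close together.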

