A fast method for statistical grammar induction

1998 ◽  
Vol 4 (3) ◽  
pp. 191-209 ◽  
Author(s):  
WIDE R. HOGENHOUT ◽  
YUJI MATSUMOTO

The statistical induction of stochastic context-free grammars from bracketed corpora with the Inside-Outside algorithm is an appealing method for grammar learning, but the computational complexity of this algorithm has made it impossible to generate a large-scale grammar. Researchers in natural language processing and speech recognition have suggested various methods to reduce the computational complexity and, at the same time, guide the learning algorithm towards a solution, for example by placing constraints on the grammar. We suggest a method that strongly reduces the computational cost of the algorithm without placing constraints on the grammar. This method can in principle be combined with any of the constraints on grammars that have been suggested in earlier studies. We show that it is feasible to achieve results equivalent to earlier research, but with much lower computational effort. After creating a small grammar, the grammar is incrementally enlarged while rules that have become obsolete are removed at the same time. We explain the modifications to the algorithm, give results of experiments, and compare these to results reported in other publications.
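
To illustrate the incremental pruning step described above, here is a minimal sketch in Python of one way to drop rules that have become obsolete and renormalize the grammar. The probability threshold and rule representation are assumptions for illustration, not the criterion used in the paper.

```python
from collections import defaultdict

def prune_and_renormalize(rule_probs, threshold=1e-4):
    """Drop rules whose estimated probability has fallen below `threshold`
    (a stand-in for the paper's notion of an 'obsolete' rule) and
    renormalize the surviving rules per left-hand-side nonterminal.

    rule_probs: dict mapping (lhs, rhs) -> probability.
    """
    kept = {r: p for r, p in rule_probs.items() if p >= threshold}
    totals = defaultdict(float)
    for (lhs, _), p in kept.items():
        totals[lhs] += p
    return {(lhs, rhs): p / totals[lhs] for (lhs, rhs), p in kept.items()}

# Toy example: S -> A B dominates, so the tiny S -> B A rule is pruned
# and the probability mass of S is renormalized.
grammar = {("S", ("A", "B")): 0.9989, ("S", ("B", "A")): 0.0011,
           ("A", ("a",)): 1.0, ("B", ("b",)): 1.0}
grammar = prune_and_renormalize(grammar, threshold=0.01)
```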

2013 ◽  
Vol 2013 ◽  
pp. 1-8 ◽  
Author(s):  
Jengnan Tzeng

The singular value decomposition (SVD) is a fundamental matrix decomposition in linear algebra. It is widely applied in many modern techniques, for example, high-dimensional data visualization, dimension reduction, data mining, and latent semantic analysis. Although the SVD plays an essential role in these fields, its apparent weakness is its cubic computational cost, which makes many modern applications infeasible, especially when the scale of the data is huge and growing. It is therefore imperative to develop a fast SVD method for the modern era. If the rank of a matrix is much smaller than its size, some fast SVD approaches already exist. In this paper, we focus on this case, but with the additional condition that the data is too large to be stored in matrix form. We demonstrate that this fast SVD result is sufficiently accurate and, most importantly, that it can be obtained immediately. Using this fast method, many previously infeasible SVD-based techniques become viable.
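
As a rough illustration of the kind of fast, low-rank SVD the abstract refers to, the sketch below applies a standard randomized-projection SVD (in the style of Halko et al.) to a matrix delivered in row blocks, so the full matrix never has to be held in memory at once. It is not the specific algorithm of the paper, and the block generator is a made-up placeholder.

```python
import numpy as np

def randomized_svd_from_blocks(row_blocks, n_cols, rank, oversample=10, seed=0):
    """Approximate truncated SVD of a tall matrix delivered as row blocks."""
    rng = np.random.default_rng(seed)
    k = rank + oversample
    omega = rng.standard_normal((n_cols, k))
    # First pass: Y = A @ Omega, accumulated block by block.
    Y = np.vstack([block @ omega for block in row_blocks()])
    Q, _ = np.linalg.qr(Y)
    # Second pass: B = Q.T @ A, again block by block.
    B = np.zeros((k, n_cols))
    row = 0
    for block in row_blocks():
        B += Q[row:row + block.shape[0]].T @ block
        row += block.shape[0]
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :rank], s[:rank], Vt[:rank]

def row_blocks(m=10000, n=500, chunk=1000, seed=1):
    """Placeholder data source: yields row chunks of a low-rank matrix,
    as if streamed from disk."""
    rng = np.random.default_rng(seed)
    base = rng.standard_normal((n, 20))            # rank-20 structure
    for _ in range(0, m, chunk):
        yield rng.standard_normal((chunk, 20)) @ base.T

U, s, Vt = randomized_svd_from_blocks(row_blocks, n_cols=500, rank=20)
```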


In Service Oriented Architecture (SOA), web services play an important role. Web services are web application components that can be published, found, and used on the Web; they also enable machine-to-machine communication over a network. Cloud computing and distributed computing have brought a large number of web services onto the WWW. Web service composition is the process of combining two or more web services to satisfy user requirements. The tremendous increase in the number of services and the complexity of user requirement specifications make web service composition a challenging task. Automated service composition is a technique in which web service composition is performed with minimal or no human intervention. In this paper, we propose an approach to web service composition for large-scale environments that takes QoS parameters into account. We use stacked autoencoders to learn features of web services, and a Recurrent Neural Network (RNN) uses the learned features to predict new compositions. Experimental results show the efficiency and scalability of the approach; using deep learning for web service composition leads to a high success rate and low computational cost.
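
A minimal sketch of the two-stage idea, assuming placeholder QoS features, layer sizes, and tf.keras as the framework (not the authors' exact architecture): a stacked autoencoder compresses per-service QoS vectors, and an LSTM consumes sequences of those compressed features to score candidate next services.

```python
import numpy as np
import tensorflow as tf

n_services, qos_dim, code_dim, seq_len = 1000, 16, 8, 5
qos = np.random.rand(n_services, qos_dim).astype("float32")   # placeholder QoS data

# Stage 1: stacked autoencoder learns a compact feature vector per service.
encoder = tf.keras.Sequential([
    tf.keras.Input(shape=(qos_dim,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(code_dim, activation="relu"),
])
decoder = tf.keras.Sequential([
    tf.keras.Input(shape=(code_dim,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(qos_dim, activation="sigmoid"),
])
autoencoder = tf.keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(qos, qos, epochs=5, batch_size=64, verbose=0)

codes = encoder.predict(qos, verbose=0)            # learned service features

# Stage 2: an RNN over sequences of encoded services predicts the next
# service in a composition (framed as a classification over services);
# training data for compositions is omitted here.
rnn = tf.keras.Sequential([
    tf.keras.Input(shape=(seq_len, code_dim)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(n_services, activation="softmax"),
])
rnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```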


2007 ◽  
Vol 07 (02) ◽  
pp. 303-320
Author(s):  
MOHAMED ALI BEN AYED ◽  
AMINE SAMET ◽  
NOURI MASMOUDI

A merging procedure joining search pattern and variable block size motion estimation for H.264/AVC is proposed in this paper. The principal purpose of the proposed method is to reduce the computational complexity of the block-matching module. There are numerous contributions in the literature aimed at reducing the computational cost of motion estimation. The best solution from a qualitative point of view is the full search, which considers every possible candidate, but the computational effort required is enormous and makes motion estimation by far the most important computational bottleneck in video coding systems. Our approach exploits the center-biased characteristics of real-world video sequences, aiming to achieve acceptable image quality while reducing the computational complexity. Simulation results demonstrate that the proposal performs well.
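
To make the center-biased idea concrete, here is a small block-matching sketch that starts the search at the zero motion vector and shrinks the step size around the best candidate (three-step-search style). The paper's merged search pattern and variable block sizes are not reproduced here.

```python
import numpy as np

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return np.abs(block_a.astype(int) - block_b.astype(int)).sum()

def center_biased_search(cur, ref, x, y, bsize=16, max_step=8):
    """Center-biased block matching: start at motion vector (0, 0) and
    repeatedly halve the step size around the best candidate."""
    block = cur[y:y + bsize, x:x + bsize]
    best_mv, step = (0, 0), max_step
    best_cost = sad(block, ref[y:y + bsize, x:x + bsize])
    h, w = ref.shape
    while step >= 1:
        cx, cy = best_mv
        for dx, dy in [(0, 0), (step, 0), (-step, 0), (0, step), (0, -step),
                       (step, step), (step, -step), (-step, step), (-step, -step)]:
            mx, my = cx + dx, cy + dy
            rx, ry = x + mx, y + my
            if 0 <= rx <= w - bsize and 0 <= ry <= h - bsize:
                cost = sad(block, ref[ry:ry + bsize, rx:rx + bsize])
                if cost < best_cost:
                    best_cost, best_mv = cost, (mx, my)
        step //= 2
    return best_mv, best_cost
```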


1983 ◽  
Vol 105 (2) ◽  
pp. 242-248
Author(s):  
K. P. Lam

The optimal layout problem of allocating different types of rectangular shapes to a large rectangular sheet (also referred to as the two-dimensional knapsack problem) is tackled by a hierarchical approach using the concepts of quad-cut, guillotine-cut, and edge-cut with variable window sizes. The method can handle sheet defects and also allows important pieces to be specified at a fixed or variable location. In addition, the hierarchical approach has the flexibility of generating different layout patterns with little computational effort once the knapsack function for the largest window has been obtained. Although the method is suboptimal in the sense that it may not achieve the best possible result with minimum waste, extensive simulation indicates that it always gives good alternative solutions at reasonable computational cost; in contrast, the optimal solution for large-scale problems often requires computational effort beyond practical consideration.
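
The knapsack function mentioned above can be illustrated with the classic guillotine-cut recursion: the best value of a w-by-h window is either the best single piece that fits or the best split of the window by one horizontal or vertical guillotine cut. The sketch below uses made-up pieces and ignores the paper's handling of defects and fixed-position pieces.

```python
from functools import lru_cache

pieces = [(3, 2, 4), (5, 3, 11), (2, 2, 3)]   # (width, height, value), illustrative only

def knapsack_function(W, H):
    """Guillotine-cut knapsack function F(w, h) for integer window sizes."""
    @lru_cache(maxsize=None)
    def F(w, h):
        # Best single piece that fits in the w-by-h window (0 if none fits).
        best = max([v for pw, ph, v in pieces if pw <= w and ph <= h], default=0)
        for cut in range(1, w // 2 + 1):          # vertical guillotine cuts
            best = max(best, F(cut, h) + F(w - cut, h))
        for cut in range(1, h // 2 + 1):          # horizontal guillotine cuts
            best = max(best, F(w, cut) + F(w, h - cut))
        return best
    return F(W, H)

print(knapsack_function(10, 6))
```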


2010 ◽  
Vol 132 (3) ◽  
Author(s):  
F. Wei ◽  
G. T. Zheng

Direct time integration methods are usually applied to determine the dynamic response of systems with local nonlinearities. However, these methods are computationally expensive for predicting the steady-state response. To significantly reduce the computational effort, a new approach is proposed for the multiharmonic response analysis of dynamical systems with local nonlinearities. The approach is based on the describing function (DF) method and linear receptance data. With the DF method, the kinetic equations are converted into a set of complex algebraic equations. By using the linear receptance data, the dimension of the complex algebraic equations, which must be solved iteratively, is related only to the nonlinear degrees of freedom (DOFs). A cantilever beam with a local nonlinear element is presented to show the procedure and performance of the proposed approach. The approach can greatly reduce the size and computational cost of the problem and is thus applicable to large-scale systems with local nonlinearities.
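
As a toy illustration of the receptance-based describing-function idea, the sketch below solves the first-harmonic response of a single cubic-stiffness nonlinearity attached to a linear system known only through its receptance. The single-DOF receptance, the forcing values, and the use of a generic root solver are simplifying assumptions, not the paper's formulation.

```python
import numpy as np
from scipy.optimize import fsolve

def receptance(w, m=1.0, c=0.05, k=1.0):
    """Linear receptance at the nonlinear DOF (a single-DOF stand-in for
    measured or computed receptance data of a large model)."""
    return 1.0 / (k - m * w**2 + 1j * c * w)

def first_harmonic_amplitude(w, f0=0.2, k3=0.5):
    """Solve the complex algebraic equation X = H(w) * (f0 - keq(|X|) * X),
    where keq = 0.75 * k3 * |X|**2 is the describing function of a cubic
    spring.  Only the nonlinear DOF appears as an unknown."""
    H = receptance(w)

    def residual(z):
        X = z[0] + 1j * z[1]
        keq = 0.75 * k3 * abs(X) ** 2
        r = X - H * (f0 - keq * X)
        return [r.real, r.imag]

    x0 = H * f0                      # start from the linear response; note that
    sol = fsolve(residual, [x0.real, x0.imag])  # near resonance several branches
    return abs(sol[0] + 1j * sol[1])            # exist and the result depends on x0

amps = [first_harmonic_amplitude(w) for w in np.linspace(0.5, 1.5, 11)]
```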


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-15 ◽  
Author(s):  
Jaesung Lee ◽  
Dae-Won Kim

The data-driven management of real-life systems based on a trained model, which in turn is built from the data gathered during daily usage, has attracted considerable attention because it enables scalable control of large-scale and complex systems. To obtain a model at an acceptable computational cost, restricted by practical constraints, the learning algorithm may need to identify essential data that carries important knowledge on the relation between the observed features representing the measurement values and the labels encoding the multiple target concepts. This results in an increased computational burden owing to the concurrent learning of multiple labels. A straightforward approach to this issue is feature selection; however, it may be insufficient to satisfy the practical constraints, because the computational cost of feature selection itself can be impractical when the number of labels is large. In this study, we propose an efficient multilabel feature selection method to achieve scalable multilabel learning when the number of labels is large. Empirical experiments on several multilabel datasets show that the multilabel learning process can be boosted without deteriorating the discriminating power of the multilabel classifier.
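
One way to picture a label-scalable feature scoring step, which is not the authors' actual criterion, is to sum each feature's mutual information over a random subsample of labels so that the cost does not grow with the full label set. A sketch with synthetic data follows.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def multilabel_feature_scores(X, Y, n_label_sample=20, seed=0):
    """Score each feature by its summed mutual information with a random
    subsample of labels; subsampling the label set keeps the cost roughly
    constant as the number of labels grows (illustration only)."""
    rng = np.random.default_rng(seed)
    picked = rng.choice(Y.shape[1], size=min(n_label_sample, Y.shape[1]),
                        replace=False)
    scores = np.zeros(X.shape[1])
    for j in picked:
        scores += mutual_info_classif(X, Y[:, j], random_state=seed)
    return scores

# Synthetic data: 200 samples, 50 features, 100 sparse binary labels.
X = np.random.rand(200, 50)
Y = (np.random.rand(200, 100) > 0.9).astype(int)
top_k = np.argsort(multilabel_feature_scores(X, Y))[::-1][:10]  # keep top-10 features
```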


2021 ◽  
Author(s):  
Min Chen

Deep learning (DL) techniques, more specifically Convolutional Neural Networks (CNNs), have become increasingly popular in advancing the field of data science and have had great success in a wide array of applications, including computer vision, speech, and natural language processing. However, the training process of CNNs is computationally intensive and incurs a high computational cost, especially when the dataset is huge. To overcome these obstacles, this paper takes advantage of distributed frameworks and cloud computing to develop a parallel CNN algorithm. MapReduce is a scalable and fault-tolerant data processing tool that was developed to provide significant improvements in large-scale data-intensive applications on clusters. A MapReduce-based CNN (MCNN) is developed in this work to tackle the task of image classification. In addition, the proposed MCNN adds dropout layers to the networks to address the overfitting problem. The implementation of MCNN, as well as how the proposed algorithm accelerates learning, is examined and demonstrated through experiments. Results reveal high classification accuracy and significant improvements in speedup, scaleup, and sizeup compared to the standard algorithms.
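
Below is a framework-free sketch of the map/reduce training pattern this kind of approach builds on: each "mapper" computes gradients of the model on its own data shard, and a "reducer" averages them into one update. A tiny linear model stands in for the CNN, and plain Python functions stand in for Hadoop mappers and reducers; this is an illustration of the pattern, not the paper's MCNN.

```python
import numpy as np

def map_gradients(weights, shard_x, shard_y):
    """Mapper: gradient of the squared loss on one data shard."""
    preds = shard_x @ weights
    return shard_x.T @ (preds - shard_y) / len(shard_y)

def reduce_gradients(grads):
    """Reducer: average the per-shard gradients."""
    return np.mean(grads, axis=0)

rng = np.random.default_rng(0)
X, true_w = rng.standard_normal((1200, 10)), rng.standard_normal(10)
y = X @ true_w
weights = np.zeros(10)

shards = list(zip(np.array_split(X, 4), np.array_split(y, 4)))   # 4 "nodes"
for epoch in range(100):
    grads = [map_gradients(weights, sx, sy) for sx, sy in shards]   # map phase
    weights -= 0.1 * reduce_gradients(grads)                        # reduce + update
```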


Sensors ◽  
2020 ◽  
Vol 20 (9) ◽  
pp. 2451 ◽  
Author(s):  
Jin Wang ◽  
Yangning Tang ◽  
Shiming He ◽  
Changqing Zhao ◽  
Pradip Kumar Sharma ◽  
...  

Log anomaly detection is an efficient method to manage modern large-scale Internet of Things (IoT) systems. More and more works apply natural language processing (NLP) methods, in particular word2vec, to log feature extraction. Word2vec can extract the relevance between words and vectorize them, but the computational cost of training word2vec is high. Moreover, anomalies in logs depend not only on an individual log message but also on the log message sequence. Therefore, the word vectors from word2vec cannot be used directly; they need to be transformed into vectors of log events and further into vectors of log sequences. To reduce the computational cost and avoid multiple transformations, we propose in this paper an offline feature extraction model, named LogEvent2vec, which takes log events as the input of word2vec so as to extract the relevance between log events and vectorize them directly. LogEvent2vec can work with any coordinate transformation method and anomaly detection model. After obtaining the log event vectors, we transform them into log sequence vectors by bary or tf-idf, and train three kinds of supervised models (Random Forests, Naive Bayes, and Neural Networks) to detect anomalies. We have conducted extensive experiments on a real public log dataset from BlueGene/L (BGL). The experimental results demonstrate that LogEvent2vec can reduce computational time by a factor of 30 and improve accuracy compared with word2vec. LogEvent2vec with bary and Random Forests achieves the best F1-score, while LogEvent2vec with tf-idf and Naive Bayes needs the least computational time.
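
A minimal sketch of this kind of pipeline with toy data: log events (template IDs) are fed to word2vec as tokens, each log sequence is reduced to the barycenter ("bary") of its event vectors, and a Random Forest classifies the sequences. The event IDs, labels, and hyperparameters below are invented for illustration.

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.ensemble import RandomForestClassifier

# Toy corpus: each "sentence" is a sequence of log event IDs, with a label
# marking whether that sequence is anomalous.
sequences = [["E1", "E2", "E3"], ["E1", "E2", "E4"], ["E5", "E5", "E3"],
             ["E1", "E3", "E2"], ["E5", "E4", "E5"], ["E2", "E1", "E3"]]
labels = [0, 0, 1, 0, 1, 0]

# Word2vec over log events (rather than words) vectorizes events directly.
w2v = Word2Vec(sentences=sequences, vector_size=8, window=2, min_count=1, seed=0)

def bary(seq):
    """Sequence vector = barycenter (mean) of its event vectors."""
    return np.mean([w2v.wv[e] for e in seq], axis=0)

X = np.vstack([bary(s) for s in sequences])
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, labels)
```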


2011 ◽  
Vol 19 (4) ◽  
pp. 525-560 ◽  
Author(s):  
Rajan Filomeno Coelho ◽  
Philippe Bouillard

This paper addresses continuous optimization problems with multiple objectives and parameter uncertainty defined by probability distributions. First, a reliability-based formulation is proposed, defining the nondeterministic Pareto set as the minimal solutions such that user-defined probabilities of nondominance and constraint satisfaction are guaranteed. The formulation can be incorporated with minor modifications into a multiobjective evolutionary algorithm (here, the nondominated sorting genetic algorithm-II). Then, in the perspective of applying the method to large-scale structural engineering problems, for which the computational effort devoted to the optimization algorithm itself is negligible in comparison with the simulation, the second part of the study addresses the need to reduce the number of function evaluations while avoiding modification of the simulation code. Therefore, nonintrusive stochastic metamodels are developed in two steps. First, for a given sampling of the deterministic variables, a preliminary decomposition of the random responses (objectives and constraints) is performed through polynomial chaos expansion (PCE), allowing a representation of the responses by a limited set of coefficients. Then, a metamodel is built by kriging interpolation of the PCE coefficients with respect to the deterministic variables. The method has been tested successfully on seven analytical test cases and on the 10-bar truss benchmark, demonstrating the potential of the proposed approach to provide reliability-based Pareto solutions at a reasonable computational cost.
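
The two-step metamodel can be sketched on a toy one-dimensional response: first, for each sampled design, the random response is regressed onto a Hermite polynomial-chaos basis in a standard normal variable; then each PCE coefficient is interpolated over the design variable with a Gaussian-process (kriging-style) model. The response function, sample sizes, and use of scikit-learn's GaussianProcessRegressor are placeholders, not the paper's implementation.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermevander
from sklearn.gaussian_process import GaussianProcessRegressor

def toy_response(x, xi):                 # stands in for the simulation code
    return np.sin(3 * x) + (0.2 + 0.1 * x) * xi + 0.05 * (xi**2 - 1)

degree, rng = 2, np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 8)       # sampled deterministic designs

coeffs = []
for x in x_train:                        # step 1: PCE fit per design point
    xi = rng.standard_normal(200)
    basis = hermevander(xi, degree)      # [He0, He1, He2](xi)
    c, *_ = np.linalg.lstsq(basis, toy_response(x, xi), rcond=None)
    coeffs.append(c)
coeffs = np.array(coeffs)                # shape (n_designs, degree + 1)

# Step 2: one kriging-style model per PCE coefficient, over the design variable.
gps = [GaussianProcessRegressor(normalize_y=True).fit(x_train[:, None], coeffs[:, i])
       for i in range(degree + 1)]

def predict_response(x_new, xi):
    """Rebuild the random response at an unsampled design x_new."""
    c = np.array([gp.predict(np.atleast_2d(x_new))[0] for gp in gps])
    return hermevander(np.atleast_1d(xi), degree) @ c
```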


2021 ◽  
Vol 17 (3) ◽  
pp. 1-27
Author(s):  
Meng Liu ◽  
Hongsheng Hu ◽  
Haolong Xiang ◽  
Chi Yang ◽  
Lingjuan Lyu ◽  
...  

Recently, biometric identification has been extensively used for border control, and some face recognition systems have been designed based on the Internet of Things. However, the rich personal information contained in face images can cause severe privacy breach and abuse issues during identification if a biometric system is compromised by insiders or external security attacks. Encrypting the query face image is the state-of-the-art solution to protect an individual's privacy, but it incurs a huge computational cost and poses a major challenge for time-critical identification applications. Owing to their high computational complexity, existing methods fail to handle large-scale biometric repositories in which a target face is searched. In this article, we propose an efficient privacy-preserving face recognition scheme based on clustering. Concretely, our approach matches an encrypted face query against clustered faces in the repository to save computational cost while guaranteeing identification accuracy via a novel multi-matching scheme. To the best of our knowledge, our scheme is the first to reduce the computational complexity from O(M) in existing methods to approximately O(√M), where M is the size of the face repository. Extensive experiments on real-world datasets show the effectiveness and efficiency of our scheme.
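
A plaintext sketch of the complexity argument (the encryption layer of the actual scheme is omitted and the face embeddings are random placeholders): clustering the repository into roughly √M groups means a query is compared against about √M centroids plus the members of the closest cluster(s), instead of all M faces.

```python
import numpy as np
from sklearn.cluster import KMeans

M, dim = 10000, 128
rng = np.random.default_rng(0)
gallery = rng.standard_normal((M, dim)).astype("float32")   # placeholder embeddings

k = int(np.sqrt(M))                              # ~sqrt(M) clusters
km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(gallery)

def identify(query, top_clusters=2):
    """Two-stage match: nearest centroids first, then only their members."""
    d_centroids = np.linalg.norm(km.cluster_centers_ - query, axis=1)
    candidates = np.flatnonzero(np.isin(km.labels_,
                                        np.argsort(d_centroids)[:top_clusters]))
    d_faces = np.linalg.norm(gallery[candidates] - query, axis=1)
    return candidates[np.argmin(d_faces)]        # index of the best match

query = gallery[1234] + 0.01 * rng.standard_normal(dim)
print(identify(query))                           # expected: 1234
```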

