Improved Probabilistic Context-Free Grammars for Passwords Using Word Extraction

Author(s):  
Haibo Cheng ◽  
Wenting Li ◽  
Ping Wang ◽  
Kaitai Liang

2007 ◽
Vol 33 (4) ◽  
pp. 477-491 ◽  
Author(s):  
Noah A. Smith ◽  
Mark Johnson

This article studies the relationship between weighted context-free grammars (WCFGs), where each production is associated with a positive real-valued weight, and probabilistic context-free grammars (PCFGs), where the weights of the productions associated with a nonterminal are constrained to sum to one. Because the class of WCFGs properly includes the PCFGs, one might expect that WCFGs can describe distributions that PCFGs cannot. However, Z. Chi (1999, Computational Linguistics, 25(1):131–160) and S. P. Abney, D. A. McAllester, and F. Pereira (1999, In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pages 542–549, College Park, MD) proved that every WCFG distribution is equivalent to some PCFG distribution. We extend their results to conditional distributions, and show that every WCFG conditional distribution of parses given strings is also the conditional distribution defined by some PCFG, even when the WCFG's partition function diverges. This shows that any parsing or labeling accuracy improvement from conditional estimation of WCFGs or conditional random fields (CRFs) over joint estimation of PCFGs or hidden Markov models (HMMs) is due to the estimation procedure rather than the change in model class, because PCFGs and HMMs are exactly as expressive as WCFGs and chain-structured CRFs, respectively.
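The convergent case of the renormalization cited above (Chi, 1999) can be sketched concretely: compute a partition function Z(A) for each nonterminal by fixed-point iteration, then rescale each rule weight by the Z-values of its right-hand side. The toy grammar and weights below are hypothetical, and the sketch assumes Z converges (the article's conditional-distribution result also covers the divergent case, which this sketch does not).

```python
# Toy WCFG: rules per nonterminal as (right-hand side, positive weight).
# Grammar and weights are illustrative only.
weights = {
    "S": [(("S", "S"), 0.2),   # S -> S S
          (("a",), 1.0)],      # S -> a   ('a' is a terminal)
}
nonterminals = set(weights)

def rhs_score(beta, Z):
    """Product of current partition-function values of nonterminals in beta."""
    score = 1.0
    for sym in beta:
        if sym in nonterminals:
            score *= Z[sym]
    return score

# Fixed-point iteration for Z(A) = sum_{A->beta} w(A->beta) * prod_B Z(B),
# assuming convergence for this choice of weights.
Z = {A: 0.0 for A in nonterminals}
for _ in range(200):
    Z = {A: sum(w * rhs_score(beta, Z) for beta, w in rules)
         for A, rules in weights.items()}

# Chi's renormalization: p(A -> beta) = w(A -> beta) * prod_B Z(B) / Z(A).
pcfg = {A: [(beta, w * rhs_score(beta, Z) / Z[A]) for beta, w in rules]
        for A, rules in weights.items()}

for A, rules in pcfg.items():
    print(A, rules, "sum =", sum(p for _, p in rules))
```

The renormalized rule probabilities for each nonterminal sum to one, and the resulting PCFG assigns every parse tree the same probability (up to the global constant Z(S)) as the original WCFG.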


1983 ◽  
Vol 6 (2) ◽  
pp. 403-407 ◽  
Author(s):  
R. Chaudhuri ◽  
A. N. V. Rao

It is proved that for a probabilistic context-free language L(G), the population density of a character (terminal symbol) is equal to its relative density in the words of a sample S from L(G) whenever the production probabilities of the grammar G are estimated by the relative frequencies of the corresponding productions in the sample.
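The result can be checked by hand on a hypothetical toy grammar S → aS | b, where a word aᵏb uses S → aS exactly k times and S → b once, so production counts can be read directly off a sample. The sketch below estimates the probabilities by relative frequency, computes the population density of 'a' from the expected-count recurrences, and compares it with the sample density:

```python
from fractions import Fraction

# Hypothetical sample of words from the toy grammar S -> a S | b.
sample = ["aab", "b", "ab"]

uses_aS = sum(w.count("a") for w in sample)  # uses of S -> a S
uses_b = len(sample)                         # uses of S -> b (once per word)
p_aS = Fraction(uses_aS, uses_aS + uses_b)   # relative-frequency estimate

# Under the estimated PCFG the recurrences
#   E[#a]     = p_aS * (1 + E[#a])
#   E[length] = p_aS * (1 + E[length]) + (1 - p_aS) * 1
# solve to:
expected_a = p_aS / (1 - p_aS)
expected_len = 1 / (1 - p_aS)
population_density = expected_a / expected_len

sample_density = Fraction(sum(w.count("a") for w in sample),
                          sum(len(w) for w in sample))
print(population_density, sample_density)  # equal, as the theorem predicts
```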


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6559 ◽  
Author(s):  
Witold Dyrka ◽  
Mateusz Pyzik ◽  
François Coste ◽  
Hugo Talibart

Interactions between amino acids that are close in the spatial structure, but not necessarily in the sequence, play important structural and functional roles in proteins. These non-local interactions ought to be taken into account when modeling collections of proteins. Yet the most popular representations of sets of related protein sequences remain the profile Hidden Markov Models. By modeling independently the distributions of the conserved columns from an underlying multiple sequence alignment of the proteins, these models are unable to capture dependencies between the protein residues. Non-local interactions can be represented by using more expressive grammatical models. However, learning such grammars is difficult. In this work, we propose to use information on protein contacts to facilitate the training of probabilistic context-free grammars representing families of protein sequences. We develop the theory behind the introduction of contact constraints in maximum-likelihood and contrastive estimation schemes and implement it in a machine learning framework for protein grammars. The proposed framework is tested on samples of protein motifs in comparison with learning without contact constraints. The evaluation shows high fidelity of grammatical descriptors to protein structures and improved precision in recognizing sequences. Finally, we present an example of using our method in a practical setting and demonstrate its potential beyond the current state of the art by creating a grammatical model of a meta-family of protein motifs. We conclude that the current piece of research is a significant step towards more flexible and accurate modeling of collections of protein sequences. The software package is made available to the community.


Author(s):  
Ayesha Khatun ◽  
Khadiza Tul Kobra Happy ◽  
Babe Sultana ◽  
Jahidul Islam ◽  
Sumaiya Kabir

A parsing technique based on grammar rules together with rule probabilities is called stochastic parsing. This paper proposes a probabilistic method to eliminate ambiguity from Bangla sentences. Binarization is applied to increase the precision of the parsing, and the CYK algorithm is used. The work focuses mainly on intonation-based sentences, for which an approach based on PCFGs (Probabilistic Context-Free Grammars) is proposed. About 30,324 words were used to test the proposed system, achieving an average accuracy of 93%. GUB JOURNAL OF SCIENCE AND ENGINEERING, Vol 7, Dec 2020, pp. 51-56
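The core machinery described here, CYK over a binarized PCFG, can be sketched as a Viterbi-style chart parser. The grammar, probabilities, and English sentence below are invented for illustration (the paper's Bangla grammar is not available); the sketch assumes the grammar is already in Chomsky normal form, which is what binarization guarantees:

```python
from collections import defaultdict

# Toy CNF grammar, illustrative only.
lexicon = {"NP": {"she": 0.4, "fish": 0.6},  # unary A -> terminal rules
           "V": {"eats": 1.0}}
binary = [("S", "NP", "VP", 1.0),            # binary A -> B C rules
          ("VP", "V", "NP", 1.0)]

def viterbi_cyk(words):
    n = len(words)
    best = defaultdict(dict)  # best[(i, j)][A] = max prob. of A over words[i:j]
    for i, w in enumerate(words):                  # fill length-1 spans
        for A, probs in lexicon.items():
            if w in probs:
                best[(i, i + 1)][A] = probs[w]
    for span in range(2, n + 1):                   # longer spans, bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):              # split point
                for A, B, C, p in binary:
                    if B in best[(i, k)] and C in best[(k, j)]:
                        cand = p * best[(i, k)][B] * best[(k, j)][C]
                        if cand > best[(i, j)].get(A, 0.0):
                            best[(i, j)][A] = cand
    return best[(0, n)].get("S", 0.0)

prob = viterbi_cyk("she eats fish".split())  # 0.4 * 1.0 * 0.6
print(prob)
```

Returning the maximum probability of an S over the full span resolves ambiguity by preferring the most probable parse, which is the disambiguation role the PCFG plays in the paper.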

