Generating Dynamic Box by Using an Input String

2011 ◽  
Vol 37 (4) ◽  
pp. 867-879
Author(s):  
Jia-Jia Liu ◽  
Yi-Li Huang ◽  
Fang-Yie Leu ◽  
Xing-You Pan ◽  
Li-Ren Chen
Author(s):  
Mark-Jan Nederhof ◽  
Giorgio Satta

Bilexical context-free grammars (2-LCFGs) have proved to be accurate models for statistical natural language parsing. Existing dynamic programming algorithms used to parse sentences under these models run in time O(|w|⁴), where w is the input string. A 2-LCFG is splittable if the left arguments of a lexical head are always independent of the right arguments, and vice versa. When a 2-LCFG is splittable, parsing time can be asymptotically improved to O(|w|³). Testing this property is therefore of central interest for parsing efficiency. In this article, however, we show the negative result that splittability of 2-LCFGs is undecidable.


2020 ◽  
Vol 21 (2) ◽  
pp. 153-163
Author(s):  
Nor Farahidah Za'bah ◽  
Ahmad Amierul Ashraf Muhammad Nazmi ◽  
Amelia Wong Azman

Segmentation is an important aspect of translating the finger spelling of sign language into Latin alphabets. Although currently available sign language devices can translate finger spelling into alphabets, their output is stored as one long continuous string without spaces between words. The system proposed in this work is meant to be used together with a text-generating glove device. It takes a text input string, fed in one character at a time, and segments it into words that are semantically correct. The proposed text segmentation method uses dynamic programming and a back-off algorithm, together with a probability score based on word matching against an English-language text corpus. Based on the results, the system is able to segment words with acceptable accuracy.
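The dictionary-driven dynamic programming at the heart of such a segmenter can be sketched as follows. The tiny word list is a hypothetical stand-in for the English corpus, and the back-off step and probability scoring described in the abstract are omitted; this only illustrates how a continuous character stream is split into known words.

```python
# Minimal sketch of dictionary-based word segmentation by dynamic programming.
# WORDS is an illustrative stand-in for a real English corpus.
WORDS = {"hello", "world", "how", "are", "you", "a"}

def segment(text):
    """Return one valid segmentation of `text` into known words, or None."""
    n = len(text)
    # best[i] holds a segmentation of text[:i] as a list of words, or None.
    best = [None] * (n + 1)
    best[0] = []
    for i in range(1, n + 1):
        for j in range(max(0, i - 12), i):       # cap candidate word length at 12
            word = text[j:i]
            if best[j] is not None and word in WORDS:
                best[i] = best[j] + [word]
                break
    return best[n]

print(segment("helloworldhowareyou"))  # ['hello', 'world', 'how', 'are', 'you']
print(segment("xyz"))                  # None (no segmentation exists)
```

A real system would rank the alternative segmentations by corpus probability instead of taking the first match found.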


2021 ◽  
Vol 24 (3) ◽  
Author(s):  
Elton Cardoso ◽  
Maycon Amaro ◽  
Samuel Feitosa ◽  
Leonardo Reis ◽  
André Du Bois ◽  
...  

We describe the formalization of Brzozowski and Antimirov derivative-based algorithms for regular expression parsing in the dependently typed language Agda. The formalization produces a proof that an input string either matches a given regular expression or that no match exists. Using the certified algorithms, we developed a tool for regular-expression-based search in the style of the well-known GNU grep, and we report practical experiments conducted with this tool.
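The Brzozowski derivative, the first of the two derivative notions the formalization covers, admits a compact sketch. The class names below are illustrative, not the Agda formalization's: matching proceeds by differentiating the expression by each input character and checking nullability at the end.

```python
# Minimal sketch of regular-expression matching via Brzozowski derivatives.
class Empty: pass                    # matches nothing
class Eps:   pass                    # matches only the empty string
class Chr:
    def __init__(self, c): self.c = c
class Cat:
    def __init__(self, l, r): self.l, self.r = l, r
class Alt:
    def __init__(self, l, r): self.l, self.r = l, r
class Star:
    def __init__(self, r): self.r = r

def nullable(e):
    """Does e match the empty string?"""
    if isinstance(e, (Eps, Star)): return True
    if isinstance(e, Cat): return nullable(e.l) and nullable(e.r)
    if isinstance(e, Alt): return nullable(e.l) or nullable(e.r)
    return False                     # Empty, Chr

def deriv(e, c):
    """Brzozowski derivative of e with respect to character c."""
    if isinstance(e, Chr):
        return Eps() if e.c == c else Empty()
    if isinstance(e, Alt):
        return Alt(deriv(e.l, c), deriv(e.r, c))
    if isinstance(e, Cat):
        head = Cat(deriv(e.l, c), e.r)
        return Alt(head, deriv(e.r, c)) if nullable(e.l) else head
    if isinstance(e, Star):
        return Cat(deriv(e.r, c), e)
    return Empty()                   # Empty, Eps

def matches(e, s):
    for c in s:
        e = deriv(e, c)
    return nullable(e)

ab_star = Star(Alt(Chr('a'), Chr('b')))   # (a|b)*
print(matches(ab_star, "abba"))   # True
print(matches(ab_star, "abc"))    # False
```

The formalization's contribution is proving this procedure correct (match or refutation) in a dependently typed setting, which the sketch above does not attempt.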


2018 ◽  
Author(s):  
Andysah Putera Utama Siahaan

Compression aims to reduce data before storing it or moving it into storage media. Huffman and Elias Delta Code are the two algorithms used for the compression process in this research; both are applied to compress text files. The Huffman algorithm starts by sorting characters based on their frequency, then forms a binary tree, and ends with code formation; the binary tree is built from leaves to roots, an approach called bottom-up tree formation. The Elias Delta Code method, in contrast, uses a different technique. Text file compression is done by reading the input string in a text file and encoding the string using both algorithms. The compression results show that the Huffman algorithm performs better overall than Elias Delta Code.
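The Elias Delta method rests on the Elias delta code for positive integers: the bit length of the number is Elias-gamma-coded, followed by the number's remaining bits. A minimal encoder (illustrative, not the paper's implementation) looks like this:

```python
def elias_delta(n):
    """Elias delta code of a positive integer n, as a bit string."""
    assert n >= 1
    bits = bin(n)[2:]                # binary of n, always starts with '1'
    length = len(bits)               # bit length of n
    lbits = bin(length)[2:]
    # Elias gamma code of `length`: (len(lbits) - 1) zeros, then lbits,
    # followed by the bits of n after its leading '1'.
    return "0" * (len(lbits) - 1) + lbits + bits[1:]

print(elias_delta(1))    # '1'
print(elias_delta(2))    # '0100'
print(elias_delta(10))   # '00100010'
```

Note that unlike Huffman coding, the codeword depends only on the integer being encoded, not on symbol frequencies; for compression, characters would first be mapped to integers by frequency rank.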


2013 ◽  
Vol. 15 no. 2 (Discrete Algorithms) ◽  
Author(s):  
Pablo Barenbaum ◽  
Verónica Becher ◽  
Alejandro Deymonnaz ◽  
Melisa Halsband ◽  
Pablo Ariel Heiber

We consider two repeat-finding problems relative to sets of strings: (a) find the largest substrings that occur in every string of a given set; (b) find the maximal repeats in a given string that occur in no string of a given set. Our solutions are based on the suffix array construction, requiring O(m) memory, where m is the length of the longest input string, and O(n log m) time, where n is the whole input size (the sum of the lengths of the strings in the input). The most expensive part of our algorithms is the computation of several suffix arrays. We give an implementation and experimental results that demonstrate the efficiency of our algorithms in practice, even for very large inputs.
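Both repeat-finding problems build on the suffix array and its companion LCP array. A naive construction (sorting whole suffixes, which is far slower than the constructions the paper relies on) is enough to illustrate the two structures:

```python
# Naive illustration of the suffix-array machinery repeat finding builds on.
# Sorting whole suffixes costs O(n^2 log n); efficient algorithms do better.

def suffix_array(s):
    """Indices of s's suffixes in lexicographic order."""
    return sorted(range(len(s)), key=lambda i: s[i:])

def lcp_array(s, sa):
    """lcp[k] = length of the longest common prefix of the suffixes
    starting at sa[k-1] and sa[k]."""
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = s[sa[k - 1]:], s[sa[k]:]
        while lcp[k] < min(len(a), len(b)) and a[lcp[k]] == b[lcp[k]]:
            lcp[k] += 1
    return lcp

s = "banana"
sa = suffix_array(s)
print(sa)                 # [5, 3, 1, 0, 4, 2]
print(lcp_array(s, sa))   # [0, 1, 3, 0, 0, 2]
```

Repeated substrings show up as nonzero LCP entries (here the 3 corresponds to the repeat "ana"); the paper's algorithms extend this idea to several strings at once.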


2019 ◽  
Vol 56 (1) ◽  
pp. 3-43
Author(s):  
KAROLINA BROŚ

This paper examines opaque examples of phrase-level phonology taken from Chilean Spanish under the framework of Stratal Optimality Theory (OT) (Rubach 1997; Bermúdez-Otero 2003, 2019) and Harmonic Serialism (HS) (McCarthy 2008a, b, 2016). The data show an interesting double repair of the coda /s/ taking place at word edges. It is argued that Stratal OT is superior in modelling phonological processes that take place at the interface between morphology and phonology because it embraces cyclicity. Under this model, prosodic structure is built serially, level by level, and in accordance with the morphological structure of the input string. In this way, opacity at constituent edges can be solved. Stratal OT also provides insight into word-internal morphological structure and the domain-specificity of phonological processes. It is demonstrated that a distinction in this model is necessary between the word and the phrase levels, and between the stem and the word levels. As illustrated by the behaviour of Spanish nouns, affixation and the resultant alternations inform us about the domains to which both morphological and phonological processes should be assigned. Against this background, Harmonic Serialism embraces an apparently simpler recursive mechanism in which stepwise prosodic parsing can be incorporated. What is more, it offers insight into the nature of operations in OT, as well as into such problematic issues as structure building and directionality. Nevertheless, despite the model’s ability to solve various cases of opacity, the need to distinguish between two competing repairs makes HS fail when confronted with the Chilean data under examination.


2007 ◽  
Vol 18 (02) ◽  
pp. 401-414 ◽  
Author(s):  
JESPER JANSSON ◽  
ZESHAN PENG

The online squarefree recognition problem is to detect the first occurrence of a square in a string whose characters are provided as input one at a time. We present an efficient algorithm to solve this problem for strings over arbitrarily ordered alphabets in O(n log n) time, where n is the ending position of the first square. We also note that the same technique yields an O(n·(|Σₙ| + log n))-time algorithm for general alphabets, where |Σₙ| is the number of different symbols in the first n positions of the input string. (This is faster than the previously fastest method for general alphabets when |Σₙ| = o(log² n).) Finally, we present a simple algorithm for a dynamic version of the problem over general alphabets in which we are initially given a squarefree string, followed by a series of updates, and the objective is to determine after each update if the resulting string is still squarefree.
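The problem specification can be made concrete with a brute-force online checker: characters arrive one at a time, and we report the position at which the string first contains a square (a factor of the form xx). This quadratic sketch only illustrates the specification; the paper's algorithm achieves O(n log n).

```python
# Brute-force online square detection: any square appearing for the first
# time after reading character n must end at position n, so we only check
# suffixes of the current string.

def first_square_end(chars):
    """Return the 1-based position where the first square ends, or None."""
    s = ""
    for c in chars:
        s += c
        n = len(s)
        for half in range(1, n // 2 + 1):
            if s[n - 2 * half : n - half] == s[n - half : n]:
                return n
    return None

print(first_square_end("abcbc"))  # 5 (the square "bcbc" ends at position 5)
print(first_square_end("abcd"))   # None (squarefree)
```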


2000 ◽  
Vol 26 (1) ◽  
pp. 45-60 ◽  
Author(s):  
Hiyan Alshawi ◽  
Srinivas Bangalore ◽  
Shona Douglas

The paper defines weighted head transducers, finite-state machines that perform middle-out string transduction. These transducers are strictly more expressive than the special case of standard left-to-right finite-state transducers. Dependency transduction models are then defined as collections of weighted head transducers that are applied hierarchically. A dynamic programming search algorithm is described for finding the optimal transduction of an input string with respect to a dependency transduction model. A method for automatically training a dependency transduction model from a set of input-output example strings is presented. The method first searches for hierarchical alignments of the training examples guided by correlation statistics, and then constructs the transitions of head transducers that are consistent with these alignments. Experimental results are given for applying the training method to translation from English to Spanish and Japanese.


1986 ◽  
Vol 2 (1) ◽  
pp. 16-32 ◽  
Author(s):  
Helmut Zobl

This paper proposes a functional, parsing-based approach to the attainability of typological targets in L1 and L2 acquisition. Ideally, there should be a functional synchronization between the order in which principles constituting typological values emerge in learner grammars and the computational demands imposed by the simplest data instantiating a typological value. In L2 acquisition this functional synchronization is jeopardized by the possibility of L1-inspired misparses which may impute more structure to an input string than is consistent with a minimal parse. As a result, the more marked typological setting or the implicans of two grammatical principles in a relationship of logical entailment can appear first in interlanguage grammars. In L1 acquisition, on the other hand, misparses appear to be the result of assuming too little structure. Because of these differences, recovery from an inappropriate value follows different courses in L1 and L2 acquisition. It is proposed that this difference has important implications for the learnability of first and second languages.


2017 ◽  
Vol 28 (05) ◽  
pp. 603-621 ◽  
Author(s):  
Jorge Calvo-Zaragoza ◽  
Jose Oncina ◽  
Colin de la Higuera

In a number of fields, it is necessary to compare a witness string with a distribution. One possibility is to compute the probability of the string under that distribution. Another, giving a more global view, is to compute the expected edit distance from a randomly drawn string to the witness string. This number is often used to measure the performance of a prediction, the goal then being to return the median string, or the string with the smallest expected distance. To be able to measure this, it is necessary to compute the distance between a hypothesis and that distribution. This paper proposes two solutions for computing this value when the distribution is defined by a probabilistic finite state automaton. The first is exact but has a cost that can be exponential in the length of the input string, whereas the second is a fully polynomial-time randomized scheme.
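The quantity of interest is E[d(w, X)] for X drawn from a string distribution. As an illustration, the probabilistic automaton is replaced here by an explicit finite distribution (a hypothetical simplification; the paper's algorithms work directly on the automaton), with the standard Levenshtein dynamic program underneath:

```python
# Expected edit distance from a witness string to an explicitly given
# string distribution. The PFA of the paper is simplified to a dict here.

def edit_distance(a, b):
    """Standard Levenshtein distance by dynamic programming (two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution/match
        prev = cur
    return prev[-1]

def expected_distance(witness, dist):
    """E[d(witness, X)] for an explicit distribution {string: probability}."""
    return sum(p * edit_distance(witness, x) for x, p in dist.items())

dist = {"ab": 0.5, "abc": 0.3, "b": 0.2}
print(expected_distance("ab", dist))   # 0.5*0 + 0.3*1 + 0.2*1 = 0.5
```

With a distribution given as an automaton, the support can be infinite, which is exactly why the brute-force sum above must be replaced by the exact or randomized procedures the paper proposes.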

