syntactic structure Latest Research Papers

A Self-Supervised Representation Learning of Sentence Structure for Authorship Attribution

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3491203 ◽

2022 ◽

Vol 16 (4) ◽

pp. 1-16

Author(s):

Fereshteh Jafariakinabad ◽

Kien A. Hua

Keyword(s):

Structural Information ◽

Syntactic Structure ◽

Representation Learning ◽

Authorship Attribution ◽

Sentence Structure ◽

Vector Representation ◽

Writing Style ◽

Neural Models ◽

Syntactic Information ◽

Classification Tasks

The syntactic structure of sentences in a document substantially informs about its authorial writing style. Sentence representation learning has been widely explored in recent years and it has been shown that it improves the generalization of different downstream tasks across many domains. Even though utilizing probing methods in several studies suggests that these learned contextual representations implicitly encode some amount of syntax, explicit syntactic information further improves the performance of deep neural models in the domain of authorship attribution. These observations have motivated us to investigate the explicit representation learning of syntactic structure of sentences. In this article, we propose a self-supervised framework for learning structural representations of sentences. The self-supervised network contains two components; a lexical sub-network and a syntactic sub-network which take the sequence of words and their corresponding structural labels as the input, respectively. Due to the n -to-1 mapping of words to their structural labels, each word will be embedded into a vector representation which mainly carries structural information. We evaluate the learned structural representations of sentences using different probing tasks, and subsequently utilize them in the authorship attribution task. Our experimental results indicate that the structural embeddings significantly improve the classification tasks when concatenated with the existing pre-trained word embeddings.

I3rab: A New Arabic Dependency Treebank Based on Arabic Grammatical Theory

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3472295 ◽

2022 ◽

Vol 21 (2) ◽

pp. 1-32

Author(s):

Dana Halabi ◽

Ebaa Fayyoumi ◽

Arafat Awajan

Keyword(s):

Natural Language ◽

Syntactic Structure ◽

Arabic Language ◽

Morphological Features ◽

Linguistic Resources ◽

Part Of Speech ◽

Comparable Level ◽

Percentage Improvement ◽

Level Of Details

Treebanks are valuable linguistic resources that include the syntactic structure of a language sentence in addition to part-of-speech tags and morphological features. They are mainly utilized in modeling statistical parsers. Although the statistical natural language parser has recently become more accurate for languages such as English, those for the Arabic language still have low accuracy. The purpose of this article is to construct a new Arabic dependency treebank based on the traditional Arabic grammatical theory and the characteristics of the Arabic language, to investigate their effects on the accuracy of statistical parsers. The proposed Arabic dependency treebank, called I3rab, contrasts with existing Arabic dependency treebanks in two main concepts. The first concept is the approach of determining the main word of the sentence, and the second concept is the representation of the joined and covert pronouns. To evaluate I3rab, we compared its performance against a subset of Prague Arabic Dependency Treebank that shares a comparable level of details. The conducted experiments show that the percentage improvement reached up to 10.24% in UAS and 18.42% in LAS.

Development of Automatic Rule-based Semantic Tagger and Karaka Analyzer for Hindi

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3479155 ◽

2022 ◽

Vol 21 (2) ◽

pp. 1-25

Author(s):

Pragya Katyayan ◽

Nisheeth Joshi

Keyword(s):

Language Processing ◽

Word Order ◽

Native Speakers ◽

Syntactic Structure ◽

Contextual Knowledge ◽

Text File ◽

Rule Based ◽

Parts Of Speech ◽

Order Language ◽

Free Word

Hindi is the third most-spoken language in the world (615 million speakers) and has the fourth highest native speakers (341 million). It is an inflectionally rich and relatively free word-order language with an immense vocabulary set. Despite being such a celebrated language across the globe, very few Natural Language Processing (NLP) applications and tools have been developed to support it computationally. Moreover, most of the existing ones are not efficient enough due to the lack of semantic information (or contextual knowledge). Hindi grammar is based on Paninian grammar and derives most of its rules from it. Paninian grammar very aggressively highlights the role of karaka theory in free-word order languages. In this article, we present an application that extracts all possible karakas from simple Hindi sentences with an accuracy of 84.2% and an F1 score of 88.5%. We consider features such as Parts of Speech tags, post-position markers (vibhaktis), semantic tags for nouns and syntactic structure to grab the context in different-sized word windows within a sentence. With the help of these features, we built a rule-based inference engine to extract karakas from a sentence. The application takes in a text file with clean (without punctuation) simple Hindi sentences and gives back karaka tagged sentences in a separate text file as output.

On Learning Interpreted Languages with Recurrent Models

Computational Linguistics ◽

10.1162/coli_a_00431 ◽

2022 ◽

pp. 1-13

Author(s):

Denis Paperno

Keyword(s):

Natural Language ◽

Data Processing ◽

Syntactic Structure ◽

Neural Nets ◽

Training Data ◽

Sequential Data ◽

Extensive Training ◽

Formal Syntax ◽

Compositional Interpretation

Abstract Can recurrent neural nets, inspired by human sequential data processing, learn to understand language? We construct simplified datasets reflecting core properties of natural language as modeled in formal syntax and semantics: recursive syntactic structure and compositionality. We find LSTM and GRU networks to generalise to compositional interpretation well, but only in the most favorable learning settings, with a well-paced curriculum, extensive training data, and left-to-right (but not right-to-left) composition.

Predicting the impact of online news articles – is information necessary?

Multimedia Tools and Applications ◽

10.1007/s11042-021-11621-5 ◽

2022 ◽

Author(s):

Judita Preiss

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Syntactic Structure ◽

Named Entity Recognition ◽

Relation Extraction ◽

Online News ◽

Entity Recognition ◽

News Article ◽

Popularity Prediction ◽

The Impact

AbstractWe exploit the Twitter platform to create a dataset of news articles derived from tweets concerning COVID-19, and use the associated tweets to define a number of popularity measures. The focus on (potentially) biomedical news articles allows the quantity of biomedically valid information (as extracted by biomedical relation extraction) to be included in the list of explored features. Aside from forming part of a systematic correlation exploration, the features – ranging from the semantic relations through readability measures to the article’s digital content – are used within a number of machine learning classifier and regression algorithms. Unsurprisingly, the results support that for more complex articles (as determined by a readability measure) more sophisticated syntactic structure may be expected. A weak correlation is found with information within an article suggesting that other factors, such as numbers of videos, have a notable impact on the popularity of a news article. The best popularity prediction performance is obtained using a random forest machine learning algorithm, and the feature describing the quantity of biomedical information is in the top 3 most important features in almost a third of the experiments performed. Additionally, this feature is found to be more valuable than the widely used named entity recognition.

Roots of V-to-C Movement in Romance

Domínios de Lingu gem ◽

10.14393/dl49-v16n1a2022-7 ◽

2022 ◽

Vol 16 (1) ◽

pp. 210-229

Author(s):

André Antonelli

Keyword(s):

Syntactic Structure ◽

Direct Object ◽

Romance Languages ◽

Left Periphery ◽

Verb Second ◽

Verb Raising ◽

Germanic Languages ◽

Finite Verb

The paper investigates the syntactic structure of wh-clauses in late Latin. The results show that, in sentences with a wh-phrase as direct object, the interrogative operator reaches FocP in the left periphery, with the finite verb raising to the Foc head. This spec-head relation accounts for why subjects and dislocated XPs (like topics or focus elements) can not be intervening constituents between the object wh-phrase and the verb. For wh-clauses in which the interrogative operator is an adjunct, the hypothesis is that the wh-phrase occupies [Spec,IntP]. Here, the verb does not move to the CP-field, thus explaining the possibility of intervening subjects and interpolated XPs between the adjunct wh-element and the verb. These results show that the verb second (V2) property of V-to-C movement, as seen in several old Romance languages, can be derived from late Latin, and not exclusively from a supposed influence of Germanic languages, as is assumed in the literature.

A Framework for Automated Scraping of Structured Data Records From the Deep Web Using Semantic Labeling

International Journal of Information Retrieval Research ◽

10.4018/ijirr.290830 ◽

2022 ◽

Vol 12 (1) ◽

pp. 1-18

Author(s):

Umamageswari Kumaresan ◽

Kalpana Ramanujam

Keyword(s):

Web Sites ◽

Syntactic Structure ◽

Structured Data ◽

Web Pages ◽

Semantic Labeling ◽

Repeated Pattern ◽

Computationally Intensive ◽

To Come ◽

String Pattern ◽

Informative Content

The intent of this research is to come up with an automated web scraping system which is capable of extracting structured data records embedded in semi-structured web pages. Most of the automated extraction techniques in the literature captures repeated pattern among a set of similarly structured web pages, thereby deducing the template used for the generation of those web pages and then data records extraction is done. All of these techniques exploit computationally intensive operations such as string pattern matching or DOM tree matching and then perform manual labeling of extracted data records. The technique discussed in this paper departs from the state-of-the-art approaches by determining informative sections in the web page through repetition of informative content rather than syntactic structure. From the experiments, it is clear that the system has identified data rich region with 100% precision for web sites belonging to different domains. The experiments conducted on the real world web sites prove the effectiveness and versatility of the proposed approach.

Fast priming of grammatical decisions: repetition and transposed-word priming effects

Royal Society Open Science ◽

10.1098/rsos.211082 ◽

2022 ◽

Vol 9 (1) ◽

Author(s):

Jonathan Mirault ◽

Mathieu Declerck ◽

Jonathan Grainger

Keyword(s):

Sentence Processing ◽

Repetition Priming ◽

Syntactic Structure ◽

Decision Task ◽

Target Sequence ◽

Font Size ◽

Prime Word ◽

Priming Effects ◽

Word Sequence ◽

Word Priming

We used the grammatical decision task to investigate fast priming of written sentence processing. Targets were sequences of 5 words that either formed a grammatically correct sentence or were ungrammatical. Primes were sequences of 5 words and could be the same word sequence as targets, a different sequence of words with a similar syntactic structure, the same sequence with two inner words transposed or the same sequence with two inner words substituted by different words. Prime-word sequences were presented in a larger font size than targets for 200 ms and followed by the target sequence after a 100 ms delay. We found robust repetition priming in grammatical decisions, with same sequence primes leading to faster responses compared with prime sequences containing different words. We also found transposed-word priming effects, with faster responses following a transposed-word prime compared with substituted-word primes. We conclude that fast primed grammatical decisions might offer investigations of written sentence processing what fast primed lexical decisions have offered studies of visual word recognition.

CONJUNCTIONS OF TEMPORARY MEANING IN THE DIALECTS OF THE KOMI LANGUAGE

Bulletin of Udmurt University. Series History and Philology ◽

10.35634/2412-9534-2021-31-6-1170-1177 ◽

2021 ◽

Vol 31 (6) ◽

pp. 1170-1177

Author(s):

E.N. Popova

Keyword(s):

Syntactic Structure ◽

Structural Features ◽

Parts Of Speech ◽

Part Of Speech ◽

The Future ◽

First Time

The relevance of this work is related to the poorly studied conjunctions in the dialects of the Komi language. The study of conjunctions is relevant not only in connection with the solution of the problem of the formation of a conjunction as a part of speech, but also with the solution of problems related to the complication of a sentence, and at the same time, the syntactic structure of the language. Due to the lack of descriptions of conjunctions in the dialects of the Komi language, we continue to conduct research on the problem of "Conjunctions in the Komi-Zyryan dialects" in order to create generalizing works based on such descriptions in the future. The object of the study is the conjunctions of temporary meaning that function in the dialects of the Komi language. The scientific novelty of the study is that, for the first time, the conjunctions of temporary meaning that were not previously described in the dialects of the Komi language, are considered. Their composition in dialects is revealed; the origin, methods and ways of their formation, genetic connection with other parts of speech are established; structural features, peculiarities of their use in a sentence are determined. In the course of the study, descriptive-analytical, comparative, etymological methods, the method of lexicographic search were used.

The Role of Prosodic Factors in Determining The Meaning of The Text

International Journal of Social Science and Human Research ◽

10.47191/ijsshr/v4-i12-65 ◽

2021 ◽

Vol 04 (12) ◽

Author(s):

Khatira Avaz Gojayeva ◽

Keyword(s):

Syntactic Structure ◽

Current Situation ◽

Main Task ◽

Grammatical Function ◽

Discourse Function ◽

New Information ◽

Attitude Function

Intonation is a very complex language unit. It has many functions, and these functions are performed by different phonetic events: 1. A phonetic event that performs the attitude function. This function reflects attitude, emotion, and different phonetic events. For example, depending on the context of the speech and the current situation of the speaker, you can use falling intonation, rising intonation, falling-rising intonation, rising-falling tones. 2. A phonetic event that performs an accentual function. This term is used in connection with an accent. Some phonetics also use stress instead of accent. In this function, the emphasis falls on the last lexical word, and the phonetic event of accentuation and clarification of intonation is performed. 3. A phonetic event that performs a grammatical function. In this function, tone boundaries are defined by intonation. With the help of this phonetic phenomenon, the listener can better recognize the grammar and syntactic structure of what is being said. 4. A phonetic event that performs a discourse function. The main task of this phonetic event is to convey to the listener what "new" information is. An eye-catching tonic accent is placed on the appropriate syllable of a particular sound.

syntactic structure
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

A Self-Supervised Representation Learning of Sentence Structure for Authorship Attribution

I3rab: A New Arabic Dependency Treebank Based on Arabic Grammatical Theory

Development of Automatic Rule-based Semantic Tagger and Karaka Analyzer for Hindi

On Learning Interpreted Languages with Recurrent Models

Predicting the impact of online news articles – is information necessary?

Roots of V-to-C Movement in Romance

A Framework for Automated Scraping of Structured Data Records From the Deep Web Using Semantic Labeling

Fast priming of grammatical decisions: repetition and transposed-word priming effects

CONJUNCTIONS OF TEMPORARY MEANING IN THE DIALECTS OF THE KOMI LANGUAGE

The Role of Prosodic Factors in Determining The Meaning of The Text

Export Citation Format

syntactic structureRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

A Self-Supervised Representation Learning of Sentence Structure for Authorship Attribution

I3rab: A New Arabic Dependency Treebank Based on Arabic Grammatical Theory

Development of Automatic Rule-based Semantic Tagger and Karaka Analyzer for Hindi

On Learning Interpreted Languages with Recurrent Models

Predicting the impact of online news articles – is information necessary?

Roots of V-to-C Movement in Romance

A Framework for Automated Scraping of Structured Data Records From the Deep Web Using Semantic Labeling

Fast priming of grammatical decisions: repetition and transposed-word priming effects

CONJUNCTIONS OF TEMPORARY MEANING IN THE DIALECTS OF THE KOMI LANGUAGE

The Role of Prosodic Factors in Determining The Meaning of The Text

syntactic structure
Recently Published Documents