free word
Recently Published Documents


Author(s):  
Pragya Katyayan ◽  
Nisheeth Joshi

Hindi is the third most-spoken language in the world (615 million speakers) and has the fourth-highest number of native speakers (341 million). It is an inflectionally rich, relatively free word-order language with an immense vocabulary. Although it is so widely spoken, very few Natural Language Processing (NLP) applications and tools have been developed to support it computationally. Moreover, most of the existing ones are not efficient enough because they lack semantic information (or contextual knowledge). Hindi grammar is based on Paninian grammar and derives most of its rules from it. Paninian grammar strongly emphasizes the role of karaka theory in free word-order languages. In this article, we present an application that extracts all possible karakas from simple Hindi sentences with an accuracy of 84.2% and an F1 score of 88.5%. We consider features such as part-of-speech tags, post-position markers (vibhaktis), semantic tags for nouns, and syntactic structure to capture context in word windows of different sizes within a sentence. Using these features, we built a rule-based inference engine to extract karakas from a sentence. The application takes a text file of clean (punctuation-free) simple Hindi sentences as input and writes the karaka-tagged sentences to a separate text file as output.
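As a rough illustration of the idea of using post-position markers as a feature, the sketch below maps a few common Hindi vibhaktis to candidate karaka labels. It is not the authors' inference engine: the mapping table, the function name `candidate_karakas`, and the one-word window are assumptions made for this example, and a real system would also need the part-of-speech tags, semantic tags, and syntactic structure mentioned in the abstract to resolve ambiguous markers.

```python
# Hypothetical, simplified mapping from vibhakti (post-position marker)
# to candidate karaka labels. Ambiguous markers list several candidates.
VIBHAKTI_TO_KARAKA = {
    "ने": ["karta"],               # agent marker
    "को": ["karma", "sampradan"],  # object or recipient
    "से": ["karan", "apadan"],     # instrument or source
    "में": ["adhikaran"],          # location ("in")
    "पर": ["adhikaran"],           # location ("on")
}


def candidate_karakas(tokens):
    """Return (word, candidate karakas) for each word followed by a vibhakti.

    Looks only at a window of one following token; this is an assumption
    for illustration, not the windowing used in the article.
    """
    tagged = []
    for i in range(len(tokens) - 1):
        marker = tokens[i + 1]
        if marker in VIBHAKTI_TO_KARAKA and tokens[i] not in VIBHAKTI_TO_KARAKA:
            tagged.append((tokens[i], VIBHAKTI_TO_KARAKA[marker]))
    return tagged


# "Ram killed Ravan with an arrow" (clean sentence, no punctuation)
sentence = "राम ने रावण को बाण से मारा"
result = candidate_karakas(sentence.split())
for word, karakas in result:
    print(word, karakas)
```

The markers को and से each return two candidates here, which is the kind of ambiguity that additional contextual features would have to resolve.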


2021 ◽  
Vol 6 ◽  
Author(s):  
Joana Taci ◽  
Mirela Saraci

This paper aims to shed some light on the Albanian case system, with a special focus on the assignment of accusative case. As a member of the vast Indo-European family, Albanian is characterized by an inflected case system and, consequently, a free word order. Traditionally, we are taught, and we still teach coming generations, that accusative case is assigned mostly by the verb to the noun phrase that syntactically represents the direct object and semantically introduces the Theme or the Patient. Moreover, in Albanian, accusative is also assigned by another morphological category bearing the distinctive features [+noun;+verb], namely the preposition. Furthermore, as a researcher in the field of generative syntax, I have a stake in analyzing certain exceptional cases of accusative case assignment to the subject NP of the Albanian subjunctive clause. In conclusion, I was tempted to adopt Chomsky’s reconciling proposal of accusative case assignment under the specifier-head structural and schematic relation.


2021 ◽  
Vol 11 (6) ◽  
pp. 120
Author(s):  
Daniele Franceschi

This paper examines some cases of lexical adaptation and innovation in present-day English resulting from changed communicative needs brought about by the current coronavirus pandemic. The data analysed consist of 15 lexical items retrieved from the NOW Corpus, a web-based collection of newspapers and magazines freely accessible online. The study shows that certain existing words and expressions tend to be used more frequently in coronavirus-related discourse than in other contexts prior to the current crisis; others appear to be undergoing a re-adaptation of their semantic range, while new ones seem to have emerged and to be making their way into dictionaries. At the same time, there are certain free-word combinations built “on the fly” whose stability is still uncertain.


2021 ◽  
pp. 1-30
Author(s):  
Nathan Duran ◽  
Steve Battle ◽  
Jim Smith

In this study, we investigate the process of generating single-sentence representations for the purpose of Dialogue Act (DA) classification, including several aspects of text pre-processing and input representation that are often overlooked or underreported in the literature, for example, the number of words to keep in the vocabulary or in input sequences. We assess each of these with respect to two DA-labelled corpora, using a range of supervised models that represent those most frequently applied to the task. Additionally, we compare context-free word embedding models with transfer learning via pre-trained language models, including several based on the transformer architecture, such as Bidirectional Encoder Representations from Transformers (BERT) and XLNET, which have thus far not been widely explored for the DA classification task. Our findings indicate that these text pre-processing considerations do have a statistically significant effect on classification accuracy. Notably, we found that viable input sequence lengths and vocabulary sizes can be much smaller than those typically used in DA classification experiments, with no significant improvements beyond certain thresholds. We also show that in some cases the contextual sentence representations generated by language models do not reliably outperform supervised methods, although BERT and its derivative models do represent a significant improvement over supervised approaches and over much of the previous work on DA classification.
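The two pre-processing choices singled out above, vocabulary size and input sequence length, can be sketched as follows. This is a minimal illustration under assumed conventions (the reserved ids `PAD` and `UNK`, and the function names `build_vocab` and `encode`, are hypothetical), not the pipeline used in the study.

```python
from collections import Counter

PAD, UNK = 0, 1  # reserved ids for padding and out-of-vocabulary words


def build_vocab(tokenized_utterances, max_vocab_size):
    """Keep only the max_vocab_size most frequent words."""
    counts = Counter(tok for utt in tokenized_utterances for tok in utt)
    most_common = counts.most_common(max_vocab_size)
    # Ids start at 2 because 0 and 1 are reserved.
    return {tok: i + 2 for i, (tok, _) in enumerate(most_common)}


def encode(utterance, vocab, max_len):
    """Map words to ids, truncating or padding to exactly max_len."""
    ids = [vocab.get(tok, UNK) for tok in utterance[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


utterances = [["yes", "i", "do"], ["yes", "okay"], ["yes", "i", "see"]]
vocab = build_vocab(utterances, max_vocab_size=2)
padded = encode(["yes", "i", "do"], vocab, max_len=5)
truncated = encode(["yes", "i", "do", "you", "know"], vocab, max_len=2)
print(vocab, padded, truncated)
```

Varying `max_vocab_size` and `max_len` in a setup like this is one way to probe the thresholds the abstract refers to.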


Author(s):  
Evelyne Decullier ◽  
Mathilde Chauliaguet ◽  
Arnaud Siméone ◽  
Julie Haesebaert ◽  
Agnès Witko

Despite a keen interest in clinical research, most paramedical professionals are unwilling to play an active role. Our objective was to explore paramedical professionals’ representations of research. Using an existing database of final year paramedical students (speech therapy, occupational therapy, psychomotricity, audiometry, physiotherapy, orthoptics), we deployed a qualitative approach composed of two successive steps: (1) a free word association task, and (2) semi-structured individual interviews. Out of the 54 students who agreed to be contacted, we received 21 responses to the free word association questionnaire, and 11 interviews were conducted. The hierarchical evocation matrix revealed that the scientific representation of research is based on words defining the research and the purpose of the research. “Collaboration” was identified as being an essential part of the research process. The central core of the representation is coherent with all its components perceived as positive. The content analysis of the interviews showed a polarization around two key points: (1) participants are interested in accessing and using evidence in their practice (2) but feel less confident about and/or motivated to generate evidence themselves. This study highlights the need to develop more research-friendly environments, especially in training institutions.


2021 ◽  
Vol 2 (3) ◽  
pp. 93-111
Author(s):  
Maria Kosogorova

The paper analyzes the morphosyntactic status of the copula no as part of complex verbal predicates in Guinean Pular. The locative copula no, combined with various forms of a lexical verb, forms three verbal constructions. The morphological and semantic non-compositionality of no in such constructions calls into question its morphosyntactic status as a free word and suggests that it might be an affix or a clitic. The Pular data were subjected to a series of tests using a set of phonological, morphological, and semantic criteria. The results of the phonological tests show that no in complex verbal predicates cannot be a free word, whereas the morphological tests deny it the status of an affix. It is therefore concluded that this copula is a clitic, which is confirmed by the language data complying with general morphological and phonological criteria.


Author(s):  
Mauro Fontana ◽  
Aline Machado Pereira ◽  
Estefania Júlia Dierings de Souza ◽  
Adriano Hirsch Ramos ◽  
Roberta Bascke Santos ◽  
...  

Morphology ◽  
2021 ◽  
Author(s):  
Fritz Günther ◽  
Marco Marelli

Many theories on the role of semantics in morphological representation and processing focus on the interplay between the lexicalized meaning of the complex word on the one hand, and the individual constituent meanings on the other hand. However, the constituent meaning representations at play do not necessarily correspond to the free-word meanings of the constituents: Role-dependent constituent meanings can be subject to sometimes substantial semantic shift from their corresponding free-word meanings (such as -bill in hornbill and razorbill, or step- in stepmother and stepson). While this phenomenon is extremely difficult to operationalize using the standard psycholinguistic toolkit, we demonstrate how these as-constituent meanings can be represented in a quantitative manner using a data-driven computational model. After a qualitative exploration, we validate the model against a large database of human ratings of the meaning retention of constituents in compounds. With this model at hand, we then proceed to investigate the internal semantic structure of compounds, focussing on differences in semantic shift and semantic transparency between the two constituents.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Luisa Weiner ◽  
Andrea Guidi ◽  
Nadège Doignon-Camus ◽  
Anne Giersch ◽  
Gilles Bertschy ◽  
...  

There is a lack of consensus on the diagnostic thresholds that could improve the detection accuracy of bipolar mixed episodes in clinical settings. Some studies have shown that voice features could be reliable biomarkers of manic and depressive episodes compared to euthymic states, but none thus far have investigated whether they could aid the distinction between mixed and non-mixed acute bipolar episodes. Here we investigated whether vocal features acquired via verbal fluency tasks could accurately classify mixed states in bipolar disorder using machine learning methods. Fifty-six patients with bipolar disorder were recruited during an acute episode (19 hypomanic, 8 mixed hypomanic, 17 with mixed depression, 12 with depression). Nine different trials belonging to four conditions of verbal fluency tasks (letter, semantic, free word generation, and associational fluency) were administered. Spectral and prosodic features in three conditions were selected for the classification algorithm. Using the leave-one-subject-out (LOSO) strategy to train the classifier, we calculated the accuracy rate, the F1 score, and the Matthews correlation coefficient (MCC). For depression versus mixed depression, the accuracy and F1 scores were high (0.83 and 0.86, respectively), and the MCC was 0.64. For hypomania versus mixed hypomania, the accuracy and F1 scores were also high (0.86 and 0.75, respectively), and the MCC was 0.57. Given the high rates of correctly classified subjects, vocal features quickly acquired via verbal fluency tasks seem to be reliable biomarkers that could be easily implemented in clinical settings to improve diagnostic accuracy.
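For readers unfamiliar with the evaluation terms above, the following sketch shows how leave-one-subject-out splits and the three reported metrics are commonly computed for a binary task. The function names and the toy labels are assumptions for illustration; the study's own features, classifier, and data are not reproduced here.

```python
import math


def loso_splits(subject_ids):
    """Yield (train_indices, test_indices), holding out one subject at a time."""
    for held_out in dict.fromkeys(subject_ids):  # unique ids, original order
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield train, test


def binary_metrics(y_true, y_pred):
    """Accuracy, F1, and Matthews correlation coefficient for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Toy example: three recordings from two hypothetical subjects.
splits = list(loso_splits(["a", "a", "b"]))
metrics = binary_metrics([1, 1, 0, 0], [1, 0, 0, 0])
print(splits, metrics)
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, which makes it informative when the two classes are unbalanced, as with the group sizes listed in the abstract.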


We build a model to parse the Arabic verbal sentence based on an Arabic grammar ontology. The ontology conceptualizes the Arabic verbal sentence through the representation of grammar parsing classes, verb properties, and conjunction checking. By populating the ontology with verbal sentences and adding grammar rules, we form a verbal sentence knowledge base. The parsing model is supported by morphological analysis for the syntactic analysis of sentences and by an Arabic synonym extractor for deriving synonyms. We have implemented the model and provided it with a user interface where the user can enter a sentence to be parsed and obtain the parsing results. The interface offers options to partially or fully add diacritics to the words of the sentence, and it allows ambiguity to be removed by choosing the most appropriate analysis from the lexicon results. To evaluate the model, we selected a representative set of Arabic verbal sentences from Arabic grammar books that covers all the possibilities of a verbal sentence. We performed several parsing tests on these sentences with and without diacritics. The results demonstrate the ability of the model to parse the various forms of the verbal sentence. The accuracy increases when the sentence is diacriticized, avoids free word order, and follows the general form of the Arabic verbal sentence.

