linguistic feature
Recently Published Documents



2022 ◽  
Vol 6 (POPL) ◽  
pp. 1-28
Author(s):  
Yizhou Zhang ◽  
Nada Amin

Metareasoning can be achieved in probabilistic programming languages (PPLs) using agent models that recursively nest inference queries inside inference queries. However, the semantics of this powerful, reflection-like language feature has defied an operational treatment, much less reasoning principles for contextual equivalence. We give formal semantics to a core PPL with continuous distributions, scoring, general recursion, and nested queries. Unlike in prior work, the presence of nested queries and general recursion makes it impossible to stratify the definition of a sampling-based operational semantics and that of a measure-theoretic semantics; the two semantics must be defined mutually recursively. A key yet challenging property we establish is that probabilistic programs have well-defined meanings: limits exist for the step-indexed measures they induce. Beyond a semantics, we offer relational reasoning principles for probabilistic programs making nested queries. We construct a step-indexed, biorthogonal logical-relations model. A soundness theorem establishes that logical relatedness implies contextual equivalence. We demonstrate the usefulness of the reasoning principles by proving novel equivalences of practical relevance, in particular for game-playing and decision-making agents. We mechanize our technical developments leading to the soundness proof using the Coq proof assistant. Nested queries are an important yet theoretically underdeveloped linguistic feature in PPLs; we are the first to give them semantics in the presence of general recursion and to provide them with sound reasoning principles for contextual equivalence.


2021 ◽  
Author(s):  
Jeremy Giroud ◽  
Jacques Pesnot Lerousseau ◽  
Francois Pellegrino ◽  
Benjamin Morillon

Humans are expert at processing speech, but how this feat is accomplished remains a major question in cognitive neuroscience. Capitalizing on the concept of channel capacity, we developed a unified measurement framework to investigate the respective influence of seven acoustic and linguistic features on speech comprehension, encompassing acoustic, sub-lexical, lexical and supra-lexical levels of description. We show that comprehension is independently impacted by all these features, but to varying degrees and with a clear dominance of the syllabic rate. Comparing comprehension of French words and sentences further reveals that when supra-lexical contextual information is present, the impact of all other features is dramatically reduced. Finally, we estimated the channel capacity associated with each linguistic feature and compared these estimates with the features' generic distribution in natural speech. Our data point towards supra-lexical contextual information as the feature limiting the flow of natural speech. Overall, this study reveals how multilevel linguistic features constrain speech comprehension.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Guangyao Zhang ◽  
Licheng Wang ◽  
Weixi Xie ◽  
Furong Shang ◽  
Xinlu Xia ◽  
...  

Purpose: The purpose of this paper is to examine the word “however”, a marker that authors pay close attention to but that few studies in the existing literature have addressed. The authors aim to gain further insight into its function. Design/methodology/approach: The authors selected 3,329 valid comments on articles published in the British Medical Journal (BMJ) from 2015 to 2020 as the research objects. They first showed the length distribution of reviewers' comments. They then analyzed the general distribution of words in the comments and the positions of the comments in the manuscripts, to understand reviewers' comments qualitatively at the word level. In particular, they analyzed the functions of “however” and “but”, the words that authors are most concerned with. In addition, using regression analysis, they examined factors that may be related to uses of “however” that reflect reviewers' praise. Findings: There are marked differences in the length of reviewers' comments across review rounds. By mapping the reviewers' comments to different sections, the authors found that reviewers are deeply concerned with the methods section. Adjectives and adverbs in comments on different sections of the manuscripts also have different characteristics. The authors tried to interpret the turning function of “however” in scientific communication. Its frequency of use is related to reviewers' identities, specifically academic status: junior researchers use “however” in praise more frequently than senior researchers do. Research limitations/implications: The linguistic features and functions of “however” and “but” in reviewers' comments on rejected manuscripts may differ from those for accepted papers and are also worth exploring. Regrettably, the authors could not obtain the peer review comments of rejected manuscripts, which may limit the conclusions of this investigation. Originality/value: Overall, the results revealed some language features of reviewers' comments, which could provide a basis for future work by reviewers in the open peer review (OPR) field. In particular, the authors put forward “however” as a marker for examining review comments for the first time.


Author(s):  
Paweł Levchuk

Western Region Variety of the Standard Ukrainian Language in the Interwar Period: A Review of Liudmyla Pidkuĭmukha’s Monograph Mova Lʹvova, abo koly ĭ batiary hovoryly (Kyiv: Klio, 2020, pp. 326)

The reviewed monograph is the first extensive study of the vocabulary of the western variant of the Ukrainian language based on the texts of the ‘Twelve’, an interwar literary circle of writers from Lʹviv. The review highlights the social dialects that functioned in Lʹviv during the interwar period, in particular the jargon of schoolchildren and athletes. Particular attention is paid to balak, which became a linguistic feature of the Batyar subculture. This subculture reached its peak in the 1920s and 1930s. Material collated from three editions of B. Nyzhankivskyi’s collection Street (1936, 1941, 1995) illustrates the specifics of Soviet editorial practice, which was aimed at limiting the use of western Ukrainian vocabulary in order to artificially bring the Ukrainian language closer to Russian.


2021 ◽  
Vol 2 (6) ◽  
Author(s):  
Yonas Demeke Woldemariam

We develop an NLP method for inferring potential contributors among a multitude of users within crowdsourcing forums (CSFs). The method provides a way to predict users' expertise from the structures (syntax–semantic patterns) of their text when crowdsourced votes are unavailable. It primarily tackles two core adverse conditions: obstacles to identifying crowds’ expertise levels, and the lack of a standard for measuring the linguistic quality of crowdsourced text. To solve the former, an expertise estimation and linguistic feature annotation algorithm is developed. To approach the latter, a comprehensive linguistic characterization of crowdsourced text, along with extensive joint syntax–punctuation analyses, has been carried out. The corpora comprise approximately 8 different domains, about 3.05 million sentences, and about 32.09 million words, contributed by a crowd of 50,000 users. The analyses revealed six major linguistic patterns, identified on the basis of ordered lists of structural (syntactic) categories learned from grammatical constructions practiced by major groups of experts. In addition, nine different text-oriented expertise dimensions are identified as crucial steps towards establishing a standard linguistic-based expertise framework for most CSFs. Potentially, the resulting framework simplifies the measurement of crowds’ proficiency in those particular forums where crowds’ tasks (e.g., answering questions, technically discerning deep features within images of galaxies for classifying them into certain categories) are intimately connected with their writing (e.g., describing answers illustratively, expressing complex phenomena observed in classified images). Moreover, a wide variety of linguistic annotations is extracted for building baseline and enhanced versions of expertise models (about 20 different models were built): latent topic annotations, named entities, syntactic and punctuation annotations, semantic and character set annotations, and word and character n-gram (n = 2 and 3) annotations. The successive improvements over the baseline models, obtained by iteratively adding linguistic feature annotations in a two-stage enhancement process, indicate the adaptability of the learned models.


2021 ◽  
Vol 5 (4) ◽  
pp. 680-687
Author(s):  
Ghina Dwi Salsabila ◽  
Erwin Budi Setiawan

Personality provides deep insight into a person and plays an important part in that person’s job performance. Predicting personality through social media has been examined in several studies. The problem is how to improve the performance of a personality prediction system. The purpose of this research is to predict the personality of Twitter users and to increase the performance of the personality prediction system. An online survey using the Big Five Inventory (BFI) questionnaire was distributed and gathered 295 Twitter users with 511,617 tweets. In this research, we experiment with two different methods: Support Vector Machine (SVM) alone, and the combination of SVM and BERT as the semantic approach. This research also implements Linguistic Inquiry and Word Count (LIWC) as the linguistic feature for the personality prediction system. The results showed that the combination of these two methods achieves an accuracy of 79.35%, and that adding LIWC can improve the accuracy up to 80.07%. Overall, these results showed that the combination of SVM and BERT as the semantic approach, together with LIWC, is recommended for better performance of the personality prediction system.


Author(s):  
Michael Stubbs

In an influential book on literary linguistics, first published in 1981 and revised in 2007, Geoffrey Leech and his colleague Mick Short discuss linguistic methods of analysing long texts of prose fiction. This article develops their arguments in two ways: (1) by relating them to classic puzzles in the philosophy of science; and (2) by illustrating them with a computer-assisted study of Bram Stoker’s 1897 novel Dracula. This case study shows that software can identify a linguistic feature of the novel which is central to its major themes, but which is unlikely to be consciously noticed by human readers. Quantitative data on the novel show that it contains a large number of negatives. Their function is often to deny something which would normally be expected, and therefore to express the protagonists’ distrust of their own senses in the extraordinary world in which they find themselves.


2021 ◽  
Author(s):  
Rachel Soo ◽  
Philip J. Monahan

Heritage speakers contend with at least two languages: the less dominant L1 (heritage language), and the more dominant L2. Maintaining the heritage language allows heritage speakers to communicate with members of their community. In some cases, their L1 and L2 bear striking phonological differences. In the current study, we investigate this in the context of Toronto-born Cantonese heritage speakers and their maintenance of Cantonese lexical tone, a linguistic feature that is absent from English, the more dominant L2. Across two experiments, Cantonese heritage speakers were tested on their phonetic/phonological and lexical encoding of tone in Cantonese. Experiment 1 was an AX discrimination task with varying inter-stimulus intervals (ISIs), which revealed that heritage speakers discriminated tone pairs with distinct pitch contours better than those with shared contours. Experiment 2 was a medium-term repetition priming experiment, designed to extend the findings of Experiment 1 by examining tone representations at the lexical level. We observed a positive correlation between tone minimal pair priming and English dominance. Thus, while increased English dominance does not affect heritage speakers' phonological-level representations, tasks that require lexical access suggest that heritage Cantonese speakers may not robustly and fully distinctively encode Cantonese tone in lexical memory.


Author(s):  
Vittoria Cuteri ◽  
Giulia Minori ◽  
Gloria Gagliardi ◽  
Fabio Tamburini ◽  
Elisabetta Malaspina ◽  
...  

Purpose: Attention has recently been paid to Clinical Linguistics for the detection and support of clinical conditions. Many works have been published on the “linguistic profile” of various clinical populations, but very few papers have been devoted to linguistic changes in patients with eating disorders. Patients with Anorexia Nervosa (AN) share similar psychological features such as disturbances in self-perceived body image, inflexible and obsessive thinking, and anxious or depressive traits. We hypothesize that these characteristics can result in altered linguistic patterns and be detected using Natural Language Processing tools. Methods: We enrolled 51 young participants from December 2019 to February 2020 (age range: 14–18): 17 girls with a clinical diagnosis of AN, and 34 normal-weighted peers, matched by gender, age and educational level. Participants in each group were asked to produce three written texts (around 10–15 lines long). A rich set of linguistic features was extracted from the text samples, and the statistical significance of each feature in pinpointing the pathological process was measured. Results: Comparison between the two groups showed several linguistic indexes to be statistically significant, with syntactic reduction as the most relevant trait of AN productions. In particular, the following features emerge as statistically significant in distinguishing AN girls from their normal-weighted peers: the length of the sentences, the complexity of the noun phrase, and the global syntactic complexity. This peculiar pattern of linguistic erosion may be due to the severe metabolic impairment that also affects the central nervous system in AN. Conclusion: These preliminary data showed the existence of linguistic parameters as probable linguistic markers of AN. However, the analysis of a bigger cohort, still ongoing, is needed to consolidate this assumption. Level of evidence: III. Evidence obtained from case–control analytic studies.


2021 ◽  
Author(s):  
Khaled Mamer Ben Milad ◽  

In general, advances in translation technology tools have enhanced translation quality significantly. Unfortunately, however, it seems that this is not the case for all language pairs. A concern arises when the users of translation tools want to work between different language families such as Arabic and English. The main problems facing Arabic<>English translation tools lie in Arabic’s characteristic free word order, richness of word inflection (including orthographic ambiguity), and optionality of diacritics, in addition to a lack of data resources. The aim of this study is to compare the performance of translation memory (TM) and machine translation (MT) systems in translating between Arabic and English. The research evaluates the two systems based on specific criteria relating to needs and expected results. The first part of the thesis evaluates the performance of a set of well-known TM systems when retrieving a segment of text that includes an Arabic linguistic feature. As it is widely known that TM matching metrics are based solely on the use of edit distance string measurements, it was expected that the aforementioned issues would lead to a low match percentage. The second part of the thesis evaluates the translation quality of multiple MT systems that use the mainstream neural machine translation (NMT) approach. Due to a lack of training data resources and Arabic's rich morphology, it was anticipated that Arabic features would reduce the translation quality of this corpus-based approach. The systems’ output was evaluated using both automatic evaluation metrics, including BLEU and hLEPOR, and TAUS human quality ranking criteria for adequacy and fluency. The study employed a black-box testing methodology to experimentally examine the TM systems through a test suite instrument and also to translate Arabic<>English sentences to collect the MT systems’ output. A translation threshold was used to evaluate the fuzzy matches of TM systems, while an online survey was used to collect participants’ responses on the quality of the MT systems’ output. The experiments’ input for both systems was extracted from Arabic<>English corpora, which were examined by means of quantitative data analysis. The results show that, when retrieving translations, the current TM matching metrics are unable to recognise Arabic features and score them appropriately. In terms of automatic translation, MT produced good results for adequacy, especially when translating from Arabic to English, but the systems’ output appeared to need post-editing for fluency. Moreover, when retrieving from Arabic, it was found that short sentences were handled much better by MT than by TM. The findings may be given as recommendations to software developers.

