linguistic features
Recently Published Documents


2022 ◽  
Vol 24 (3) ◽  
pp. 1-16
Author(s):  
Manvi Breja ◽  
Sanjay Kumar Jain

Why-type non-factoid questions are ambiguous, and their answers vary. Returning a single appropriate answer to the user requires answer extraction, re-ranking and validation. In some cases the system must understand the meaning and context of a document rather than find the exact words used in the question. The paper addresses this problem by exploring lexico-syntactic, semantic and contextual query-dependent features, some of which are based on deep learning frameworks, to estimate the probability that an answer candidate is relevant to the question. The features are weighted by the feature-importance scores returned by an ensemble ExtraTreesClassifier. An answer re-ranker model is implemented that selects the candidate with the largest feature similarity between question and answer candidate, achieving a Mean Reciprocal Rank (MRR) of 0.64. Finally, the answer is validated by matching the answer type of each candidate, and the highest-ranked candidate with a matching answer type is returned to the user.
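As a rough illustration of the re-ranking and evaluation steps this abstract describes, the sketch below scores each answer candidate as an importance-weighted sum of its feature similarities and then computes Mean Reciprocal Rank. It is not the authors' implementation: the function names, the toy feature vectors and the weights are all hypothetical, and in the paper the weights would come from an ExtraTreesClassifier's feature importances rather than being written by hand.

```python
from typing import Dict, List, Sequence, Set, Tuple


def rerank(candidates: Dict[str, Sequence[float]],
           weights: Sequence[float]) -> List[str]:
    """Order candidate ids by the weighted sum of their feature similarities."""
    def score(features: Sequence[float]) -> float:
        return sum(w * f for w, f in zip(weights, features))
    return sorted(candidates, key=lambda cid: score(candidates[cid]), reverse=True)


def reciprocal_rank(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """Return 1/rank of the first relevant candidate, or 0.0 if none is relevant."""
    for rank, cid in enumerate(ranked_ids, start=1):
        if cid in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average the reciprocal rank over (ranking, relevant-set) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


# Hypothetical weights (stand-ins for classifier feature importances).
weights = [0.5, 0.3, 0.2]
# Hypothetical per-candidate similarities for three features.
candidates = {
    "a1": [0.9, 0.2, 0.1],
    "a2": [0.4, 0.9, 0.9],
    "a3": [0.1, 0.1, 0.1],
}
ranking = rerank(candidates, weights)
print(ranking, reciprocal_rank(ranking, {"a1"}))
```

With these toy numbers the correct answer "a1" is ranked second, so its reciprocal rank is 0.5; an MRR of 0.64 as reported would mean that, averaged over all questions, the first correct answer tends to appear between ranks one and two.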


2022 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Krishnadas Nanath ◽  
Supriya Kaitheri ◽  
Sonia Malik ◽  
Shahid Mustafa

Purpose: The purpose of this paper is to examine the factors that significantly affect the prediction of fake news from the virality theory perspective. The paper looks at a mix of emotion-driven content, sentimental resonance, topic modeling and linguistic features of news articles to predict the probability of fake news. Design/methodology/approach: A data set of over 12,000 articles was chosen to develop a model for fake news detection. Machine learning algorithms and natural language processing techniques were used to handle big data efficiently. Lexicon-based emotion analysis provided eight kinds of emotions used in the article text. A cluster of five topics was extracted using topic modeling, while sentiment analysis provided the resonance between the title and the text. Linguistic features were added to the coding outcomes to develop a logistic regression predictive model for testing the significant variables. Other machine learning algorithms were also executed and compared. Findings: The results revealed that positive emotions in a text lower the probability of news being fake. Sensational content, such as illegal activities and crime-related content, was associated with fake news. News whose title and text exhibited similar sentiments was found to be less likely to be fake. Longer titles and shorter article text were found to be significant predictors in fake news detection. Practical implications: Several systems and social media platforms today are trying to implement fake news detection methods to filter content. This research provides parameters from a viral theory perspective that could help develop automated fake news detectors. Originality/value: While several studies have explored fake news detection, this study uses a new perspective on viral theory. It also introduces new parameters, such as sentimental resonance, that could help predict fake news. The study deals with an extensive data set and uses advanced natural language processing to automate the coding techniques in developing the prediction model.


Author(s):  
María Luna ◽  
Ruth Villalón ◽  
Isabel Martínez-Álvarez ◽  
Mar Mateos

Writing an argumentative synthesis is a common but demanding task; consequently, undergraduates require some instruction. The objective of this study was to test the effectiveness of two interventions on integrative argumentation: one focused on the product features of argumentative texts, and the other on the processes involved in written argumentation. Sixty-six undergraduate students participated voluntarily. As an academic task, they were asked to write a pre-test synthesis after reading two sources that presented contradictory positions about an educational issue, then to read two new texts about a different but equivalent issue and write a post-test synthesis following one of two types of instructional virtual environments. The instructions, implemented in Moodle, presented similar tools, employing videos, graphic organizers, and exercises. The first condition (n = 33) focused on the linguistic features, while the second (n = 33) focused on the process, including explicit instruction and a script with critical questions to guide the reading and writing processes. The study also analyzed how the students in the process condition answered some of the critical questions. The results show that the level of integration of the written products improved in both conditions, although this improvement was more pronounced in the process intervention. Nonetheless, the products that achieved medium and maximum integration were still limited. Despite the lack of a relationship between how students answered the critical questions and the level of integration in their post-test, the case analysis highlights certain educational implications and directions for further research.


Author(s):  
José Antonio García-Díaz ◽  
Rafael Valencia-García

Satirical content on social media is hard to distinguish from real news, misinformation, hoaxes or propaganda when there are no clues as to the medium in which the news was originally written. It is important, therefore, to provide Information Retrieval systems with mechanisms to identify which results are legitimate and which are misleading. Our contribution to satire identification is twofold. On the one hand, we release the Spanish SatiCorpus 2021, a balanced dataset that contains satirical and non-satirical documents. On the other hand, we conduct an extensive evaluation of this dataset with linguistic features and embedding-based features. All feature sets are evaluated separately and combined using different strategies. Our best result, an accuracy of 97.405%, is achieved with a combination of the linguistic features and BERT. In addition, we compare our proposal with existing Spanish datasets on satire and irony.


2022 ◽  
Vol 3 (2) ◽  
Author(s):  
Elena Vasilievna Velikaya

Spoken language production is considered to be one of the most difficult aspects of teaching a foreign language. It usually involves mastering the pronunciation of sounds and intonation. Although many teachers nowadays do not worry about the phonetic details of sounds, intonation remains a focus because it has a great impact on the comprehensibility of the learner’s English. This is a very important issue for future teachers because correct pronunciation is one of the goals of any spoken language programme, with students asked to produce quite extended spoken monologues and to follow the requirements of various intonational styles. The aim of this study is to analyse the textual and prosodic characteristics of stage monologue, a text produced on a theatre stage or in a film. Analytical methods were applied in order to obtain information about textual features and prosodic stylistic markers such as pitch level, range, tone modifications, loudness, and tempo, and also to develop style-forming factors in stage monologue. Results show that the stage monologues analysed possess all the necessary characteristics of a text: informational content, delimitation, continuum, coherence, cohesion and completeness. Further analysis showed that stage monologue can be characterised by such specific features as expressiveness, normativeness, effectiveness, and conversational character. Stage monologues also possess all the necessary prosodic markers. Certain style-forming factors of stage monologue were also developed in this study, including delimitation, accentuation of key words, thematic centres and expressively prominent centres, type of composition scheme, and theme. These results will be of significant pedagogical value to students who intend to become English teachers, and to teachers involved in linguistics research.


2022 ◽  
Vol 10 (1) ◽  
pp. 63-83
Author(s):  
Wei Xiao ◽  
Jin Liu ◽  
Li Li

Recent years have witnessed a growing interest in research article (RA hereafter) introductions. Most previous studies focused on the macro structures, rhetorical functions and linguistic realizations of RA introductions, but few have investigated the distribution of information content from the perspective of information theory. The current study conducted an entropy-based analysis of the distributional patterns of information content in RA introductions and their variations across disciplines (humanities, natural sciences, and social sciences). Three indices, namely one-, two-, and three-gram entropies, were used to analyze 120 RA introductions (40 introductions from each disciplinary area). The results reveal that, first, the information content in RA introductions is unevenly distributed, with Move 1 having the highest information content, followed in sequence by Move 3 and Move 2; second, the three entropy indices may reflect different linguistic features of RA introductions; and, third, information content varies across disciplines. In Move 1, the RA introductions of natural sciences are more informative than those of the other two disciplines, and in Move 3 the RA introductions of social sciences are more informative as well. This study has implications for genre-based instruction in the pedagogy of academic writing, as well as for extending quantitative corpus linguistic methods to less-explored fields.
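To make the three indices in this abstract concrete, the sketch below computes one-, two-, and three-gram entropy for a tokenized text. The abstract does not give the exact formula, so this assumes plain Shannon entropy (in bits) over the relative frequencies of the n-grams observed in the text; the function name and the whitespace tokenization are illustrative choices, not the authors' procedure.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_entropy(tokens: Sequence[str], n: int) -> float:
    """Shannon entropy (bits) of the n-gram distribution observed in `tokens`."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0  # text shorter than n tokens: no n-grams to measure
    counts = Counter(ngrams)
    total = len(ngrams)
    # H = -sum(p * log2(p)) over the observed n-gram types
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_profile(text: str) -> List[float]:
    """Return [one-gram, two-gram, three-gram] entropy for a lowercase, whitespace-split text."""
    tokens = text.lower().split()
    return [ngram_entropy(tokens, n) for n in (1, 2, 3)]


# Toy usage on a single sentence; a real study would run this per move and per discipline.
print(entropy_profile("recent years have witnessed a growing interest in research article introductions"))
```

A higher value means the n-grams in a segment are less predictable, which is the sense in which one move or discipline can be called "more informative" than another. Note that raw entropy grows with text length, so comparisons across segments of different sizes would need some normalization, which this sketch does not attempt.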


2022 ◽  
pp. 1934-1952
Author(s):  
Stefania Gandin

This study illustrates the preliminary results of a corpus-based analysis aimed at discovering the main linguistic features characterising the promotion of tourism for special-needs travellers. Even if accessible tourism represents an important sector in the market, not only for its social and moral importance but also for its strong economic potential, detailed research on the linguistic properties of tourism for disabled people is still rather limited and mainly tends to focus on the problems of physical access rather than considering the ways to improve its promotional strategies. Through a comparative corpus-based analysis, this paper will investigate the relevant linguistic features of a corpus of promotional materials advertising holidays and tourist services for the disabled, and relate them to the communicative strategies of two other corpora dedicated to the standard and translational language of tourism. The aim of this research is to show how mainstream tourism discourse still considers disability as a taboo topic, mostly ignoring or vaguely mentioning it in the general promotion of tourist destinations. The study will also attempt to suggest new linguistic and social attitudes aimed at stylistically improving and further including the accessible tourism sector within the overall tourism promotion.


2021 ◽  
Vol 12 (1) ◽  
pp. 30-49
Author(s):  
Anne Agersnap ◽  
Kirstine Helboe Johansen

This article discusses the concept of reading and presents a method that combines distant and close reading, while drawing on insights from computational humanities. Focusing on basic features in language, distant reading allows for the construction of new types of text. By close reading these texts, it is possible to analyse cultural patterns across individual texts. This method of reading is illustrated by two cases stemming from a project based on a corpus of 11,955 Danish sermons. The first case begins with a distant reading of gendered pronouns in the corpus. The second case begins with a distant reading of named agents.
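A distant reading of gendered pronouns, as in the first case above, can be approximated by a simple frequency count over each document. The sketch below is only an assumption about what such a count might look like: the pronoun lists are ordinary Danish third-person singular forms, the function name is hypothetical, and the abstract does not say which forms or tools the authors actually used.

```python
import re
from typing import Dict

# Danish third-person singular pronouns (subject, object, possessive).
MASCULINE = {"han", "ham", "hans"}
FEMININE = {"hun", "hende", "hendes"}


def count_gendered_pronouns(text: str) -> Dict[str, int]:
    """Count masculine and feminine pronoun tokens in one document."""
    counts = {"masculine": 0, "feminine": 0}
    # Lowercase and split into word tokens; \w matches Danish letters such as æ, ø, å.
    for token in re.findall(r"\w+", text.lower()):
        if token in MASCULINE:
            counts["masculine"] += 1
        elif token in FEMININE:
            counts["feminine"] += 1
    return counts


# Toy sentence, not taken from the sermon corpus.
print(count_gendered_pronouns("Han så hende. Hun gav ham hans bog."))
```

Because the text is lowercased, the proper name "Hans" would be counted as a possessive pronoun; a real pipeline would need part-of-speech tagging or a name list to separate the two.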


2021 ◽  
Vol 36 (2) ◽  
pp. 298-335
Author(s):  
Hugo C. Cardoso

The Indo-Portuguese creole languages that formed along the former Malabar Coast of southwestern India, currently seriously endangered, are arguably the oldest of all Asian-Portuguese creoles. Recent documentation efforts in Cannanore and the Cochin area have revealed a language that is strikingly similar to its substrate/adstrate Malayalam in several fundamental domains of grammar, often contradicting previous records from the late 19th century and the input of its main lexifier, Portuguese. In this article, this is shown by comparing Malabar Indo-Portuguese with both Malayalam and Portuguese with respect to features in the domains of word order (head-final syntax and harmonic syntactic patterns) and case-marking (the distribution of the oblique case). Based on older records and certain synchronic linguistic features of the Malabar Creoles, this article proposes that the observed isomorphism between modern Malabar Indo-Portuguese and Malayalam has to be explained as the product of either a gradual process of convergence, or the resolution of historical competition between Dravidian-like and Portuguese-like features.

