An efficient and effective approach for multi-fact extraction from text corpus

2021 ◽  
Author(s):  
Jianfeng Qu ◽  
Wen Hua ◽  
Dantong Ouyang ◽  
Xiaofang Zhou
2020 ◽  
Vol 14 (6) ◽  
pp. 1-30
Author(s):  
Wathsala Anupama Mohotti ◽  
Richi Nayak

2017 ◽  
Vol 56 (7) ◽  
pp. 1030-1055 ◽  
Author(s):  
Chrystalla Neofytou ◽  
Thanasis Hadzilacos

Viewing its use in language teaching mainly as a text corpus, this article examines the problem of assessing the suitability of this material for the Greek language course in Cypriot schools. The suitability of a text for language teaching is defined by four parameters, each described in detail in this article: text readability, content, genre, and grammatical information. The literature review reveals a research gap concerning ways of finding, on the Web, texts with specific characteristics that suit language teaching. The tool diaKeimenou, presented in this article, aims to fill this gap and help teachers choose the most suitable texts with reasonable effort and time. The results of a usability evaluation of diaKeimenou are also presented.


Author(s):  
Regīna Kvašīte ◽  
Kazimiers Župerka

The aim of the research is to find out which words are used in Lithuanian and Latvian to name the rural population. The study applied descriptive, comparative and quantitative methods. The novelty of the article lies in presenting the Lithuanian language material in Latvian, analysing the Latvian language material, and comparing the meanings and use of the Lithuanian and Latvian words. The study is sociolinguistic, not normative; therefore, not only systemic but also contextual, situational synonymy is important. The main sources are dictionaries and texts of the literary and common languages, dictionaries of synonyms, slang and jargon, the Corpus of the Contemporary Lithuanian Language (Dabartinės lietuvių kalbos tekstynas) and the Latvian language text corpus (Latviešu valodas tekstu korpuss). The Lithuanian word kaimietis (‘a villager’), long a neutral name for a rural resident or a person born in a village, has both neutral and stylistically connoted synonyms. The most common synonyms are sodietis (‘a homestead peasant’) and valstietis (‘a peasant’). In this synonym series, valstietis is a more distant synonym that covers both the concept “kaimo gyventojas” (‘a rural resident’) and the concept “žemdirbys” (‘an agriculturalist’), thus linking the synonym series of kaimietis to that of ūkininkas (‘a farmer’) and laukininkas (‘a field peasant’). Recently, the word kaimietis (‘a villager’) has acquired a second, pejorative meaning: “sakoma apie neišsilavinusį, prasto skonio ir pan. žmogų, kuris nebūtinai kilęs iš kaimo” (‘said of an uneducated person, a person of poor taste, and so on, who does not necessarily come from the countryside’). This meaning is already recorded in the dictionary of the common language, which indicates that the connoted meaning common in slang has been codified.
Used in a pejorative sense, the word kaimietis (‘a villager’) belongs to a series of words with a systemic or contextual pejorative or contemptuous meaning: prastuolis, prasčiokas, mužikas, runkelis. The Latvian name for a villager, laucinieks (‘a villager’), is stylistically neutral; its synonyms are the neutral words lauksaimnieks (‘a farmer’) and zemnieks (‘a peasant’). The word zemnieks, like valstietis (‘a peasant’) in Lithuanian, is the dominant of the series of distant synonyms zemkopis (‘an agriculturalist’) and zemesrūķis [?]. The treatment of the synonym sādžinieks (‘a homestead peasant’) is ambiguous: current dictionaries associate the word with either Latgale or Russia, although by origin it is considered a borrowing from Lithuanian. The word lauķis [?], with the root lauk- (‘field’), is used in a pejorative sense in Latvian (its shade of meaning is similar to that of the Lithuanian words prasčiokas (‘a hick’) and runkelis (‘a person as mindless as a beetroot’)), as are the slang word pāķis [?] and the barbarisms mužiks (‘a kern’), a Slavism, and bauris [?] (in jargon, bauers), a Germanism. The Lithuanian and Latvian text material shows that in both languages words with different connotations are used synonymously in different contexts.


2004 ◽  
Author(s):  
Karunesh Arora ◽  
Sunita Arora ◽  
Kapil Verma ◽  
Shyam Sunder Agrawal

2016 ◽  
Vol 78 (8) ◽  
Author(s):  
Suraya Alias ◽  
Siti Khaotijah Mohammad ◽  
Gan Keng Hoon ◽  
Tan Tien Ping

A text summary serves as a condensed representation of a written input source in which important and salient information is kept. However, the condensed representation itself suffers from a lack of semantics and coherence if the summary is produced verbatim from the input. Sentence compression is a technique in which unimportant details are eliminated from a sentence while the sentence’s grammatical pattern is preserved. In this study, we analysed our Malay text corpus to discover the rules and patterns by which human summarizers compress sentences and eliminate unimportant constituents to construct a summary. A pattern-growth based model named Frequent Eliminated Pattern (FASPe) is introduced to represent the text as a set of sequences of adjacent words that are frequently eliminated across the document collection. From the rules obtained, some heuristic knowledge about sentence compression is presented, with confidence values as high as 85%, which can serve as a reference for further work on text summarization for the Malay language.
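The idea of counting adjacent word sequences that human summarizers frequently eliminate can be illustrated with a minimal sketch. This is not the FASPe model from the abstract: it is a simplified count over aligned sentence pairs, and the function names, the `max_n` and `min_support` parameters, and the definition of confidence (times a sequence was dropped divided by times it occurred in the originals) are all assumptions made for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def ngrams(tokens, max_n):
    """Yield every contiguous word sequence of length 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def eliminated_runs(original, compressed):
    """Return the contiguous runs of words that appear in `original`
    but were deleted in `compressed` (both are lists of tokens)."""
    matcher = SequenceMatcher(a=original, b=compressed, autojunk=False)
    return [
        original[i1:i2]
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
        if tag == "delete"
    ]


def elimination_rules(pairs, max_n=3, min_support=2):
    """Map each word sequence dropped at least `min_support` times to its
    confidence: dropped count / occurrence count in the original sentences.
    (Hypothetical definition, chosen for this sketch.)"""
    seen = Counter()
    dropped = Counter()
    for original, compressed in pairs:
        seen.update(ngrams(original, max_n))
        for run in eliminated_runs(original, compressed):
            dropped.update(ngrams(run, max_n))
    return {
        pattern: dropped[pattern] / seen[pattern]
        for pattern in dropped
        if dropped[pattern] >= min_support
    }
```

Each element of `pairs` is an (original sentence, human-compressed sentence) pair, both already tokenized. A pattern-growth miner would avoid enumerating every n-gram, but the brute-force count keeps the sketch short.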


2021 ◽  
pp. 77-96
Author(s):  
Iryna Samoilova ◽  

This article provides an overview of active type dictionaries in Ukrainian and foreign lexicography. It examines the peculiarities of the structural organization of dictionary entries, the presentation of words in paradigmatic relationships with headwords, and the display of the typical syntagmatic properties of units and of case forms. Relying on a wide range of texts in the Ukrainian language corpus, the paper describes words with the parametric semantics dalekist’ (farness) and blyz’kist’ (closeness) from an anthropocentric perspective. The applied lexicographic experiment is a practical part of studying how to compile an active type explanatory dictionary of the Ukrainian language. This lexicographic experiment was verified on the texts of the corpus of semantic paradigms of words presented in the dictionaries of the 20th and 21st centuries. It offers checking procedures and vocabulary and text filters. In contrast to the published explanatory dictionaries, the modeled description of language units provides a more complete picture of the functioning of the examined words in texts of different genres, their semantic potential, and stylistics. Keywords: anthropocentric lexicography, active type of dictionary, parametric noun, dictionary entry, text corpus.


2018 ◽  
Vol 24 (2) ◽  
pp. 195-214
Author(s):  
Daniel Śledziński

This paper presents the results of a preliminary investigation of subjective feelings related to the syllabification of Polish words written in orthographic form. The results are part of a wider study, and the data presented here are limited to polysegmental word-internal consonant clusters. In the author’s previous articles it was noted that some morphological boundaries are perceived as syllable boundaries, particularly boundaries between a prefix and a stem. Words that contain such boundaries were excluded from the investigation. The main goal was to verify whether the phonostatistical properties of consonant clusters influence subjective feelings related to syllabification. The investigated statistical properties concern the frequency of occurrence of consonant clusters, and of parts of them, at the beginning of words in the text corpus. Another goal was to verify whether syllabification based on phonology differs from that based on subjective feelings.


2018 ◽  
Vol 8 (2) ◽  
pp. 51-62
Author(s):  
Md Jahurul ISLAM

This study investigated the phonemic status of the nasal vowels in Bangla (aka Bengali). It has been claimed for decades that all seven monophthongal oral vowels in Bangla have phonemically contrastive nasal counterparts; however, an in-depth investigation of the status of nasality for all the vowels is lacking in the current literature. With a phoneme dictionary built from a text corpus of eight million word-tokens and about 275 thousand word-types, this study investigated whether all the oral vowels have phonemically contrastive nasal vowels. Findings revealed that only five of the seven monophthongal vowels form phonemically contrastive relationships with their nasal counterparts; nasality in /æ/ and /ɔ/ is not phonemically contrastive.
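One common way to test for phonemic contrast is to search a phoneme dictionary for minimal pairs, and the sketch below shows that search for oral versus nasal vowels. It is an illustration only, not the procedure used in the study: the lexicon format (word mapped to a tuple of phoneme strings), the convention of marking a nasal vowel with a trailing combining tilde, and the function name are all assumptions.

```python
from collections import defaultdict

NASAL = "\u0303"  # combining tilde; assumed marker for a nasal vowel


def nasal_minimal_pairs(lexicon):
    """Find word pairs whose transcriptions differ only in the nasality
    of a single vowel. Returns {oral vowel: [(oral word, nasal word), ...]}."""
    # Index words by their full transcription for constant-time lookup.
    by_transcription = defaultdict(list)
    for word, phones in lexicon.items():
        by_transcription[tuple(phones)].append(word)

    pairs = defaultdict(list)
    for phones, nasal_words in by_transcription.items():
        for i, phone in enumerate(phones):
            if not phone.endswith(NASAL):
                continue
            oral = phone[:-1]
            # Swap the nasal vowel for its oral counterpart and look it up.
            oral_phones = phones[:i] + (oral,) + phones[i + 1:]
            for oral_word in by_transcription.get(oral_phones, []):
                for nasal_word in nasal_words:
                    pairs[oral].append((oral_word, nasal_word))
    return dict(pairs)
```

Under this convention, an oral vowel that never appears as a key in the result has no minimal pair with its nasal counterpart in the lexicon, which is the kind of evidence a claim of non-contrastive nasality would rest on.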

