Chinese Microblog Topic Detection through POS-Based Semantic Expansion

Information ◽  
2018 ◽  
Vol 9 (8) ◽  
pp. 203 ◽  
Author(s):  
Lianhong Ding ◽  
Bin Sun ◽  
Peng Shi

A microblog is a new type of social media for publishing, acquiring, and spreading information. Finding the significant topics of a microblog is necessary for tracing popularity and following public opinion. This paper puts forward a method to detect topics from Chinese microblogs. Because traditional methods perform poorly on the short texts of microblogs, we propose a topic detection method based on the semantic description of the microblog post. The semantic expansion of the post supplies more information and clues for topic detection. First, semantic features are extracted from a microblog post. Second, the semantic features are expanded according to a thesaurus. Here TongYiCi CiLin is used as the lexical resource to find words with the same meaning. To overcome the polysemy problem, several semantic expansion strategies based on part-of-speech are introduced and compared. Third, an approach to detect topics based on semantic descriptions and an improved incremental clustering algorithm is introduced. A dataset from Sina Weibo is employed to evaluate our method. Experimental results show that our method yields better results for both post clustering and topic detection in Chinese microblogs. We also found that the semantic expansion of nouns is far more efficient than that of other parts of speech. The potential mechanism of the phenomenon is also analyzed and discussed.
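The three steps in this abstract (feature extraction, POS-restricted thesaurus expansion, incremental clustering) can be pictured with a small sketch. It is only an illustration under stated assumptions: the `THESAURUS` dictionary is a hypothetical toy stand-in for TongYiCi CiLin, the English tokens and POS tags stand in for segmented Chinese text, and the clustering is a plain single-pass scheme with a cosine threshold, not the improved incremental algorithm of the paper, whose details the abstract does not give.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy thesaurus standing in for TongYiCi CiLin: word -> synonyms.
THESAURUS = {
    "earthquake": ["quake", "tremor"],
    "quake": ["earthquake", "tremor"],
    "film": ["movie"],
    "movie": ["film"],
}


def expand(tagged_tokens, pos_to_expand=("n",)):
    """Build a bag of words; tokens whose POS is selected also add their synonyms."""
    bag = Counter()
    for word, pos in tagged_tokens:
        bag[word] += 1
        if pos in pos_to_expand:
            for synonym in THESAURUS.get(word, []):
                bag[synonym] += 1
    return bag


def cosine(a, b):
    """Cosine similarity between two bags of words."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def incremental_cluster(posts, threshold=0.3, pos_to_expand=("n",)):
    """Single pass: join the most similar cluster if it passes the threshold, else open a new one."""
    clusters = []
    for index, post in enumerate(posts):
        vector = expand(post, pos_to_expand)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vector, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["centroid"].update(vector)  # centroid is the summed bag of its members
            best["members"].append(index)
        else:
            clusters.append({"centroid": Counter(vector), "members": [index]})
    return [cluster["members"] for cluster in clusters]


# Made-up example posts as (word, POS) pairs; "n" = noun, "v" = verb, "a" = adjective.
posts = [
    [("earthquake", "n"), ("hits", "v"), ("city", "n")],
    [("quake", "n"), ("shakes", "v"), ("town", "n")],
    [("new", "a"), ("film", "n"), ("released", "v")],
    [("movie", "n"), ("premiere", "n")],
]
with_nouns = incremental_cluster(posts)                       # nouns expanded
without = incremental_cluster(posts, pos_to_expand=())        # no expansion
```

In this toy data the posts share no surface words, so without expansion every post opens its own cluster, while expanding nouns lets the two earthquake posts and the two film posts meet. The threshold of 0.3 is an arbitrary choice for the example.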

2012 ◽  
Vol 532-533 ◽  
pp. 1716-1720 ◽  
Author(s):  
Chun Xia Jin ◽  
Hai Yan Zhou ◽  
Qiu Chan Bai

To address keyword sparsity and similarity drift in short text segments, this paper proposes a short text clustering algorithm with feature keyword expansion (STCAFKE). The method clusters short texts by expanding feature keywords based on HowNet and by combining the K-means algorithm with a density-based algorithm. Feature keyword expansion increases the number of keywords per text and enriches its semantic features. Experimental results show that the algorithm improves short text clustering quality in terms of precision and recall.


The relevance of the research is due to the increased attention of linguists to grammatical homonymy. Within the framework of grammatical homonymy, morphological, interpart-of-speech and syntactic homonyms are distinguished. The focus is on the problems of part-of-speech homonymy, in particular on the phenomenon of morphological syncretism due to the ambiguity of structural and semantic features of parts of speech and changes of the morphological status of certain words in different syntagmatic environments. Changes in the categorical-semantic meaning of the lexical unit as, synonymous series of components of the specified sound complex, the nature of the syntagmatic environment, syntactic functions, positional fixation in a certain syntax unit, functional phraseology in compound conjunctions and particles are described. The conceptual scope of the term „homocomplex” is considered, it is defined as a sound complex, which is used to denote the title of a group of functional homonyms and words of the zone of syncretism. It is established that the homocomplex as is represented by three grammatical homonyms such as adverb, conjunction and particle. The source word for the formation of derivatives of the conjunction and particle is the adverb as. In the syntactic position of the adverb, this lexical unit appears in the adverbial position, expressing the following meanings: the question of manner (how?); the degree of detection of an action, state (very, extremely); mode of action (how); time of action (when); indefinite way (somehow). In the syntactic sphere of the conjunction, losing the ability to express a sign, the lexical unit „how” often serves as a means of expressing comparative semantic-syntactic relations; forming phraseologized compounds, it can act as an expression of clauses of condition, time and concession. Not denoting defining and adverbial meanings and not combining parts of a compound sentence, the lexical unit as belongs to the class of particles. 
A typical function of this particle is an amplifying one. It is complemented by additional semantic shades of meaning, such as „very”, „extremely”, „suddenly”, etc., which serve to express the speaker’s surprise, indignation, dissatisfaction, and others.


2021 ◽  
Vol 22 (4) ◽  
pp. 1126-1133
Author(s):  
O. N. Sadovnikova ◽  
I. V. Sharavieva

The article presents a new view on Chinese modal verbs as a part of speech. Based on the typology of the Chinese language, the authors analyzed modal verbs according to their functional-syntactic, formal-morphological, and semantic features in order of importance. The article discusses the position of modal operators in the sentence and other characteristics. For instance, Chinese modal verbs have no impact on the object and cannot independently form sentences or combine with grammemes. Therefore, the authors believe that Chinese modal verbs (modal operators) belong to the lexical-grammatical group of adverbs as a special category of intentional adverbs. Their intentionality reflects the outward focus of linguistic consciousness, based on the internal reference point of the speaker. The group includes such meanings as "wish", "obligation", "opportunity", "permission", and "will". The research owes its theoretical significance to the fact that it contributes to a better understanding of the essence and nature of modal operators and modality meanings, identifying them as a separate group of adverbs. The obtained results are applicable in the field of theoretical grammar of the Chinese language and can be used in researches related to further analysis of parts of speech problem in the Chinese language.


Author(s):  
Yabing Zhang

This article is devoted to the problem of using Russian time-prepositions by foreigners, especially by the Chinese. An analysis of modern literature allows the author to identify the main areas of the work aimed at foreign students’ development of the skills and abilities to correctly build the prepositional combinations and continuously improve the communication skills by means of the Russian language. In this paper, the time-prepositions in the Russian language have been analyzed in detail; some examples of polysemantic use of prepositions, their semantic and stylistic shades alongside with possible errors made by foreign students are presented. The results of the study are to help in developing a system of teaching Russian time-prepositions to a foreign language audience, taking into account their native language, on the basis of the systemic and functional, communicative and activity-centred basis. The role of Russian time-prepositions in constructing word combinations has been identified; the need for foreign students’ close attention to this secondary part of speech has been specified. It has been stated that prepositions are the most dynamic and open type of secondary language units within the quantitative and qualitative composition of which regular changes take place. The research substantiates the need that students should be aware of the function of time-preposition in speech; they are to get acquainted with the main time-prepositions and their meanings, to distinguish prepositions and other homonymous parts of speech as well as to learn stylistic shades of time-prepositions. 
Some recommendations related to the means of mastering time-prepositions have been given: to target speakers to assimilate modern literary norms and, therefore, to teach them how to choose and use them correctly by means of linguistic keys that are intended to fill the word with true meaning, to give it an organic structure, an inherent form and an easy combinability in the texts and oral speech.


2019 ◽  
Vol 15 (2) ◽  
pp. 155-182 ◽  
Author(s):  
Issa Alsmadi ◽  
Keng Hoon Gan

Purpose: Rapid developments in social networks and their usage in everyday life have caused an explosion in the amount of short electronic documents. Classifying these documents into relevant classes according to their text content therefore has significant implications for many applications and is of practical interest. Short-text classification is an essential step in applications such as spam filtering, sentiment analysis, Twitter personalization, customer review and many others related to social networks. Reviews on short text and its application are limited. Thus, this paper aims to discuss the characteristics of short text, its challenges and difficulties in classification. The paper attempts to introduce all stages of the classification process, the techniques used in each stage and the possible development trend in each stage.
Design/methodology/approach: The paper is a review of the main aspects of short-text classification. It is structured according to the stages of the classification task.
Findings: This paper discusses related issues and approaches to these problems. Further research could be conducted to address the challenges in short texts and avoid poor accuracy in classification. Problems of low performance can be addressed by using optimized solutions, such as genetic algorithms, which are powerful in enhancing the quality of selected features. Soft computing solutions that use fuzzy logic make short-text problems a promising area of research.
Originality/value: Using a powerful short-text classification method significantly affects many applications in terms of efficiency enhancement. Current solutions still have low performance, implying the need for improvement. This paper discusses related issues and approaches to these problems.


2018 ◽  
Vol 89 (16) ◽  
pp. 3244-3259 ◽  
Author(s):  
Sumit Mandal ◽  
Simon Annaheim ◽  
Andre Capt ◽  
Jemma Greve ◽  
Martin Camenzind ◽  
...  

Fabric systems used in firefighters' thermal protective clothing should offer optimal thermal protective and thermo-physiological comfort performances. However, fabric systems that have very high thermal protective performance have very low thermo-physiological comfort performance. As these performances are inversely related, a categorization tool based on these two performances can help to find the best balance between them. Thus, this study is aimed at developing a tool for categorizing fabric systems used in protective clothing. For this, a set of commercially available fabric systems were evaluated and categorized. The thermal protective and thermo-physiological comfort performances were measured by standard tests and indexed into a normalized scale between 0 (low performance) and 1 (high performance). The indices dataset was first divided into three clusters by using the k-means algorithm. Here, each cluster had a centroid representing a typical Thermal Protective Performance Index (TPPI) value and a typical Thermo-physiological Comfort Performance Index (TCPI) value. By using the ISO 11612:2015 and EN 469:2014 guidelines related to the TPPI requirements, the clustered fabric systems were divided into two groups: Group 1 (high thermal protective performance-based fabric systems) and Group 2 (low thermal protective performance-based fabric systems). The fabric systems in each of these TPPI groups were further categorized based on the typical TCPI values obtained from the k-means clustering algorithm. In this study, these categorized fabric systems showed either high or low thermal protective performance with low, medium, or high thermo-physiological comfort performance. Finally, a tool for using these categorized fabric systems was prepared and presented graphically. 
The allocations of the fabric systems within the categorization tool have been verified based on their properties (e.g., thermal resistance, weight, evaporative resistance) and construction parameters (e.g., woven, nonwoven, layers), which significantly affect the performance. In this way, we identified key characteristics among the categorized fabric systems which can be used to upgrade or develop high-performance fabric systems. Overall, the categorization tool developed in this study could help clothing manufacturers or textile engineers select and/or develop appropriate fabric systems with maximum thermal protective performance and thermo-physiological comfort performance. Thermal protective clothing manufactured using this type of newly developed fabric system could provide better occupational health and safety for firefighters.
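The indexing and clustering steps described above can be sketched in a few lines. This is a minimal illustration, not the study's procedure: the raw scores are made-up placeholder values, the function names are hypothetical, min-max scaling is assumed as the way of mapping scores onto the 0 to 1 scale, and the evenly spread seeding is an assumption chosen only to make the run deterministic.

```python
def min_max(values):
    """Scale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means; seeds are spread evenly over the sorted points (assumes k >= 2)."""
    ordered = sorted(points)
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


# Placeholder raw test scores for six hypothetical fabric systems (not data from the study).
raw_protection = [10, 12, 30, 32, 50, 52]
raw_comfort = [90, 88, 60, 58, 20, 22]

tppi = min_max(raw_protection)  # protection index on a 0..1 scale
tcpi = min_max(raw_comfort)     # comfort index on a 0..1 scale
centroids, labels = kmeans(list(zip(tppi, tcpi)), k=3)
```

Each centroid is a (TPPI, TCPI) pair, matching the abstract's description of a cluster centre carrying one typical value of each index. The later split into two TPPI groups depends on the requirements in ISO 11612:2015 and EN 469:2014, which are not reproduced here.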


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 9225-9231 ◽  
Author(s):  
Chengde Zhang ◽  
Shaozhen Lu ◽  
Chengming Zhang ◽  
Xia Xiao ◽  
Qian Wang ◽  
...  

2018 ◽  
Vol 6 (2) ◽  
pp. 83
Author(s):  
Refat Aljumily

The aim of this paper was to evaluate the efficiency of automated linguistic features and to test their discriminating power as style markers for author identification in short text messages of the Facebook genre. The corpus used to evaluate the automated linguistic features was compiled from 221 Facebook texts (each about 2 to 3 lines, or 35-40 words) written in English, in the same genre and on the same topic, and posted in the same year group, totaling 7530 words. To compose the dataset for evaluating linguistic feature performance, frequency values were collected from 16 linguistic feature types: parts of speech, function words, word bigrams, character trigrams, average sentence length in words, average sentence length in characters, Yule’s K measure, Simpson’s D measure, average word length, FW/CW ratio, average characters, content-specific keywords, type/token ratio, total number of short words (fewer than four characters), contractions, and total number of characters in words. These were selected from five corpora, totaling 328 test features. The evaluation of the 16 linguistic feature types differs from other analyses because the study used different variable selection methods, including feature type frequency, variance, term frequency/inverse document frequency (TF.IDF), signal-noise ratio, and Poisson term distribution. The relationships between known and anonymous text messages were examined using hierarchical linear and non-hierarchical nonlinear clustering methods, taking into account the nonlinear patterns in the data. There were similarities between the anonymous text messages and the authors of the non-anonymous text messages in terms of function word and parts of speech usage based on the TF.IDF technique, with an efficiency of 60% for function word usage and 50% for parts of speech frequencies. 
There were no similarities between the anonymous text messages and the authors of the non-anonymous text messages in terms of the other features using feature type frequency and variance techniques in this test, and the efficiency of these features in the corpus was below 40%. There was a positive effect on identification performance from using parts of speech and function word frequencies and applying the TF.IDF technique as the length of text messages increased (N ≥ 100). In this way, the performance and efficiency of syntactic features and function word usage in identifying anonymous authors or text messages improve as text length increases when the TF.IDF variable selection technique is used, but decrease when the feature type frequency and variance techniques are applied in the selection process.
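Three of the lexical measures named in this abstract have compact standard definitions, sketched below for a single tokenized message. With N tokens and V_m the number of distinct words that occur exactly m times, the sketch uses type/token ratio = V/N, Yule's K = 10^4 (Σ m² V_m − N) / N², and Simpson's D = Σ V_m (m/N)((m − 1)/(N − 1)). Whether the study used exactly these variants is an assumption, and the function name `richness` is hypothetical.

```python
from collections import Counter


def richness(tokens):
    """Return (type/token ratio, Yule's K, Simpson's D) for a list of tokens (needs >= 2 tokens)."""
    n = len(tokens)
    freqs = Counter(tokens)             # word -> number of occurrences
    spectrum = Counter(freqs.values())  # m -> V_m, the number of words occurring m times
    ttr = len(freqs) / n
    yule_k = 1e4 * (sum(m * m * v for m, v in spectrum.items()) - n) / (n * n)
    simpson_d = sum(v * (m / n) * ((m - 1) / (n - 1)) for m, v in spectrum.items())
    return ttr, yule_k, simpson_d


# Tiny made-up message: "a" occurs twice, "b" and "c" once each.
ttr, yule_k, simpson_d = richness("a a b c".split())
```

On messages of 35-40 words these measures rest on very few repeated words, which is consistent with the abstract's observation that identification improves as message length grows.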


enadakultura ◽  
2021 ◽  
Author(s):  
Tamar Makharoblidze

The question of derivates has been repeatedly raised in the teaching processes of language grammar and general linguistics. This circumstance became the basis for creating this short article. It is well known that a word-form can be changeable or unchangeable, and this fact is determined by the parts of speech. Form-changing words can undergo two types of change: inflectional and derivative. During the inflectional change, the form of the word changes, but the lexical and semantic aspects of the word do not change, i.e. its semantic and content data do not change. A classic example of this type of change is flexion of nouns. Derivation is the formation of a word from another word by the addition of non-inflectional affixes. Derivation can be of two types. The first is lexical derivation, in which the derivative affix produces a word with a different lexical content. A word-form can be another part of speech or the same part of speech but with a different lexical content. The second type of derivation is, first of all, grammatical derivation, when grammatical categories are produced. The grammatical category in general (and a word-form in general as well) includes the unity of morphological and semantical aspects. There is no separate semantics without morphology. Any semantic category and/or content must be conveyed in a specific form, so only a specific form has a specific morphosemantics, which can be produced by the grammatical derivatives. The main difference between the two types of derivation mentioned above (and therefore between the two types of derivatives) is the levels of the language hierarchy. The first type of affixes works at the lexical level of the language, while the second type derivatives produce forms at the morphological and semantic levels. The second type derivatives are inter-level affixes, because they act on two hierarchical levels. Any grammatical category includes specific morphosemantic oppositional forms. 
Thus, unlike inflectional affixes, the rest of the morphological affixes are all other types of inter-level derivatives. It should be noted that the preverb in Kartvelian languages is the only linguistic unit with all possible functions of affix.


2014 ◽  
Vol 971-973 ◽  
pp. 1747-1751 ◽  
Author(s):  
Lei Zhang ◽  
Hai Qiang Chen ◽  
Wei Jie Li ◽  
Yan Zhao Liu ◽  
Run Pu Wu

Text clustering is a popular research topic in the field of text mining, and there are now many text clustering methods catering to different application requirements. Currently, Weibo data is acquired through the APIs provided by the large microblogging platforms. In this paper, we discuss an algorithm for extracting popular topics posted by Weibo users by text clustering after massive data collection. Because traditional text analysis may not be applicable to the short texts used in Weibo, text clustering is carried out by combining multiple posts into long texts based on their features (forwards, comments, followers, etc.). Either frequency-based or density-based short text clustering can deliver in most cases. The former is suited to finding hot topics in large collections of Weibo short texts, and the latter to finding abnormal content. Both methods use semantic information to improve the accuracy of clustering. In addition, they improve clustering performance through parallelism.
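The idea of combining several short posts into one longer text before clustering can be sketched as follows. This is a guess at one simple grouping rule, assuming posts are merged with the post they forward. The field names `id`, `text` and `forward_of` are hypothetical and do not come from any Weibo API, and the abstract does not say how comments or followers enter the grouping.

```python
from collections import defaultdict


def merge_by_root(posts):
    """Concatenate each original post with the posts that forward it into one pseudo-document."""
    merged = defaultdict(list)
    for post in posts:
        # A forwarded post is filed under the post it forwards; an original under its own id.
        root = post["forward_of"] if post.get("forward_of") is not None else post["id"]
        merged[root].append(post["text"])
    return {root: " ".join(texts) for root, texts in merged.items()}


# Made-up example records.
posts = [
    {"id": 1, "text": "flood in city", "forward_of": None},
    {"id": 2, "text": "stay safe", "forward_of": 1},
    {"id": 3, "text": "concert tonight", "forward_of": None},
]
documents = merge_by_root(posts)
```

The resulting longer pseudo-documents can then be passed to whichever frequency-based or density-based clustering step is used.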

