Improving Semantic Coherence of Gujarati Text Topic Model Using Inflectional Forms Reduction and Single-letter Words Removal

Author(s):  
Uttam Chauhan ◽  
Apurva Shah

A topic model is one of the best stochastic models for summarizing an extensive collection of text. It has achieved considerable success in text analysis as well as text summarization. It can be applied to a set of documents represented as a bag-of-words, without considering grammar or word order. We modeled the topics of a Gujarati news article corpus. As the Gujarati language has a diverse morphological structure and is inflectionally rich, Gujarati text processing is more complex. The size of the vocabulary plays an important role in the inference process and the quality of topics. As the vocabulary size increases, the inference process becomes slower and topic semantic coherence decreases. If the vocabulary size is reduced, the topic inference process can be accelerated, and the quality of topics may also improve. In this work, a list of suffixes that occur frequently with words in Gujarati text has been prepared. Inflectional forms have been reduced to their root words using the suffixes in the list. Moreover, Gujarati single-letter words have been eliminated for faster inference and better topic quality. The experiments showed that reducing inflectional forms to their root words shrinks the vocabulary to a significant extent. It also made the topic formation process quicker. Moreover, inflectional form reduction and single-letter word removal enhanced the interpretability of topics. The interpretability of topics has been assessed on semantic coherence, word length, and topic size. The experimental results showed improvements in the topical semantic coherence score. Also, the topic size grew notably as the number of tokens assigned to the topics increased.
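The two preprocessing steps described in this abstract, suffix-based reduction of inflectional forms and removal of single-letter words, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the suffix list is a handful of common Gujarati case markers chosen here for demonstration, the names `letter_count`, `strip_suffix`, and `preprocess` are hypothetical, and a "letter" is approximated as one base character with its combining vowel signs.

```python
import unicodedata

# Illustrative suffixes only (common Gujarati case markers and postpositions);
# this is NOT the suffix list prepared by the authors.
SUFFIXES = sorted(["માં", "નું", "ને", "થી", "નો", "ની", "ના"], key=len, reverse=True)


def letter_count(word: str) -> int:
    """Count base characters, ignoring combining marks (vowel signs, anusvara, virama).

    Approximation: a conjunct consonant counts as more than one letter.
    """
    return sum(1 for ch in word if not unicodedata.category(ch).startswith("M"))


def strip_suffix(word: str, min_stem_letters: int = 2) -> str:
    """Remove the longest listed suffix that leaves a stem of sufficient length."""
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if letter_count(stem) >= min_stem_letters:
                return stem
    return word


def preprocess(tokens: list[str]) -> list[str]:
    """Reduce inflected forms to stems, then drop single-letter words."""
    stems = (strip_suffix(t) for t in tokens)
    return [s for s in stems if letter_count(s) > 1]


tokens = ["ઘરમાં", "ઘરથી", "એ", "શહેરથી", "છે"]
cleaned = preprocess(tokens)
print(cleaned)
# Vocabulary size before and after preprocessing
print(len(set(tokens)), "->", len(set(cleaned)))
```

In this toy input, two inflected forms of the same noun collapse into one vocabulary entry and both single-letter tokens are dropped, which is the mechanism the abstract credits for the smaller vocabulary.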

Author(s):  
Myneni Madhu Bala ◽  
Venkata Krishnaiah Ravilla ◽  
Kamakshi Prasad V ◽  
Akhil Dandamudi

This chapter mainly discusses the dynamic behavior of railway passengers using Twitter data during regular and emergency situations. Social network data provides dynamic and realistic data in various fields. In line with the chapter theme, Twitter data from the railway domain can be used to enhance railway services. Using this data, a comprehensive framework for modeling passenger tweet data, which incorporates passenger opinions towards the facilities provided by railways, is discussed. The major issues regarding dynamic data extraction, preparation of Twitter text content, and text processing for finding sentiment levels are presented through two case studies: sentiment analysis of passengers' opinions about the quality of railway services, and identification of passenger travel demands using geotagged Twitter data. The sentiment analysis ascertains whether passenger opinions towards the facilities provided by railways are positive or negative, based on their journey experiences.


2017 ◽  
Vol 15 (3) ◽  
pp. 1-14 ◽  
Author(s):  
Sanya Liu ◽  
Cheng Ni ◽  
Zhi Liu ◽  
Xian Peng ◽  
Hercy N.H. Cheng

Massive Open Online Courses (MOOCs) have developed rapidly and drawn much attention from the areas of learning analytics and artificial intelligence. Large amounts of unstructured data are being generated in online reviews. Learning behavioral data are becoming more and more diverse, prompting the emergence of big data in education. To mine useful information from these data, we need to use educational data mining and learning analysis techniques to study learners' feelings and the topics they discuss. This paper aims to mine and analyze the topic information hidden in unstructured MOOC review data. A novel author topic model based on an unsupervised learning idea is proposed to extract learning topics for each learner. Based on the experimental results, we analyze the learners' focuses of interest, which facilitates further personalized course recommendation and improvement of the quality of online courses.


2021 ◽  
Author(s):  
Yue Niu ◽  
Hongjie Zhang

With the growth of the internet, short texts such as tweets from Twitter, news titles from RSS feeds, or comments from Amazon have become very prevalent. Many tasks need to retrieve information hidden in the content of short texts, so ontology learning methods have been proposed for retrieving structured information. A topic hierarchy is a typical ontology that consists of concepts and taxonomy relations between concepts. Current hierarchical topic models are not specially designed for short texts. These methods use word co-occurrence to construct concepts and general-special word relations to construct taxonomy topics. But in short texts, word co-occurrence is sparse and general-special word relations are lacking. To overcome these two problems and provide an interpretable result, we designed a hierarchical topic model that aggregates short texts into long documents and constructs topics and relations. Because long documents add semantic information, our model can avoid the sparsity of word co-occurrence. In experiments, we measured the quality of concepts with a topic coherence metric on four real-world short-text corpora. The results showed that our topic hierarchy is more interpretable than those of other methods.
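The abstract does not say which topic coherence metric was used. As one common choice, the UMass coherence score can be sketched as below; it rewards topics whose top words co-occur in the same documents, which is why sparse co-occurrence in short texts depresses it. The function name `umass_coherence` and the toy documents are assumptions made for illustration.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass coherence: sum of log((D(w_later, w_earlier) + 1) / D(w_earlier))
    over ordered pairs of a topic's top words, where D counts documents."""
    docs = [set(d) for d in documents]

    def doc_freq(*words):
        # Number of documents containing all the given words
        return sum(1 for d in docs if all(w in d for w in words))

    score = 0.0
    for earlier, later in combinations(top_words, 2):
        denom = doc_freq(earlier)
        if denom == 0:
            raise ValueError(f"top word {earlier!r} does not occur in the corpus")
        score += math.log((doc_freq(earlier, later) + 1) / denom)
    return score


# Toy corpus, invented for illustration
docs = [["train", "delay"], ["train", "delay"], ["train"], ["price"]]
coherent = umass_coherence(["train", "delay"], docs)
incoherent = umass_coherence(["train", "price"], docs)
print(coherent, incoherent)
```

Scores are at most slightly above zero and become more negative as the top words co-occur less often, so a higher value indicates a more coherent topic.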


2021 ◽  
Vol 4 (4) ◽  
pp. 692
Author(s):  
Fuyudhatul Husna ◽  
Hesty Widiastuty ◽  
Aris Sugianto

The crucial problems in translating from Indonesian into English are the students' lack of knowledge and their mother tongue (the source language), two aspects of which are grammar and vocabulary. The researcher measured the correlation of grammar mastery and vocabulary size with translation ability on report texts among seventh-semester students at the State Islamic Institute of Palangka Raya, using a quantitative method with a correlational design. The instruments were three tests (grammar mastery, vocabulary size, and translation), administered to the 32 students of the translation class in the academic year 2017/2018. The numerical data were analyzed with the Pearson product-moment correlation, which showed that: (1) the largest group of students scored "fail" (43.75%) in grammar mastery, (2) the largest group scored "excellent" (46.875%) in vocabulary size, and (3) most students scored "enough" (87.5%) in translation ability. The significant correlation among the three variables was shown by a correlation coefficient of 0.604 (strong category), Fchange > Ftable (8.349 > 3.33), and a contribution of grammar mastery and vocabulary size of 36.5%. Thus, students' grammar mastery and vocabulary size correlate with the quality of their translation ability on report texts among seventh-semester students in the academic year 2017/2018. Keywords: Grammar Mastery, Vocabulary, Translation
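The Pearson product-moment coefficient named in this abstract, and the link between the reported coefficient and the reported contribution, can be sketched as follows. The sketch assumes that the 36.5% contribution is the coefficient of determination (the square of the reported 0.604); the abstract does not state this explicitly. The helper `pearson_r` and the score lists are invented for illustration and are not the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy scores, invented for illustration; NOT the study's data.
grammar = [40, 55, 60, 70, 85]
translation = [50, 52, 65, 68, 80]
print(round(pearson_r(grammar, translation), 3))

# Squaring the reported coefficient gives the share of variance explained.
r_reported = 0.604
contribution_pct = round(r_reported ** 2 * 100, 1)
print(contribution_pct)  # 36.5
```

Under that assumption the two reported figures are consistent with each other: 0.604 squared is about 0.365.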


Author(s):  
L. P. Vershinina ◽  

The basis of modern decision support systems is not so much analytical and statistical models as the practical application of specialists' knowledge. Such systems are based on fuzzy technologies. The quality of decisions made depends on how accurately the quality of information is reflected in the fuzzy inference process. Ways to improve the objectivity of fuzzy inference at the stages of fuzzification, aggregation, activation, and accumulation are proposed.


2019 ◽  
Vol 108 ◽  
pp. 21-25 ◽  
Author(s):  
IZABELA BETLEJ

Studies on the diversity of substrate composition in the culture medium of Kombucha microorganisms and its influence on the quality of synthesized cellulose. The paper presents the results of an assessment of the effect of nutrients, specifically different nitrogen concentrations in the growth medium of Kombucha microorganisms, on the morphology of the cellulose produced and its sorption capacity. Analyzing the research results, we found that polymers formed in different growth environments differ in morphological structure and swelling index. The polymers synthesized on a nitrogen-rich substrate were characterized by a multilayer structure and a lower swelling index than the polymers obtained on a nutrient-poor substrate.


2019 ◽  
Vol 28 (3) ◽  
pp. 263-272 ◽  
Author(s):  
Tobias Hecking ◽  
Loet Leydesdorff

We replicate and analyze the topic model which was commissioned to King’s College and Digital Science for the Research Evaluation Framework (REF 2014) in the United Kingdom: 6,638 case descriptions of societal impact were submitted by 154 higher-education institutes. We compare the Latent Dirichlet Allocation (LDA) model with Principal Component Analysis (PCA) of document-term matrices using the same data. Since topic models are almost by definition applied to text corpora which are too large to read, validation of the results of these models is hardly possible; furthermore the models are irreproducible for a number of reasons. However, removing a small fraction of the documents from the sample (a test for reliability) has on average a larger impact in terms of decay on LDA than on PCA-based models. The semantic coherence of LDA models outperforms PCA-based models. In our opinion, results of the topic models are statistical and should not be used for grant selections and micro decision-making about research without follow-up using domain-specific semantic maps.


Foods ◽  
2019 ◽  
Vol 8 (4) ◽  
pp. 113 ◽  
Author(s):  
Maria Volpe ◽  
Elena Coccia ◽  
Francesco Siano ◽  
Michele Di Stasio ◽  
Marina Paolucci

In this study, different methods were used to evaluate the effectiveness of a carrageenan coating and a carrageenan coating incorporating lemon essential oil (ELO) in preserving the physicochemical and olfactory characteristics of trout fillets stored at 4 °C for up to 12 days. The fillet morphological structure was analyzed by histological and immunological methods; lipid peroxidation was assessed with the peroxide and thiobarbituric acid reactive substances (TBARS) tests. At the same time, two less time-consuming methods, Attenuated Total Reflectance-Fourier Transformed Infrared (ATR-FTIR) spectroscopy and the electronic nose, were used. Uncoated trout fillets (UTF) showed a less compact tissue structure than carrageenan-coated trout fillets (CTF) and active carrageenan-coated trout fillets containing ELO (ACTF), probably due to the degradation of collagen, as indicated by optical microscopy and ATR-FTIR. UTF showed greater lipid oxidation than CTF and ACTF, as indicated by the peroxide and TBARS tests and ATR-FTIR spectroscopy. The carrageenan coating containing ELO preserved the olfactory characteristics of the trout fillets better than the carrageenan coating alone, as indicated by the electronic nose analysis. This study confirms that both carrageenan and ELO-containing carrageenan coatings slow down the decay of the physicochemical and olfactory characteristics of fresh trout fillets stored at 4 °C, although the latter is more effective.

