A Novel classification framework for the Thirukkural for building an efficient search system

2021 ◽  
pp. 1-12
Author(s):  
Anita Ramalingam ◽  
Subalalitha Chinnaudayar Navaneethakrishnan

The Thirukkural, a classic work of Tamil didactic literature, was written in 300 BCE. It comprises 1330 couplets organized into three sections and 133 chapters, but retrieving meaningful couplets for a given query in a search system requires a better organization. This paper lays such a foundation by classifying the Thirukkural into ten new categories, called superclasses, that are helpful for building a better Information Retrieval (IR) system. The classifier is trained using the Multinomial Naïve Bayes algorithm. Each superclass is further classified into two subcategories based on the didactic information. The proposed classification framework is evaluated using precision, recall and F-score metrics and achieves an overall F-score of 82.33%; it is also compared with the Support Vector Machine, Logistic Regression and Random Forest algorithms. An IR system is built on top of the proposed framework, and its performance is compared with Google search and a locally built keyword search. The proposed classification framework achieves a mean average precision score of 89%, whereas Google search and keyword search yield 59% and 68% respectively.
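As a rough illustration of the Multinomial Naïve Bayes classification the abstract mentions, the sketch below trains a tiny classifier with add-one (Laplace) smoothing. It is not the authors' implementation: the class name `MultinomialNB`, the English toy phrases standing in for couplets, and the labels `nature` and `friendship` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Tiny multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, tokens):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Hypothetical toy data: English phrases standing in for couplets,
# with invented superclass labels (not the paper's categories).
train_docs = [
    "rain sustains the world".split(),
    "rain brings harvest".split(),
    "a true friend helps in need".split(),
    "friend shares sorrow".split(),
]
train_labels = ["nature", "nature", "friendship", "friendship"]

model = MultinomialNB().fit(train_docs, train_labels)
prediction = model.predict("friend in need".split())
print(prediction)
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and the smoothing term keeps a single unseen word from zeroing out a class.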

2020 ◽  
Vol 4 (2) ◽  
pp. 362-369
Author(s):  
Sharazita Dyah Anggita ◽  
Ikmah

Community demand for freight forwarding services is increasing with the growth of online marketplaces. Users express their opinions about these services through many channels, one of which is the social media platform Twitter. Sentiment analysis shows whether an opinion tends to be positive or negative. Methods that can be applied to sentiment analysis include the Naive Bayes algorithm and the Support Vector Machine (SVM). This research implements both algorithms, optimized using the Particle Swarm Optimization (PSO) algorithm, for sentiment analysis. Testing is done by setting the PSO parameters for each classifier. The results show an accuracy increase of 15.11% for the PSO-based Naive Bayes algorithm and an accuracy improvement of 1.74% for the PSO-based SVM algorithm with the sigmoid kernel.
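To make the PSO step more concrete, here is a minimal particle swarm sketch that searches a two-parameter space. It is only an illustration under stated assumptions: the abstract does not say which parameters were tuned, so `stand_in_error` is a hypothetical objective that replaces a real classifier's validation error, and the `inertia`, `c1` and `c2` values are common textbook defaults rather than the study's settings.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective(position) inside box bounds with a basic PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # personal best positions
    best_val = [objective(p) for p in pos]    # personal best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # velocity: inertia + pull to personal best + pull to global best
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to bounds
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < gbest_val:
                    gbest_val, gbest_pos = val, pos[i][:]
    return gbest_pos, gbest_val


def stand_in_error(params):
    # Hypothetical objective: in practice this would train a classifier
    # with these parameters and return its validation error.
    c, gamma = params
    return (c - 3.0) ** 2 + (gamma - 0.5) ** 2


best_params, best_error = pso_minimize(stand_in_error, bounds=[(0.0, 10.0), (0.0, 2.0)])
print(best_params, best_error)
```

Swapping `stand_in_error` for a function that trains and validates a Naive Bayes or SVM model would turn the same loop into a classifier tuner.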


2020 ◽  
Vol 4 (3) ◽  
pp. 504-512
Author(s):  
Faried Zamachsari ◽  
Gabriel Vangeran Saragih ◽  
Susafa'ati ◽  
Windu Gata

The decision to move Indonesia's capital city to East Kalimantan received mixed responses on social media. The still-high poverty rate and the country's difficult finances are factors in disapproval of the relocation. Twitter, as one of the popular social media platforms, is used by the public to express these opinions. This study aims to determine the tendency of community responses to the relocation of the national capital and to carry out sentiment analysis of public opinion on it using the Naive Bayes algorithm and the Support Vector Machine with feature selection, in order to obtain the highest accuracy. The data are Indonesian-language tweets of public opinion collected from Twitter by crawling. The search terms used are #IbuKotaBaru and #PindahIbuKota. The research stages consist of collecting data from Twitter, polarity labeling, and preprocessing, which comprises case transformation, cleansing, tokenizing, filtering and stemming. Feature selection is applied to increase accuracy, and the data are then divided into training and testing sets according to predetermined ratios. Next, the Support Vector Machine and Naive Bayes methods are compared to determine which is more accurate. In the data period studied, 24.26% of the sentiment about the move to a new capital city was positive and 75.74% was negative. Using RapidMiner software, the best accuracy for Naive Bayes with feature selection is 88.24% at a ratio of 9:1, while the best accuracy for the Support Vector Machine with feature selection is 78.77% at a ratio of 5:5.
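The preprocessing stages and the train/test ratios described above can be sketched in a few lines. This is a hedged illustration, not the RapidMiner workflow used in the study: the stopword list, the suffix list, the example tweet, and the helper names `preprocess`, `naive_stem` and `split_by_ratio` are all hypothetical, and the suffix stripper is only a crude stand-in for a real Indonesian stemmer.

```python
import re

STOPWORDS = {"yang", "di", "ke", "dan", "ini", "itu"}  # hypothetical, tiny list
SUFFIXES = ("nya", "kan", "an")  # naive stand-in for a real Indonesian stemmer


def naive_stem(token):
    """Strip one known suffix if at least four characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def preprocess(tweet):
    text = tweet.lower()                                 # transform case
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text)    # cleansing: URLs, mentions, hashtags
    text = re.sub(r"[^a-z\s]", " ", text)                # cleansing: digits, punctuation
    tokens = text.split()                                # tokenizing
    tokens = [t for t in tokens if t not in STOPWORDS]   # filtering
    return [naive_stem(t) for t in tokens]               # stemming


def split_by_ratio(items, train_parts, test_parts):
    """Split a list into train/test portions, e.g. 9:1 or 5:5."""
    cut = len(items) * train_parts // (train_parts + test_parts)
    return items[:cut], items[cut:]


# Hypothetical example tweet using one of the search hashtags from the abstract.
tokens = preprocess("Pemerintah memindahkan ibu kota ke lokasi baru #IbuKotaBaru")
train, test = split_by_ratio(list(range(10)), 9, 1)
print(tokens, len(train), len(test))
```

In practice the data would be shuffled before splitting, since tweets collected by crawling are usually ordered by time.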


2021 ◽  
Vol 11 (10) ◽  
pp. 4443
Author(s):  
Rokas Štrimaitis ◽  
Pavel Stefanovič ◽  
Simona Ramanauskaitė ◽  
Asta Slotkienė

Financial analysis is not limited to analysis of enterprise performance. It is worth analyzing as wide an area as possible to obtain a full impression of a specific enterprise. News website content is a data source that expresses the public’s opinion on enterprise operations, status, etc. Therefore, it is worth analyzing the text of news portal articles. Sentiment analysis solutions for English texts and for financial texts exist and are accurate, whereas work on the complex Lithuanian language is mostly concentrated on sentiment analysis of comment texts and does not provide high accuracy. Therefore, in this paper, a supervised machine learning model was implemented to assign sentiment to financial-context news gathered from Lithuanian-language websites. The analysis was made using three classification algorithms commonly used in the field of sentiment analysis. Hyperparameter optimization using grid search was performed to discover the best parameters for each classifier. All experimental investigations were made using newly collected datasets from four Lithuanian news websites. The results show that the highest accuracy (71.1%) is obtained using a non-balanced dataset with the multinomial Naive Bayes algorithm. The accuracies of the other algorithms were slightly lower: long short-term memory (71%) and support vector machine (70.4%).
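The grid search mentioned above amounts to trying every combination of candidate hyperparameter values and keeping the best-scoring one. The sketch below shows that loop under stated assumptions: the parameter names `alpha` and `ngram_max`, their candidate values, and the synthetic `stand_in_score` function are hypothetical and do not come from the paper, where each combination would instead be scored by training and validating a classifier.

```python
from itertools import product


def grid_search(evaluate, grid):
    """Try every combination in grid and return the best (params, score)."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def stand_in_score(alpha, ngram_max):
    # Hypothetical synthetic score, highest at alpha=0.5 and ngram_max=2;
    # a real run would return validation accuracy for these settings.
    return 1.0 - (alpha - 0.5) ** 2 - 0.1 * (ngram_max - 2) ** 2


grid = {"alpha": [0.1, 0.5, 1.0], "ngram_max": [1, 2]}
best_params, best_score = grid_search(stand_in_score, grid)
print(best_params, best_score)
```

The cost grows with the product of the list lengths (here 3 × 2 = 6 evaluations), which is why grids are usually kept coarse.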


Diagnostics ◽  
2021 ◽  
Vol 11 (5) ◽  
pp. 804
Author(s):  
Jasminka Hasic Telalovic ◽  
Serena Pillozzi ◽  
Rachele Fabbri ◽  
Alice Laffi ◽  
Daniele Lavacchi ◽  
...  

The application of machine learning (ML) techniques could facilitate the identification of predictive biomarkers of somatostatin analog (SSA) efficacy in patients with neuroendocrine tumors (NETs). We collected data from 74 patients with a pancreatic or gastrointestinal NET who received SSA as first-line therapy. We developed three classification models to predict whether the patient would experience a progressive disease (PD) after 12 or 18 months based on clinic-pathological factors at the baseline. The dataset included 70 samples and 15 features. We initially developed three classification models with accuracy ranging from 55% to 70%. We then compared ten different ML algorithms. In all but one case, the performance of the Multinomial Naïve Bayes algorithm (80%) was the highest. The support vector machine classifier (SVC) had a higher performance for the recall metric of the progression-free outcome (97% vs. 94%). Overall, for the first time, we documented that the factors that mainly influenced progression-free survival (PFS) included age, the number of metastatic sites and the primary site. In addition, the following factors were also isolated as important: adverse events G3–G4, sex, Ki67, metastatic site (liver), functioning NET, the primary site and the stage. In patients with advanced NETs, ML provides a predictive model that could potentially be used to differentiate prognostic groups and to identify patients for whom SSA therapy as a single agent may not be sufficient to achieve a long-lasting PFS.
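Since the abstract compares models by the recall of the progression-free outcome, a short sketch of how per-class precision, recall and F1 are computed may help. The labels `PF` (progression-free) and `PD` (progressive disease) and the toy predictions below are hypothetical and are not the study's data.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy labels: PF = progression-free, PD = progressive disease.
y_true = ["PF", "PF", "PF", "PD", "PD"]
y_pred = ["PF", "PF", "PD", "PD", "PF"]

precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive="PF")
print(precision, recall, f1)
```

Recall for the progression-free class answers the question "of the patients who truly remained progression-free, how many did the model identify?", which is why one model can lead on recall while another leads on overall accuracy.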


2021 ◽  
pp. 191-210
Author(s):  
Nikolay D. Golev ◽
Irina P. Falomkina ◽  

The paper describes the word-building system of the Russian language in terms of its vocabulary. It discusses the lexical factors that influence the potential of lexical units to act as motivating units in word-building processes and relations, and the realization of this potential in language activities. Of most interest to the authors are anthropocentric determinants, most of which coordinate the lexical system and, through it, the word-building system with the worldview of native speakers of Russian. The proposed model of derivational development of vocabulary provides such coordination by studying the deep-seated process of conceptualization of the words that are potential motivators of neologisms. The study identifies word frequency as an external manifestation of conceptualization. The frequency data were obtained from Google search statistics. Because it captures not only usual but also occasional and potential words, this source is an effective tool for studying word-building processes and their results. The study reveals the interrelation between the language worldview of native speakers of Russian and their “word-building behavior” in language activities. The worldview is found to be determined, first of all, by the pragmatic factor, which primarily influences the usage of a word in speech, as reflected by its frequency. Frequency ranks lexical units by their derivational potential and thereby provides a researcher with a reliable instrument for its study.

