latent semantic indexing
Recently Published Documents


TOTAL DOCUMENTS: 335 (FIVE YEARS 34)
H-INDEX: 30 (FIVE YEARS 2)

2022 ◽  
pp. 155-170
Author(s):  
Lap-Kei Lee ◽  
Kwok Tai Chui ◽  
Jingjing Wang ◽  
Yin-Chun Fung ◽  
Zhanhui Tan

Our growing dependence on the Internet in daily life creates opportunities to discover valuable, subjective information using advanced techniques such as natural language processing and artificial intelligence. This chapter focuses on a convolutional neural network for three-class (positive, neutral, and negative) cross-domain sentiment analysis. The model is enhanced in two ways. First, a similarity label method facilitates management between the source and target domains to generate more labelled data. Second, term frequency-inverse document frequency (TF-IDF) and latent semantic indexing (LSI) are employed to compute the similarity between source and target domains. Performance is evaluated on three datasets: beauty reviews, toy reviews, and phone reviews. The proposed method improves accuracy by 4.3-7.6% and reduces training time by 50%. The limitations of the work are discussed and serve as the rationale for future research directions.
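The abstract does not say how the TF-IDF similarity between domains is computed. The sketch below is one plausible reading, assuming cosine similarity over smoothed TF-IDF vectors; the function names and the toy reviews are hypothetical, and the LSI step is not covered here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption), so terms found in every document keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy reviews standing in for a source and two target documents.
source = "the lipstick colour is lovely and lasts all day".split()
target_close = "the phone colour is lovely and the battery lasts all day".split()
target_far = "shipping was slow".split()

vecs = tfidf_vectors([source, target_close, target_far])
sim_close = cosine(vecs[0], vecs[1])  # shares most of its vocabulary with the source
sim_far = cosine(vecs[0], vecs[2])    # shares no term with the source
```

Under this sketch, a target review that shares vocabulary with the source scores higher than one that does not, which is the kind of signal a similarity-based labelling step could use.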


Author(s):  
Jesus M. Meneses ◽  
Karen W. Cantilang ◽  
Delbert A. Dala ◽  
Jovito B. Madeja

The purpose of this study was to decode the hidden views and sentiments in the collated written responses of Eastern Samar State University's program heads regarding supervision of instruction amidst the COVID-19 pandemic. The study used an exploratory sequential mixed method to explore and understand the program heads' perspectives on supervision of instruction during the pandemic. Data were collected indirectly from the participants using an interview questionnaire containing an open-ended question. The responses were processed and analyzed with the open-source machine learning software Orange toolbox (Demsar et al., 2013), using its built-in pre-processing, sentiment analysis, and topic modelling tools. The results showed that the most prominent words the tool generated from the text file of responses were pandemic, performance, program, learning, difficult, supervision, instruction, internet, faculty, online, students, teaching, delivery, confusing, challenging, poor, and connectivity. The dominant sentiment leans towards negative polarity. Hidden topics were generated automatically by the software, which allowed the researchers to derive the following related themes: “Impact of pandemic in the supervision of instruction of faculty and learning of students”, “Challenges in the delivery of instruction and supervision due to poor internet connectivity”, and “Strategic role of online modalities and connectivity in supervision and delivery of instruction”. Little research applies text mining and sentiment analysis with the Orange toolbox, particularly to supervision of instruction in a Philippine state university. There are related studies using machine learning software, but none directed towards this specific gap in this specific locale. Keywords: Pandemic, Latent Semantic Indexing, Orange Toolbox, Sentiment Analysis, Thematic Analysis.


Author(s):  
Ansh Mehta

Abstract: Despite the recent successes of deep learning in many areas of natural language processing, previous research on emotion recognition of Twitter users centered on lexicons and basic classifiers over bag-of-words models. The study's main question is whether deep learning can improve on their performance. Emotion analysis remains difficult because most posts offer scant contextual information. The suggested method can capture more emotion semantics than existing models by projecting emoticons and words into emoticon space, which improves the performance of emotion analysis. In a microblog setting, this aids the detection of subjectivity, polarity, and emotion. The study uses hashtags to create three large emotion-labeled data sets corresponding to different classifications of emotions. It then compares the results of several word- and character-based recurrent and convolutional neural networks with those of bag-of-words and latent semantic indexing models. It also examines the transferability of the final hidden state representations across the different emotion classifications, and whether a unified model can predict all of them using a common representation. Recurrent neural networks, especially character-based ones, are shown to outperform bag-of-words and latent semantic indexing models. The semantics of each token must be considered when classifying tweet emotion, and the semantics of tokens recorded in the hash map can be looked up directly. Although these models transfer poorly, the newly presented training heuristic produces a unified model with performance comparable to the three single models. Keywords: Hashtags, Sentiment Analysis, Facial Recognition, Emotions.


Author(s):  
József Dr. Menyhárt ◽  
Joao Henrique Gomes Da Costa Cavalcanti

Artificial intelligence is becoming a powerful tool of modern science, and there is even a scientific consensus that our society is becoming data-driven. Machine learning is a branch of artificial intelligence that learns from data and models its behavior. Python, which meets the challenges of this new era, is becoming one of the most popular languages for general programming and scientific computing. With these circumstances in mind, this article shows an example of how to use one supervised machine learning method, the Support Vector Machine, to predict a movie's genre from its description using Python. First, the official OMDb API was used to gather data about movies; then a tuned Support Vector Machine model with latent semantic indexing, capable of predicting a movie's genre from its plot, was coded. The model's performance proved satisfactory considering the small dataset used and the occurrence of movies with hybrid genres. Testing the model on a larger dataset and using multi-label classification models were proposed as ways to improve it.


2021 ◽  
Vol 4 (2) ◽  
pp. 64-70
Author(s):  
Agung Hasbi Ardiansyah ◽  
Kurnia Paranita Kartika ◽  
Saiful Nur Budiman

When they receive findings or reports of suspected election violations, election supervisors clarify the case and gather sufficient evidence before deciding whether the finding or report constitutes a violation. During clarification, the supervisors look for the articles that may have been violated in the incoming finding or report. The large number of reference articles for each case sometimes hampers the supervisors' work, so a tool is needed to speed up the search for articles based on the violation case. In this study, an information retrieval system is used to find the articles of Law Number 10 of 2016 that are relevant to a case based on the case description. The study uses the Latent Semantic Indexing (LSI) method, which applies Singular Value Decomposition (SVD) to reduce dimensionality. The study uses 37 articles, and 4 cases or violation descriptions as queries. The system accepts a query or violation case description as input, then computes and determines the related articles. The method's success at retrieving relevant results is shown by a recall of 100%, a precision of 70%, and an F-measure of 82%.
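The retrieval step described above (SVD-based dimensionality reduction, then scoring articles against a query) can be sketched as follows. This is a minimal illustration and not the authors' system: it assumes NumPy, a raw term-by-document count matrix, the common query fold-in q_k = qᵀ U_k Σ_k⁻¹, and cosine scoring; the tiny matrix and the name `lsi_scores` are hypothetical.

```python
import numpy as np


def lsi_scores(term_doc, query, k):
    """Score every document against a query in a k-dimensional LSI space.

    term_doc: (n_terms, n_docs) weight matrix; query: (n_terms,) vector.
    Assumes k does not exceed the rank of term_doc, so the kept singular
    values are non-zero.
    """
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    docs_k = Vt[:k, :].T              # one k-dimensional row per document
    query_k = (query @ U_k) / s_k     # fold the query into the same space
    denom = np.linalg.norm(docs_k, axis=1) * np.linalg.norm(query_k)
    denom = np.where(denom == 0.0, 1.0, denom)  # avoid dividing by zero
    return (docs_k @ query_k) / denom


# Hypothetical toy data: rows are terms t0..t3, columns are documents d0..d2.
term_doc = np.array([
    [1.0, 0.0, 0.0],   # t0 appears only in d0
    [1.0, 1.0, 0.0],   # t1 appears in d0 and d1
    [0.0, 0.0, 1.0],   # t2 appears only in d2
    [0.0, 0.0, 1.0],   # t3 appears only in d2
])
query = np.array([1.0, 0.0, 0.0, 0.0])  # the query contains only t0

scores = lsi_scores(term_doc, query, k=2)
ranking = np.argsort(-scores)  # document indices, best match first
```

In this toy case d1 scores about as high as d0 even though it does not contain the query term, because both load on the same latent dimension, while d2 scores near zero. As a side check, the reported figures are mutually consistent if the F-measure is the usual harmonic mean of precision and recall: 2 × 0.70 × 1.00 / (0.70 + 1.00) ≈ 0.82.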


2021 ◽  
Vol 9 (Suppl 1) ◽  
pp. e001287
Author(s):  
Robert P Lennon ◽  
Robbie Fraleigh ◽  
Lauren J Van Scoy ◽  
Aparna Keshaviah ◽  
Xindi C Hu ◽  
...  

Qualitative research remains underused, in part due to the time and cost of annotating qualitative data (coding). Artificial intelligence (AI) has been suggested as a means to reduce those burdens, and has been used in exploratory studies to reduce the burden of coding. However, methods to date use AI analytical techniques that lack transparency, potentially limiting acceptance of results. We developed an automated qualitative assistant (AQUA) using a semiclassical approach, replacing Latent Semantic Indexing/Latent Dirichlet Allocation with a more transparent graph-theoretic topic extraction and clustering method. Applied to a large dataset of free-text survey responses, AQUA generated unsupervised topic categories and circle hierarchical representations of free-text responses, enabling rapid interpretation of data. When tasked with coding a subset of free-text data into user-defined qualitative categories, AQUA demonstrated intercoder reliability in several multicategory combinations with a Cohen’s kappa comparable to human coders (0.62–0.72), enabling researchers to automate coding on those categories for the entire dataset. The aim of this manuscript is to describe pertinent components of best practices of AI/machine learning (ML)-assisted qualitative methods, illustrating how primary care researchers may use AQUA to rapidly and accurately code large text datasets. The contribution of this article is providing guidance that should increase AI/ML transparency and reproducibility.
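For readers unfamiliar with the agreement statistic quoted above, the following is a small sketch of Cohen's kappa for two coders. It is a generic textbook calculation, not AQUA's code; the category names and label sequences are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each coder's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes assigned by a human coder and an automated coder.
human = ["cost", "cost", "cost", "cost", "cost",
         "access", "access", "access", "access", "access"]
machine = ["cost", "cost", "cost", "cost", "access",
           "access", "access", "access", "access", "cost"]

kappa = cohens_kappa(human, machine)  # 8 of 10 agree, chance agreement is 0.5
```

Here observed agreement is 0.8 and chance agreement is 0.5, giving a kappa of about 0.6.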


2021 ◽  
Vol 12 (4) ◽  
pp. 169-185
Author(s):  
Saida Ishak Boushaki ◽  
Omar Bendjeghaba ◽  
Nadjet Kamel

Clustering is an important unsupervised analysis technique for big data mining. It is applied in several domains, including the biomedical documents of the MEDLINE database. Document clustering based on metaheuristics is an active research area. However, these algorithms tend to get trapped in local optima, require many parameters to be tuned, and require documents to be indexed by a high-dimensional matrix under the traditional vector space model. To overcome these limitations, this paper proposes a new parameter-free document clustering algorithm (ASOS-LSI). It is based on the recent symbiotic organisms search (SOS) metaheuristic and enhanced by an acceleration technique. Furthermore, the documents are represented by semantic indexing based on the well-known latent semantic indexing (LSI). Experiments on well-known biomedical document datasets show that ASOS-LSI significantly outperforms five well-known algorithms in terms of compactness, f-measure, purity, misclassified documents, entropy, and runtime.
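Of the evaluation measures listed, purity is the simplest to illustrate. The sketch below uses the common definition (each cluster is credited with its majority class); the abstract does not state the exact formula the authors used, and the cluster assignments and labels here are hypothetical.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, true_labels):
    """Fraction of documents that belong to the majority class of their cluster."""
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, true_labels):
        members[cluster].append(label)
    # For each cluster, count how many documents carry its most frequent label.
    majority_total = sum(Counter(labels).most_common(1)[0][1]
                         for labels in members.values())
    return majority_total / len(true_labels)


# Hypothetical result: six documents, two clusters, two true classes.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["a", "a", "b", "b", "b", "b"]
score = purity(clusters, labels)  # cluster 0 contributes 2, cluster 1 contributes 3
```

With these inputs five of the six documents sit in a cluster dominated by their own class, so purity is 5/6.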


2021 ◽  
Vol 11 (3) ◽  
pp. 113-137
Author(s):  
M. Fevzi Esen

Social media content related to COVID-19 has increased remarkably. Users have created large volumes of content on various topics in a short time, interacting with people in real time. This has also turned social media into an indispensable information source in any crisis. This study aims to explore the information content on COVID-19 disseminated through social media and to discover prominent topics in shares on COVID-19. To this end, we retrieved 17,542 tweets shared in Turkish. A content analysis of the shares was carried out, with latent semantic indexing and network analyses performed to detect the relationships and interactions among shares. The most shared topics were found to be yasak [lockdown], tedbir [precaution], karantina [quarantine], and vaka [case], with communication frequently passing through this semantic string and information exchange being faster within the network. In addition, shares related to hygiene, masks, and distancing occurred less often than shares related to precautions, rules, cases, and lockdowns. Likes and retweets for content with social propaganda such as #evdekal [stayathome], #evdehayatvar [lifeathome], and #birliktebaşaracağız [togetherwesucceed] were few, and these hashtags were not found in a semantic string. This suggests that social propaganda through social media had a limited impact on epidemic management. In conclusion, identifying the prominent issues in social media posts and the characteristics of social media networks will help decision-makers determine appropriate policies for controlling and preventing the pandemic's spread.

