Lects in Helsinki Finnish - a probabilistic component modeling approach

2021 ◽  
pp. 1-26
Author(s):  
Olli Kuparinen ◽  
Jaakko Peltonen ◽  
Liisa Mustanoja ◽  
Unni Leino ◽  
Jenni Santaharju

Abstract This article examines Finnish lects spoken in Helsinki from the 1970s to the 2010s with a probabilistic model called Latent Dirichlet Allocation. The model searches for underlying components based on the linguistic features used in the interviews. Several coherent lects were discovered as components in the data, which counters the results of previous studies that report only weak covariation between features assumed to belong to the same lect. The speakers, however, are not categorical in their linguistic behavior and tend to use more than one lect in their speech. This implies that lects should not be treated as parallels of seemingly uniform linguistic systems such as languages, but as partial systems that together constitute a network.
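
As a minimal sketch of this modeling setup, the snippet below treats each interview as a "document" and coded linguistic feature variants as "words", so that the inferred topics correspond to candidate lects. The feature codes and parameters are illustrative assumptions, not the authors' actual coding scheme; gensim is assumed.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Hypothetical feature-variant codes per interview (placeholders only).
interviews = [
    ["d_deletion", "mA_infinitive", "ts_variant"],
    ["standard_d", "long_vowel", "standard_ts"],
    ["d_deletion", "long_vowel", "ts_variant"],
]

dictionary = Dictionary(interviews)
corpus = [dictionary.doc2bow(doc) for doc in interviews]

# Fit LDA; num_topics is the number of candidate lects to search for.
lda = LdaModel(corpus, id2word=dictionary, num_topics=2, passes=10,
               random_state=0)

# Each interview receives a mixture over lects rather than a single
# categorical label, matching the finding described in the abstract.
for i, bow in enumerate(corpus):
    print(i, lda.get_document_topics(bow))
```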

2021 ◽  
pp. 52-58
Author(s):  
Hachem Harouni Alaoui ◽  
Elkaber Hachem ◽  
Cherif Ziti

More and more information keeps being digitized and stored in several forms (web pages, scientific articles, books, etc.), so the task of discovering information has become increasingly challenging. The need for new IT tools to retrieve and organize these vast amounts of information is growing step by step. Furthermore, e-learning platforms are evolving to meet the intended needs of students. The aim of this article is to use machine learning to determine the appropriate actions that support the learning process, and Latent Dirichlet Allocation (LDA) to find the topics contained in the links proposed in a learning session. Our purpose is also to build a course that adapts to the student's efforts and reduces unimportant recommendations (those not suited to the needs of the adult learner) through topic modeling algorithms.
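
As a sketch of the topic-extraction step, the snippet below runs LDA over short texts standing in for the links proposed in a learning session and keeps each resource's dominant topic, which could then be matched against the student's current needs. Texts and parameters are illustrative assumptions, not the authors' pipeline; scikit-learn is assumed.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical snippets describing recommended links/resources.
resources = [
    "introduction to linear algebra vectors matrices",
    "matrix decomposition eigenvalues tutorial",
    "beginner course on python programming loops",
    "python functions and modules exercises",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(resources)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)

# Resources whose dominant topic matches the student's current topic
# could be kept; others could be filtered as off-need recommendations.
print(doc_topics.argmax(axis=1))
```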


Author(s):  
Lidong Zhai ◽  
Zhaoyun Ding ◽  
Yan Jia ◽  
Bin Zhou

LDA (Latent Dirichlet Allocation), proposed by Blei, is a generative probabilistic model of a corpus in which documents are represented as random mixtures over latent topics and each topic is characterized by a distribution over words, but not by the word-position attributes of the documents in the corpus. In this paper, a Word Position-Related LDA Model is proposed that takes the word-position attributes of every document in the corpus into account, so that each word is also characterized by a distribution over word positions. At the same time, the precision of topic-word interpretability is improved by integrating the word-position distribution with an appropriate word degree, accounting for the different word degrees at different word positions. Finally, a new method, the size-aware word intrusion method, is proposed to improve topic-word interpretability. Experimental results on the NIPS corpus show that the Word Position-Related LDA Model improves the precision of topic-word interpretability, with an average improvement of about 9.67%. The size-aware word intrusion method also interprets the topic words' semantic information more comprehensively and effectively, as shown by comparison across the experimental data.
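
The size-aware variant is not specified in detail here, but it extends the standard word intrusion test, in which annotators must spot a high-probability word from another topic planted among a topic's top words. Below is a minimal sketch of that baseline construction only, with hypothetical word lists; the size-aware weighting itself is not reproduced.

```python
import random

def intrusion_item(topic_top_words, other_topic_words,
                   rng=random.Random(0)):
    """Build one intrusion task: five top words of a topic plus one
    high-probability word from a different topic (the intruder)."""
    words = topic_top_words[:5]
    intruder = rng.choice([w for w in other_topic_words if w not in words])
    options = words + [intruder]
    rng.shuffle(options)
    return options, intruder

# Hypothetical topic word lists for illustration.
options, intruder = intrusion_item(
    ["network", "neural", "layer", "training", "gradient", "loss"],
    ["protein", "gene", "cell", "dna"],
)
print(options, "-> intruder:", intruder)
```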


2018 ◽  
Vol 1 (1) ◽  
pp. 51-56
Author(s):  
Naeem Ahmed Mahoto

The growing volume of unstructured textual data poses an open challenge for knowledge discovery, which aims to extract desired information from large collections of data. This study presents a system that derives news coverage patterns with the help of a probabilistic model, Latent Dirichlet Allocation. A pattern is an arrangement of words within the collected data that tend to appear together in a certain context. News coverage patterns are computed as a function of the number of news articles containing those patterns. As a proof of concept, a prototype has been developed to estimate the news coverage patterns for a newspaper, The Dawn. The news coverage patterns are analyzed from different perspectives using a multidimensional data model. Further, the extracted news coverage patterns are illustrated with visual graphs to give an in-depth understanding of the topics covered in the news. The results also assist in identifying schema related to the newspaper and journalists' articles.
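
A minimal sketch of the coverage computation described above: after fitting LDA, each article is assigned its dominant topic, and coverage is the count of articles per topic. The articles and parameters below are placeholders, not data from The Dawn; scikit-learn is assumed.

```python
from collections import Counter
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Placeholder articles standing in for a newspaper corpus.
articles = [
    "government announces new budget for education",
    "cricket team wins the series final",
    "parliament debates the education reform bill",
    "star batsman injured before the final match",
]

X = CountVectorizer(stop_words="english").fit_transform(articles)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Assign each article its dominant topic; coverage is the number of
# articles per topic.
dominant = lda.transform(X).argmax(axis=1)
coverage = Counter(dominant)  # topic id -> number of covering articles
print(coverage)
```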


2020 ◽  
Author(s):  
João Pedro Rodrigues ◽  
Emerson Paraiso

In this work, the technical feasibility of working with audio transcriptions from YouTube is analyzed, and a method is presented for the acquisition, pre-processing, and post-processing of this type of data. A topic modeling approach based on the Latent Dirichlet Allocation algorithm is used. An approach is also presented to dynamically determine the ideal number of topics that make up a given corpus. In the experiments, a database of 250 audio transcriptions was used, obtaining a model with coherence in the range of 40%.
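
The abstract does not specify the selection criterion, but a common way to determine the number of topics dynamically is to fit models over a range of topic counts and keep the one with the highest coherence score. The sketch below does this with gensim's CoherenceModel under that assumption; the toy transcripts are placeholders.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel, CoherenceModel

# Placeholder tokenized transcripts.
texts = [
    ["machine", "learning", "model", "training"],
    ["video", "audio", "transcription", "speech"],
    ["deep", "learning", "neural", "network"],
]

dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

# Fit LDA for each candidate topic count and keep the most coherent.
best_k, best_score = None, float("-inf")
for k in range(2, 4):
    lda = LdaModel(corpus, id2word=dictionary, num_topics=k,
                   passes=10, random_state=0)
    score = CoherenceModel(model=lda, texts=texts, dictionary=dictionary,
                           coherence="c_v").get_coherence()
    if score > best_score:
        best_k, best_score = k, score

# A c_v score of 0.40 would correspond to "coherence in the range of 40%".
print(best_k, best_score)
```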


Author(s):  
Priyanka R. Patil ◽  
Shital A. Patil

Similarity View is an application for visually comparing and exploring multiple models of text and collections of documents. Friendbook discovers users' lifestyles from user-centric sensor data, measures the similarity of lifestyles between users, and recommends friends to users whose lifestyles are highly similar. Motivated by modeling a user's daily life as life documents, lifestyles are extracted using the Latent Dirichlet Allocation algorithm. Manual techniques cannot be used for checking research papers, as the assigned reviewer may have insufficient knowledge of the research discipline, and differing subjective views can cause misinterpretations. There is an urgent need for an effective and feasible approach to check submitted research papers with the support of automated software. Text mining methods address the problem of automatically checking research papers semantically. The proposed method finds the similarity of text across a collection of documents using the Latent Dirichlet Allocation (LDA) algorithm and Latent Semantic Analysis (LSA) with a synonym algorithm, which finds synonyms of text index-wise using the English WordNet dictionary; another algorithm, LSA without synonyms, finds the similarity of text based on the index alone. The accuracy of LSA with synonyms is greater when synonyms are considered for matching.
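
A minimal sketch of the "LSA with synonyms" idea: tokens are expanded with WordNet synonyms before LSA, so documents phrased with synonymous words can score higher similarity. The expansion rule (a few senses and lemmas per token) is an assumption, and the paper's index-wise matching is not reproduced; NLTK and scikit-learn are assumed.

```python
from nltk.corpus import wordnet  # requires: nltk.download("wordnet")
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

def expand_with_synonyms(text):
    """Append WordNet synonym lemmas to each token (a few per token)."""
    tokens = text.split()
    expanded = list(tokens)
    for t in tokens:
        for syn in wordnet.synsets(t)[:2]:
            expanded += [lemma.name() for lemma in syn.lemmas()[:2]]
    return " ".join(expanded)

docs = [
    "the car is quick",
    "the automobile is fast",
    "bananas are yellow fruit",
]
expanded = [expand_with_synonyms(d) for d in docs]

# LSA over the synonym-expanded documents.
X = TfidfVectorizer().fit_transform(expanded)
Z = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

# Similarity of the two near-synonymous documents.
print(cosine_similarity(Z)[0, 1])
```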


2021 ◽  
Vol 920 ◽  
Author(s):  
Mohamed Frihat ◽  
Bérengère Podvin ◽  
Lionel Mathelin ◽  
Yann Fraigneau ◽  
François Yvon

Abstract

