Latent semantic analysis for tagging activation states and identifiability in northwestern Mexican news outlets

Author(s):  
Manuel-Alejandro Sánchez-Fernández ◽  
Alfonso Medina-Urrea ◽  
Juan-Manuel Torres-Moreno

The present work studies the relationship between measures obtained from Latent Semantic Analysis (LSA) and a variant known as SPAN, on the one hand, and the activation and identifiability states (informative states) of referents in noun phrases, on the other, in news articles from northwestern Mexican outlets written in Spanish. The aim and challenge is to find a strategy for labelling new/given information in discourse that is rooted in a linguistically grounded stance. The new/given distinction can be defined from different perspectives, which vary in which linguistic forms they take into account; this work focuses on full referential devices (n = 2,388). Pearson's r correlation tests, analysis of variance, graphical exploration of label clustering, and a classification experiment with random forests were performed. The experiment used two labelling schemes, noun phrases labelled with all 10 informative-state tags and a binary labelling, as well as two bags of words for each noun phrase: the interior and the exterior. LSA in conjunction with the interior bag of words was found to classify certain informative states, and the same measure performed well on the binary division, detecting which sentences introduce new referents into discourse. Previous work applying a similar method to English noun phrases reached 80% accuracy (n = 478) in its classification exercise; our best test for Spanish reached 79%. No prior work on Spanish has used this method, and this kind of experiment is important because Spanish exhibits more complex inflectional morphology.
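The pipeline described above can be sketched in a few lines: a TF-IDF matrix over the bag of words is projected with truncated SVD (the standard LSA step), and the resulting features feed a random forest for the binary new/given decision. The noun phrases and labels below are invented toy data, not the article's corpus, and the character n-gram featurization is an assumption for illustration only.

```python
# Minimal sketch: LSA features (TF-IDF + truncated SVD) feeding a
# random forest for a binary new/given classification. Toy data only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

phrases = [
    "un hombre", "el hombre", "una casa nueva", "la casa",
    "un perro grande", "el perro", "una mujer", "la mujer",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]  # 1 = new referent, 0 = given

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    TruncatedSVD(n_components=5, random_state=0),  # the LSA projection
    RandomForestClassifier(n_estimators=100, random_state=0),
)
clf.fit(phrases, labels)
print(clf.predict(["un gato"]))
```

On real data the interior and exterior bags of words would each be vectorized and projected separately before classification.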

2019 ◽  
Vol 10 (1) ◽  
pp. 29
Author(s):  
Yulius Denny Prabowo ◽  
Tedi Lesmana Marselino ◽  
Meylisa Suryawiguna

Extracting information from a large amount of structured data requires expensive computing. The Vector Space Model works by mapping words into a continuous vector space in which semantically similar words are mapped to nearby vectors. The model assumes that words appearing in the same contexts have similar semantic meaning. In practice there are two different approaches: count-based methods (e.g., Latent Semantic Analysis) and predictive methods (e.g., the Neural Probabilistic Language Model). This study aims to apply the Word2Vec method, using the Continuous Bag-of-Words approach, to the Indonesian language. The research data were obtained by crawling several online news portals. The expected result of the research is a vector mapping of Indonesian words based on the data used. Keywords: vector space model, word to vector, Indonesian vector space model.


AusArt ◽  
2016 ◽  
Vol 4 (1) ◽  
pp. 19-28
Author(s):  
Pilar Rosado Rodrigo ◽  
Eva Figueras Ferrer ◽  
Ferran Reverter Comes

From pixel to visual resonances: Images with voices

Abstract
The objective of our research is to develop a series of computer vision programs to search for analogies in large datasets, in this case collections of images of abstract paintings, based solely on their visual content, without textual annotation. We have programmed an algorithm based on a specific model of image description used in computer vision. The approach places a regular grid of interest points over each image and selects a pixel region around each node; dense features computed over this grid with overlapping patches, describing the grey-level gradients found in each region, are used to represent the images. By analysing the distances between the whole set of image descriptors, we group them according to their similarity, and each resulting group determines what we call a "visual word". This model is called the Bag-of-Words representation. Given the frequency with which each visual word occurs in each image, we apply pLSA (Probabilistic Latent Semantic Analysis), a statistical model that classifies the images according to their formal patterns fully automatically, without any textual annotation. In this way, we hope to develop a tool for both producing and analysing works of art.
Keywords: artificial vision; Bag-of-Words model; CBIR (Content-Based Image Retrieval); pLSA (Probabilistic Latent Semantic Analysis); visual word
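The pLSA step applied to the visual-word counts can be sketched directly: given a word-by-image count matrix, EM alternately estimates P(word|topic) and P(topic|image), and each image is assigned the topic with the highest posterior. The count matrix below is synthetic; in the article it would come from clustering dense grid descriptors into visual words.

```python
# Toy pLSA via EM on a synthetic visual-word x image count matrix.
import numpy as np

rng = np.random.default_rng(0)
n_words, n_images, n_topics = 20, 10, 2
counts = rng.poisson(3.0, size=(n_words, n_images)).astype(float)

p_wz = rng.dirichlet(np.ones(n_words), size=n_topics).T   # P(w|z), shape (W, Z)
p_zd = rng.dirichlet(np.ones(n_topics), size=n_images).T  # P(z|d), shape (Z, D)

for _ in range(50):
    # E-step: responsibilities P(z|w,d), normalized over topics.
    joint = p_wz[:, :, None] * p_zd[None, :, :]            # (W, Z, D)
    joint /= joint.sum(axis=1, keepdims=True) + 1e-12
    # M-step: re-estimate both distributions from expected counts.
    weighted = counts[:, None, :] * joint                  # (W, Z, D)
    p_wz = weighted.sum(axis=2)
    p_wz /= p_wz.sum(axis=0, keepdims=True)
    p_zd = weighted.sum(axis=0)
    p_zd /= p_zd.sum(axis=0, keepdims=True)

# Each image's "formal category" is its most probable topic.
print(p_zd.argmax(axis=0))
```

The fully automatic classification the abstract describes corresponds to this final argmax over P(topic|image), with no textual annotation involved anywhere.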


2002 ◽  
Vol 55 (3) ◽  
pp. 879-896 ◽  
Author(s):  
Timothy Desmet ◽  
Marc Brysbaert ◽  
Constantijn De Baecke

We examined the production of relative clauses in sentences with a complex noun phrase containing two possible attachment sites for the relative clause (e.g., “Someone shot the servant of the actress who was on the balcony.”). On the basis of two corpus analyses and two sentence continuation tasks, we conclude that much research about this specific syntactic ambiguity has used complex noun phrases that are quite uncommon. These noun phrases involve the relationship between two humans and, at least in Dutch, induce a different attachment preference from noun phrases referring to non-human entities. We provide evidence that the use of this type of complex noun phrase may have distorted the conclusions about the processes underlying relative clause attachment. In addition, it is shown that, notwithstanding some notable differences between sentence production in the continuation task and in coherent text writing, there seems to be a remarkable correspondence between the attachment patterns obtained with both modes of production.


2018 ◽  
Vol 9 (5) ◽  
pp. 17
Author(s):  
Ainul Azmin Md Zamin ◽  
Raihana Abu Hasan

An abstract, as the summary of a dissertation, harbours important information: it leads readers either to read the entire text or to abandon it. This study investigates the backward translation of abstracts by 10 randomly selected postgraduate students. The research serves as a guideline for students composing their abstracts, as it compares the differences in noun phrase structure between Malay translations and their English originals. It also analyses the types of errors that occur when English noun phrases are translated into Malay. Preliminary findings from this pilot study show that the translation errors committed were mainly inaccurate word order, inaccurate translation, added translation, dropped translation, and structure change. The study applies an exploratory mode of semantic analysis, looking at noun phrases, the meaningful groups of words that form a major part of any sentence, with the noun as the head of the group. Syntax is inevitably interwoven in the analysis, as the structure and grammatical aspects of the translations are also analysed; the English texts are examined against their corresponding translations in Malay. Particularly relevant in this study is the need to emphasise students' semantic and syntactic skills before good translation work can be produced. Language practitioners can also draw on translation activities to improve learners' language competency.


2014 ◽  
Vol 4 (1) ◽  
pp. 46-67
Author(s):  
Keisuke Inohara ◽  
Ryoko Honma ◽  
Takayuki Goto ◽  
Takashi Kusumi ◽  
Akira Utsumi

This study examined the relationship between reading literary novels and generating predictive inferences by analyzing a corpus of Japanese novels. Latent semantic analysis (LSA) was used to capture the statistical structure of the corpus. The authors then asked 74 Japanese college students to generate predictive inferences (e.g., "The newspaper burned") in response to Japanese event sentences (e.g., "A newspaper fell into a bonfire") and obtained more than 5,000 predicted events. The analysis showed a significant relationship between the LSA similarity of the event sentences to the predicted events and the frequency of those predicted events. This result suggests that exposure to literary works may help develop readers' inference-generation skills. In addition, two vector operation methods for constructing sentence vectors from word vectors were compared: the "Average" method and the "Predication Algorithm" method (Kintsch, 2001). The results support the superiority of the Predication Algorithm over the Average method.
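The "Average" construction compared in the study can be sketched directly: a sentence vector is the mean of its word vectors, and similarity is the cosine between the two means. The tiny three-dimensional word vectors below are invented for illustration; the study derived its vectors from a corpus of Japanese novels via LSA.

```python
# "Average" sentence vectors and cosine similarity, with toy vectors.
import numpy as np

word_vectors = {
    "newspaper": np.array([0.9, 0.1, 0.0]),
    "fell":      np.array([0.2, 0.7, 0.1]),
    "bonfire":   np.array([0.1, 0.2, 0.9]),
    "burned":    np.array([0.2, 0.3, 0.8]),
}

def sentence_vector(words):
    # Average method: the mean of the word vectors.
    return np.mean([word_vectors[w] for w in words], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

event = sentence_vector(["newspaper", "fell", "bonfire"])
inference = sentence_vector(["newspaper", "burned"])
print(round(cosine(event, inference), 3))  # -> 0.958
```

The Predication Algorithm differs by weighting the predicate's nearest semantic neighbours when combining vectors, rather than averaging all words uniformly.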


2012 ◽  
Vol 132 (9) ◽  
pp. 1473-1480
Author(s):  
Masashi Kimura ◽  
Shinta Sawada ◽  
Yurie Iribe ◽  
Kouichi Katsurada ◽  
Tsuneo Nitta

Author(s):  
Priyanka R. Patil ◽  
Shital A. Patil

Similarity View is an application for visually comparing and exploring multiple text models over a collection of documents. Friendbook infers users' lifestyles from user-centric sensor data, measures the similarity of lifestyles among users, and recommends friends to users whose lifestyles are highly similar; modelling a user's daily life as life documents, it extracts lifestyles using the Latent Dirichlet Allocation algorithm. Manual techniques cannot be used for checking research papers, as the assigned reviewer may have insufficient knowledge of the research disciplines involved, and differing subjective views can cause misinterpretations. There is therefore an urgent need for an effective and feasible approach to checking submitted research papers with the support of automated software. Text-mining methods address the problem of checking research papers semantically and automatically. The proposed method finds the similarity of texts in a collection of documents using the Latent Dirichlet Allocation (LDA) algorithm together with Latent Semantic Analysis (LSA). One variant, LSA with synonyms, finds synonyms of indexed terms using the English WordNet dictionary; the other, LSA without synonyms, measures text similarity on the indexed terms alone. The accuracy of LSA with synonyms is greater when synonyms are considered for matching.
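The LSA-with-synonyms comparison can be sketched as follows: documents are expanded with synonyms of their terms before the TF-IDF plus truncated SVD (LSA) projection, and pairs are then compared by cosine similarity. A small hand-made synonym table stands in for the English WordNet lookup, and all document texts are invented for illustration.

```python
# Sketch: synonym expansion before an LSA projection, then cosine
# similarity between documents. The synonym table is a stand-in
# for a WordNet lookup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

SYNONYMS = {"paper": ["article"], "check": ["verify"], "method": ["approach"]}

def expand(text):
    words = text.split()
    extra = [s for w in words for s in SYNONYMS.get(w, [])]
    return " ".join(words + extra)

docs = [
    "a method to check research paper similarity",
    "an approach to verify article similarity",
    "stock prices rose sharply today",
]
expanded = [expand(d) for d in docs]

tfidf = TfidfVectorizer().fit_transform(expanded)
lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)
sims = cosine_similarity(lsa)

# The two paper-checking documents should score higher than the
# unrelated finance document.
print(sims[0, 1] > sims[0, 2])
```

Without the expansion step, "check"/"verify" and "paper"/"article" share no surface terms, which is exactly the gap the synonym variant is meant to close.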


This article examines the method of latent semantic analysis (LSA), its advantages and disadvantages, and how it can be adapted for arrays of unstructured data, which make up most of the information Internet users deal with. To extract context-dependent word meanings through statistical processing of large sets of textual data, the LSA method operates on numeric word-by-text matrices, whose rows correspond to words and whose columns correspond to texts. Words are integrated into themes, and text units are represented in the theme space, by applying a matrix decomposition to the data: singular value decomposition or non-negative matrix factorization. LSA studies have shown that the resulting word and text similarities closely match human judgement. Based on these methods, the author has developed and proposes a new way of finding semantic links between unstructured data, namely posts on social networks. The method combines latent semantic and frequency analyses: it processes the retrieved search results, splits each remaining text (post) into separate words, considers a window of n words to the left and right of each word, counts the occurrences of each term, and consults a pre-built semantic resource (dictionary, ontology, RDF schema, ...). The developed method and algorithm were tested on six well-known social networks, accessed through the API of each network. The average score of the author's results exceeded that of the networks' own search. The results obtained in the course of this work can be used in developing recommendation, search, and other systems concerned with finding, categorising, and filtering information.
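The word-by-text decomposition described above can be sketched with a plain SVD: rows are words, columns are texts, and truncating to k singular values projects the texts into a k-dimensional theme space where topically similar texts land close together. The tiny count matrix and labels below are illustrative assumptions.

```python
# Minimal LSA: truncated SVD of a word-by-text count matrix, then
# cosine similarity between texts in the theme space. Toy data only.
import numpy as np

words = ["network", "friend", "post", "stock", "price", "market"]
texts = ["social1", "social2", "finance1", "finance2"]
A = np.array([  # rows = words, columns = texts
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [1, 2, 0, 1],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
    [0, 1, 1, 2],
], dtype=float)

U, S, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                        # number of latent themes
text_themes = (np.diag(S[:k]) @ Vt[:k]).T    # texts in theme space

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Texts on the same topic should be closer in theme space than
# texts on different topics.
print(cos(text_themes[0], text_themes[1]) > cos(text_themes[0], text_themes[2]))
```

Replacing `np.linalg.svd` with a non-negative matrix factorization gives the second decomposition the article mentions, at the cost of orthogonality but with directly interpretable theme weights.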

