Statistical Methods for Word Association in Text Mining

Author(s):  
Anacleto Correia ◽  
M. Filomena Teodoro ◽  
Victor Lobo

Author(s):  
Trefor Williams ◽  
Christie Nelson ◽  
John Betak

The FRA railroad grade crossing accident database contains text comment fields that may provide additional information about grade crossing accidents. New text mining algorithms offer the potential to automatically extract information from text that can enhance traditional numeric analyses. Topic modeling algorithms are statistical methods that analyze the words of original texts to automatically discover the themes that run through them. A frequently used topic modeling algorithm is Latent Dirichlet Allocation (LDA). In this paper we show several examples of how labeled LDA can be applied to the FRA grade crossing data to better understand the categories of words and phrases associated with various types of grade crossing accidents.





2019 ◽  
Vol 6 (1) ◽  
pp. 97-116
Author(s):  
Sara Brumfield

Assyriologists have a variety of methods available to assign unprovenienced materials, with educated certainty, to their ancient sites. The occurrence of specific toponyms and month names, as well as the detailed study of prosopography, paleography, orthography, lexicography, tablet shape, format and sealing practices, assists specialists in reconstructing the ancient context of a specific object. Now, with the florescence of technology, new digital tools are being developed and refined that may contribute to the complex process of provenience assignment. Text mining, the practice of deriving information from blocks of text using pattern recognition or trend analysis, has already been applied to corpora ranging from Shakespeare to Twitter. For an example of previous text mining analysis on cuneiform sources, see ENEA’s TIGRIS Virtual Lab (http://www.afs.enea.it/project/tigris/indexOpen.php). With the ability to search for statistically significant correlations in large blocks of text following user-defined criteria and rules, statistical methods, here accessed via text mining software, have significant potential for revealing new levels of data in cuneiform texts.



2021 ◽  
pp. 506-512
Author(s):  
Saadat M. Alhashmi ◽  
Mohammed Maree ◽  
Zaina Saadeddin

Over the past few years, numerous studies and research articles have been published in the medical literature review domain. The topics covered by this research include medical information retrieval, disease statistics, drug analysis, and many other fields and application domains. In this paper, we employ various text mining and data analysis techniques to discover trending topics and topic concordance in the healthcare literature and data mining field. The analysis focuses on healthcare literature, bibliometric data, and word association rules applied to 1945 research articles published between 2006 and 2019. Our aim is to help save the time and effort required to manually summarize large amounts of information in such a broad and multidisciplinary domain. To carry out this task, we employ topic modeling through Latent Dirichlet Allocation (LDA), in addition to various document and word embedding and clustering approaches. Findings reveal that since 2010 interest in healthcare big data analysis has increased significantly, as demonstrated by the five most commonly used topics in this domain.
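
As an illustration of what word association rules measure, the following minimal Python sketch computes support and confidence for word pairs over a handful of invented titles. The function name `word_association_rules`, the thresholds, and the toy documents are assumptions made for this example; the abstract does not describe the authors' implementation.

```python
from collections import Counter
from itertools import combinations


def word_association_rules(docs, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for word pairs.

    Hypothetical helper: support is the share of documents containing both
    words; confidence is that co-occurrence count divided by the number of
    documents containing the antecedent.
    """
    # Treat each document as a set of lowercase tokens (presence, not counts).
    doc_sets = [set(d.lower().split()) for d in docs]
    n_docs = len(doc_sets)
    word_count = Counter(w for s in doc_sets for w in s)
    pair_count = Counter(
        pair for s in doc_sets for pair in combinations(sorted(s), 2)
    )

    rules = []
    for (a, b), both in pair_count.items():
        support = both / n_docs
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for x, y in ((a, b), (b, a)):
            confidence = both / word_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Invented toy titles, for illustration only.
toy_docs = [
    "healthcare big data analysis",
    "big data mining",
    "healthcare data mining",
    "topic modeling healthcare literature",
]
rules = word_association_rules(toy_docs)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

On this toy input, a rule such as "big -> data" has confidence 1.0 because every title containing "big" also contains "data", while the reverse rule is weaker because "data" also appears without "big".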



2018 ◽  
Vol 7 (2.12) ◽  
pp. 1
Author(s):  
Jin HeeKu ◽  
Yoon Su Jeong

Background/Objectives: As the use of big data increases in various fields, the use of social big data analysis for social media is increasing rapidly. This study proposes a method that applies text clustering to analyze, by related topic, texts extracted from social big data through text mining. Methods/Statistical analysis: R was used for data collection and analysis, and social big data was collected from Twitter. Clustering models applicable to related-topic analysis of Twitter text were compared, one was selected, and text clustering was performed. Text clustering is analyzed through a cluster dendrogram by generating a corpus, grouping similar entities from the term-document matrix, and removing sparse words. Findings: In this study, text clustering eases the difficulty of analyzing word associations and subjects that arises with text mining methods such as word clouds. In particular, for related-topic analysis of social big data, the hierarchical clustering model based on cosine similarity was more suitable than the non-hierarchical model for identifying which terms in a tweet are associated with each other. In addition, the cluster dendrogram was found to be effective for analyzing text contexts, because the visualization process repeatedly groups similar texts together. Improvements/Applications: This study can be used to confirm the ideas and opinions of various participants using social big data, and to analyze more precisely the complex relationship between the prediction of social problems and the underlying phenomena.
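
The pipeline described in this abstract (build a corpus, form a term-document matrix, remove sparse terms, cluster terms hierarchically by cosine similarity) can be sketched as follows. The study used R; this is a Python illustration only, and the average-linkage rule, the `min_doc_freq` threshold, the helper names, and the toy tweets are all assumptions rather than details from the paper.

```python
import math
from collections import Counter


def term_vectors(docs, min_doc_freq=2):
    """Build one occurrence vector per term, dropping sparse terms.

    Each vector is a row of a term-document matrix: entry k is the count
    of the term in document k. Terms seen in fewer than `min_doc_freq`
    documents are removed (an assumed sparsity rule).
    """
    tokenized = [Counter(d.lower().split()) for d in docs]
    doc_freq = Counter(t for counts in tokenized for t in counts)
    vocab = sorted(t for t, df in doc_freq.items() if df >= min_doc_freq)
    return {t: [counts[t] for counts in tokenized] for t in vocab}


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def agglomerate(vectors, n_clusters):
    """Average-linkage agglomerative clustering of terms.

    Returns the final clusters and the merge history; the history is the
    information a dendrogram would display.
    """
    clusters = [[t] for t in vectors]
    merges = []

    def linkage(c1, c2):
        total = sum(
            cosine_distance(vectors[a], vectors[b]) for a in c1 for b in c2
        )
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        # Merge the two clusters with the smallest average cosine distance.
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merges.append(
            (clusters[i], clusters[j], linkage(clusters[i], clusters[j]))
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, merges


# Invented toy tweets, for illustration only.
tweets = [
    "rain flood warning",
    "flood rain alert",
    "election vote result",
    "vote election poll",
]
vectors = term_vectors(tweets)
clusters, merges = agglomerate(vectors, n_clusters=2)
print(clusters)
```

Each entry in `merges` records which two groups were joined and at what distance, so reading the list in order reproduces the bottom-up structure that a cluster dendrogram visualizes.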



1978 ◽  
Vol 48 ◽  
pp. 7-29
Author(s):  
T. E. Lutz

This review paper deals with the use of statistical methods to evaluate systematic and random errors associated with trigonometric parallaxes. First, systematic errors that arise when using trigonometric parallaxes to calibrate luminosity systems are discussed. Next, the determination of the external errors of parallax measurement is reviewed. Observatory corrections are discussed. Schilt’s point, that because the causes of these systematic differences between observatories are not known the computed corrections cannot be applied appropriately, is emphasized. However, modern parallax work is sufficiently accurate that it is necessary to determine observatory corrections if full use is to be made of the potential precision of the data. To this end, it is suggested that a prior experimental design is required. Past experience has shown that accidental overlap of observing programs will not suffice to determine meaningful observatory corrections.



1981 ◽  
Vol 24 (3) ◽  
pp. 469-469
Author(s):  
Robert Goldfarb ◽  
Harvey Halpern


1981 ◽  
Vol 24 (2) ◽  
pp. 233-246
Author(s):  
Robert Goldfarb ◽  
Harvey Halpern


1973 ◽  
Vol 18 (11) ◽  
pp. 562-562
Author(s):  
B. J. WINER

