Dirichlet Methods for Bayesian Source Detection in Radio Astronomy Images

2021 ◽  
Author(s):  
Anna Friedlander

The sheer volume of data expected from the next generation of radio telescopes (exabytes of data on hundreds of millions of objects) makes automated methods for detecting astronomical objects ("sources") essential. Of particular importance are low surface brightness objects, which current automated methods detect poorly. This thesis explores Bayesian methods for source detection that use Dirichlet or multinomial models for pixel intensity distributions in discretised radio astronomy images. A novel image discretisation method that incorporates uncertainty about how the image should be discretised is developed. Latent Dirichlet allocation, a method originally developed for inferring latent topics in document collections, is used to estimate source and background distributions in radio astronomy images. A new Dirichlet-multinomial ratio, indicating how well a region conforms to a well-specified model of background versus a loosely-specified model of foreground, is derived. Finally, latent Dirichlet allocation and the Dirichlet-multinomial ratio are combined for source detection in astronomical images. The methods developed in this thesis perform well in comparison with two widely used source detection packages and, importantly, find dim sources that other algorithms detect poorly.
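To make the idea of a Dirichlet-multinomial ratio concrete, the following is a minimal sketch of a generic log Bayes factor between two Dirichlet-multinomial models of binned pixel counts. It is not the ratio derived in the thesis, whose exact form the abstract does not give. The function names, the three intensity bins and both prior vectors are assumptions chosen for illustration: a concentrated prior stands in for a "well-specified" background and a flat prior for a "loosely-specified" foreground.

```python
from math import lgamma

def log_dirichlet_multinomial(counts, alpha):
    """Log marginal likelihood of an ordered pixel sequence with the given
    bin counts under a Dirichlet(alpha) prior on the bin probabilities.
    The multinomial coefficient is omitted; it cancels in the ratio below."""
    total = sum(counts)
    alpha_sum = sum(alpha)
    out = lgamma(alpha_sum) - lgamma(total + alpha_sum)
    for n, a in zip(counts, alpha):
        out += lgamma(n + a) - lgamma(a)
    return out

def log_background_ratio(counts, alpha_background, alpha_foreground):
    """Log ratio: above zero favours background, below zero favours foreground."""
    return (log_dirichlet_multinomial(counts, alpha_background)
            - log_dirichlet_multinomial(counts, alpha_foreground))

# Hypothetical priors over three intensity bins (low, medium, high).
ALPHA_BACKGROUND = [80.0, 15.0, 5.0]  # concentrated: mostly low-intensity pixels
ALPHA_FOREGROUND = [1.0, 1.0, 1.0]    # flat: almost any intensity mix is plausible

# Toy regions of ten pixels each (made-up counts, not data from the thesis).
quiet_region = [8, 2, 0]
bright_region = [2, 3, 5]

quiet_score = log_background_ratio(quiet_region, ALPHA_BACKGROUND, ALPHA_FOREGROUND)
bright_score = log_background_ratio(bright_region, ALPHA_BACKGROUND, ALPHA_FOREGROUND)
print(quiet_score, bright_score)  # first is positive, second is negative
```

Under these assumed priors, a region dominated by low-intensity pixels scores as background, while a region with many high-intensity pixels scores as foreground.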


2020 ◽  
Author(s):  
Sunil Nagpal ◽  
Divyanshu Srivastava ◽  
Sharmila S. Mande

Topic modeling is frequently employed for discovering structures (or patterns) in a corpus of documents. Its utility in text-mining and document retrieval tasks across various fields of scientific research is well known. Latent Dirichlet Allocation (LDA), an unsupervised machine learning approach, has in particular been used to identify latent (or hidden) topics in document collections and to decipher the words that define one or more topics using a generative statistical model. Here we describe how SARS-CoV-2 genomic mutation profiles can be structured into a ‘Bag of Words’ to enable identification of signatures (topics) and their probabilistic distribution across various genomes using LDA. Topic models were generated using ~47000 novel coronavirus genomes (considered as documents), leading to identification of 16 amino acid mutation signatures and 18 nucleotide mutation signatures (equivalent to topics) in the corpus of chosen genomes through coherence optimization. Treating genomes as documents also helped identify contextual nucleotide mutation signatures in the form of conventional N-grams (e.g. bi-grams and tri-grams). We validated the signatures obtained with the LDA-driven method against previously reported recurrent mutations and phylogenetic clades for genomes. Additionally, we report the geographical distribution of the identified mutation signatures in SARS-CoV-2 genomes on the global map. The use of non-phylogenetic, albeit classical, approaches such as topic modeling and other data-centric pattern mining algorithms is therefore proposed to supplement efforts towards understanding the genomic diversity of the evolving SARS-CoV-2 genomes (and other pathogens/microbes).
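As a rough illustration of the 'Bag of Words' framing described above, the sketch below turns per-genome mutation lists into unigram and bi-gram counts and a document-term matrix of the kind an LDA implementation would take as input. It is only a sketch under stated assumptions: the helper names, the genome identifiers and the mutation lists are made up for illustration, and forming n-grams over mutations in list order is an assumption, since the abstract does not specify how the N-grams were built.

```python
from collections import Counter

def mutation_ngrams(mutations, n):
    """Join each run of n consecutive mutation tokens into one n-gram token."""
    return ["_".join(mutations[i:i + n]) for i in range(len(mutations) - n + 1)]

def to_bag_of_words(mutations, max_n=1):
    """Count all 1..max_n-grams in one genome's mutation list."""
    bag = Counter()
    for n in range(1, max_n + 1):
        bag.update(mutation_ngrams(mutations, n))
    return bag

# Toy inputs: hypothetical genomes and mutation lists, not data from the paper.
genomes = {
    "genome_A": ["C241T", "C3037T", "A23403G"],
    "genome_B": ["C241T", "C3037T", "C14408T", "A23403G"],
}

# One bag of words per genome (each genome plays the role of a document).
corpus = {gid: to_bag_of_words(muts, max_n=2) for gid, muts in genomes.items()}

# Shared vocabulary and a document-term count matrix (rows follow `genomes`).
vocabulary = sorted(set().union(*corpus.values()))
doc_term = [[corpus[gid][term] for term in vocabulary] for gid in genomes]
```

Each row of `doc_term` could then be passed to a topic-modelling library of choice; the number of topics would still need to be selected, for example by the coherence optimization the abstract mentions.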


2019 ◽  
Vol 14 (1) ◽  
pp. 107-123 ◽  
Author(s):  
Qianqian Zhang ◽  
Shifeng Liu ◽  
Daqing Gong ◽  
Qun Tu

This paper proposes a method for automatically building a domain ontology for enterprise technological innovation from a plain text corpus, based on Latent Dirichlet Allocation (LDA). The proposed method consists of four modules: 1) introducing a seed ontology for the domain of enterprise technological innovation, 2) using Natural Language Processing (NLP) techniques to preprocess the collected textual data, 3) mining domain-specific terms from document collections based on LDA, and 4) obtaining the relationships between the terms through the relevant defined rules. Experiments were carried out to demonstrate the effectiveness of the method, and the results indicate that many terms in the domain of enterprise technological innovation, and the semantic relations between them, are discovered. The proposed method is a continuously cycling, iterative process: the resulting ontology can be fed back in as the initial seed ontology, so that ongoing knowledge acquisition in the domain of enterprise technological innovation updates and refines the seed ontology.


Author(s):  
Priyanka R. Patil ◽  
Shital A. Patil

Similarity View is an application for visually comparing and exploring multiple models of text and document collections. Friendbook discovers users' lifestyles from user-centric sensor data, measures the similarity of lifestyles between users, and recommends friends to users whose lifestyles are highly similar. A user's daily life is modelled as life documents, from which lifestyles are extracted using the Latent Dirichlet Allocation algorithm. Manual methods are poorly suited to checking research papers, as the assigned reviewer may have insufficient knowledge of the research disciplines, and differing subjective views can cause misinterpretations. There is therefore an urgent need for an effective and feasible approach to checking submitted research papers with the support of automated software. Text mining offers a way to check research papers semantically and automatically. The proposed method finds the similarity of text in a collection of documents using the Latent Dirichlet Allocation (LDA) algorithm and Latent Semantic Analysis (LSA) with synonyms, which finds synonyms of text index-wise using the English WordNet dictionary; a second algorithm, LSA without synonyms, finds the similarity of text based on index alone. The accuracy of LSA with synonyms is higher when synonyms are considered for matching.


2021 ◽  
Vol 920 ◽  
Author(s):  
Mohamed Frihat ◽  
Bérengère Podvin ◽  
Lionel Mathelin ◽  
Yann Fraigneau ◽  
François Yvon

Abstract


2021 ◽  
pp. 016555152110077
Author(s):  
Sulong Zhou ◽  
Pengyu Kan ◽  
Qunying Huang ◽  
Janet Silbernagel

Natural disasters cause significant damage, casualties and economic losses. Twitter has been used to support prompt disaster response and management because people tend to communicate and spread information on public social media platforms during disaster events. To retrieve real-time situational awareness (SA) information from tweets, the most effective way to mine text is using natural language processing (NLP). Among the advanced NLP models, the supervised approach can classify tweets into different categories to gain insight and leverage useful SA information from social media data. However, high-performing supervised models require domain knowledge to specify categories and involve costly labelling tasks. This research proposes a guided latent Dirichlet allocation (LDA) workflow to investigate temporal latent topics from tweets during a recent disaster event, the 2020 Hurricane Laura. With integration of prior knowledge, a coherence model, LDA topics visualisation and validation from official reports, our guided approach reveals that most tweets contain several latent topics during the 10-day period of Hurricane Laura. This result indicates that state-of-the-art supervised models have not fully utilised tweet information because they only assign each tweet a single label. In contrast, our model can not only identify emerging topics during different disaster events but also provide multilabel references to the classification schema. In addition, our results can help to quickly identify and extract SA information for responders, stakeholders and the general public so that they can adopt timely response strategies and wisely allocate resources during hurricane events.

