Semantic Pattern Detection in COVID-19 Using Contextual Clustering and Intelligent Topic Modeling

Author(s):  
Pooja Kherwa ◽  
Poonam Bansal

The Covid-19 pandemic is the deadliest outbreak in living memory, so there is an urgent need to prepare the world with strategies to prevent and control the impact of such epidemics. In this paper, a novel semantic pattern detection approach for the Covid-19 literature, using contextual clustering and intelligent topic modeling, is presented. For contextual clustering, three-level weights at the term, document, and corpus levels are used with latent semantic analysis. For intelligent topic modeling, semantic collocations are selected using pointwise mutual information (PMI) and log-frequency biased mutual dependency (LBMD), and latent Dirichlet allocation is then applied. Contextual clustering with latent semantic analysis yields semantic spaces with highly correlated terms at the corpus level. Through intelligent topic modeling, topics improve in the form of lower perplexity and higher coherence. This research helps identify knowledge gaps in the area of Covid-19 research and offers directions for future work.
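The collocation-selection step can be sketched in plain Python. The toy corpus, the threshold, and scoring over adjacent word pairs below are illustrative assumptions, not the authors' implementation; LBMD scoring and the subsequent LDA step are omitted.

```python
import math
from collections import Counter

def pmi_collocations(docs, threshold=1.0):
    """Score adjacent word pairs by pointwise mutual information,
    PMI(w1, w2) = log2( p(w1, w2) / (p(w1) * p(w2)) ),
    and keep pairs above `threshold` as candidate collocations."""
    unigrams, bigrams = Counter(), Counter()
    n_uni = n_bi = 0
    for doc in docs:
        tokens = doc.lower().split()
        unigrams.update(tokens)
        n_uni += len(tokens)
        pairs = list(zip(tokens, tokens[1:]))
        bigrams.update(pairs)
        n_bi += len(pairs)
    kept = {}
    for (w1, w2), c in bigrams.items():
        p_pair = c / n_bi
        p1, p2 = unigrams[w1] / n_uni, unigrams[w2] / n_uni
        score = math.log2(p_pair / (p1 * p2))
        if score >= threshold:
            kept[(w1, w2)] = score
    return kept

# Hypothetical miniature corpus, purely for illustration.
docs = [
    "covid pandemic response",
    "covid pandemic spread",
    "pandemic response policy",
]
print(pmi_collocations(docs))
```

Pairs that survive the threshold would then be merged into single tokens (e.g. `covid_pandemic`) before building the LDA vocabulary.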



Entropy ◽  
2019 ◽  
Vol 21 (7) ◽  
pp. 660 ◽  
Author(s):  
Sergei Koltcov ◽  
Vera Ignatenko ◽  
Olessia Koltsova

Topic modeling is a popular approach for clustering text documents. However, current tools have a number of unsolved problems, such as instability and a lack of criteria for selecting the values of model parameters. In this work, we propose a method that partially solves the problem of optimizing model parameters while simultaneously accounting for semantic stability. Our method is inspired by concepts from statistical physics and is based on Sharma–Mittal entropy. We test our approach on two models: probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation (LDA) with Gibbs sampling, and on two datasets in different languages. We compare our approach against a number of standard metrics, each of which is able to account for just one of the parameters of interest. We demonstrate that Sharma–Mittal entropy is a convenient tool for selecting both the number of topics and the values of hyper-parameters, simultaneously controlling for semantic stability, which none of the existing metrics can do. Furthermore, we show that concepts from statistical physics can contribute to theory construction for machine learning, a rapidly developing field that currently lacks a consistent theoretical ground.
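The Sharma–Mittal entropy itself is straightforward to compute; below is a minimal sketch of the generic formula for a discrete distribution with entropic indices q and r. How the authors map topic-model parameters onto these indices is described in their paper, not here.

```python
def sharma_mittal_entropy(p, q, r):
    """Sharma-Mittal entropy of a discrete distribution p, with
    entropic indices q != 1 and r != 1:
        SM(p) = ( (sum_i p_i^q)^((1-r)/(1-q)) - 1 ) / (1 - r)
    It generalises both Renyi entropy (limit r -> 1) and
    Tsallis entropy (r = q)."""
    s = sum(pi ** q for pi in p if pi > 0)
    return (s ** ((1 - r) / (1 - q)) - 1) / (1 - r)

# For a uniform distribution over n states this reduces to
# (n^(1-r) - 1) / (1 - r), independent of q.
print(sharma_mittal_entropy([0.25] * 4, q=2.0, r=0.5))
```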


2020 ◽  
Vol 13 (44) ◽  
pp. 4474-4482
Author(s):  
Vasantha Kumari Garbhapu ◽  

Objective: To compare topic modeling techniques, since the no-free-lunch theorem states that, under a uniform distribution over search problems, all machine learning algorithms perform equally. Hence, we compare Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) to identify the better performer on an English Bible dataset, which has not been studied yet. Methods: This comparative study was divided into three levels. In the first level, Bible data was extracted from the sources and preprocessed to remove words and characters that were not useful for obtaining the semantic structures or patterns needed to build a meaningful corpus. In the second level, the preprocessed data was converted into a bag of words, and the numerical statistic TF-IDF (Term Frequency – Inverse Document Frequency) was used to assess how relevant a word is to a document in a corpus. In the third level, Latent Semantic Analysis and Latent Dirichlet Allocation were applied to the resultant corpus to study the feasibility of the techniques. Findings: Based on our evaluation, we observed that LDA achieves 60 to 75% superior performance compared to LSA, using document similarity within the corpus and with unseen documents. Additionally, LDA showed a better coherence score (0.58018) than LSA (0.50395). Moreover, when compared to any word within the corpus, word association showed better results with LDA. Some words are homonyms whose meaning depends on context; for example, in the Bible, "bear" can refer to both punishment and birth. In our study, LDA word-association results are closer to human word associations than those of LSA. Novelty: LDA was found to be a computationally efficient and interpretable method on the New International Version English Bible dataset, which had not previously been studied. Keywords: Topic modeling; LSA; LDA; word association; document similarity; Bible data set
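The TF-IDF weighting used in the second level can be sketched with the standard definition. The tiny corpus below and the choice of natural-log IDF are illustrative assumptions, not details from the study.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Bag-of-words TF-IDF: normalised term frequency weighted by
    log(N / df), where df is the number of documents containing
    the term and N is the corpus size."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # each doc counts a term once
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: (c / len(tokens)) * math.log(n_docs / df[t])
                        for t, c in tf.items()})
    return weights

# Terms appearing in every document get weight 0; rarer terms
# within a document get proportionally higher weight.
weights = tf_idf(["a b a", "b c"])
```

The resulting weighted vectors would then feed the LSA decomposition or serve as a baseline against LDA's count-based input.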


Author(s):  
Priyanka R. Patil ◽  
Shital A. Patil

Similarity View is an application for visually comparing and exploring multiple models of text across a collection of documents. Friendbook discovers users' lifestyles from user-centric sensor data, measures the similarity of lifestyles among users, and recommends friends to users whose lifestyles are highly similar. Motivated by modeling a user's daily life as life documents, lifestyles are extracted using the Latent Dirichlet Allocation algorithm. Manual techniques cannot be used for checking research papers, as the assigned reviewer may have insufficient knowledge of the research discipline, and differing subjective views can cause misinterpretations. There is an urgent need for an effective and feasible approach to check submitted research papers with the support of automated software; text mining methods can solve the problem of checking research papers semantically and automatically. The proposed method finds the similarity of text across the collection of documents using the Latent Dirichlet Allocation (LDA) algorithm and Latent Semantic Analysis (LSA) with a synonym algorithm, which finds synonyms of text index-wise using the English WordNet dictionary; another variant, LSA without synonyms, finds the similarity of text based on the index alone. The accuracy of LSA with synonyms is greater when synonyms are considered for matching.
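The effect of synonym expansion on similarity matching can be illustrated with a minimal sketch. The toy synonym map below stands in for a WordNet lookup, and cosine similarity over bags of words is one common choice, not necessarily the authors' exact formulation.

```python
import math
from collections import Counter

# Toy synonym map standing in for an English WordNet lookup (hypothetical).
SYNONYMS = {"car": "automobile", "quick": "fast"}

def normalise(tokens):
    """Map each token to a canonical synonym, if one is known."""
    return [SYNONYMS.get(t, t) for t in tokens]

def cosine_similarity(doc_a, doc_b, use_synonyms=True):
    """Cosine similarity of two documents as bag-of-words vectors,
    optionally after synonym normalisation."""
    ta, tb = doc_a.lower().split(), doc_b.lower().split()
    if use_synonyms:
        ta, tb = normalise(ta), normalise(tb)
    va, vb = Counter(ta), Counter(tb)
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

# With synonyms, "quick car" and "fast automobile" match fully;
# without them, only the shared word "the" overlaps.
with_syn = cosine_similarity("the quick car", "the fast automobile")
without_syn = cosine_similarity("the quick car", "the fast automobile",
                                use_synonyms=False)
```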


2021 ◽  
Vol 2 (1) ◽  
Author(s):  
Kendall A. Johnson ◽  
Clive H. Bock ◽  
Phillip M. Brannen

Abstract Background Phony peach disease (PPD) is caused by the plant pathogenic bacterium Xylella fastidiosa subsp. multiplex (Xfm). Historically, the disease has caused severe yield loss in Georgia and elsewhere in the southeastern United States, with millions of affected trees removed from peach orchards over the last century. The disease remains a production constraint, and management options are few. Limited research has been conducted on PPD since the 1980s, but the advent of new technologies offers the opportunity for new, foundational research to form a basis for informed management of PPD in the U.S. Furthermore, considering the global threat of Xylella to many plant species, preventing import of Xfm to other regions, particularly where peach is grown, should be considered an important phytosanitary endeavor. Main topics We review PPD, its history and impact on peach production, and the eradication efforts that were conducted for 42 years. Additionally, we review the current knowledge of the pathogen, Xfm, and how that knowledge relates to our understanding of the peach–Xylella pathosystem, including the epidemiology of the disease and consideration of the vectors. Methods used to detect the pathogen in peach are discussed, and ramifications of detection in relation to management and control of PPD are considered. Control options for PPD are limited. Our current knowledge of pathogen diversity and disease epidemiology is described, and based on this, some potential areas for future research are also considered. Conclusion There is a lack of recent foundational research on PPD and the associated strain of Xfm. More research is needed to reduce the impact of this pathogen on peach production in the southeastern U.S., and, should it spread internationally, wherever peaches are grown.


2018 ◽  
Vol 30 (8) ◽  
pp. 1186-1203 ◽  
Author(s):  
Michael R. Smith ◽  
Matthew Petrocelli

In 2010, the Arizona legislature effectively deregulated concealed handgun carry in the state by passing Senate Bill (SB) 1108, which eliminated licensing and training requirements for concealed carry. Although researchers have extensively examined the impact of state adoption of concealed carry laws, almost nothing is known about the effects of deregulating concealed carry altogether. This study contributes to the more guns, less crime debate by examining the impact of Arizona’s decision to deregulate concealed carry. Using a multiple time-series research design with an experimental (Tucson) and control city (El Paso), the present study examines the impact of deregulation on handgun-related violent crime and gun larcenies in Arizona’s second largest city—Tucson. We find that the passage of SB 1108 had no impact on handgun-related offenses that could be expected to change following deregulation. The implications of these findings for policy making and future research are discussed.


Natural language processing uses word embeddings to map words into vectors. The context vector is one such technique; it captures the importance of terms in the document corpus. Context vectors can be derived using various methods such as neural networks, latent semantic analysis, and knowledge-base methods. This paper proposes a novel system, an enhanced context vector machine called eCVM, which is able to determine context phrases and their importance. eCVM uses latent semantic analysis, the existing context vector machine, dependency parsing, named entities, topics from latent Dirichlet allocation, and various word forms such as nouns, adjectives, and verbs to build the context. eCVM uses the context vector and the PageRank algorithm to find the importance of a term in a document, and is tested on the BBC News dataset. Results of eCVM are compared with the state of the art for context derivation. The proposed system shows improved performance over existing systems on standard evaluation parameters.
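The PageRank component can be sketched with plain power iteration. The toy term graph, damping factor, and iteration count below are conventional defaults and hypothetical inputs, not values from the paper, where the graph would be built from term co-occurrence in a document.

```python
def pagerank(graph, damping=0.85, iters=50):
    """Power-iteration PageRank over an adjacency dict
    {node: [neighbours]}; returns a node -> score mapping
    that sums to 1."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # teleport mass shared uniformly
        new = {v: (1 - damping) / n for v in nodes}
        for v, outs in graph.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for u in outs:
                    new[u] += share
            else:  # dangling node: spread its rank uniformly
                for u in nodes:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank

# Hypothetical term graph: "context" is linked to by both other terms,
# so it should receive the highest importance score.
graph = {"context": ["vector"], "vector": ["context"], "phrase": ["context"]}
scores = pagerank(graph)
```

In eCVM's setting, the highest-ranked nodes would be reported as the most important terms of the document.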


2021 ◽  
Author(s):  
Lucas Fery ◽  
Berengere Dubrulle ◽  
Flavio Pons ◽  
Berengere Podvin ◽  
Davide Faranda

Abstract Mid-latitude circulation dynamics is often described in terms of weather regimes, represented by atmospheric field configurations extracted using pattern recognition techniques. Each pattern consists of a combination of distinct elements corresponding to synoptic objects (cyclones and anticyclones). Such entanglement makes it difficult to detect or quantify shifts in atmospheric circulation, possibly due to anthropogenic forcings, that impact the recurrence and intensity of climate extremes. Here we apply Latent Dirichlet Allocation (LDA), typically used for topic modeling in linguistic studies, to build a weather dictionary: in analogy with linguistics, we define daily maps of a gridded target observable as documents, and the grid points composing each map as words. LDA provides a representation of documents as a combination of spatial patterns named motifs, which are latent patterns inferred from the set of snapshots. For atmospheric data, we find that motifs correspond to pure synoptic objects (cyclones and anticyclones) that can be seen as building blocks of weather regimes. We show that LDA weights provide a natural way to characterize the impact of climate change on the recurrence of regimes associated with extreme events.
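The maps-as-documents analogy can be made concrete with a small sketch: each grid point becomes a vocabulary "word" and its value becomes an integer "count" suitable for LDA. The discretisation scheme below is an illustrative assumption, not the authors' exact preprocessing.

```python
def field_to_document(field, levels=10):
    """Convert one daily gridded snapshot (a 2-D list of values)
    into a bag-of-words document: each grid point is a 'word'
    named by its position, and its count is the field value
    rescaled onto `levels` integer bins."""
    flat = [v for row in field for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for flat fields
    doc = {}
    for i, row in enumerate(field):
        for j, v in enumerate(row):
            count = round((v - lo) / span * levels)
            if count > 0:  # LDA ignores zero-count words
                doc[f"g{i}_{j}"] = count
    return doc

# Hypothetical 2x2 snapshot of some observable (e.g. geopotential anomaly).
snapshot = [[0.0, 1.0],
            [0.5, 0.0]]
doc = field_to_document(snapshot)
```

A corpus of such documents, one per day, is what LDA would decompose into spatial motifs.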


2018 ◽  
Vol 10 (1) ◽  
Author(s):  
Ashley Weeks ◽  
Lisa Waddell ◽  
Andrea Nwosu ◽  
Christina Bancej ◽  
Shalini Desai ◽  
...  

Objective: To create a scoping review on enterovirus D-68 (EV-D68) that will serve as a useful tool to guide future research, with the aim of filling critical information gaps and supporting the development of public health preparedness activities. Introduction: EV-D68 is a non-polio enterovirus, primarily resulting in respiratory illness, with clinical symptoms ranging from mild to severe. Infection has also been associated with severe neurological conditions like acute flaccid myelitis (AFM). EV-D68 was first discovered in 1962, with infrequent case reports until 2014, at which point a widespread multi-national outbreak, mostly affecting the pediatric population, occurred across North America, Europe, Southeast Asia and Africa. This outbreak was associated with an increase in AFM, with cases being reported in Canada, the United States, Norway, and France. With this new and emerging threat, public health and other organizations were called upon to implement response measures such as establishment of case definitions, surveillance mechanisms, and recommendations for clinical and public health management. The response to the 2014 outbreak in Canada highlighted several important EV-D68 evidence gaps, including a lack of risk factor and clinical information available for non-severe cases, and uncertainty around seasonal, cyclical and secular trends. Given the increased reporting of EV-D68 cases associated with severe outcomes, it is critical that public health establishes what is known about EV-D68 in order to support decision-making, education and other preparedness activities and to highlight priority areas for future research to fill critical knowledge gaps. Scoping reviews provide a reproducible and updateable synthesis research methodology to identify and characterise all the literature on a broad topic as a means to highlight where evidence exists and where there are knowledge gaps. 
In order to systematically characterise the EV-D68 knowledge base, a scoping review was conducted to map the current body of evidence. Methods: A literature search of published and grey literature on EV-D68 was conducted on May 1, 2017. A standardized search algorithm was implemented in four bibliographic databases: Medline, Embase, Global Health and Scopus. Relevant grey literature was sought from a priori identified sources: the World Health Organization, United States Centers for Disease Control and Prevention, the Public Health Agency of Canada, the European Centre for Disease Prevention and Control, and thesis registries. Two-level relevance screening (title/abstract followed by full-text) was performed in duplicate by two independent reviewers using pretested screening forms. Conflicts between the reviewers were reconciled following group discussion with the study team. English and French articles were included if they reported on EV-D68 as an outcome. There were no limitations by date, publication type, geography or study design. Conference abstracts were excluded if they did not provide sufficient outcome information to characterize. The articles were then characterized by two independent reviewers using a pretested study characterization form. The descriptive characteristics of each article were extracted and categorized into one of the following broad topic categories: 1) Epidemiology and Public Health, 2) Clinical and Infection Prevention and Control (IPC), 3) Guidance Products, 4) Public Health Surveillance, 5) Laboratory, and 6) Impact. The Epidemiology and Public Health category contained citations describing prevalence, epidemiological distribution, outbreak data and public health mitigation strategies. Clinical and IPC citations included details regarding symptoms of EV-D68 infection, patient outcomes, clinical investigation processes, treatment options and infection prevention and control strategies. 
The Guidance category included citations that assess risk, provide knowledge translation or provide practice guidelines. Public Health Surveillance citations provided details on surveillance systems. Citations in the Laboratory category included studies that assessed the genetic characteristics of circulating EV-D68 (phylogeny, taxonomy) and viral characteristics (proteins, viral properties). Lastly, the Impact category contained citations describing the social, economic and resource burden of EV-D68 infection. Each broad topic category was subsequently characterised further into subtopics. Results: The search yielded a total of 384 citations, of which 300 met the inclusion criteria. Twenty-six of forty-three potentially relevant grey literature sources were also included. Preliminary literature characterization suggests that the majority of the published literature fell under the topic categories of Epidemiology, Clinical, and Laboratory. There were limited published articles on public health guidance, IPC, surveillance systems and the impact of EV-D68. The grey literature primarily consisted of webpages directed towards the public (what EV-D68 is, how to prevent it, what to do if ill, etc.). This scoping review work is presently underway and a summary of the full results will be presented at the 2018 Annual Conference. Conclusions: The body of literature on EV-D68 has increased since the 2014 outbreak, but overall remains small and contains knowledge gaps in some areas. To our knowledge, this scoping review is the first to classify the entirety of literature relating to EV-D68. It will serve as a useful tool to guide future research with the aim of filling critical information gaps, and supporting development of public health preparedness activities.

