Semantics Graph Mining for Topic Discovery and Word Associations

Author(s):  
Alex Romanova

Big Data creates many challenges for data mining experts, in particular in extracting meaning from text data. Text mining benefits from bridging word embeddings with the capacity of graphs to connect the dots and represent complex correlations between entities. In this study we examine the process of building a semantic graph model to determine word associations and discover document topics. We introduce a novel Word2Vec2Graph model built on top of the Word2Vec word embedding model. We demonstrate how this model can be used to analyze long documents, find unexpected word associations, and uncover document topics. To validate the topic discovery method, we convert words to vectors and vectors to images, then apply CNN deep learning image classification.
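The graph-building idea in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: the toy two-dimensional vectors stand in for trained Word2Vec embeddings, and the function names (`cosine`, `build_graph`, `topics`), the adjacent-word pairing, and the 0.8 threshold are all assumptions made for illustration. Edges link neighboring words with similar vectors, and connected components stand in for topics.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(tokens, vectors, threshold=0.8):
    """Link adjacent words whose cosine similarity meets the threshold.

    The threshold and the adjacent-pair rule are illustrative assumptions.
    """
    graph = defaultdict(dict)
    for left, right in zip(tokens, tokens[1:]):
        if left == right or left not in vectors or right not in vectors:
            continue
        weight = cosine(vectors[left], vectors[right])
        if weight >= threshold:
            graph[left][right] = weight
            graph[right][left] = weight
    return graph


def topics(graph):
    """Return connected components of the graph as candidate topics."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node])
        components.append(component)
    return components


# Toy embeddings (hypothetical values, not trained Word2Vec output).
vectors = {
    "graph": (1.0, 0.1), "node": (0.9, 0.2), "edge": (0.95, 0.15),
    "cat": (0.1, 1.0), "dog": (0.2, 0.9),
}
tokens = ["graph", "node", "edge", "cat", "dog"]
found = topics(build_graph(tokens, vectors))
```

With these toy inputs the weak "edge"/"cat" pair falls below the threshold, so the words separate into two groups. Lowering the threshold instead of raising it would surface weaker, potentially unexpected, word associations.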

2021 ◽  
Author(s):  
Alex Romanova

Document topic analysis benefits from bridging word embeddings with the capacity of graphs to connect the dots and represent complex correlations between entities. In this study we examine the process of building a semantic graph model, finding document topics, and validating topic discovery. We introduce a novel Word2Vec2Graph model built on top of the Word2Vec word embedding model. We demonstrate how this model can be used to analyze long documents and uncover document topics as graph clusters. To validate the topic discovery method, we convert words to vectors and vectors to images, then apply deep learning image classification.
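The abstract does not say how vectors are turned into images. One established encoding for rendering a numeric vector as a two-dimensional image is the Gramian angular summation field, used here purely as an assumption. The sketch below rescales a vector to [-1, 1] and computes cos(φ_i + φ_j) with φ = arccos(x); the function names `rescale` and `gasf` are hypothetical.

```python
import math


def rescale(values):
    """Min-max scale values to the interval [-1, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def gasf(values):
    """Gramian angular summation field of a vector.

    Entry (i, j) equals cos(phi_i + phi_j) with phi = arccos(x),
    expanded as x_i * x_j - sqrt(1 - x_i^2) * sqrt(1 - x_j^2).
    """
    x = rescale(values)
    s = [math.sqrt(max(0.0, 1.0 - v * v)) for v in x]
    n = len(x)
    return [[x[i] * x[j] - s[i] * s[j] for j in range(n)] for i in range(n)]


# A three-element vector yields a 3x3 "image" with values in [-1, 1].
image = gasf([0.0, 0.5, 1.0])
```

The resulting square matrix can be mapped to pixel intensities and passed to an image classifier. Any other vector-to-image encoding could be substituted without changing the rest of the pipeline.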


2020 ◽  
Vol 81 (4) ◽  
pp. 193
Author(s):  
Kyle Courtney ◽  
Rachael Samberg ◽  
Timothy Vollmer

A wealth of digital texts and the proliferation of automated research methodologies enable researchers to analyze large sets of data at a speed that would be impossible to achieve through manual review. When researchers use these automated techniques and methods for identifying, extracting, and analyzing patterns, trends, and relationships across large volumes of un- or thinly structured digital content, they are applying a methodology called text data mining or TDM. TDM is also referred to, with slightly different emphases, as “computational text analysis” or “content mining.”


2015 ◽  
Vol 7 (1) ◽  
pp. 17-32
Author(s):  
J.S. Shyam Mohan ◽  
P. Shanmugapriya ◽  
Bhamidipati Vinay Pawan Kumar

Finding the most widely used URLs on online shopping sites for a particular category is difficult because the data sets are heterogeneous and multi-dimensional and depend on various factors. Traditional data mining methods are limited to homogeneous data sources, so they fail to sufficiently consider the characteristics of heterogeneous data. This paper presents a consistent Big Data mining search that performs analytics on text data to find the top-rated URLs. Though many heuristic search methods are available, our proposed method addresses the search problem better than traditional data mining methods. The sample results are obtained in optimal time and, compared with other methods, show the approach to be effective and efficient.


2019 ◽  
Vol 11 (15) ◽  
pp. 4235 ◽  
Author(s):  
Kauffmann ◽  
Peral ◽  
Gil ◽  
Ferrández ◽  
Sellers ◽  
...  

Companies have realized the importance of “big data” in creating a sustainable competitive advantage, and user-generated content (UGC) represents one of big data’s most important sources. From blogs to social media and online reviews, consumers generate a huge amount of brand-related information that has a decisive potential business value for marketing purposes. Particularly, we focus on online reviews that could have an influence on brand image and positioning. Within this context, and using the usual quantitative star score ratings, a recent stream of research has employed sentiment analysis (SA) tools to examine the textual content of reviews and categorize buyer opinions. Although many SA tools split comments into negative or positive, a review can contain phrases with different polarities because the user can have different sentiments about each feature of the product. Finding the polarity of each feature can be interesting for product managers and brand management. In this paper, we present a general framework that uses natural language processing (NLP) techniques, including sentiment analysis, text data mining, and clustering techniques, to obtain new scores based on consumer sentiments for different product features. The main contribution of our proposal is the combination of price and the aforementioned scores to define a new global score for the product, which allows us to obtain a ranking according to product features. Furthermore, the products can be classified according to their positive, neutral, or negative features (visualized on dashboards), helping consumers with their sustainable purchasing behavior. We proved the validity of our approach in a case study using big data extracted from Amazon online reviews (specifically cell phones), obtaining satisfactory and promising results. After the experimentation, we could conclude that our work is able to improve recommender systems by using positive, neutral, and negative customer opinions and by classifying customers based on their comments.


Author(s):  
Kiran Kumar S V N Madupu

Big Data has a tremendous influence on scientific discovery and value creation. This paper presents approaches to data mining and modern Big Data technologies. The difficulties of data mining, and of data mining with big data, are discussed. Some technological developments in data mining and in data mining with big data are also presented.


2019 ◽  
Author(s):  
Meghana Bastwadkar ◽  
Carolyn McGregor ◽  
S Balaji

BACKGROUND This paper presents a systematic literature review of existing remote health monitoring systems with special reference to neonatal intensive care (NICU). Articles on NICU clinical decision support systems (CDSSs) that used cloud computing and big data analytics were surveyed. OBJECTIVE The aim of this study is to review technologies used to provide NICU CDSSs. The literature review highlights the gaps within frameworks providing the HAaaS paradigm for big data analytics. METHODS Literature searches were performed in Google Scholar, IEEE Digital Library, JMIR Medical Informatics, JMIR Human Factors, and JMIR mHealth, and only English-language articles published in or after 2015 were included. The overall search strategy was to retrieve articles that included terms related to “health analytics” and “as a service” or “internet of things” / “IoT” and “neonatal intensive care unit” / “NICU”. Titles and abstracts were reviewed to assess relevance. RESULTS In total, 17 full papers met all criteria and were selected for full review. In most cases, bedside medical devices such as pulse oximeters were used as the sensor device. The data acquisition techniques varied widely; however, in most cases the same physiological data (heart rate, respiratory rate, blood pressure, blood oxygen saturation) were acquired. In most cases data analytics involved data mining classification techniques, fuzzy logic-NICU decision support systems (DSS), etc., whereas big data analytics involving Artemis cloud data analysis used the CRISP-TDM and STDM temporal data mining techniques to support clinical research studies. In most scenarios both real-time and retrospective analytics were performed. Most of the research was performed within small and medium-sized urban hospitals, so there is wide scope for research within rural and remote hospitals with NICU setups. Creating a HAaaS approach in which data acquisition and data analytics are not tightly coupled remains an open research area. The reviewed articles described architectures and base technologies for neonatal health monitoring with an IoT approach. CONCLUSIONS The current work supports implementation of the expanded Artemis cloud as a commercial offering to healthcare facilities in Canada and worldwide to provide cloud computing services to critical care. However, no work to date has been completed for low-resource settings within healthcare facilities in India, which leaves scope for research. All the big data analytics frameworks reviewed in this study have tight coupling of components within the framework, so there is a need for a framework with functional decoupling of components.

