Novel Approach to Transform Unstructured Healthcare Data to Structured Data

Author(s):  
Anusha A R

With the rapid growth in the number and size of databases and database applications holding healthcare records, it is necessary to design systems that automatically extract facts from large tables. At the same time, managing unstructured data is challenging, as it is difficult to analyze and to mine for actionable intelligence. Preprocessing is a critical step in text mining, regular-expression processing, and information retrieval, and extracting key facts from unstructured data is often difficult. The objective of this project is to transform unstructured healthcare data into structured data, in particular to gain insight and to generate appropriately structured records.
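The paper does not publish its extraction patterns, so the minimal sketch below uses invented field names and regexes to illustrate the general idea of pulling structured fields out of free-text clinical notes:

```python
import re

# Hypothetical example: the field names, the sample note, and the regexes
# are illustrative assumptions only, not the paper's actual patterns.
NOTE = "Patient: Jane Doe. Age: 54. BP: 142/91 mmHg. Dx: Type 2 Diabetes."

PATTERNS = {
    "name": r"Patient:\s*([A-Za-z ]+?)\.",
    "age": r"Age:\s*(\d{1,3})",
    "blood_pressure": r"BP:\s*(\d{2,3}/\d{2,3})",
    "diagnosis": r"Dx:\s*([^.]+)",
}

def to_structured(note: str) -> dict:
    """Apply each field pattern to the free text; missing fields map to None."""
    record = {}
    for field, pattern in PATTERNS.items():
        m = re.search(pattern, note)
        record[field] = m.group(1).strip() if m else None
    return record

print(to_structured(NOTE))
```

A real pipeline would add normalization (units, date formats) and fall back to NLP where fixed patterns fail.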

In this era of competition there is a culture of online reviews and feedback about every product and service. The major obstacles to exploiting this feedback are its unstructured textual form and its sheer volume: every user writes in their own style, and studying and analyzing such a large, ever-growing body of unorganized feedback becomes a herculean task. This paper describes the mining of both structured data (tables) and unstructured data (text). An application from an academic environment covering both forms of data is considered and discussed to aid the reader's understanding. The Stanford Parser plays a very useful role in understanding the semantics of a sentence, and it provides a basis for extracting data from textual sources such as social media, tweets, news, and books. It is also helpful for assessing the teaching-learning process in terms of a teacher's performance and any weaknesses in a subject. The paper has five sections: the first introduces the problem, the second reviews the literature on text mining and its techniques, the third presents the proposed work and results, the fourth discusses future perspectives, and the fifth concludes.
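The full approach relies on the Stanford Parser for semantic analysis; the sketch below shows only the simpler preprocessing and frequency-aggregation step over feedback texts, with an assumed stop-word list and invented sample reviews:

```python
from collections import Counter
import re

# Illustrative sketch only: stop words and reviews are assumptions.
STOP_WORDS = {"the", "is", "a", "and", "of", "to", "in", "very", "are", "but", "too"}

def preprocess(feedback: str) -> list[str]:
    """Lowercase, tokenize on letters, and drop stop words."""
    tokens = re.findall(r"[a-z']+", feedback.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def top_terms(feedbacks: list[str], k: int = 3) -> list[tuple[str, int]]:
    """Aggregate token counts across all feedback texts."""
    counts = Counter()
    for fb in feedbacks:
        counts.update(preprocess(fb))
    return counts.most_common(k)

reviews = [
    "The lectures are very clear and the pace is good",
    "Good examples, but the pace is too fast",
]
print(top_terms(reviews))
```

Terms such as "pace" surfacing across many reviews are the kind of signal the paper uses to judge a subject's weaknesses.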


2017 ◽  
Author(s):  
Roshni Kalbhore ◽  
Pravin Malviya

2017 ◽  
Vol 9 (1) ◽  
pp. 19-24 ◽  
Author(s):  
David Domarco ◽  
Ni Made Satvika Iswari

Technology development has affected many areas of life, especially entertainment. One of the fastest-growing entertainment industries is anime, which has evolved into a trend and a hobby, particularly across Asia. The number of anime fans grows every year, and fans try to dig up as much information as possible about their favorite series. Therefore, a chatbot application was developed in this study as an anime information retrieval medium using regular-expression pattern matching. The application is intended to make it easier for anime fans to search for information about the anime they like, offering a convenient, interactive form of data retrieval that cannot be obtained by searching via search engines. The chatbot met the standards of an information retrieval engine with very good results: 72% precision and 100% recall, giving a harmonic mean (F-measure) of 83.7%. As a hedonic application, the chatbot influenced Behavioral Intention to Use by 83% and Immersion by 82%. Index Terms: anime, chatbot, information retrieval, Natural Language Processing (NLP), Regular Expression Pattern Matching
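A regular-expression pattern-matching chatbot of the kind described can be sketched as a list of (pattern, responder) rules over a lookup table; the rules and the tiny anime database below are assumptions for illustration, not the authors' actual system:

```python
import re

# Hypothetical rule set and database, invented for illustration.
ANIME_DB = {
    "one piece": {"episodes": "1000+", "genre": "adventure"},
    "naruto": {"episodes": "720", "genre": "action"},
}

RULES = [
    (re.compile(r"how many episodes (?:does|has) (.+?)\??$", re.I),
     lambda t: f"{t} has {ANIME_DB[t]['episodes']} episodes."),
    (re.compile(r"what genre is (.+?)\??$", re.I),
     lambda t: f"{t} is an {ANIME_DB[t]['genre']} anime."),
]

def reply(message: str) -> str:
    """Match the message against each rule; answer from the database on a hit."""
    for pattern, responder in RULES:
        m = pattern.search(message.strip())
        if m:
            title = m.group(1).lower()
            if title in ANIME_DB:
                return responder(title)
    return "Sorry, I don't know that anime yet."

print(reply("What genre is naruto"))
```

Precision/recall evaluation then amounts to checking how many user questions match a rule (recall) and how many matched answers are correct (precision).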


2015 ◽  
Vol 24 (02) ◽  
pp. 1540010 ◽  
Author(s):  
Patrick Arnold ◽  
Erhard Rahm

We introduce a novel approach to extract semantic relations (e.g., is-a and part-of relations) from Wikipedia articles. These relations are used to build up a large and up-to-date thesaurus providing background knowledge for tasks such as determining semantic ontology mappings. Our automatic approach uses a comprehensive set of semantic patterns, finite state machines and NLP techniques to extract millions of relations between concepts. An evaluation for different domains shows the high quality and effectiveness of the proposed approach. We also illustrate the value of the newly found relations for improving existing ontology mappings.
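The general flavor of pattern-based relation extraction can be sketched with a few classic Hearst-style templates; these are simplified stand-ins, not the authors' comprehensive pattern set or finite state machines:

```python
import re

# Simplified illustration: classic Hearst-style templates, assumed here.
PATTERNS = [
    (re.compile(r"(\w+) is an? (\w+)", re.I), "is-a"),
    (re.compile(r"(\w+) such as (\w+)", re.I), "has-instance"),
    (re.compile(r"(\w+) is part of (?:the )?(\w+)", re.I), "part-of"),
]

def extract_relations(text: str) -> list[tuple[str, str, str]]:
    """Return (subject, relation, object) triples found by the templates."""
    triples = []
    for pattern, relation in PATTERNS:
        for m in pattern.finditer(text):
            triples.append((m.group(1).lower(), relation, m.group(2).lower()))
    return triples

print(extract_relations("Paris is a city. A wheel is part of the car."))
```

Run over a corpus the size of Wikipedia, the accumulated triples form the thesaurus-style background knowledge the paper uses for ontology mapping.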


2021 ◽  
pp. 073563312110435
Author(s):  
Rina P.Y. Lai

As a dynamic and multifaceted construct, computational thinking (CT) has proven challenging to conceptualize and assess, which impedes the development of a workable ontology framework. To address this issue, the current article describes a novel approach to understanding the ontological aspects of CT, using text mining and graph-theoretic techniques to elucidate teachers' perspectives collected in an online survey (N = 105). In particular, hierarchical cluster analysis, a knowledge representation method, was applied to identify sub-groups among teachers in how they conceptualize and assess CT. Five clusters in conceptualization and two clusters in assessment were identified, and several relevant and distinct themes were extracted. The results suggest that teachers regard CT as a competence domain, relevant in the problem-solving context, and applicable and transferable to various disciplines. The results also shed light on the importance of using multiple approaches to assess the diversity of CT. Overall, the findings collectively contribute to a comprehensive, multi-perspective representation of CT that refines both theory and practice. The methodology employed in this article represents a modest but significant step towards addressing the quintessential questions of "what is CT?" and "how is it evidenced?".
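Hierarchical (agglomerative) clustering, the family of method applied to the survey responses, can be sketched in pure Python; the single-linkage criterion and the toy response vectors below are assumptions for illustration only:

```python
import math

# Minimal agglomerative clustering sketch; toy vectors are invented.
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(c1, c2, points):
    """Distance between clusters = distance of their closest pair of members."""
    return min(euclidean(points[i], points[j]) for i in c1 for j in c2)

def cluster(points, k):
    """Repeatedly merge the two closest clusters until only k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]], points),
        )
        clusters[i] += clusters.pop(j)
    return clusters

# Two tight groups of toy "survey response" vectors
responses = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0)]
print(cluster(responses, 2))
```

In practice the points would be high-dimensional encodings of teachers' free-text answers, and the dendrogram would be cut at the level yielding interpretable sub-groups.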


2017 ◽  
Vol 139 (11) ◽  
Author(s):  
Feng Shi ◽  
Liuqing Chen ◽  
Ji Han ◽  
Peter Childs

With the advent of the big-data era, the massive information stored in electronic and digital form on the internet has become a valuable resource for knowledge discovery in engineering design. Traditional document retrieval based on document indexing focuses on retrieving individual documents related to a query, but is incapable of discovering the various associations between individual knowledge concepts. Ontology-based technologies, which can extract the inherent relationships between concepts using advanced text mining tools, can improve design information retrieval in large-scale unstructured textual data environments. However, few publicly available ontology databases take a design and engineering perspective on establishing the relations between knowledge concepts. This paper develops a "WordNet" focused on design and engineering associations, integrating text mining approaches to construct an ontology network through unsupervised learning. Probability and velocity network analyses are then applied, with different statistical behaviors, to evaluate the degree of correlation between concepts for design information retrieval. The validation results show that probability and velocity analysis on the constructed ontology network helps recognize highly related, complex design and engineering associations between elements. Finally, an engineering design case study demonstrates the use of the constructed semantic network for design relation retrieval in a real-world project.
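The paper's probability analysis over the ontology network is not specified in this abstract; as a stand-in, the sketch below scores concept association with pointwise mutual information (PMI) over co-occurrence in toy design documents, a common baseline for this kind of network weighting:

```python
import math

# Illustrative only: PMI over invented toy documents stands in for the
# paper's probability analysis, which is not detailed in the abstract.
docs = [
    {"gear", "shaft", "torque"},
    {"gear", "torque", "bearing"},
    {"sensor", "circuit"},
]

def pmi(a: str, b: str) -> float:
    """log P(a,b) / (P(a) P(b)); -inf when the concepts never co-occur."""
    n = len(docs)
    p_a = sum(a in d for d in docs) / n
    p_b = sum(b in d for d in docs) / n
    p_ab = sum(a in d and b in d for d in docs) / n
    if p_ab == 0:
        return float("-inf")
    return math.log(p_ab / (p_a * p_b))

print(round(pmi("gear", "torque"), 3))   # strongly associated concepts
print(pmi("gear", "circuit"))            # concepts that never co-occur
```

Edges weighted this way let retrieval rank "gear–torque" far above "gear–circuit" even though plain document indexing treats both pairs alike.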


Author(s):  
Patrice Bellot ◽  
Ludovic Bonnefoy ◽  
Vincent Bouvier ◽  
Frédéric Duvert ◽  
Young-Min Kim

2020 ◽  
Vol 5 (4) ◽  
pp. 43-55
Author(s):  
Gianpiero Bianchi ◽  
Renato Bruni ◽  
Cinzia Daraio ◽  
Antonio Laureti Palma ◽  
Giulio Perani ◽  
...  

Abstract

Purpose: The main objective of this work is to show the potential of recently developed approaches for automatic knowledge extraction directly from universities' websites. The information automatically extracted can potentially be updated more frequently than once per year, and is safe from manipulation or misinterpretation. Moreover, this approach provides flexibility in collecting indicators about the efficiency of universities' websites and their effectiveness in disseminating key content. These new indicators can complement traditional indicators of scientific research (e.g. number of articles and number of citations) and teaching (e.g. number of students and graduates) by introducing further dimensions that allow new insights for "profiling" the analyzed universities.

Design/methodology/approach: Webometrics relies on web mining methods and techniques to perform quantitative analyses of the web. This study implements an advanced application of the webometric approach, exploiting all three categories of web mining: web content mining, web structure mining, and web usage mining. The information needed to compute the indicators was extracted from the universities' websites using web scraping and text mining techniques. The scraped information was stored in semi-structured form in a NoSQL database so that it could be retrieved efficiently by text mining techniques; this provides increased flexibility in the design of new indicators, opening the door to new types of analyses. Some data were also collected by means of batch queries to search engines (Bing, www.bing.com) or from a leading provider of web analytics (SimilarWeb, http://www.similarweb.com). The information extracted from the web was combined with structural information about each university taken from the European Tertiary Education Register (https://eter.joanneum.at/#/home), a database collecting information on Higher Education Institutions (HEIs) at the European level. All of the above was used to cluster 79 Italian universities based on structural and digital indicators.

Findings: The main findings of this study concern the evaluation of universities' potential for digitalization, in particular techniques for automatically extracting information from the web to build indicators of the quality and impact of universities' websites. These indicators can complement traditional indicators and, combined with clustering techniques, can be used to identify groups of universities with common features.

Research limitations: The results reported in this study refer to Italian universities only, but the approach could be extended to other university systems abroad.

Practical implications: The approach proposed in this study, and its illustration on Italian universities, shows the usefulness of recently introduced automatic data extraction and web scraping approaches and their practical relevance for characterizing and profiling the activities of universities on the basis of their websites.

Originality/value: This work applies to university websites, for the first time, some recently introduced techniques for automatic knowledge extraction based on web scraping, optical character recognition, and nontrivial text mining operations (Bruni & Bianchi, 2020).
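The web-content-mining step, stripping text out of scraped pages before storing it for text mining, can be sketched with only the standard library; the sample HTML below stands in for a scraped university page and is entirely assumed:

```python
from html.parser import HTMLParser

# Sketch of the content-mining step; sample page content is invented.
class TextExtractor(HTMLParser):
    SKIP = {"script", "style"}  # markup whose text content is not page content

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)

page = ("<html><head><script>var x=1;</script></head>"
        "<body><h1>Admissions</h1><p>Apply by July.</p></body></html>")
print(extract_text(page))
```

The extracted text would then be stored in semi-structured form (page URL, section, text) in the NoSQL store for later indicator computation.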


10.2196/20773 ◽  
2020 ◽  
Vol 22 (8) ◽  
pp. e20773 ◽  
Author(s):  
Antoine Neuraz ◽  
Ivan Lerner ◽  
William Digan ◽  
Nicolas Paris ◽  
Rosy Tsopra ◽  
...  

Background: A novel disease poses special challenges for informatics solutions. Biomedical informatics relies for the most part on structured data, which require a preexisting data or knowledge model; novel diseases, however, have no preexisting knowledge model. In an emergent epidemic, language processing can enable the rapid conversion of unstructured text into a novel knowledge model. Although this idea has often been suggested, no opportunity had arisen to actually test it in real time until the current coronavirus disease (COVID-19) pandemic.

Objective: The aim of this study was to evaluate the added value of information from clinical text in response to emergent diseases using natural language processing (NLP).

Methods: We explored the effects of long-term treatment with calcium channel blockers on the outcomes of COVID-19 infection in patients with high blood pressure during in-patient hospital stays, using two sources of information: data available strictly from structured electronic health records (EHRs), and data available through structured EHRs plus text mining.

Results: In this multicenter study involving 39 hospitals, text mining increased the statistical power sufficiently to turn a null result for an adjusted hazard ratio into a significant one. Compared to the baseline structured data, the number of patients available for inclusion in the study increased by a factor of 2.95, the amount of available information on medications by a factor of 7.2, and the amount of additional phenotypic information by a factor of 11.9.

Conclusions: In our study, use of calcium channel blockers was associated with decreased in-hospital mortality in patients with COVID-19 infection. This finding was obtained by quickly adapting an NLP pipeline to the domain of the novel disease; the adapted pipeline still performed well enough to extract useful information. When that information was used to supplement existing structured data, the sample size increased sufficiently to reveal treatment effects that were not previously statistically detectable.
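The core idea of supplementing structured EHR data with text-mined mentions can be sketched as follows; the drug list, note text, and single regex are invented stand-ins for the study's far richer NLP pipeline:

```python
import re

# Hypothetical sketch: drug list and note are invented; the study's
# actual pipeline is far richer than this single pattern.
DRUGS = ["amlodipine", "nifedipine", "diltiazem"]  # calcium channel blockers
PATTERN = re.compile(r"\b(" + "|".join(DRUGS) + r")\b\s+(\d+)\s*mg", re.I)

def extract_medications(note: str) -> list[dict]:
    """Pull (drug, dose) mentions from a free-text clinical note."""
    return [
        {"drug": m.group(1).lower(), "dose_mg": int(m.group(2))}
        for m in PATTERN.finditer(note)
    ]

structured = [{"drug": "amlodipine", "dose_mg": 5}]   # from coded EHR fields
note = "Patient continues Amlodipine 5 mg daily; started diltiazem 120 mg."

# Union of structured entries and text-mined mentions, deduplicated
combined = {(r["drug"], r["dose_mg"]) for r in structured + extract_medications(note)}
print(sorted(combined))
```

The diltiazem mention exists only in the note, illustrating how text mining enlarges the medication information available beyond the coded fields.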

