Web Application for User Profiling

Author(s):  
Taous Iggui ◽  
Hassina Nacer ◽  
Youcef Sklab ◽  
Taklit Ait Radi

User profiles play an important role when information systems try to meet users' needs. This work presents a novel approach to building user profiles. It is based on information extraction techniques and proceeds in iterative steps. The combined use of statistical metrics, Natural Language Processing (NLP) techniques, and semantic descriptions (ontologies) gives the authors' approach a good degree of precision when extracting information from texts. This has been demonstrated by an application prototype, an automatic user profile constructor that uses the texts of job-application emails (the e-recruitment field).
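As a rough illustration of the kind of pipeline this abstract describes, the sketch below matches email text against a toy skill ontology and scores concepts by simple frequency. The ontology, function names, and tokenizer are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch (not the authors' implementation) of ontology-guided
# information extraction from a job-application email.
import re
from collections import Counter

# Toy "ontology": canonical concepts mapped to surface forms (synonyms).
SKILL_ONTOLOGY = {
    "java": {"java", "j2ee"},
    "databases": {"sql", "mysql", "postgresql"},
    "nlp": {"nlp", "natural language processing"},
}

def tokenize(text: str) -> list[str]:
    """Lowercase word tokenizer; a real system would use a proper NLP pipeline."""
    return re.findall(r"[a-z][a-z0-9+]*", text.lower())

def extract_profile(email_text: str) -> dict[str, int]:
    """Map email tokens to ontology concepts and keep simple frequency scores."""
    counts = Counter(tokenize(email_text))
    text = email_text.lower()
    profile: dict[str, int] = {}
    for concept, surface_forms in SKILL_ONTOLOGY.items():
        # Single-word forms use token counts; multi-word forms use substring counts.
        score = sum(counts[f] if " " not in f else text.count(f)
                    for f in surface_forms)
        if score:
            profile[concept] = score
    return profile

print(extract_profile("I have 5 years of Java and MySQL experience, plus some NLP."))
# {'java': 1, 'databases': 1, 'nlp': 1}
```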

JAMIA Open ◽  
2021 ◽  
Vol 4 (3) ◽  
Author(s):  
Craig H Ganoe ◽  
Weiyi Wu ◽  
Paul J Barr ◽  
William Haslett ◽  
Michelle D Dannenberg ◽  
...  

Abstract
Objectives: The objective of this study is to build and evaluate a natural language processing approach to identify medication mentions in primary care visit conversations between patients and physicians.
Materials and Methods: Eight clinicians contributed to a data set of 85 clinic visit transcripts, and 10 transcripts were randomly selected from this data set as a development set. Our approach utilizes Apache cTAKES and the Unified Medical Language System (UMLS) controlled vocabulary to generate a list of medication candidates in the transcribed text, and then applies multiple customized filters to exclude common false positives from this list while including some additional common mentions of supplements and immunizations.
Results: Sixty-five transcripts with 1121 medication mentions were randomly selected as an evaluation set. Our proposed method achieved an F-score of 85.0% for identifying the medication mentions in the test set, significantly outperforming existing medication information extraction systems for medical records, whose F-scores ranged from 42.9% to 68.9% on the same test set.
Discussion: Our medication information extraction approach for primary care visit conversations showed promising results, extracting about 27% more medication mentions from our evaluation set while eliminating many false positives in comparison to existing baseline systems. We have made our approach publicly available on the web as open-source software.
Conclusion: Integration of our annotation system with clinical recording applications has the potential to improve patients' understanding and recall of key information from their clinic visits and, in turn, to positively impact health outcomes.
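The paper's pipeline builds on Apache cTAKES (a Java system), which is not reproduced here; the Python sketch below illustrates only the post-processing idea of filtering dictionary-lookup candidates. The candidate format, filter lists, and function names are assumptions for illustration, not the paper's actual code.

```python
# A minimal sketch of the post-processing stage only: filtering a list of
# dictionary-lookup medication candidates.
FALSE_POSITIVE_TERMS = {"water", "diet", "oxygen"}           # common non-medication hits
EXTRA_ALLOWED_TERMS = {"flu shot", "vitamin d", "fish oil"}  # supplements/immunizations

def filter_candidates(candidates: list[dict]) -> list[dict]:
    """Drop common false positives; keep extra supplement/immunization mentions."""
    kept = []
    for c in candidates:
        term = c["text"].lower()
        if term in FALSE_POSITIVE_TERMS:
            continue
        # Candidates from the supplementary lexicon must be on the allow-list.
        if c.get("source") == "extra" and term not in EXTRA_ALLOWED_TERMS:
            continue
        kept.append(c)
    return kept

candidates = [
    {"text": "metformin", "start": 10, "end": 19, "source": "umls"},
    {"text": "water", "start": 40, "end": 45, "source": "umls"},
    {"text": "flu shot", "start": 60, "end": 68, "source": "extra"},
]
print([c["text"] for c in filter_candidates(candidates)])  # ['metformin', 'flu shot']
```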


2019 ◽  
Author(s):  
Auss Abbood ◽  
Alexander Ullrich ◽  
Rüdiger Busche ◽  
Stéphane Ghozzi

Abstract
According to the World Health Organization (WHO), around 60% of all outbreaks are detected using informal sources. In many public health institutes, including the WHO and the Robert Koch Institute (RKI), dedicated groups of epidemiologists sift through numerous articles and newsletters to detect relevant events. This media screening is one important part of event-based surveillance (EBS). Reading the articles, discussing their relevance, and putting key information into a database is a time-consuming process. To support EBS, but also to gain insights into what makes an article and the event it describes relevant, we developed a natural-language-processing framework for automated information extraction and relevance scoring.

First, we scraped relevant sources for EBS as done at RKI (WHO Disease Outbreak News and ProMED) and automatically extracted the articles' key data: disease, country, date, and confirmed-case count. For this, we performed named entity recognition in two steps: EpiTator, an open-source epidemiological annotation tool, suggested many different possibilities for each key datum; we then trained a naive Bayes classifier to find the single most likely one, using RKI's EBS database as labels.

Then, for relevance scoring, we defined two classes to which any article might belong: an article is relevant if it is in the EBS database and irrelevant otherwise. We compared the performance of different classifiers, using document and word embeddings. Two of the tested algorithms stood out: the multilayer perceptron performed best overall, with a precision of 0.19, recall of 0.50, specificity of 0.89, F1 of 0.28, and the highest tested index balanced accuracy of 0.46. The support-vector machine, on the other hand, had the highest recall (0.88), which can be of greater interest to epidemiologists.

Finally, we integrated these functionalities into a web application called EventEpi, where relevant sources are automatically analyzed and put into a database. The user can also provide any URL or text, which will be analyzed in the same way and added to the database. Each of these steps could be improved, in particular with larger labeled datasets and fine-tuning of the learning algorithms. The overall framework, however, already works well and can be used in production, promising improvements in EBS. The source code is publicly available at https://github.com/aauss/EventEpi.
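As a hedged illustration of the relevance-scoring step, the sketch below trains a support-vector machine on TF-IDF features, a simple stand-in for the document and word embeddings used in the paper; the articles and labels are invented for the example.

```python
# A minimal sketch of binary relevance scoring for outbreak articles.
# Labels: 1 if the article is in the EBS database (relevant), 0 otherwise.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

articles = [
    "Cholera outbreak reported in region X with 120 confirmed cases",
    "Annual report on road infrastructure spending",
    "Measles cases rising; WHO issues alert",
    "Local festival attracts record visitors",
]
labels = [1, 0, 1, 0]  # invented training labels

# class_weight="balanced" mitigates the strong class imbalance the paper reports.
model = make_pipeline(TfidfVectorizer(), SVC(class_weight="balanced"))
model.fit(articles, labels)

print(model.predict(["Ebola suspected in 3 new cases"]))  # likely [1]
```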


Semantic Web technology is not as new as most of us assume; it has evolved over the years. Linked Data is the name recently given to the Semantic Web. The Semantic Web is a continuation of Web 2.0 and is intended to replace existing technologies. It builds on natural language processing and provides solutions to many prevailing issues. Web 3.0, the current version of the Semantic Web, caters to the information needs of half of the population on earth. This paper links two important current concerns, the security of information and enforced online education due to COVID-19, with the Semantic Web. The steganography requirement for the Semantic Web is discussed in detail, since encryption alone is inadequate to provide protection. Web 2.0 issues concerning online education, and Semantic Web solutions to them, are discussed. An extensive literature survey has been conducted on the architecture of Web 3.0, the history of online education, and security architecture. Finally, the Semantic Web is here to stay, and data hiding along with encryption makes it robust.
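As a toy illustration of the "encryption plus data hiding" idea (not a scheme proposed in the paper), the sketch below XOR-obscures a secret and hides its bits in a cover text using zero-width Unicode characters; a real system would use a proper cipher and a more robust embedding.

```python
# Toy encrypt-then-hide sketch: XOR is a placeholder, not real encryption.
ZW0, ZW1 = "\u200b", "\u200c"  # zero-width space / zero-width non-joiner

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR; applying it twice with the same key recovers the data."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def hide(cover: str, secret: bytes) -> str:
    """Append the secret's bits to the cover text as invisible characters."""
    bits = "".join(f"{b:08b}" for b in secret)
    return cover + "".join(ZW0 if bit == "0" else ZW1 for bit in bits)

def reveal(stego: str) -> bytes:
    """Collect the zero-width characters and reassemble the hidden bytes."""
    bits = "".join("0" if ch == ZW0 else "1" for ch in stego if ch in (ZW0, ZW1))
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

key = b"k3y"
stego = hide("Lecture notes for week 3.", xor_bytes(b"exam on May 5", key))
print(xor_bytes(reveal(stego), key).decode())  # exam on May 5
```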


Stroke ◽  
2020 ◽  
Vol 51 (Suppl_1) ◽  
Author(s):  
Zhongyu Anna Liu ◽  
Muhammad Mamdani ◽  
Richard Aviv ◽  
Chloe Pou-Prom ◽  
Amy Yu

Introduction: Diagnostic imaging reports contain important data for stroke surveillance and clinical research, but converting a large amount of free-text data into structured data with manual chart abstraction is resource-intensive. We determined the accuracy of CHARTextract, a natural language processing (NLP) tool, to extract relevant stroke-related attributes from full reports of computed tomograms (CT), CT angiograms (CTA), and CT perfusion (CTP) performed at a tertiary stroke centre.

Methods: We manually extracted data from full reports of 1,320 consecutive CT/CTA/CTP performed between October 2017 and January 2019 in patients presenting with acute stroke. Trained chart abstractors collected data on the presence of anterior proximal occlusion, basilar occlusion, distal intracranial occlusion, established ischemia, haemorrhage, the laterality of these lesions, and ASPECT scores, all of which were used as a reference standard. Reports were then randomly split into a training set (n=921) and a validation set (n=399). We used CHARTextract to extract the same attributes by creating rule-based information extraction pipelines. The rules were human-defined, created through an iterative process in the training sample, and then validated in the validation set.

Results: The prevalence of anterior proximal occlusion was 12.3% in the dataset (n=86 left, n=72 right, and n=4 bilateral). In the training sample, CHARTextract identified this attribute with an overall accuracy of 97.3% (PPV 84.1%, NPV 99.4%, sensitivity 95.5%, specificity 97.5%). In the validation set, the overall accuracy was 95.2% (PPV 76.3%, NPV 98.5%, sensitivity 90.0%, specificity 96.0%).

Conclusions: We showed that CHARTextract can identify the presence of anterior proximal vessel occlusion with high accuracy, suggesting that NLP can be used to automate the process of data collection for stroke research. We will present the accuracy of CHARTextract for the remaining neurological attributes at ISC 2020.
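CHARTextract's actual rule language is not shown in the abstract; the sketch below is a generic regex-based stand-in for the kind of human-defined, iteratively tuned extraction rules described, with invented patterns, attribute names, and report text.

```python
# A minimal sketch of a rule-based extraction pipeline over radiology reports
# (not CHARTextract's actual API).
import re

RULES = {
    "anterior_proximal_occlusion": re.compile(
        r"\b(left|right)?\s*(m1|proximal mca|ica terminus)\b.*\bocclusion\b", re.I),
    "hemorrhage": re.compile(r"\b(hemorrhage|haemorrhage)\b", re.I),
}
NEGATION = re.compile(r"\bno\b", re.I)

def extract_attributes(report: str) -> dict[str, bool]:
    """Apply each rule sentence by sentence, skipping negated sentences."""
    found = {name: False for name in RULES}
    for sentence in re.split(r"(?<=[.;])\s+", report):
        if NEGATION.search(sentence):
            continue  # crude negation handling; real rules are iteratively refined
        for name, pattern in RULES.items():
            if pattern.search(sentence):
                found[name] = True
    return found

report = "Left M1 occlusion. No evidence of hemorrhage."
print(extract_attributes(report))
# {'anterior_proximal_occlusion': True, 'hemorrhage': False}
```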


Author(s):  
Sumathi S. ◽  
Rajkumar S. ◽  
Indumathi S.

Lease abstraction is the process of extracting key data from a lease document. A lease document for a property contains key business, financial, and legal information about that property. A lease abstract report contains details concerning the property location and basic lease terms, payment schedules, key events, terms and conditions, car parking arrangements, and landlord and tenant obligations. Abstracting a real estate contract into electronic form provides easy access to key data, replacing the tedious process of reading the entire contents of the contract every time. Natural language processing can be used for information extraction and abstraction of knowledge from lease documents.
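As an illustration of the extraction idea, the sketch below pulls a few lease fields with regular expressions; the field names, patterns, and sample clause are invented for the example, and a production abstraction system would need far more robust NLP.

```python
# A minimal sketch of rule-based field extraction from lease text.
import re

FIELD_PATTERNS = {
    "monthly_rent": re.compile(r"monthly rent of \$([\d,]+(?:\.\d{2})?)", re.I),
    "lease_start": re.compile(r"commencing on (\w+ \d{1,2}, \d{4})", re.I),
    "parking_spaces": re.compile(r"(\d+) (?:car )?parking spaces?", re.I),
}

def abstract_lease(text: str) -> dict[str, str]:
    """Return the first match for each field; missing fields are omitted."""
    out = {}
    for field, pattern in FIELD_PATTERNS.items():
        m = pattern.search(text)
        if m:
            out[field] = m.group(1)
    return out

lease = ("This lease, commencing on January 1, 2024, provides a monthly rent "
         "of $2,500.00 and 2 parking spaces for the tenant.")
print(abstract_lease(lease))
# {'monthly_rent': '2,500.00', 'lease_start': 'January 1, 2024', 'parking_spaces': '2'}
```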


Author(s):  
Ayse Cufoglu ◽  
Mahi Lohi ◽  
Colin Everiss

Personalization is the adaptation of services to fit the user's interests, characteristics, and needs. The key to effective personalization is user profiling. Apart from traditional collaborative and content-based approaches, a number of classification and clustering algorithms have been used to classify user-related information to create user profiles. However, they have not been able to achieve accurate user profiles. In this paper, we present a new clustering algorithm, Multi-Dimensional Clustering (MDC), for user profiling. MDC is a version of the Instance-Based Learner (IBL) algorithm that assigns weights to feature values and uses these weights for clustering. Three feature-weighting methods are proposed for MDC, and all three have been tested and evaluated. Simulations were conducted using two user profile datasets: a training set of 10,000 instances and a test set of 1,000 instances. These datasets reflect each user's personal information, preferences, and interests. Additional simulations and comparisons with existing weighted and non-weighted instance-based algorithms were carried out to demonstrate the performance of the proposed algorithm. Experimental results on the user profile datasets show that the proposed algorithm achieves better clustering accuracy than the other algorithms. This work is based on the doctoral thesis of the corresponding author.
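The MDC algorithm itself is not reproduced here; the sketch below illustrates only its core idea of weighting feature-value matches in instance-based comparison, with invented weights, profiles, and function names.

```python
# A minimal sketch of weighted instance-based matching over nominal
# profile features (not the authors' MDC implementation).
def weighted_similarity(a: dict, b: dict, weights: dict) -> float:
    """Sum the weights of features whose values match between two profiles."""
    return sum(weights.get(f, 1.0) for f in a if a[f] == b.get(f))

def assign_cluster(profile: dict, centroids: list[dict], weights: dict) -> int:
    """Assign the profile to the most similar cluster representative."""
    scores = [weighted_similarity(profile, c, weights) for c in centroids]
    return max(range(len(centroids)), key=scores.__getitem__)

weights = {"interest": 2.0, "age_group": 1.0, "language": 0.5}
centroids = [
    {"interest": "sports", "age_group": "18-25", "language": "en"},
    {"interest": "music", "age_group": "26-35", "language": "fr"},
]
user = {"interest": "music", "age_group": "18-25", "language": "fr"}
print(assign_cluster(user, centroids, weights))  # 1: 'interest' outweighs 'age_group'
```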

