Design and Application of a Multi-Variant Expert System Using Apache Hadoop Framework

2018
Vol 10 (11)
pp. 4280
Author(s):
Muhammad Ibrahim
Imran Bajwa

Movie recommender expert systems are valuable tools for providing recommendation services to users. However, existing movie recommenders are technically lacking in two areas: first, they give only general recommendations; second, they use either quantitative data (likes, ratings, etc.) or qualitative data (polarity score, sentiment score, etc.) to produce recommendations. A novel approach is presented in this paper that not only provides topic-based (fiction, comedy, horror, etc.) movie recommendations but also uses both quantitative and qualitative data to produce a true and relevant recommendation of a movie for a topic. The approach relies on SentiWordNet and tf-idf similarity measures to calculate a polarity score from user reviews, which represents the qualitative aspect of how well a movie is liked. Similarly, three quantitative variables (likes, ratings, and votes) are used to compute a final recommendation score. A fuzzy logic module decides the recommendation category based on this final score. The proposed approach uses the big data framework Hadoop to handle data diversity and heterogeneity efficiently. An Android application collaborates with a web bot to consume the recommendation services and show topic-based recommendations to users.
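
As a rough illustration of the scoring pipeline this abstract describes, the following Python sketch combines a SentiWordNet-based review polarity with normalized quantitative signals and buckets the result into a recommendation category. The weighting scheme, thresholds, and helper names are illustrative assumptions, not the authors' implementation (which additionally uses tf-idf weighting and runs on Hadoop).

```python
# Sketch: combine qualitative (review polarity) and quantitative signals
# into one recommendation score, then bucket it into a category.
# Weights and cut-offs below are illustrative assumptions.
# Requires the NLTK "punkt" and "sentiwordnet"/"wordnet" data packages.
from nltk import word_tokenize
from nltk.corpus import sentiwordnet as swn

def review_polarity(review: str) -> float:
    """Average (pos - neg) SentiWordNet score over tokens with synsets."""
    scores = []
    for token in word_tokenize(review.lower()):
        synsets = list(swn.senti_synsets(token))
        if synsets:
            s = synsets[0]                  # first sense, a crude choice
            scores.append(s.pos_score() - s.neg_score())
    return sum(scores) / len(scores) if scores else 0.0

def recommendation_score(polarity, likes, ratings, votes):
    # Assumed split: half qualitative, half quantitative (all in [0, 1]).
    quantitative = (likes + ratings + votes) / 3.0
    return 0.5 * (polarity + 1) / 2 + 0.5 * quantitative

def fuzzy_category(score):
    # Crisp stand-in for the paper's fuzzy-logic module.
    if score >= 0.75: return "highly recommended"
    if score >= 0.50: return "recommended"
    if score >= 0.25: return "neutral"
    return "not recommended"

print(fuzzy_category(recommendation_score(
    review_polarity("a gripping, beautifully shot thriller"),
    likes=0.8, ratings=0.9, votes=0.7)))
```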

2020
pp. 1-1
Author(s):
Ruixin Guo
Feng Zhang
Lizhe Wang
Wusheng Zhang
Xinya Lei
...  

2018
Vol 29 (1)
pp. 653-663
Author(s):
Ritu Meena
Kamal K. Bharadwaj

Many recommender systems make suggestions for items that are consumed by groups rather than by individual users. Much work has been done on group recommender systems (GRSs) with full rankings, but partial ranking (PR), where items are only partially ranked, remains a challenge. The objective of this work is to propose a rank aggregation technique for effectively handling the PR problem. Most studies have focused on PR without ties (PRWOT); in real applications, however, rankings may have ties, where some items are placed in the same position, and the partial rankings to be aggregated may not be permutations. To handle the PR problem in GRSs for both PRWOT and PR with ties (PRWT), we propose a novel approach based on a genetic algorithm (GA), using the Spearman footrule distance as the fitness function for PRWOT and the Kendall tau distance with bucket order for PRWT. Experimental results clearly demonstrate that our proposed GRS based on a GA for PRWOT (GRS-GA-PRWOT) and PRWT (GRS-GA-PRWT) outperforms well-known baseline GRS techniques.
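
To make the fitness function concrete, here is a minimal Python sketch of the Spearman footrule distance for partial rankings without ties, negated so a genetic algorithm can maximize it. Placing all of a member's unranked items at a single shared position is an assumption made for illustration; the paper's exact handling may differ.

```python
# Sketch: Spearman footrule distance as a GA fitness for aggregating
# partial rankings without ties (PRWOT).
def footrule(candidate, partial):
    """Sum of |position difference| between a candidate full ranking
    and one member's partial ranking; unranked items share one slot."""
    pos_c = {item: i for i, item in enumerate(candidate)}
    pos_p = {item: i for i, item in enumerate(partial)}
    unranked = len(partial)          # shared position for unranked items
    return sum(abs(pos_c[item] - pos_p.get(item, unranked))
               for item in pos_c)

def fitness(candidate, group_rankings):
    # Lower total distance to all members' rankings is better, so a GA
    # maximizing fitness uses the negated sum.
    return -sum(footrule(candidate, p) for p in group_rankings)

group = [["a", "c"], ["b", "a", "d"], ["c", "b"]]   # partial rankings
print(fitness(["a", "b", "c", "d"], group))
```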


Author(s):  
Navin Tatyaba Gopal
Anish Raj Khobragade

Knowledge graphs (KGs) capture structured information and relations among a set of entities and items, and are therefore an attractive source of side information for recommender systems. However, existing methods in this area depend on manually designed features and thus do not permit end-to-end training. This article proposes Knowledge Graph with Label Smoothness (KG-LS) to provide better suggestions for recommender systems. Our method computes user-specific entity representations by first applying a trainable function that identifies important KG relationships for a given user; in this manner, we transform the KG into a user-specific weighted graph and then apply a graph neural network to compute personalized entity embeddings. To provide a better inductive bias, we use label smoothness, which assumes that adjacent items in the KG are likely to have similar user relevance labels or scores. Label smoothness provides regularization over the edge weights, and we show that it is equivalent to a label propagation scheme on the graph. We also develop an efficient implementation that exhibits strong scalability with respect to the size of the knowledge graph. Experiments on four datasets show that our method outperforms state-of-the-art baselines, and it also achieves strong performance in cold-start scenarios where user-entity interactions are sparse.
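
The label propagation view of label smoothness can be shown in a few lines. The sketch below is a minimal NumPy illustration: it propagates relevance labels over a toy weighted item graph while clamping observed labels. In KG-LS the edge weights would come from the learned user-specific relation-scoring function rather than being fixed as they are here.

```python
# Sketch: label propagation on a small user-specific weighted item graph,
# the mechanism that label smoothness regularization is equivalent to.
# Adjacency, labels, and mask are toy values.
import numpy as np

def propagate_labels(W, labels, mask, iters=20):
    """Repeatedly replace each label with the weighted average of its
    neighbors' labels; observed (masked) labels stay clamped."""
    d = W.sum(axis=1, keepdims=True)
    P = W / np.where(d == 0, 1, d)          # row-normalized transitions
    y = labels.copy()
    for _ in range(iters):
        y = P @ y
        y[mask] = labels[mask]              # clamp observed labels
    return y

W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)   # toy item-item adjacency
labels = np.array([1.0, 0.0, 0.0, 0.0])     # item 0 relevant to the user
mask = np.array([True, False, False, True]) # items 0 and 3 observed
print(propagate_labels(W, labels, mask))
```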


Author(s):  
Laura Macia

In this article I discuss cluster analysis as an exploratory tool to support the identification of associations within qualitative data. While not appropriate for all qualitative projects, cluster analysis can be particularly helpful in identifying patterns where numerous cases are studied. I use as illustration a research project on Latino grievances to offer a detailed explanation of the main steps in cluster analysis, providing specific considerations for its use with qualitative data. I specifically describe the issues of data transformation, the choice of clustering methods and similarity measures, the identification of a cluster solution, and the interpretation of the data in a qualitative context.
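
As a concrete starting point for the workflow described above, the following Python sketch clusters cases that have been transformed into binary presence/absence vectors of qualitative codes, using Jaccard distance with average-linkage hierarchical clustering. The toy coding and the two-cluster cut are illustrative assumptions, not the article's data or its specific method choices.

```python
# Sketch: hierarchical clustering of qualitatively coded cases.
# Each case is a binary vector of codes (1 = theme present), one
# common way to transform qualitative data for clustering.
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

cases = [
    [1, 0, 1, 0, 1],   # e.g., grievance themes coded per interview
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 1],
]
# Jaccard distance suits binary presence/absence coding.
d = pdist(cases, metric="jaccard")
Z = linkage(d, method="average")
print(fcluster(Z, t=2, criterion="maxclust"))   # two-cluster solution
```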


Author(s):  
Flavius Frasincar
Wouter IJntema
Frank Goossen
Frederik Hogenboom

News items play an increasingly important role in current business decision processes. Due to the large number of news items published every day, it is difficult to find the items of one's interest. One solution to this problem is to employ recommender systems. Traditionally, these recommenders use term extraction methods like TF-IDF combined with the cosine similarity measure. In this chapter, we explore semantic approaches for recommending news items by employing several semantic similarity measures. We have used existing semantic similarity measures as well as proposed new ones. Both traditional and semantic recommender approaches, some new, have been implemented in Athena, an extension of the Hermes news personalization framework. Based on the performed evaluation, we conclude that semantic recommender systems in general outperform traditional recommender systems with respect to accuracy, precision, and recall, and that the new semantic recommenders achieve a better F-measure than existing semantic recommenders.
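
The traditional baseline mentioned above, TF-IDF term extraction combined with cosine similarity, fits in a short Python sketch; the toy headlines are invented for illustration.

```python
# Sketch: the TF-IDF + cosine-similarity recommender that the chapter
# uses as the traditional baseline for its semantic approaches.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

read_items = ["Fed raises interest rates amid inflation fears"]
candidates = [
    "Central bank signals further rate hikes",
    "Local team wins championship after penalty shootout",
]
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(read_items + candidates)
# Rank unread items by similarity to the user's reading history.
sims = cosine_similarity(X[0], X[1:]).ravel()
for title, s in sorted(zip(candidates, sims), key=lambda p: -p[1]):
    print(f"{s:.2f}  {title}")
```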


2019
Vol 28 (05)
pp. 1950019
Author(s):
Nicolás Torres
Marcelo Mendoza

Clustering-based recommender systems bound the search for similar users to small user clusters, providing fast recommendations in large-scale datasets. Groups can then naturally be distributed across data partitions, scaling up the number of users the recommender system can handle. Unfortunately, as the number of users and items included in a cluster solution increases, the precision of a clustering-based recommender system decreases. We present a novel approach that introduces a cluster-based distance function for neighborhood computation. In our approach, clusters generated from the training data provide the basis for neighborhood selection. Then, to expand the search for relevant users, we use a novel measure that exploits the global cluster structure to infer distances to users outside the cluster. Empirical studies on five widely known benchmark datasets show that our proposal is very competitive in terms of precision, recall, and NDCG. The strongest point of our method, however, is scalability: it reaches speedups of 20× in a sequential computing evaluation framework and up to 100× on a parallel architecture. These results show that an efficient implementation of our cluster-based CF method can handle very large datasets while providing good precision, avoiding the high computational costs involved in applying more sophisticated techniques.
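
Here is a minimal Python sketch of a cluster-based distance of the kind described: distances within a user's cluster are computed directly, while distances to users in other clusters are approximated through the cluster centroids, letting the neighborhood search expand beyond the cluster cheaply. The centroid-routing rule is an illustrative assumption, not the paper's exact measure.

```python
# Sketch: cluster-based distance for neighborhood computation.
# Same-cluster pairs get exact distances; cross-cluster pairs are
# approximated via the two centroids, avoiding full pairwise search.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
users = rng.random((100, 20))                 # toy user-feature matrix
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(users)

def cluster_distance(i, j):
    ci, cj = km.labels_[i], km.labels_[j]
    if ci == cj:                              # same cluster: exact distance
        return np.linalg.norm(users[i] - users[j])
    # Different clusters: route the distance through the centroids.
    return (np.linalg.norm(users[i] - km.cluster_centers_[ci])
            + np.linalg.norm(km.cluster_centers_[ci] - km.cluster_centers_[cj])
            + np.linalg.norm(km.cluster_centers_[cj] - users[j]))

print(cluster_distance(0, 1), cluster_distance(0, 50))
```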


2018
Vol 7 (4.5)
pp. 485
Author(s):
Samson Fadiya
Arif Sari

The adoption of Web 2.0 technologies, the Internet of Things, and similar developments by individuals and organizations has led to an explosion of data. As it stands, existing relational database management systems (RDBMSs) are incapable of handling this deluge of data. The term Big Data was coined to describe these vast, fast, and complex datasets that regular RDBMSs cannot handle. Special tools and frameworks were developed to process, manage, and store such big data. These tools can operate in distributed, industry-standard environments, maintaining efficiency and effectiveness at the business level. Apache Hadoop is an example of such a framework. This report discusses big data, its origins, the opportunities and challenges it presents, big data analytics, and the application of big data using existing tools and frameworks. It also discusses Apache Hadoop as a big data framework and provides a basic overview of the technology from technological and business perspectives.
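
As a minimal illustration of the map-reduce paradigm Hadoop implements, here is the canonical word count written in Python for Hadoop Streaming; packaging both phases in one file with a map/reduce switch is just one convenient layout.

```python
# Sketch: word count for Hadoop Streaming. The mapper emits
# "word<TAB>1" pairs; Hadoop sorts them by key before the reducer
# sums the counts per word.
import sys
from itertools import groupby

def mapper(lines):
    for line in lines:
        for word in line.split():
            print(f"{word}\t1")

def reducer(lines):
    # Streaming delivers mapper output sorted by key, so groupby works.
    pairs = (line.rstrip("\n").split("\t") for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{word}\t{sum(int(count) for _, count in group)}")

if __name__ == "__main__":
    (mapper if sys.argv[1:] == ["map"] else reducer)(sys.stdin)
```

A typical invocation would be along the lines of `hadoop jar hadoop-streaming.jar -mapper "python3 wc.py map" -reducer "python3 wc.py reduce" -input <hdfs-in> -output <hdfs-out>`, with jar and HDFS paths depending on the installation.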


2020
pp. 001857872091834
Author(s):
Diana Altshuler
Kenny Yu
John Papadopoulos
Arash Dabestani

Purpose: The intent of this article is to evaluate a novel approach, using rapid cycle analytics and real-world evidence, to optimize and improve the medication evaluation process, support formulary decision making, and reduce the time required of clinicians. Summary: The Pharmacy and Therapeutics (P&T) Committee within each health system is responsible for evaluating medication requests for formulary addition. Members of the pharmacy staff prepare the drug monograph or a medication use evaluation (MUE) and allocate precious clinical resources to review patient charts to assess efficacy and value. We explored a novel approach to evaluate the value of admitting intravenous acetaminophen (IV APAP) to our formulary. This new methodology, called rapid cycle analytics, can assist hospitals in meeting or exceeding the minimum criteria of formulary maintenance as defined by the Joint Commission standards. In this study, we assessed the effectiveness of IV APAP in total hip arthroplasty (THA) and total knee arthroplasty (TKA) procedures, examining its correlation with same-stay opioid utilization, average length of inpatient stay, and post-anesthesia care unit (PACU) time. Conclusion: We were able to explore and improve our organization's approach to evaluating medications by partnering with an external analytics expert to help organize and normalize our data in a more robust yet time-efficient manner. Additionally, we were able to use a significantly larger external dataset as a point of reference. Performing this detailed analytical exercise for thousands of encounters internally, with a data warehouse of over 130 million patients as a point of reference, in a short time has improved the depth of our assessment while freeing valuable clinical resources from MUEs for more direct patient care. This real-world, data-rich analytics model is the necessary foundation for using artificial or augmented intelligence (AI) to make real-time formulary and drug selection decisions.
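
For a feel of the underlying computation, here is a toy pandas sketch of the kind of stratified comparison such an MUE runs over encounter data; the column names and values are assumptions for illustration, not the study's data.

```python
# Sketch: compare encounters with and without IV APAP on the three
# outcomes named above, stratified by procedure. Toy data only.
import pandas as pd

encounters = pd.DataFrame({
    "procedure":    ["THA", "THA", "TKA", "TKA"],
    "iv_apap":      [True, False, True, False],
    "opioid_mme":   [45.0, 60.0, 50.0, 52.0],   # same-stay opioid use
    "los_days":     [2.1, 2.4, 2.9, 3.0],        # length of stay
    "pacu_minutes": [95, 110, 120, 118],         # PACU time
})
summary = (encounters
           .groupby(["procedure", "iv_apap"])
           [["opioid_mme", "los_days", "pacu_minutes"]]
           .mean())
print(summary)
```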


Author(s):  
Bharat Tidke
Rupa Mehta
Dipti Rana
Hullash Jangir

Social media data (SMD) is processed with statistical and analytical technologies to obtain information for various decisions. SMD is vast and evolutionary in nature, which makes traditional data warehouses ill suited. This research aims to propose and implement a novel framework that analyzes tweet data from an online social networking site (OSN; i.e., Twitter). The authors fetch streaming tweets from the Twitter API using Apache Flume to detect clusters of users with similar sentiment. The proposed approach utilizes a scalable and fault-tolerant system (Hadoop) that harnesses HDFS for data storage and the map-reduce paradigm for data processing. Apache Hive is used on top of Hadoop to query the data. Experiments test the scalability of the proposed framework on various data sizes. The authors' goal is to handle big social data effectively using cost-effective tools for fetching and querying unstructured data, and algorithms for analyzing scalable, uninterrupted data streams with finite memory and resources.
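
A minimal sketch of one stage of such a pipeline: a Hadoop-Streaming-style mapper, written in Python, that scores each incoming tweet's sentiment so that a Hive table over the output directory can aggregate per user. The tiny lexicon and the `user<TAB>text` input format are assumptions for illustration, not the authors' scorer.

```python
# Sketch: streaming mapper that emits "user<TAB>sentiment_score" for
# each tweet, ready for storage in HDFS and aggregation in Hive.
import sys

LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "awful": -1}

for line in sys.stdin:
    user, _, text = line.rstrip("\n").partition("\t")
    score = sum(LEXICON.get(word, 0) for word in text.lower().split())
    print(f"{user}\t{score}")
```

With the scored records in HDFS, a Hive external table over that directory lets a simple GROUP BY on user with an average of the score feed the sentiment-clustering step.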


2014
Vol 22 (4)
pp. 358-370
Author(s):
John Haggerty
Sheryllynne Haggerty
Mark Taylor

Purpose – The purpose of this paper is to propose a novel approach that automates the visualisation of both quantitative data (the network) and qualitative data (the content) within emails to aid the triage of evidence during a forensics investigation. Email remains a key source of evidence during a digital investigation, and a forensics examiner may be required to triage and analyse large email data sets for evidence. Current practice utilises tools and techniques that require a manual trawl through such data, which is a time-consuming process. Design/methodology/approach – This paper applies the methodology to the Enron email corpus, and in particular one key suspect, to demonstrate the applicability of the approach. Resulting visualisations of network narratives are discussed to show how network narratives may be used to triage large evidence data sets. Findings – Using the network narrative approach enables a forensics examiner to quickly identify relevant evidence within large email data sets. Within the case study presented in this paper, the results identify key witnesses, other actors of interest to the investigation and potential sources of further evidence. Practical implications – The implications are for digital forensics examiners or for security investigations that involve email data. The approach posited in this paper demonstrates the triage and visualisation of email network narratives to aid an investigation and identify potential sources of electronic evidence. Originality/value – There are a number of network visualisation applications in use. However, none of these enable the combined visualisation of quantitative and qualitative data to provide a view of what the actors are discussing and how this shapes the network in email data sets.
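
A toy Python sketch of the core idea, combining the quantitative view (who emails whom, and how often) with a qualitative label (frequent content terms per edge) on an email graph; the messages and the term-picking rule are invented for illustration, and the paper's actual network-narrative visualisation is richer.

```python
# Sketch: build a directed email graph whose edges carry both a
# message count (quantitative) and frequent terms (qualitative).
from collections import Counter

import networkx as nx

messages = [
    ("alice", "bob",   "quarterly trading position report"),
    ("alice", "bob",   "trading limits exceeded again"),
    ("alice", "carol", "lunch on friday?"),
]
G = nx.DiGraph()
for sender, recipient, body in messages:
    if G.has_edge(sender, recipient):
        G[sender][recipient]["count"] += 1
        G[sender][recipient]["terms"].update(body.split())
    else:
        G.add_edge(sender, recipient, count=1, terms=Counter(body.split()))

# Each edge now summarizes how much and about what two actors talk.
for u, v, data in G.edges(data=True):
    top = [t for t, _ in data["terms"].most_common(2)]
    print(f"{u} -> {v}: {data['count']} messages, terms: {top}")
```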

