Design and Application of a Multi-Variant Expert System Using Apache Hadoop Framework

2018
Vol 10 (11)
pp. 4280
Author(s):
Muhammad Ibrahim
Imran Bajwa

Movie recommender expert systems are valuable tools for providing recommendation services to users. However, existing movie recommenders are technically lacking in two areas: first, they give only general recommendations; second, they use either quantitative data (likes, ratings, etc.) or qualitative data (polarity score, sentiment score, etc.) to produce recommendations. A novel approach is presented in this paper that not only provides topic-based (fiction, comedy, horror, etc.) movie recommendations but also uses both quantitative and qualitative data to produce a true and relevant recommendation of a movie for a topic. The approach relies on SentiWordNet and tf-idf similarity measures to calculate a polarity score from user reviews, which represents the qualitative aspect of how well a movie is liked. Similarly, three quantitative variables (likes, ratings, and votes) are used to compute a final recommendation score. A fuzzy logic module decides the recommendation category based on this final score. The proposed approach uses the big data framework Hadoop to handle data diversity and heterogeneity efficiently. An Android application collaborates with a web bot to consume the recommendation services and show topic-based recommendations to users.
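
As a rough illustration of the scoring pipeline this abstract describes, the following Python sketch combines a SentiWordNet-based review polarity with normalized quantitative signals and buckets the result into a recommendation category. The weighting scheme, thresholds, and helper names are illustrative assumptions, not the authors' implementation (which additionally uses tf-idf weighting and runs on Hadoop).

```python
# Sketch: combine qualitative (review polarity) and quantitative signals
# into one recommendation score, then bucket it into a category.
# Weights and cut-offs below are illustrative assumptions.
# Requires the NLTK "punkt" and "sentiwordnet"/"wordnet" data packages.
from nltk import word_tokenize
from nltk.corpus import sentiwordnet as swn

def review_polarity(review: str) -> float:
    """Average (pos - neg) SentiWordNet score over tokens with synsets."""
    scores = []
    for token in word_tokenize(review.lower()):
        synsets = list(swn.senti_synsets(token))
        if synsets:
            s = synsets[0]                  # first sense, a crude choice
            scores.append(s.pos_score() - s.neg_score())
    return sum(scores) / len(scores) if scores else 0.0

def recommendation_score(polarity, likes, ratings, votes):
    # Assumed split: half qualitative, half quantitative (all in [0, 1]).
    quantitative = (likes + ratings + votes) / 3.0
    return 0.5 * (polarity + 1) / 2 + 0.5 * quantitative

def fuzzy_category(score):
    # Crisp stand-in for the paper's fuzzy-logic module.
    if score >= 0.75: return "highly recommended"
    if score >= 0.50: return "recommended"
    if score >= 0.25: return "neutral"
    return "not recommended"

print(fuzzy_category(recommendation_score(
    review_polarity("a gripping, beautifully shot thriller"),
    likes=0.8, ratings=0.9, votes=0.7)))
```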

2020
pp. 1-1
Author(s):
Ruixin Guo
Feng Zhang
Lizhe Wang
Wusheng Zhang
Xinya Lei
...  

2018
Vol 29 (1)
pp. 653-663
Author(s):
Ritu Meena
Kamal K. Bharadwaj

Many recommender systems make suggestions for items that are consumed by groups rather than by individual users. Much work has been done on group recommender systems (GRSs) with full rankings, but partial ranking (PR), where items are only partially ranked, remains a challenge. The objective of this work is to propose a rank aggregation technique for effectively handling the PR problem. Most studies have focused on PR without ties (PRWOT); in real applications, however, rankings may have ties, where some items are placed in the same position, and the partial rankings to be aggregated may not be permutations. To handle the PR problem in GRSs for both PRWOT and PR with ties (PRWT), we propose a novel approach based on a genetic algorithm (GA), using the Spearman footrule distance as the fitness function for PRWOT and the Kendall tau distance with bucket order for PRWT. Experimental results clearly demonstrate that our proposed GRS based on a GA for PRWOT (GRS-GA-PRWOT) and PRWT (GRS-GA-PRWT) outperforms well-known baseline GRS techniques.
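
To make the fitness function concrete, here is a minimal Python sketch of the Spearman footrule distance for partial rankings without ties, negated so a genetic algorithm can maximize it. Placing all of a member's unranked items at a single shared position is an assumption made for illustration; the paper's exact handling may differ.

```python
# Sketch: Spearman footrule distance as a GA fitness for aggregating
# partial rankings without ties (PRWOT).
def footrule(candidate, partial):
    """Sum of |position difference| between a candidate full ranking
    and one member's partial ranking; unranked items share one slot."""
    pos_c = {item: i for i, item in enumerate(candidate)}
    pos_p = {item: i for i, item in enumerate(partial)}
    unranked = len(partial)          # shared position for unranked items
    return sum(abs(pos_c[item] - pos_p.get(item, unranked))
               for item in pos_c)

def fitness(candidate, group_rankings):
    # Lower total distance to all members' rankings is better, so a GA
    # maximizing fitness uses the negated sum.
    return -sum(footrule(candidate, p) for p in group_rankings)

group = [["a", "c"], ["b", "a", "d"], ["c", "b"]]   # partial rankings
print(fitness(["a", "b", "c", "d"], group))
```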


Author(s):  
Navin Tatyaba Gopal
Anish Raj Khobragade

Knowledge graphs (KGs) capture structured information and relations among a set of entities and items, and are therefore an attractive source of side information for recommender systems. However, existing methods in this area depend on manually designed features and thus do not permit end-to-end training. This article proposes Knowledge Graph with Label Smoothness (KG-LS) to provide better suggestions for recommender systems. Our method computes user-specific entity representations by first applying a trainable function that identifies important KG relationships for a given user; in this manner, we transform the KG into a user-specific weighted graph and then apply a graph neural network to compute personalized entity embeddings. To provide a better inductive bias, we use label smoothness, which assumes that adjacent items in the KG are likely to have similar user relevance labels or scores. Label smoothness provides regularization over the edge weights, and we show that it is equivalent to a label propagation scheme on the graph. We also develop an efficient implementation that exhibits strong scalability with respect to the size of the knowledge graph. Experiments on four datasets show that our method outperforms state-of-the-art baselines, and it also achieves strong performance in cold-start scenarios where user-entity interactions are sparse.
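
The label propagation view of label smoothness can be shown in a few lines. The sketch below is a minimal NumPy illustration: it propagates relevance labels over a toy weighted item graph while clamping observed labels. In KG-LS the edge weights would come from the learned user-specific relation-scoring function rather than being fixed as they are here.

```python
# Sketch: label propagation on a small user-specific weighted item graph,
# the mechanism that label smoothness regularization is equivalent to.
# Adjacency, labels, and mask are toy values.
import numpy as np

def propagate_labels(W, labels, mask, iters=20):
    """Repeatedly replace each label with the weighted average of its
    neighbors' labels; observed (masked) labels stay clamped."""
    d = W.sum(axis=1, keepdims=True)
    P = W / np.where(d == 0, 1, d)          # row-normalized transitions
    y = labels.copy()
    for _ in range(iters):
        y = P @ y
        y[mask] = labels[mask]              # clamp observed labels
    return y

W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)   # toy item-item adjacency
labels = np.array([1.0, 0.0, 0.0, 0.0])     # item 0 relevant to the user
mask = np.array([True, False, False, True]) # items 0 and 3 observed
print(propagate_labels(W, labels, mask))
```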


Author(s):  
Laura Macia

In this article I discuss cluster analysis as an exploratory tool to support the identification of associations within qualitative data. While not appropriate for all qualitative projects, cluster analysis can be particularly helpful in identifying patterns where numerous cases are studied. I use as illustration a research project on Latino grievances to offer a detailed explanation of the main steps in cluster analysis, providing specific considerations for its use with qualitative data. I specifically describe the issues of data transformation, the choice of clustering methods and similarity measures, the identification of a cluster solution, and the interpretation of the data in a qualitative context.
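
As a concrete starting point for the workflow described above, the following Python sketch clusters cases that have been transformed into binary presence/absence vectors of qualitative codes, using Jaccard distance with average-linkage hierarchical clustering. The toy coding and the two-cluster cut are illustrative assumptions, not the article's data or its specific method choices.

```python
# Sketch: hierarchical clustering of qualitatively coded cases.
# Each case is a binary vector of codes (1 = theme present), one
# common way to transform qualitative data for clustering.
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

cases = [
    [1, 0, 1, 0, 1],   # e.g., grievance themes coded per interview
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 1],
]
# Jaccard distance suits binary presence/absence coding.
d = pdist(cases, metric="jaccard")
Z = linkage(d, method="average")
print(fcluster(Z, t=2, criterion="maxclust"))   # two-cluster solution
```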


Author(s):  
Flavius Frasincar
Wouter IJntema
Frank Goossen
Frederik Hogenboom

News items play an increasingly important role in current business decision processes. Due to the large number of news items published every day, it is difficult to find the items of one's interest. One solution to this problem is to employ recommender systems. Traditionally, these recommenders use term extraction methods like TF-IDF combined with the cosine similarity measure. In this chapter, we explore semantic approaches for recommending news items by employing several semantic similarity measures. We have used existing semantic similarity measures as well as proposed new ones. Both traditional and semantic recommender approaches, some new, have been implemented in Athena, an extension of the Hermes news personalization framework. Based on the performed evaluation, we conclude that semantic recommender systems in general outperform traditional recommender systems with respect to accuracy, precision, and recall, and that the new semantic recommenders achieve a better F-measure than existing semantic recommenders.
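
The traditional baseline mentioned above, TF-IDF term extraction combined with cosine similarity, fits in a short Python sketch; the toy headlines are invented for illustration.

```python
# Sketch: the TF-IDF + cosine-similarity recommender that the chapter
# uses as the traditional baseline for its semantic approaches.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

read_items = ["Fed raises interest rates amid inflation fears"]
candidates = [
    "Central bank signals further rate hikes",
    "Local team wins championship after penalty shootout",
]
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(read_items + candidates)
# Rank unread items by similarity to the user's reading history.
sims = cosine_similarity(X[0], X[1:]).ravel()
for title, s in sorted(zip(candidates, sims), key=lambda p: -p[1]):
    print(f"{s:.2f}  {title}")
```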


2019
Vol 28 (05)
pp. 1950019
Author(s):
Nicolás Torres
Marcelo Mendoza

Clustering-based recommender systems bound the search for similar users to small user clusters, providing fast recommendations in large-scale datasets. Groups can then naturally be distributed across data partitions, scaling up the number of users the recommender system can handle. Unfortunately, as the number of users and items included in a cluster solution increases, the precision of a clustering-based recommender system decreases. We present a novel approach that introduces a cluster-based distance function for neighborhood computation. In our approach, clusters generated from the training data provide the basis for neighborhood selection. Then, to expand the search for relevant users, we use a novel measure that exploits the global cluster structure to infer distances to users outside the cluster. Empirical studies on five widely known benchmark datasets show that our proposal is very competitive in terms of precision, recall, and NDCG. The strongest point of our method, however, is scalability: it reaches speedups of 20× in a sequential computing evaluation framework and up to 100× on a parallel architecture. These results show that an efficient implementation of our cluster-based CF method can handle very large datasets while providing good precision, avoiding the high computational costs involved in applying more sophisticated techniques.
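
Here is a minimal Python sketch of a cluster-based distance of the kind described: distances within a user's cluster are computed directly, while distances to users in other clusters are approximated through the cluster centroids, letting the neighborhood search expand beyond the cluster cheaply. The centroid-routing rule is an illustrative assumption, not the paper's exact measure.

```python
# Sketch: cluster-based distance for neighborhood computation.
# Same-cluster pairs get exact distances; cross-cluster pairs are
# approximated via the two centroids, avoiding full pairwise search.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
users = rng.random((100, 20))                 # toy user-feature matrix
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(users)

def cluster_distance(i, j):
    ci, cj = km.labels_[i], km.labels_[j]
    if ci == cj:                              # same cluster: exact distance
        return np.linalg.norm(users[i] - users[j])
    # Different clusters: route the distance through the centroids.
    return (np.linalg.norm(users[i] - km.cluster_centers_[ci])
            + np.linalg.norm(km.cluster_centers_[ci] - km.cluster_centers_[cj])
            + np.linalg.norm(km.cluster_centers_[cj] - users[j]))

print(cluster_distance(0, 1), cluster_distance(0, 50))
```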


2018
Vol 7 (4.5)
pp. 485
Author(s):
Samson Fadiya
Arif Sari

The adoption of Web 2.0 technologies, the Internet of Things, and similar developments by individuals and organizations has led to an explosion of data. As it stands, existing relational database management systems (RDBMSs) are incapable of handling this deluge of data. The term Big Data was coined to describe these vast, fast, and complex datasets that regular RDBMSs cannot handle. Special tools and frameworks were developed to process, manage, and store such big data. These tools can operate in distributed, industry-standard environments, maintaining efficiency and effectiveness at the business level. Apache Hadoop is an example of such a framework. This report discusses big data, its origins, the opportunities and challenges it presents, big data analytics, and the application of big data using existing tools and frameworks. It also discusses Apache Hadoop as a big data framework and provides a basic overview of the technology from technological and business perspectives.
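
As a minimal illustration of the map-reduce paradigm Hadoop implements, here is the canonical word count written in Python for Hadoop Streaming; packaging both phases in one file with a map/reduce switch is just one convenient layout.

```python
# Sketch: word count for Hadoop Streaming. The mapper emits
# "word<TAB>1" pairs; Hadoop sorts them by key before the reducer
# sums the counts per word.
import sys
from itertools import groupby

def mapper(lines):
    for line in lines:
        for word in line.split():
            print(f"{word}\t1")

def reducer(lines):
    # Streaming delivers mapper output sorted by key, so groupby works.
    pairs = (line.rstrip("\n").split("\t") for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{word}\t{sum(int(count) for _, count in group)}")

if __name__ == "__main__":
    (mapper if sys.argv[1:] == ["map"] else reducer)(sys.stdin)
```

A typical invocation would be along the lines of `hadoop jar hadoop-streaming.jar -mapper "python3 wc.py map" -reducer "python3 wc.py reduce" -input <hdfs-in> -output <hdfs-out>`, with jar and HDFS paths depending on the installation.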


2020
pp. 001857872091834
Author(s):
Diana Altshuler
Kenny Yu
John Papadopoulos
Arash Dabestani

Purpose: The intent of this article is to evaluate a novel approach, using rapid cycle analytics and real-world evidence, to optimize and improve the medication evaluation process, support formulary decision making, and reduce the time required of clinicians. Summary: The Pharmacy and Therapeutics (P&T) Committee within each health system is responsible for evaluating medication requests for formulary addition. Members of the pharmacy staff prepare the drug monograph or a medication use evaluation (MUE) and allocate precious clinical resources to review patient charts to assess efficacy and value. We explored a novel approach to evaluate the value of admitting intravenous acetaminophen (IV APAP) to our formulary. This new methodology, called rapid cycle analytics, can assist hospitals in meeting or exceeding the minimum criteria of formulary maintenance as defined by the Joint Commission standards. In this study, we assessed the effectiveness of IV APAP in total hip arthroplasty (THA) and total knee arthroplasty (TKA) procedures, examining its correlation with same-stay opioid utilization, average length of inpatient stay, and post-anesthesia care unit (PACU) time. Conclusion: We were able to explore and improve our organization's approach to evaluating medications by partnering with an external analytics expert to help organize and normalize our data in a more robust yet time-efficient manner. Additionally, we were able to use a significantly larger external dataset as a point of reference. Performing this detailed analytical exercise for thousands of encounters internally, with a data warehouse of over 130 million patients as a point of reference, in a short time has improved the depth of our assessment while freeing valuable clinical resources from MUEs for more direct patient care. This real-world, data-rich analytics model is the necessary foundation for using artificial or augmented intelligence (AI) to make real-time formulary and drug selection decisions.
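
For a feel of the underlying computation, here is a toy pandas sketch of the kind of stratified comparison such an MUE runs over encounter data; the column names and values are assumptions for illustration, not the study's data.

```python
# Sketch: compare encounters with and without IV APAP on the three
# outcomes named above, stratified by procedure. Toy data only.
import pandas as pd

encounters = pd.DataFrame({
    "procedure":    ["THA", "THA", "TKA", "TKA"],
    "iv_apap":      [True, False, True, False],
    "opioid_mme":   [45.0, 60.0, 50.0, 52.0],   # same-stay opioid use
    "los_days":     [2.1, 2.4, 2.9, 3.0],        # length of stay
    "pacu_minutes": [95, 110, 120, 118],         # PACU time
})
summary = (encounters
           .groupby(["procedure", "iv_apap"])
           [["opioid_mme", "los_days", "pacu_minutes"]]
           .mean())
print(summary)
```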


Author(s):  
Bharat Tidke
Rupa Mehta
Dipti Rana
Hullash Jangir

Social media data (SMD) is processed with statistical and analytical technologies to obtain information for various decisions. SMD is vast and evolutionary in nature, which makes traditional data warehouses ill suited. This research aims to propose and implement a novel framework that analyzes tweet data from an online social networking site (OSN; i.e., Twitter). The authors fetch streaming tweets from the Twitter API using Apache Flume to detect clusters of users with similar sentiment. The proposed approach utilizes a scalable and fault-tolerant system (Hadoop) that harnesses HDFS for data storage and the map-reduce paradigm for data processing. Apache Hive is used on top of Hadoop to query the data. Experiments test the scalability of the proposed framework on various data sizes. The authors' goal is to handle big social data effectively using cost-effective tools for fetching and querying unstructured data, and algorithms for analyzing scalable, uninterrupted data streams with finite memory and resources.
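
A minimal sketch of one stage of such a pipeline: a Hadoop-Streaming-style mapper, written in Python, that scores each incoming tweet's sentiment so that a Hive table over the output directory can aggregate per user. The tiny lexicon and the `user<TAB>text` input format are assumptions for illustration, not the authors' scorer.

```python
# Sketch: streaming mapper that emits "user<TAB>sentiment_score" for
# each tweet, ready for storage in HDFS and aggregation in Hive.
import sys

LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "awful": -1}

for line in sys.stdin:
    user, _, text = line.rstrip("\n").partition("\t")
    score = sum(LEXICON.get(word, 0) for word in text.lower().split())
    print(f"{user}\t{score}")
```

With the scored records in HDFS, a Hive external table over that directory lets a simple GROUP BY on user with an average of the score feed the sentiment-clustering step.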


2014
Vol 22 (4)
pp. 358-370
Author(s):
John Haggerty
Sheryllynne Haggerty
Mark Taylor

Purpose – The purpose of this paper is to propose a novel approach that automates the visualisation of both quantitative data (the network) and qualitative data (the content) within emails to aid the triage of evidence during a forensics investigation. Email remains a key source of evidence during a digital investigation, and a forensics examiner may be required to triage and analyse large email data sets for evidence. Current practice utilises tools and techniques that require a manual trawl through such data, which is a time-consuming process. Design/methodology/approach – This paper applies the methodology to the Enron email corpus, and in particular one key suspect, to demonstrate the applicability of the approach. Resulting visualisations of network narratives are discussed to show how network narratives may be used to triage large evidence data sets. Findings – Using the network narrative approach enables a forensics examiner to quickly identify relevant evidence within large email data sets. Within the case study presented in this paper, the results identify key witnesses, other actors of interest to the investigation and potential sources of further evidence. Practical implications – The implications are for digital forensics examiners or for security investigations that involve email data. The approach posited in this paper demonstrates the triage and visualisation of email network narratives to aid an investigation and identify potential sources of electronic evidence. Originality/value – There are a number of network visualisation applications in use. However, none of these enable the combined visualisation of quantitative and qualitative data to provide a view of what the actors are discussing and how this shapes the network in email data sets.
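
A toy Python sketch of the core idea, combining the quantitative view (who emails whom, and how often) with a qualitative label (frequent content terms per edge) on an email graph; the messages and the term-picking rule are invented for illustration, and the paper's actual network-narrative visualisation is richer.

```python
# Sketch: build a directed email graph whose edges carry both a
# message count (quantitative) and frequent terms (qualitative).
from collections import Counter

import networkx as nx

messages = [
    ("alice", "bob",   "quarterly trading position report"),
    ("alice", "bob",   "trading limits exceeded again"),
    ("alice", "carol", "lunch on friday?"),
]
G = nx.DiGraph()
for sender, recipient, body in messages:
    if G.has_edge(sender, recipient):
        G[sender][recipient]["count"] += 1
        G[sender][recipient]["terms"].update(body.split())
    else:
        G.add_edge(sender, recipient, count=1, terms=Counter(body.split()))

# Each edge now summarizes how much and about what two actors talk.
for u, v, data in G.edges(data=True):
    top = [t for t, _ in data["terms"].most_common(2)]
    print(f"{u} -> {v}: {data['count']} messages, terms: {top}")
```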

