The Hierarchies of Multivalued Attribute Domains and Corresponding Applications in Data Mining

2018 ◽  
Vol 2018 ◽  
pp. 1-11 ◽  
Author(s):  
Yuxia Lei ◽  
Yushu Yan ◽  
Yonghua Han ◽  
Feng Jiang

In mobile computing, machine learning models for natural language processing (NLP) have become one of the most attractive research focus areas. Association rules among attributes are common knowledge patterns that can often reveal potentially useful information, such as mobile users' interests. In practice, almost every attribute is associated with a hierarchy over its domain. Given a relation R=(U,A) and a cut α_a on the hierarchy of each attribute a, there is a corresponding rough relation R_Φ, where Φ=(α_a : a∈A). This paper establishes the connection between the functional dependencies in R and R_Φ, proposes a method for extracting reducts in R_Φ, and demonstrates an implementation of the proposed method in an application to data mining of association rules. The method for acquiring association rules consists of the following three steps: (1) translating natural texts into relations, by NLP; (2) translating relations into rough ones, by attribute analysis or fuzzy k-means (FKM) clustering; and (3) extracting association rules from concept lattices, by formal concept analysis (FCA). Our experimental results show that the proposed methods, which can be applied directly to regular mobile data such as healthcare data, improve the quality and relevance of the rules.
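To make the idea of a cut on an attribute hierarchy concrete, here is a minimal sketch in Python. It assumes a cut can be modelled as a function that maps each raw attribute value to its ancestor at the chosen level of the hierarchy; the toy relation, the attribute names, and the helper functions are all hypothetical and are not taken from the paper.

```python
# Hypothetical toy relation R = (U, A): one dict per object, one key per attribute.
R = [
    {"age": 23, "city": "Paris", "plan": "basic"},
    {"age": 27, "city": "Lyon", "plan": "basic"},
    {"age": 45, "city": "Berlin", "plan": "premium"},
    {"age": 52, "city": "Munich", "plan": "premium"},
]

# Hypothetical cuts: each maps a raw value to a coarser value in its hierarchy.
CITY_TO_COUNTRY = {"Paris": "France", "Lyon": "France",
                   "Berlin": "Germany", "Munich": "Germany"}

def age_cut(value):
    """Coarsen an exact age into one of two age bands (assumed hierarchy level)."""
    return "young" if value < 40 else "senior"

PHI = {
    "age": age_cut,
    "city": CITY_TO_COUNTRY.get,
    "plan": lambda value: value,  # identity cut: keep the finest level
}

def apply_cuts(relation, cuts):
    """Build the coarsened relation by applying each attribute's cut to its values."""
    return [{attr: cuts[attr](value) for attr, value in row.items()}
            for row in relation]

def holds_fd(relation, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in the relation."""
    seen = {}
    for row in relation:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

R_PHI = apply_cuts(R, PHI)
```

In this toy data, `plan -> city` fails in the original relation (the basic plan occurs with two cities) but holds after coarsening cities to countries, which illustrates why the dependencies of the two relations need to be related explicitly.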

Association Rule Mining (ARM) is a data mining approach for discovering rules that reveal latent associations among persisted entity sets. ARM has many significant real-world applications, such as finding interesting incidents, analyzing stock market data, and discovering hidden relationships in healthcare data, to name a few. The existing literature offers many efficient algorithms for mining association rules, both Apriori-based and Pattern-Growth. A comprehensive understanding of them helps the data mining community and its stakeholders make expert decisions. Dynamically updating association rules that have already been discovered is very challenging because the changes are arbitrary and heterogeneous in the kinds of operations involved. When new instances are added to a dataset that has already been subjected to ARM, only those instances should be used for incremental mining of rules, instead of processing the whole dataset again. Recently, researchers have developed algorithms specifically for incremental ARM. They are broadly grouped into Apriori-based and Pattern-Growth techniques. This paper reviews the Apriori-based and Pattern-Growth techniques that support incremental ARM.
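The core idea of incremental mining, scanning only the newly added instances, can be sketched as follows. This is a deliberately simplified illustration and not one of the reviewed algorithms: it keeps counts for every itemset up to a fixed size, which real Apriori-based or Pattern-Growth methods avoid. The class name, method names, and sample transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations

def count_itemsets(transactions, max_size=2):
    """Count every itemset of up to max_size items in a batch of transactions."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return counts

class IncrementalCounts:
    """Keeps itemset counts up to date by scanning only newly added transactions."""

    def __init__(self, max_size=2):
        self.max_size = max_size
        self.counts = Counter()
        self.n = 0  # number of transactions seen so far

    def add(self, transactions):
        transactions = list(transactions)
        self.counts.update(count_itemsets(transactions, self.max_size))
        self.n += len(transactions)

    def support(self, itemset):
        return self.counts[tuple(sorted(itemset))] / self.n if self.n else 0.0

    def confidence(self, antecedent, consequent):
        # Only meaningful while the combined itemset has at most max_size items.
        base = self.counts[tuple(sorted(antecedent))]
        both = self.counts[tuple(sorted(set(antecedent) | set(consequent)))]
        return both / base if base else 0.0

# Hypothetical data: an initial batch and a later increment.
BATCH_1 = [["bread", "milk"], ["bread", "butter"]]
BATCH_2 = [["bread", "milk", "butter"]]

store = IncrementalCounts()
store.add(BATCH_1)
store.add(BATCH_2)  # only the new transaction is scanned here
```

After both calls, the stored counts match what a full rescan of the combined data would produce, so support and confidence can be recomputed without revisiting the first batch.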


Author(s):  
Ahmed Abdullah Awadh Koofan ◽  
Mohammed Kaleem

Data mining is a powerful technology for analyzing large volumes of data, with techniques such as classification, clustering, prediction, and association rules. This research uses association rules to analyze data and extract information about combinations of items. Many customers purchase items regularly, and on each visit to the supermarket they must move from shelf to shelf to find the products they want, which is time-consuming. This research aims to reduce that time by analyzing customers' invoices and informing the supermarket about customers' purchasing patterns. Python is used as the data mining tool: association rules are applied to customers' purchases to retrieve the relevant information, determine purchasing patterns, and identify associations between products. To this end, customer purchase data were collected from Lulu Hypermarket. The outcome of the analysis is knowledge of customers' patterns, which makes shopping easier by placing related and most frequently bought items together on the same shelf.
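The abstract does not say which Python library or thresholds were used, so the following is only a plain-Python sketch of how pairwise rules might be derived from invoices and ranked by lift to suggest items for the same shelf. The function name, the thresholds, and the sample invoices are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def pair_rules(invoices, min_support=0.3, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence, lift) for item pairs,
    sorted so the most strongly associated pairs come first."""
    n = len(invoices)
    baskets = [set(invoice) for invoice in invoices]
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(pair for basket in baskets
                          for pair in combinations(sorted(basket), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                lift = confidence / (item_counts[y] / n)
                rules.append((x, y, support, confidence, lift))
    return sorted(rules, key=lambda rule: rule[4], reverse=True)

# Hypothetical invoices, one list of purchased items per invoice.
SAMPLE_INVOICES = [
    ["tea", "sugar"],
    ["tea", "sugar", "milk"],
    ["milk", "bread"],
    ["tea", "milk"],
]

sample_rules = pair_rules(SAMPLE_INVOICES)
```

A lift above 1 means the two items appear together more often than independent purchases would predict, which is one reasonable signal for co-locating them.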


Author(s):  
Giulia Bruno ◽  
Paolo Garza ◽  
Elisa Quintarelli

In the context of anomaly detection, the data mining technique of extracting association rules can be used to identify rare rules, which represent infrequent situations. One method for detecting rare rules is to first infer the normal behavior of objects in the form of quasi-functional dependencies (i.e., functional dependencies that frequently hold), and then analyze the rare violations of them. The quasi-functional dependencies are usually inferred from the current instance of a database. However, in several applications the database is not static: data are added or deleted continuously. Thus, the anomalies have to be updated because they change over time. In this chapter, we propose an incremental algorithm to efficiently maintain up-to-date rules (i.e., functional and quasi-functional dependencies). The impact of the cardinality of the data set and of the number of new tuples on the execution time is evaluated through a set of experiments on synthetic and real databases, whose results are reported here.
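A quasi-functional dependency and its violations can be sketched as below. The sketch assumes one simple measure, the fraction of tuples that agree with the majority right-hand value for their left-hand value; the chapter's actual definition and algorithm may differ. It handles insertions only, and the class name and sample rows are hypothetical.

```python
from collections import Counter, defaultdict

class QuasiFD:
    """Tracks how often a single-attribute dependency lhs -> rhs holds,
    updating its counts one inserted tuple at a time."""

    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs
        self.groups = defaultdict(Counter)  # lhs value -> counts of rhs values
        self.n = 0

    def add(self, row):
        self.groups[row[self.lhs]][row[self.rhs]] += 1
        self.n += 1

    def degree(self):
        """Fraction of tuples that agree with the majority rhs value of their group."""
        if not self.n:
            return 1.0
        agreeing = sum(max(counts.values()) for counts in self.groups.values())
        return agreeing / self.n

    def is_violation(self, row):
        """True if the row disagrees with the majority rhs value for its lhs value."""
        counts = self.groups.get(row[self.lhs])
        if not counts:
            return False
        majority, _ = counts.most_common(1)[0]
        return row[self.rhs] != majority

# Hypothetical tuples: one postal code is almost always paired with one city.
ROWS = [{"zip": "10115", "city": "Berlin"}] * 4 + [
    {"zip": "10115", "city": "Bern"},
    {"zip": "80331", "city": "Munich"},
]

dependency = QuasiFD("zip", "city")
for row in ROWS:
    dependency.add(row)
```

With these rows the dependency holds for five of six tuples, and the single disagreeing tuple is the candidate anomaly.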


2014 ◽  
Vol 1 (1) ◽  
pp. 339-342
Author(s):  
Mirela Danubianu ◽  
Dragos Mircea Danubianu

Speech therapy can be viewed as a business in the logopaedic area that aims to offer services for correcting language. Proper treatment of speech impairments improves the efficiency of therapy; to achieve this, a therapist must continuously learn how to adjust therapy methods to the patient's characteristics. The use of Information and Communication Technology in this area has made it possible to collect a large amount of data on various aspects of treatment. These data can be used in a data mining process to find useful and usable patterns and models that help therapists improve their specific education. Clustering, classification, or association rules can provide unexpected information that helps complete the therapist's knowledge and adapt the therapy to the patient's needs.


2021 ◽  
Author(s):  
Abul Hasan ◽  
Mark Levene ◽  
David Weston ◽  
Renate Fromson ◽  
Nicolas Koslover ◽  
...  

BACKGROUND The COVID-19 pandemic has created a pressing need for integrating information from disparate sources in order to assist decision makers. Social media is important in this respect; however, natural language processing methods are needed to make sense of the textual information it provides and to automate the processing of large amounts of data. Social media posts are often noisy, yet they may provide valuable insights regarding the severity and prevalence of the disease in the population. In particular, machine learning techniques for triage and diagnosis could allow for a better understanding of what social media may offer in this respect. OBJECTIVE This study aims to develop an end-to-end natural language processing pipeline for triage and diagnosis of COVID-19 from patient-authored social media posts, in order to provide researchers and other interested parties with additional information on the symptoms, severity, and prevalence of the disease. METHODS The text processing pipeline first extracts COVID-19 symptoms and related concepts such as severity, duration, negations, and body parts from patients' posts using conditional random fields. An unsupervised rule-based algorithm is then applied to establish relations between concepts in the next step of the pipeline. The extracted concepts and relations are subsequently used to construct two different vector representations of each post. These vectors are applied separately to build support vector machine learning models to triage patients into three categories and diagnose them for COVID-19. RESULTS We report macro- and micro-averaged F1 scores in the range of 71-96% and 61-87%, respectively, for the triage and diagnosis of COVID-19, when the models are trained on human-labelled data. Our experimental results indicate that similar performance can be achieved when the models are trained using predicted labels from concept extraction and rule-based classifiers, thus yielding end-to-end machine learning. We also highlight important features uncovered by our diagnostic machine learning models and compare them with the most frequent symptoms revealed in another COVID-19 dataset. In particular, we found that the most important features are not always the most frequent ones. CONCLUSIONS Our preliminary results show that it is possible to automatically triage and diagnose patients for COVID-19 from natural language narratives using a machine learning pipeline, in order to provide additional information on the severity and prevalence of the disease through the eyes of social media.
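Since the results are reported as macro- and micro-averaged F1 scores, a short sketch of how the two averages differ may help. It assumes single-label multi-class predictions; the function name and the example labels are hypothetical and do not reproduce the study's data or categories.

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0, 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1  # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative

    def f1(t, p, n):
        denominator = 2 * t + p + n
        return 2 * t / denominator if denominator else 0.0

    # Macro: average the per-class F1 scores, so every class weighs the same.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes first, so every post weighs the same.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro

# Hypothetical labels for four posts.
Y_TRUE = ["a", "a", "a", "b"]
Y_PRED = ["a", "a", "b", "b"]
macro_f1, micro_f1 = f1_scores(Y_TRUE, Y_PRED)
```

In the single-label setting the micro average equals plain accuracy, while the macro average is pulled down by poor performance on rare classes, which is why the two can differ noticeably on imbalanced data.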


2014 ◽  
Vol 5 (3) ◽  
pp. 11-28
Author(s):  
Ljiljana Kašćelan ◽  
Vladimir Kašćelan ◽  
Milijana Novović-Burić

This paper proposes a data mining approach for risk assessment in car insurance. Standard methods classify policies into a large number of tariff classes and assess risk on that basis. By applying data mining techniques, it is possible to obtain functional dependencies between the level of risk and risk factors, as well as better predictive results. The case study data show that data mining techniques can predict claim sizes and the occurrence of claims with better accuracy than the standard methods, and this forms the basis for calculating the net risk premium and for risk classification. The paper also discusses the advantages of data mining methods over standard methods for risk assessment in car insurance, as well as the specificities of the results that stem from a small insurance market such as the one in Montenegro.

