Forecasting association rules using existing data sets

2003 ◽  
Vol 15 (6) ◽  
pp. 1448-1459 ◽  
Author(s):  
S.Y. Sung ◽  
Zhao Li ◽  
C.L. Tan ◽  
P.A. Ng
Author(s):  
Arif Hanafi ◽  
Sulaiman Harun ◽  
Sofika Enggari ◽  
Larissa Navia Rani

Email is unquestionably central to modern business communication. Every day, organizations send large volumes of email to clients and suppliers, employees write to their managers, and colleagues write to one another. As a result, data warehouses hold vast amounts of email data. Data cleaning is an activity performed on the data sets of a data warehouse to improve and maintain the quality and consistency of the data. This paper highlights the issues associated with dirty data and the detection of duplicates in the email column. It examines the strategy of data cleaning from a different point of view and provides an algorithm for discovering errors and duplicate entries in the data sets of an existing data warehouse. The paper defines rules, based on the mathematical concept of association rules, to determine the duplicate entries in the email column of the data sets.
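The abstract does not spell out its rule-based algorithm, so the following is only a minimal sketch of a simpler, swapped-in technique: a normalise-and-group duplicate check on an email column. The function names, the sample addresses, and the choice to lowercase the whole address are assumptions made for illustration, not details from the paper.

```python
from collections import defaultdict


def normalize_email(raw):
    """Trim whitespace and lowercase so trivially different spellings compare equal.

    Assumption: the whole address is treated as case-insensitive.
    """
    return raw.strip().lower()


def find_duplicate_emails(rows):
    """Return {normalized_email: [row indices]} for addresses seen more than once."""
    seen = defaultdict(list)
    for index, raw in enumerate(rows):
        seen[normalize_email(raw)].append(index)
    return {email: idxs for email, idxs in seen.items() if len(idxs) > 1}


# Hypothetical email column with one duplicate that differs only in case.
emails = ["Ann@Example.com", "bob@example.com ", "ann@example.com", "carol@example.com"]
duplicates = find_duplicate_emails(emails)
```

Grouping by a normalised key keeps the check linear in the number of rows; a rule-based approach such as the one the abstract describes would presumably replace `normalize_email` with its own matching criteria.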


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Eleanor F. Miller ◽  
Andrea Manica

Abstract. Background: Today an unprecedented amount of genetic sequence data is stored in publicly available repositories. For decades now, mitochondrial DNA (mtDNA) has been the workhorse of genetic studies, and as a result, there is a large volume of mtDNA data available in these repositories for a wide range of species. Indeed, whilst whole genome sequencing is an exciting prospect for the future, for most non-model organisms, classical markers such as mtDNA remain widely used. By compiling existing data from multiple original studies, it is possible to build powerful new datasets capable of exploring many questions in ecology, evolution and conservation biology. One key question that these data can help inform is what happened in a species’ demographic past. However, compiling data in this manner is not trivial: there are many complexities associated with data extraction, data quality and data handling. Results: Here we present the mtDNAcombine package, a collection of tools developed to manage some of the major decisions associated with handling multi-study sequence data, with a particular focus on preparing sequence data for Bayesian skyline plot demographic reconstructions. Conclusions: There is now more genetic information available than ever before, and large meta-data sets offer great opportunities to explore new and exciting avenues of research. However, compiling multi-study datasets remains a technically challenging prospect. The mtDNAcombine package provides a pipeline to streamline the process of downloading, curating, and analysing sequence data, guiding the process of compiling data sets from the online database GenBank.


1987 ◽  
Vol 65 (11) ◽  
pp. 2822-2824 ◽  
Author(s):  
W. A. Montevecchi ◽  
J. F. Piatt

We present evidence to indicate that dehydration of prey transported by seabirds from capture sites at sea to chicks at colonies inflates estimates of wet weight energy densities. These findings and a comparison of wet and dry weight energy densities reported in the literature emphasize the importance of (i) accurate measurement of the fresh weight and water content of prey, (ii) use of dry weight energy densities in comparisons among species, seasons, and regions, and (iii) cautious interpretation and extrapolation of existing data sets.
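The inflation effect can be illustrated with simple arithmetic: the energy content of a prey item is fixed by its dry mass, so any water lost in transit shrinks the wet mass and raises the energy per gram of wet weight. The sketch below assumes hypothetical masses and a hypothetical dry weight energy density; none of the numbers come from the paper.

```python
def wet_energy_density(dry_mass_g, water_mass_g, dry_energy_density_kj_per_g):
    """Energy per gram of wet weight for a prey item.

    Total energy depends only on dry mass; wet mass is dry mass plus water.
    """
    total_energy_kj = dry_mass_g * dry_energy_density_kj_per_g
    wet_mass_g = dry_mass_g + water_mass_g
    return total_energy_kj / wet_mass_g


# Hypothetical prey item: 2.5 g dry matter at 20 kJ/g dry weight.
fresh = wet_energy_density(2.5, 7.5, 20.0)       # weighed at capture
dehydrated = wet_energy_density(2.5, 6.0, 20.0)  # same item after losing 1.5 g of water
```

In this toy case the dry weight energy density is unchanged at 20 kJ/g, while the wet weight figure rises once water is lost, which is why dry weight values are the safer basis for comparison.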


2012 ◽  
Vol 132 (2) ◽  
pp. 485-487 ◽  
Author(s):  
Matthew H. Law ◽  
Grant W. Montgomery ◽  
Kevin M. Brown ◽  
Nicholas G. Martin ◽  
Graham J. Mann ◽  
...  

2015 ◽  
Vol 2015 ◽  
pp. 1-14
Author(s):  
Mengling Zhao ◽  
Hongwei Liu

As a computational intelligence method, the artificial immune network (AIN) algorithm has been widely applied to pattern recognition and data classification. In existing AIN algorithms, the affinity used for classification is computed from a distance measure, which may lead to unsatisfactory results on data with nominal attributes. To overcome this shortcoming, association rules are introduced into the AIN algorithm, and we propose a new classification algorithm: an associate rules mining algorithm based on artificial immune network (ARM-AIN). The new method uses association rules to represent immune cells and mines the best association rules rather than searching for optimal clustering centers. The proposed algorithm has been extensively compared with the artificial immune network classification (AINC) algorithm, the artificial immune network classification algorithm based on self-adaptive PSO (SPSO-AINC), and PSO-AINC on several large-scale data sets, target recognition in remote sensing images, and segmentation of three different SAR images. The experimental results indicate the superiority of ARM-AIN in classification accuracy and running time.


2021 ◽  
Author(s):  
Ilke Aydogan

Prior beliefs and their updating play a crucial role in decisions under uncertainty, and theories about them have been well established in classical Bayesianism. Yet, they are almost absent for ambiguous decisions from experience. This paper proposes a new decision model that incorporates the role of prior beliefs, beyond the role of ambiguity attitudes, into the analysis of such decisions. Hence, it connects ambiguity theories, popular in economics, with decision from experience, popular (mostly) in psychology, to the benefit of both. A reanalysis of some existing data sets from the literature on decisions from experience shows that the model that incorporates prior beliefs into the estimation of subjective probabilities outperforms the commonly used model that approximates subjective probabilities with observed relative frequencies. Controlling for subjective priors, we obtain more accurate measurements of ambiguity attitudes, and thus a new explanation of the gap between decision from description and decision from experience. This paper was accepted by Manel Baucells, decision analysis.
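To see why a prior can matter for small samples, the sketch below contrasts an observed relative frequency with the posterior mean under a Beta prior. This is a standard Beta-Binomial update chosen for illustration; it is an assumption, not the decision model proposed in the paper, and the sample counts are hypothetical.

```python
def relative_frequency(successes, trials):
    """Observed share of successes, the usual stand-in for a subjective probability."""
    return successes / trials


def beta_posterior_mean(successes, trials, prior_a=1.0, prior_b=1.0):
    """Posterior mean of a success probability under a Beta(prior_a, prior_b) prior.

    Assumption: a uniform Beta(1, 1) prior by default.
    """
    return (prior_a + successes) / (prior_a + prior_b + trials)


# Hypothetical sample: a rare event never observed in 5 draws.
observed = relative_frequency(0, 5)
with_prior = beta_posterior_mean(0, 5)
```

With no occurrences in five draws, the relative frequency is zero, while the prior keeps the estimate strictly positive; the two estimates converge as the number of draws grows.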


Author(s):  
Anthony Scime ◽  
Karthik Rajasethupathy ◽  
Kulathur S. Rajasethupathy ◽  
Gregg R. Murray

Data mining is a collection of algorithms for finding interesting and unknown patterns or rules in data. However, different algorithms can result in different rules from the same data. The process presented here exploits these differences to find particularly robust, consistent, and noteworthy rules among much larger potential rule sets. More specifically, this research focuses on using association rules and classification mining to select the persistently strong association rules. Persistently strong association rules are association rules that are verifiable by classification mining the same data set. The process for finding persistent strong rules was executed against two data sets obtained from the American National Election Studies. Analysis of the first data set resulted in one persistent strong rule and one persistent rule, while analysis of the second data set resulted in 11 persistent strong rules and 10 persistent rules. The persistent strong rule discovery process suggests these rules are the most robust, consistent, and noteworthy among the much larger potential rule sets.
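Rule strength in association mining is conventionally measured by support and confidence. The sketch below computes both for a single rule on a toy transaction list; the item names and data are hypothetical, and the classification-mining step that makes a rule "persistent" in the sense above is not shown.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for transaction in transactions if itemset <= transaction)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        return 0.0  # rule never applies in this data
    joint = set(antecedent) | set(consequent)
    return support(transactions, joint) / antecedent_support


# Hypothetical transactions over items "a", "b", "c".
transactions = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b", "c"}]
rule_support = support(transactions, {"a", "b"})
rule_confidence = confidence(transactions, {"a"}, {"b"})
```

A rule is usually called strong when both values exceed user-chosen thresholds; the thresholds themselves are a modelling decision and are not given in the abstract.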


2008 ◽  
pp. 2105-2120
Author(s):  
Kesaraporn Techapichetvanich ◽  
Amitava Datta

Both visualization and data mining have become important tools for discovering hidden relationships in large data sets and for extracting useful knowledge and information from large databases. Even though many algorithms for mining association rules have been researched extensively in the past decade, they do not incorporate users in the association-rule mining process. Most of these algorithms generate a large number of association rules, some of which are not practically interesting. This chapter presents a new technique that integrates visualization into the association-rule mining process. Users can apply their knowledge and be involved in finding interesting association rules through interactive visualization, after obtaining visual feedback as the algorithm generates association rules. In addition, users gain insight into and a deeper understanding of their data sets, as well as control over mining meaningful association rules.


Abstract. Fishers have often complained that standard United Kingdom groundfish survey data do not adequately reflect the grounds targeted by commercial fishers, and hence, scientists tend to make overcautious estimates of fish abundance. Such criticisms are of particular importance if we are to make a creditable attempt to classify potential essential fish habitat (EFH) using existing data from groundfish surveys. Nevertheless, these data sets provide a powerful tool to examine temporal abundance of fish on a large spatial scale. Here, we report a questionnaire-type survey of fishers (2001–2002) that invited them to plot the location of grounds of key importance in the Irish Sea and to comment on key habitat features that might constitute EFH for Atlantic cod Gadus morhua, haddock Melanogrammus aeglefinus, and European whiting Merlangius merlangus. Plotted grounds were cross-checked using records of vessel sightings by fishery protection aircraft (1985–1999). The areas of seabed highlighted by fishers and the observations made on groundfish surveys were broadly compatible for all three species of gadoids examined. Both methods indicated important grounds for cod and European whiting off northern Wales, the Ribble estuary, Solway Firth, north of Dublin, and Belfast Lough. The majority of vessel sightings by aircraft did not match the areas plotted by fishers. However, fishing restrictions, adverse weather conditions, and seasonal variation of fish stocks may have forced fishers to operate outside their favored areas on the (few) occasions that they had been recorded by aircraft. Fishers provided biological observations that were consistent among several independent sources (e.g., the occurrence of haddock over brittle star [ophiuroid] beds). We conclude that fishers’ knowledge is a useful supplement to existing data sets that can better focus more detailed EFH studies.

