Alliance Rules-Based Algorithm on Detecting Duplicate Entry Email

Author(s): Arif Hanafi, Sulaiman Harun, Sofika Enggari, Larissa Navia Rani

Email is of unquestionable importance in modern business communication. Every day, large volumes of email are sent from organizations to clients and suppliers, from employees to their managers, and from one colleague to another. As a result, vast amounts of email data accumulate in the data warehouse. Data cleaning is an activity performed on the data sets of a data warehouse to improve and maintain the quality and consistency of the data. This paper highlights the issues associated with dirty data and the detection of duplicates in the email column. It examines the strategy of data cleaning from a different point of view and provides an algorithm for discovering errors and duplicate entries in the data sets of an existing data warehouse. The paper characterizes alliance rules, based on the concept of mathematical association rules, to determine duplicate entries in the email column of the data sets.
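The abstract stops at the rule formulation, so the following is only an assumed illustration (not the authors' alliance rules) of the kind of canonicalize-then-group check such rules drive when hunting duplicate entries in an email column. The normalization choices here (lower-casing, trimming, dropping "+" sub-address tags, ignoring dots for Gmail) are assumptions introduced for the example.

```python
from collections import defaultdict

def canonical(email: str) -> str:
    """Normalize an email address before duplicate matching.
    The specific rules (lower-casing, trimming, stripping '+tag',
    ignoring dots in Gmail local parts) are illustrative assumptions."""
    local, _, domain = email.strip().lower().partition("@")
    local = local.split("+", 1)[0]           # drop sub-address tags
    if domain in {"gmail.com", "googlemail.com"}:
        local = local.replace(".", "")       # Gmail ignores dots in local parts
    return f"{local}@{domain}"

def find_duplicate_emails(records):
    """Group records whose email column maps to the same canonical form."""
    groups = defaultdict(list)
    for rec in records:
        groups[canonical(rec["email"])].append(rec)
    return {k: v for k, v in groups.items() if len(v) > 1}

rows = [
    {"id": 1, "email": "John.Doe@gmail.com"},
    {"id": 2, "email": "johndoe+invoices@gmail.com"},
    {"id": 3, "email": "jane@example.org"},
]
print(find_duplicate_emails(rows))   # records 1 and 2 collide on the same key
```

In a real warehouse the canonical key would feed whatever rule engine flags the colliding rows for review rather than printing them directly.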

2012, Vol. 490-495, pp. 1878-1882
Author(s): Yu Xiang Song

The alliance rules stated above, based on the principle of data-mining association rules, provide a solution for detecting errors in the data sets. Errors are detected automatically, and manual intervention in the proposed algorithm is negligible, resulting in a high degree of automation and accuracy. Duplicates in the names field of the data warehouse are cleansed remarkably well. Domain independence is achieved using the concept of an integer domain, which also adds to the memory-saving capability of the algorithm.
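The abstract does not spell out how the integer domain is built; one plausible and commonly used reading, shown below purely as a sketch, is to dictionary-encode each column's values as small integers so that the same rule-checking code works for any value domain and the encoded column occupies less memory. The helper and the lower-casing step are assumptions, not the authors' construction.

```python
def encode_column(values):
    """Dictionary-encode arbitrary column values as small integers.
    Rule checking can then operate on integers regardless of the original
    domain (strings, dates, ...) -- one plausible reading of the abstract's
    'integer domain' idea, not the authors' exact construction."""
    mapping = {}
    encoded = []
    for v in values:
        code = mapping.setdefault(v, len(mapping))
        encoded.append(code)
    return encoded, mapping

names = ["Alice", "Bob", "alice", "Alice"]
codes, table = encode_column(n.lower() for n in names)   # case-folding is an assumption
print(codes)   # [0, 1, 0, 0] -> code 0 repeats, flagging the duplicated name
```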


2003, Vol. 15 (6), pp. 1448-1459
Author(s): S.Y. Sung, Zhao Li, C.L. Tan, P.A. Ng

Author(s): Kumar Rahul, Rohitash Kumar Banyal

Every business enterprise requires noise-free, clean data. Dirty data tends to increase as the data warehouse continuously loads and refreshes large quantities of data from various sources. Hence, to avoid wrong conclusions, data cleaning becomes a vital process in data-related projects. This paper introduces a novel data cleaning technique for the effective removal of dirty data. The process involves two steps: (i) dirty data detection and (ii) dirty data cleaning. Dirty data detection comprises data normalization, hashing, clustering, and identification of suspected data. In the clustering step, the optimal selection of centroids is the key concern and is carried out using an optimization approach. Once dirty data detection is complete, the dirty data cleaning step begins. Cleaning likewise comprises several processes, namely a leveling process, Huffman coding, and cleaning of the suspected data; the cleaning of suspected data is also driven by optimization. To solve these optimization problems, a new hybrid algorithm, the so-called Firefly Update Enabled Rider Optimization Algorithm (FU-ROA), is proposed; it hybridizes the Rider Optimization Algorithm (ROA) and the Firefly (FF) algorithm. Finally, the performance of the implemented data cleaning method is compared with that of traditional methods such as Particle Swarm Optimization (PSO), FF, Grey Wolf Optimizer (GWO), and ROA in terms of positive and negative measures. The results show that, at iteration 12, the proposed FU-ROA model was 0.013%, 0.7%, 0.64%, and 0.29% better on test case 1 than the existing PSO, FF, GWO, and ROA models, respectively.
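FU-ROA itself is not reproduced here; the sketch below only illustrates the detection stage the abstract outlines (normalize, hash, cluster, flag suspects), with plain k-means standing in for the optimization-driven centroid selection and ad-hoc thresholds chosen for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

def normalize(rows):
    """Min-max scale numeric columns to [0, 1]."""
    rows = np.asarray(rows, dtype=float)
    span = rows.max(axis=0) - rows.min(axis=0)
    span[span == 0] = 1.0
    return (rows - rows.min(axis=0)) / span

def hash_records(rows, n_buckets=8):
    """Coarse hashing step: bucket each normalized record so near-identical
    records collide (an illustrative stand-in for the paper's hashing)."""
    return [hash(tuple(np.round(r, 1))) % n_buckets for r in rows]

def flag_suspects(rows, n_clusters=3, z=2.0):
    """Cluster normalized records and flag points far from their centroid as
    suspected dirty data. Plain k-means replaces the paper's optimization-driven
    centroid selection; the z-score threshold is an arbitrary example choice."""
    X = normalize(rows)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    return np.where(dist > dist.mean() + z * dist.std())[0]

data = [[1, 10], [1.1, 10.2], [0.9, 9.8], [1.05, 10.1],
        [5, 50], [5.2, 49], [4.9, 50.5], [5.1, 49.5],
        [1.0, 25]]                       # last record has an inconsistent second field
print(hash_records(normalize(data)))     # bucket ids; near-identical rows share a bucket
print(flag_suspects(data, n_clusters=2)) # with this toy data the last record (index 8) is flagged
```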


1976, Vol. 15 (01), pp. 36-42
Author(s): J. Schlörer

From a statistical data bank containing only anonymous records, individual records may sometimes be identified and then retrieved, as personal records, by on-line dialogue. The risk mainly applies to statistical data sets representing populations, or samples with a high ratio n/N. On the other hand, access controls are unsatisfactory as a general means of protection for statistical data banks, which should be open to large user communities. A threat-monitoring scheme is proposed that largely blocks the techniques for retrieving complete records. If combined with additional measures (e.g., slight modifications of output), it may be expected to render intrusion attempts by dialogue valueless from a cost-benefit point of view, if not absolutely impossible. The bona fide user pays with some loss of information, but considerable flexibility in evaluation is retained. The proposal of controlled classification included in the scheme may also be useful for off-line dialogue systems.
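The abstract describes the monitoring scheme only at a high level, so the sketch below does not reproduce it; it merely illustrates the general class of controls at stake with a simple query-set-size restriction, a classic (and much cruder) defence against isolating individual records through statistical queries. The threshold and record fields are invented for the example.

```python
MIN_QUERY_SET = 3   # illustrative threshold; the real choice is a policy decision

def answer_count_query(records, predicate):
    """Answer a COUNT query only if enough records match, otherwise refuse.
    A crude query-set-size restriction, shown purely to illustrate the class of
    controls the paper discusses; it is not the proposed monitoring scheme."""
    matching = [r for r in records if predicate(r)]
    if len(matching) < MIN_QUERY_SET:
        return None          # refuse: the answer could help isolate individuals
    return len(matching)

people = [{"age": 34, "dept": "A"}, {"age": 51, "dept": "B"},
          {"age": 29, "dept": "A"}, {"age": 46, "dept": "A"},
          {"age": 62, "dept": "B"}, {"age": 41, "dept": "A"}]
print(answer_count_query(people, lambda r: r["dept"] == "A"))   # 4 -> answered
print(answer_count_query(people, lambda r: r["age"] == 51))     # too specific -> None
```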


2013, Vol. 756-759, pp. 3652-3658
Author(s): You Li Lu, Jun Luo

Building on the study of kernel methods, this paper puts forward two improved algorithms, called R-SVM and I-SVDD, to cope with imbalanced data sets in closed systems. R-SVM uses the K-means algorithm to cluster samples in the feature space, while I-SVDD improves the performance of the original SVDD through imbalanced sample training. Experiments on two system call data sets show that both algorithms are more effective, and that R-SVM has lower complexity.
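No implementation details are given in the abstract; the following is a generic sketch of coupling K-means with an SVM for imbalanced data (reduce the majority class to cluster centroids, then train on the rebalanced set), which is one common reading of such a design rather than the authors' exact R-SVM. The synthetic data and parameters are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def cluster_reduced_svm(X_major, X_minor, n_clusters=20, **svm_kwargs):
    """Reduce the majority class to K-means centroids, then train an SVM on the
    roughly balanced set. A generic cluster-undersampling sketch, not the paper's
    exact R-SVM formulation."""
    k = min(n_clusters, len(X_minor))            # aim for rough class balance
    centroids = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_major).cluster_centers_
    X = np.vstack([centroids, X_minor])
    y = np.array([0] * len(centroids) + [1] * len(X_minor))
    return SVC(kernel="rbf", **svm_kwargs).fit(X, y)

rng = np.random.default_rng(0)
X_major = rng.normal(0.0, 1.0, size=(1000, 2))   # abundant "normal" class
X_minor = rng.normal(3.0, 0.5, size=(30, 2))     # rare class (e.g. anomalous calls)
model = cluster_reduced_svm(X_major, X_minor)
print(model.predict([[0, 0], [3, 3]]))           # expected: [0 1]
```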


2021, Vol. 22 (1)
Author(s): Eleanor F. Miller, Andrea Manica

Background: Today an unprecedented amount of genetic sequence data is stored in publicly available repositories. For decades, mitochondrial DNA (mtDNA) has been the workhorse of genetic studies, and as a result there is a large volume of mtDNA data available in these repositories for a wide range of species. Indeed, whilst whole-genome sequencing is an exciting prospect for the future, for most non-model organisms classical markers such as mtDNA remain widely used. By compiling existing data from multiple original studies, it is possible to build powerful new datasets capable of exploring many questions in ecology, evolution and conservation biology. One key question that these data can help inform is what happened in a species’ demographic past. However, compiling data in this manner is not trivial; there are many complexities associated with data extraction, data quality and data handling. Results: Here we present the mtDNAcombine package, a collection of tools developed to manage some of the major decisions associated with handling multi-study sequence data, with a particular focus on preparing sequence data for Bayesian skyline plot demographic reconstructions. Conclusions: There is now more genetic information available than ever before, and large meta-data sets offer great opportunities to explore new and exciting avenues of research. However, compiling multi-study datasets remains a technically challenging prospect. The mtDNAcombine package provides a pipeline to streamline the process of downloading, curating, and analysing sequence data, guiding the process of compiling data sets from the online database GenBank.
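mtDNAcombine is an R package and its own functions are not shown here; purely to illustrate the kind of GenBank retrieval step such a pipeline automates, the sketch below fetches a couple of nucleotide records with Biopython. The accession numbers and contact email are placeholders, not values from the paper.

```python
from Bio import Entrez, SeqIO   # Biopython

Entrez.email = "you@example.org"            # NCBI requires a contact address
accessions = ["MN122854", "MN122855"]       # placeholder GenBank accession IDs

# Fetch the records in GenBank format and parse them into SeqRecord objects.
handle = Entrez.efetch(db="nucleotide", id=",".join(accessions),
                       rettype="gb", retmode="text")
records = list(SeqIO.parse(handle, "genbank"))
handle.close()

for rec in records:
    print(rec.id, rec.description, len(rec.seq))
```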


2019, Vol. 4 (1), pp. 697-711
Author(s): Erika Quendler

Tourism is vitally important to the Austrian economy. The number of tourist destinations, both farms and other forms of accommodation, in the different regions of Austria is constantly and considerably changing. This paper discusses the position of the ‘farm holiday’ compared to other forms of tourism. Understanding the resilience of farm holidays is especially important, but empirical research on this matter remains limited. The term ‘farm holiday’ covers staying overnight on a farm that is actively engaged in agriculture and has a maximum of 10 guest beds. The results reported in this paper are based on an analysis of secondary data from 2000 and 2018 using two types of indicator: (i) accommodation capacity (supply side) and (ii) attractiveness of a destination (demand side). The data sets cover Austria and its NUTS3 regions. The results show the evolution of farm holidays vis-à-vis other forms of tourist accommodation. In the form of a quadrant matrix, they also show the relative regional position of farm holidays. While putting the resilience of farm holidays into question, the data also reveal where farm holidays could act to expand this niche, or learn and improve to shift their position relative to the market ‘leaders’. However, there is clearly a need to learn more about farm holidays within the local context. This paper contributes to our knowledge of farm holidays from a regional point of view and elaborates on the need for further research.


1987, Vol. 65 (11), pp. 2822-2824
Author(s): W. A. Montevecchi, J. F. Piatt

We present evidence to indicate that dehydration of prey transported by seabirds from capture sites at sea to chicks at colonies inflates estimates of wet weight energy densities. These findings and a comparison of wet and dry weight energy densities reported in the literature emphasize the importance of (i) accurate measurement of the fresh weight and water content of prey, (ii) use of dry weight energy densities in comparisons among species, seasons, and regions, and (iii) cautious interpretation and extrapolation of existing data sets.


2012, Vol. 132 (2), pp. 485-487
Author(s): Matthew H. Law, Grant W. Montgomery, Kevin M. Brown, Nicholas G. Martin, Graham J. Mann, ...
