An Exact Algorithm for Finding K-Biclique Vertex Partitions of Bipartites

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.710.687 ◽

2013 ◽

Vol 710 ◽

pp. 687-691

Author(s):

Pei Qiang Liu

Keyword(s):

Data Mining ◽

Information Security ◽

Computational Biology ◽

Exact Algorithm ◽

Experimental Results ◽

Sufficient Condition ◽

Partition Problem ◽

Fast Speed ◽

Vertex Partition ◽

Vertex Partitions

Biclustering has been extensively studied in many fields, such as data mining, e-commerce, computational biology, and information security. Problems of finding bicliques in bipartite graphs, which are variants of biclustering, have received much attention in recent years because of their importance for biclustering. The k-biclique vertex partition problem, proposed by Bein et al., is one such problem. Its aim is to find k bicliques (kk) such that each vertex of the bipartite graph occurs in exactly one of these bicliques. First, we give a sufficient condition for the k-biclique vertex partition problem. Next, we present an exact algorithm for finding k-biclique vertex partitions of a bipartite graph. Finally, we propose a method for generating simulated datasets to test the algorithm. Experimental results on simulated datasets show that the algorithm can find k-biclique vertex partitions of a bipartite graph relatively quickly.
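To make the problem statement concrete, the following is a minimal sketch of a checker for a candidate k-biclique vertex partition. It is not the exact algorithm from the paper, which the abstract does not describe. The function names, the edge representation as (left, right) pairs, and the `allow_one_sided` flag (whether a part lying entirely on one side counts as a biclique) are assumptions made for illustration.

```python
from itertools import product

def is_biclique(part, left, right, edges, allow_one_sided=True):
    """Check that every left vertex in `part` is adjacent to every right vertex in it."""
    ls, rs = part & left, part & right
    if not allow_one_sided and (not ls or not rs):
        return False  # assumption: optionally reject parts that lie on one side only
    return all((u, v) in edges for u, v in product(ls, rs))

def is_k_biclique_vertex_partition(parts, left, right, edges, k, allow_one_sided=True):
    """Check that `parts` are k disjoint bicliques covering every vertex exactly once."""
    if len(parts) != k:
        return False
    covered = set()
    for part in parts:
        if not part or covered & part:
            return False  # empty part, or a vertex that appears in two parts
        if not is_biclique(part, left, right, edges, allow_one_sided):
            return False
        covered |= part
    return covered == left | right  # every vertex of the graph is covered

left, right = {"a", "b"}, {"x", "y"}
edges = {("a", "x"), ("a", "y"), ("b", "y")}
print(is_k_biclique_vertex_partition([{"a", "x"}, {"b", "y"}], left, right, edges, k=2))
```

A checker like this is useful for validating the output of any partitioning algorithm on simulated datasets, independent of how the partition was found.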

OFCOD: On the Fly Clustering Based Outlier Detection Framework

Data ◽

10.3390/data6010001 ◽

2020 ◽

Vol 6 (1) ◽

pp. 1

Author(s):

Ahmed Elmogy ◽

Hamada Rizk ◽

Amany M. Sarhan

Keyword(s):

Data Mining ◽

Image Processing ◽

Intrusion Detection ◽

Real Time ◽

Outlier Detection ◽

Real World ◽

Medical Data ◽

Experimental Results ◽

Real Time Applications ◽

Real World Datasets

In data mining, outlier detection is a major challenge because it plays an important role in many applications, such as medical data analysis, image processing, fraud detection, and intrusion detection. A wide variety of clustering-based approaches have been developed to detect outliers. However, they are time-consuming by nature, which restricts their use in real-time applications. Furthermore, outlier detection requests are handled one at a time, which means that each request is initiated individually with a particular set of parameters. In this paper, the first clustering-based outlier detection framework, On the Fly Clustering Based Outlier Detection (OFCOD), is presented. OFCOD enables analysts to find outliers effectively, on time and on request, even within huge datasets. The proposed framework has been tested and evaluated using two real-world datasets with different features and applications: one with 699 records and another with five million records. The experimental results show that the proposed framework outperforms other existing approaches on several evaluation metrics.
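As a generic illustration of clustering-based outlier detection (not the OFCOD framework itself, whose internals the abstract does not give), the sketch below runs a small k-means and flags points that lie unusually far from their nearest centroid. The function names, the fixed initial centroids, and the cutoff rule (a multiple of the median distance) are all assumptions chosen for the example.

```python
import math
from statistics import median

def kmeans(points, centroids, iterations=10):
    """Plain k-means with caller-supplied initial centroids (deterministic)."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members; keep it if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids

def find_outliers(points, centroids, factor=3.0):
    """Flag points whose distance to the nearest centroid exceeds factor * median distance."""
    dists = [min(math.dist(p, c) for c in centroids) for p in points]
    cutoff = factor * median(dists)
    return [p for p, d in zip(points, dists) if d > cutoff]

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 30)]
centroids = kmeans(points, [(0, 0), (10, 10)])
print(find_outliers(points, centroids))
```

Recomputing the clustering for every request is what makes approaches of this kind slow on large datasets, which is the cost the abstract says OFCOD aims to avoid.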

A UMDA-Based Discretization Method for Continuous Attributes

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.403-408.1834 ◽

2011 ◽

Vol 403-408 ◽

pp. 1834-1838

Author(s):

Jing Zhao ◽

Chong Zhao Han ◽

Bin Wei ◽

De Qiang Han

Keyword(s):

Machine Learning ◽

Data Mining ◽

Evolutionary Algorithms ◽

Marginal Distribution ◽

Convergence Speed ◽

Fast Convergence ◽

Experimental Results ◽

Discretization Method ◽

Bottom Up ◽

Global Dynamic

Discretization of continuous attributes plays an important role in machine learning and data mining. It can not only improve the performance of a classifier but also reduce storage space. The Univariate Marginal Distribution Algorithm (UMDA) is a modified evolutionary algorithm that has some advantages over classical evolutionary algorithms, such as fast convergence and few parameters to tune. In this paper, we propose a bottom-up, global, dynamic, and supervised discretization method based on the Univariate Marginal Distribution Algorithm. The experimental results show that the proposed method can effectively improve classifier accuracy.
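For readers unfamiliar with UMDA, here is a minimal sketch of the generic algorithm on bit strings: sample a population from independent per-bit probabilities, keep the fittest individuals, and re-estimate the probabilities from them. It is not the paper's discretization method. The bit-string encoding, the toy fitness function (counting ones), and all parameter values are assumptions; in a discretization setting each bit might, for instance, mark whether a candidate cut point is kept.

```python
import random

def estimate_marginals(selected):
    """Per-position frequency of 1s among the selected individuals."""
    n_bits = len(selected[0])
    return [sum(ind[i] for ind in selected) / len(selected) for i in range(n_bits)]

def umda(fitness, n_bits, pop_size=50, n_select=25, generations=30, seed=0):
    """Generic UMDA loop over bit strings; returns the best individual seen and the final marginals."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # start from an uninformative univariate model
    best = None
    for _ in range(generations):
        population = [[1 if rng.random() < p else 0 for p in probs] for _ in range(pop_size)]
        population.sort(key=fitness, reverse=True)
        if best is None or fitness(population[0]) > fitness(best):
            best = population[0]
        probs = estimate_marginals(population[:n_select])  # truncation selection
    return best, probs

best, probs = umda(sum, n_bits=8)  # toy fitness: number of ones
print(best, probs)
```

The only tunable parameters here are the population size, the selection size, and the number of generations, which reflects the "few parameters" advantage mentioned in the abstract.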

Data Mining Usage in Corporate Information Security: Intrusion Detection Applications

Business Systems Research Journal ◽

10.1515/bsrj-2017-0005 ◽

2017 ◽

Vol 8 (1) ◽

pp. 51-59 ◽

Cited By ~ 2

Author(s):

Masoud Al Quhtani

Keyword(s):

Data Mining ◽

Information Security ◽

Intrusion Detection ◽

Mining System ◽

Systematic Analysis ◽

New Methods ◽

Use Of Data ◽

Efficient Data ◽

Abuse Risk ◽

Corporate Information

Background: The era of globalization has brought with it the development of high technology and, consequently, new methods of preserving and storing data. New data storage techniques ensure that data are stored for longer periods of time, more efficiently, and at higher quality, but also with a higher risk of data abuse. Objective: The goal of the paper is to review data mining applications for corporate information security, and for intrusion detection in particular. Methods/approach: The review was conducted as a systematic analysis of previously published papers on the use of data mining in corporate information security. Results: The paper shows that data mining applications are extremely useful and of great importance for establishing corporate information security. Data mining applications are directly related to issues of intrusion detection and privacy protection. Conclusions: The main conclusion of this study is that corporations can establish a sustainable and efficient data mining system that ensures privacy and successful protection against unwanted intrusions.

Data Mining in Computational Biology

Encyclopedia of Database Systems ◽

10.1007/978-0-387-39940-9_2367 ◽

2009 ◽

pp. 599-599

Keyword(s):

Data Mining ◽

Computational Biology

Improved minimum-minimum roughness algorithm for clustering categorical data

International Journal of ADVANCED AND APPLIED SCIENCES ◽

10.21833/ijaas.2021.10.006 ◽

2021 ◽

Vol 8 (10) ◽

pp. 43-50

Author(s):

Truong et al.

Keyword(s):

Machine Learning ◽

Data Mining ◽

Hierarchical Clustering ◽

Categorical Data ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Experimental Results ◽

Data Sets ◽

Top Down ◽

Hierarchical Clustering Algorithm

Clustering is a fundamental technique in data mining and machine learning. Recently, many researchers have become interested in the problem of clustering categorical data, and several new approaches have been proposed. One of the successful and pioneering clustering algorithms is the Minimum-Minimum Roughness (MMR) algorithm, a top-down hierarchical clustering algorithm that can handle uncertainty in clustering categorical data. However, MMR tends to choose the category with less value leaf node with more objects, leading to undesirable clustering results. To overcome this shortcoming, this paper proposes an improved version of the MMR algorithm for clustering categorical data, called IMMR (Improved Minimum-Minimum Roughness). Experimental results on real data sets taken from UCI show that the IMMR algorithm outperforms MMR in clustering categorical data.
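The roughness notion behind MMR comes from rough set theory. As a hedged illustration, the sketch below computes the standard rough-set roughness of a target set with respect to the equivalence classes induced by one categorical attribute: one minus the ratio of the lower approximation size to the upper approximation size. It does not reproduce the MMR or IMMR splitting procedure; the function names and the toy data are assumptions.

```python
from collections import defaultdict

def equivalence_classes(objects, attribute):
    """Group object names by their value on one categorical attribute."""
    classes = defaultdict(set)
    for name, row in objects.items():
        classes[row[attribute]].add(name)
    return list(classes.values())

def roughness(target, classes):
    """Rough-set roughness: 1 - |lower approximation| / |upper approximation|."""
    lower = set().union(*(c for c in classes if c <= target))  # classes fully inside target
    upper = set().union(*(c for c in classes if c & target))   # classes that touch target
    return 1 - len(lower) / len(upper) if upper else 0.0

objects = {
    "o1": {"color": "red", "size": "small"},
    "o2": {"color": "red", "size": "large"},
    "o3": {"color": "blue", "size": "large"},
    "o4": {"color": "blue", "size": "large"},
}
large = {name for name, row in objects.items() if row["size"] == "large"}
print(roughness(large, equivalence_classes(objects, "color")))
```

A roughness of 0 means the attribute describes the target set exactly, while values closer to 1 mean the attribute says little about it.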

An Information-Cybernetic Approach to Designing a Pedagogical Experiment

Revistă de Ştiinţe Socio-Umane = Journal of Social and Human Sciences ◽

10.46727/jshs.2021.v48.i2.p111-124 ◽

2021 ◽

Vol 48 (2) ◽

pp. 111-124

Author(s):

Виолетта Богданова

Keyword(s):

Information Security ◽

Experimental Results ◽

Whitney Test ◽

Mann Whitney Test ◽

Angular Transformation ◽

Teaching Information

The stages of a pedagogical experiment on teaching information security to future economists are presented from a cybernetic and informational perspective. The experimental results were processed statistically and evaluated using the nonparametric Mann-Whitney test and the φ* criterion (Fisher's angular transformation).
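The two statistics named above can be sketched as follows. The Mann-Whitney U statistic counts, over all cross-sample pairs, how often a value from one sample exceeds a value from the other (ties count one half). The φ* criterion compares two proportions after the transformation φ = 2·arcsin(√p). The function names and sample values are assumptions for illustration; the abstract does not report the study's data, and critical values still have to be looked up in the usual tables.

```python
import math

def mann_whitney_u(x, y):
    """Smaller of the two U statistics for samples x and y (ties count 0.5)."""
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u_y = len(x) * len(y) - u_x
    return min(u_x, u_y)

def fisher_phi_star(p1, n1, p2, n2):
    """Empirical phi* for proportions p1, p2 observed in samples of size n1, n2."""
    phi1 = 2 * math.asin(math.sqrt(p1))
    phi2 = 2 * math.asin(math.sqrt(p2))
    return abs(phi1 - phi2) * math.sqrt(n1 * n2 / (n1 + n2))

control = [3, 4, 4, 5]        # hypothetical test scores
experimental = [4, 5, 5, 5]   # hypothetical test scores
print(mann_whitney_u(control, experimental))
print(fisher_phi_star(0.75, 20, 0.40, 20))
```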

Research on Data Mining Optimization and Security Based on MapReduce

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.631-632.1053 ◽

2014 ◽

Vol 631-632 ◽

pp. 1053-1056

Author(s):

Hui Xia

Keyword(s):

Data Mining ◽

Execution Time ◽

Cluster Computing ◽

Limited Resource ◽

Experimental Results ◽

Computing Environment ◽

Cluster Systems ◽

National Education ◽

Distributed Cluster ◽

Data Optimization

The paper addresses the problem of limited resources for data optimization with respect to the efficiency, reliability, scalability, and security of data in distributed cluster systems with huge datasets. The experimental results indicate that the MapReduce tool developed in the study improves data optimization. The system exhibits undesirable speedup with smaller datasets, but reasonable speedup is achieved with sufficiently large datasets that match the number of computing nodes, reducing execution time by 30% compared with normal data mining and processing. The MapReduce tool handles data growth well, especially with a larger number of computing nodes. Scaleup grows gracefully as the data and the number of computing nodes increase. Data security is guaranteed at all computing nodes because data is replicated across nodes of the cluster system, which also makes it reliable. Our MapReduce implementation runs in the distributed cluster computing environment of a national education web portal and is highly scalable.
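The MapReduce programming model referred to above can be sketched in a single process as three phases: map each record to key-value pairs, group the pairs by key, and reduce each group. The example is the classic word count, not the tool from the paper, and it runs on one machine rather than a cluster. All function names are assumptions. The `speedup` helper shows the usual ratio of baseline time to parallel time; a 30% reduction in execution time corresponds to a speedup of 1 / 0.7, roughly 1.43.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to every record and flatten the emitted (key, value) pairs."""
    return [pair for record in records for pair in mapper(record)]

def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}

def word_count(lines):
    mapped = map_phase(lines, lambda line: [(word, 1) for word in line.split()])
    return reduce_phase(shuffle_phase(mapped), lambda key, values: sum(values))

def speedup(baseline_time, parallel_time):
    """Ratio of baseline execution time to parallel execution time."""
    return baseline_time / parallel_time

print(word_count(["a b a", "b c"]))
print(speedup(100.0, 70.0))
```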

Mining of top-k high utility itemsets with negative utility

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-201357 ◽

2020 ◽

pp. 1-16

Author(s):

Rui Sun ◽

Meng Han ◽

Chunyan Zhang ◽

Mingyao Shen ◽

Shiyu Du

Keyword(s):

Data Mining ◽

Search Space ◽

Experimental Results ◽

Effective Algorithm ◽

Memory Usage ◽

Utility Value ◽

Itemset Mining ◽

High Utility ◽

High Utility Itemsets

High utility itemset mining (HUIM) with negative utility is an emerging data mining task. However, setting the minimum utility threshold is always a challenge when mining high utility itemsets (HUIs) with negative items. Although the top-k HUIM method is very common, it can only mine itemsets with positive items, and itemsets are missed when it is applied to itemsets with negative items. To solve this problem, we first propose an effective algorithm called THN (Top-k High Utility Itemset Mining with Negative Utility). THN uses a strategy for automatically raising the minimum utility threshold. To avoid multiple scans of the database, it uses transaction merging and dataset projection. It prunes the search space using a redefined sub-tree utility value and a redefined local utility value. Experimental results on real datasets show that THN is efficient in terms of runtime and memory usage and has excellent scalability. Moreover, experiments show that THN performs particularly well on dense datasets.
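To illustrate the task itself, the sketch below finds the top-k itemsets by total utility through brute-force enumeration, with a min-heap whose smallest entry acts as an automatically rising utility threshold. It is not THN: it has no transaction merging, dataset projection, or sub-tree and local utility pruning, and it is only practical for tiny inputs. The representation of a transaction as a mapping from item to its utility in that transaction (possibly negative) and the function names are assumptions.

```python
import heapq
from itertools import combinations

def itemset_utility(itemset, transactions):
    """Sum of item utilities over transactions containing the whole itemset; None if it never occurs."""
    supporting = [t for t in transactions if all(item in t for item in itemset)]
    if not supporting:
        return None
    return sum(t[item] for t in supporting for item in itemset)

def top_k_high_utility(transactions, k):
    """Brute-force top-k itemsets by utility, highest first."""
    items = sorted({item for t in transactions for item in t})
    heap = []  # min-heap of (utility, itemset); heap[0] is the current threshold once full
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions)
            if utility is None:
                continue
            if len(heap) < k:
                heapq.heappush(heap, (utility, itemset))
            elif utility > heap[0][0]:
                heapq.heapreplace(heap, (utility, itemset))  # threshold rises
    return sorted(heap, reverse=True)

transactions = [
    {"a": 5, "b": -2, "c": 1},
    {"a": 4, "c": 3},
    {"b": -1, "c": 2},
]
print(top_k_high_utility(transactions, k=3))
```

With negative utilities, adding an item can lower an itemset's utility, so upper bounds that assume non-negative utilities no longer hold. This is why pruning strategies in this setting have to be redefined.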

Emerging Technologies in Data Mining and Information Security

10.1007/978-981-13-1951-8 ◽

2019 ◽

Keyword(s):

Data Mining ◽

Information Security ◽

Emerging Technologies

Mining Allocating Patterns in Investment Portfolios

Data Mining Applications for Empowering Knowledge Societies ◽

10.4018/978-1-59904-657-0.ch007 ◽

2009 ◽

pp. 110-135

Author(s):

Yanbo J. Wang ◽

Xinwei Zheng ◽

Frans Coenen

Keyword(s):

Data Mining ◽

Portfolio Management ◽

Association Rule ◽

Common Type ◽

Experimental Results ◽

Transaction Database ◽

Investment Portfolios ◽

The Given

An association rule (AR) is a common type of mined knowledge in data mining that describes an implicative co-occurring relationship between two sets of binary-valued transaction-database attributes, expressed in the form of an “antecedent ⇒ consequent” rule. A variation of ARs is weighted association rules (WARs), which address the weighting issue in ARs. In this chapter, the authors introduce the concept of the “one-sum” WAR and name such WARs allocating patterns (ALPs). An algorithm is proposed to extract hidden and interesting ALPs from data. The authors further indicate that ALPs can be applied in portfolio management. First, they model a collection of investment portfolios as a one-sum weighted transaction database that contains hidden ALPs. Second, they show that ALPs mined from the given portfolio data can be applied to guide future investment activities. The experimental results show good performance, which demonstrates the effectiveness of using ALPs in the proposed application.
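As background for the rule notation, this is a minimal sketch of the two standard measures for an association rule, support and confidence, over transactions represented as sets. It also includes a helper that rescales weights so they sum to one, which is one plausible reading of a "one-sum" weighted transaction (for example, the fractions of a portfolio allocated to each asset). That reading, the function names, and the sample data are assumptions; the chapter's ALP mining algorithm is not reproduced here.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent) / support(antecedent); antecedent must occur at least once."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

def to_one_sum(weights):
    """Rescale item weights so they sum to 1 (assumed meaning of a one-sum transaction)."""
    total = sum(weights.values())
    return {item: weight / total for item, weight in weights.items()}

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
print(to_one_sum({"stocks": 60, "bonds": 30, "cash": 10}))
```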
