An Exact Algorithm for Finding K-Biclique Vertex Partitions of Bipartites

2013 ◽  
Vol 710 ◽  
pp. 687-691
Author(s):  
Pei Qiang Liu

Biclustering has been extensively studied in many fields such as data mining, e-commerce, computational biology, information security, etc. Problems of finding bicliques in bipartite, which are variants of biclustering, have received much attention in recent years due to its importance for biclustering. The k-biclique vertex partition problem proposed by Bein et al. is one of finding bicliques problems in bipartite. Its aim is to find k bicliques (kk) such that each vertex of the bipartite occurs in exactly one member of these bicliques. First, we give a sufficient condition of the k-biclique vertex partition problem. Moreover, we present an exact algorithm for finding k-biclique vertex partitions of a bipartite. Finally, we propose a method to generate simulated datasets used to test the algorithm. Experimental results on simulated datasets show that the algorithm can find k-biclique vertex partitions of a bipartite with relatively fast speed.

Data ◽  
2020 ◽  
Vol 6 (1) ◽  
pp. 1
Author(s):  
Ahmed Elmogy ◽  
Hamada Rizk ◽  
Amany M. Sarhan

In data mining, outlier detection is a major challenge as it has an important role in many applications such as medical data, image processing, fraud detection, intrusion detection, and so forth. An extensive variety of clustering based approaches have been developed to detect outliers. However they are by nature time consuming which restrict their utilization with real-time applications. Furthermore, outlier detection requests are handled one at a time, which means that each request is initiated individually with a particular set of parameters. In this paper, the first clustering based outlier detection framework, (On the Fly Clustering Based Outlier Detection (OFCOD)) is presented. OFCOD enables analysts to effectively find out outliers on time with request even within huge datasets. The proposed framework has been tested and evaluated using two real world datasets with different features and applications; one with 699 records, and another with five millions records. The experimental results show that the performance of the proposed framework outperforms other existing approaches while considering several evaluation metrics.


2011 ◽  
Vol 403-408 ◽  
pp. 1834-1838
Author(s):  
Jing Zhao ◽  
Chong Zhao Han ◽  
Bin Wei ◽  
De Qiang Han

Discretization of continuous attributes have played an important role in machine learning and data mining. They can not only improve the performance of the classifier, but also reduce the space of the storage. Univariate Marginal Distribution Algorithm is a modified Evolutionary Algorithms, which has some advantages over classical Evolutionary Algorithms such as the fast convergence speed and few parameters need to be tuned. In this paper, we proposed a bottom-up, global, dynamic, and supervised discretization method on the basis of Univariate Marginal Distribution Algorithm.The experimental results showed that the proposed method could effectively improve the accuracy of classifier.


2017 ◽  
Vol 8 (1) ◽  
pp. 51-59 ◽  
Author(s):  
Masoud Al Quhtani

AbstractBackground: The globalization era has brought with it the development of high technology, and therefore new methods of preserving and storing data. New data storing techniques ensure data are stored for longer periods of time, more efficiently and with a higher quality, but also with a higher data abuse risk. Objective: The goal of the paper is to provide a review of the data mining applications for the purpose of corporate information security, and intrusion detection in particular. Methods/approach: The review was conducted using the systematic analysis of the previously published papers on the usage of data mining in the field of corporate information security. Results: This paper demonstrates that the use of data mining applications is extremely useful and has a great importance for establishing corporate information security. Data mining applications are directly related to issues of intrusion detection and privacy protection. Conclusions: The most important fact that can be specified based on this study is that corporations can establish a sustainable and efficient data mining system that will ensure privacy and successful protection against unwanted intrusions.


2021 ◽  
Vol 8 (10) ◽  
pp. 43-50
Author(s):  
Truong et al. ◽  

Clustering is a fundamental technique in data mining and machine learning. Recently, many researchers are interested in the problem of clustering categorical data and several new approaches have been proposed. One of the successful and pioneering clustering algorithms is the Minimum-Minimum Roughness algorithm (MMR) which is a top-down hierarchical clustering algorithm and can handle the uncertainty in clustering categorical data. However, MMR tends to choose the category with less value leaf node with more objects, leading to undesirable clustering results. To overcome such shortcomings, this paper proposes an improved version of the MMR algorithm for clustering categorical data, called IMMR (Improved Minimum-Minimum Roughness). Experimental results on actual data sets taken from UCI show that the IMMR algorithm outperforms MMR in clustering categorical data.


Author(s):  
Виолетта Богданова

The stages of a pedagogical experiment on teaching information security of future economists are presented from a cybernetic informational perspective. The experimental results were statistically processed, evaluated using the nonparametric Mann-Whitney test and the φ *criterion - Fisher's angular transformation.


2014 ◽  
Vol 631-632 ◽  
pp. 1053-1056
Author(s):  
Hui Xia

The paper addressed the issues of limited resource for data optimization for efficiency, reliability, scalability and security of data in distributed, cluster systems with huge datasets. The study’s experimental results predicted that the MapReduce tool developed improved data optimization. The system exhibits undesired speedup with smaller datasets, but reasonable speedup is achieved with a larger enough datasets that complements the number of computing nodes reducing the execution time by 30% as compared to normal data mining and processing. The MapReduce tool is able to handle data growth trendily, especially with larger number of computing nodes. Scaleup gracefully grows as data and number of computing nodes increases. Security of data is guaranteed at all computing nodes since data is replicated at various nodes on the cluster system hence reliable. Our implementation of the MapReduce runs on distributed cluster computing environment of a national education web portal and is highly scalable.


2020 ◽  
pp. 1-16
Author(s):  
Rui Sun ◽  
Meng Han ◽  
Chunyan Zhang ◽  
Mingyao Shen ◽  
Shiyu Du

High utility itemset mining(HUIM) with negative utility is an emerging data mining task. However, the setting of the minimum utility threshold is always a challenge when mining high utility itemsets(HUIs) with negative items. Although the top-k HUIM method is very common, this method can only mine itemsets with positive items, and the problem of missing itemsets occurs when mining itemsets with negative items. To solve this problem, we first propose an effective algorithm called THN (Top-k High Utility Itemset Mining with Negative Utility). It proposes a strategy for automatically increasing the minimum utility threshold. In order to solve the problem of multiple scans of the database, it uses transaction merging and dataset projection technology. It uses a redefined sub-tree utility value and a redefined local utility value to prune the search space. Experimental results on real datasets show that THN is efficient in terms of runtime and memory usage, and has excellent scalability. Moreover, experiments show that THN performs particularly well on dense datasets.


Author(s):  
Yanbo J. Wang ◽  
Xinwei Zheng ◽  
Frans Coenen

An association rule (AR) is a common type of mined knowledge in data mining that describes an implicative co-occurring relationship between two sets of binary-valued transaction-database attributes, expressed in the form of an ? rule. A variation of ARs is the (WARs), which addresses the weighting issue in ARs. In this chapter, the authors introduce the concept of “one-sum” WAR and name such WARs as allocating patterns (ALPs). An algorithm is proposed to extract hidden and interesting ALPs from data. The authors further indicate that ALPs can be applied in portfolio management. Firstly by modelling a collection of investment portfolios as a one-sum weighted transaction- database that contains hidden ALPs. Secondly the authors show that ALPs, mined from the given portfolio-data, can be applied to guide future investment activities. The experimental results show good performance that demonstrates the effectiveness of using ALPs in the proposed application.


Sign in / Sign up

Export Citation Format

Share Document