Parallel algorithm research of graph search and depth learning based on data mining

Abstract: Efficient processing of large-scale data has significant practical value. In this study, a data mining platform based on the Hadoop Distributed File System (HDFS) was designed, and the K-means algorithm was improved using the max-min distance idea. The improved algorithm was then parallelized with MapReduce on the HDFS platform. Finally, its data processing performance was analyzed on the Iris data set. The results showed that the parallel algorithm classified more samples correctly than the traditional algorithm. In the single-machine environment, the parallel algorithm took longer to run; on large data sets, the traditional algorithm ran out of memory, whereas the parallel algorithm completed the computation. The speedup ratio of the parallel algorithm increased as the cluster size and data set size grew, indicating good parallel performance. The experimental results verify the reliability of the parallel algorithm in big data processing and contribute to further improving the efficiency of data mining.
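The abstract does not spell out the improved algorithm, so the following is only a minimal sketch of the general idea, assuming the common max-min (farthest-point) heuristic for choosing initial centers and a map/reduce-shaped K-means iteration simulated in a single process. The function names (`maxmin_init`, `map_assign`, `reduce_update`, `kmeans`) are hypothetical and are not taken from the paper or from Hadoop.

```python
import math
from collections import defaultdict


def maxmin_init(points, k):
    """Assumed max-min seeding: start from the first point, then repeatedly
    add the point whose distance to its nearest chosen center is largest."""
    centers = [points[0]]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(farthest)
    return centers


def map_assign(chunk, centers):
    """Map step: for each point in one data chunk, emit
    (index of nearest center, (point, 1))."""
    for p in chunk:
        idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        yield idx, (p, 1)


def reduce_update(pairs, centers):
    """Reduce step: sum the points assigned to each center and return the means."""
    sums, counts = {}, defaultdict(int)
    for idx, (p, n) in pairs:
        acc = sums.get(idx, [0.0] * len(p))
        sums[idx] = [a + x for a, x in zip(acc, p)]
        counts[idx] += n
    # A center that received no points keeps its old position.
    return [
        tuple(s / counts[i] for s in sums[i]) if i in sums else centers[i]
        for i in range(len(centers))
    ]


def kmeans(chunks, k, iterations=10):
    """Run K-means over data split into chunks (stand-ins for input splits)."""
    points = [p for chunk in chunks for p in chunk]
    centers = maxmin_init(points, k)
    for _ in range(iterations):
        pairs = [kv for chunk in chunks for kv in map_assign(chunk, centers)]
        centers = reduce_update(pairs, centers)
    return centers


if __name__ == "__main__":
    chunks = [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]
    print(kmeans(chunks, k=2))
```

In a real MapReduce job each chunk would be processed by a separate mapper and the per-center sums would be combined by reducers; here the chunks are simply iterated in sequence to show the data flow.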

The parallel algorithm of Clique and it's application on Data Mining Grid system

2009 IEEE International Conference on Network Infrastructure and Digital Content ◽

10.1109/icnidc.2009.5361004 ◽

2009 ◽

Author(s):

Ping Chen ◽

Zhen Liu ◽

Xiuquan Qiao ◽

Xiaoping Tian

Keyword(s):

Data Mining ◽

Parallel Algorithm ◽

Grid System

Parallel Computing for Mining Association Rules in Distributed P2P Networks

E-Activity and Intelligent Web Construction - Advances in Web Technologies and Engineering ◽

10.4018/978-1-61520-871-5.ch005 ◽

2011 ◽

pp. 47-62

Author(s):

Huiwei Guan

Keyword(s):

Data Mining ◽

Parallel Computing ◽

Distributed Computing ◽

Parallel Algorithm ◽

Association Rules ◽

Distributed Databases ◽

Computing System ◽

Computing Systems ◽

P2p Computing ◽

P2p Systems

Distributed computing and peer-to-peer (P2P) systems have emerged as an active research field that combines techniques from networking, distributed computing, distributed databases, and various distributed applications. Such systems realize information systems that scale to voluminous information across very large numbers of participating nodes. Data mining on large distributed databases is an important research area. Most recent work on mining association rules has focused on a single machine or a client-server network model. This traditional approach, however, does not satisfy the requirements of large distributed databases and applications in a P2P computing system. Two important challenges arise: how to implement data mining for large distributed databases in P2P computing systems, and how to develop parallel data mining algorithms and tools for distributed P2P computing systems to improve efficiency. In this chapter, a parallel association rule mining approach for a P2P computing system is designed and implemented; it suits the distributed nature of the P2P computing system and makes parallel computing practical. The performance of the parallel algorithm is analyzed, evaluated, and compared with that of the sequential algorithm, showing that the parallel algorithm features consistent implementation, higher performance, and good scalability.
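The chapter's own algorithm is not described in this abstract. As a rough illustration of how association rule mining can be split across nodes, the sketch below assumes a simple count-distribution scheme: each peer counts candidate itemsets over its local transactions, and the local counts are summed to obtain global support. The names `local_counts` and `frequent_itemsets` are hypothetical, and the peers are simulated as in-memory lists.

```python
from collections import Counter
from itertools import combinations


def local_counts(transactions, size):
    """Count every itemset of the given size within one peer's local partition."""
    counts = Counter()
    for t in transactions:
        for itemset in combinations(sorted(set(t)), size):
            counts[itemset] += 1
    return counts


def frequent_itemsets(partitions, size, min_support):
    """Merge per-peer counts and keep itemsets whose global support
    (fraction of all transactions) reaches min_support."""
    total = Counter()
    for part in partitions:  # each call could run on a different peer
        total.update(local_counts(part, size))
    n = sum(len(part) for part in partitions)
    return {s: c / n for s, c in total.items() if c / n >= min_support}


if __name__ == "__main__":
    peers = [
        [["a", "b"], ["a", "c"]],
        [["a", "b", "c"], ["b"]],
    ]
    print(frequent_itemsets(peers, size=2, min_support=0.5))
```

Only the compact count tables need to be exchanged between nodes in this scheme, not the transactions themselves, which is one reason it is a common starting point for distributed settings.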

Parallel Algorithm for Spatial Data Mining Using CUDA

JOURNAL OF ADVANCED INFORMATION TECHNOLOGY AND CONVERGENCE ◽

10.14801/jaitc.2019.9.2.89 ◽

2019 ◽

Vol 9 (2) ◽

pp. 89-97

Author(s):

Byoung-Woo Oh

Keyword(s):

Data Mining ◽

Parallel Algorithm ◽

Spatial Data ◽

Spatial Data Mining

Implementation of Parallel Algorithm Technology for Time Series Data Mining

Journal of Physics Conference Series ◽

10.1088/1742-6596/2066/1/012043 ◽

2021 ◽

Vol 2066 (1) ◽

pp. 012043

Author(s):

Wei Wang ◽

Xiaohui Hu ◽

Mingye Wang ◽

Yao Du

Keyword(s):

Data Mining ◽

Time Series ◽

Data Analysis ◽

Parallel Algorithm ◽

Time Series Data ◽

Internet Technology ◽

Series Data ◽

Data Set ◽

Time Series Data Mining ◽

Analysis Tools

Abstract: With the rapid development of computer, Internet, and artificial intelligence technology, the amount of global data has exploded. However, the single-machine serial mode of traditional data mining cannot be transplanted directly to the cloud platform. Cloud computing platforms and data mining can be combined effectively only by parallelizing and improving many classic data mining algorithms. Research on and implementation of parallel algorithm technology for time series data mining is therefore of great significance, and it is the subject of this paper. The paper uses literature review, mathematical statistics, logical analysis, and other research methods, mainly to explore time series data mining and visualization technology. It embodies the design ideas of big data analysis tools and demonstrates the power and market value of such tools through the platform's display. The research shows that, on the same data set and in the same experimental environment, the improved parallel collaborative filtering algorithm ACF proposed in this paper runs faster than the parallel algorithm MCF based on the co-occurrence matrix, and the time difference becomes more pronounced as the data set grows.
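The abstract names a co-occurrence-matrix baseline (MCF) without describing it. The sketch below shows one generic way co-occurrence-based collaborative filtering is often done, assuming items are scored by how often they appear together with items a user already has. It is not the paper's MCF or ACF algorithm, and `cooccurrence` and `recommend` are hypothetical names.

```python
from collections import defaultdict
from itertools import permutations


def cooccurrence(user_items):
    """Build co[a][b] = number of users whose item set contains both a and b."""
    co = defaultdict(lambda: defaultdict(int))
    for items in user_items.values():
        for a, b in permutations(set(items), 2):
            co[a][b] += 1
    return co


def recommend(user, user_items, co, top_n=3):
    """Score unseen items by summed co-occurrence with the user's items."""
    seen = set(user_items[user])
    scores = defaultdict(int)
    for item in seen:
        for other, count in co[item].items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


if __name__ == "__main__":
    data = {
        "u1": ["x", "y"],
        "u2": ["x", "y", "z"],
        "u3": ["y", "z", "w"],
    }
    print(recommend("u1", data, cooccurrence(data)))
```

Building the matrix is the step that parallelizes naturally, since each user's item pairs can be emitted independently and summed afterwards.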

Parallel Computing for Mining Association Rules in Distributed P2P Networks

Data Mining ◽

10.4018/978-1-4666-2455-9.ch006 ◽

2013 ◽

pp. 107-124

Author(s):

Huiwei Guan

Keyword(s):

Data Mining ◽

Parallel Computing ◽

Distributed Computing ◽

Parallel Algorithm ◽

Association Rules ◽

Distributed Databases ◽

Computing System ◽

Computing Systems ◽

P2p Computing ◽

P2p Systems

Distributed computing and peer-to-peer (P2P) systems have emerged as an active research field that combines techniques from networking, distributed computing, distributed databases, and various distributed applications. Such systems realize information systems that scale to voluminous information across very large numbers of participating nodes. Data mining on large distributed databases is an important research area. Most recent work on mining association rules has focused on a single machine or a client-server network model. This traditional approach, however, does not satisfy the requirements of large distributed databases and applications in a P2P computing system. Two important challenges arise: how to implement data mining for large distributed databases in P2P computing systems, and how to develop parallel data mining algorithms and tools for distributed P2P computing systems to improve efficiency. In this chapter, a parallel association rule mining approach for a P2P computing system is designed and implemented; it suits the distributed nature of the P2P computing system and makes parallel computing practical. The performance of the parallel algorithm is analyzed, evaluated, and compared with that of the sequential algorithm, showing that the parallel algorithm features consistent implementation, higher performance, and good scalability.

Machine Learning and Data mining on the innovation of E-sports industry

International Journal of Education and Information Technologies ◽

10.46300/9109.2020.14.15 ◽

2020 ◽

Vol 14 ◽

Keyword(s):

Machine Learning ◽

Data Mining ◽

Traditional Education ◽

Good Prediction ◽

Experimental Platform ◽

Sports Industry ◽

Advantages And Disadvantages ◽

Future Education ◽

Different Levels ◽

Depth Learning

AI technology brings many revolutionary innovation opportunities to the e-sports industry. With data mining, we can analyze the strengths and weaknesses of competitors and predict how the situation will develop. An agent created by intensive in-depth learning can assist players of different levels in routine training, improving the overall activity of the game. With its big data advantage, AI can also assist e-sports teaching, treating the e-sports specialty as an experimental platform for using cutting-edge technology to reform and innovate traditional education and to provide forward-looking guidance for future education. This paper uses three models (CNN, LSTM, and LSTM + CNN) to predict the outcome of a game from the heroes selected by both teams, and achieves good prediction results.
