3A1-T03 Large-scale database using the Hadoop Distributed File System and RT-Middleware (RT Middleware and Open Systems)

2014 ◽  
Vol 2014 (0) ◽  
pp. _3A1-T03_1-_3A1-T03_2
Author(s):  
Isao HARA ◽  
Seisho IRIE ◽  
Mamoru SEKIYAMA ◽  
Tamio TANIKAWA
2021 ◽  
Vol 30 (1) ◽  
pp. 479-486
Author(s):  
Lingrui Bu ◽  
Hui Zhang ◽  
Haiyan Xing ◽  
Lijun Wu

Abstract Efficient processing of large-scale data has considerable practical value. In this study, a data mining platform based on the Hadoop Distributed File System (HDFS) was designed, and the K-means algorithm was improved using the max-min distance idea. The improved algorithm was parallelized on the HDFS platform with MapReduce, and its data processing performance was evaluated on the Iris data set. The results showed that the parallel algorithm classified more samples correctly than the traditional algorithm. In a single-machine environment, the parallel algorithm took longer to run; on large data sets, the traditional algorithm ran out of memory, whereas the parallel algorithm completed the computation. The speedup ratio of the parallel algorithm increased with cluster size and data set size, indicating a good parallel effect. The experimental results verify the reliability of the parallel algorithm for big data processing and contribute to further improving the efficiency of data mining.
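The max-min distance idea mentioned in the abstract above can be sketched as a seeding step for K-means. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `max_min_centers`, the choice of the first point as the initial center, and the use of Euclidean distance are all hypothetical choices made here. Each further center is the point that lies farthest from its nearest already-chosen center.

```python
import math

def max_min_centers(points, k):
    """Pick k initial centers by the max-min distance rule (illustrative sketch).

    Assumption: the first point seeds the selection; published variants
    may choose the seed differently or stop on a distance threshold.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centers = [points[0]]
    # nearest[i] holds the distance from points[i] to its closest chosen center
    nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # the next center is the point whose nearest-center distance is largest
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(d, math.dist(p, points[idx]))
                   for p, d in zip(points, nearest)]
    return centers

sample = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 8)]
print(max_min_centers(sample, 3))
```

Spreading the initial centers apart in this way is intended to reduce the sensitivity of K-means to a poor random start; the resulting centers would then be passed to the usual assignment and update iterations.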


2018 ◽  
Vol 3 (1) ◽  
pp. 49-60
Author(s):  
M. Elshayeb ◽  
Leelavathi Rajamanickam

Big Data refers to large-scale information management and analysis technologies that exceed the capability of traditional data processing technologies. To analyse complex data and identify patterns, it is very important to securely store, manage, and share large amounts of complex data. In recent years, databases have grown in size across various forms (text, images, and videos), in huge volumes and at high velocity, and Internet services that depend on big data (data-intensive services) have come to the forefront. Apache's Hadoop Distributed File System (HDFS) is emerging as a leading software component for cloud computing, together with integrated components such as MapReduce. Hadoop is an open-source implementation of Google's MapReduce that includes a distributed file system and presents software programmers with the map and reduce abstraction. This research presents security approaches for the Hadoop Distributed File System in Big Data settings and the best security solution; it also aims to help businesses through big data visualization, which supports better data analysis. In today's data-centric world, big-data processing and analytics have become critical to most enterprise and government applications.


2010 ◽  
Vol 30 (8) ◽  
pp. 2060-2065
Author(s):  
Ning CAO ◽  
Zhong-hai WU ◽  
Hong-zhi LIU ◽  
Qi-xun ZHANG

2020 ◽  
Vol 1444 ◽  
pp. 012012
Author(s):  
Meisuchi Naisuty ◽  
Achmad Nizar Hidayanto ◽  
Nabila Clydea Harahap ◽  
Ahmad Rosyiq ◽  
Agus Suhanto ◽  
...  

2016 ◽  
pp. 1220-1243
Author(s):  
Ilias K. Savvas ◽  
Georgia N. Sofianidou ◽  
M-Tahar Kechadi

Big data refers to data sets whose size is beyond the capabilities of most current hardware and software technologies. The Apache Hadoop software library is a framework for distributed processing of large data sets; HDFS is a distributed file system that provides high-throughput access to data for data-driven applications, and MapReduce is a software framework for distributed computing on large data sets. Huge collections of raw data require fast and accurate mining processes in order to extract useful knowledge. One of the most popular data mining techniques is the K-means clustering algorithm. In this study, the authors develop a distributed version of the K-means algorithm using the MapReduce framework on the Hadoop Distributed File System. The theoretical and experimental results demonstrate the efficiency of the technique; thus, HDFS and MapReduce can be applied to big data with very promising results.
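A distributed K-means of the kind described above is commonly split into a map step that assigns each point to its nearest centroid and a reduce step that averages the points assigned to each centroid. The sketch below simulates that split in a single Python process; it does not use Hadoop, and the function names `map_phase`, `reduce_phase`, and `kmeans`, along with the `max_iter` and `tol` parameters, are hypothetical names chosen for illustration rather than anything taken from the study.

```python
import math
from collections import defaultdict

def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every input point."""
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
        yield nearest, p

def reduce_phase(pairs, centroids):
    """Reduce: average the points grouped under each centroid index."""
    groups = defaultdict(list)
    for key, p in pairs:  # grouping by key stands in for the shuffle step
        groups[key].append(p)
    updated = list(centroids)  # a centroid with no assigned points is kept
    for key, members in groups.items():
        dim = len(members[0])
        updated[key] = tuple(sum(m[d] for m in members) / len(members)
                             for d in range(dim))
    return updated

def kmeans(points, centroids, max_iter=100, tol=1e-9):
    """Repeat map and reduce until no centroid moves by more than tol."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        shift = max(math.dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids

data = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(kmeans(data, [(0, 0), (10, 0)]))
```

On a real cluster, each iteration would typically run as one MapReduce job, with the current centroids distributed to every mapper and the reducers writing the updated centroids back for the next iteration.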

