Efficient Indexing RDF Query Algorithm for Big Data

With the rapid development of information technology, data is growing explosively, and handling large-scale data has become increasingly important. Based on the characteristics of RDF data, we propose a method to compress RDF data. We then construct an index structure called the PAR-Tree Index and execute queries on the MapReduce parallel computing framework using this index. Experimental results show that the algorithm improves the efficiency of queries over large data.
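The abstract does not describe the PAR-Tree layout or its compression scheme, so the Python sketch below only illustrates two generic ingredients under stated assumptions. The first is dictionary encoding of RDF terms into integer IDs, a common RDF compression technique assumed here. The second is a MapReduce-style join of two triple patterns that share a subject variable, simulated in a single process. All function names and sample triples are hypothetical.

```python
from collections import defaultdict


def dictionary_encode(triples):
    """Replace every distinct RDF term with a small integer ID.

    Dictionary encoding is an assumed, generic compression step; it is not
    necessarily the scheme used together with the PAR-Tree Index.
    """
    term_to_id = {}
    encoded = []
    for triple in triples:
        ids = []
        for term in triple:
            if term not in term_to_id:
                term_to_id[term] = len(term_to_id)
            ids.append(term_to_id[term])
        encoded.append(tuple(ids))
    return term_to_id, encoded


def map_phase(encoded, p1, p2):
    """Map: for patterns (?x p1 ?y) and (?x p2 ?z), emit (?x, tagged binding)."""
    for s, p, o in encoded:
        if p == p1:
            yield s, ("L", o)
        if p == p2:
            yield s, ("R", o)


def reduce_phase(pairs):
    """Shuffle and reduce: group by ?x, then pair every left with every right binding."""
    groups = defaultdict(lambda: ([], []))
    for key, (side, value) in pairs:
        groups[key][0 if side == "L" else 1].append(value)
    return [(x, y, z)
            for x, (lefts, rights) in groups.items()
            for y in lefts
            for z in rights]


# Hypothetical sample data: whom does each person know, and what is their age?
triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:age", "30"),
    ("ex:bob", "ex:age", "25"),
    ("ex:carol", "ex:knows", "ex:alice"),
]
term_to_id, encoded = dictionary_encode(triples)
rows = reduce_phase(map_phase(encoded, term_to_id["ex:knows"], term_to_id["ex:age"]))
id_to_term = {i: t for t, i in term_to_id.items()}
decoded = [tuple(id_to_term[i] for i in row) for row in rows]
print(decoded)  # [('ex:alice', 'ex:bob', '30')]
```

Joining on integer IDs instead of full IRIs keeps the shuffled key-value pairs small, which is the usual motivation for encoding before a distributed join.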

Improvement of K-Means Algorithm for Accelerated Big Data Clustering

International Journal of Information Technologies and Systems Approach ◽

10.4018/ijitsa.2021070107 ◽

2021 ◽

Vol 14 (2) ◽

pp. 99-119

Author(s):

Chunqiong Wu ◽

Bingwen Yan ◽

Rongrui Yu ◽

Zhangshu Huang ◽

Baoqin Yu ◽

...

Keyword(s):

Data Mining ◽

Data Clustering ◽

Large Scale ◽

Rapid Development ◽

Large Data ◽

Data Retrieval ◽

Research Directions ◽

Large Scale Data ◽

Rich Information ◽

Scale Data

With the rapid development of computing, and especially in recent years, “Internet +,” cloud platforms, and similar technologies have been adopted across many industries, and data of all kinds has grown in large quantities. These large amounts of data often contain very rich information, but traditional data retrieval and analysis methods and data management models can no longer meet our needs for data acquisition and management. Data mining has therefore become one of the main ways to obtain useful information quickly. Effective clustering of large-scale data is one of the important research directions in data mining, and the k-means algorithm is the simplest and most fundamental method for it. k-means is simple to operate, fast, and scales well to large data, but it also often exposes critical defects in data processing. To address some of the defects of the traditional k-means algorithm, this paper proposes and analyzes improvements in two respects.
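For reference, the traditional k-means algorithm (Lloyd's iteration) that the paper sets out to improve can be sketched in plain Python as below. This is the baseline only; the paper's two improvements are not described in the abstract and are not reproduced. Random initialisation with a fixed `seed` is an assumption made for reproducibility.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Traditional k-means (Lloyd's algorithm) on a list of tuples."""
    rng = random.Random(seed)
    # Random initial centroids: the result depends on this choice,
    # which is one commonly cited weakness of plain k-means.
    centroids = rng.sample(points, k)
    assignment = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        changed = False
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            if nearest != assignment[i]:
                assignment[i] = nearest
                changed = True
        if not changed:
            break  # converged: no point switched clusters
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, assignment


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Each iteration costs O(n·k·d) distance work, which is why large-scale variants usually target either the initialisation or the assignment step.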

A Ranking-Based Hashing Algorithm Based on the Distributed Spark Platform

Information ◽

10.3390/info11030148 ◽

2020 ◽

Vol 11 (3) ◽

pp. 148

Author(s):

Anbang Yang ◽

Jiangbo Qian ◽

Huahui Chen ◽

Yihong Dong

Keyword(s):

Large Scale ◽

Rapid Development ◽

Modern Society ◽

Training Time ◽

Large Scale Data ◽

Huge Data ◽

Hashing Algorithm ◽

Similarity Searches ◽

Spark Framework ◽

Scale Data

With the rapid development of modern society, the volume of generated data has increased exponentially, and finding the required data in this huge pool is an urgent problem. Hashing technology is widely used for similarity searches over large-scale data. Among hashing methods, ranking-based hashing algorithms have been widely studied because of the accuracy and speed of their search results. At present, most ranking-based hashing algorithms construct loss functions by comparing the rank consistency of data in Euclidean and Hamming spaces. However, most of them have high time complexity and long training times, so they cannot meet practical requirements. To solve these problems, this paper introduces the distributed Spark framework and implements the ranking-based hashing algorithm in a parallel environment on multiple machines. The experimental results show that Spark-RLSH (Ranking Listwise Supervision Hashing) greatly reduces training time and improves training efficiency compared with other ranking-based hashing algorithms.
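The rank-consistency idea can be made concrete with a small sketch: rank database items by Euclidean distance to a query and by Hamming distance between binary codes, then count item pairs that the two rankings order differently. This pairwise count is a simple proxy chosen for illustration; it is not the listwise loss used by Spark-RLSH, the codes are supplied by hand rather than learned, and the sketch runs on one machine without Spark.

```python
from itertools import combinations


def euclidean_sq(a, b):
    """Squared Euclidean distance (preserves the ordering of the true distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hamming(a, b):
    """Number of positions at which two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def discordant_pairs(query, query_code, data, codes):
    """Count item pairs ranked in opposite order by the two spaces.

    0 means the Hamming ranking fully agrees with the Euclidean ranking.
    Pairs tied in either space are not counted.
    """
    e = [euclidean_sq(query, x) for x in data]
    h = [hamming(query_code, c) for c in codes]
    return sum(1 for i, j in combinations(range(len(data)), 2)
               if (e[i] - e[j]) * (h[i] - h[j]) < 0)


data = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
query, query_code = (0.0, 0.0), (0, 0, 0)
good_codes = [(0, 0, 0), (0, 0, 1), (1, 1, 1)]  # Hamming order matches Euclidean
bad_codes = [(1, 1, 1), (0, 0, 1), (0, 0, 0)]   # Hamming order reversed
print(discordant_pairs(query, query_code, data, good_codes))  # 0
print(discordant_pairs(query, query_code, data, bad_codes))   # 3
```

The pairwise comparison is quadratic in the number of items per query, which hints at why training such losses on large data benefits from distributing queries across machines.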

Large-Scale Data Mining and Distributed Processing in Big Data Internet

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.989-994.4594 ◽

2014 ◽

Vol 989-994 ◽

pp. 4594-4597

Author(s):

Chun Zhi Xing

Keyword(s):

Data Mining ◽

Big Data ◽

Decision Tree ◽

Large Scale ◽

Distributed Processing ◽

Processing Method ◽

Decision Tree Algorithm ◽

Data Query ◽

Large Scale Data ◽

Scale Data

With the development of the Internet, the various kinds of Internet-based large-scale data face increasing competition. To satisfy the need for data queries, data mining and distributed processing are necessary. Accordingly, this paper proposes a large-scale data mining and distributed processing method based on the decision tree algorithm.

A Survey on Big IoT Data Indexing: Potential Solutions, Recent Advancements, and Open Issues

Future Internet ◽

10.3390/fi14010019 ◽

2021 ◽

Vol 14 (1) ◽

pp. 19

Author(s):

Zineddine Kouahla ◽

Ala-Eddine Benrazek ◽

Mohamed Amine Ferrag ◽

Brahim Farou ◽

Hamid Seridi ◽

...

Keyword(s):

Data Storage ◽

Large Scale ◽

Search Time ◽

Large Data ◽

Open Problems ◽

Large Scale Data ◽

Indexing Techniques ◽

Efficient Retrieval ◽

Data Collections ◽

Scale Data

The past decade has been characterized by the growing volumes of data due to the widespread use of the Internet of Things (IoT) applications, which introduced many challenges for efficient data storage and management. Thus, the efficient indexing and searching of large data collections is a very topical and urgent issue. Such solutions can provide users with valuable information about IoT data. However, efficient retrieval and management of such information in terms of index size and search time require optimization of indexing schemes which is rather difficult to implement. The purpose of this paper is to examine and review existing indexing techniques for large-scale data. A taxonomy of indexing techniques is proposed to enable researchers to understand and select the techniques that will serve as a basis for designing a new indexing scheme. The real-world applications of the existing indexing techniques in different areas, such as health, business, scientific experiments, and social networks, are presented. Open problems and research challenges, e.g., privacy and large-scale data mining, are also discussed.

ANALYSIS OF THE EMPLOYMENT PATTERNS OF STMIK BUDI DARMA GRADUATES USING THE C4.5 METHOD

KOMIK (Konferensi Nasional Teknologi Informasi dan Komputer) ◽

10.30865/komik.v2i1.974 ◽

2018 ◽

Vol 2 (1) ◽

Author(s):

Anisa Anisa ◽

Mesran Mesran

Keyword(s):

Data Mining ◽

Large Scale ◽

Large Data ◽

Analysis Data ◽

Mining Method ◽

Training Set ◽

Data Mining Method ◽

Large Scale Data ◽

Scale Data

Data mining is the discovery of information: the process of searching very large amounts of data for patterns or trends that support future decision-making. Classification techniques derive patterns from a collected set of records (the training set). The C4.5 method builds a decision tree for the class attribute using an induction algorithm. Using data on graduates' jobs, collected through an alumni questionnaire, the study is expected to generate information about graduates' interests and talents. Employment patterns are sought in large-scale data and analyzed with the C4.5 algorithm, which identifies the factors that influence these patterns and derives interconnected rules that classify objects into different classes, or categories of attributes, that shape employment patterns. The software used is Tanagra, a data mining application for academic and research purposes. The data mining process is explored from data analysis through to classification. Keywords: analysis, Data Mining, method C4.5, Tanagra, patterns of work
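C4.5 chooses the attribute to split on by gain ratio: the information gain of the split divided by its split information. The minimal sketch below handles categorical attributes only (no continuous thresholds, missing values, or pruning) and assumes each record is a Python dict. The attribute names `gpa` and `city` and the four toy records are hypothetical, not data from the study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, attr, target):
    """C4.5 gain ratio of a categorical attribute `attr` for class `target`."""
    n = len(rows)
    base = entropy([r[target] for r in rows])
    # Partition the class labels by the attribute's value.
    parts = {}
    for r in rows:
        parts.setdefault(r[attr], []).append(r[target])
    remainder = sum(len(p) / n * entropy(p) for p in parts.values())
    gain = base - remainder
    split_info = -sum(len(p) / n * log2(len(p) / n) for p in parts.values())
    # A single-valued attribute has zero split information and cannot split.
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records, not data from the study.
rows = [
    {"gpa": "high", "city": "A", "job": "IT"},
    {"gpa": "high", "city": "B", "job": "IT"},
    {"gpa": "low", "city": "A", "job": "non-IT"},
    {"gpa": "low", "city": "B", "job": "non-IT"},
]
print(gain_ratio(rows, "gpa", "job"))   # 1.0
print(gain_ratio(rows, "city", "job"))  # 0.0
```

Here `gpa` separates the two job classes perfectly and would be chosen as the root split, while `city` carries no class information. Dividing by split information is what distinguishes C4.5 from ID3's plain information gain, which favors attributes with many distinct values.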

Massive Data Management and Sharing Module for Connectome Reconstruction

Brain Sciences ◽

10.3390/brainsci10050314 ◽

2020 ◽

Vol 10 (5) ◽

pp. 314

Author(s):

Jingbin Yuan ◽

Jing Zhang ◽

Lijun Shen ◽

Dandan Zhang ◽

Wenhuan Yu ◽

...

Keyword(s):

Data Management ◽

Data Storage ◽

Large Scale ◽

Rapid Development ◽

Massive Data ◽

Storage And Retrieval ◽

Server Side ◽

Large Scale Data ◽

Client Side ◽

Scale Data

Recently, with the rapid development of electron microscopy (EM) technology and the increasing demand for neuron circuit reconstruction, the scale of reconstruction data has grown significantly. This brings many challenges, one of which is how to manage large-scale data effectively so that researchers can mine valuable information. For this purpose, we developed a data management module with two parts: a storage and retrieval module on the server side and an image cache module on the client side. On the server side, Hadoop and HBase are introduced to handle massive data storage and retrieval. The pyramid model, which represents multiresolution data of an image, is adopted to store electron microscope images. A block storage method is proposed to store volume segmentation results. We design a spatial location-based retrieval method that rapidly obtains images and segments layer by layer with constant time complexity. On the client side, a three-level image cache module is designed to reduce latency when acquiring data. Through theoretical analysis and practical tests, our tool shows excellent real-time performance when handling large-scale data. Additionally, the server side can be used as a backend for other similar software or as a public database to manage shared datasets, showing strong scalability.
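The abstract states that retrieval by spatial location runs in constant time. One plausible way to obtain that, sketched below under explicit assumptions, is to derive each tile's storage key directly from its layer index `z`, pyramid level, and tile row and column, so that fetching a tile is a single key lookup (an HBase row get in the described system; a Python dict stands in for it here). The key format, the `TILE_SIZE` constant, and the convention that level L is downsampled by a factor of 2**L are all hypothetical, not the authors' schema.

```python
TILE_SIZE = 1024  # hypothetical tile edge length in pixels


def tile_key(z, level, row, col):
    """Fixed-width key: layer, pyramid level, tile row, tile column.

    Zero padding keeps lexicographic key order aligned with numeric order,
    so tiles of the same layer and level are stored next to each other.
    """
    return f"{z:06d}_{level:02d}_{row:06d}_{col:06d}"


def tiles_for_region(z, level, x0, y0, x1, y1):
    """Keys of tiles at `level` covering the full-resolution box [x0,x1) x [y0,y1).

    Assumes pyramid level L is downsampled by a factor of 2**L.
    """
    scale = 2 ** level
    first_col, last_col = (x0 // scale) // TILE_SIZE, ((x1 - 1) // scale) // TILE_SIZE
    first_row, last_row = (y0 // scale) // TILE_SIZE, ((y1 - 1) // scale) // TILE_SIZE
    return [tile_key(z, level, row, col)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


# A dict stands in for the key-value store: each tile is one key lookup.
store = {tile_key(5, 0, 0, c): f"tile bytes {c}" for c in range(2)}
keys = tiles_for_region(5, 0, 1000, 0, 1100, 10)
print(keys)  # ['000005_00_000000_000000', '000005_00_000000_000001']
print([store[k] for k in keys])
```

Because keys are computed rather than searched for, the cost per tile does not grow with the number of stored tiles; only the number of tiles a region spans matters.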

Research based on large-scale data query with mapreduce technology in cloud computing

2012 International Conference on Wavelet Active Media Technology and Information Processing (ICWAMTIP) ◽

10.1109/icwamtip.2012.6413484 ◽

2012 ◽

Author(s):

Feiping Wang ◽

Xiaofeng Gu

Keyword(s):

Cloud Computing ◽

Large Scale ◽

Data Query ◽

Large Scale Data ◽

Scale Data

Assessment of the Awareness of Nigerian Professionals in the Built Environment on the Big Data analytics (BDA) Applications in the Construction Industry.

10.36265/arejoen.2021.010101 ◽

2021 ◽

pp. 1-7

Author(s):

Emmanuel Jesse Amadosi

Keyword(s):

Big Data ◽

Built Environment ◽

Data Analytics ◽

Large Scale ◽

Rapid Development ◽

Big Data Analytics ◽

Strong Relationship ◽

Large Scale Data ◽

Scale Data ◽

Structured Questionnaire

With rapid development in technology, the built industry’s capacity to generate large-scale data is not in doubt. This upsurge of data, labelled “Big Data”, is currently being used to seek intelligent solutions in many industries, including construction. As a result, the call to embrace Big Data Analytics has gained wide advocacy globally. However, the general knowledge of Nigeria’s built environment professionals about Big Data Analytics is still limited, and this gap continues to account for the slow adoption of digital technologies such as Big Data Analytics and of the value they offer. This study set out to assess the level of awareness and knowledge of professionals within the Nigerian built environment, with a view to promoting the adoption of Big Data Analytics for improved productivity. To achieve this aim, a structured questionnaire survey was carried out among 283 professionals drawn from 9 disciplines within the built environment in the Federal Capital Territory, Abuja. The findings revealed that: a) the level of knowledge of Big Data among professionals is low; b) professionals’ knowledge and the level of Big Data Analytics application have a strong relationship; and c) professionals are interested in learning more about the Big Data concept and how Big Data Analytics can be leveraged. The study therefore recommends an urgent paradigm shift towards digitisation to fully embrace and adopt Big Data Analytics, and enjoins stakeholders to promote collaborative schemes between practice-based professionals and academia in seeking intelligent and smart solutions to construction-related problems.

Unique Path Method of the Pinch-Out Profile Based on Unified Stratigraphic Sequence

Geosciences ◽

10.3390/geosciences11060251 ◽

2021 ◽

Vol 11 (6) ◽

pp. 251

Author(s):

Zhen Liu ◽

Jin Luo ◽

Xiangdong Wang ◽

Weihua Ming ◽

Cuiying Zhou

Keyword(s):

High Speed ◽

Large Scale ◽

Rapid Development ◽

Slow Speed ◽

Stratigraphic Sequence ◽

Large Scale Data ◽

Unique Path ◽

Stratigraphic Sequences ◽

Scale Data ◽

Excessive Number

A pinch-out is the gradual lateral thinning of a sedimentary layer until no deposition remains, and pinch-outs are a major topic of modern research on the automated drawing of geological profiles. The rapid development of smart geological systems has created an urgent need for high-speed, accurate methods to plot pinch-outs. However, existing pinch-out drawing methods suffer from complexity, an excessive number of branch paths, low rendering speed, and poor reliability on large-scale data, so they cannot satisfy the modeling needs of large-scale geological projects. To resolve these problems, this paper proposes, based on unified stratigraphic sequences, a unique path method for drawing pinch-out profiles that converts the plotting of pinch-outs into controlling the appearance of stratigraphic boundaries, together with a high-speed and reliable method for drawing pinch-outs in digital profiles. The proposed method was successfully applied to drawing geological profiles for an urban geological project in East China, where it greatly reduced the complexity of the method without reducing drawing accuracy. Compared with other methods, its speed and reliability are significantly improved. Therefore, the unique path method for drawing pinch-out profiles based on a unified stratigraphic sequence proposed in the writers’ previous paper effectively avoids the excessive branch paths, slow speed, and insufficient reliability of existing methods, and provides effective and reliable support for the rapid drawing of profiles in smart geological systems.
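The following sketch illustrates, under strong assumptions, why a unified stratigraphic sequence can remove branch paths: if every borehole lists the same layers in the same order and an absent layer is recorded with thickness 0, then a pinch-out is simply a place where two adjacent boundaries coincide, and one interpolation loop draws every boundary with no special cases. The linear interpolation between two boreholes and the thickness values are hypothetical; this is not the authors' unique path method.

```python
def boundary_depths(thicknesses, top=0.0):
    """Cumulative depths of every boundary in a unified layer sequence.

    An absent layer is kept in the sequence with thickness 0, so its top and
    bottom boundaries coincide instead of needing a separate branch.
    """
    depths = [top]
    for t in thicknesses:
        depths.append(depths[-1] + t)
    return depths


def profile_boundaries(left, right, n_steps):
    """Linearly interpolate all boundary depths between two boreholes."""
    dl, dr = boundary_depths(left), boundary_depths(right)
    return [[a + (b - a) * (s / n_steps) for a, b in zip(dl, dr)]
            for s in range(n_steps + 1)]


# Hypothetical boreholes: layer 2 is 3 m thick on the left, absent on the right.
left, right = [2.0, 3.0, 1.0], [2.0, 0.0, 1.0]
for column in profile_boundaries(left, right, 2):
    print(column)
# [0.0, 2.0, 5.0, 6.0]
# [0.0, 2.0, 3.5, 4.5]
# [0.0, 2.0, 2.0, 3.0]
```

Layer 2 thins from 3.0 m to 1.5 m to 0.0 m across the profile, so the pinch-out emerges from the same code path as every other boundary.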

Extracting Functional Dependencies in Large Datasets Using MapReduce Model

International Journal of Intelligent Information Technologies ◽

10.4018/ijiit.2014070102 ◽

2014 ◽

Vol 10 (3) ◽

pp. 19-35 ◽

Cited By ~ 8

Author(s):

K. Amshakala ◽

R. Nedunchezhian ◽

M. Rajalakshmi

Keyword(s):

Data Processing ◽

Data Quality ◽

Large Scale ◽

Programming Model ◽

Large Data ◽

Large Datasets ◽

Functional Dependencies ◽

Large Scale Data ◽

Large Scale Data Processing ◽

Scale Data

Over the last few years, data have been generated in large volumes at an increasing rate, and there has been a remarkable growth in the need for large-scale data processing systems. As data grow larger, data quality is compromised. Functional dependencies, which represent semantic constraints in data, are important for data quality assessment. Executing functional dependency discovery algorithms on a single computer is hard and laborious for large data sets. MapReduce is an enabling technology for large-scale data processing, and its open-source Hadoop implementation has given researchers a powerful tool for tackling large-data problems in a distributed manner. The objective of this study is to extract functional dependencies between attributes from large datasets using the MapReduce programming model. Attribute entropy is used to measure inter-attribute correlations and is exploited to discover functional dependencies hidden in the data.
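Entropy gives a direct test for a functional dependency: X → Y holds on a table exactly when every value of X determines a single value of Y, which is equivalent to H(Y | X) = 0, that is, H(X) = H(X ∪ Y). The sketch below computes these entropies with a map step that emits projected tuples and a reduce step that counts them, simulated in-process rather than on Hadoop. The helper names and the `zip`/`city` toy table are hypothetical, and the paper's exact correlation measure and search strategy are not reproduced.

```python
from collections import defaultdict
from math import log2


def map_projection(rows, attrs):
    """Map: emit (projected tuple, 1) for each row."""
    for r in rows:
        yield tuple(r[a] for a in attrs), 1


def reduce_counts(pairs):
    """Reduce: sum the 1s per projected tuple."""
    counts = defaultdict(int)
    for key, one in pairs:
        counts[key] += one
    return counts


def entropy(rows, attrs):
    """Joint entropy (bits) of the attribute set `attrs` over `rows`."""
    n = len(rows)
    counts = reduce_counts(map_projection(rows, attrs))
    return -sum(c / n * log2(c / n) for c in counts.values())


def holds_fd(rows, lhs, rhs, tol=1e-9):
    """X -> Y holds on this instance iff H(X) == H(X union Y)."""
    return abs(entropy(rows, lhs) - entropy(rows, list(lhs) + list(rhs))) < tol


rows = [  # hypothetical toy table
    {"zip": "1000", "city": "A"},
    {"zip": "1000", "city": "A"},
    {"zip": "2000", "city": "A"},
    {"zip": "3000", "city": "B"},
]
print(holds_fd(rows, ["zip"], ["city"]))  # True
print(holds_fd(rows, ["city"], ["zip"]))  # False
```

Only counts of distinct projected tuples are needed, so each entropy is one map/reduce pass, and candidate dependencies can be checked independently of each other.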
