Efficient Indexing RDF Query Algorithm for Big Data

2013 ◽  
Vol 441 ◽  
pp. 691-694
Author(s):  
Yi Qun Zeng ◽  
Jing Bin Wang

With the rapid development of information technology, data is growing explosively, and handling large-scale data has become increasingly important. Based on the characteristics of RDF data, we propose a method for compressing RDF data. We construct an index structure called the PAR-Tree Index and then execute queries using the MapReduce parallel computing framework together with the PAR-Tree Index. Experimental results show that the algorithm improves the efficiency of queries over large-scale data.
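The abstract does not say how the RDF data are compressed or how the PAR-Tree Index is laid out, so the following is only a minimal Python sketch of two generic ingredients, under stated assumptions: dictionary encoding (replacing each IRI or literal with an integer ID, a common RDF compression technique assumed here and not necessarily the authors' scheme) and a MapReduce-style map step that filters encoded triples against a triple pattern. The PAR-Tree Index is not reproduced, and the map step runs in a single process rather than on a cluster; all function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Encoded = Tuple[int, int, int]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def dictionary_encode(triples: List[Triple]) -> Tuple[Dict[str, int], List[Encoded]]:
    """Replace each distinct RDF term (IRI or literal) with a compact integer ID."""
    term_to_id: Dict[str, int] = {}
    encoded: List[Encoded] = []
    for s, p, o in triples:
        # setdefault hands out the next free ID the first time a term is seen
        ids = [term_to_id.setdefault(term, len(term_to_id)) for term in (s, p, o)]
        encoded.append((ids[0], ids[1], ids[2]))
    return term_to_id, encoded


def map_match(triple: Encoded, pattern_ids: Tuple[Optional[int], ...]) -> List[Encoded]:
    """Map step: emit the triple if every bound (non-None) pattern position matches."""
    if all(want is None or want == got for want, got in zip(pattern_ids, triple)):
        return [triple]
    return []


def query(term_to_id: Dict[str, int], encoded: List[Encoded], pattern: Pattern) -> List[Triple]:
    """Match a triple pattern (None = variable) on encoded data and decode the results."""
    # A constant that never occurs in the data cannot match anything.
    if any(term is not None and term not in term_to_id for term in pattern):
        return []
    pattern_ids = tuple(None if term is None else term_to_id[term] for term in pattern)
    id_to_term = {i: t for t, i in term_to_id.items()}
    matches = [m for triple in encoded for m in map_match(triple, pattern_ids)]
    return [(id_to_term[s], id_to_term[p], id_to_term[o]) for s, p, o in matches]
```

For example, after encoding three sample triples, `query(ids, enc, ("ex:alice", None, None))` returns the two triples whose subject is `ex:alice`. Comparing integers instead of long IRI strings is what makes the encoded form both smaller and cheaper to scan.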

Author(s):  
Chunqiong Wu ◽  
Bingwen Yan ◽  
Rongrui Yu ◽  
Zhangshu Huang ◽  
Baoqin Yu ◽  
...  

With the rapid development of computing, especially in recent years, “Internet +,” cloud platforms, and similar technologies have been adopted in various industries, and data of all types have grown in large quantities. These large amounts of data often contain very rich information, but traditional data retrieval and analysis methods and data management models can no longer meet our needs for data acquisition and management. Data mining has therefore become one of the solutions for quickly obtaining useful information in today's society. Effective clustering of large-scale data is an important research direction in data mining, and the k-means algorithm is the simplest and most basic method for clustering large-scale data. The k-means algorithm is simple to operate, fast, and scalable when processing large data, but it also often exhibits serious defects. To address some of the defects of the traditional k-means algorithm, this paper analyzes and improves it in two respects.
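As a reference point for the traditional k-means algorithm that the abstract sets out to improve, here is a minimal sketch of standard (Lloyd's) k-means in pure Python. The paper's two improvements are not described in the abstract, so they are not shown; the random choice of initial centroids and the fixed `k` below are the parts most often cited as weaknesses of the baseline. The function names and the empty-cluster handling are illustrative choices, not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, max_iter: int = 100, seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Standard (Lloyd's) k-means: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    # Random initialisation: pick k input points as the starting centroids.
    centroids: List[Point] = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                # An empty cluster keeps its previous centroid (one of several possible fixes).
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: the assignments can no longer change
        centroids = new_centroids
    return centroids, labels
```

Each iteration costs O(n·k·d) for n points in d dimensions, which is why k-means scales well; the result, however, depends on the seed and on `k` being chosen in advance.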


Information ◽  
2020 ◽  
Vol 11 (3) ◽  
pp. 148
Author(s):  
Anbang Yang ◽  
Jiangbo Qian ◽  
Huahui Chen ◽  
Yihong Dong

With the rapid development of modern society, the amount of generated data has increased exponentially, and finding the required data in this huge data pool is an urgent problem. Hashing technology is widely used for similarity search over large-scale data. Among hashing methods, ranking-based hashing algorithms have been widely studied because of the accuracy and speed of their search results. At present, most ranking-based hashing algorithms construct loss functions by comparing the rank consistency of data in Euclidean and Hamming spaces. However, most of them have high time complexity and long training times, so they cannot meet practical requirements. To solve these problems, this paper introduces the distributed Spark framework and implements a ranking-based hashing algorithm in a parallel environment on multiple machines. The experimental results show that Spark-RLSH (Ranking Listwise Supervision Hashing) can greatly reduce training time and improve training efficiency compared with other ranking-based hashing algorithms.
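The abstract says most ranking-based hashing methods build their loss by comparing rank consistency between Euclidean and Hamming space. The Python sketch below only illustrates that comparison: it scores how often the Euclidean ordering of database items relative to a query is preserved by the Hamming distances of their binary codes. The codes come from random hyperplane projections, an untrained stand-in assumed here for illustration; RLSH learns its codes with a listwise supervised loss and runs on Spark, neither of which is reproduced. Function names are hypothetical.

```python
import math
import random
from itertools import combinations
from typing import List, Sequence


def random_hyperplane_codes(points: List[Sequence[float]], n_bits: int, seed: int = 0) -> List[List[int]]:
    """Untrained stand-in for a learned hash: bit b is the sign of a random projection."""
    rng = random.Random(seed)
    dim = len(points[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
    return [[1 if sum(w * x for w, x in zip(plane, p)) >= 0 else 0 for plane in planes] for p in points]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    return sum(x != y for x, y in zip(a, b))


def rank_consistency(query: Sequence[float], query_code: Sequence[int],
                     database: List[Sequence[float]], codes: List[Sequence[int]]) -> float:
    """Fraction of database pairs whose Euclidean order w.r.t. the query is kept in Hamming space."""
    d_e = [math.dist(query, x) for x in database]
    d_h = [hamming(query_code, c) for c in codes]
    agree = total = 0
    for i, j in combinations(range(len(database)), 2):
        if d_e[i] == d_e[j]:
            continue  # no Euclidean preference to preserve
        total += 1
        # Consistent if the Hamming distances order i and j the same strict way.
        if (d_e[i] < d_e[j] and d_h[i] < d_h[j]) or (d_e[i] > d_e[j] and d_h[i] > d_h[j]):
            agree += 1
    return agree / total if total else 1.0
```

A ranking-based method would train its hash functions so that a quantity like `1 - rank_consistency(...)` becomes small over many queries. Random hyperplanes through the origin approximate angular rather than Euclidean similarity, so their scores are typically well below 1.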


2014 ◽  
Vol 989-994 ◽  
pp. 4594-4597
Author(s):  
Chun Zhi Xing

With the development of the Internet, Internet-based large-scale data of various kinds face increasing competition. To satisfy the demand for data queries, data mining and distributed processing are necessary. This paper therefore proposes a large-scale data mining and distributed processing method based on a decision tree algorithm.
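The abstract does not describe how the decision tree is distributed. One common pattern, assumed here rather than taken from the paper, is that each data partition counts (attribute value, class) pairs locally and the counts are summed in a reduce step; because counts are additive, the merged totals are exactly what a single machine would compute, and split criteria such as information gain can be evaluated from them. A minimal Python sketch, with hypothetical function names:

```python
import math
from collections import Counter
from typing import Dict, Iterable

Row = Dict[str, str]


def partition_counts(rows: Iterable[Row], attribute: str, target: str) -> Counter:
    """Map step on one data partition: count (attribute value, class label) pairs."""
    return Counter((row[attribute], row[target]) for row in rows)


def merge_counts(partials: Iterable[Counter]) -> Counter:
    """Reduce step: counts are additive, so summing partials gives the global counts."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return total


def entropy(class_counts: Iterable[int]) -> float:
    counts = [c for c in class_counts if c > 0]
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts)


def information_gain(joint: Counter) -> float:
    """Class entropy minus the weighted class entropy within each attribute value."""
    class_totals: Counter = Counter()
    by_value: Dict[str, Counter] = {}
    for (value, label), n in joint.items():
        class_totals[label] += n
        by_value.setdefault(value, Counter())[label] += n
    n_total = sum(class_totals.values())
    remainder = sum(sum(c.values()) / n_total * entropy(c.values()) for c in by_value.values())
    return entropy(class_totals.values()) - remainder
```

Only the small count tables travel between machines, not the raw rows, which is what keeps this pattern cheap on large data.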


2021 ◽  
Vol 14 (1) ◽  
pp. 19
Author(s):  
Zineddine Kouahla ◽  
Ala-Eddine Benrazek ◽  
Mohamed Amine Ferrag ◽  
Brahim Farou ◽  
Hamid Seridi ◽  
...  

The past decade has been characterized by growing volumes of data due to the widespread use of Internet of Things (IoT) applications, which has introduced many challenges for efficient data storage and management. The efficient indexing and searching of large data collections is therefore a topical and urgent issue, since such solutions can provide users with valuable information about IoT data. However, efficient retrieval and management of this information, in terms of index size and search time, requires optimized indexing schemes, which are difficult to implement. The purpose of this paper is to examine and review existing indexing techniques for large-scale data. A taxonomy of indexing techniques is proposed to help researchers understand and select the techniques that will serve as a basis for designing a new indexing scheme. Real-world applications of existing indexing techniques in areas such as health, business, scientific experiments, and social networks are presented. Open problems and research challenges, e.g., privacy and large-scale data mining, are also discussed.


Author(s):  
Anisa Anisa ◽  
Mesran Mesran

Data mining is the process of discovering patterns or information, such as trends, in very large amounts of data in order to support future decisions. In classification, patterns are learned from a set of labeled records (the training set). The decision tree for the class attribute is built with the C4.5 method, an induction algorithm. Using data on graduates' jobs, taken from an alumni questionnaire, the study aims to generate information about graduates' interests and talents. Work patterns are sought in large-scale data and analyzed with the C4.5 algorithm to find the attributes that influence them, yielding interconnected rules that classify objects into different classes or categories according to the attributes that shape work patterns. The application used is Tanagra, data mining software for academic and research purposes, which supports the data mining workflow from data analysis to classification.
Keywords: analysis, Data Mining, method C 4.5, Tanagra, patterns of work
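C4.5 chooses each split by gain ratio: the information gain of an attribute divided by its split information, which penalises attributes that fragment the training set into many small subsets. The Python sketch below computes that criterion for categorical attributes and picks the best one for the next node. It does not reproduce Tanagra or the study's questionnaire data; the attribute names in the usage note are hypothetical, and C4.5's handling of numeric attributes, missing values, and pruning is omitted.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence

Row = Dict[str, str]


def entropy(labels: Sequence[str]) -> float:
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows: List[Row], attribute: str, target: str) -> float:
    """C4.5 split criterion: information gain divided by split information."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    n = len(rows)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy([row[target] for row in rows]) - remainder
    # Split information grows with the number (and evenness) of branches, penalising
    # attributes such as record IDs that split the data into many tiny subsets.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows: List[Row], attributes: Sequence[str], target: str) -> str:
    """Pick the attribute with the highest gain ratio as the next tree node."""
    return max(attributes, key=lambda a: gain_ratio(rows, a, target))
```

On four hypothetical alumni records where `major` fully determines `job` and `gpa` does not, `best_attribute(rows, ["major", "gpa"], "job")` selects `"major"` (gain ratio 1.0 versus 0.0). A unique record ID would also have a gain of 1.0, but its split information of 2 bits halves its gain ratio to 0.5.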


2020 ◽  
Vol 10 (5) ◽  
pp. 314
Author(s):  
Jingbin Yuan ◽  
Jing Zhang ◽  
Lijun Shen ◽  
Dandan Zhang ◽  
Wenhuan Yu ◽  
...  

Recently, with the rapid development of electron microscopy (EM) technology and the increasing demand for neuron circuit reconstruction, the scale of reconstruction data has grown significantly. This brings many challenges, one of which is how to manage large-scale data effectively so that researchers can mine valuable information. For this purpose, we developed a data management module with two parts: a storage and retrieval module on the server side and an image cache module on the client side. On the server side, Hadoop and HBase are introduced to handle massive data storage and retrieval. The pyramid model, which represents an image as multiresolution data, is adopted to store electron microscope images. A block storage method is proposed to store volume segmentation results. We design a spatial location-based retrieval method that rapidly obtains images and segments layer by layer with constant time complexity. On the client side, a three-level image cache module is designed to reduce latency when acquiring data. Through theoretical analysis and practical tests, our tool shows excellent real-time performance when handling large-scale data. Additionally, the server side can be used as the backend of other similar software or as a public database for managing shared datasets, showing strong scalability.
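The abstract states that the spatial location-based retrieval runs in constant time but not how keys are built. A plausible sketch, under assumptions made here and not taken from the paper: tiles of each pyramid level are stored under row keys derived directly from section number, pyramid level, and tile row/column, so any tile covering a given location is fetched with a point lookup instead of a scan. The key layout, the 512-pixel tile size, and the dictionary standing in for an HBase table are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple

TILE_SIZE = 512  # hypothetical tile edge length in pixels


def tile_index(x: int, y: int, level: int, tile_size: int = TILE_SIZE) -> Tuple[int, int]:
    """Map a full-resolution pixel to the (row, col) of its tile at a pyramid level.

    Level 0 is full resolution; each higher level halves width and height.
    """
    return (y >> level) // tile_size, (x >> level) // tile_size


def row_key(section: int, level: int, row: int, col: int) -> str:
    """Hypothetical key layout: fixed-width fields keep keys sorted by section, then level."""
    return f"{section:06d}:{level:02d}:{row:05d}:{col:05d}"


def tiles_in_window(section: int, level: int, x0: int, y0: int, x1: int, y1: int,
                    tile_size: int = TILE_SIZE) -> Iterator[str]:
    """Row keys of all tiles covering the full-resolution window [x0, x1) x [y0, y1)."""
    r0, c0 = tile_index(x0, y0, level, tile_size)
    r1, c1 = tile_index(x1 - 1, y1 - 1, level, tile_size)
    for r in range(r0, r1 + 1):
        for c in range(c0, c1 + 1):
            yield row_key(section, level, r, c)


def fetch(store: Dict[str, bytes], keys: Iterable[str]) -> Dict[str, bytes]:
    """Point lookups only: each tile is read by its exact key, with no scan over the table."""
    return {k: store[k] for k in keys if k in store}
```

Because the key is computed arithmetically from the location, the cost of locating one tile does not depend on how many tiles the table holds; only the number of tiles in the requested window matters.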


2021 ◽  
pp. 1-7
Author(s):  
Emmanuel Jesse Amadosi

With rapid development in technology, the built environment industry’s capacity to generate large-scale data is not in doubt. This upsurge in data, labelled "Big Data", is currently being used to seek intelligent solutions in many industries, including construction. As a result, the call to embrace Big Data Analytics has gained wide advocacy globally. However, Nigerian built environment professionals' general knowledge of Big Data Analytics is still limited, and this gap continues to account for the slow adoption of digital technologies such as Big Data Analytics and the value they offer. This study set out to assess the level of awareness and knowledge of professionals within the Nigerian built environment, with a view to promoting the adoption of Big Data Analytics for improved productivity. To achieve this aim, a structured questionnaire survey was carried out among 283 professionals drawn from 9 disciplines within the built environment in the Federal Capital Territory, Abuja. The findings revealed that: a) professionals have a low level of knowledge of Big Data, b) there is a strong relationship between professionals' knowledge and the level of Big Data Analytics application, and c) professionals are interested in learning more about the Big Data concept and how Big Data Analytics can be leveraged. The study therefore recommends an urgent paradigm shift towards digitisation in order to fully embrace and adopt Big Data Analytics, and enjoins stakeholders to promote collaborative schemes between practice-based professionals and academia in seeking intelligent and smart solutions to construction-related problems.


Geosciences ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 251
Author(s):  
Zhen Liu ◽  
Jin Luo ◽  
Xiangdong Wang ◽  
Weihua Ming ◽  
Cuiying Zhou

A pinch-out is the gradual lateral thinning of a sedimentary layer until no deposition remains, and pinch-outs are a major topic of modern research on the automated drawing of geological profiles. The rapid development of smart geological systems has created an urgent need for fast, accurate methods of plotting pinch-outs. However, because of their complexity, excessive number of branch paths, low rendering speed, and poor reliability with large-scale data, existing pinch-out drawing methods are inadequate and cannot satisfy the modeling needs of large-scale geological projects. To resolve these problems, this paper builds on unified stratigraphic sequences and proposes a unique path method for drawing pinch-out profiles, which converts the principle of plotting pinch-outs into controlling the appearance of stratigraphic boundaries; a high-speed and reliable method for drawing pinch-outs in digital profiles is also proposed. The proposed method has been successfully applied to drawing geological profiles for an urban geological project in East China, where it greatly reduces the complexity of the method without reducing drawing accuracy. Compared with other methods, its speed and reliability are significantly improved. Therefore, the unique path method for drawing pinch-out profiles based on a unified stratigraphic sequence proposed in the authors’ previous paper effectively avoids the excessive branch paths, slow speed, and insufficient reliability of existing methods and provides effective, reliable support for the rapid drawing of profiles in smart geological systems.
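To show why a unified stratigraphic sequence simplifies pinch-out placement, here is a minimal Python sketch under assumptions made here, not taken from the paper: every borehole lists the same layers in the same order (thickness 0 where a layer is absent), so a pinch-out is simply an interval between adjacent boreholes where a layer is present on one side only. The pinch-out is placed a fixed fraction of the way across that interval (half-way by default, a simple drawing convention). This is not the unique path method itself, whose boundary-control rules the abstract does not give; the function name is hypothetical.

```python
from typing import Dict, List, Sequence


def pinch_out_points(
    xs: Sequence[float],
    thickness: Sequence[Sequence[float]],
    ratio: float = 0.5,
) -> Dict[int, List[float]]:
    """Locate pinch-outs along a profile.

    xs[i] is the position of borehole i along the profile (sorted ascending).
    thickness[i][k] is the thickness of layer k at borehole i; all boreholes share
    the same unified stratigraphic sequence, and 0 means the layer is absent.
    Returns, per layer index, the x positions where that layer thins to zero.
    """
    result: Dict[int, List[float]] = {}
    n_layers = len(thickness[0])
    for k in range(n_layers):
        for i in range(len(xs) - 1):
            left_present = thickness[i][k] > 0
            right_present = thickness[i + 1][k] > 0
            if left_present == right_present:
                continue  # layer continues (or stays absent) across this interval
            span = xs[i + 1] - xs[i]
            # Measure `ratio` of the interval from the borehole where the layer exists.
            x = xs[i] + ratio * span if left_present else xs[i + 1] - ratio * span
            result.setdefault(k, []).append(x)
    return result
```

Because layer `k` means the same unit at every borehole, the check is a direct index comparison with no branching over possible layer correspondences.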


2014 ◽  
Vol 10 (3) ◽  
pp. 19-35 ◽  
Author(s):  
K. Amshakala ◽  
R. Nedunchezhian ◽  
M. Rajalakshmi

Over the last few years, data have been generated in large volumes at ever faster rates, and there has been a remarkable growth in the need for large-scale data processing systems. As data grow larger, data quality is compromised. Functional dependencies, which represent semantic constraints in data, are important for data quality assessment. Executing functional dependency discovery algorithms on a single computer is hard and laborious with large data sets. MapReduce provides an enabling technology for large-scale data processing, and the open-source Hadoop implementation of MapReduce has given researchers a powerful tool for tackling large-data problems in a distributed manner. The objective of this study is to extract functional dependencies between attributes from large datasets using the MapReduce programming model. Attribute entropy is used to measure inter-attribute correlations and is exploited to discover functional dependencies hidden in the data.
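Entropy gives a direct test for a functional dependency: X → Y holds in a table exactly when H(X, Y) = H(X), i.e. when knowing X leaves no uncertainty about Y. The Python sketch below applies that test to single-attribute dependencies. The value counting inside `projection_entropy` is the step a MapReduce job would distribute (map emits value combinations, reduce sums their counts); here it runs in one process, and the paper's actual correlation measure and search strategy are not reproduced. Function names are illustrative.

```python
import math
from collections import Counter
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def projection_entropy(rows: List[Row], attrs: Sequence[str]) -> float:
    """Entropy of the value combinations of `attrs` (the counting is what MapReduce would distribute)."""
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    n = len(rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def holds(rows: List[Row], lhs: Sequence[str], rhs: str, tol: float = 1e-9) -> bool:
    """lhs -> rhs holds exactly when adding rhs does not raise entropy: H(lhs, rhs) = H(lhs)."""
    return abs(projection_entropy(rows, list(lhs) + [rhs]) - projection_entropy(rows, lhs)) <= tol


def single_attribute_fds(rows: List[Row], attributes: Sequence[str]) -> List[Tuple[str, str]]:
    """All A -> B dependencies between distinct single attributes."""
    return [(a, b) for a, b in permutations(attributes, 2) if holds(rows, [a], b)]
```

The tolerance absorbs floating-point differences from summing the same counts in a different order; an exact variant could instead compare the number of distinct value combinations of X and of (X, Y).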

