A Ranking-Based Hashing Algorithm Based on the Distributed Spark Platform

Anbang Yang; Jiangbo Qian; Huahui Chen; Yihong Dong

doi:10.3390/info11030148

Efficient Indexing RDF Query Algorithm for Big Data

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.441.691 ◽

2013 ◽

Vol 441 ◽

pp. 691-694

Author(s):

Yi Qun Zeng ◽

Jing Bin Wang

Keyword(s):

Large Scale ◽

Rapid Development ◽

Large Data ◽

Index Structure ◽

Data Query ◽

Large Scale Data ◽

Tree Index ◽

Rdf Data ◽

Query Algorithm ◽

Scale Data

With the rapid development of information technology, data grows explosionly, how to deal with the large scale data become more and more important. Based on the characteristics of RDF data, we propose to compress RDF data. We construct an index structure called PAR-Tree Index, then base on the MapReduce parallel computing framework and the PAR-Tree Index to execute the query. Experimental results show that the algorithm can improve the efficiency of large data query.

Download Full-text

On the Effectiveness of Hybrid Canopy with Hoeffding Adaptive Naive Bayes Trees

International Journal of Applied Evolutionary Computation ◽

10.4018/ijaec.2017040102 ◽

2017 ◽

Vol 8 (2) ◽

pp. 30-43

Author(s):

Mrutyunjaya Panda

Keyword(s):

Big Data ◽

Clustering Analysis ◽

Large Scale ◽

Data Sets ◽

Recent Past ◽

Large Scale Data ◽

Huge Data ◽

With Memory ◽

Memory Constraints ◽

Scale Data

The Big Data, due to its complicated and diverse nature, poses a lot of challenges for extracting meaningful observations. This sought smart and efficient algorithms that can deal with computational complexity along with memory constraints out of their iterative behavior. This issue may be solved by using parallel computing techniques, where a single machine or a multiple machine can perform the work simultaneously, dividing the problem into sub problems and assigning some private memory to each sub problems. Clustering analysis are found to be useful in handling such a huge data in the recent past. Even though, there are many investigations in Big data analysis are on, still, to solve this issue, Canopy and K-Means++ clustering are used for processing the large-scale data in shorter amount of time with no memory constraints. In order to find the suitability of the approach, several data sets are considered ranging from small to very large ones having diverse filed of applications. The experimental results opine that the proposed approach is fast and accurate.

Download Full-text

Multi-omics investigations within the Phylum Mollusca, Class Gastropoda: from ecological application to breakthrough phylogenomic studies

Briefings in Functional Genomics ◽

10.1093/bfgp/elz017 ◽

2019 ◽

Cited By ~ 1

Author(s):

Anne H Klein ◽

Kaylene R Ballard ◽

Kenneth B Storey ◽

Cherie A Motti ◽

Min Zhao ◽

...

Keyword(s):

Large Scale ◽

Environmental Stressors ◽

Data Organization ◽

Large Scale Data ◽

Huge Data ◽

The Future ◽

Ecological Application ◽

The Impact ◽

Generation Sequencing ◽

Scale Data

Abstract Gastropods are the largest and most diverse class of mollusc and include species that are well studied within the areas of taxonomy, aquaculture, biomineralization, ecology, microbiome and health. Gastropod research has been expanding since the mid-2000s, largely due to large-scale data integration from next-generation sequencing and mass spectrometry in which transcripts, proteins and metabolites can be readily explored systematically. Correspondingly, the huge data added a great deal of complexity for data organization, visualization and interpretation. Here, we reviewed the recent advances involving gastropod omics (‘gastropodomics’) research from hundreds of publications and online genomics databases. By summarizing the current publicly available data, we present an insight for the design of useful data integrating tools and strategies for comparative omics studies in the future. Additionally, we discuss the future of omics applications in aquaculture, natural pharmaceutical biodiscovery and pest management, as well as to monitor the impact of environmental stressors.

Download Full-text

Improvement of K-Means Algorithm for Accelerated Big Data Clustering

International Journal of Information Technologies and Systems Approach ◽

10.4018/ijitsa.2021070107 ◽

2021 ◽

Vol 14 (2) ◽

pp. 99-119

Author(s):

Chunqiong Wu ◽

Bingwen Yan ◽

Rongrui Yu ◽

Zhangshu Huang ◽

Baoqin Yu ◽

...

Keyword(s):

Data Mining ◽

Data Clustering ◽

Large Scale ◽

Rapid Development ◽

Large Data ◽

Data Retrieval ◽

Research Directions ◽

Large Scale Data ◽

Rich Information ◽

Scale Data

With the rapid development of the computer level, especially in recent years, “Internet +,” cloud platforms, etc. have been used in various industries, and various types of data have grown in large quantities. Behind these large amounts of data often contain very rich information, relying on traditional data retrieval and analysis methods, and data management models can no longer meet our needs for data acquisition and management. Therefore, data mining technology has become one of the solutions to how to quickly obtain useful information in today's society. Effectively processing large-scale data clustering is one of the important research directions in data mining. The k-means algorithm is the simplest and most basic method in processing large-scale data clustering. The k-means algorithm has the advantages of simple operation, fast speed, and good scalability in processing large data, but it also often exposes fatal defects in data processing. In view of some defects exposed by the traditional k-means algorithm, this paper mainly improves and analyzes from two aspects.

Download Full-text

Online celebrity discourses on Facebook

The Journal of Fandom Studies ◽

10.1386/jfs_00026_1 ◽

2020 ◽

Vol 8 (3) ◽

pp. 305-319 ◽

Cited By ~ 1

Author(s):

Dániel Hegedűs

Keyword(s):

Social Media ◽

Large Scale ◽

Modern Society ◽

Large Scale Data ◽

Cognitive Patterns ◽

Social Media Platforms ◽

Altered Form ◽

The Individual ◽

Facebook Pages ◽

Scale Data

The web 2.0 phenomenon and social media – without question – have reshaped our everyday experiences. These changes that they have generated affect how we consume, communicate and present ourselves, just to name a few aspects of life, and moreover, opened up new perspectives for sociology. Though many social practices persist in a somewhat altered form, brand new types of entities have emerged on different social media platforms: one of them is the video blogger. These actors have gained great visibility through so-called micro-celebrity practices and have become potential large-scale distributors of ideas, values and knowledge. Celebrities, in this case micro-celebrities (video bloggers), may disseminate such cognitive patterns through their constructed discourse which is objectified in the online space through a peculiar digital face (a social media profile) where fans can react, share and comment according to the affordances of the digital space. Most importantly, all of these interactions are accessible for scholars to examine the fan and celebrity practices of our era. This research attempts to reconstruct these discursive interactions on the Facebook pages of ten top Hungarian video bloggers. All findings are based on a large-scale data collection using the Netvizz application. As part of the interpretation of the results, a further consideration was that celebrity discourses may be a sort of disciplinary force in (post)modern society, which normalizes the individual to some extent by providing adequate schemas of attitude, mentality and ways of consumption.

Download Full-text

Parallel clustering method for non-disjoint partitioning of large-scale data based on spark framework

2016 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata.2016.7840708 ◽

2016 ◽

Cited By ~ 2

Author(s):

Abir Zayani ◽

Chiheb-Eddine Ben N'Cir ◽

Nadia Essoussi

Keyword(s):

Large Scale ◽

Clustering Method ◽

Large Scale Data ◽

Spark Framework ◽

Parallel Clustering ◽

Scale Data

Download Full-text

Massive Data Management and Sharing Module for Connectome Reconstruction

Brain Sciences ◽

10.3390/brainsci10050314 ◽

2020 ◽

Vol 10 (5) ◽

pp. 314

Author(s):

Jingbin Yuan ◽

Jing Zhang ◽

Lijun Shen ◽

Dandan Zhang ◽

Wenhuan Yu ◽

...

Keyword(s):

Data Management ◽

Data Storage ◽

Large Scale ◽

Rapid Development ◽

Massive Data ◽

Storage And Retrieval ◽

Server Side ◽

Large Scale Data ◽

Client Side ◽

Scale Data

Recently, with the rapid development of electron microscopy (EM) technology and the increasing demand of neuron circuit reconstruction, the scale of reconstruction data grows significantly. This brings many challenges, one of which is how to effectively manage large-scale data so that researchers can mine valuable information. For this purpose, we developed a data management module equipped with two parts, a storage and retrieval module on the server-side and an image cache module on the client-side. On the server-side, Hadoop and HBase are introduced to resolve massive data storage and retrieval. The pyramid model is adopted to store electron microscope images, which represent multiresolution data of the image. A block storage method is proposed to store volume segmentation results. We design a spatial location-based retrieval method for fast obtaining images and segments by layers rapidly, which achieves a constant time complexity. On the client-side, a three-level image cache module is designed to reduce latency when acquiring data. Through theoretical analysis and practical tests, our tool shows excellent real-time performance when handling large-scale data. Additionally, the server-side can be used as a backend of other similar software or a public database to manage shared datasets, showing strong scalability.

Download Full-text

Assessment of the Awareness of Nigerian Professionals in the Built Environment on the Big Data analytics (BDA) Applications in the Construction Industry.

10.36265/arejoen.2021.010101 ◽

2021 ◽

pp. 1-7

Author(s):

Emmanuel Jesse Amadosi

Keyword(s):

Big Data ◽

Built Environment ◽

Data Analytics ◽

Large Scale ◽

Rapid Development ◽

Big Data Analytics ◽

Strong Relationship ◽

Large Scale Data ◽

Scale Data ◽

Structured Questionnaire

With rapid development in technology, the built industry’s capacity to generate large-scale data is not in doubt. This trend of data upsurge labelled “Big Data” is currently being used to seek intelligent solutions in many industries including construction. As a result of this, the appeal to embrace Big Data Analytics has also gained wide advocacy globally. However, the general knowledge of Nigeria’s built environment professionals on Big Data Analytics is still limited and this gap continues to account for the slow pace of adoption of digital technologies like Big Data Analytics and the value it projects. This study set out to assess the level of awareness and knowledge of professionals within the Nigerian built environment with a view to promoting the adoption of Big Data Analytics for improved productivity. To achieve this aim, a structured questionnaire survey was carried out among a total of 283 professionals drawn from 9 disciplines within the built environment in the Federal Capital Territory, Abuja. The findings revealed that: a) a low knowledge level of Big Data exists among professionals, b) knowledge among professional and the level of Big Data Analytics application have strong relationship c) professional are interested in knowing more about the Big Data concept and how Big Data Analytics can be leveraged upon. The study, therefore recommends an urgent paradigm shift towards digitisation to fully embrace and adopt Big Data Analytics and enjoin stakeholders to promote collaborative schemes among practice-based professionals and the academia in seeking intelligent and smart solutions to construction-related problems.

Download Full-text

Research on Large Scale Data Set Processing Based on SVM

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.216.738 ◽

2011 ◽

Vol 216 ◽

pp. 738-741

Author(s):

Yue E Chen ◽

Bai Li Ren

Keyword(s):

Large Scale ◽

Support Vector ◽

Simulation Experiments ◽

Data Set ◽

Training Time ◽

Training Support ◽

Large Scale Data ◽

Vector Machines ◽

Speech Classification ◽

Scale Data

SVM has got very good results in the area of solving the classification, regression and density estimation problem in machine learning, has been successfully applied to practical problems of text recognition, speech classification, but the training time is too long is a big drawback. A new reduction strategy is proposed for training support vector machines. This method is fast in convergence without learning machine’s generalization performance, the results of simulation experiments show the feasibility and effectiveness of that method through this method.

Download Full-text

On the Effectiveness of Hybrid Canopy With Hoeffding Adaptive Naive Bayes Trees

Web Services ◽

10.4018/978-1-5225-7501-6.ch043 ◽

2019 ◽

pp. 788-802

Author(s):

Mrutyunjaya Panda

Keyword(s):

Big Data ◽

Data Analysis ◽

Large Scale ◽

Big Data Analysis ◽

Data Sets ◽

Large Scale Data ◽

Huge Data ◽

With Memory ◽

Memory Constraints ◽

Scale Data

The Big Data, due to its complicated and diverse nature, poses a lot of challenges for extracting meaningful observations. This sought smart and efficient algorithms that can deal with computational complexity along with memory constraints out of their iterative behavior. This issue may be solved by using parallel computing techniques, where a single machine or a multiple machine can perform the work simultaneously, dividing the problem into sub problems and assigning some private memory to each sub problems. Clustering analysis are found to be useful in handling such a huge data in the recent past. Even though, there are many investigations in Big data analysis are on, still, to solve this issue, Canopy and K-Means++ clustering are used for processing the large-scale data in shorter amount of time with no memory constraints. In order to find the suitability of the approach, several data sets are considered ranging from small to very large ones having diverse filed of applications. The experimental results opine that the proposed approach is fast and accurate.

Download Full-text