THE ALGORITHM FOR MARKING VERTEXES. COMPARISON OF ALGORITHMS FOR SEARCHING FOR CONNECTIVITY COMPONENTS IN A GRAPH USING THE EXAMPLE OF COMBINING WALLETS IN A BITCOIN DATABASE

The article compares the efficiency of algorithms for processing the Bitcoin blockchain transaction database. It describes the vertex-marking algorithm developed by the authors' group. By comparing this algorithm with others, the authors expect to identify the most effective algorithm for clustering addresses that belong to a single user. The Bitcoin database contains information about millions of financial transactions. Although transaction information is anonymous, there are methods for combining user addresses into wallets. The article studies algorithms for finding connected components that build on one such wallet-combining method, the «total waste» heuristic for a single user. The emphasis is on the practical aspects of implementation: hardware limitations when processing big data sets, and the choice of a way to find the set of connected components of a graph, where a connected component is a maximal connected set of vertices and a graph is a nonempty set of vertices together with a set of vertex pairs.
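
As a minimal sketch of the connected-components step described above: the abstract does not give the details of the vertex-marking algorithm, so the code below uses a standard union-find (disjoint-set) structure instead. It assumes the heuristic means that all input addresses of one transaction belong to the same user. The names `DisjointSet` and `cluster_addresses` and the input format (a list of input-address lists, one per transaction) are hypothetical.

```python
from collections import defaultdict


class DisjointSet:
    """Union-find with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Register unseen addresses lazily as singleton sets.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
            return x
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster_addresses(transactions):
    """Group addresses into connected components (candidate wallets).

    Assumption: each transaction is a list of its input addresses, and
    addresses spent together are treated as belonging to one user.
    """
    ds = DisjointSet()
    for inputs in transactions:
        if not inputs:
            continue
        first = inputs[0]
        ds.find(first)  # make sure single-input addresses are registered
        for addr in inputs[1:]:
            ds.union(first, addr)
    groups = defaultdict(set)
    for addr in list(ds.parent):
        groups[ds.find(addr)].add(addr)
    return list(groups.values())


if __name__ == "__main__":
    txs = [["a", "b"], ["b", "c"], ["d"], ["e", "f"]]
    for wallet in cluster_addresses(txs):
        print(sorted(wallet))
```

Union-find keeps only one parent pointer and one size counter per address, which is one way to respect the memory limits the abstract mentions. Whether it outperforms the authors' vertex-marking algorithm is not something this sketch can show.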

Construction of 3-D Terrain Models from BIG Data Sets

10.21236/ada607383 ◽

2014 ◽

Author(s):

Pankaj K. Agarwal ◽

Thomas Moelhave

Keyword(s):

Big Data ◽

Data Sets ◽

Terrain Models

Big Data Security Challenges and Solution of Distributed Computing in Hadoop Environment: A Security Framework

Recent Advances in Computer Science and Communications ◽

10.2174/2213275912666190822095422 ◽

2020 ◽

Vol 13 (4) ◽

pp. 790-797

Author(s):

Gurjit Singh Bhathal ◽

Amardeep Singh Dhiman

Keyword(s):

Big Data ◽

Data Security ◽

Data Sets ◽

Security Framework ◽

Hadoop Distributed File System ◽

Current Scenario ◽

Hadoop Cluster ◽

Ciphertext Policy ◽

In Transit ◽

Hadoop Framework

Background: In the current internet landscape, large amounts of data are generated and processed. The Hadoop framework is widely used to store and process big data in a highly distributed manner. It is argued that the Hadoop framework is not mature enough to deal with current cyberattacks on the data. Objective: The main objective of the proposed work is to provide a complete security approach comprising authorisation and authentication for the user and the Hadoop cluster nodes, and to secure the data at rest as well as in transit. Methods: The proposed algorithm uses the Kerberos network authentication protocol for authorisation and authentication, and to validate the users and the cluster nodes. Ciphertext-Policy Attribute-Based Encryption (CP-ABE) is used for data at rest and data in transit. A user encrypts the file with their own set of attributes and stores it on the Hadoop Distributed File System. Only intended users can decrypt that file with matching parameters. Results: The proposed algorithm was implemented with data sets of different sizes. The data was processed with and without encryption. The results show little difference in processing time. Performance was affected in the range of 0.8% to 3.1%, which also includes the impact of other factors such as system configuration, the number of parallel jobs running, and the virtual environment. Conclusion: The solutions available for handling the big data security problems faced in the Hadoop framework are inefficient or incomplete. A complete security framework is proposed for the Hadoop environment. The solution is experimentally shown to have little effect on the performance of the system for data sets of different sizes.

DV-DVFS: merging data variety and DVFS technique to manage the energy consumption of big data processing

Journal Of Big Data ◽

10.1186/s40537-021-00437-7 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Hossein Ahmadvand ◽

Fouzhan Foroutan ◽

Mahmood Fathy

Keyword(s):

Big Data ◽

Energy Consumption ◽

Processing Time ◽

Experimental Results ◽

The Other ◽

Data Sets ◽

Multiple Sources ◽

Evaluation Phase ◽

Dynamic Voltage ◽

Processing Resources

Abstract: Data variety is one of the most important features of Big Data. Data variety is the result of aggregating data from multiple sources and of the uneven distribution of data. This feature of Big Data causes high variation in the consumption of processing resources such as CPU. This issue has been overlooked in previous works. To overcome this problem, in the present work we use Dynamic Voltage and Frequency Scaling (DVFS) to reduce the energy consumption of computation. To this end, we consider two types of deadlines as our constraint. Before applying the DVFS technique to computer nodes, we estimate the processing time and the frequency needed to meet the deadline. In the evaluation phase, we used several data sets and applications. The experimental results show that our proposed approach surpasses the other scenarios in processing real data sets. Based on the experimental results in this paper, DV-DVFS can achieve up to 15% improvement in energy consumption.
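
The frequency-selection step mentioned above can be illustrated with a rough sketch. The abstract does not describe the estimation model, so this example assumes that processing time is simply the estimated number of CPU cycles divided by the clock frequency, and that the node offers a discrete list of frequencies. The function name and all parameters are hypothetical.

```python
def lowest_frequency_meeting_deadline(estimated_cycles, deadline_s, available_freqs_hz):
    """Return the lowest available frequency that still meets the deadline.

    Assumption (not from the paper): time = cycles / frequency.
    Returns None when even the highest frequency is too slow.
    """
    for freq in sorted(available_freqs_hz):
        if estimated_cycles / freq <= deadline_s:
            return freq
    return None


if __name__ == "__main__":
    freqs = [1.0e9, 1.5e9, 2.0e9, 3.0e9]
    # 2e9 cycles with a 1.5 s deadline: 1.0 GHz needs 2.0 s, 1.5 GHz needs about 1.33 s.
    print(lowest_frequency_meeting_deadline(2.0e9, 1.5, freqs))
```

Picking the lowest sufficient frequency reflects the usual DVFS reasoning that lower frequency and voltage reduce power draw. The actual DV-DVFS estimator may weigh other factors.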

Tuning Active Sampling Techniques for Evolutionary Learner from Big Data Sets: Review and Discussion

2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld) ◽

10.1109/uic-atc-scalcom-cbdcom-iop-smartworld.2016.0184 ◽

2016 ◽

Author(s):

Sana Ben Hamida ◽

Marta Rukoz

Keyword(s):

Big Data ◽

Sampling Techniques ◽

Data Sets ◽

Active Sampling

Measuring the Effectiveness of Adaptive Random Forest for Handling Concept Drift in Big Data Streams

Entropy ◽

10.3390/e23070859 ◽

2021 ◽

Vol 23 (7) ◽

pp. 859

Author(s):

Abdulaziz O. AlQabbany ◽

Aqil M. Azmi

Keyword(s):

Big Data ◽

Random Forest ◽

Real Time ◽

Data Streams ◽

Learning Algorithm ◽

Concept Drift ◽

The United States ◽

Careful Consideration ◽

Data Sets ◽

Stream Data

We are living in the age of big data, a majority of which is stream data. The real-time processing of this data requires careful consideration from different perspectives. Concept drift, a change in the data’s underlying distribution, is a significant issue, especially when learning from data streams. It requires learners to adapt to dynamic changes. Random forest is an ensemble approach that is widely used in classical non-streaming settings of machine learning applications. The Adaptive Random Forest (ARF), by contrast, is a stream learning algorithm that has shown promising results in terms of its accuracy and ability to deal with various types of drift. The continuity of the incoming instances allows their binomial distribution to be approximated by a Poisson(1) distribution. In this study, we propose a mechanism to increase the efficiency of such streaming algorithms by focusing on resampling. Our measure, resampling effectiveness (ρ), fuses the two most essential aspects of online learning: accuracy and execution time. We use six different synthetic data sets, each having a different type of drift, to empirically select the parameter λ of the Poisson distribution that yields the best value for ρ. By comparing the standard ARF with its tuned variations, we show that ARF performance can be enhanced by tackling this important aspect. Finally, we present three case studies from different contexts to test our proposed enhancement method and demonstrate its effectiveness in processing large data sets: (a) Amazon customer reviews (written in English), (b) hotel reviews (in Arabic), and (c) real-time aspect-based sentiment analysis of COVID-19-related tweets in the United States during April 2020. Results indicate that our proposed method of enhancement exhibited considerable improvement in most cases.
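
The Poisson-based resampling referred to above can be sketched as follows. In online bagging, each base learner sees an incoming instance k times, with k drawn from Poisson(λ). This sketch uses Knuth's multiplication method for sampling. The helper names are hypothetical, and the code is not the ARF implementation or the paper's ρ measure.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def resampling_weights(n_learners, lam, rng):
    """One instance weight per base learner for a single incoming instance."""
    return [sample_poisson(lam, rng) for _ in range(n_learners)]


if __name__ == "__main__":
    rng = random.Random(42)
    # A larger lam means each tree trains on more copies of each instance,
    # which tends to cost more execution time per instance.
    print(resampling_weights(n_learners=10, lam=1.0, rng=rng))
    print(resampling_weights(n_learners=10, lam=6.0, rng=rng))
```

A weight of 0 means the learner skips that instance. Tuning λ, as the abstract describes, trades the amount of training per instance against running time.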

CLASSROOM VIGNETTES: Bias in algorithms and the misuse of big data sets

ACM Inroads ◽

10.1145/3392050 ◽

2020 ◽

Vol 11 (2) ◽

pp. 12-17

Author(s):

Henry M. Walker

Keyword(s):

Big Data ◽

Data Sets

Model of predicting customer behavior based on big data analysis technologies

Bulletin of the National Technical University KhPI A series of Information and Modeling ◽

10.20998/2411-0558.2021.02.06 ◽

2021 ◽

Vol 1 (2 (6)) ◽

Author(s):

Anastasiia Ivanitska ◽

Dmytro Ivanov ◽

Ludmila Zubik

Keyword(s):

Machine Learning ◽

Big Data ◽

Customer Behavior ◽

Learning Technologies ◽

User Preferences ◽

Data Sets ◽

Learning Technology ◽

Behavior Prediction ◽

Network Information ◽

Selection Of

Available methods and models for forming recommendations for potential buyers in network information systems are analyzed, with the aim of developing effective advertising-selection modules. The effectiveness of machine learning technologies for analyzing user preferences, based on processing purchase data from users with similar profiles, is substantiated. A recommendation model based on machine learning technology is proposed, its operation is tested on test data sets, and the adequacy of the model is assessed using RMSE. Keywords: behavior prediction; advertising based on similarity; collaborative filtering; matrix factorization; big data; machine learning
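
For reference, the root-mean-square error (RMSE) mentioned above is the square root of the mean squared difference between predicted and observed values. A minimal sketch follows. The function name and the sample ratings are illustrative only and are not taken from the paper.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Hypothetical predicted vs. observed ratings on a 1-5 scale.
    print(rmse([4.0, 3.5, 5.0], [4.0, 3.0, 4.0]))
```

A lower RMSE means the predicted ratings are closer to the observed ones. Large individual errors are penalised more heavily than small ones because the differences are squared.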

A multilevel approach to big data analysis using analytic tools and actor network theory

SA Journal of Information Management ◽

10.4102/sajim.v20i1.914 ◽

2018 ◽

Vol 20 (1) ◽

Cited By ~ 4

Author(s):

Tiko Iyamu

Keyword(s):

Big Data ◽

Data Analysis ◽

Network Theory ◽

Data Analytics ◽

Big Data Analytics ◽

Actor Network Theory ◽

Big Data Analysis ◽

Data Sets ◽

Multilevel Approach ◽

Actor Network

Background: Over the years, big data analytics has been statically carried out in a programmed way, which does not allow for translation of data sets from a subjective perspective. This approach affects an understanding of why and how data sets manifest themselves into various forms in the way that they do. This has a negative impact on the accuracy, redundancy and usefulness of data sets, which in turn affects the value of operations and the competitive effectiveness of an organisation. Also, the current single approach lacks a detailed examination of data sets, which big data deserve in order to improve purposefulness and usefulness. Objective: The purpose of this study was to propose a multilevel approach to big data analysis. This includes examining how a sociotechnical theory, the actor network theory (ANT), can be complementarily used with analytic tools for big data analysis. Method: In the study, the qualitative methods were employed from the interpretivist approach perspective. Results: From the findings, a framework that offers big data analytics at two levels, micro- (strategic) and macro- (operational) levels, was developed. Based on the framework, a model was developed, which can be used to guide the analysis of heterogeneous data sets that exist within networks. Conclusion: The multilevel approach ensures a fully detailed analysis, which is intended to increase accuracy, reduce redundancy and put the manipulation and manifestation of data sets into perspectives for improved organisations’ competitiveness.
