THE ALGORITHM FOR MARKING VERTEXES. COMPARISON OF ALGORITHMS FOR SEARCHING FOR CONNECTIVITY COMPONENTS IN A GRAPH USING THE EXAMPLE OF COMBINING WALLETS IN A BITCOIN DATABASE

2021 ◽  
Vol 1 (2) ◽  
pp. 91-98
Author(s):  
V. I. Glotov ◽  
D. M. Mikhailov ◽  
A. A. Yurov ◽  
M. I. Volkova ◽  
...  

The article compares the efficiency of algorithms for processing the Bitcoin blockchain transaction database and describes the vertex-marking algorithm developed by the authors' group. By comparing this algorithm with others, the authors expect to identify the most effective algorithm for clustering addresses that belong to a single user. The Bitcoin database contains information about millions of financial transactions. Even though transaction information is anonymous, there are methods for combining user addresses into wallets. In this article, we study algorithms for finding connected components that rely on one such wallet-combining method, the heuristic of «total waste» by a single user. The emphasis is on practical aspects of implementation: hardware limitations when processing big data sets, and the choice of a method for finding the connected components of a graph. A connected component is a maximal connected set of graph vertices, where a graph is defined by a nonempty set of vertices and a set of vertex pairs (edges).
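
The abstract does not spell out the group's vertex-marking algorithm, so the following is only a sketch of a standard alternative for the same task: a disjoint-set (union-find) forest that merges addresses appearing together as inputs of one transaction, on the assumption that such addresses are controlled by a single user. The input format (one list of input addresses per transaction) and the names `UnionFind` and `cluster_addresses` are hypothetical illustrations, not taken from the article.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        """Return the root of x's set, registering unseen items as singletons."""
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
            return x
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while x != root:  # path compression: point each node straight at the root
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        """Merge the sets of a and b, attaching the smaller tree under the larger."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster_addresses(transactions):
    """Group addresses that co-occur as inputs of the same transaction.

    `transactions` is an iterable of input-address lists (hypothetical format).
    Returns the clusters as sorted lists, themselves sorted, for stable output.
    """
    uf = UnionFind()
    for inputs in transactions:
        if not inputs:
            continue
        first = inputs[0]
        uf.find(first)  # registers single-input addresses as their own cluster
        for addr in inputs[1:]:
            uf.union(first, addr)
    clusters = defaultdict(list)
    for addr in list(uf.parent):
        clusters[uf.find(addr)].append(addr)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    txs = [["A", "B"], ["B", "C"], ["D"], ["E", "F"]]
    print(cluster_addresses(txs))  # [['A', 'B', 'C'], ['D'], ['E', 'F']]
```

Union-find handles transactions in a single streaming pass and stores one parent pointer per address rather than the full edge list, which is why it is a common baseline when memory is the limiting factor; a BFS/DFS labelling pass over an explicit adjacency list yields the same components.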

2014 ◽  
Author(s):  
Pankaj K. Agarwal ◽  
Thomas Moelhave
Keyword(s):  
Big Data ◽  

2020 ◽  
Vol 13 (4) ◽  
pp. 790-797
Author(s):  
Gurjit Singh Bhathal ◽  
Amardeep Singh Dhiman

Background: In the current internet landscape, large amounts of data are generated and processed. The Hadoop framework is widely used to store and process big data in a highly distributed manner. It is argued that the Hadoop framework is not mature enough to deal with current cyberattacks on data. Objective: The main objective of the proposed work is to provide a complete security approach, comprising authorisation and authentication for users and Hadoop cluster nodes, that secures data both at rest and in transit. Methods: The proposed algorithm uses the Kerberos network authentication protocol for authorisation and authentication and to validate the users and the cluster nodes. Ciphertext-Policy Attribute-Based Encryption (CP-ABE) is used for data at rest and data in transit. Users encrypt a file with their own set of attributes and store it on the Hadoop Distributed File System (HDFS). Only intended users with matching parameters can decrypt that file. Results: The proposed algorithm was implemented with data sets of different sizes, and the data were processed with and without encryption. The results show little difference in processing time: performance was affected by 0.8% to 3.1%, a range that also includes the impact of other factors such as system configuration, the number of parallel jobs running and the virtual environment. Conclusion: The available solutions to the big data security problems faced in the Hadoop framework are inefficient or incomplete. A complete security framework is proposed for the Hadoop environment and is experimentally shown to have little effect on system performance for datasets of different sizes.
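
In the standard CP-ABE construction, a ciphertext carries an access policy over attributes, a user's secret key is tied to a set of attributes, and decryption succeeds only if that set satisfies the policy. The sketch below models only that policy-satisfaction check, as a tree of threshold gates; it performs no cryptography and is not the authors' implementation. The attribute names and the helpers `all_of`, `any_of` and `satisfies` are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Gate:
    """Threshold gate: satisfied when at least `k` of its children are satisfied."""
    k: int
    children: Tuple["Policy", ...]


# A policy is either a single attribute name (leaf) or a threshold gate.
Policy = Union[str, Gate]


def all_of(*children: Policy) -> Gate:
    """AND gate: every child must be satisfied (k = number of children)."""
    return Gate(len(children), children)


def any_of(*children: Policy) -> Gate:
    """OR gate: at least one child must be satisfied (k = 1)."""
    return Gate(1, children)


def satisfies(policy: Policy, attributes: FrozenSet[str]) -> bool:
    """Return True if the attribute set satisfies the access policy."""
    if isinstance(policy, str):
        return policy in attributes
    satisfied = sum(satisfies(child, attributes) for child in policy.children)
    return satisfied >= policy.k


if __name__ == "__main__":
    # Hypothetical policy: a cluster user from either the finance or audit team.
    policy = all_of("hadoop-user", any_of("finance", "audit"))
    print(satisfies(policy, frozenset({"hadoop-user", "audit"})))  # True
    print(satisfies(policy, frozenset({"finance"})))               # False
```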


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Hossein Ahmadvand ◽  
Fouzhan Foroutan ◽  
Mahmood Fathy

Data variety is one of the most important features of Big Data. It results from aggregating data from multiple sources and from the uneven distribution of data. This feature of Big Data causes high variation in the consumption of processing resources such as CPU. This issue has been overlooked in previous work. To address it, we use Dynamic Voltage and Frequency Scaling (DVFS) to reduce the energy consumption of computation. To this end, we consider two types of deadlines as constraints. Before applying DVFS to compute nodes, we estimate the processing time and the frequency needed to meet the deadline. In the evaluation phase, we used a set of data sets and applications. The experimental results show that our proposed approach outperforms the other scenarios when processing real datasets. Based on these results, the proposed DV-DVFS approach achieves up to 15% improvement in energy consumption.
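
The step of estimating the processing time and the frequency needed to meet a deadline can be illustrated with a deliberately simplified model, assumed here rather than taken from the paper: runtime equals the estimated CPU cycle count divided by the clock frequency, and the node picks the lowest available frequency that still meets the deadline, since lower frequencies and voltages generally draw less power. The function names, the linear cycles-per-byte estimate and the frequency levels are hypothetical.

```python
def pick_frequency(cycles, deadline_s, available_hz):
    """Return the lowest available clock frequency that finishes `cycles` in time.

    Simplified model (an assumption): runtime = cycles / frequency, i.e. the
    job is CPU-bound and its cycle count does not change with frequency.
    """
    for freq in sorted(available_hz):
        if cycles / freq <= deadline_s:
            return freq
    raise ValueError("deadline cannot be met even at the highest frequency")


def estimate_cycles(input_bytes, cycles_per_byte):
    """Hypothetical estimate: cycle cost scales linearly with input size.

    With varied data, cycles_per_byte would be measured per data type or source.
    """
    return input_bytes * cycles_per_byte


if __name__ == "__main__":
    levels = [1.2e9, 1.6e9, 2.0e9, 2.4e9]  # hypothetical frequency levels in Hz
    cycles = estimate_cycles(input_bytes=300e6, cycles_per_byte=20)  # 6e9 cycles
    print(pick_frequency(cycles, deadline_s=4.0, available_hz=levels))  # 1600000000.0
```

Here 1.2 GHz would need 5 s and miss the 4 s deadline, while 1.6 GHz needs 3.75 s, so it is the lowest level that qualifies.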


Entropy ◽  
2021 ◽  
Vol 23 (7) ◽  
pp. 859
Author(s):  
Abdulaziz O. AlQabbany ◽  
Aqil M. Azmi

We are living in the age of big data, much of which is stream data. Processing this data in real time requires careful consideration from several perspectives. Concept drift, a change in the data's underlying distribution, is a significant issue, especially when learning from data streams, because it requires learners to adapt to dynamic changes. Random forest is an ensemble approach widely used in classical, non-streaming machine learning applications. The Adaptive Random Forest (ARF), in turn, is a stream learning algorithm that has shown promising results in terms of accuracy and the ability to deal with various types of drift. The continuous arrival of instances allows their binomial distribution to be approximated by a Poisson(1) distribution. In this study, we propose a mechanism to increase the efficiency of such streaming algorithms by focusing on resampling. Our measure, resampling effectiveness (ρ), fuses the two most essential aspects of online learning: accuracy and execution time. We use six synthetic data sets, each with a different type of drift, to empirically select the parameter λ of the Poisson distribution that yields the best value of ρ. By comparing the standard ARF with its tuned variants, we show that ARF performance can be enhanced by tackling this important aspect. Finally, we present three case studies from different contexts to test the proposed enhancement and demonstrate its effectiveness in processing large data sets: (a) Amazon customer reviews (in English), (b) hotel reviews (in Arabic), and (c) real-time aspect-based sentiment analysis of COVID-19-related tweets in the United States during April 2020. Results indicate that the proposed enhancement yields considerable improvement in most situations.
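
Poisson-based resampling in online bagging, the mechanism ARF builds on, can be sketched as follows: each ensemble member receives each incoming instance with a weight k drawn from Poisson(λ), and k = 0 means that member skips the instance. With λ = 1 this matches the Poisson(1) approximation mentioned above, and λ is the parameter the study tunes. The toy majority-class learner and its `learn(x, y, weight)` interface are hypothetical stand-ins for real stream learners, and the resampling effectiveness measure ρ is not reproduced because the abstract does not define it.

```python
import math
import random


def poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


class MajorityClassLearner:
    """Toy stand-in for a stream learner: keeps weighted per-class counts."""

    def __init__(self):
        self.counts = {}

    def learn(self, x, y, weight):
        self.counts[y] = self.counts.get(y, 0) + weight

    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None


def online_bagging_step(learners, x, y, lam, rng):
    """Show one labelled instance to each ensemble member with a Poisson(lam) weight."""
    for learner in learners:
        k = poisson(lam, rng)
        if k > 0:  # k == 0: this member skips the instance entirely
            learner.learn(x, y, weight=k)


if __name__ == "__main__":
    rng = random.Random(42)
    ensemble = [MajorityClassLearner() for _ in range(5)]
    stream = [((i,), "pos" if i % 3 else "neg") for i in range(300)]
    for x, y in stream:
        online_bagging_step(ensemble, x, y, lam=6.0, rng=rng)
    print([member.predict((0,)) for member in ensemble])
```

A larger λ gives each instance a higher expected weight per member (the mean of Poisson(λ) is λ) and makes it less likely that a member skips an instance, which is the accuracy-versus-cost trade-off a measure such as ρ is meant to balance.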


ACM Inroads ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 12-17
Author(s):  
Henry M. Walker
Keyword(s):  
Big Data ◽  

Author(s):  
Anastasiia Ivanitska ◽  
Dmytro Ivanov ◽  
Ludmila Zubik

We analyze the available methods and models for generating recommendations for potential buyers in networked information systems, with the aim of developing effective advertisement-selection modules. The effectiveness of machine learning for analyzing user preferences, based on processing data on purchases made by users with similar profiles, is substantiated. A recommendation model based on machine learning is proposed, tested on test data sets, and its adequacy is assessed using RMSE.
Keywords: behavior prediction; advertising based on similarity; collaborative filtering; matrix factorization; big data; machine learning
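
The keywords point to collaborative filtering by matrix factorization, evaluated with RMSE. As a generic sketch, not the authors' model, the code below factors a sparse user-item rating matrix into user and item factor vectors by stochastic gradient descent and reports RMSE over the known ratings. The toy ratings, hyperparameters and function names are hypothetical.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length factor vectors."""
    return sum(x * y for x, y in zip(a, b))


def train_mf(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=1000, seed=0):
    """Learn factors P (users) and Q (items) so that dot(P[u], Q[i]) ~ r_ui.

    `ratings` is a list of (user_index, item_index, rating) triples.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)  # visit observed ratings in a new random order each epoch
        for u, i, r in data:
            err = r - dot(P[u], Q[i])
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings, P, Q):
    """Root-mean-square error of predictions over (user, item, rating) triples."""
    squared = [(r - dot(P[u], Q[i])) ** 2 for u, i, r in ratings]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    ratings = [(0, 0, 5), (0, 1, 3), (0, 3, 1), (1, 0, 4), (1, 3, 1),
               (2, 0, 1), (2, 1, 1), (2, 3, 5), (3, 0, 1), (3, 3, 4),
               (4, 1, 1), (4, 2, 5), (4, 3, 4)]
    P, Q = train_mf(ratings, n_users=5, n_items=4)
    print(f"training RMSE: {rmse(ratings, P, Q):.3f}")
    print(f"predicted rating of user 1 for item 2: {dot(P[1], Q[2]):.2f}")
```

In practice RMSE would be measured on held-out ratings rather than the training set, since a low training error alone says little about recommendation quality.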


2018 ◽  
Vol 20 (1) ◽  
Author(s):  
Tiko Iyamu

Background: Over the years, big data analytics has been carried out statically, in a programmed way that does not allow for the translation of data sets from a subjective perspective. This approach limits understanding of why and how data sets manifest themselves in the various forms that they do. This has a negative impact on the accuracy, redundancy and usefulness of data sets, which in turn affects the value of operations and the competitive effectiveness of an organisation. In addition, the current single approach lacks the detailed examination of data sets that big data deserve in order to improve purposefulness and usefulness. Objective: The purpose of this study was to propose a multilevel approach to big data analysis, including an examination of how a sociotechnical theory, actor network theory (ANT), can be used to complement analytic tools for big data analysis. Method: Qualitative methods were employed from an interpretivist perspective. Results: Based on the findings, a framework was developed that offers big data analytics at two levels, micro- (strategic) and macro- (operational). Based on the framework, a model was developed that can guide the analysis of heterogeneous data sets that exist within networks. Conclusion: The multilevel approach ensures a fully detailed analysis, which is intended to increase accuracy, reduce redundancy and put the manipulation and manifestation of data sets into perspective for improved organisational competitiveness.


2018 ◽  
Vol 14 (9) ◽  
pp. 1213-1225 ◽  
Author(s):  
Vo Ngoc Phu ◽  
Vo Thi Ngoc Tran
