A Novel Clustering Algorithm to Process Big Data Using Hadoop Framework

The real challenge for data miners lies in extracting useful information from huge datasets, and choosing an efficient algorithm to analyze and process such unstructured data is itself a challenge. Cluster analysis is an unsupervised technique for gaining insight into data in the era of Big Data. Hyperflated PIC (HPIC) is a Big Data processing solution designed to exploit clustering: a scalable, efficient algorithm that addresses the shortcomings of existing clustering algorithms and can process huge datasets quickly. The HPIC algorithms have been validated through experiments on synthetic and real datasets using different evaluation measures. The quality of the clustering results has also been analyzed and shown to be highly efficient and suitable for Big Data processing.
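The abstract does not spell out HPIC's internals, but classical Power Iteration Clustering (PIC), which the name suggests HPIC builds on, can be sketched on a single machine as follows. This is a minimal illustration under our own assumptions, not the paper's Hadoop implementation; the function name and the early-stopping iteration count are ours.

```python
def pic_embedding(affinity, iters=10):
    """Truncated power iteration on the row-normalized affinity matrix
    W = D^-1 A, returning a 1-D pseudo-eigenvector embedding of the nodes.
    Clusters are then recovered by running 1-D k-means on this embedding."""
    n = len(affinity)
    # Row-normalize the affinity matrix A into a stochastic matrix W.
    w = []
    for row in affinity:
        s = sum(row)
        w.append([a / s for a in row])
    # Start from the normalized degree vector, as in classical PIC.
    total = sum(sum(row) for row in affinity)
    v = [sum(row) / total for row in affinity]
    # Early-stopped power iteration: intermediate iterates, not the
    # final eigenvector, carry the cluster structure.
    for _ in range(iters):
        v = [sum(w[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(x) for x in v)
        v = [x / norm for x in v]
    return v

# Toy graph: nodes 0-1 strongly linked, nodes 2-3 strongly linked,
# weak coupling between the two groups.
A = [[1, .8, .1, 0],
     [.8, 1, .1, 0],
     [.1, .1, 1, .7],
     [0, 0, .7, 1]]
embedding = pic_embedding(A)
# Nodes 0-1 end up with nearly equal values, separated from nodes 2-3.
```

The Hadoop angle in the paper would map the matrix-vector product across workers; the iteration logic itself is unchanged.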

2020, Vol. 19 (3), pp. 416-429
Author(s): V.V. Makrusev, A.A. Sobol'

Subject. The article considers prospects for enhancing the quality of the analytical activities of the Customs authorities through the implementation of a cognitive approach. Objectives. The aim is to formulate promising areas for improving the quality of the analytical work of the Customs authorities by using a cognitive approach, and to develop a concept for managing analytical activities based on knowledge. Methods. The study rests on systems methodology and institutional theory; it also employs cognitive modeling techniques. Results. We show the process of transforming disparate data into knowledge, consider basic methods of big data processing, and identify the most suitable method of customs data analysis. The paper discloses the contents and elements of the cognitive approach in the analytical activities of online monitoring centers and describes an experiment applying data mining technology at the Federal Customs Service of Russia. We recommend this approach to the analytical and ICT units of organizations operating in the field of customs services. Conclusions. Current trends in software development, the use of electronic customs documents, and a continuously expanding list of analytical tools for big data processing entail the need to change traditional approaches to information analysis for assessing customs risks. The expert method should be supplemented with new, previously unused decision support tools, such as tools that enable automated big data analysis.


Author(s): Saifuzzafar Jaweed Ahmed

Big Data has become a very important part of all industries and organizational sectors nowadays. Sectors such as energy, banking, retail, hardware, and networking all generate huge amounts of unstructured data, which must be processed and analyzed accurately into a structured form; the structured data can then reveal very useful information for business growth. Big Data helps extract useful information from unstructured or heterogeneous data by analyzing it. Big data was initially defined by the volume of a data set. Big data sets are generally huge, measuring tens of terabytes and sometimes crossing the threshold of petabytes. Today, big data falls into three categories: structured, unstructured, and semi-structured. The size of big data is growing at a fast pace, from terabytes to exabytes of data. Big data also requires techniques that help integrate large amounts of heterogeneous data and process them. Data analysis, a core big data process, has applications in areas such as business processing, disease prevention, and cybersecurity. Big data has three major issues: data storage, data management, and information retrieval. Big data processing requires a particular setup of hardware and virtual machines to derive results, and processing is performed in parallel to obtain results as quickly as possible. These days, big data processing techniques include text mining and sentiment analysis. Text analytics is a very large field comprising several techniques, models, and methods for the automatic and quantitative analysis of textual data. The purpose of this paper is to show how text analysis and sentiment analysis process unstructured data, how these techniques extract meaningful information, and how that information is thus made available to various data mining (statistical and machine learning) algorithms.
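As a concrete illustration of the sentiment-analysis step described above, a toy lexicon-based scorer might look like the sketch below. The lexicon and function name are illustrative assumptions, not a standard resource or the paper's method; real systems use curated lexicons or trained models.

```python
# Toy sentiment lexicons (assumed for this sketch only).
POSITIVE = {"good", "great", "useful", "growth", "accurate"}
NEGATIVE = {"bad", "poor", "slow", "issue", "loss"}

def sentiment_score(text):
    """Score raw, unstructured text as (#positive - #negative) / #tokens,
    a number in [-1, 1]; positive values suggest positive sentiment."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / max(len(tokens), 1)

# Example: 3 positive hits out of 5 tokens -> score 0.6.
print(sentiment_score("Great growth and useful data"))
```

The point of the sketch is the pipeline shape, turning free text into a structured numeric feature that downstream statistical or machine learning algorithms can consume.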


PLoS ONE, 2021, Vol. 16 (3), pp. e0246718
Author(s): Hongyan Ma

The purposes are to evaluate the applicability of the Distributed Clustering Algorithm (DCA) to big data processing in power systems and to find an information economic dispatch strategy suitable for new energy consumption in power systems. A two-layer DCA is proposed based on the K-Means Clustering (KMC) and Affinity Propagation (AP) clustering algorithms. Then, incentive-based Demand Response (DR) is introduced, and the DR flexibility of the user side is analyzed. Finally, the day-ahead and real-time dispatch schemes are combined, and a multi-period information economic dispatch model is constructed. The algorithm's performance is analyzed through case analyses of new energy consumption. Results demonstrate that the two-layer DCA's calculation time is only 5.23 s, the number of iterations is small, and the classification accuracy reaches 0.991. Case 2, corresponding to the proposed model, can consume the new energy, and the income of the aggregator can be maximized. In short, the multi-period information economic dispatch model can consume the new energy and meet the DR of the user side.
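The K-Means first layer of the two-layer DCA can be sketched on a single machine as plain Lloyd's algorithm, shown below. This is an illustrative assumption only: the paper's Affinity Propagation second layer, its distributed execution, and its power-system feature set are all omitted, and the naive seeding is just for the sketch.

```python
def dist2(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    centroids = list(points[:k])  # naive seeding, acceptable for a sketch
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return centroids, labels

# Two well-separated groups of (load, generation)-style points.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, labels = kmeans(points, 2)
```

In the two-layer scheme described above, AP would then operate on these first-layer clusters; that refinement step is beyond this sketch.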

