Cloud-Based Heterogeneous Big Data Integration and Data Analysis for Business Intelligence

Author(s):
T. Jayaraj
J. Abdul Samath

2020
Vol 26 (4)
pp. 190-194

Author(s):
Jacek Pietraszek
Norbert Radek
Andrii V. Goroshko

Abstract: The introduction of solutions conventionally grouped under the label Industry 4.0 has made it necessary to change many of the traditional procedures of industrial data analysis based on the DOE (Design of Experiments) methodology. The growth in the number of controlled and observed factors, the intensity of the data stream, and the size of the analyzed datasets have revealed the shortcomings of the existing procedures. Modifying those procedures by adapting Big Data solutions and data-driven methods is becoming an increasingly pressing need. The article presents the current methods of DOE, considers the problems caused by the introduction of mass automation and data integration under Industry 4.0, and indicates the most promising areas in which to look for possible solutions.
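For readers unfamiliar with classical DOE, the sketch below enumerates a two-level full-factorial design, one of the standard constructions the abstract refers to; the factor names are purely illustrative and not taken from the article. The 2^k growth in runs with the number of factors k is exactly why the larger factor counts of Industry 4.0 strain these traditional procedures.

```python
# A minimal sketch of a classic two-level full-factorial design.
# Factor names and levels are illustrative assumptions.
from itertools import product

factors = {"temperature": (-1, +1), "pressure": (-1, +1), "speed": (-1, +1)}

# Enumerate every combination of factor levels: 2**k runs for k factors.
runs = [dict(zip(factors, levels)) for levels in product(*factors.values())]

for i, run in enumerate(runs, start=1):
    print(f"run {i}: {run}")
# 3 factors -> 8 runs; 20 factors would already require 1,048,576 runs.
```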


2018
Vol 10 (10)
pp. 3778

Author(s):
Dong-Hui Jin
Hyun-Jung Kim

Efficient decision making based on business intelligence (BI) is essential to ensure competitiveness for sustainable growth. The rapid development of information and communication technology has made the collection and analysis of big data essential, resulting in a considerable increase in academic studies on big data and big data analysis (BDA). However, many of these studies are not linked to BI, as companies do not understand and utilize the concepts in an integrated way. The purpose of this study is therefore twofold. First, we review the literature on BI, big data, and BDA to show that they are not separate methods but an integrated decision support system. Second, we explore how businesses use big data and BDA in practice, in conjunction with BI, through a case study of the sorting and logistics processing of a typical courier enterprise. We focus on the company's cost efficiency with regard to data collection, data analysis/simulation, and the results from actual application. Our findings may enable companies to achieve management efficiency by utilizing big data through efficient BI without investing in additional infrastructure. They could also give companies indirect experience, reducing trial and error as they work to maintain or increase competitiveness.


Entity Resolution (ER) is the process of identifying records that refer to the same real-world entity. It plays a key role in many applications such as data warehousing, data integration, and business intelligence. Comparing every record with every other record is infeasible, especially for a big dataset, so blocking techniques have been developed to overcome this problem. In this paper, we propose a novel Efficient Multi-Phase Blocking Strategy (EMPBS) for resolving duplicates in big data. To the best of our knowledge, some state-of-the-art blocking techniques (e.g., Q-grams) may produce overlapping blocks, which cause redundant comparisons and hence increase the time complexity. Our proposed blocking strategy produces disjoint blocks and has lower time complexity than the Q-gram and standard blocking techniques. In addition, EMPBS is general and places no restrictions on the type of blocking keys. EMPBS consists of three phases. The first phase generates three efficient single blocking keys. The second phase takes the output of the first phase as input and constructs compound keys, each formed by concatenating two single blocking keys; the three compound blocking keys it outputs are then used as input to the last phase, which generates the Efficient Multi-Phase Blocking Key (EMPBK) as the union of two compound blocking keys. The implementation of EMPBS shows promising results in terms of Reduction Ratio (RR): it achieves a higher RR than any single blocking key while maintaining nearly the same precision and recall, and it reduces the average number of comparisons required by a single blocking key by about 84%. To evaluate EMPBS, we developed a duplicate generation tool (DupGen) that accepts a clean semi-structured file as input and generates labeled duplicate records according to certain criteria.
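The EMPBS pipeline itself is not reproduced here, but the sketch below illustrates the underlying idea of single-key blocking and the Reduction Ratio the abstract reports: records are grouped by a key so that comparisons happen only within blocks. The records, fields, and key function are illustrative assumptions, not the EMPBS keys from the paper.

```python
# Minimal sketch of single-key blocking for entity resolution.
# Records and the blocking-key function are illustrative assumptions.
from collections import defaultdict
from itertools import combinations

records = [
    {"id": 1, "name": "John Smith",  "city": "Cairo"},
    {"id": 2, "name": "Jon Smith",   "city": "Cairo"},
    {"id": 3, "name": "Mary Jones",  "city": "Giza"},
    {"id": 4, "name": "Marie Jones", "city": "Giza"},
]

def blocking_key(rec):
    # A simple disjoint key: first letter of the surname plus the city.
    return rec["name"].split()[-1][0].lower() + "|" + rec["city"].lower()

blocks = defaultdict(list)
for rec in records:
    blocks[blocking_key(rec)].append(rec)

# Compare only within blocks instead of all n*(n-1)/2 record pairs.
candidate_pairs = [
    (a["id"], b["id"])
    for block in blocks.values()
    for a, b in combinations(block, 2)
]

n = len(records)
all_pairs = n * (n - 1) // 2
reduction_ratio = 1 - len(candidate_pairs) / all_pairs
print(candidate_pairs)                # [(1, 2), (3, 4)]
print(f"RR = {reduction_ratio:.2f}")  # RR = 0.67
```

Because each record maps to exactly one key, the blocks are disjoint and no pair is compared twice, which is the property the paper contrasts with overlapping Q-gram blocks.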


Author(s):  
Vishnu Vandana Kolisetty
Dharmendra Singh Rajput

Abstract: The process of integration through classification provides a unified representation of diverse data sources in big data. The main challenges of big data analysis stem from varying granularities, irreconcilable data models, and multipart interdependencies between data content. Previously designed models have struggled to integrate and analyze big data because of highly complex, dynamic, multi-source, and heterogeneous information variation, and because of the difficulty of processing and classifying the associations among the attributes in a schema. In this paper, we propose an integration and classification approach based on a Probabilistic Semantic Association (PSA) method that generates feature patterns for the sources of big data. The PSA approach is trained to understand the association and dependency patterns between data classes and incoming data so that data objects are mapped accurately. It first builds a data integration mechanism by transforming the data into a structured form and learns to utilize the trained knowledge to classify the probabilistic association among data and knowledge patterns. It then builds a data analysis mechanism that analyzes the mapped data through PSA to evaluate integration efficiency. An experimental evaluation is performed on a real-time crime dataset collected from multiple locations and covering various event classes. The results confirm that utilizing knowledge patterns of accurate classification to enhance the integration of multi-source data is appropriate. The measures of precision, recall, fall-out rate, and F-measure confirm the efficiency of the proposed PSA method; comparison with a state-of-the-art classification method and the SC-LDA algorithm shows improved prediction accuracy and enhanced data integration.
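The abstract does not spell out the internals of PSA, so the sketch below is only a rough analogy: a naive Bayes-style probabilistic association between record attributes and event classes, used to map incoming records to a class. The toy dataset, attribute names, and smoothing choice are all assumptions for illustration, not the authors' method.

```python
# Rough analogy of probabilistic association between attributes and
# classes (naive Bayes style). This is NOT the PSA method itself; the
# dataset, attributes, and smoothing choice are illustrative assumptions.
from collections import Counter, defaultdict

# Toy multi-source "crime" records: (attributes, event class).
training = [
    ({"location": "north", "time": "night"}, "burglary"),
    ({"location": "north", "time": "day"},   "theft"),
    ({"location": "south", "time": "night"}, "burglary"),
    ({"location": "south", "time": "day"},   "fraud"),
]

class_counts = Counter(label for _, label in training)
attr_counts = defaultdict(Counter)  # (attribute, value) -> per-class counts
for attrs, label in training:
    for key, value in attrs.items():
        attr_counts[(key, value)][label] += 1

def classify(attrs):
    scores = {}
    for label, n in class_counts.items():
        score = n / len(training)            # prior P(class)
        for key, value in attrs.items():     # likelihoods, add-one smoothing
            hits = attr_counts[(key, value)][label]
            score *= (hits + 1) / (n + len(class_counts))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify({"location": "north", "time": "night"}))  # burglary
```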


2019
Vol 9 (1)
pp. 01-12

Author(s):
Kristy F. Tiampo
Javad Kazemian
Hadi Ghofrani
Yelena Kropivnitskaya
Gero Michel
