data warehousing Latest Research Papers

TinyLFU-Based Semi-Stream Cache Join for Near-Real-Time Data Warehousing

10.21203/rs.3.rs-944044/v1 ◽

2022 ◽

Author(s):

M. Asif Naeem ◽

Wasiullah Waqar ◽

Farhaan Mirza ◽

Ali Tahir

Keyword(s):

Real Time ◽

Data Warehousing ◽

Cost Model ◽

Research Problem ◽

Daily Basis ◽

Stream Data ◽

Time Data ◽

Business Decisions ◽

Real Time Data ◽

Modern Era

Abstract Semi-stream join is an emerging research problem in the domain of near-real-time data warehousing. A semi-stream join is basically a join between a fast stream (S) and a slow disk-based relation (R). In the modern era of technology, huge amounts of data are being generated swiftly on a daily basis which needs to be instantly analyzed for making successful business decisions. Keeping this in mind, a famous algorithm called CACHEJOIN (Cache Join) was proposed. The limitation of the CACHEJOIN algorithm is that it does not deal with the frequently changing trends in a stream data efficiently. To overcome this limitation, in this paper we propose a TinyLFU-CACHEJOIN algorithm, a modified version of the original CACHEJOIN algorithm, which is designed to enhance the performance of a CACHEJOIN algorithm. TinyLFU-CACHEJOIN employs an intelligent strategy which keeps only those records of $R$ in the cache that have a high hit rate in S. This mechanism of TinyLFU-CACHEJOIN allows it to deal with the sudden and abrupt trend changes in S. We developed a cost model for our TinyLFU-CACHEJOIN algorithm and proved it empirically. We also assessed the performance of our proposed TinyLFU-CACHEJOIN algorithm with the existing CACHEJOIN algorithm on a skewed synthetic dataset. The experiments proved that TinyLFU-CACHEJOIN algorithm significantly outperforms the CACHEJOIN algorithm.

Large Scale System for Social Media Data Warehousing

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.290890 ◽

2022 ◽

Vol 18 (1) ◽

pp. 0-0

Keyword(s):

Social Media ◽

Information Extraction ◽

Data Warehouse ◽

Large Scale ◽

Data Warehousing ◽

Very High Frequency ◽

Social Media Data ◽

Large Scale System ◽

Linguistic Rules ◽

Media Data

Social media data become an integral part in the business data and should be integrated into the decisional process for better decision making based on information which reflects better the true situation of business in any field. However, social media data are unstructured and generated in very high frequency which exceeds the capacity of the data warehouse. In this work, we propose to extend the data warehousing process with a staging area which heart is a large scale system implementing an information extraction process using Storm and Hadoop frameworks to better manage their volume and frequency. Concerning structured information extraction, mainly events, we combine a set of techniques from NLP, linguistic rules and machine learning to succeed the task. Finally, we propose the adequate data warehouse conceptual model for events modeling and integration with enterprise data warehouse using an intermediate table called Bridge table. For application and experiments, we focus on drug abuse events extraction from Twitter data and their modeling into the Event Data Warehouse.

Understanding the Concept of Data Warehousing and Challenges in Its Implementation

Mathematical Statistician and Engineering Applications ◽

10.17762/msea.v71i1.30 ◽

2022 ◽

Vol 71 (1) ◽

Author(s):

Monika Soni

Keyword(s):

Data Analysis ◽

Data Warehousing ◽

Analysis Process ◽

Data Ware

The aim of this paper is to understand the concept of Data ware housing and how it is implemented. It is related to the data analysis of the data in an organisation. It facilitates and makes the analysis process easy for the workers of the organisation. The paper will also explain two approaches that are followed in data ware housing. The process of implementation of data ware house will also discussed further in this paper. There are certain challenges to create data ware house.

An Innovative Method to Extract Data in a Real-time Data Warehousing Environment

10.5121/csit.2021.112401 ◽

2021 ◽

Author(s):

Flavio de Assis Vilela ◽

Ricardo Rodrigues Ciferri

Keyword(s):

Real Time ◽

Data Warehousing ◽

Data Extraction ◽

Synthetic Data ◽

Knowledge Discovery In Databases ◽

Data Repository ◽

Time Data ◽

Innovative Method ◽

Real Time Data ◽

Time Requirements

ETL (Extract, Transform, and Load) is an essential process required to perform data extraction in knowledge discovery in databases and in data warehousing environments. The ETL process aims to gather data that is available from operational sources, process and store them into an integrated data repository. Also, the ETL process can be performed in a real-time data warehousing environment and store data into a data warehouse. This paper presents a new and innovative method named Data Extraction Magnet (DEM) to perform the extraction phase of ETL process in a real-time data warehousing environment based on non-intrusive, tag and parallelism concepts. DEM has been validated on a dairy farming domain using synthetic data. The results showed a great performance gain in comparison to the traditional trigger technique and the attendance of real-time requirements.

Data Warehousing for Formula One (Racing) Popularity Rating Using Pentaho Tools

10.1109/iccca52192.2021.9666247 ◽

2021 ◽

Author(s):

Deshak Bhatnagar ◽

Siddhaling Urolagin

Keyword(s):

Data Warehousing ◽

Formula One

A Framework for Developing an Enterprise Data Warehousing Solution

10.1201/9780429114878-70 ◽

2021 ◽

pp. 755-764

Author(s):

Ali H. Murtaza

Keyword(s):

Data Warehousing

Developing a Corporate Data Warehousing Strategy

Data Management ◽

10.1201/9780429114878-67 ◽

2021 ◽

pp. 723-738

Author(s):

Manjit Sidhu

Keyword(s):

Data Warehousing

Mapping the road to elimination: a 5-year evaluation of implementation strategies associated with hepatitis C treatment in the veterans health administration

BMC Health Services Research ◽

10.1186/s12913-021-07312-4 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Vera Yakovchenko ◽

Timothy R. Morgan ◽

Matthew J. Chinman ◽

Byron J. Powell ◽

Rachel Gonzalez ◽

...

Keyword(s):

Hepatitis C ◽

Data Warehousing ◽

Implementation Strategies ◽

Veterans Health Administration ◽

World Health ◽

Veterans Health ◽

Health Administration ◽

Hcv Treatment ◽

Medical Centers ◽

Treatment Volume

Abstract Background While few countries and healthcare systems are on track to meet the World Health Organization’s hepatitis C virus (HCV) elimination goals, the US Veterans Health Administration (VHA) has been a leader in these efforts. We aimed to determine which implementation strategies were associated with successful national viral elimination implementation within the VHA. Methods We conducted a five-year, longitudinal cohort study of the VHA Hepatic Innovation Team (HIT) Collaborative between October 2015 and September 2019. Participants from 130 VHA medical centers treating HCV were sent annual electronic surveys about their use of 73 implementation strategies, organized into nine clusters as described by the Expert Recommendations for Implementing Change taxonomy. Descriptive and nonparametric analyses assessed strategy use over time, strategy attribution to the HIT, and strategy associations with site HCV treatment volume and rate of adoption, following the Theory of Diffusion of Innovations. Results Between 58 and 109 medical centers provided responses in each year, including 127 (98%) responding at least once, and 54 (42%) responding in all four implementation years. A median of 13–27 strategies were endorsed per year, and 8–36 individual strategies were significantly associated with treatment volume per year. Data warehousing, tailoring, and patient-facing strategies were most commonly endorsed. One strategy—“identify early adopters to learn from their experiences”—was significantly associated with HCV treatment volume in each year. Peak implementation year was associated with revising professional roles, providing local technical assistance, using data warehousing (i.e., dashboard population management), and identifying and preparing champions. Many of the strategies were driven by a national learning collaborative, which was instrumental in successful HCV elimination. Conclusions VHA’s tremendous success in rapidly treating nearly all Veterans with HCV can provide a roadmap for other HCV elimination initiatives.

Explicitly Disclosing Clients Illness Catalogue Using Data Science Techniques

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.38658 ◽

2021 ◽

Vol 9 (10) ◽

pp. 1504-1508

Author(s):

A. Sai Ram

Keyword(s):

Cloud Computing ◽

Big Data ◽

Database Management ◽

Data Privacy ◽

Data Science ◽

Data Warehousing ◽

Database Management System ◽

Nominal Level ◽

Communication Problems ◽

Using Data

Abstract: Across the world in our day-to-day life, we come across various medical inaccuracies caused due to unreliable patient’s reminiscence. Statistically, communication problems are the most significant aspect that hampers the diagnosis of patient’s diseases. So, this paper represents the best theoretical solution to achieve patient care in the most adequate way. In these pandemic days, the communication gap between the patient and the physician has begun to decline to a nominal level. This paper demonstrates a vital solution and a steppingstone to the complete digitalization of the client’s illness catalogue. To attain the solution in a specified manner we are using adverse pre-existential technologies like data warehousing, database management system, cloud computing, big data, etc. We also persistently maintain the most secure, impenetrable infrastructure enabling the client’s data privacy. Keywords: Illness catalogue, cloud computing, data warehousing, database management systems, big data.