Maintaining Dimension's History in Data Warehouses Effectively

2019 ◽  
Vol 15 (3) ◽  
pp. 46-62
Author(s):  
Canan Eren Atay ◽  
Georgia Garani

A data warehouse is considered a key aspect of success for any decision support system. Research on temporal databases has produced important results in this field, and data warehouses, which store historical data, can clearly benefit from such studies. A slowly changing dimension is a data warehouse dimension whose attributes change infrequently over time. Although different solutions have been proposed, each has its own particular disadvantages. In this research work, the authors propose the Object-Relational Temporal Data Warehouse (O-RTDW) model for slowly changing dimensions. Using this approach, it is possible to keep track of the whole history of an object in a data warehouse efficiently. The proposed model has been implemented on a real data set and tested successfully. Several limitations found in other solutions, such as redundancy, surrogate keys, incomplete historical data, and the creation of additional tables, are not present in our solution.
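A minimal Python sketch of the underlying idea (not the paper's actual O-RTDW schema): each dimension member is a single object that nests its own valid-time history, so no surrogate keys, duplicated rows, or separate history tables are needed; all names and attributes below are illustrative.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class AttributeVersion:
    """One valid-time version of a slowly changing attribute set."""
    valid_from: date
    valid_to: Optional[date]          # None = current version
    city: str
    segment: str

@dataclass
class CustomerDimension:
    """A dimension member that nests its full history inside one object."""
    customer_id: int                  # natural key; no surrogate key needed
    name: str
    history: List[AttributeVersion] = field(default_factory=list)

    def update(self, on: date, **changes) -> None:
        """Close the current version and open a new one carrying the changes."""
        current = self.history[-1]
        current.valid_to = on
        self.history.append(AttributeVersion(
            valid_from=on, valid_to=None,
            city=changes.get("city", current.city),
            segment=changes.get("segment", current.segment),
        ))

    def as_of(self, when: date) -> AttributeVersion:
        """Return the attribute values that were valid at a given date."""
        for v in self.history:
            if v.valid_from <= when and (v.valid_to is None or when < v.valid_to):
                return v
        raise KeyError("no version valid at that date")

# usage: the whole history of one customer is queryable at any point in time
cust = CustomerDimension(1001, "Acme Ltd",
                         [AttributeVersion(date(2015, 1, 1), None, "Izmir", "Retail")])
cust.update(date(2017, 6, 1), city="Larissa")
print(cust.as_of(date(2016, 3, 15)).city)   # -> Izmir
print(cust.as_of(date(2018, 1, 1)).city)    # -> Larissa
```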

2019 ◽  
Vol XVI (2) ◽  
pp. 1-11
Author(s):  
Farrukh Jamal ◽  
Hesham Mohammed Reyad ◽  
Soha Othman Ahmed ◽  
Muhammad Akbar Ali Shah ◽  
Emrah Altun

A new three-parameter continuous model called the exponentiated half-logistic Lomax distribution is introduced in this paper. Basic mathematical properties of the proposed model were investigated, including raw and incomplete moments, skewness, kurtosis, generating functions, Rényi entropy, Lorenz, Bonferroni and Zenga curves, probability weighted moments, the stress-strength model, order statistics, and record statistics. The model parameters were estimated by the maximum likelihood method, and the behaviour of these estimates was examined through a simulation study. The applicability of the new model is illustrated by applying it to a real data set.
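The paper's exact density is not reproduced here; purely as an illustration of the maximum likelihood step and the accompanying simulation study, the sketch below assumes one plausible parameterization, obtained by applying the exponentiated half-logistic generator F(x) = {[1 - Ḡ(x)]/[1 + Ḡ(x)]}^α to a Lomax baseline with survival function Ḡ(x) = (1 + x/β)^(-k), giving three parameters (α, k, β); the authors' parameterization may differ.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(params, x):
    """Negative log-likelihood of the assumed EHL-Lomax density
    f(x) = 2*(k/beta)*alpha * (1+x/beta)^(-k-1) * (1-u)^(alpha-1) / (1+u)^(alpha+1),
    with u = (1 + x/beta)^(-k)."""
    alpha, k, beta = params
    u = (1.0 + x / beta) ** (-k)
    log_f = (np.log(2.0 * alpha * k / beta)
             - (k + 1.0) * np.log1p(x / beta)
             + (alpha - 1.0) * np.log1p(-u)
             - (alpha + 1.0) * np.log1p(u))
    return -np.sum(log_f)

def simulate(n, alpha, k, beta, rng):
    """Inverse-CDF sampling under the same assumed parameterization."""
    t = rng.uniform(size=n) ** (1.0 / alpha)   # t = (1-u)/(1+u)
    u = (1.0 - t) / (1.0 + t)
    return beta * (u ** (-1.0 / k) - 1.0)

rng = np.random.default_rng(0)
x = simulate(2000, alpha=2.0, k=1.5, beta=1.0, rng=rng)
fit = minimize(neg_log_lik, x0=[1.0, 1.0, 1.0], args=(x,),
               method="L-BFGS-B", bounds=[(1e-4, None)] * 3)
print(fit.x)   # maximum likelihood estimates of (alpha, k, beta)
```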


2015 ◽  
Vol 2015 ◽  
pp. 1-8 ◽  
Author(s):  
K. S. Sultan ◽  
A. S. Al-Moisheer

We discuss the two-component mixture of the inverse Weibull and lognormal distributions (MIWLND) as a lifetime model. First, we discuss the properties of the proposed model, including the reliability and hazard functions. Next, we discuss the estimation of the model parameters by the maximum likelihood method (MLE) and derive expressions for the elements of the Fisher information matrix. We then demonstrate the usefulness of the proposed model by fitting it to a real data set. Finally, we draw some concluding remarks.
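A brief Python sketch of the mixture construction (parameter values are illustrative, not estimates from the paper): the MIWLND density is a weighted sum of an inverse Weibull density and a lognormal density, the reliability function is the corresponding weighted sum of survival functions, and the hazard rate follows as their ratio.

```python
import numpy as np
from scipy.stats import invweibull, lognorm

def miwlnd_pdf(x, p, c, s, sigma, mu):
    """Two-component mixture density: p * inverse-Weibull + (1-p) * lognormal."""
    return p * invweibull.pdf(x, c, scale=s) + (1 - p) * lognorm.pdf(x, sigma, scale=np.exp(mu))

def miwlnd_sf(x, p, c, s, sigma, mu):
    """Reliability (survival) function of the mixture."""
    return p * invweibull.sf(x, c, scale=s) + (1 - p) * lognorm.sf(x, sigma, scale=np.exp(mu))

def miwlnd_hazard(x, *params):
    """Hazard rate = density / reliability."""
    return miwlnd_pdf(x, *params) / miwlnd_sf(x, *params)

x = np.linspace(0.1, 10, 5)
print(miwlnd_hazard(x, 0.4, 2.0, 1.5, 0.6, 0.2))   # illustrative parameter values
```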


2019 ◽  
Author(s):  
Leili Tapak ◽  
Omid Hamidi ◽  
Majid Sadeghifar ◽  
Hassan Doosti ◽  
Ghobad Moradi

Abstract
Objectives: Zero-inflated proportion or rate data nested in clusters due to the sampling structure can be found in many disciplines. Sometimes the rate response may not be observed for some study units because of limitations such as failures in recording data (false negatives), and zeros are observed instead of the actual rate/proportion values (low incidence). In this study, we propose a multilevel zero-inflated censored Beta regression model that can address zero-inflated rate data with low incidence.
Methods: We assumed that the random effects are independent and normally distributed. The performance of the proposed approach was evaluated by application to a three-level real data set and by a simulation study. We applied the proposed model to analyse brucellosis diagnosis rate data and to investigate the effects of climatic factors and geographical position. For comparison, we also applied the standard zero-inflated censored Beta regression model, which does not account for correlation.
Results: The proposed model performed better than the zero-inflated censored Beta model based on the AIC criterion. Height (p-value < 0.0001), temperature (p-value < 0.0001), and precipitation (p-value = 0.0006) significantly affected brucellosis rates, whereas precipitation in the ZICBETA model was not statistically significant (p-value = 0.385). The simulation study also showed that the estimates obtained by the maximum likelihood approach were reasonable in terms of mean squared error.
Conclusions: The results showed that the proposed method can capture the correlations in the real data set and yields accurate parameter estimates.
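As a rough, hedged illustration of the kind of likelihood being maximized, the sketch below codes a single-level zero-inflated Beta regression log-likelihood in Python; the paper's model additionally includes censoring and normally distributed multilevel random effects, which are omitted here, and all variable names and the toy data are purely illustrative.

```python
import numpy as np
from scipy.stats import beta as beta_dist
from scipy.optimize import minimize
from scipy.special import expit

def zib_neg_log_lik(params, X, y):
    """Single-level zero-inflated Beta regression (illustrative only).
    P(y = 0) = pi, and y | y > 0 ~ Beta(mu*phi, (1-mu)*phi) with logit links."""
    p = X.shape[1]
    b_mu, b_pi, log_phi = params[:p], params[p:-1], params[-1]
    mu = expit(X @ b_mu)          # mean of the positive rates
    pi = expit(X @ b_pi)          # zero-inflation probability
    phi = np.exp(log_phi)         # precision
    ll = np.where(
        y == 0,
        np.log(pi),
        np.log1p(-pi) + beta_dist.logpdf(np.clip(y, 1e-10, 1 - 1e-10),
                                         mu * phi, (1 - mu) * phi),
    )
    return -np.sum(ll)

# toy data: intercept-only design, roughly 30% structural zeros
rng = np.random.default_rng(1)
X = np.ones((500, 1))
y = np.where(rng.uniform(size=500) < 0.3, 0.0, rng.beta(2.0, 5.0, size=500))
fit = minimize(zib_neg_log_lik, np.zeros(2 * X.shape[1] + 1), args=(X, y),
               method="L-BFGS-B")
print(fit.x)   # coefficients for mu, pi, and the log-precision
```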


Author(s):  
Maurizio Pighin ◽  
Lucio Ieronutti

Data warehouses are increasingly used by commercial organizations to extract, from a huge amount of transactional data, concise information useful for supporting decision processes. However, the task of designing a data warehouse and evaluating its effectiveness is not trivial, especially in the case of large databases and in the presence of redundant information. The meaning and the quality of the selected attributes heavily influence the data warehouse's effectiveness and the quality of the derived decisions. Our research is focused on interactive methodologies and techniques targeted at supporting data warehouse design and evaluation by taking into account the quality of the initial data. In this chapter we propose an approach for supporting data warehouse development and refinement, providing practical examples and demonstrating the effectiveness of our solution. Our approach is mainly based on two phases: the first interactively guides attribute selection by providing quantitative information measuring different statistical and syntactic aspects of the data, while the second, based on a set of 3D visualizations, allows design choices to be refined at run time through data examination and analysis. To experiment with the proposed solutions on real data, we have developed a tool called ELDA (EvaLuation DAta warehouse quality), which has been used to support data warehouse design and evaluation.
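The chapter's own metrics are not reproduced here; the fragment below only sketches the kind of per-attribute quantitative indicators (null ratio, distinct-value ratio, value entropy) that can support attribute selection during data warehouse design, with illustrative names and data.

```python
import math
from collections import Counter

def attribute_indicators(values):
    """Simple per-attribute quality indicators for design support
    (illustrative, not the ELDA metrics): null ratio, distinct ratio, entropy."""
    n = len(values)
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    entropy = -sum((c / len(non_null)) * math.log2(c / len(non_null))
                   for c in counts.values()) if non_null else 0.0
    return {
        "null_ratio": 1 - len(non_null) / n,   # completeness of the attribute
        "distinct_ratio": len(counts) / n,     # candidate key vs. low-cardinality code
        "entropy_bits": entropy,               # how informative the attribute is
    }

# usage: compare two candidate dimension attributes
print(attribute_indicators(["N", "S", "N", "E", None, "N"]))
print(attribute_indicators([101, 102, 103, 104, 105, 106]))
```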


Risks ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 33
Author(s):  
Łukasz Delong ◽  
Mario V. Wüthrich

The goal of this paper is to develop regression models and postulate distributions which can be used in practice to describe the joint development process of individual claim payments and claim incurred. We apply neural networks to estimate our regression models. As regressors we use the whole claim history of incremental payments and claim incurred, as well as any relevant feature information that is available to describe individual claims and their development characteristics. Our models are calibrated and tested on a real data set, and the results are benchmarked against the Chain-Ladder method. Our analysis focuses on the development of the so-called Reported But Not Settled (RBNS) claims. We show the benefits of using deep neural networks and the whole claim history in our prediction problem.
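A toy sketch of the general setup (not the authors' network architecture or data): a feed-forward network regresses the next incremental payment on a claim's observed history of payments and incurred amounts plus static claim features; all shapes, names, and the simulated data are illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_claims, hist_len = 1000, 5

# illustrative regressors: past incremental payments, past claim incurred,
# and two static claim features (e.g., line of business, reporting delay)
payments = rng.gamma(2.0, 100.0, size=(n_claims, hist_len))
incurred = payments.cumsum(axis=1) + rng.normal(0, 50, size=(n_claims, hist_len))
features = rng.normal(size=(n_claims, 2))
X = np.hstack([payments, incurred, features])

# illustrative target: the next-period incremental payment
y = 0.3 * payments[:, -1] + 0.1 * incurred[:, -1] + rng.gamma(2.0, 20.0, size=n_claims)

model = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0)
model.fit(X[:800], y[:800])               # calibrate on a subset of claims
print(model.score(X[800:], y[800:]))      # R^2 on held-out claims
```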


2016 ◽  
Vol 2016 ◽  
pp. 1-12 ◽  
Author(s):  
Marcelo Bourguignon ◽  
Indranil Ghosh ◽  
Gauss M. Cordeiro

The transmuted family of distributions has been receiving increased attention over the last few years. For a baseline G distribution, we derive a simple representation for the transmuted-G family density function as a linear mixture of the G and exponentiated-G densities. We investigate the asymptotes and shapes and obtain explicit expressions for the ordinary and incomplete moments, quantile and generating functions, mean deviations, Rényi and Shannon entropies, and order statistics and their moments. We estimate the model parameters of the family by the method of maximum likelihood. We demonstrate empirically the flexibility of the proposed model by means of an application to a real data set.
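The stated representation can be checked directly: the transmuted-G density is f(x) = g(x)[(1 + λ) - 2λG(x)], which equals (1 + λ)g(x) - λh₂(x), where h₂(x) = 2g(x)G(x) is the exponentiated-G density with power two. A short numerical check with an exponential baseline (chosen only for illustration):

```python
import numpy as np
from scipy.stats import expon

lam = 0.4                                      # transmutation parameter, |lam| <= 1
x = np.linspace(0.01, 5, 200)

g, G = expon.pdf(x), expon.cdf(x)              # baseline density and CDF
f_transmuted = g * ((1 + lam) - 2 * lam * G)   # transmuted-G density
exp_g2 = 2 * g * G                             # exponentiated-G density with power 2
f_mixture = (1 + lam) * g - lam * exp_g2       # linear mixture representation

print(np.allclose(f_transmuted, f_mixture))    # True: the two forms coincide
```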


Testing is essential in data warehouse systems for decision making because the accuracy, validity, and correctness of data depend on it. Considering the characteristics and complexity of data warehouses, in this paper we have tried to show the scope of automated testing in assuring the best data warehouse solutions. Firstly, we developed a data set generator for creating synthetic but near-real data; then, in the synthesized data, anomalies were classified with the help of a hand-coded Extraction, Transformation and Loading (ETL) routine. For the quality assurance of data for a data warehouse, and to give an idea of how important Extraction, Transformation and Loading is, some very important test cases were identified. After that, to ensure the quality of data, the procedures of automated testing were embedded in the hand-coded ETL routine. Statistical analysis was done and revealed a substantial enhancement in the quality of data with the procedures of automated testing, reinforcing the fact that automated testing gives promising results for data warehouse quality. For effective and easy maintenance of distributed data, a novel architecture was proposed. Although the desired result of this research was achieved successfully and the objectives are promising, there is still a need to validate the results in a real-life environment, as this research was done in a simulated environment, which may not always reflect real-life behaviour. Hence, the overall potential of the proposed architecture cannot be seen until it is deployed to manage real data that is distributed globally.
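As a loose illustration of what embedding automated testing in a hand-coded ETL routine can look like (the paper's actual test cases and architecture are not reproduced), the sketch below runs a few generic post-load checks with hypothetical record and column names:

```python
def run_etl_quality_checks(source_rows, warehouse_rows, key="order_id", amount="amount"):
    """Generic post-load checks an automated test harness might run after ETL
    (illustrative: row-count reconciliation, duplicate keys, nulls, negative amounts)."""
    failures = []
    if len(source_rows) != len(warehouse_rows):
        failures.append("row-count mismatch between source and warehouse")
    keys = [r[key] for r in warehouse_rows]
    if len(keys) != len(set(keys)):
        failures.append("duplicate business keys after load")
    if any(r[amount] is None for r in warehouse_rows):
        failures.append("NULL amounts loaded")
    if any(r[amount] is not None and r[amount] < 0 for r in warehouse_rows):
        failures.append("negative amounts loaded")
    return failures

# usage with two toy record sets
source = [{"order_id": 1, "amount": 10.0}, {"order_id": 2, "amount": 5.5}]
loaded = [{"order_id": 1, "amount": 10.0}, {"order_id": 1, "amount": None}]
print(run_etl_quality_checks(source, loaded))
```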


2020 ◽  
Vol 33 (02) ◽  
pp. 454-467
Author(s):  
Roghyeh Malekii Vishkaeii ◽  
Behrouz Daneshian ◽  
Farhad Hosseinzadeh Lotfi

Conventional Data Envelopment Analysis (DEA) models are based on a production possibility set (PPS) that satisfies various postulates. Extension or modification of these axioms leads to different DEA models. In this paper, we concentrate on the convexity axiom, leaving the other axioms unmodified. Modifying or extending the convexity condition can lead to a different PPS. This adaptation is followed by a two-step procedure to evaluate the efficiency of a unit based on the resulting PPS. The proposed frontier is located between two standard, well-known DEA frontiers. The presented model can differentiate between units more finely than the standard variable returns to scale (VRS) model. In order to illustrate the strengths of the proposed model, a real data set describing Iranian banks was employed. The results show that this alternative model outperforms the standard VRS model and increases the discrimination power of VRS models.
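For reference, the standard input-oriented VRS (BCC) score that the proposed frontier is benchmarked against can be computed with a small linear program; the sketch below uses scipy and toy data, and does not implement the paper's modified convexity axiom.

```python
import numpy as np
from scipy.optimize import linprog

def vrs_efficiency(X, Y, o):
    """Input-oriented VRS (BCC) efficiency of unit o.
    X: (n_units, n_inputs), Y: (n_units, n_outputs). Reference model only."""
    n, m = X.shape
    s = Y.shape[1]
    c = np.r_[1.0, np.zeros(n)]                     # minimize theta over [theta, lambdas]
    A_in = np.hstack([-X[o].reshape(m, 1), X.T])    # sum_j lam_j*x_ij - theta*x_io <= 0
    A_out = np.hstack([np.zeros((s, 1)), -Y.T])     # -sum_j lam_j*y_rj <= -y_ro
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.r_[np.zeros(m), -Y[o]]
    A_eq = np.r_[0.0, np.ones(n)].reshape(1, -1)    # convexity: sum_j lam_j = 1 (VRS)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (n + 1), method="highs")
    return res.fun                                  # efficiency score in (0, 1]

# toy data: 4 bank branches, 2 inputs (staff, assets), 1 output (loans)
X = np.array([[20, 300], [30, 280], [40, 500], [25, 350]], dtype=float)
Y = np.array([[100], [90], [160], [120]], dtype=float)
print([round(vrs_efficiency(X, Y, o), 3) for o in range(len(X))])
```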


Author(s):  
Paula Rodríguez-Abruñeiras ◽  
Jesús Romero-Barranco

The present paper deals with a proposal for enhancing students' engagement in the course 'History of the English Language' of the Degree in English Studies (Universitat de València). For this purpose, the traditional lectures will be combined with a research project carried out by groups of students (research teams) in which two digital tools will be used: electronic linguistic corpora and YouTube. Electronic linguistic corpora, on the one hand, will allow students to discover the diachronic development of certain linguistic features by looking at real data and drawing conclusions based on frequencies by themselves. YouTube, on the other, is a most appropriate online environment where students will share a video lecture so that their classmates can benefit from the research work they did, fostering peer-to-peer learning. The expected results are to make students more autonomous in their learning process, as they will be working on their project from the very beginning of the course, and to engage them more effectively, since they will be working in a format that resembles what they do in their leisure time.

