Maintaining Dimension's History in Data Warehouses Effectively

2019 ◽  
Vol 15 (3) ◽  
pp. 46-62
Author(s):  
Canan Eren Atay ◽  
Georgia Garani

A data warehouse is considered a key aspect of success for any decision support system. Research on temporal databases has produced important results in this field, and data warehouses, which store historical data, can clearly benefit from such studies. A slowly changing dimension is a data warehouse dimension whose attributes change infrequently over time. Although different solutions have been proposed, each has its own particular disadvantages. In this research work, the authors propose the Object-Relational Temporal Data Warehouse (O-RTDW) model for slowly changing dimensions. Using this approach, it is possible to keep track of the whole history of an object in a data warehouse efficiently. The proposed model has been implemented on a real data set and tested successfully. Several limitations found in other solutions, such as redundancy, surrogate keys, incomplete historical data, and the creation of additional tables, are not present in our solution.
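A minimal Python sketch of the underlying idea (not the paper's actual O-RTDW schema): each dimension member is a single object that nests its own valid-time history, so no surrogate keys, duplicated rows, or separate history tables are needed; all names and attributes below are illustrative.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class AttributeVersion:
    """One valid-time version of a slowly changing attribute set."""
    valid_from: date
    valid_to: Optional[date]          # None = current version
    city: str
    segment: str

@dataclass
class CustomerDimension:
    """A dimension member that nests its full history inside one object."""
    customer_id: int                  # natural key; no surrogate key needed
    name: str
    history: List[AttributeVersion] = field(default_factory=list)

    def update(self, on: date, **changes) -> None:
        """Close the current version and open a new one carrying the changes."""
        current = self.history[-1]
        current.valid_to = on
        self.history.append(AttributeVersion(
            valid_from=on, valid_to=None,
            city=changes.get("city", current.city),
            segment=changes.get("segment", current.segment),
        ))

    def as_of(self, when: date) -> AttributeVersion:
        """Return the attribute values that were valid at a given date."""
        for v in self.history:
            if v.valid_from <= when and (v.valid_to is None or when < v.valid_to):
                return v
        raise KeyError("no version valid at that date")

# usage: the whole history of one customer is queryable at any point in time
cust = CustomerDimension(1001, "Acme Ltd",
                         [AttributeVersion(date(2015, 1, 1), None, "Izmir", "Retail")])
cust.update(date(2017, 6, 1), city="Larissa")
print(cust.as_of(date(2016, 3, 15)).city)   # -> Izmir
print(cust.as_of(date(2018, 1, 1)).city)    # -> Larissa
```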

2019 ◽  
Vol XVI (2) ◽  
pp. 1-11
Author(s):  
Farrukh Jamal ◽  
Hesham Mohammed Reyad ◽  
Soha Othman Ahmed ◽  
Muhammad Akbar Ali Shah ◽  
Emrah Altun

A new three-parameter continuous model called the exponentiated half-logistic Lomax distribution is introduced in this paper. Basic mathematical properties of the proposed model were investigated, including raw and incomplete moments, skewness, kurtosis, generating functions, Rényi entropy, Lorenz, Bonferroni and Zenga curves, probability weighted moments, the stress-strength model, order statistics, and record statistics. The model parameters were estimated by the maximum likelihood method, and the behaviour of these estimates was examined through a simulation study. The applicability of the new model is illustrated by applying it to a real data set.
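The paper's exact density is not reproduced here; purely as an illustration of the maximum likelihood step and the accompanying simulation study, the sketch below assumes one plausible parameterization, obtained by applying the exponentiated half-logistic generator F(x) = {[1 - Ḡ(x)]/[1 + Ḡ(x)]}^α to a Lomax baseline with survival function Ḡ(x) = (1 + x/β)^(-k), giving three parameters (α, k, β); the authors' parameterization may differ.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(params, x):
    """Negative log-likelihood of the assumed EHL-Lomax density
    f(x) = 2*(k/beta)*alpha * (1+x/beta)^(-k-1) * (1-u)^(alpha-1) / (1+u)^(alpha+1),
    with u = (1 + x/beta)^(-k)."""
    alpha, k, beta = params
    u = (1.0 + x / beta) ** (-k)
    log_f = (np.log(2.0 * alpha * k / beta)
             - (k + 1.0) * np.log1p(x / beta)
             + (alpha - 1.0) * np.log1p(-u)
             - (alpha + 1.0) * np.log1p(u))
    return -np.sum(log_f)

def simulate(n, alpha, k, beta, rng):
    """Inverse-CDF sampling under the same assumed parameterization."""
    t = rng.uniform(size=n) ** (1.0 / alpha)   # t = (1-u)/(1+u)
    u = (1.0 - t) / (1.0 + t)
    return beta * (u ** (-1.0 / k) - 1.0)

rng = np.random.default_rng(0)
x = simulate(2000, alpha=2.0, k=1.5, beta=1.0, rng=rng)
fit = minimize(neg_log_lik, x0=[1.0, 1.0, 1.0], args=(x,),
               method="L-BFGS-B", bounds=[(1e-4, None)] * 3)
print(fit.x)   # maximum likelihood estimates of (alpha, k, beta)
```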


2015 ◽  
Vol 2015 ◽  
pp. 1-8 ◽  
Author(s):  
K. S. Sultan ◽  
A. S. Al-Moisheer

We discuss the two-component mixture of the inverse Weibull and lognormal distributions (MIWLND) as a lifetime model. First, we discuss the properties of the proposed model, including the reliability and hazard functions. Next, we discuss the estimation of the model parameters by the maximum likelihood method (MLE) and derive expressions for the elements of the Fisher information matrix. We then demonstrate the usefulness of the proposed model by fitting it to a real data set. Finally, we draw some concluding remarks.
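A brief Python sketch of the mixture construction (parameter values are illustrative, not estimates from the paper): the MIWLND density is a weighted sum of an inverse Weibull density and a lognormal density, the reliability function is the corresponding weighted sum of survival functions, and the hazard rate follows as their ratio.

```python
import numpy as np
from scipy.stats import invweibull, lognorm

def miwlnd_pdf(x, p, c, s, sigma, mu):
    """Two-component mixture density: p * inverse-Weibull + (1-p) * lognormal."""
    return p * invweibull.pdf(x, c, scale=s) + (1 - p) * lognorm.pdf(x, sigma, scale=np.exp(mu))

def miwlnd_sf(x, p, c, s, sigma, mu):
    """Reliability (survival) function of the mixture."""
    return p * invweibull.sf(x, c, scale=s) + (1 - p) * lognorm.sf(x, sigma, scale=np.exp(mu))

def miwlnd_hazard(x, *params):
    """Hazard rate = density / reliability."""
    return miwlnd_pdf(x, *params) / miwlnd_sf(x, *params)

x = np.linspace(0.1, 10, 5)
print(miwlnd_hazard(x, 0.4, 2.0, 1.5, 0.6, 0.2))   # illustrative parameter values
```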


2019 ◽  
Author(s):  
Leili Tapak ◽  
Omid Hamidi ◽  
Majid Sadeghifar ◽  
Hassan Doosti ◽  
Ghobad Moradi

Abstract
Objectives: Zero-inflated proportion or rate data nested in clusters due to the sampling structure can be found in many disciplines. Sometimes the rate response may not be observed for some study units because of limitations such as failures in recording data (false negatives), and zeros are observed instead of the actual rate/proportion values (low incidence). In this study, we propose a multilevel zero-inflated censored Beta regression model that can address zero-inflated rate data with low incidence.
Methods: We assumed that the random effects are independent and normally distributed. The performance of the proposed approach was evaluated by application to a three-level real data set and by a simulation study. We applied the proposed model to analyse brucellosis diagnosis rate data and to investigate the effects of climatic factors and geographical position. For comparison, we also applied the standard zero-inflated censored Beta regression model, which does not account for correlation.
Results: The proposed model performed better than the zero-inflated censored Beta model based on the AIC criterion. Height (p-value < 0.0001), temperature (p-value < 0.0001), and precipitation (p-value = 0.0006) significantly affected brucellosis rates, whereas precipitation in the ZICBETA model was not statistically significant (p-value = 0.385). The simulation study also showed that the estimates obtained by the maximum likelihood approach were reasonable in terms of mean squared error.
Conclusions: The results showed that the proposed method can capture the correlations in the real data set and yields accurate parameter estimates.
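As a rough, hedged illustration of the kind of likelihood being maximized, the sketch below codes a single-level zero-inflated Beta regression log-likelihood in Python; the paper's model additionally includes censoring and normally distributed multilevel random effects, which are omitted here, and all variable names and the toy data are purely illustrative.

```python
import numpy as np
from scipy.stats import beta as beta_dist
from scipy.optimize import minimize
from scipy.special import expit

def zib_neg_log_lik(params, X, y):
    """Single-level zero-inflated Beta regression (illustrative only).
    P(y = 0) = pi, and y | y > 0 ~ Beta(mu*phi, (1-mu)*phi) with logit links."""
    p = X.shape[1]
    b_mu, b_pi, log_phi = params[:p], params[p:-1], params[-1]
    mu = expit(X @ b_mu)          # mean of the positive rates
    pi = expit(X @ b_pi)          # zero-inflation probability
    phi = np.exp(log_phi)         # precision
    ll = np.where(
        y == 0,
        np.log(pi),
        np.log1p(-pi) + beta_dist.logpdf(np.clip(y, 1e-10, 1 - 1e-10),
                                         mu * phi, (1 - mu) * phi),
    )
    return -np.sum(ll)

# toy data: intercept-only design, roughly 30% structural zeros
rng = np.random.default_rng(1)
X = np.ones((500, 1))
y = np.where(rng.uniform(size=500) < 0.3, 0.0, rng.beta(2.0, 5.0, size=500))
fit = minimize(zib_neg_log_lik, np.zeros(2 * X.shape[1] + 1), args=(X, y),
               method="L-BFGS-B")
print(fit.x)   # coefficients for mu, pi, and the log-precision
```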


Author(s):  
Maurizio Pighin ◽  
Lucio Ieronutti

Data warehouses are increasingly used by commercial organizations to extract, from a huge amount of transactional data, concise information useful for supporting decision processes. However, the task of designing a data warehouse and evaluating its effectiveness is not trivial, especially in the case of large databases and in the presence of redundant information. The meaning and the quality of the selected attributes heavily influence the data warehouse's effectiveness and the quality of the derived decisions. Our research is focused on interactive methodologies and techniques targeted at supporting data warehouse design and evaluation by taking into account the quality of the initial data. In this chapter we propose an approach for supporting data warehouse development and refinement, providing practical examples and demonstrating the effectiveness of our solution. Our approach is mainly based on two phases: the first interactively guides attribute selection by providing quantitative information measuring different statistical and syntactic aspects of the data, while the second, based on a set of 3D visualizations, allows design choices to be refined at run time through data examination and analysis. To experiment with the proposed solutions on real data, we have developed a tool called ELDA (EvaLuation DAta warehouse quality), which has been used to support data warehouse design and evaluation.
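The chapter's own metrics are not reproduced here; the fragment below only sketches the kind of per-attribute quantitative indicators (null ratio, distinct-value ratio, value entropy) that can support attribute selection during data warehouse design, with illustrative names and data.

```python
import math
from collections import Counter

def attribute_indicators(values):
    """Simple per-attribute quality indicators for design support
    (illustrative, not the ELDA metrics): null ratio, distinct ratio, entropy."""
    n = len(values)
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    entropy = -sum((c / len(non_null)) * math.log2(c / len(non_null))
                   for c in counts.values()) if non_null else 0.0
    return {
        "null_ratio": 1 - len(non_null) / n,   # completeness of the attribute
        "distinct_ratio": len(counts) / n,     # candidate key vs. low-cardinality code
        "entropy_bits": entropy,               # how informative the attribute is
    }

# usage: compare two candidate dimension attributes
print(attribute_indicators(["N", "S", "N", "E", None, "N"]))
print(attribute_indicators([101, 102, 103, 104, 105, 106]))
```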


Risks ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 33
Author(s):  
Łukasz Delong ◽  
Mario V. Wüthrich

The goal of this paper is to develop regression models and postulate distributions which can be used in practice to describe the joint development process of individual claim payments and claim incurred. We apply neural networks to estimate our regression models. As regressors we use the whole claim history of incremental payments and claim incurred, as well as any relevant feature information that is available to describe individual claims and their development characteristics. Our models are calibrated and tested on a real data set, and the results are benchmarked against the Chain-Ladder method. Our analysis focuses on the development of the so-called Reported But Not Settled (RBNS) claims. We show the benefits of using deep neural networks and the whole claim history in our prediction problem.
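A toy sketch of the general setup (not the authors' network architecture or data): a feed-forward network regresses the next incremental payment on a claim's observed history of payments and incurred amounts plus static claim features; all shapes, names, and the simulated data are illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_claims, hist_len = 1000, 5

# illustrative regressors: past incremental payments, past claim incurred,
# and two static claim features (e.g., line of business, reporting delay)
payments = rng.gamma(2.0, 100.0, size=(n_claims, hist_len))
incurred = payments.cumsum(axis=1) + rng.normal(0, 50, size=(n_claims, hist_len))
features = rng.normal(size=(n_claims, 2))
X = np.hstack([payments, incurred, features])

# illustrative target: the next-period incremental payment
y = 0.3 * payments[:, -1] + 0.1 * incurred[:, -1] + rng.gamma(2.0, 20.0, size=n_claims)

model = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0)
model.fit(X[:800], y[:800])               # calibrate on a subset of claims
print(model.score(X[800:], y[800:]))      # R^2 on held-out claims
```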


2016 ◽  
Vol 2016 ◽  
pp. 1-12 ◽  
Author(s):  
Marcelo Bourguignon ◽  
Indranil Ghosh ◽  
Gauss M. Cordeiro

The transmuted family of distributions has been receiving increased attention over the last few years. For a baseline G distribution, we derive a simple representation for the transmuted-G family density function as a linear mixture of the G and exponentiated-G densities. We investigate the asymptotes and shapes and obtain explicit expressions for the ordinary and incomplete moments, quantile and generating functions, mean deviations, Rényi and Shannon entropies, and order statistics and their moments. We estimate the model parameters of the family by the method of maximum likelihood. We demonstrate empirically the flexibility of the proposed model by means of an application to a real data set.
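The stated representation can be checked directly: the transmuted-G density is f(x) = g(x)[(1 + λ) - 2λG(x)], which equals (1 + λ)g(x) - λh₂(x), where h₂(x) = 2g(x)G(x) is the exponentiated-G density with power two. A short numerical check with an exponential baseline (chosen only for illustration):

```python
import numpy as np
from scipy.stats import expon

lam = 0.4                                      # transmutation parameter, |lam| <= 1
x = np.linspace(0.01, 5, 200)

g, G = expon.pdf(x), expon.cdf(x)              # baseline density and CDF
f_transmuted = g * ((1 + lam) - 2 * lam * G)   # transmuted-G density
exp_g2 = 2 * g * G                             # exponentiated-G density with power 2
f_mixture = (1 + lam) * g - lam * exp_g2       # linear mixture representation

print(np.allclose(f_transmuted, f_mixture))    # True: the two forms coincide
```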


Testing is essential in data warehouse systems for decision making because the accuracy, validity, and correctness of data depend on it. Considering the characteristics and complexity of data warehouses, in this paper we have tried to show the scope of automated testing in assuring the best data warehouse solutions. Firstly, we developed a data set generator for creating synthetic but near-real data; then, in the synthesized data, anomalies were classified with the help of a hand-coded Extraction, Transformation and Loading (ETL) routine. For the quality assurance of data for a data warehouse, and to give an idea of how important Extraction, Transformation and Loading is, some very important test cases were identified. After that, to ensure the quality of data, the procedures of automated testing were embedded in the hand-coded ETL routine. Statistical analysis was done and revealed a substantial enhancement in the quality of data with the procedures of automated testing, reinforcing the fact that automated testing gives promising results for data warehouse quality. For effective and easy maintenance of distributed data, a novel architecture was proposed. Although the desired result of this research was achieved successfully and the objectives are promising, there is still a need to validate the results in a real-life environment, as this research was done in a simulated environment, which may not always reflect real-life behaviour. Hence, the overall potential of the proposed architecture cannot be seen until it is deployed to manage real data that is distributed globally.
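As a loose illustration of what embedding automated testing in a hand-coded ETL routine can look like (the paper's actual test cases and architecture are not reproduced), the sketch below runs a few generic post-load checks with hypothetical record and column names:

```python
def run_etl_quality_checks(source_rows, warehouse_rows, key="order_id", amount="amount"):
    """Generic post-load checks an automated test harness might run after ETL
    (illustrative: row-count reconciliation, duplicate keys, nulls, negative amounts)."""
    failures = []
    if len(source_rows) != len(warehouse_rows):
        failures.append("row-count mismatch between source and warehouse")
    keys = [r[key] for r in warehouse_rows]
    if len(keys) != len(set(keys)):
        failures.append("duplicate business keys after load")
    if any(r[amount] is None for r in warehouse_rows):
        failures.append("NULL amounts loaded")
    if any(r[amount] is not None and r[amount] < 0 for r in warehouse_rows):
        failures.append("negative amounts loaded")
    return failures

# usage with two toy record sets
source = [{"order_id": 1, "amount": 10.0}, {"order_id": 2, "amount": 5.5}]
loaded = [{"order_id": 1, "amount": 10.0}, {"order_id": 1, "amount": None}]
print(run_etl_quality_checks(source, loaded))
```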


2020 ◽  
Vol 33 (02) ◽  
pp. 454-467
Author(s):  
Roghyeh Malekii Vishkaeii ◽  
Behrouz Daneshian ◽  
Farhad Hosseinzadeh Lotfi

Conventional Data Envelopment Analysis (DEA) models are based on a production possibility set (PPS) that satisfies various postulates. Extension or modification of these axioms leads to different DEA models. In this paper, we concentrate on the convexity axiom, leaving the other axioms unmodified. Modifying or extending the convexity condition can lead to a different PPS. This adaptation is followed by a two-step procedure to evaluate the efficiency of a unit based on the resulting PPS. The proposed frontier is located between two standard, well-known DEA frontiers. The presented model can differentiate between units more finely than the standard variable returns to scale (VRS) model. In order to illustrate the strengths of the proposed model, a real data set describing Iranian banks was employed. The results show that this alternative model outperforms the standard VRS model and increases the discrimination power of VRS models.
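For reference, the standard input-oriented VRS (BCC) score that the proposed frontier is benchmarked against can be computed with a small linear program; the sketch below uses scipy and toy data, and does not implement the paper's modified convexity axiom.

```python
import numpy as np
from scipy.optimize import linprog

def vrs_efficiency(X, Y, o):
    """Input-oriented VRS (BCC) efficiency of unit o.
    X: (n_units, n_inputs), Y: (n_units, n_outputs). Reference model only."""
    n, m = X.shape
    s = Y.shape[1]
    c = np.r_[1.0, np.zeros(n)]                     # minimize theta over [theta, lambdas]
    A_in = np.hstack([-X[o].reshape(m, 1), X.T])    # sum_j lam_j*x_ij - theta*x_io <= 0
    A_out = np.hstack([np.zeros((s, 1)), -Y.T])     # -sum_j lam_j*y_rj <= -y_ro
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.r_[np.zeros(m), -Y[o]]
    A_eq = np.r_[0.0, np.ones(n)].reshape(1, -1)    # convexity: sum_j lam_j = 1 (VRS)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (n + 1), method="highs")
    return res.fun                                  # efficiency score in (0, 1]

# toy data: 4 bank branches, 2 inputs (staff, assets), 1 output (loans)
X = np.array([[20, 300], [30, 280], [40, 500], [25, 350]], dtype=float)
Y = np.array([[100], [90], [160], [120]], dtype=float)
print([round(vrs_efficiency(X, Y, o), 3) for o in range(len(X))])
```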


Author(s):  
Paula Rodríguez-Abruñeiras ◽  
Jesús Romero-Barranco

The present paper deals with a proposal for enhancing students' engagement in the course 'History of the English Language' of the Degree in English Studies (Universitat de València). For this purpose, the traditional lectures will be combined with a research project carried out by groups of students (research teams) in which two digital tools will be used: electronic linguistic corpora and YouTube. Electronic linguistic corpora, on the one hand, will allow students to discover the diachronic development of certain linguistic features by looking at real data and drawing conclusions based on frequencies by themselves. YouTube, on the other, is a most appropriate online environment where students will share a video lecture so that their classmates can benefit from the research work they did, fostering peer-to-peer learning. The expected results are to make students more autonomous in their learning process, as they will be working on their project from the very beginning of the course, and to engage them more effectively, since they will be working in a format that resembles what they do in their leisure time.

