A Survey of Multidimensional Modeling Methodologies

Many methodologies have been presented to support the multidimensional design of the data warehouse. First methodologies introduced were requirement-driven but the semantics of a data warehouse require to also consider data sources along the design process. In the following years, data sources gained relevance in multidimensional modeling and gave rise to several data-driven methodologies that automate the data warehouse design process from relational sources. Currently, research on multidimensional modeling is still a hot topic and we have two main research lines. On the one hand, new hybrid automatic methodologies have been introduced proposing to combine data-driven and requirement-driven approaches. On the other hand, new approaches focus on considering other kind of structured data sources that have gained relevance in the last years such as ontologies or XML. In this article we present the most relevant methodologies introduced in the literature and a detailed comparison showing main features of each approach.

Download Full-text

Multidimensional Design Methods for Data Warehousing

Integrations of Data Warehousing, Data Mining and Database Technologies ◽

10.4018/978-1-60960-537-7.ch005 ◽

2011 ◽

pp. 78-105 ◽

Cited By ~ 1

Author(s):

Oscar Romero ◽

Alberto Abelló

Keyword(s):

Data Warehouse ◽

Design Process ◽

Data Warehousing ◽

Design Methods ◽

Data Sources ◽

Data Driven ◽

Main Research ◽

Multidimensional Modeling ◽

Warehouse Design ◽

Warehousing Systems

In the last years, data warehousing systems have gained relevance to support decision making within organizations. The core component of these systems is the data warehouse and nowadays it is widely assumed that the data warehouse design must follow the multidimensional paradigm. Thus, many methods have been presented to support the multidimensional design of the data warehouse.The first methods introduced were requirement-driven but the semantics of the data warehouse (since the data warehouse is the result of homogenizing and integrating relevant data of the organization in a single, detailed view of the organization business) require to also consider the data sources during the design process. Considering the data sources gave rise to several data-driven methods that automate the data warehouse design process, mainly, from relational data sources. Currently, research on multidimensional modeling is still a hot topic and we have two main research lines. On the one hand, new hybrid automatic methods have been introduced proposing to combine data-driven and requirement-driven approaches. These methods focus on automating the whole process and improving the feedback retrieved by each approach to produce better results. On the other hand, some new approaches focus on considering alternative scenarios than relational sources. These methods also consider (semi)-structured data sources, such as ontologies or XML, that have gained relevance in the last years. Thus, they introduce innovative solutions for overcoming the heterogeneity of the data sources. All in all, we discuss the current scenario of multidimensional modeling by carrying out a survey of multidimensional design methods. We present the most relevant methods introduced in the literature and a detailed comparison showing the main features of each approach.

Download Full-text

Big Data Warehouse Automatic Design Methodology

Big Data ◽

10.4018/978-1-4666-9840-6.ch023 ◽

2016 ◽

pp. 454-492

Author(s):

Francesco Di Tria ◽

Ezio Lefons ◽

Filippo Tangorra

Keyword(s):

Big Data ◽

Data Warehouse ◽

Design Process ◽

Hybrid Approach ◽

End Users ◽

Structured Data ◽

Data Sources ◽

The One ◽

Definition Of ◽

Big Data Warehouse

Traditional data warehouse design methodologies are based on two opposite approaches. The one is data oriented and aims to realize the data warehouse mainly through a reengineering process of the well-structured data sources solely, while minimizing the involvement of end users. The other is requirement oriented and aims to realize the data warehouse only on the basis of business goals expressed by end users, with no regard to the information obtainable from data sources. Since these approaches are not able to address the problems that arise when dealing with big data, the necessity to adopt hybrid methodologies, which allow the definition of multidimensional schemas by considering user requirements and reconciling them against non-structured data sources, has emerged. As a counterpart, hybrid methodologies may require a more complex design process. For this reason, the current research is devoted to introducing automatisms in order to reduce the design efforts and to support the designer in the big data warehouse creation. In this chapter, the authors present a methodology based on a hybrid approach that adopts a graph-based multidimensional model. In order to automate the whole design process, the methodology has been implemented using logical programming.

Download Full-text

Big Data Warehouse Automatic Design Methodology

Big Data Management, Technologies, and Applications - Advances in Data Mining and Database Management ◽

10.4018/978-1-4666-4699-5.ch006 ◽

2013 ◽

pp. 115-149 ◽

Cited By ~ 9

Author(s):

Francesco Di Tria ◽

Ezio Lefons ◽

Filippo Tangorra

Keyword(s):

Big Data ◽

Data Warehouse ◽

Design Process ◽

Hybrid Approach ◽

End Users ◽

Structured Data ◽

Data Sources ◽

The One ◽

Definition Of ◽

Big Data Warehouse

Download Full-text

REMARKS ON THE VALUE OF THE HIGGS MASS FROM THE PRESENT LEP DATA

Modern Physics Letters A ◽

10.1142/s0217732395000910 ◽

1995 ◽

Vol 10 (10) ◽

pp. 845-852 ◽

Cited By ~ 5

Author(s):

M. CONSOLI ◽

Z. HIOKI

Keyword(s):

Standard Model ◽

Higgs Mass ◽

Detailed Comparison ◽

The Other ◽

Firm Conclusion ◽

Other Hand ◽

Model Predictions ◽

The One

We perform a detailed comparison of the present LEP data with the one-loop standard model predictions. It is pointed out that for mt = 174 GeV the "bulk" of the data prefers a rather large value of the Higgs mass in the range of 500–1000 GeV, in agreement with the indications from the W mass. On the other hand, to accommodate a light Higgs it is crucial to include the more problematic data for the τ FB asymmetry. We discuss further improvements on the data which are required to obtain a firm conclusion.

Download Full-text

Can there be a Finite Interpretation of the Kantian Sublime?

Kant Yearbook ◽

10.1515/kantyb-2019-0002 ◽

2019 ◽

Vol 11 (1) ◽

pp. 17-40

Author(s):

Sacha Golob

Keyword(s):

Detailed Comparison ◽

The Other ◽

Costs And Benefits ◽

The Sublime ◽

Other Hand ◽

The One ◽

Resultant Theory

Abstract Kant’s account of the sublime makes frequent appeals to infinity, appeals which have been extensively criticised by commentators such as Budd and Crowther. This paper examines the costs and benefits of reconstructing the account in finitist terms. On the one hand, drawing ona detailed comparison of the first and third Critiques, I argue that the underlying logic of Kant’s position is essentially finitist. I defend the approach against longstanding objections, as well as addressing recent infinitist work by Moore and Smith. On the other hand, however, I argue that finitism faces distinctive problems of its own: whilst the resultant theory is a coherent and interesting one, it is unclear in what sense it remains an analysis of the sublime. I illustrate the worry by juxtaposing the finitist reading with analytical cubism.

Download Full-text

Upscaling Statistical Patterns from Reduced Storage in Social and Life Science Big Datasets

Entropy ◽

10.3390/e22101084 ◽

2020 ◽

Vol 22 (10) ◽

pp. 1084

Author(s):

Stefano Garlaschi ◽

Anna Fochesato ◽

Anna Tovo

Keyword(s):

Human Activity ◽

Life Science ◽

The Other ◽

Data Driven ◽

Natural Ecosystems ◽

Random Samples ◽

Global Statistics ◽

The One ◽

Entire Dataset ◽

Input Amount

Recent technological and computational advances have enabled the collection of data at an unprecedented rate. On the one hand, the large amount of data suddenly available has opened up new opportunities for new data-driven research but, on the other hand, it has brought into light new obstacles and challenges related to storage and analysis limits. Here, we strengthen an upscaling approach borrowed from theoretical ecology that allows us to infer with small errors relevant patterns of a dataset in its entirety, although only a limited fraction of it has been analysed. In particular we show that, after reducing the input amount of information on the system under study, by applying our framework it is still possible to recover two statistical patterns of interest of the entire dataset. Tested against big ecological, human activity and genomics data, our framework was successful in the reconstruction of global statistics related to both the number of types and their abundances while starting from limited presence/absence information on small random samples of the datasets. These results pave the way for future applications of our procedure in different life science contexts, from social activities to natural ecosystems.

Download Full-text

Disproportional Ownership Devices: Reviewing the Last 25 Years of Research

International Journal of Business and Management ◽

10.5539/ijbm.v11n4p22 ◽

2016 ◽

Vol 11 (4) ◽

pp. 22 ◽

Cited By ~ 1

Author(s):

Sara Saggese ◽

Fabrizia Sarto

Keyword(s):

Financial Markets ◽

Policy Making ◽

Civil Law ◽

The Other ◽

Main Research ◽

The One ◽

Lock In ◽

Practical Implications ◽

Scholarly Attention ◽

Over Time

<p>The paper aims to systematize the literature on disproportional ownership devices by reviewing and classifying 148 articles published in international academic journals over the last 25 years. The findings show that the scholarly attention on disproportional ownership devices has grown over time. Most papers adopt the agency framework and examine the mechanisms for leveraging voting power and to lock-in control, especially in civil law countries. Corporate governance journals prevail as leading outlets, despite the lack of publications specialized on the topic. Finally, the literature systematization highlights a research taxonomy based on outcomes and drivers of disproportional ownership devices. The article has both theoretical and practical implications. First, it develops a literature framework that systematically outlines the main research streams on the topic and identifies under-explored issues so as to guide future scholarly efforts. Second, it highlights the implications of disproportional ownership devices for company outcomes and reporting. Thereby, on the one hand, it supports managers in selecting the appropriate combination of these mechanisms so as to attract and retain investors. On the other hand, it emphasizes the importance of proper policy making interventions to improve transparency, openness and competitiveness of financial markets.</p>

Download Full-text

Clustering suicides: A data-driven, exploratory machine learning approach

European Psychiatry ◽

10.1016/j.eurpsy.2019.08.009 ◽

2019 ◽

Vol 62 ◽

pp. 15-19 ◽

Cited By ~ 1

Author(s):

Birgit Ludwig ◽

Daniel König ◽

Nestor D. Kapusta ◽

Victor Blüml ◽

Georg Dorffner ◽

...

Keyword(s):

Machine Learning ◽

Cluster Analysis ◽

Analysis Data ◽

The Other ◽

Data Driven ◽

Learning Approach ◽

Suicide Methods ◽

Machine Learning Approach ◽

The One ◽

Methods Of Suicide

Abstract Methods of suicide have received considerable attention in suicide research. The common approach to differentiate methods of suicide is the classification into “violent” versus “non-violent” method. Interestingly, since the proposition of this dichotomous differentiation, no further efforts have been made to question the validity of such a classification of suicides. This study aimed to challenge the traditional separation into “violent” and “non-violent” suicides by generating a cluster analysis with a data-driven, machine learning approach. In a retrospective analysis, data on all officially confirmed suicides (N = 77,894) in Austria between 1970 and 2016 were assessed. Based on a defined distance metric between distributions of suicides over age group and month of the year, a standard hierarchical clustering method was performed with the five most frequent suicide methods. In cluster analysis, poisoning emerged as distinct from all other methods – both in the entire sample as well as in the male subsample. Violent suicides could be further divided into sub-clusters: hanging, shooting, and drowning on the one hand and jumping on the other hand. In the female sample, two different clusters were revealed – hanging and drowning on the one hand and jumping, poisoning, and shooting on the other. Our data-driven results in this large epidemiological study confirmed the traditional dichotomization of suicide methods into “violent” and “non-violent” methods, but on closer inspection “violent methods” can be further divided into sub-clusters and a different cluster pattern could be identified for women, requiring further research to support these refined suicide phenotypes.

Download Full-text

Rollover Warning for Articulated Heavy Vehicles Based on a Time-to-Rollover Metric

Journal of Dynamic Systems Measurement and Control ◽

10.1115/1.1988340 ◽

2004 ◽

Vol 127 (3) ◽

pp. 406-414 ◽

Cited By ~ 37

Author(s):

Bo-Chiuan Chen ◽

Huei Peng

Keyword(s):

Neural Network ◽

Simple Model ◽

Design Process ◽

The Other ◽

Roll Angle ◽

Heavy Vehicle ◽

Vehicle Speed ◽

Heavy Vehicles ◽

The One ◽

Accuracy Problem

A Time-To-Rollover (TTR) metric is proposed as the basis to assess rollover threat for an articulated heavy vehicle. The TTR metric accurately “counts-down” toward rollover regardless of vehicle speed and steering patterns, so that the level of rollover threat is accurately assessed. There are two conflicting requirements in the implementation of TTR. On the one hand, a model significantly faster than real-time is needed. On the other hand, the TTR predicted by this model needs to be accurate enough under all driving scenarios. An innovative approach is proposed in this paper to solve this dilemma and the design process is illustrated in an example. First, a simple yet reasonably accurate yaw∕roll model is identified. A Neural Network (NN) is then developed to mitigate the accuracy problem of this simple model. The NN takes the TTR generated by the simple model, vehicle roll angle, and change of roll angle to generate an enhanced NN-TTR index. The NN was trained and verified under a variety of driving patterns. It was found that an accurate TTR is achieved across all the driving scenarios we tested.

Download Full-text

A két vagy több nyelvet beszélő személyek memóriarendszere

Magyar Pszichológiai Szemle ◽

10.1556/mpszle.55.2000.1.2 ◽

2000 ◽

Vol 55 (1) ◽

pp. 19-44

Author(s):

Tünde Éva Polonyi

Keyword(s):

Second Language ◽

Second Language Learners ◽

Early Stage ◽

Recognition Task ◽

The Other ◽

Data Driven ◽

Word Fragment ◽

Free Recall Task ◽

Performance Patterns ◽

The One

A két- és többnyelvuek információtárolása és feldolgozása vitatott téma: egyes kutatók szerint ez olyan kognitív alrendszerek segítségével történik, amelyek tartalmazzák az emlékezeti képzeteket is és beszélt nyelveikkel állnak kapcsolatban, viszont funkcionálisan függetlenek egymástól (a függetlenség hipotézise); egy másik modell szerint (az egymástól való függés hipotézise) a különálló lingvisztikai rendszerek funkcionálisan kötodnek egy olyan közös fogalmi rendszerhez, ami egyben a megosztott memóriatároló is. Kísérletem célja az volt, hogy egyetlen vizsgálatban, különbözo bevésési stratégiákat és elohívási feladatokat alkalmazva olyan teljesítménymintákat mérjek fel, amelyeket egyik vagy másik modell alátámasztására szoktak felhozni; ezenkívül a fejlodési hipotézist is vizsgálom.Magyar–román–angol háromnyelvuek vettek részt a vizsgálatban két csoportba osztva. Hipotézisem szerint az angolt nehézkesebben beszélok teljesítménye egy adatvezérlésu szókiegészítési feladatnál a függetlenégi hipotézist kell hogy alátámassza, azonban eredményeim azt mutatták, hogy az adat vezérlésu és fogalmi vezérlésu feldolgozás itt együtt jelentkezik; a szabad felidézési feladat ered ményei a nyelvtol való függetlenség hipotézisét támasztották alá, a felismerési feladat eredményei pedig szintén a két típusú feldolgozás kombinációját mutatták. Az angolt folyékonyabban beszélo alanyok esetében nem találtam szignifikáns különbségeket a különbözo bevésési stratégiák között, ami újabb bizonyítékot jelent Kroll és Stewart (1994) modellje mellett. A nehézkesebben beszélo háromnyelvuek tehát a lexikális- és fogalmi közvetítés kombinációját mutatták, és csak a gyakorlott beszélokre jellemzo a tiszta fogalmi közvetítés. Általános következtetésem az, hogy a leghasznosabb kutatási paradigma egy olyan transzfer-központú szemlélet lenne, amelyben a megorzési próbákon való teljesítmény olyan mértékben javul, amilyen mértékben a teszt által megkövetelt eljárások megismétlik a bevésési eljárásokat.Bilinguals’ information-representation and -processing is a controversial theme among psycholinguists: According to some researchers bilinguals have cognitive subsystems linked to their known languages, which include the memory stores, as well, but they are functionally independent from each other (independence position). On the other hand, the interdependence position maintain that bilinguals represent words in a supralinguistic code, possibly based on the meanings of the words, that is independent of the language in which the words occurred. According to the developmental hypothesis second language learners start only with lexical associations, but gradually develop direct links between the second language lexicon and concepts.The aims of my study were: 1) to measure performance patterns, which are usually taken to reflect the one or the other model, in one experiment, using different retrieval tasks under identical encoding conditions; 2) to examine the developmental hypothesis by using less fluent and more fluent trilinguals. The subjects of my study were Hungarian–Romanian–English trilinguals, divided into two groups. According to my hypothesis, in the case of the less fluently speakers of English, a mostly data-driven task such as word fragment completion would depend on the matching of language at study and test, thus supporting the independence hypothesis. However, my results showed that in the case of this task both the data-driven and conceptually-driven processing is present: not only the language of study was important, but the increasing elaborate processing during study, as well. The results of the free recall task, as predicted, revealed evidence for interdependence effects. Finally, the recognition task showed again the combination of the two kind of processing: data-driven and conceptually-driven processing. The more fluent subjects, in turn, could face all the conditions and all the tasks almost equally well, suggesting that they mediate their languages entirely conceptually. In sum, we can tell that in the mind of the multilingual words are organised on the basis of meaning, not language. At very early stage of language acquisition, however, language specific cues intrude, even when subjects are concentrating upon meaning.My general conclusion is that the most useful research paradigm would be a transfer appropriate approach, according to which the performance on the retention tasks benefit to the extent to which procedures demanded by the task repeat those employed during encoding.

Download Full-text