Noise Resistant Multidimensional Data Fusion via Quasi-Cliques on Hypergraphs

2021 ◽  
Author(s):  
Alejandro Alvarez-Ayllon ◽  
Manuel Palomo-Duarte ◽  
Juan Manuel Dodero

Cross-matching data stored in separate files is an everyday activity in the scientific domain. However, the relation between attributes is sometimes not obvious. The discovery of foreign keys in relational databases is a similar problem, so techniques devised for it can be adapted. Nonetheless, given the different nature of the data, which can be subject to uncertainty, this adaptation is not trivial.

This paper first introduces the concept of Equally-Distributed Dependencies, which is similar to the Inclusion Dependencies from the relational domain. We describe a correspondence between the two in order to bridge existing ideas. We then propose PresQ, a new algorithm based on the search for maximal quasi-cliques on hypergraphs, which makes it more robust to the nature of uncertain numerical data. The algorithm has been tested on three public datasets, showing promising results both in its capacity to find multidimensional equally-distributed sets of attributes and in run-time.
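The two ingredients named in this abstract can be illustrated with a minimal sketch. It is not the PresQ algorithm. It assumes a two-sample Kolmogorov-Smirnov statistic as the test for "equally distributed" attributes (the abstract does not name a test), and it checks the quasi-clique condition on an ordinary graph rather than a hypergraph. The function names `ks_statistic` and `is_quasi_clique` and the parameter `gamma` are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest gap between the empirical CDFs of the two samples:
    0.0 for identical samples, 1.0 for samples that do not overlap at all.
    """
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


def is_quasi_clique(nodes, edges, gamma):
    """Check the degree-based quasi-clique condition on a plain graph.

    `edges` is a set of frozenset pairs. Every node must be adjacent to at
    least gamma * (len(nodes) - 1) of the other nodes (assumption: this is
    the simple-graph definition, not the hypergraph one used in the paper).
    """
    nodes = set(nodes)
    need = gamma * (len(nodes) - 1)
    for n in nodes:
        degree = sum(1 for m in nodes if m != n and frozenset((n, m)) in edges)
        if degree < need:
            return False
    return True


# Attributes whose samples pass the distribution test become connected nodes;
# a quasi-clique tolerates a few missing edges caused by noisy test results.
edges = {frozenset(p) for p in [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")]}
same = ks_statistic([1, 2, 3], [1, 2, 3])
apart = ks_statistic([1, 2, 3], [10, 11, 12])
tolerant = is_quasi_clique("abcd", edges, 0.6)
strict = is_quasi_clique("abcd", edges, 1.0)
```

With `gamma = 1.0` the check reduces to an ordinary clique, so the missing `c`-`d` edge rejects the set. A lower `gamma` accepts it, which is the kind of tolerance to occasional test failures that motivates quasi-cliques.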


2008 ◽  
pp. 2364-2370
Author(s):  
Janet Delve

Data Warehousing is now a well-established part of the business and scientific worlds. However, until recently, data warehouses were restricted to modeling essentially numerical data, examples being sales figures in the business arena (e.g. Wal-Mart’s data warehouse) and astronomical data (e.g. SKICAT) in scientific research, with textual data playing a descriptive rather than a central role. The inability of data warehouses to cope with mainly non-numeric data is particularly problematic for humanities research utilizing material such as memoirs and trade directories. Recent innovations have opened up possibilities for non-numeric data warehouses, making them widely accessible to humanities research for the first time. Due to its irregular and complex nature, humanities research data is often difficult to model, and manipulating time shifts in a relational database is problematic, as is fitting such data into a normalized data model. History and linguistics are exemplars of areas where relational databases are cumbersome and which would benefit from the greater freedom afforded by data warehouse dimensional modeling.


Sensors ◽  
2019 ◽  
Vol 19 (21) ◽  
pp. 4746 ◽  
Author(s):  
Van Pham ◽  
Quang Le ◽  
Duc Nguyen ◽  
Nhu Dang ◽  
Huu Huynh ◽  
...  

While working on the fire ground, firefighters risk their well-being in conditions where any incident might cause not only injuries but also fatalities. They may be incapacitated by unexpected falls due to floor cracks, holes, structural failure, gas explosions, exposure to toxic gases, or being stuck in a narrow path. To address this need, in this study we focus on developing an efficient portable system that detects a firefighter’s falls and loss of physical performance and raises an alert on high CO levels. The system uses a microcontroller carried by the firefighter, with data fusion from a 3-DOF (degrees of freedom) accelerometer, a 3-DOF gyroscope, a 3-DOF magnetometer, a barometer, and an MQ7 sensor, together with our proposed fall detection, loss of physical performance detection, and CO monitoring algorithms. By combining these five sensors with highly efficient data fusion algorithms, we can distinguish among falling, loss of physical performance, and other on-duty activities (ODAs) such as standing, walking, running, jogging, crawling, climbing up/down stairs, and moving up/down in elevators. Signals from the sensors are sent to the microcontroller, which detects falls and loss of physical performance and raises an alert on high CO levels. The proposed algorithms achieve 100% accuracy, specificity, and sensitivity on our experimental datasets and 97.96%, 100%, and 95.89%, respectively, on public datasets in distinguishing between falls and ODAs. Furthermore, the proposed algorithm perfectly distinguishes between loss of physical performance and up/down movement in an elevator based on barometric data fusion. If a firefighter is unconscious following a fall or loss of physical performance, an alert message is sent to their incident commander (IC) via the nRF24L01 module.
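The abstract does not describe the fall detection algorithm itself. As a rough illustration of one common accelerometer-based approach, and not the authors' method, the sketch below flags a fall when a near-free-fall dip in acceleration magnitude is followed shortly by an impact spike. The function names, the thresholds `free_fall_g` and `impact_g`, and the `window` length are all hypothetical values chosen for the example.

```python
import math


def magnitude(sample):
    """Euclidean norm of one (ax, ay, az) accelerometer sample, in g."""
    ax, ay, az = sample
    return math.sqrt(ax * ax + ay * ay + az * az)


def detect_fall(samples, free_fall_g=0.4, impact_g=2.5, window=20):
    """Return the index of the impact sample, or None if no fall is seen.

    A fall is assumed to look like a dip below `free_fall_g` followed,
    within `window` samples, by a spike above `impact_g`. The thresholds
    are illustrative assumptions, not values from the paper.
    """
    for i, sample in enumerate(samples):
        if magnitude(sample) < free_fall_g:
            for j in range(i + 1, min(i + 1 + window, len(samples))):
                if magnitude(samples[j]) > impact_g:
                    return j
    return None


# Synthetic traces: standing still reads about 1 g on the vertical axis.
standing = [(0.0, 0.0, 1.0)] * 5
fall_trace = standing + [(0.0, 0.0, 0.1)] * 3 + [(0.0, 0.0, 3.0)] + standing
impact_index = detect_fall(fall_trace)
no_fall = detect_fall(standing)
```

A system like the one described would presumably fuse this signal with gyroscope, magnetometer, and barometer readings to separate falls from activities such as running or riding an elevator. A single-sensor threshold like this one cannot make that distinction on its own.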


2019 ◽  
Vol 19 (2) ◽  
pp. 390-411 ◽  
Author(s):  
David Benjamin Verstraete ◽  
Enrique López Droguett ◽  
Viviana Meruane ◽  
Mohammad Modarres ◽  
Andrés Ferrada

With the availability of cheaper multisensor suites, one has access to massive, multidimensional datasets that can and should be used for fault diagnosis. However, from a time, resource, engineering, and computational perspective, it is often cost-prohibitive to label all the data streaming into a database in the context of big machinery data, that is, massive multidimensional data. Therefore, this article proposes both a fully unsupervised and a semi-supervised methodology for fault diagnostics, based on deep learning-enabled generative adversarial networks. Two public datasets of vibration data from rolling element bearings are used to evaluate the performance of the proposed methodology. The results indicate that it is a promising approach for both unsupervised and semi-supervised fault diagnostics.


2019 ◽  
Vol 17 (3) ◽  
pp. 355-368 ◽  
Author(s):  
Julija Pragarauskaitė ◽  
Gintautas Dzemyda

The analysis of online customer shopping behavior is an important task nowadays, as it allows maximizing the efficiency of advertising campaigns and increasing the return on investment for advertisers. The results of this analysis are usually reviewed and interpreted by a non-technical person; therefore, they must be displayed in the simplest possible way. Online shopping data is multidimensional and consists of both numerical and categorical data. In this paper, an approach is proposed for the visual analysis of online shopping data and their relevance. It integrates several multidimensional data visualization methods of different nature. The results of the visual analysis of numerical data are combined with the categorical data values. Based on the visualization results, decisions on the advertising campaign can be made in order to increase the return on investment and attract more customers to buy in the e-shop.


Author(s):  
Parag Jain ◽  
Abhijit Mishra ◽  
Amar Prakash Azad ◽  
Karthik Sankaranarayanan

We propose a novel framework for controllable natural language transformation. Realizing that the requirement of a parallel corpus is practically unsustainable for controllable generation tasks, we introduce an unsupervised training scheme. The crux of the framework is a deep neural encoder-decoder that is reinforced with text-transformation knowledge through auxiliary modules (called scorers). These scorers, based on off-the-shelf language processing tools, decide the learning scheme of the encoder-decoder based on its actions. We apply this framework to the text-transformation task of formalizing an input text by improving its readability grade; the degree of required formalization can be controlled by the user at run-time. Experiments on public datasets demonstrate the efficacy of our model in (a) transforming a given text to a more formal style, and (b) varying the degree of formality in the output text based on the specified input control. Our code and datasets are released for academic use.


Author(s):  
Janet Delve

Data Warehousing is now a well-established part of the business and scientific worlds. However, until recently, data warehouses were restricted to modeling essentially numerical data, examples being sales figures in the business arena (for instance, in Wal-Mart’s data warehouse (Westerman, 2000)) and astronomical data (for example SKICAT) in scientific research, with textual data playing a descriptive rather than a central analytic role. The inability of data warehouses to cope with mainly non-numeric data is particularly problematic for humanities research utilizing material such as memoirs and trade directories. Recent innovations have opened up possibilities for ‘non-numeric’ data warehouses, making them widely accessible to humanities research for the first time. Due to its irregular and complex nature, humanities research data is often difficult to model, and manipulating time shifts in a relational database is problematic, as is fitting such data into a normalized data model. History and linguistics are exemplars of areas where relational databases are cumbersome and which would benefit from the greater freedom afforded by data warehouse dimensional modeling.

