Classification of Heterogeneous Data Based on Data Type Impact on Similarity

Author(s):  
Najat Ali ◽  
Daniel Neagu ◽  
Paul Trundle
2020 ◽  
Vol 10 (2) ◽  
pp. 125-136 ◽  
Author(s):  
Tomasz Gałkowski ◽  
Adam Krzyżak ◽  
Zbigniew Filutowicz

Abstract Nowadays, unprecedented amounts of heterogeneous data are stored, processed and transmitted via the Internet. In data analysis, one of the most important problems is to verify whether data observed and/or collected over time are genuine and stationary, i.e. whether the information sources have kept their characteristics unchanged. There is a variety of data types: texts, images, audio or video files or streams, metadata descriptions, as well as ordinary numbers. All of them change in many ways. If a change happens, the next question is what the essence of this change is, and when and where it occurred. The main focus of this paper is the detection of change and the classification of its type. Many algorithms have been proposed to detect abnormalities and deviations in data. In this paper we propose a new approach to abrupt change detection based on Parzen kernel estimation of the partial derivatives of multivariate regression functions in the presence of probabilistic noise. The proposed change detection algorithm is applied to one- and two-dimensional patterns to detect abrupt changes.
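The idea of locating an abrupt change through a kernel estimate of the regression derivative can be illustrated in one dimension. The sketch below is not the authors' algorithm: it assumes an equally spaced design on [0, 1], a Gaussian kernel, a Priestley-Chao-type estimator, and a hand-picked bandwidth `h`. All function names and parameter values are hypothetical. The change point is taken to be the interior grid point where the estimated derivative has the largest magnitude.

```python
import math
import random


def gaussian_kernel_derivative(u):
    # Derivative of the standard Gaussian density: K'(u) = -u * K(u).
    return -u * math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def derivative_estimate(x, xs, ys, h):
    # Priestley-Chao-type estimate of m'(x) for equally spaced xs on [0, 1]:
    # m'(x) ~ (1 / (n * h^2)) * sum_i y_i * K'((x - x_i) / h).
    n = len(xs)
    total = sum(y * gaussian_kernel_derivative((x - xi) / h) for xi, y in zip(xs, ys))
    return total / (n * h * h)


def detect_abrupt_change(xs, ys, h, margin=0.1, grid_size=161):
    # Search only the interior [margin, 1 - margin]; near the boundaries the
    # estimator is biased and would produce spurious large derivatives.
    grid = [margin + (1.0 - 2.0 * margin) * j / (grid_size - 1) for j in range(grid_size)]
    return max(grid, key=lambda x: abs(derivative_estimate(x, xs, ys, h)))


def make_step_data(n=400, jump_at=0.5, noise=0.1, seed=0):
    # Synthetic example (assumption): a unit step at jump_at plus Gaussian noise.
    rng = random.Random(seed)
    xs = [(i + 0.5) / n for i in range(n)]
    ys = [(1.0 if x >= jump_at else 0.0) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = make_step_data()
    print("estimated change location:", detect_abrupt_change(xs, ys, h=0.03))
```

The bandwidth trades off noise suppression against localisation: a larger `h` smooths the noise but blurs the jump, so in practice it would need to be tuned to the data.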


Information ◽  
2018 ◽  
Vol 9 (11) ◽  
pp. 285 ◽  
Author(s):  
Richard Roberts ◽  
Robert Laramee

A rapidly increasing number of businesses rely on visualisation solutions for their data management challenges. This demand stems from an industry-wide shift towards data-driven approaches to decision making and problem-solving. However, an overwhelming mass of heterogeneous data is collected as a result. The analysis of these data becomes a critical and challenging part of the business process. Employing visual analysis increases data comprehension, thus enabling a wider range of users to interpret the underlying behaviour, as opposed to skilled but expensive data analysts. Widening the reach to an audience with a broader range of backgrounds creates new opportunities for decision making, problem-solving, trend identification, and creative thinking. In this survey, we identify trends in the business visualisation and visual analytics literature where visualisation is used to address data challenges, and identify areas in which industries use visual design to develop their understanding of the business environment. Our novel classification of the literature includes the topics of business intelligence, business ecosystem, and customer-centric. This survey provides a valuable overview of and insight into the business visualisation literature, with a novel classification that highlights both mature and less developed research directions.


2011 ◽  
pp. 294-313
Author(s):  
Jean-Philippe Vert

Support vector machines and kernel methods are increasingly popular in genomics and computational biology due to their good performance in real-world applications and their strong modularity, which makes them suitable for a wide range of problems, from the classification of tumors to the automatic annotation of proteins. Their ability to work in high dimensions and process nonvectorial data, and the natural framework they provide for integrating heterogeneous data, are particularly relevant to various problems arising in computational biology. In this chapter, we survey some of the most prominent applications published so far, highlighting the particular developments in kernel methods triggered by problems in biology, and mention a few promising research directions likely to expand in the future.
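One common way kernel methods integrate heterogeneous data is to define a separate kernel for each data type and add them, since a sum of valid kernels is again a valid kernel. The sketch below is only an illustration of that general idea, not a method taken from the chapter: it pairs a Gaussian (RBF) kernel on numeric vectors with a k-mer spectrum kernel on sequences. The sample format, function names, and parameter values are assumptions.

```python
import math
from collections import Counter


def rbf_kernel(x, y, gamma=1.0):
    # Gaussian kernel on numeric feature vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def spectrum_kernel(s, t, k=2):
    # Spectrum kernel on strings: inner product of k-mer count vectors.
    counts_s = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    counts_t = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return float(sum(counts_s[kmer] * counts_t[kmer] for kmer in counts_s))


def combined_kernel(a, b):
    # Each sample is a (numeric_vector, sequence) pair (hypothetical format).
    # The sum of two valid kernels is itself a valid kernel.
    (xa, sa), (xb, sb) = a, b
    return rbf_kernel(xa, xb) + spectrum_kernel(sa, sb)


def gram_matrix(samples):
    # Pairwise kernel values for all samples.
    return [[combined_kernel(a, b) for b in samples] for a in samples]


if __name__ == "__main__":
    samples = [([0.1, 0.2], "ACGT"), ([0.1, 0.2], "ACGA"), ([0.9, 0.4], "TTTT")]
    for row in gram_matrix(samples):
        print(row)
```

A Gram matrix built this way could be passed to any SVM implementation that accepts precomputed kernels. In practice the two kernels would usually be normalised or weighted so that neither data type dominates.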


2020 ◽  
Author(s):  
Ahmed Hamed ◽  
Ahmed Sobhy ◽  
Hamed Nassar

Abstract The coronavirus disease 2019 (COVID-19) is wreaking havoc around the world, and great efforts are underway to control it. Millions of people are now being tested, and their data keep accumulating in large volumes. These data can be used to classify newly tested persons according to whether or not they have the disease. However, standard classification techniques are hampered by the fact that the data are typically both incomplete and heterogeneous. To address this two-fold obstacle, we propose a KNN variant (KNNV) algorithm which accurately and efficiently classifies COVID-19. The two main ideas behind the proposed algorithm are that, for each instance to be classified, it chooses the parameter K adaptively and calculates the distances to other instances in a novel way. The KNNV was implemented and tested on a COVID-19 dataset from the Italian Society of Medical and Interventional Radiology. It was also compared to three algorithms of its category. The test results show that the KNNV can efficiently and accurately classify COVID-19 patients. The comparison results show that the algorithm greatly outperforms all its competitors in terms of four metrics: precision, recall, accuracy, and F-score.
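The abstract does not specify how KNNV chooses K or computes distances, so the following is only a sketch of the two general ideas, not the authors' algorithm. It assumes a Gower-style distance that averages per-feature dissimilarities over the features present in both records (missing values are `None`), and a hypothetical rule that picks, for each query, the candidate K whose neighbours vote most decisively. The feature layout and toy records are invented for illustration.

```python
from collections import Counter


def mixed_distance(a, b, ranges):
    # Gower-style distance for incomplete, heterogeneous records.
    # ranges[j] is the value range of numeric feature j, or None if categorical.
    total, used = 0.0, 0
    for x, y, r in zip(a, b, ranges):
        if x is None or y is None:
            continue  # skip features missing in either record
        if r is None:
            total += 0.0 if x == y else 1.0  # categorical mismatch
        else:
            total += abs(x - y) / r  # numeric difference scaled by its range
        used += 1
    return total / used if used else 1.0  # no shared features: maximal distance


def classify(query, records, labels, ranges, k_candidates=(3, 5, 7)):
    # Rank training records by distance to the query.
    order = sorted(range(len(records)),
                   key=lambda i: mixed_distance(query, records[i], ranges))
    best_label, best_share = None, -1.0
    for k in k_candidates:
        if k > len(order):
            break
        votes = Counter(labels[i] for i in order[:k])
        label, count = votes.most_common(1)[0]
        share = count / k
        # Hypothetical adaptive rule: keep the K with the most decisive vote.
        if share > best_share:
            best_label, best_share = label, share
    return best_label


if __name__ == "__main__":
    # Toy records (invented): (age, fever, imaging score), with missing values.
    ranges = (60.0, None, 10.0)
    records = [(70, "yes", 8), (65, "yes", None), (80, None, 9), (60, "yes", 7),
               (30, "no", 1), (25, "no", 2), (40, None, 1), (35, "no", None)]
    labels = ["positive"] * 4 + ["negative"] * 4
    print(classify((68, "yes", None), records, labels, ranges))
```

Skipping missing features and renormalising by the number of shared ones lets incomplete records take part in the neighbour search without imputation, at the cost of comparing records on different feature subsets.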

