Index Based Hidden Outlier Detection in Metric Space

2016 ◽  
Vol 2016 ◽  
pp. 1-14
Author(s):  
Honglong Xu ◽  
Rui Mao ◽  
Hao Liao ◽  
He Zhang ◽  
Minhua Lu ◽  
...  

Useless and noisy information occupies a large portion of big data, which makes it difficult to extract valuable information; outlier detection has therefore attracted much attention recently. However, if two points are far from all other points but relatively close to each other, they are less likely to be detected as outliers because of their mutual adjacency. In this situation, the outliers hide each other. In this paper, we propose a new definition of hidden outliers. Experimental results show that it is more accurate than existing distance-based definitions of outliers. Accordingly, we develop a candidate-set-based hidden outlier detection (HOD) algorithm, which achieves higher accuracy with comparable running time. Further, we develop an index-based HOD (iHOD) algorithm to achieve higher detection speed.
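The masking effect described above can be seen with a naive kth-nearest-neighbor distance detector. The sketch below is illustrative only (it is not the authors' HOD/iHOD algorithm), and all points and thresholds are invented: a mutually close pair of far-away points escapes a 1-NN detector, while k=2 breaks the masking.

```python
# Illustrative sketch (not the authors' HOD algorithm): a naive k-NN distance
# detector, showing how two mutually close far-away points "hide" each other.
import math

def knn_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbor."""
    dists = sorted(math.dist(points[i], p)
                   for j, p in enumerate(points) if j != i)
    return dists[k - 1]

def naive_outliers(points, k, threshold):
    """Flag points whose k-NN distance exceeds the threshold."""
    return [i for i in range(len(points))
            if knn_distance(points, i, k) > threshold]

# Dense cluster near the origin, plus two distant points close to each other.
cluster = [(0, 0), (1, 0), (0, 1), (1, 1)]
points = cluster + [(10, 10), (10.5, 10)]

# With k=1 the distant pair mask each other (each has a near neighbor: the
# other), so the detector misses both; with k=2 the masking breaks.
print(naive_outliers(points, k=1, threshold=2.0))  # []
print(naive_outliers(points, k=2, threshold=2.0))  # [4, 5]
```

The pair at indices 4 and 5 is exactly the "hidden outlier" situation the abstract describes.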

Author(s):  
Honglong Xu ◽  
Haiwu Rong ◽  
Rui Mao ◽  
Guoliang Chen ◽  
Zhiguang Shan

Big data is profoundly changing the lifestyles of people around the world in an unprecedented way. Driven by the requirements of applications across many industries, research on big data has been growing. Methods to manage and analyze big data to extract valuable information are the key to big data research. Starting from the variety challenge of big data, this dissertation proposes a universal big data management and analysis framework based on metric space. Within this framework, the Hilbert Index-based Outlier Detection (HIOD) algorithm is proposed. HIOD can handle any datatype that can be abstracted to a metric space, and it achieves higher detection speed. Experimental results indicate that HIOD effectively overcomes the variety challenge of big data, achieving a 2.02× speedup over iORCA on average and up to 5.57× in certain cases. The number of distance calculations is reduced by 47.57% on average and by up to 89.10%.


Data ◽  
2020 ◽  
Vol 6 (1) ◽  
pp. 1
Author(s):  
Ahmed Elmogy ◽  
Hamada Rizk ◽  
Amany M. Sarhan

In data mining, outlier detection is a major challenge, as it plays an important role in many applications such as medical data analysis, image processing, fraud detection, and intrusion detection. An extensive variety of clustering-based approaches have been developed to detect outliers. However, they are by nature time-consuming, which restricts their use in real-time applications. Furthermore, outlier detection requests are handled one at a time, meaning that each request is initiated individually with a particular set of parameters. In this paper, the first on-the-fly clustering-based outlier detection framework, On the Fly Clustering Based Outlier Detection (OFCOD), is presented. OFCOD enables analysts to effectively detect outliers on request, even within huge datasets. The proposed framework has been tested and evaluated using two real-world datasets with different features and applications: one with 699 records and another with five million records. The experimental results show that the proposed framework outperforms existing approaches across several evaluation metrics.
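The general clustering-based strategy the abstract refers to can be sketched minimally: cluster the data, then flag points that sit unusually far from their nearest centroid. This toy k-means is not OFCOD itself, and the dataset, k, and cutoff below are invented for illustration.

```python
# Minimal sketch of the general clustering-based idea (not OFCOD itself).
import math

def kmeans(points, k, iters=20):
    centroids = list(points[:k])  # seed with first k points for determinism
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda c: math.dist(p, centroids[c]))].append(p)
        centroids = [tuple(sum(xs) / len(g) for xs in zip(*g)) if g else centroids[c]
                     for c, g in enumerate(groups)]
    return centroids

def cluster_outliers(points, k, cutoff):
    centroids = kmeans(points, k)
    # Distance of each point to its nearest centroid.
    d = [min(math.dist(p, c) for c in centroids) for p in points]
    mean = sum(d) / len(d)
    std = (sum((x - mean) ** 2 for x in d) / len(d)) ** 0.5
    # Flag points more than `cutoff` standard deviations from the mean distance.
    return [i for i, x in enumerate(d) if x > mean + cutoff * std]

# Two tight clusters plus one distant point at index 8.
data = [(0, 0), (5, 5), (0, 1), (1, 0), (1, 1), (5, 6), (6, 5), (6, 6), (20, 20)]
print(cluster_outliers(data, k=2, cutoff=2.0))  # [8]
```

A real implementation would use better initialization (e.g. k-means++) and a principled cutoff; this only shows the shape of the approach.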


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Hossein Ahmadvand ◽  
Fouzhan Foroutan ◽  
Mahmood Fathy

Abstract: Data variety is one of the most important features of Big Data. It results from aggregating data from multiple sources and from the uneven distribution of data, and it causes high variation in the consumption of processing resources such as CPU usage. This issue has been overlooked in previous works. To overcome this problem, in the present work we use Dynamic Voltage and Frequency Scaling (DVFS) to reduce the energy consumption of computation. To this end, we consider two types of deadlines as constraints. Before applying the DVFS technique to compute nodes, we estimate the processing time and the frequency needed to meet the deadline. In the evaluation phase, we used a set of datasets and applications. The experimental results show that the proposed approach surpasses the other scenarios in processing real datasets. Based on the experimental results in this paper, DV-DVFS achieves up to a 15% reduction in energy consumption.
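The core idea of estimating the frequency needed to meet a deadline can be sketched as follows. This is a hedged illustration, not the paper's DV-DVFS implementation: it assumes execution time scales roughly linearly with 1/f, and the frequency steps and timings below are invented.

```python
# Hedged sketch of the DVFS idea above: given an estimated execution time at
# maximum frequency, pick the lowest available frequency step that still
# meets the deadline. Frequencies and times here are invented.
def pick_frequency(t_at_fmax, deadline, f_max, available):
    # Assume t(f) ~ t_at_fmax * f_max / f, so the minimum frequency that
    # finishes by the deadline is:
    f_needed = t_at_fmax * f_max / deadline
    candidates = [f for f in sorted(available) if f >= f_needed]
    if not candidates:
        raise ValueError("deadline unreachable even at f_max")
    return candidates[0]

# A node estimated to need 6 s at 3.0 GHz, with a 10 s deadline:
# f_needed = 6 * 3.0 / 10 = 1.8 GHz -> run at the 2.0 GHz step.
freqs = [1.2, 1.6, 2.0, 2.4, 3.0]  # hypothetical DVFS steps (GHz)
print(pick_frequency(6.0, 10.0, 3.0, freqs))  # 2.0
```

Running below the maximum frequency whenever the deadline allows is what yields the energy savings the abstract reports.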


2021 ◽  
Author(s):  
Frank Hillary ◽  
Sarah Rajtmajer

Abstract: This critical review discusses evidence for the replication crisis in the clinical neuroscience literature, with a focus on the size of the literature and on how scientific hypotheses are framed and tested. We aim to reinvigorate discussions born from the philosophy of science regarding falsification (see Popper, 1959; 1962), with the hope of bringing pragmatic application that might give real leverage to attempts to address scientific reproducibility. The surging publication rate has not translated into unparalleled scientific progress, so the current "science-by-volume" approach requires a new perspective for determining scientific ground truths. We describe an example from the network neurosciences in the study of traumatic brain injury, where there has been little effort to refute two prominent hypotheses, leaving a literature without resolution. Based upon this example, we discuss how building strong hypotheses and then designing efforts to falsify them can bring greater precision to the clinical neurosciences. With falsification as the goal, we can harness big data and computational power to identify the fitness of each theory and advance the neurosciences.


2021 ◽  
Vol 50 (1) ◽  
pp. 5-12
Author(s):  
Hani Alquhayz ◽  
Mahdi Jemmali

This paper focuses on maximizing the minimum completion time on identical parallel processors; the objective of this maximization is to ensure a fair distribution of load. Consider a set of jobs to be assigned to several identical parallel processors. This problem is known to be NP-hard. The research work of this paper is based essentially on comparing the proposed heuristics with others cited in the literature. Our heuristics are developed using essentially the randomization method and the iterative use of the knapsack problem to solve the studied problem. The heuristics are assessed on the several instances presented in the experimental results. The results show that the knapsack-based heuristic gives performance almost similar to the best heuristic in the literature, but with better running time.
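For context, the problem (sometimes called machine covering) admits a simple greedy baseline: sort jobs longest-first and always give the next job to the least-loaded processor, which tends to raise the minimum load. This is not the paper's knapsack-based or randomized heuristic, just a standard reference point; the job sizes below are invented.

```python
# Simple greedy baseline for max-min completion time on identical processors
# (not the paper's knapsack-based heuristic): longest job first, always to
# the least-loaded processor.
def greedy_max_min(jobs, m):
    loads = [0] * m
    for job in sorted(jobs, reverse=True):
        loads[loads.index(min(loads))] += job
    return min(loads)

# 5 jobs on 2 identical processors: loads end up [9, 9] -> objective 9.
print(greedy_max_min([7, 5, 4, 1, 1], m=2))  # 9
```

Stronger heuristics, such as the knapsack-based one the paper proposes, improve on this baseline by packing processors closer to the target minimum.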


Author(s):  
Xiongtao Zhang ◽  
Nan Xiang ◽  
Qihua Chen ◽  
Zhengyi Zhong ◽  
Hui Yan ◽  
...  

Author(s):  
Joaquín Pérez Ortega ◽  
Nelva Nely Almanza Ortega ◽  
Andrea Vega Villalobos ◽  
Marco A. Aguirre L. ◽  
Crispín Zavala Díaz ◽  
...  

In recent years, the amount of natural-language text in digital format has increased impressively. To obtain useful information from such a large volume of data, new specialized techniques and efficient algorithms are required. Text mining consists of extracting meaningful patterns from texts; one of the basic approaches is clustering, and the most widely used clustering algorithm is k-means. This chapter proposes an improvement to the convergence step of the k-means algorithm: the process stops whenever the number of objects that change their assigned cluster in the current iteration is bigger than the number that changed in the previous iteration. Experimental results showed a reduction in execution time of up to 93%. It is remarkable that, in general, better results are obtained as the volume of text increases, particularly for texts within big data environments.
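The stopping rule described above can be grafted onto a plain Lloyd's k-means. The sketch below is a toy implementation of that criterion, not the chapter's full text-mining pipeline; the points and initial centroids in the usage line are invented.

```python
# Sketch of the convergence criterion above on a toy Lloyd's k-means:
# stop as soon as the number of points that switch clusters in an iteration
# exceeds the number that switched in the previous iteration (or reaches 0).
import math

def kmeans_early_stop(points, centroids, max_iters=100):
    k = len(centroids)
    assign = [None] * len(points)
    prev_changes = float("inf")
    for _ in range(max_iters):
        changes = 0
        for i, p in enumerate(points):
            best = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            if best != assign[i]:
                changes += 1
                assign[i] = best
        if changes == 0 or changes > prev_changes:  # early-stop rule
            break
        prev_changes = changes
        for c in range(k):  # recompute centroids (updated in place)
            members = [p for i, p in enumerate(points) if assign[i] == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return assign, centroids

labels, _ = kmeans_early_stop([(0, 0), (0, 1), (10, 0), (10, 1)], [(0, 0), (10, 0)])
print(labels)  # [0, 0, 1, 1]
```

The saving comes from cutting late iterations that reassign many points without improving the clustering, which matters most on large text corpora.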

