Detection of Outliers through Influence Function on Affinity

Mapana Journal of Sciences ◽

10.12723/mjs.11.2 ◽

2007 ◽

Vol 6 (2) ◽

pp. 34-44

Author(s):

P. Rajalakshmi ◽

P. Geetha

Keyword(s):

Data Analysis ◽

Outlier Detection ◽

Influence Function ◽

Data Cleaning ◽

Pharmaceutical Research ◽

The Other ◽

Classification Problems ◽

Science Data ◽

Network Intrusion ◽

Detection Of Outliers

Outliers are atypical observations that lie at abnormal distances from the other observations in a random sample. Such outliers are often seen as contaminating the data. In general, rejecting influential outliers improves the accuracy of the estimators, so the identification of outliers has become the most important aspect of any data analysis. Outlier detection finds many applications in areas such as data cleaning, fraud detection, network intrusion, pharmaceutical research, and exploration in scientific databases. Distance-based outlier detection is the most commonly used method. In this paper, the influence function for affinity is explained, and the detection of outliers in classification problems using the influence function for affinity is illustrated for univariate data through a few examples.
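As a rough illustration of the distance-based approach the abstract mentions (not the paper's influence function for affinity, which the abstract does not spell out), the sketch below scores each univariate observation by its distance to its k-th nearest neighbour. The function name, `k`, and `threshold` are hypothetical choices made for this example only.

```python
def knn_distance_outliers(values, k=2, threshold=3.0):
    """Flag points whose distance to their k-th nearest neighbour is large.

    Hypothetical helper: `k` and `threshold` are illustrative assumptions,
    not parameters taken from the paper.
    """
    scores = []
    for i, x in enumerate(values):
        # Distances from x to every other observation, sorted ascending.
        dists = sorted(abs(x - y) for j, y in enumerate(values) if j != i)
        scores.append(dists[k - 1])

    # Use the median score as a scale, so the cutoff adapts to the data.
    ordered = sorted(scores)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2

    # A point is flagged when its score exceeds `threshold` times the median.
    return [x for x, s in zip(values, scores)
            if median > 0 and s > threshold * median]


sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.3, 25.0]
print(knn_distance_outliers(sample))  # flags only the isolated value 25.0
```

The median-based cutoff is one simple design choice among many; a fixed distance radius or a percentile of the scores would serve the same purpose.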

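PREDIKSI JUMLAH KEBERANGKATAN PENUMPANG PESAWAT TERBANG MENGGUNAKAN MODEL VARIASI KALENDER DAN DETEKSI OUTLIER (Studi Kasus di Bandara Soekarno-Hatta) [Predicting the Number of Departing Airplane Passengers Using a Calendar Variation Model and Outlier Detection (Case Study at Soekarno-Hatta Airport)]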

Jurnal Gaussian ◽

10.14710/j.gauss.v9i3.28914 ◽

2020 ◽

Vol 9 (3) ◽

pp. 336-345

Author(s):

Alvi Waldira ◽

Abdul Hoyyi ◽

Dwi Ispriyanti

Keyword(s):

Time Series ◽

Data Analysis ◽

Outlier Detection ◽

Air Transportation ◽

Time Variable ◽

Unexpected Events ◽

Transportation Services ◽

Dummy Variable ◽

Strategic Role ◽

Detection Of Outliers

Transportation has a strategic role and has become one of the community's main needs, especially air transportation services. The number of air passengers varies from month to month. One such variation occurs in the approach to Eid al-Fitr, whose date changes every year because it follows the Islamic calendar rather than the Gregorian calendar. This shift in the date of Eid al-Fitr forms a pattern called calendar variation. The effects of calendar variation can be handled with an additional variable, such as a dummy variable, which is used in the ARIMAX model. Time series observations are often affected by unexpected events that produce outliers. Such outliers make the results of data analysis less valid, so the researchers added outlier detection to this study. Based on the analysis, the calendar variation model obtained is ARIMA (1,0,[12]) with a time variable t, a dummy variable, and the addition of one outlier. This model has a MAPE value of 0.07079609, which means the model is very good for forecasting. The forecasts show an increase in the number of passengers during the two months before Eid. Keywords: Passenger, calendar variation, outlier detection
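For readers unfamiliar with the accuracy measure quoted above, the following minimal sketch computes the mean absolute percentage error (MAPE) in its usual form. The function name and the sample numbers are hypothetical, and the abstract does not say whether its reported value is a fraction or a percentage, so the sketch returns a fraction.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, returned as a fraction (0.05 = 5%).

    Hypothetical helper; assumes no actual value is zero.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average of |actual - forecast| / |actual| over all periods.
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Made-up passenger counts and forecasts, for illustration only.
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(mape(observed, predicted))  # approximately 0.05
```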

Microchemical imaging in biomedical research

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100130225 ◽

1992 ◽

Vol 50 (2) ◽

pp. 1118-1119

Author(s):

P. Ingram

Keyword(s):

Data Analysis ◽

Electron Probe ◽

The Other ◽

X Ray ◽

Cell Element ◽

Physiological Information ◽

Functional States ◽

Elemental Maps ◽

Electron Energy Loss ◽

Secondary Ion

It is well established that unique physiological information can be obtained by rapidly freezing cells in various functional states and analyzing the cell element content and distribution by electron probe x-ray microanalysis. (The other techniques of microanalysis that are amenable to imaging, such as electron energy loss spectroscopy, secondary ion mass spectroscopy, particle induced x-ray emission etc., are not addressed in this tutorial.) However, the usual processes of data acquisition are labor intensive and lengthy, requiring that x-ray counts be collected from individually selected regions of each cell in question and that data analysis be performed subsequent to data collection. A judicious combination of quantitative elemental maps and static raster probes adds not only an additional overall perception of what is occurring during a particular biological manipulation or event, but substantially increases data productivity. Recent advances in microcomputer instrumentation and software have made readily feasible the acquisition and processing of digital quantitative x-ray maps of one to several cells.

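PENGARUH ABILITY, BENEVOLENCE DAN INTEGRITY TERHADAP TRUST, SERTA IMPLIKASINYA TERHADAP PARTISIPASI PELANGGAN E-COMMERCE: STUDI KASUS PADA PELANGGAN E-COMMERCE DI UBM [The Effect of Ability, Benevolence, and Integrity on Trust, and Its Implications for E-Commerce Customer Participation: A Case Study of E-Commerce Customers at UBM]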

Jurnal Riset Manajemen dan Bisnis (JRMB) Fakultas Ekonomi UNIAT ◽

10.36226/jrmb.v2i2.46 ◽

2017 ◽

Vol 2 (2) ◽

pp. 155-168 ◽

Cited By ~ 1

Author(s):

David Wong

Keyword(s):

Data Analysis ◽

Data Collection ◽

Structural Equation Model ◽

Structural Equation ◽

Likert Scale ◽

Equation Model ◽

The Other ◽

Predictor Variables ◽

Analysis Method ◽

Other Hand

This research analyzes (1) the effect of the vendor’s ability, benevolence, and integrity on e-commerce customers’ trust at UBM; (2) the effect of the vendor’s ability, benevolence, and integrity on the level of e-commerce customers’ participation in Indonesia; and (3) the effect of trust on the level of e-commerce customers’ participation at UBM. The research sample consists of UBM e-commerce users, and data were collected with a Likert scale questionnaire sent to 200 respondents. A Structural Equation Model was used for data analysis. Of the three predictor variables (ability, benevolence, and integrity), only the vendor’s integrity has a positive and significant effect on customers’ trust. In turn, only the vendor’s integrity and customers’ trust have a positive and significant effect on e-commerce customers’ participation at UBM. Keywords: e-commerce customers’ participation, ability, benevolence, integrity

Unemployment and COVID-19 Impact in Greece: A Vector Autoregression (VAR) Data Analysis

Engineering Proceedings ◽

10.3390/engproc2021005041 ◽

2021 ◽

Vol 5 (1) ◽

pp. 41

Author(s):

Christos Katris

Keyword(s):

Data Analysis ◽

Response Function ◽

Unemployment Rate ◽

Vector Autoregression ◽

Impulse Response Function ◽

Youth Unemployment ◽

The Other ◽

Var Model ◽

Unemployment Rates ◽

The Impact

This paper studies whether and how the COVID-19 situation affected the unemployment rate in Greece. To this end, a vector autoregression (VAR) model is employed and data analysis is carried out. Another question of interest is whether the situation affected female and youth (under 25 years old) unemployment more heavily than overall unemployment. To predict the future impact of COVID-19 on these variables, we use the impulse response function. Furthermore, the impact of the pandemic is compared with that in other European countries for overall, female, and youth unemployment rates. Finally, the forecasting ability of the model is compared with that of univariate ARIMA and ANN models.
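To make the impulse response idea concrete, here is a minimal sketch for the simplest case, a VAR(1) model y_t = A y_{t-1} + e_t, where the (non-orthogonalized) response at horizon h to a unit shock is the matrix power A^h. The coefficient matrix below is made up for illustration; the abstract does not report the paper's lag order or estimates.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def impulse_responses(coef, horizons):
    """Return [A^0, A^1, ..., A^horizons] for a VAR(1) coefficient matrix A.

    Entry [h][i][j] is the response of variable i, h periods after a
    one-unit shock to variable j. Hypothetical helper for illustration.
    """
    n = len(coef)
    # Horizon 0 is the identity: each shock hits only its own variable.
    current = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    responses = [current]
    for _ in range(horizons):
        current = mat_mul(coef, current)
        responses.append(current)
    return responses


# Assumed coefficients for two series (not estimates from the paper).
A = [[0.5, 0.1],
     [0.0, 0.8]]
for h, resp in enumerate(impulse_responses(A, 3)):
    print(h, resp)
```

In practice a fitted model would supply the coefficient matrices, and orthogonalized responses would also need the error covariance; both are outside the scope of this sketch.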

Quantitative Spectral Data Analysis Using Extreme Learning Machines Algorithm Incorporated with PCA

Algorithms ◽

10.3390/a14010018 ◽

2021 ◽

Vol 14 (1) ◽

pp. 18

Author(s):

Michael Li ◽

Santoso Wibowo ◽

Wei Li ◽

Lily D. Li

Keyword(s):

Data Analysis ◽

Conceptual Framework ◽

Spectral Data ◽

Learning Algorithm ◽

Structural Parameters ◽

Inverse Function ◽

Principal Component ◽

Learning Task ◽

Classification Problems ◽

Spectral Data Analysis

Extreme learning machine (ELM) is a popular randomization-based learning algorithm that provides a fast solution for many regression and classification problems. In this article, we present a method based on ELM for solving the spectral data analysis problem, which essentially is a class of inverse problems. It requires determining the structural parameters of a physical sample from the given spectroscopic curves. We propose approximating the unknown target inverse function with an ELM, adding a linear neuron to correct the localized effect caused by Gaussian basis functions. Unlike conventional methods involving intensive numerical computations, under the new conceptual framework the task of performing spectral data analysis becomes a task of learning from data. As spectral data are typically high-dimensional, the dimensionality reduction technique of principal component analysis (PCA) is applied to reduce the dimension of the dataset and ensure convergence. The proposed conceptual framework is illustrated using a set of simulated Rutherford backscattering spectra. The results show that the proposed method achieves prediction inaccuracies of less than 1%, outperforming the predictions from the multi-layer perceptron and numerical-based techniques. The presented method could be implemented as application software for real-time spectral data analysis by integrating it into a spectroscopic data collection system.

Toward a Collective Agenda on AI for Earth Science Data Analysis

IEEE Geoscience and Remote Sensing Magazine ◽

10.1109/mgrs.2020.3043504 ◽

2021 ◽

Vol 9 (2) ◽

pp. 88-104

Author(s):

Devis Tuia ◽

Ribana Roscher ◽

Jan Dirk Wegner ◽

Nathan Jacobs ◽

Xiaoxiang Zhu ◽

...

Keyword(s):

Data Analysis ◽

Earth Science ◽

Science Data ◽

Earth Science Data

Introduction to Statistical Methods for Outlier Detection and Sample Homogeneity Assessment of Reference Materials and Proficiency Test Items

10.51843/wsproceedings.2021.15 ◽

2021 ◽

Author(s):

Yi-Ting Chen ◽

Keyword(s):

Data Analysis ◽

Outlier Detection ◽

Reference Materials ◽

Proficiency Test ◽

Test Items ◽

Homogeneity Assessment ◽

Iso 13528 ◽

Scope Of Application ◽

Iso 5725 ◽

Sample Homogeneity

The homogeneity of a product or sample affects whether it meets its scope of application and purpose. For example, reference materials (RM) produced by a reference material producer (RMP) and proficiency test items selected by a proficiency testing provider (PTP) should be shown to have a certain degree of homogeneity, so that they have consistent characteristics or comparability. Before performing a homogeneity assessment, however, it is necessary to measure the characteristic parameters of the reference materials or proficiency test items to obtain enough measured values for data analysis, and those measured values may contain outliers that affect the data analysis and the interpretation of the results. Therefore, this article refers to ASTM E178-16a:2016 [1], ISO 5725-2:1994 [2], ISO 13528:2015 [3], etc., to introduce several outlier detection and homogeneity assessment methods, supplemented by case studies. Finally, the article notes precautions for using the methods, so that readers can choose an appropriate method for their own analysis.
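The abstract does not say which tests the article covers. As one common example of a single-outlier check of the kind these standards describe, the sketch below computes a Grubbs-type statistic: the largest absolute deviation from the mean divided by the sample standard deviation. The function name and measurements are hypothetical, and the critical value for a given sample size and significance level still has to be looked up in the relevant standard's tables.

```python
import statistics


def grubbs_statistic(values):
    """Return (suspect value, G) with G = max |x - mean| / s.

    Hypothetical helper: it computes only the test statistic and does
    not decide whether the suspect value is an outlier.
    """
    if len(values) < 3:
        raise ValueError("need at least three measurements")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("all measurements are identical")
    # The suspect is the measurement farthest from the mean.
    suspect = max(values, key=lambda v: abs(v - mean))
    return suspect, abs(suspect - mean) / sd


# Made-up replicate measurements, for illustration only.
measurements = [10.0, 10.2, 9.9, 10.1, 12.5]
suspect, g = grubbs_statistic(measurements)
print(suspect, round(g, 3))
```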

Gravity Probe B data analysis: II. Science data and their handling prior to the final analysis

Classical and Quantum Gravity ◽

10.1088/0264-9381/32/22/224019 ◽

2015 ◽

Vol 32 (22) ◽

pp. 224019 ◽

Cited By ~ 3

Author(s):

A S Silbergleit ◽

J W Conklin ◽

M I Heifetz ◽

T Holmes ◽

J Li ◽

...

Keyword(s):

Data Analysis ◽

Final Analysis ◽

Science Data ◽

Gravity Probe B ◽

Gravity Probe

Detection of Anomalies in Water Networks by Functional Data Analysis

Mathematical Problems in Engineering ◽

10.1155/2018/5129735 ◽

2018 ◽

Vol 2018 ◽

pp. 1-13 ◽

Cited By ~ 8

Author(s):

Laura Millán-Roures ◽

Irene Epifanio ◽

Vicente Martínez

Keyword(s):

Data Analysis ◽

Outlier Detection ◽

Functional Data Analysis ◽

Functional Data ◽

Real Data ◽

Water Networks ◽

Archetypal Analysis ◽

Detection Techniques ◽

Second Stage ◽

Two Phases

A functional data analysis (FDA) based methodology for detecting anomalous flows in urban water networks is introduced. Primary hydraulic variables are recorded in real-time by telecontrol systems, so they are functional data (FD). In the first stage, the data are validated (false data are detected) and reconstructed, since there could be not only false data, but also missing and noisy data. FDA tools are used such as tolerance bands for FD and smoothing for dense and sparse FD. In the second stage, functional outlier detection tools are used in two phases. In Phase I, the data are cleared of anomalies to ensure that data are representative of the in-control system. The objective of Phase II is system monitoring. A new functional outlier detection method is also proposed based on archetypal analysis. The methodology is applied and illustrated with real data. A simulated study is also carried out to assess the performance of the outlier detection techniques, including our proposal. The results are very promising.

Space Telescope Science Data Analysis

Data Analysis in Astronomy ◽

10.1007/978-1-4615-9433-8_17 ◽

1985 ◽

pp. 191-194 ◽

Cited By ~ 1

Author(s):

R. Albrecht

Keyword(s):

Data Analysis ◽

Science Data ◽

Space Telescope
