Merging Data from Large and Small Telescopes – Good or Bad? And: How Useful is the Application of Statistical Weights to Time-Series Photometric Measurements?

Abstract. I have investigated the value of the contribution of small telescopes to the success of a whole WET run. To this end, I have applied different data weighting schemes to two extreme WET test data sets. I find that weights proportional to the inverse local scatter in the light curves produce Fourier transforms with the best signal-to-noise ratio. Weighting data more strongly than by their inverse scatter does not yield optimal results because it reduces the effective number of data points. The contribution of the small telescopes to the combined WET results was found to be very important: they not only improve the spectral window, but can also reduce the noise in the total FT by more than their light-gathering power would imply. Some suggestions for the optimal use of small telescopes in the WET are given.
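As a rough illustration of the weighting idea in this abstract, the following Python sketch computes a weighted amplitude spectrum for unevenly sampled data and an effective number of data points. The function names, the amplitude normalization, and the definition N_eff = (Σw)² / Σw² are assumptions made for this example; the abstract does not state which formulas the author used.

```python
import numpy as np

def weighted_amplitude_spectrum(t, x, w, freqs):
    """Amplitude spectrum of (possibly unevenly sampled) data with per-point weights.

    Assumed normalization: A(f) = 2 |sum_i w_i x_i exp(-2 pi i f t_i)| / sum_i w_i,
    so a pure sinusoid of amplitude 1 yields a peak of about 1.
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    x = x - np.average(x, weights=w)                   # remove the weighted mean
    phase = np.exp(-2j * np.pi * np.outer(freqs, t))   # shape (n_freqs, n_points)
    return 2.0 * np.abs(phase @ (w * x)) / w.sum()

def effective_number_of_points(w):
    """Assumed definition: N_eff = (sum w)^2 / sum(w^2)."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)
```

With per-point scatter estimates `sigma`, weights proportional to the inverse scatter would be `w = 1.0 / sigma`. Steeper schemes such as `1.0 / sigma**2` concentrate the weight on fewer points, which lowers `effective_number_of_points(w)`.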

A posteriori noise estimation in variable data sets

Astronomy and Astrophysics ◽

10.1051/0004-6361/201730618 ◽

2018 ◽

Vol 609 ◽

pp. A39 ◽

Cited By ~ 7

Author(s):

S. Czesla ◽

T. Molle ◽

J. H. M. M. Schmitt

Keyword(s):

Standard Deviation ◽

Synthetic Data ◽

Light Curves ◽

Weighted Sums ◽

Data Sets ◽

A Posteriori ◽

Sampled Data ◽

Data Set ◽

Specific Parameter ◽

Data Points

Most physical data sets contain a stochastic contribution produced by measurement noise or other random sources along with the signal. Usually, neither the signal nor the noise is accurately known prior to the measurement, so both have to be estimated a posteriori. We have studied a procedure to estimate the standard deviation of the stochastic contribution assuming normality and independence, requiring a sufficiently well-sampled data set to yield reliable results. This procedure is based on estimating the standard deviation in a sample of weighted sums of arbitrarily sampled data points and is identical to the so-called DER_SNR algorithm for specific parameter settings. To demonstrate the applicability of our procedure, we present applications to synthetic data, high-resolution spectra, and a large sample of space-based light curves. Finally, we give guidelines for applying the procedure in situations not explicitly considered here, to promote its adoption in data analysis.
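The DER_SNR special case mentioned in this abstract can be sketched in a few lines of Python. This shows only the commonly quoted DER_SNR noise formula for evenly sampled data, not the generalized procedure of the paper; the function name `der_snr_noise` is chosen here for illustration.

```python
import numpy as np

def der_snr_noise(flux):
    """Estimate the noise standard deviation with the DER_SNR formula.

    Uses the weighted sums d_i = 2 f_i - f_(i-2) - f_(i+2). For independent
    Gaussian noise of standard deviation sigma, d_i has standard deviation
    sqrt(6) * sigma, and the median absolute value of a zero-mean normal
    variable is its standard deviation divided by 1.482602.
    """
    flux = np.asarray(flux, dtype=float)
    if flux.size < 5:
        raise ValueError("need at least 5 data points")
    d = 2.0 * flux[2:-2] - flux[:-4] - flux[4:]
    return 1.482602 / np.sqrt(6.0) * np.median(np.abs(d))
```

Because a constant or linear signal cancels exactly in each weighted sum, the estimate responds mainly to the stochastic part, provided the signal varies slowly compared with the sampling.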

ON GENERATING DIGITAL ELEVATION MODELS FROM LIDAR DATA – RESOLUTION VERSUS ACCURACY AND TOPOGRAPHIC WETNESS INDEX INDICES IN NORTHERN PEATLANDS

Geodesy and Cartography ◽

10.3846/20296991.2012.702983 ◽

2012 ◽

Vol 38 (2) ◽

pp. 57-69 ◽

Cited By ~ 12

Author(s):

Abdulghani Hasan ◽

Petter Pilesjö ◽

Andreas Persson

Keyword(s):

Large Scale ◽

Drainage Area ◽

Data Sets ◽

Topographic Wetness Index ◽

Absolute Deviation ◽

Digital Elevation ◽

Elevation Data ◽

Scale Modelling ◽

Data Points ◽

Emission Modelling

Global change and GHG emission modelling depend on accurate wetness estimates for predictions of, e.g., methane emissions. This study aims to quantify how the slope, drainage area and the TWI vary with the resolution of DEMs for a flat peatland area. Six DEMs with spatial resolutions from 0.5 to 90 m were interpolated with four different search radii. The relationship between the accuracy of the DEM and the slope was tested. The LiDAR elevation data was divided into two data sets. The number of data points facilitated an evaluation dataset with data points not more than 10 mm away from the cell centre points in the interpolation dataset. The DEM was evaluated using a quantile-quantile test and the normalized median absolute deviation. It showed independence of the resolution when using the same search radius. The accuracy of the estimated elevation for different slopes was tested using the 0.5 m DEM, and it showed a higher deviation from the evaluation data for steep areas. The slope estimations between resolutions showed differences that exceeded 50%. Drainage areas were tested for three resolutions, with coinciding evaluation points. The model's ability to generate drainage area at each resolution was tested by pairwise comparison of three data subsets and showed differences of more than 50% in 25% of the evaluated points. The results show that consideration of DEM resolution is a necessity for the use of slope, drainage area and TWI data in large-scale modelling.
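For readers unfamiliar with the two quantities named in this abstract, the following Python sketch uses their commonly cited definitions: NMAD = 1.4826 · median(|Δh − median(Δh)|) and TWI = ln(a / tan β), where a is the specific catchment area and β the local slope. The function names and the small minimum-slope guard for flat cells are assumptions for this example and are not taken from the paper.

```python
import numpy as np

def nmad(errors):
    """Normalized median absolute deviation of elevation errors."""
    errors = np.asarray(errors, dtype=float)
    return 1.4826 * np.median(np.abs(errors - np.median(errors)))

def topographic_wetness_index(specific_area, slope_rad, min_tan=1e-6):
    """TWI = ln(a / tan(beta)).

    specific_area: upslope area per unit contour width (e.g. in m).
    slope_rad:     local slope in radians.
    min_tan:       assumed lower bound on tan(beta) to avoid division by
                   zero in perfectly flat cells (hypothetical choice).
    """
    tan_beta = np.maximum(np.tan(slope_rad), min_tan)
    return np.log(np.asarray(specific_area, dtype=float) / tan_beta)
```

Since both the slope and the drainage area enter the logarithm, resolution-dependent differences in either quantity carry over directly into the TWI.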

Inter- and Intralaboratory Comparison of JC Polyomavirus Antibody Testing Using Two Different Virus-Like Particle-Based Assays

Clinical and Vaccine Immunology ◽

10.1128/cvi.00489-14 ◽

2014 ◽

Vol 21 (11) ◽

pp. 1581-1588 ◽

Cited By ~ 11

Author(s):

Piotr Kardas ◽

Mohammadreza Sadeghi ◽

Fabian H. Weissbach ◽

Tingting Chen ◽

Lea Hedman ◽

...

Keyword(s):

Risk Stratification ◽

Data Sets ◽

Antibody Testing ◽

Jc Polyomavirus ◽

Virus Like Particle ◽

Interlaboratory Variability ◽

Increased Risk ◽

Basic Protocol ◽

Data Points ◽

Reference Serum

ABSTRACT JC polyomavirus (JCPyV) can cause progressive multifocal leukoencephalopathy (PML), a debilitating, often fatal brain disease in immunocompromised patients. JCPyV-seropositive multiple sclerosis (MS) patients treated with natalizumab have a 2- to 10-fold increased risk of developing PML. Therefore, JCPyV serology has been recommended for PML risk stratification. However, different antibody tests may not be equivalent. To study intra- and interlaboratory variability, sera from 398 healthy blood donors were compared in 4 independent enzyme-linked immunoassay (ELISA) measurements generating >1,592 data points. Three data sets (Basel1, Basel2, and Basel3) used the same basic protocol but different JCPyV virus-like particle (VLP) preparations and introduced normalization to a reference serum. The data sets were also compared with an independent method using biotinylated VLPs (Helsinki1). VLP preadsorption reducing ≥35% activity was used to identify seropositive sera. The results indicated that Basel1, Basel2, Basel3, and Helsinki1 were similar regarding overall data distribution (P = 0.79) and seroprevalence (58.0, 54.5, 54.8, and 53.5%, respectively; P = 0.95). However, intra-assay intralaboratory comparison yielded 3.7% to 12% discordant results, most of which were close to the cutoff (0.080 < optical density [OD] < 0.250) according to Bland-Altman analysis. Introduction of normalization improved overall performance and reduced discordance. The interlaboratory interassay comparison between Basel3 and Helsinki1 revealed only 15 discordant results, 14 (93%) of which were close to the cutoff. Preadsorption identified specificities of 99.44% and 97.78% and sensitivities of 99.54% and 95.87% for Basel3 and Helsinki1, respectively. Thus, normalization to a preferably WHO-approved reference serum, duplicate testing, and preadsorption for samples around the cutoff may be necessary for reliable JCPyV serology and PML risk stratification.

A Support Based Initialization Algorithm for Categorical Data Clustering

Journal of Information Technology Research ◽

10.4018/jitr.2018040104 ◽

2018 ◽

Vol 11 (2) ◽

pp. 53-67

Author(s):

Ajay Kumar ◽

Shishir Kumar

Keyword(s):

Categorical Data ◽

Selection Process ◽

Numerical Data ◽

Real Data ◽

Data Sets ◽

Data Set ◽

Data Object ◽

Data Points ◽

Wu Method ◽

Selection Algorithms

Several initial center selection algorithms have been proposed in the literature for numerical data, but the values of categorical data are unordered, so these methods are not applicable to a categorical data set. This article investigates the initial center selection process for categorical data and then presents a new support-based initial center selection algorithm. The proposed algorithm measures the weight of the unique data points of an attribute with the help of support and then integrates these weights along the rows to obtain the support of every row. A data object having the largest support is then chosen as an initial center, followed by finding other centers that are at the greatest distance from the initially selected center. The quality of the proposed algorithm is compared with the random initial center selection method, Cao's method, the Wu method and the method introduced by Khan and Ahmad. Experimental analysis on real data sets shows the effectiveness of the proposed algorithm.
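The selection steps described in this abstract can be sketched roughly as follows in Python. Several details are assumptions made for this example and are not confirmed by the abstract: support is taken as the relative frequency of a value within its attribute, row support is the plain sum over attributes, distance is the Hamming distance, and additional centers are chosen to maximize the minimum distance to all centers already selected.

```python
from collections import Counter

def row_supports(rows):
    """Sum, over attributes, of the relative frequency of each row's values."""
    n = len(rows)
    n_attrs = len(rows[0])
    counts = [Counter(row[j] for row in rows) for j in range(n_attrs)]
    return [sum(counts[j][row[j]] / n for j in range(n_attrs)) for row in rows]

def hamming(a, b):
    """Number of attributes on which two categorical objects differ (assumed metric)."""
    return sum(x != y for x, y in zip(a, b))

def initial_centers(rows, k):
    """Pick k initial centers: highest-support row first, then far-away rows."""
    supports = row_supports(rows)
    chosen = [max(range(len(rows)), key=supports.__getitem__)]
    while len(chosen) < k:
        candidates = [i for i in range(len(rows)) if i not in chosen]
        # Assumed rule: maximize the smallest distance to the chosen centers.
        nxt = max(candidates,
                  key=lambda i: min(hamming(rows[i], rows[c]) for c in chosen))
        chosen.append(nxt)
    return [rows[i] for i in chosen]
```

For example, with `rows = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z")]`, the first row has the largest support and `("b", "z")` differs from it in both attributes, so those two rows are returned for `k = 2`.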

High depth-of-field imaging without sacrificing light-gathering power and resolution

10.1117/12.285587 ◽

1997 ◽

Author(s):

Alan R. FitzGerrell ◽

Edward R. Dowski, Jr.

Keyword(s):

Depth Of Field ◽

Light Gathering Power ◽

Light Gathering ◽

Field Imaging ◽

High Depth

Mahalanobis distance informed by clustering

Information and Inference: A Journal of the IMA ◽

10.1093/imaiai/iay011 ◽

2018 ◽

Vol 8 (2) ◽

pp. 377-406

Author(s):

Almog Lahav ◽

Ronen Talmon ◽

Yuval Kluger

Keyword(s):

Mahalanobis Distance ◽

High Dimensional Data ◽

Hidden Variables ◽

Real Data ◽

Risk Groups ◽

High Dimensional ◽

Data Sets ◽

Kaplan Meier ◽

Data Points ◽

Survival Plot

Abstract. A fundamental question in data analysis, machine learning and signal processing is how to compare data points. The choice of the distance metric is especially challenging for high-dimensional data sets, where the problem of meaningfulness is more prominent (e.g. the Euclidean distance between images). In this paper, we propose to exploit a property of high-dimensional data that is usually ignored: the structure stemming from the relationships between the coordinates. Specifically, we show that organizing similar coordinates in clusters can be exploited for the construction of the Mahalanobis distance between samples. When the observable samples are generated by a nonlinear transformation of hidden variables, the Mahalanobis distance allows the recovery of the Euclidean distances in the hidden space. We illustrate the advantage of our approach on a synthetic example where the discovery of clusters of correlated coordinates improves the estimation of the principal directions of the samples. Our method was applied to real gene expression data for lung adenocarcinomas (lung cancer). Using the proposed metric, we found a partition of subjects into risk groups with a good separation between their Kaplan–Meier survival plots.
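As background for this abstract, the standard Mahalanobis distance between two samples, d(x, y) = sqrt((x − y)ᵀ Σ⁻¹ (x − y)), can be sketched in Python as below. This is the textbook definition with a plain sample covariance, not the clustering-informed construction proposed in the paper; using a pseudo-inverse is a choice made here so that the example also runs when the covariance estimate is singular.

```python
import numpy as np

def mahalanobis(x, y, cov):
    """Standard Mahalanobis distance between x and y for covariance matrix cov."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Pseudo-inverse (assumption of this sketch) tolerates singular estimates.
    return float(np.sqrt(diff @ np.linalg.pinv(cov) @ diff))

# Usage: estimate the covariance from samples stored as rows.
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
cov = np.cov(X, rowvar=False)
d = mahalanobis(X[0], X[1], cov)
```

In high dimensions the sample covariance is often poorly estimated, which is the setting in which the abstract proposes to use the cluster structure of the coordinates.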

Analysis of erroneous data entries in paper based and electronic data collection

10.21203/rs.2.11983/v4 ◽

2019 ◽

Author(s):

Benedikt Ley ◽

Komal Raj Rijal ◽

Jutta Marfurt ◽

Nabaraj Adhikari ◽

Megha Banjara ◽

...

Keyword(s):

Data Collection ◽

Data Entry ◽

Categorical Variables ◽

Data Sets ◽

Continuous Variables ◽

Electronic Data ◽

Suitable Alternative ◽

Electronic Data Collection ◽

Data Points ◽

Time Variables

Abstract. Objective: Electronic data collection (EDC) has become a suitable alternative to paper-based data collection (PBDC) in biomedical research, even in resource-poor settings. During a survey in Nepal, data were collected using both systems, and data entry errors were compared between the two methods. Collected data were checked for completeness, values outside of realistic ranges, internal logic, and date variables for reasonable time frames. Variables were grouped into 5 categories, and the number of discordant entries was compared between the two systems, overall and per variable category. Results: Data from 52 variables collected from 358 participants were available. Discrepancies between the two data sets were found in 12.6% of all entries (2,352/18,616). Differences between data points were identified in 18.0% (643/3,580) of continuous variables, 15.8% (113/716) of time variables, 13.0% (140/1,074) of date variables, 12.0% (86/716) of text variables, and 10.9% (1,370/12,530) of categorical variables. Overall, 64% (1,499/2,352) of all discrepancies were due to data omissions; 76.6% (1,148/1,499) of missing entries were among categorical data. Omissions in PBDC (n = 1,002) were twice as frequent as in EDC (n = 497, p < 0.001). Data omissions, specifically among categorical variables, were identified as the greatest source of error. If designed accordingly, EDC can address this shortfall effectively.

VOLUME BASED DTM GENERATION FROM VERY HIGH RESOLUTION PHOTOGRAMMETRIC DSMS

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprsarchives-xli-b3-83-2016 ◽

2016 ◽

Vol XLI-B3 ◽

pp. 83-90 ◽

Cited By ~ 5

Author(s):

B. Piltz ◽

S. Bayer ◽

A. M. Poznanska

Keyword(s):

Ground Surface ◽

Processing Parameters ◽

Data Sets ◽

Maximum Width ◽

Directional Filtering ◽

Very High Spatial Resolution ◽

Efficient Scheme ◽

Data Points ◽

Minimum Height ◽

Very High

In this paper we propose a new algorithm for digital terrain model (DTM) reconstruction from very high spatial resolution digital surface models (DSMs). It combines multi-directional filtering with a new metric, which we call normalized volume above ground, to create an above-ground mask containing buildings and elevated vegetation. This mask can be used to interpolate a ground-only DTM. The presented algorithm works fully automatically, requiring only the processing parameters minimum height and maximum width in metric units. Since slope and breaklines are not decisive criteria, low, smooth and even very extensive flat objects are recognized and masked. The algorithm was developed with the goal of generating the normalized DSM for automatic 3D building reconstruction, and it also works reliably in environments with distinct hillsides or terrace-shaped terrain where conventional methods would fail. A quantitative comparison with the ISPRS data sets Potsdam and Vaihingen shows that 98-99% of all building data points are identified and can be removed, while enough ground data points (~66%) are kept to be able to reconstruct the ground surface. Additionally, we discuss the concept of size-dependent height thresholds and present an efficient scheme for pyramidal processing of data sets that reduces the time complexity to linear in the number of pixels, O(WH).

Grid preparation for magnetic and gravity data using fractal fields

Nonlinear Processes in Geophysics ◽

10.5194/npg-19-291-2012 ◽

2012 ◽

Vol 19 (2) ◽

pp. 291-296 ◽

Cited By ~ 4

Author(s):

M. Pilkington ◽

P. Keating

Keyword(s):

Potential Field ◽

Fourier Transforms ◽

Gravity Data ◽

Fractal Model ◽

Data Sets ◽

Spectral Character ◽

Aeromagnetic Survey ◽

Fractal Method ◽

Gravity Measurements ◽

Interpretive Method

Abstract. Most interpretive methods for potential field (magnetic and gravity) measurements require data in a gridded format. Many are also based on using fast Fourier transforms to improve their computational efficiency. As such, grids need to be full (no undefined values), rectangular and periodic. Since potential field surveys do not usually provide data sets in this form, grids must first be prepared to satisfy these three requirements before any interpretive method can be used. Here, we use a method for grid preparation based on a fractal model for predicting field values where necessary. Using fractal field values ensures that the statistical and spectral character of the measured data is preserved, and that unwanted discontinuities at survey boundaries are minimized. The fractal method compares well with standard extrapolation methods using gridding and maximum entropy filtering. The procedure is demonstrated on a portion of a recently flown aeromagnetic survey over a volcanic terrane in southern British Columbia, Canada.

Summary of Affinity Propagation

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.268-270.811 ◽

2011 ◽

Vol 268-270 ◽

pp. 811-816

Author(s):

Yong Zhou ◽

Yan Xing

Keyword(s):

Clustering Algorithm ◽

Large Data ◽

Large Data Sets ◽

Affinity Propagation ◽

Damping Factor ◽

Data Sets ◽

Similarity Matrix ◽

Data Points

Affinity Propagation (AP) is a new clustering algorithm based on the similarity matrix between pairs of data points: messages are exchanged between data points until a clustering result emerges. It is efficient and fast, and it can solve clustering problems on large data sets. However, the traditional Affinity Propagation has many limitations. This paper introduces Affinity Propagation, analyzes its advantages and limitations in depth, and focuses on improvements of the algorithm: improving the similarity matrix, adjusting the preference and the damping factor, and combining it with other algorithms. Finally, it discusses the development of Affinity Propagation.
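The message exchange mentioned in this abstract can be sketched in Python using the commonly published responsibility and availability updates with damping. This is a minimal illustration under stated assumptions (a fixed number of iterations, no convergence check, no tie-breaking noise, and preferences stored on the diagonal of the similarity matrix `S`); it is not the specific variant analyzed in the paper.

```python
import numpy as np

def affinity_propagation(S, damping=0.5, n_iter=200):
    """Return, for each point, the index of its exemplar.

    S is an (n, n) similarity matrix whose diagonal holds the preferences.
    """
    S = np.array(S, dtype=float)
    n = S.shape[0]
    R = np.zeros((n, n))   # responsibilities r(i, k)
    A = np.zeros((n, n))   # availabilities  a(i, k)
    rows = np.arange(n)
    for _ in range(n_iter):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first = AS[rows, idx]
        AS[rows, idx] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, idx] = S[rows, idx] - second
        R = damping * R + (1.0 - damping) * R_new

        # a(i,k) = min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
        # a(k,k) = sum over i' != k of max(0, r(i',k))
        Rp = np.maximum(R, 0.0)
        Rp[rows, rows] = R[rows, rows]       # keep r(k,k) unclipped
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0.0)
        A_new[rows, rows] = diag
        A = damping * A + (1.0 - damping) * A_new
    return np.argmax(A + R, axis=1)

# Usage: two well-separated groups on a line, negative squared distances
# as similarities, and an assumed common preference of -1.
x = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
S = -(x[:, None] - x[None, :]) ** 2
np.fill_diagonal(S, -1.0)
labels = affinity_propagation(S)
```

The preference on the diagonal controls how many exemplars emerge, and the damping factor trades convergence speed against the risk of oscillation, which is why the abstract lists both as targets for improvement.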
