Development of Data Sets for the Validation of Analytical Instrumentation

1994 ◽  
Vol 77 (3) ◽  
pp. 777-781 ◽  
Author(s):  
Stephen L R Ellison ◽  
Maurice G Cox ◽  
Alastair B Forbes ◽  
Bernard P Butler ◽  
Simon A Hannaby ◽  
...  

Abstract Analytical chemistry makes use of a wide range of basic statistical operations, including means; standard deviations; significance tests based on assumed distributions; and linear, polynomial, and multivariate regression. The effects of limited numerical precision, poor choice of algorithm, and extreme dynamic range on these common statistical operations are discussed. The effects of an incorrect choice of algorithm on calculations of basic statistical parameters and calibration lines are illustrated by examples. Some approaches to the validation of such software are considered. The preparation of reference data sets for testing statistical software is discussed. The use of ‘null space’ methods for producing reference data sets is described, and an example is given. These data sets have well-characterized properties and can be used to test the accuracy of basic statistical procedures. Specific properties that are controlled include the numerical precision required to represent the sets exactly and the analytically correct answers. A further property of some of the data sets under development is the predictability of the deviation from the expected results caused by a poor choice of algorithm.
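As a minimal illustration of the kind of failure such reference data sets are designed to expose, the sketch below contrasts the one-pass "calculator" variance formula with the numerically stable two-pass form on data with a large offset; the values are invented for illustration and are not the paper's data sets.

```python
def variance_one_pass(x):
    # Textbook "calculator" formula: (sum(x^2) - n*mean^2) in a single pass.
    # Suffers catastrophic cancellation when the mean dwarfs the spread.
    n = len(x)
    s = ss = 0.0
    for v in x:
        s += v
        ss += v * v
    return (ss - s * s / n) / (n - 1)

def variance_two_pass(x):
    # Numerically stable: subtract the mean first, then sum the squares.
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)

# A small spread riding on a large offset, typical of raw instrument readings.
x = [1e8 + d for d in (0.1, 0.2, 0.3, 0.4, 0.5)]
print(variance_one_pass(x))   # wrong, possibly even negative, in float64
print(variance_two_pass(x))   # ~0.025, the analytically correct answer
```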

2012 ◽  
Vol 83 ◽  
pp. 95-104 ◽  
Author(s):  
Claus Peter Fritzen ◽  
Peter Kraemer ◽  
Inka Buethe

Structural Health Monitoring (SHM) allows a diagnosis to be performed on demand, which assists the operator in planning future maintenance or repair activities. When structural vibrations are used to extract damage-sensitive features, problems can arise from variations of the dynamical properties with changing environmental and operational conditions (EOC). The dynamic changes due to changing EOCs, such as variations in temperature, rotational speed, and wind speed, may be of the same order of magnitude as the variations due to damage, making reliable damage detection impossible. In this paper, we show a method for the compensation of changing EOC. The well-known null space based fault detection (NSFD) is used for damage detection. In the first stage, training is performed using data from the undamaged structure under varying EOC. To compensate for the EOC effects, the undamaged state is modeled by different reference data corresponding to different representative EOC conditions. Finally, in the application, the influence of each EOC on each incoming data set is weighted separately by means of a fuzzy-classification algorithm. The theory and algorithm are successfully tested with data sets from a real wind turbine and with data from a laboratory model.
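A minimal sketch of the null-space idea behind NSFD, under simplifying assumptions (output-only Hankel matrices, a fixed assumed system order, synthetic single-channel signals); the paper's fuzzy EOC weighting is not reproduced here.

```python
import numpy as np

def null_space_residual(y_ref, y_test, rows=20, rank=8):
    """Null-space damage indicator (illustrative, with assumed parameters).

    y_ref, y_test: 1-D vibration records from the reference and test states.
    rows: number of block rows in the Hankel matrices.
    rank: assumed system order; singular vectors beyond it span the null space.
    """
    def hankel(y, rows):
        cols = len(y) - rows + 1
        return np.stack([y[i:i + cols] for i in range(rows)])

    H_ref = hankel(y_ref, rows)
    U, s, Vt = np.linalg.svd(H_ref, full_matrices=True)
    S = U[:, rank:]                      # left null space of the healthy model
    R = S.T @ hankel(y_test, rows)       # residual matrix
    return np.linalg.norm(R) / R.size    # scalar damage indicator

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 2000)
healthy = np.sin(5 * t) + 0.05 * rng.standard_normal(t.size)
damaged = np.sin(5.3 * t) + 0.05 * rng.standard_normal(t.size)  # shifted mode
print(null_space_residual(healthy, healthy))  # small: null space kills the signal
print(null_space_residual(healthy, damaged))  # noticeably larger
```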


1997 ◽  
Vol 36 (5) ◽  
pp. 61-68 ◽  
Author(s):  
Hermann Eberl ◽  
Amar Khelil ◽  
Peter Wilderer

A numerical method for the identification of parameters of nonlinear higher-order differential equations is presented, based on the Levenberg-Marquardt algorithm. The parameters can be estimated using several reference data sets simultaneously. This leads to a multicriteria optimization problem, which is treated using the Pareto optimality concept. In this paper, the emphasis is put on the presentation of the calibration method. As an example, the identification of the parameters of a nonlinear hydrological transport model for urban runoff is included, but the method can be applied to other problems as well.
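A hedged sketch of the general approach using SciPy's Levenberg-Marquardt implementation: residuals from several reference data sets are stacked into one vector, with weights acting as a simple scalarization of the multicriteria problem. The decay model and all values are invented placeholders, not the paper's hydrological model.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical nonlinear model: first-order decay y = a * exp(-b * t).
def model(p, t):
    a, b = p
    return a * np.exp(-b * t)

def stacked_residuals(p, data_sets, weights):
    # Weighted-sum scalarization: one residual vector over all reference sets.
    # Varying the weights traces out different Pareto-optimal parameter sets.
    return np.concatenate([w * (model(p, t) - y)
                           for w, (t, y) in zip(weights, data_sets)])

rng = np.random.default_rng(1)
t = np.linspace(0, 5, 50)
sets = [(t, model([2.0, 0.7], t) + 0.02 * rng.standard_normal(t.size)),
        (t, model([2.0, 0.7], t) + 0.05 * rng.standard_normal(t.size))]

fit = least_squares(stacked_residuals, x0=[1.0, 1.0],
                    args=(sets, [1.0, 1.0]), method='lm')
print(fit.x)   # close to the true parameters [2.0, 0.7]
```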


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Eleanor F. Miller ◽  
Andrea Manica

Abstract Background Today an unprecedented amount of genetic sequence data is stored in publicly available repositories. For decades now, mitochondrial DNA (mtDNA) has been the workhorse of genetic studies, and as a result, there is a large volume of mtDNA data available in these repositories for a wide range of species. Indeed, whilst whole genome sequencing is an exciting prospect for the future, for most non-model organisms, classical markers such as mtDNA remain widely used. By compiling existing data from multiple original studies, it is possible to build powerful new datasets capable of exploring many questions in ecology, evolution and conservation biology. One key question that these data can help inform is what happened in a species’ demographic past. However, compiling data in this manner is not trivial: there are many complexities associated with data extraction, data quality and data handling. Results Here we present the mtDNAcombine package, a collection of tools developed to manage some of the major decisions associated with handling multi-study sequence data, with a particular focus on preparing sequence data for Bayesian skyline plot demographic reconstructions. Conclusions There is now more genetic information available than ever before, and large meta-data sets offer great opportunities to explore new and exciting avenues of research. However, compiling multi-study datasets remains a technically challenging prospect. The mtDNAcombine package provides a pipeline to streamline the process of downloading, curating, and analysing sequence data, guiding the process of compiling data sets from the online database GenBank.
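The first step such a pipeline automates, sketched here with Biopython's Entrez/SeqIO interface; the species, gene, and length threshold are arbitrary illustrative choices, not mtDNAcombine's actual API or defaults.

```python
from Bio import Entrez, SeqIO

Entrez.email = "you@example.org"   # NCBI requires a contact address (placeholder)

# Search for published cytochrome-b (a classic mtDNA marker) records of one
# species; the query is an invented example.
query = '"Erithacus rubecula"[Organism] AND cytb[Gene]'
handle = Entrez.esearch(db="nucleotide", term=query, retmax=50)
ids = Entrez.read(handle)["IdList"]
handle.close()

handle = Entrez.efetch(db="nucleotide", id=ids, rettype="gb", retmode="text")
records = list(SeqIO.parse(handle, "genbank"))
handle.close()

# Basic curation: drop fragments too short to align and analyse downstream.
kept = [r for r in records if len(r.seq) >= 500]
print(f"kept {len(kept)} of {len(records)} records")
```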


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ibtissame Khaoua ◽  
Guillaume Graciani ◽  
Andrey Kim ◽  
François Amblard

Abstract For a wide range of purposes, one faces the challenge of detecting light from extremely faint and spatially extended sources. In such cases, detector noises dominate over the photon noise of the source, and quantum detectors in photon counting mode are generally the best option. Here, we combine a statistical model with an in-depth analysis of detector noises and calibration experiments, and we show that visible light can be detected with an electron-multiplying charge-coupled device (EM-CCD) with a signal-to-noise ratio (SNR) of 3 for fluxes less than $30\,\text{photon}\,\text{s}^{-1}\,\text{cm}^{-2}$. For green photons, this corresponds to $12\,\text{aW}\,\text{cm}^{-2} \approx 9 \times 10^{-11}$ lux, i.e. 15 orders of magnitude less than typical daylight. The strong nonlinearity of the SNR with the sampling time leads to a dynamic range of detection of 4 orders of magnitude. To detect possibly varying light fluxes, we operate in conditions of maximal detectivity $\mathcal{D}$ rather than maximal SNR. Given the quantum efficiency $QE(\lambda)$ of the detector, we find $\mathcal{D} = 0.015\,\text{photon}^{-1}\,\text{s}^{1/2}\,\text{cm}$, and a non-negligible sensitivity to blackbody radiation for T > 50 °C. This work should help design highly sensitive luminescence detection methods and develop experiments to explore dynamic phenomena involving ultra-weak luminescence in biology, chemistry, and materials science.
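A toy version of the photon-counting noise budget, with assumed detector figures rather than the paper's calibrated values, showing how the SNR grows nonlinearly with sampling time:

```python
import numpy as np

# Illustrative photon-counting budget; all detector figures are assumptions.
QE   = 0.9      # quantum efficiency at the wavelength of interest
A    = 1.0      # collection area, cm^2
dark = 0.002    # dark counts per second over the collection area
cic  = 0.005    # clock-induced charge, counts per frame
f_rd = 1.0      # frame rate, s^-1

def snr(flux, t):
    """SNR after integrating for t seconds, Poisson statistics throughout.
    The factor 2 assumes an equally long dark reference is subtracted."""
    signal = QE * flux * A * t
    background = dark * t + cic * f_rd * t
    return signal / np.sqrt(signal + 2.0 * background)

for t in (1, 10, 100, 1000):
    print(f"t = {t:5d} s   SNR = {snr(30.0, t):.2f}")
# SNR grows roughly as sqrt(t), the nonlinearity behind the wide dynamic range.
```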


1987 ◽  
Vol 121 ◽  
pp. 287-293
Author(s):  
C.J. Schalinski ◽  
P. Biermann ◽  
A. Eckart ◽  
K.J. Johnston ◽  
T.Ph. Krichbaum ◽  
...  

A complete sample of 13 flat-spectrum radio sources is investigated over a wide range of frequencies and spatial resolutions. Synchrotron self-Compton (SSC) calculations lead to the prediction of bulk relativistic motion in all sources. So far, 6 out of 7 sources observed with sufficient dynamic range by means of VLBI show evidence for apparent superluminal motion.
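The kinematics behind apparent superluminal motion follows from the standard relativistic expression $\beta_{app} = \beta\sin\theta / (1 - \beta\cos\theta)$, which exceeds 1 for fast motion close to the line of sight; a quick numerical check (values are illustrative, not fitted to the sample):

```python
import numpy as np

def beta_apparent(beta, theta):
    """Apparent transverse speed (units of c) of a blob moving at speed beta
    along a direction at angle theta (radians) to the line of sight."""
    return beta * np.sin(theta) / (1 - beta * np.cos(theta))

beta = 0.98                        # bulk speed, units of c
theta = np.arccos(beta)            # angle maximizing the apparent speed
print(beta_apparent(beta, theta))  # = beta * gamma ~ 4.9 > 1: "superluminal"
```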


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yance Feng ◽  
Lei M. Li

Abstract Background Normalization of RNA-seq data aims at identifying biological expression differentiation between samples by removing the effects of unwanted confounding factors. Explicitly or implicitly, the justification of normalization requires a set of housekeeping genes. However, the existence of housekeeping genes common to a very large collection of samples, especially under a wide range of conditions, is questionable. Results We propose to carry out pairwise normalization with respect to multiple references, selected from representative samples. The pairwise intermediates are then integrated based on a linear model that adjusts the reference effects. Motivated by the notion of housekeeping genes and their statistical counterparts, we adopt robust least trimmed squares regression in the pairwise normalization. The proposed method (MUREN) is compared with other existing tools on some standard data sets. Our evaluation of the goodness of normalization emphasizes preserving possible asymmetric differentiation, whose biological significance is exemplified by single-cell data of the cell cycle. MUREN is implemented as an R package. The code, under license GPL-3, is available on the github platform: github.com/hippo-yf/MUREN and on the conda platform: anaconda.org/hippo-yf/r-muren. Conclusions MUREN performs RNA-seq normalization using a two-step statistical regression induced from a general principle. We propose that the densities of pairwise differentiations be used to evaluate the goodness of normalization. MUREN adjusts the mode of differentiation toward zero while preserving the skewness due to biological asymmetric differentiation. Moreover, by robustly integrating pre-normalized counts with respect to multiple references, MUREN is immune to individual outlier samples.
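A minimal stand-in for the robust pairwise step (not MUREN's actual R code): a least-trimmed-squares estimate of the log-scale shift between one sample and one reference, where the trimmed genes play the role of differential, non-housekeeping genes. Data are synthetic.

```python
import numpy as np

def lts_shift(x, y, keep=0.5, iters=20):
    """Least-trimmed-squares estimate of the log2 scale shift between samples.

    x, y: log2 counts of the same genes in the reference and target sample.
    keep: fraction of genes retained; the rest are trimmed as differential.
    """
    r = y - x
    h = int(keep * len(r))
    subset = np.argsort(np.abs(r - np.median(r)))[:h]   # initial subset
    shift = np.mean(r[subset])
    for _ in range(iters):                              # concentration steps
        subset = np.argsort(np.abs(r - shift))[:h]
        new_shift = np.mean(r[subset])
        if np.isclose(new_shift, shift):
            break
        shift = new_shift
    return shift    # subtract from y to normalize it against x

rng = np.random.default_rng(2)
x = rng.normal(8, 2, 5000)               # reference log2 counts
y = x + 1.3                              # library-size shift of 1.3 (log2)
y[:500] += rng.normal(3, 1, 500)         # asymmetric up-regulation
print(lts_shift(x, y))                   # ~1.3 despite the asymmetry
```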


2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Jie Liao ◽  
Lan Yang

Abstract Temperature is one of the most fundamental physical properties used to characterize various physical, chemical, and biological processes. Even a slight change in temperature can have an impact on the status or dynamics of a system. Thus, there is a great need for high-precision, large-dynamic-range temperature measurements. Conventional temperature sensors encounter difficulties in high-precision thermal sensing on the submicron scale. Recently, optical whispering-gallery mode (WGM) sensors have shown promise for many sensing applications, such as thermal sensing, magnetic detection, and biosensing. However, despite their superior sensitivity, the conventional sensing method for WGM resonators relies on tracking the changes in a single mode, which limits the dynamic range: the laser source has to be fine-tuned in a timely manner to follow the selected mode during the measurement. Moreover, the actual temperature cannot be derived directly from the spectrum; only a relative temperature change can. Here, we demonstrate an optical WGM barcode technique involving simultaneous monitoring of the patterns of multiple modes that can provide a direct temperature readout from the spectrum. The measurement relies on the patterns of multiple modes in the WGM spectrum instead of the changes of a particular mode, and thus provides more information than a single-mode spectrum, such as a precise measurement of the actual temperature. Leveraging the high sensitivity of WGMs and eliminating the need to monitor particular modes, this work lays the foundation for developing a high-performance temperature sensor with not only superior sensitivity but also a broad dynamic range.
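One way to picture the barcode idea: match the whole measured multi-mode spectrum against a library of reference spectra recorded at known temperatures, instead of tracking one resonance. The sketch below is an invented toy model (a Gaussian mode comb with a linear thermal drift), not the authors' calibration procedure.

```python
import numpy as np

def barcode_temperature(spectrum, library):
    """Assign an absolute temperature by matching the whole multi-mode
    'barcode' against calibrated reference spectra (illustrative only).

    spectrum: transmission sampled on a fixed wavelength grid.
    library: dict {temperature: reference spectrum on the same grid}.
    """
    def score(a, b):        # normalized correlation between two barcodes
        a = (a - a.mean()) / a.std()
        b = (b - b.mean()) / b.std()
        return np.dot(a, b) / a.size
    return max(library, key=lambda T: score(spectrum, library[T]))

# Toy barcodes: a comb of modes whose positions drift linearly with temperature.
grid = np.linspace(1550, 1560, 4000)       # wavelength, nm
def comb(T):
    peaks = 1550.5 + np.arange(8) * 1.2 + 0.01 * (T - 20)   # thermal drift
    return sum(np.exp(-((grid - p) / 0.02) ** 2) for p in peaks)

library = {T: comb(T) for T in range(20, 41)}
print(barcode_temperature(comb(33), library))   # -> 33
```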


2021 ◽  
Vol 7 (2) ◽  
pp. 21
Author(s):  
Roland Perko ◽  
Manfred Klopschitz ◽  
Alexander Almer ◽  
Peter M. Roth

Many scientific studies deal with person counting and density estimation from single images. Recently, convolutional neural networks (CNNs) have been applied to these tasks. Even though better results are often reported, it is frequently unclear where the improvements come from and whether the proposed approaches would generalize. Thus, the main goal of this paper was to identify the critical aspects of these tasks and to show how they limit state-of-the-art approaches. Based on these findings, we show how to mitigate these limitations. To this end, we implemented a CNN-based baseline approach, which we extended to deal with the identified problems. These include bias discovered in the reference data sets, ambiguity in ground truth generation, and a mismatch between the evaluation metrics and the training loss function. The experimental results show that our modifications allow for significantly outperforming the baseline in terms of the accuracy of person counts and density estimation. In this way, we gain a deeper understanding of CNN-based person density estimation beyond the network architecture. Furthermore, our insights make it possible to advance the field of person density estimation in general by highlighting current limitations in the evaluation protocols.
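For context, a common recipe for the ground truth such networks regress: a density map built by placing a normalized Gaussian at each annotated head, so that the integral of the map equals the person count. The kernel width is exactly the kind of ground-truth ambiguity the paper discusses; the values below are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map(points, shape, sigma=4.0):
    """Ground-truth density map from head annotations.

    Each person contributes a unit-mass Gaussian, so the predicted count is
    the integral of the map. The choice of sigma changes the training target
    for the same annotations, which is the ambiguity discussed above.
    """
    m = np.zeros(shape, dtype=np.float64)
    for x, y in points:
        m[int(y), int(x)] += 1.0
    return gaussian_filter(m, sigma)   # reflect boundary preserves total mass

heads = [(40, 30), (40, 34), (200, 120)]     # annotated head positions (x, y)
gt = density_map(heads, (240, 320), sigma=4.0)
print(gt.sum())   # ~3.0: integrating the map recovers the person count
```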


Sensors ◽  
2021 ◽  
Vol 21 (10) ◽  
pp. 3406
Author(s):  
Jie Jiang ◽  
Yin Zou ◽  
Lidong Chen ◽  
Yujie Fang

Precise localization and pose estimation in indoor environments are commonly required in a wide range of applications, including robotics, augmented reality, and navigation and positioning services. Such applications can be solved via visual-based localization using a pre-built 3D model. The increase in search space associated with large scenes can be overcome by retrieving candidate images in advance and subsequently estimating the pose. The majority of current deep learning-based image retrieval methods require labeled data, which increases data annotation costs and complicates the acquisition of data. In this paper, we propose an unsupervised hierarchical indoor localization framework that integrates an unsupervised network, a variational autoencoder (VAE), with a visual-based Structure-from-Motion (SfM) approach in order to extract global and local features. During the localization process, global features are applied for image retrieval at the level of the scene map in order to obtain candidate images, and the local features are subsequently used to estimate the pose from 2D-3D matches between query and candidate images. Only RGB images are used as input to the proposed localization system, which is both convenient and challenging. Experimental results reveal that the proposed method can localize images within 0.16 m and 4° on the 7-Scenes data sets, and 32.8% of images within 5 m and 20° on the Baidu data set. Furthermore, our proposed method achieves higher precision than advanced methods.
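A compact sketch of such a two-stage pipeline under stated assumptions: precomputed VAE latent codes serve as global descriptors for retrieval, and OpenCV's PnP-with-RANSAC recovers the pose from 2D-3D matches. The paper's own networks and feature matching are not reproduced; the data below are synthetic.

```python
import numpy as np
import cv2

def retrieve(query_z, db_z, k=5):
    """Global stage: cosine similarity between (assumed precomputed) VAE
    latent codes; returns indices of the k most similar database images."""
    q = query_z / np.linalg.norm(query_z)
    D = db_z / np.linalg.norm(db_z, axis=1, keepdims=True)
    return np.argsort(D @ q)[::-1][:k]

def pose_from_matches(pts3d, pts2d, K):
    """Local stage: camera pose from 2D-3D correspondences via PnP + RANSAC."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float64), pts2d.astype(np.float64), K, None)
    return rvec, tvec, inliers

# Toy check: project known 3D points with a known pose, then recover the pose.
K = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
pts3d = np.random.default_rng(3).uniform(-1, 1, (30, 3)) + [0, 0, 5]
rvec_true = np.array([[0.1], [-0.2], [0.05]])
tvec_true = np.array([[0.2], [0.1], [0.3]])
pts2d, _ = cv2.projectPoints(pts3d, rvec_true, tvec_true, K, None)
rvec, tvec, _ = pose_from_matches(pts3d, pts2d.reshape(-1, 2), K)
print(np.allclose(rvec, rvec_true, atol=1e-4))   # True on noise-free data
```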


2016 ◽  
Vol 2016 ◽  
pp. 1-18 ◽  
Author(s):  
Mustafa Yuksel ◽  
Suat Gonul ◽  
Gokce Banu Laleci Erturkmen ◽  
Ali Anil Sinaci ◽  
Paolo Invernizzi ◽  
...  

Depending mostly on voluntarily submitted spontaneous reports, pharmacovigilance studies are hampered by the low quantity and quality of patient data. Our objective is to improve postmarket safety studies by enabling safety analysts to seamlessly access a wide range of EHR sources for collecting de-identified medical data sets of selected patient populations and tracing the reported incidents back to the original EHRs. We have developed an ontological framework where EHR sources and target clinical research systems can continue using their own local data models, interfaces, and terminology systems, while structural and semantic interoperability are handled through rule-based reasoning on formal representations of the different models and terminology systems maintained in the SALUS Semantic Resource Set. The SALUS Common Information Model, at the core of this set, acts as the common mediator. We demonstrate the capabilities of our framework through one of the SALUS safety analysis tools, namely the Case Series Characterization Tool, which has been deployed on top of the regional EHR Data Warehouse of the Lombardy Region, containing about 1 billion records from 16 million patients, and validated by several pharmacovigilance researchers with real-life cases. The results confirm significant improvements in signal detection and evaluation compared to traditional methods, which lack this background information.
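In spirit, the mediation step can be pictured as declarative rules that rewrite locally coded, identifiable EHR records into a de-identified common model. The sketch below uses a plain dictionary as a hypothetical rule set; SALUS itself maintains formal ontology representations, not Python mappings.

```python
# Hypothetical local-code -> common-code rules (illustrative only).
TO_COMMON = {
    ("ICD9", "410.9"): ("SNOMED", "22298006"),    # acute myocardial infarction
    ("LOCAL", "MI-ACUTE"): ("SNOMED", "22298006"),
}

def mediate(record):
    """Rewrite one source record into the common model, de-identified."""
    system, code = record["system"], record["code"]
    common = TO_COMMON.get((system, code))
    if common is None:
        raise LookupError(f"no mapping rule for {system}:{code}")
    return {"system": common[0], "code": common[1],
            "onset": record["onset"]}   # identifying fields deliberately dropped

ehr_row = {"system": "ICD9", "code": "410.9",
           "onset": "2012-03-14", "patient_name": "..."}
print(mediate(ehr_row))
```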

