An Investigation of the Convergence of Average Peak Accelerations for High-Speed Planing Craft

2021 ◽  
Author(s):  
Michael R. Riley ◽  
Heidi P. Murphy ◽  
Brock W. Aron

This paper summarizes the results of an investigation of the convergence of average peak accelerations as more and more peaks are recorded during rough-water trials of small high-speed craft. Existing guidance from multiple sources suggests that recording more peaks is better, but how many more, and what engineering rationale should substantiate the answer? To address the question, simplified equations and numerous examples of peak acceleration data sets are presented. The results demonstrate that convergence of the average of the highest 10 percent of peaks (A1/10), of the average of the highest 1 percent of peaks (A1/100), and of their ratio means that the shape of the cumulative distribution of the data set becomes more stable as the number of peak acceleration data points increases. A simple percent-difference criterion is presented for quantifying the stability of the cumulative distribution shape.
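The statistics involved are straightforward to compute. Below is a minimal sketch, assuming synthetic Rayleigh-distributed peaks as stand-ins for recorded peak accelerations; the function names and the successive percent-difference check are illustrative, not the paper's exact criterion.

```python
import numpy as np

def avg_top_fraction(peaks, fraction):
    """Average of the highest `fraction` of peaks (e.g., 0.10 for A1/10)."""
    peaks = np.sort(np.asarray(peaks))[::-1]           # descending order
    n_top = max(1, int(round(fraction * peaks.size)))  # at least one peak
    return peaks[:n_top].mean()

def percent_difference(a, b):
    """Percent difference between successive estimates, as a stability check."""
    return 100.0 * abs(a - b) / abs(b)

# Example: watch A1/10 and A1/100 stabilize as peaks accumulate.
rng = np.random.default_rng(0)
peaks = rng.rayleigh(scale=2.0, size=5000)  # stand-in for recorded peaks

prev = None
for n in (100, 500, 1000, 2000, 5000):
    a10 = avg_top_fraction(peaks[:n], 0.10)
    a100 = avg_top_fraction(peaks[:n], 0.01)
    if prev is not None:
        print(f"n={n}: A1/10 changed {percent_difference(a10, prev[0]):.2f}%, "
              f"A1/100 changed {percent_difference(a100, prev[1]):.2f}%")
    prev = (a10, a100)
```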

2020 ◽  
Vol 501 (1) ◽  
pp. 994-1001
Author(s):  
Suman Sarkar ◽  
Biswajit Pandey ◽  
Snehasish Bhattacharjee

ABSTRACT We use an information theoretic framework to analyse data from the Galaxy Zoo 2 project and study if there are any statistically significant correlations between the presence of bars in spiral galaxies and their environment. We measure the mutual information between the barredness of galaxies and their environments in a volume limited sample (Mr ≤ −21) and compare it with the same in data sets where (i) the bar/unbar classifications are randomized and (ii) the spatial distribution of galaxies are shuffled on different length scales. We assess the statistical significance of the differences in the mutual information using a t-test and find that both randomization of morphological classifications and shuffling of spatial distribution do not alter the mutual information in a statistically significant way. The non-zero mutual information between the barredness and environment arises due to the finite and discrete nature of the data set that can be entirely explained by mock Poisson distributions. We also separately compare the cumulative distribution functions of the barred and unbarred galaxies as a function of their local density. Using a Kolmogorov–Smirnov test, we find that the null hypothesis cannot be rejected even at $75{{\ \rm per\ cent}}$ confidence level. Our analysis indicates that environments do not play a significant role in the formation of a bar, which is largely determined by the internal processes of the host galaxy.
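The randomization test described above can be sketched in a few lines. This is a minimal illustration with toy data, not the Galaxy Zoo 2 sample: binary bar/unbar labels, a discretized local-density environment, and a null distribution built by randomizing the morphological classifications (test (i) in the abstract).

```python
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(1)

# Toy stand-ins: binary bar/unbar labels and a local-density value per galaxy.
barred = rng.integers(0, 2, size=2000)
density = rng.lognormal(mean=0.0, sigma=1.0, size=2000)

# Discretize the environment so mutual information is between two label sets.
env_bins = np.digitize(density, np.quantile(density, [0.25, 0.5, 0.75]))

mi_obs = mutual_info_score(barred, env_bins)

# Null distribution: randomize the morphological classifications.
mi_null = np.array([mutual_info_score(rng.permutation(barred), env_bins)
                    for _ in range(500)])

# Finite, discrete samples give non-zero MI even without any association,
# so the observed value must be compared against the randomized baseline.
print(f"observed MI = {mi_obs:.4f}, null MI = {mi_null.mean():.4f} "
      f"± {mi_null.std():.4f}")
```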


Geophysics ◽  
2017 ◽  
Vol 82 (3) ◽  
pp. R199-R217 ◽  
Author(s):  
Xintao Chai ◽  
Shangxu Wang ◽  
Genyang Tang

Seismic data are nonstationary due to subsurface anelastic attenuation and dispersion effects. These effects, also referred to as the earth's Q-filtering effects, can diminish seismic resolution. We previously developed a method of nonstationary sparse reflectivity inversion (NSRI) for resolution enhancement, which avoids the intrinsic instability associated with inverse Q filtering and generates superior Q compensation results. Applying NSRI to data sets that contain multiples (addressing surface-related multiples only) requires a demultiple preprocessing step because NSRI cannot distinguish primaries from multiples and will treat them as interference convolved with incorrect Q values. However, multiples contain information about subsurface properties. To use the information carried by multiples, we adapt NSRI to nonstationary seismic data with surface-related multiples using the feedback model and NSRI theory. Consequently, not only are the benefits of NSRI (e.g., circumventing the intrinsic instability associated with inverse Q filtering) extended, but multiples are also taken into account. Our method is limited to a 1D implementation. Theoretical and numerical analyses verify that, given a wavelet, the input Q values primarily affect the inverted reflectivities and exert little effect on the estimated multiples; i.e., multiple estimation need not consider Q-filtering effects explicitly. However, there are benefits to NSRI considering multiples: the periodicity and amplitude of the multiples imply the position of the reflectivities and the amplitude of the wavelet, so multiples help overcome the scaling and shifting ambiguities of conventional formulations in which multiples are not considered. Experiments using a 1D algorithm on a synthetic data set, the publicly available Pluto 1.5 data set, and a marine data set support these findings and reveal the stability, capabilities, and limitations of the proposed method.
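The forward effect that NSRI inverts can be illustrated with a constant-Q attenuation model. The sketch below applies the amplitude decay exp(−πft/Q) to each spike of a sparse 1D reflectivity in the frequency domain; dispersion (the phase part of the Q filter) is omitted for brevity, so this is a simplified stand-in for the full nonstationary model, not the paper's implementation.

```python
import numpy as np

def q_attenuate(reflectivity, dt, Q):
    """Apply constant-Q amplitude attenuation to each sample's contribution.

    A reflection at traveltime t loses high frequencies as
    exp(-pi * f * t / Q); dispersion is omitted for brevity.
    """
    n = reflectivity.size
    freqs = np.fft.rfftfreq(n, d=dt)
    trace = np.zeros(n)
    for i, r in enumerate(reflectivity):
        if r == 0.0:
            continue
        t = i * dt
        spike = np.zeros(n)
        spike[i] = r
        spec = np.fft.rfft(spike) * np.exp(-np.pi * freqs * t / Q)
        trace += np.fft.irfft(spec, n)
    return trace

# Sparse reflectivity with two spikes; lower Q smears the deeper event more.
refl = np.zeros(500)
refl[100], refl[350] = 1.0, -0.8
trace = q_attenuate(refl, dt=0.002, Q=60)
print(trace[95:106].round(3), trace[345:356].round(3))
```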


2010 ◽  
Vol 66 (6) ◽  
pp. 733-740 ◽  
Author(s):  
Kay Diederichs

An indicator which is calculated after the data reduction of a test data set may be used to estimate the (systematic) instrument error at a macromolecular X-ray source. The numerical value of the indicator is the highest signal-to-noise [I/σ(I)] value that the experimental setup can produce, and its reciprocal is related to the lower limit of the merging R factor. In the context of this study, the stability of the experimental setup is influenced and characterized by the properties of the X-ray beam, shutter, goniometer, cryostream and detector, and also by the exposure time and spindle speed. Typical values of the indicator are given for data sets from the JCSG archive. Some sources of error are explored with the help of test calculations using SIM_MX [Diederichs (2009), Acta Cryst. D65, 535–542]. One conclusion is that the accuracy of data at low resolution is usually limited by the experimental setup rather than by the crystal. It is also shown that the influence of vibrations and fluctuations may be mitigated by a reduction in spindle speed accompanied by stronger attenuation.
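A small numerical sketch makes the indicator concrete. It assumes an error model of the form σ²(I) = a + (g·I)², under which I/σ(I) plateaus at 1/g for the strongest reflections; the fitting routine, the model form, and the Gaussian ⟨|ε|⟩ ≈ 0.8σ factor linking the plateau to a merging-R lower limit are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def asymptotic_i_over_sigma(intensities, sigmas):
    """Fit sigma^2 = a + (g*I)^2 by least squares and return 1/g."""
    A = np.column_stack([np.ones_like(intensities), intensities**2])
    coeffs, *_ = np.linalg.lstsq(A, sigmas**2, rcond=None)
    a, g2 = coeffs
    return 1.0 / np.sqrt(g2)

rng = np.random.default_rng(2)
I = rng.uniform(10, 1e4, size=2000)
sigma = np.sqrt(50.0**2 + (0.04 * I) ** 2)   # true plateau: I/sigma -> 25

isa = asymptotic_i_over_sigma(I, sigma)
# For Gaussian errors, mean |deviation| ~ 0.8*sigma, so the reciprocal of the
# plateau roughly bounds the merging R factor from below.
print(f"asymptotic I/sigma ≈ {isa:.1f}; merging-R lower limit ≈ {0.8 / isa:.3f}")
```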


2020 ◽  
Vol 13 (1) ◽  
Author(s):  
Jesse W. Lansford ◽  
Tyson H. Walsh ◽  
T. V. Hromadka ◽  
P. Rao

Abstract Objective The data herein represent multiple rain gauge sets and multiple Doppler radar sites of like type, combined to produce populations of ordered pairs. Publications spanning decades, each specific to individual Doppler radar sites, contain graphs of data pairs of Doppler radar precipitation estimates versus rain gauge precipitation readings. Data description Taken from multiple sources, the data set combines several radar sites and rain gauge sites for a total of 8830 data points. The data are relevant to various applications in hydrometeorology, engineering, and weather forecasting. Further, the importance of accuracy in radar-based precipitation estimates continues to increase, necessitating the incorporation of as much data as possible.
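A typical use of such gauge-radar ordered pairs is estimating the bias of the radar estimates against the gauge readings. The sketch below does this with synthetic placeholder pairs (not the published data); the multiplicative-bias and linear-fit diagnostics are common practice, not prescribed by the data note.

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic stand-ins for the 8830 published (gauge, radar) pairs, in mm.
gauge = rng.gamma(shape=2.0, scale=5.0, size=8830)
radar = 0.8 * gauge * rng.lognormal(0.0, 0.3, size=8830)  # biased, noisy

mean_bias = radar.mean() / gauge.mean()         # multiplicative field bias
slope, intercept = np.polyfit(gauge, radar, 1)  # simple linear relation

print(f"mean radar/gauge bias = {mean_bias:.2f}")
print(f"radar ≈ {slope:.2f} * gauge + {intercept:.2f}")
```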


2017 ◽  
Vol 17 (24) ◽  
pp. 15069-15093 ◽  
Author(s):  
Elizabeth C. Weatherhead ◽  
Jerald Harder ◽  
Eduardo A. Araujo-Pradere ◽  
Greg Bodeker ◽  
Jason M. English ◽  
...  

Abstract. Sensors on satellites provide unprecedented understanding of the Earth's climate system by measuring incoming solar radiation, as well as both passive and active observations of the entire Earth with outstanding spatial and temporal coverage. A common challenge with satellite observations is to quantify their ability to provide well-calibrated, long-term, stable records of the parameters they measure. Ground-based intercomparisons offer some insight, while reference observations and internal calibrations give further assistance for understanding long-term stability. A valuable tool for evaluating and developing long-term records from satellites is the examination of data from overlapping satellite missions. This paper addresses how the length of overlap affects the ability to identify an offset or a drift in the overlap of data between two sensors. Ozone and temperature data sets are used as examples showing that overlap data can differ by latitude and can change over time. New results are presented for the general case of sensor overlap using Solar Radiation and Climate Experiment (SORCE) Spectral Irradiance Monitor (SIM) and Solar Stellar Irradiance Comparison Experiment (SOLSTICE) solar irradiance data as an example. To achieve a 1 % uncertainty in estimating the offset for these two instruments' measurement of the Mg II core (280 nm) requires approximately 5 months of overlap. For relative drift to be identified within 0.1 % yr−1 uncertainty (0.00008 W m−2 nm−1 yr−1), the overlap for these two satellites would need to be 2.5 years. Additional overlap of satellite measurements is needed if, as is the case for solar monitoring, unexpected jumps occur, adding uncertainty to both offsets and drifts; the additional length of time needed to account for a single jump in the overlap data may be as large as 50 % of the original overlap period in order to achieve the same desired confidence in the stability of the merged data set. Results presented here are directly applicable to satellite Earth observations. Earth observations pose additional challenges due to their complexity, but they may also benefit from ancillary observations taken from ground-based and in situ sources. Difficult choices need to be made when monitoring approaches are considered; we outline some attempts at optimizing networks based on economic principles. The careful evaluation of monitoring overlap is important to the appropriate application of observational resources and to the usefulness of current and future observations.
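The back-of-envelope logic behind the overlap lengths can be sketched as follows, assuming independent monthly inter-sensor differences with scatter sigma: the offset uncertainty shrinks as sigma/√N, and the drift (regression slope) uncertainty for evenly spaced times t is sigma/(std(t)·√N). The numbers below are illustrative, not the SIM/SOLSTICE values.

```python
import numpy as np

def months_for_offset(sigma, target):
    """Smallest N of monthly differences with sigma/sqrt(N) <= target."""
    return int(np.ceil((sigma / target) ** 2))

def drift_uncertainty(sigma, n_months):
    """1-sigma slope uncertainty, per year, for monthly sampling."""
    t = np.arange(n_months) / 12.0            # times in years
    return sigma / (t.std() * np.sqrt(n_months))

sigma = 0.02  # 2% month-to-month scatter in the inter-sensor difference
print("months needed for a 1% offset:", months_for_offset(sigma, 0.01))
for n in (12, 30, 60):
    print(f"{n} months -> drift 1-sigma = {drift_uncertainty(sigma, n):.4f} /yr")
```

A single unexplained jump in the overlap effectively splits the record into shorter segments, which is why the abstract's 50 % extension of the overlap period follows from the same √N scaling.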


mSphere ◽  
2020 ◽  
Vol 5 (3) ◽  
Author(s):  
Lamia Wahba ◽  
Nimit Jain ◽  
Andrew Z. Fire ◽  
Massa J. Shoura ◽  
Karen L. Artiles ◽  
...  

ABSTRACT In numerous instances, tracking the biological significance of a nucleic acid sequence can be augmented through the identification of environmental niches in which the sequence of interest is present. Many metagenomic data sets are now available, with deep sequencing of samples from diverse biological niches. While any individual metagenomic data set can be readily queried using web-based tools, meta-searches through all such data sets are less accessible. In this brief communication, we demonstrate such a meta-metagenomic approach, examining close matches to the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in all high-throughput sequencing data sets in the NCBI Sequence Read Archive accessible with the “virome” keyword. In addition to the homology to bat coronaviruses observed in descriptions of the SARS-CoV-2 sequence (F. Wu, S. Zhao, B. Yu, Y. M. Chen, et al., Nature 579:265–269, 2020, https://doi.org/10.1038/s41586-020-2008-3; P. Zhou, X. L. Yang, X. G. Wang, B. Hu, et al., Nature 579:270–273, 2020, https://doi.org/10.1038/s41586-020-2012-7), we note a strong homology to numerous sequence reads in metavirome data sets generated from the lungs of deceased pangolins reported by Liu et al. (P. Liu, W. Chen, and J. P. Chen, Viruses 11:979, 2019, https://doi.org/10.3390/v11110979). While analysis of these reads indicates the presence of a similar viral sequence in pangolin lung, the similarity is not sufficient to either confirm or rule out a role for pangolins as an intermediate host in the recent emergence of SARS-CoV-2. In addition to the implications for SARS-CoV-2 emergence, this study illustrates the utility and limitations of meta-metagenomic search tools in effective and rapid characterization of potentially significant nucleic acid sequences. IMPORTANCE Meta-metagenomic searches allow for high-speed, low-cost identification of potentially significant biological niches for sequences of interest.
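The first step of such a meta-metagenomic search, enumerating candidate data sets, can be sketched with NCBI E-utilities via Biopython. This is a minimal illustration of the "virome" keyword query only; downloading the reads and aligning them against a query sequence such as SARS-CoV-2 would follow with separate tools and is omitted here. The email address is a required placeholder.

```python
from Bio import Entrez

Entrez.email = "you@example.org"  # required by NCBI; placeholder address

# Enumerate SRA records tagged with the "virome" keyword.
handle = Entrez.esearch(db="sra", term="virome", retmax=100)
record = Entrez.read(handle)
handle.close()

print(f"{record['Count']} SRA records match 'virome'")
print("first IDs:", record["IdList"][:5])
```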


2013 ◽  
Vol 6 (2) ◽  
pp. 3819-3857 ◽  
Author(s):  
C. Adams ◽  
A. E. Bourassa ◽  
V. Sofieva ◽  
L. Froidevaux ◽  
C. A. McLinden ◽  
...  

Abstract. The Optical Spectrograph and InfraRed Imaging System (OSIRIS) was launched aboard the Odin satellite in 2001 and is continuing to take limb-scattered sunlight measurements of the atmosphere. This work aims to characterize and assess the stability of the OSIRIS 11 yr v5.0x ozone data set. Three validation data sets were used: the v2.2 Microwave Limb Sounder (MLS) and v6 Global Ozone Monitoring of Occultation on Stars (GOMOS) satellite data records, and ozone sonde measurements. Global mean percent differences between coincident OSIRIS and validation measurements are within 5% of zero at all altitude layers above 18.5 km for MLS, above 21.5 km for GOMOS, and above 17.5 km for ozone sondes. Below 17.5 km, OSIRIS measurements agree with ozone sondes within 5% and are well correlated (R > 0.75) with them. For low OSIRIS optics temperatures (< 16 °C), OSIRIS ozone measurements are biased low by up to 6% compared with the validation data sets for 25.5–40.5 km. Biases between OSIRIS ascending and descending node measurements were investigated and found to be related to aerosol retrievals below 27.5 km. Above 30 km, agreement between OSIRIS and the validation data sets was related to the OSIRIS retrieved albedo, a measure of apparent upwelling radiation, with a high bias in OSIRIS data with large albedos. In order to assess the long-term stability of OSIRIS measurements, global average drifts relative to the validation data sets were calculated and found to be < 3% per decade for comparisons against MLS for 19.5–36.5 km, GOMOS for 18.5–54.5 km, and ozone sondes for 12.5–22.5 km, and within error of 3% per decade at most altitudes. Above 36.5 km, the relative drift of OSIRIS versus MLS ranged from ~ 0 to 6% per decade, depending on the data set used to convert MLS data to the OSIRIS altitude versus number density grid. Overall, this work demonstrates that the OSIRIS 11 yr ozone data set from 2001 to the present is suitable for trend studies.
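The drift diagnostic used in such comparisons reduces to a regression of percent differences against time, with the slope quoted in percent per decade. A minimal sketch with synthetic inputs standing in for coincident OSIRIS/validation pairs:

```python
import numpy as np

rng = np.random.default_rng(4)
years = np.sort(rng.uniform(2001, 2012, size=3000))
reference = 5e12 * np.ones_like(years)                   # validation ozone
osiris = reference * (1 + 0.001 * (years - 2001)         # 1% per decade drift
                      + rng.normal(0, 0.03, size=years.size))

# Percent differences between coincident measurements, regressed on time.
pct_diff = 100.0 * (osiris - reference) / reference
slope, _ = np.polyfit(years, pct_diff, 1)                # % per year
print(f"relative drift ≈ {10 * slope:.2f}% per decade")
```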


Fractals ◽  
2001 ◽  
Vol 09 (02) ◽  
pp. 209-222 ◽  
Author(s):  
STEPHEN M. BURROUGHS ◽  
SARAH F. TEBBENS

Power law cumulative number-size distributions are widely used to describe the scaling properties of data sets and to establish scale invariance. We derive the relationships between the scaling exponents of non-cumulative and cumulative number-size distributions for linearly binned and logarithmically binned data. Cumulative number-size distributions for data sets of many natural phenomena exhibit a "fall-off" from a power law at the largest object sizes. Previous work has often either ignored the fall-off region or described this region with a different function. We demonstrate that when a data set is abruptly truncated at large object size, fall-off from a power law is expected for the cumulative distribution. Functions to describe this fall-off are derived for both linearly and logarithmically binned data. These functions lead to a generalized function, the upper-truncated power law, that is independent of binning method. Fitting the upper-truncated power law to a cumulative number-size distribution determines the parameters of the power law, thus providing the scaling exponent of the data. Unlike previous approaches that employ alternate functions to describe the fall-off region, an upper-truncated power law describes the data set, including the fall-off, with a single function.
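Written from the abstract's description, the upper-truncated power law for the cumulative distribution can be taken to have the form N(x) = C·(x^−α − x_T^−α) for x ≤ x_T, which recovers a pure power law for x ≪ x_T and falls off to zero at the truncation size x_T; treat this exact parameterization as an assumption. The sketch below fits that single function to a synthetic cumulative number-size distribution, including the fall-off region.

```python
import numpy as np
from scipy.optimize import curve_fit

def truncated_power_law(x, C, a, xT):
    # Power-law regime for x << xT; falls to zero at the truncation size xT.
    return C * (x ** -a - xT ** -a)

# Synthetic cumulative counts from a truncated power law plus noise.
rng = np.random.default_rng(5)
x = np.logspace(0, 2, 40)[:-1]          # object sizes below the truncation
true = truncated_power_law(x, 500.0, 1.2, 120.0)
counts = true * rng.lognormal(0.0, 0.05, size=x.size)

# One fit determines the scaling exponent and the truncation size together.
popt, _ = curve_fit(truncated_power_law, x, counts, p0=(100.0, 1.0, 150.0))
print("C=%.1f  alpha=%.2f  xT=%.1f" % tuple(popt))
```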


2021 ◽  
Vol 14 (11) ◽  
pp. 2519-2532
Author(s):  
Fatemeh Nargesian ◽  
Abolfazl Asudeh ◽  
H. V. Jagadish

Data scientists often develop data sets for analysis by drawing upon sources of data available to them. A major challenge is to ensure that the data set used for analysis has an appropriate representation of relevant (demographic) groups: it meets desired distribution requirements. Whether data is collected through some experiment or obtained from some data provider, the data from any single source may not meet the desired distribution requirements. Therefore, a union of data from multiple sources is often required. In this paper, we study how to acquire such data in the most cost-effective manner, for typical cost functions observed in practice. We present an optimal solution for binary groups when the underlying distributions of data sources are known and all data sources have equal costs. For the generic case with unequal costs, we design an approximation algorithm that performs well in practice. When the underlying distributions are unknown, we develop an exploration-exploitation-based strategy with a reward function that captures the cost and approximations of group distributions in each data source. Besides theoretical analysis, we conduct comprehensive experiments that confirm the effectiveness of our algorithms.
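The exploration-exploitation idea can be sketched as a bandit-style loop over sources: repeatedly pick a source, draw a record, and reward sources that supply the currently under-represented group per unit cost. The epsilon-greedy policy, source rates, costs, and reward below are illustrative stand-ins, not the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(6)
p_minority = [0.05, 0.30, 0.15]   # unknown minority-group rate per source
cost = [1.0, 3.0, 1.5]            # cost per record per source
target = 0.5                      # desired minority fraction
counts = np.zeros(2)              # records collected per group
value = np.ones(3)                # running reward estimate per source
pulls = np.ones(3)

for step in range(2000):
    # Epsilon-greedy: mostly exploit the best source so far, sometimes explore.
    src = rng.integers(3) if rng.random() < 0.1 else int(np.argmax(value))
    group = int(rng.random() < p_minority[src])       # 1 = minority record
    needed = counts[1] / max(1, counts.sum()) < target
    reward = (group == needed) / cost[src]            # useful record per cost
    pulls[src] += 1
    value[src] += (reward - value[src]) / pulls[src]  # incremental mean
    counts[group] += 1

print("collected fractions:", counts / counts.sum(), "pulls per source:", pulls)
```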


Geophysics ◽  
2013 ◽  
Vol 78 (2) ◽  
pp. E79-E94 ◽  
Author(s):  
John Deceuster ◽  
Olivier Kaufmann ◽  
Michel Van Camp

Electrical resistivity tomography (ERT) monitoring experiments are being conducted more often to image spatiotemporal changes in soil properties. When conducting long-term ERT monitoring, the identification of suspicious electrodes in a permanent spread is of major importance because changes in the contact properties of a single electrode may affect the quality of many measurements on each time-slice. An automated methodology was developed to detect these temporal changes in electrode contact properties, based on a Bayesian approach called "weights of evidence." Contrasts and studentized contrasts are estimators of the influence of each electrode on global data quality. A consolidated studentized contrast is introduced to account for the proportion of rejected quadripoles that contain a given electrode. These estimators are computed for each time-slice using threshold values of the repeatability factor (the coefficient of variation of repeated measurements), from 0 to 10%, to discriminate between selected and rejected quadripoles. An automated detection strategy is proposed that identifies suspicious electrodes by comparing the observed consolidated studentized contrasts to the maximum values expected when every electrode is good for the given data set. These expected maxima are computed using Monte Carlo simulations of a hundred random draws in which the distribution of repeatability-factor values follows a Weibull cumulative distribution whose shape and scale parameters are fitted on a background data set filtered using a 5% threshold on absolute reciprocal errors. The efficiency of the methodology and its sensitivity to the selected reciprocal error threshold are assessed on synthetic and field data. Our approach is suitable for detecting suspicious electrodes and slowly changing conditions affecting the galvanic contact resistances where classical approaches are shown to be inadequate, except when the faulty electrode is disconnected. A data-weighting method is finally proposed to ensure that only good data are used in the inversion of ERT monitoring data sets.
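The detection idea can be sketched as follows: score each electrode by how often it appears in rejected quadripoles, then compare the observed score with the maximum score seen in Monte Carlo draws where every electrode behaves well, with repeatability errors drawn from a fitted Weibull. The rejection-rate score and the Weibull parameters below are illustrative stand-ins for the paper's weights-of-evidence estimators.

```python
import numpy as np

rng = np.random.default_rng(7)
n_elec, n_quad = 48, 1000
quads = rng.integers(0, n_elec, size=(n_quad, 4))  # 4 electrodes per quadripole
shape, scale, threshold = 1.2, 1.5, 5.0            # toy error model, % units

def electrode_scores(errors):
    rejected = errors > threshold                   # per-quadripole rejection
    score = np.zeros(n_elec)
    for e in range(n_elec):
        in_quad = (quads == e).any(axis=1)
        score[e] = rejected[in_quad].mean()         # rejection rate involving e
    return score

# Null: maximum per-electrode score over 100 all-good simulations.
null_max = max(electrode_scores(rng.weibull(shape, n_quad) * scale).max()
               for _ in range(100))

# Observed: inflate errors for quadripoles containing a faulty electrode 7.
errors = rng.weibull(shape, n_quad) * scale
errors[(quads == 7).any(axis=1)] *= 4.0
suspicious = np.flatnonzero(electrode_scores(errors) > null_max)
print("suspicious electrodes:", suspicious)
```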

