Signal Detection for Data Sets with a Signal-To-Noise Ratio of 1 or Less with the Use of a Moving Product Filter

1998 ◽  
Vol 52 (4) ◽  
pp. 621-625 ◽  
Author(s):  
H. G. Schulze ◽  
L. S. Greek ◽  
C. J. Barbosa ◽  
M. W. Blades ◽  
R. F. B. Turner

We report on a method to reduce background noise and amplify signals in data sets with low signal-to-noise ratios (SNRs). The method consists of taking a data set with mean 0 that has been normalized with respect to absolute value, adding 1 to all values to adjust the mean to 1, and then applying a moving product (MP) to the transformed data set (similar to the application of a moving average or 0-order Savitzky–Golay filtering). The presence of signal at a data point raises the probability that the point has a value >1, while the absence of signal raises the probability that it has a value <1. If the autocorrelation lag of the signal is larger than that of the associated noise, an MP with a window width comparable to the signal width (i.e., 2–3 times the signal standard deviation) will tend to reduce the values of data points where no signal is present and amplify data points where signal is present. Signal amplification, often considerable, is gained at the cost of signal distortion. We have used this method on simulated data sets with SNRs of 1, 0.5, and 0.33, and obtained signal-to-background noise ratio (SBNR) enhancements in excess of 100 times. We have also applied the procedure to measured Raman spectra with low SNRs, and we discuss our findings and their implications. This method is expected to be useful for detecting weak signals buried in strong background noise.
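Below is a minimal Python sketch of the transform described above: shift the data to zero mean, normalize by the maximum absolute value, add 1, and take the product over a sliding window. The window choice, the toy Gaussian peak, and the edge handling are illustrative assumptions, not details from the paper.

```python
import numpy as np

def moving_product_filter(y, window):
    """Minimal sketch of a moving product (MP) filter.

    The input is shifted to zero mean, normalized by its maximum
    absolute value (an assumed reading of the normalization), offset
    by +1 so the mean becomes 1, and then a product is taken over a
    sliding window (cf. a moving average).
    """
    y = np.asarray(y, dtype=float)
    z = y - y.mean()                 # mean 0
    z = z / np.abs(z).max()          # normalize w.r.t. absolute value
    z = z + 1.0                      # mean adjusted to 1
    half = window // 2
    out = np.ones_like(z)
    for i in range(len(z)):
        lo, hi = max(0, i - half), min(len(z), i + half + 1)
        out[i] = np.prod(z[lo:hi])
    return out

# Toy usage: a weak Gaussian peak buried in noise (SNR ~ 1)
rng = np.random.default_rng(0)
x = np.arange(500)
signal = np.exp(-0.5 * ((x - 250) / 10.0) ** 2)
noisy = signal + rng.normal(0, 1.0, x.size)
filtered = moving_product_filter(noisy, window=25)   # ~2-3 signal sigmas
```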

2003 ◽  
Vol 99 (6) ◽  
pp. 1255-1262 ◽  
Author(s):  
Wei Lu ◽  
James G. Ramsay ◽  
James M. Bailey

Background Many pharmacologic studies record data as binary, yes-or-no, variables, with analysis by logistic regression. A previous study showed that estimates of C50, the drug concentration associated with a 50% probability of drug effect, were unbiased, whereas estimates of gamma, the term describing the steepness of the concentration-effect relationship, were biased when sparse data were naively pooled for analysis. In this study, it was determined whether mixed-effects analysis improves the accuracy of parameter estimation. Methods Pharmacodynamic studies with binary, yes-or-no, responses were simulated and analyzed with NONMEM. The bias and coefficient of variation of the C50 and gamma estimates were determined as functions of the number of patients in the simulated study, the number of simulated data points per patient, and the "true" value of gamma. In addition, 100 sparse binary human data sets were generated from an evaluation of midazolam for postoperative sedation of adult patients undergoing cardiac surgery by randomly selecting a single data point (sedation score vs. midazolam plasma concentration) from each of the 30 patients in the study. C50 and gamma were estimated for each of these data sets with NONMEM and compared with the estimates from the complete data set of 656 observations. Results Estimates of C50 were unbiased, even for sparse data (one data point per patient), with coefficients of variation of 30-50%. Estimates of gamma were highly biased for sparse data for all values of gamma greater than 1, and the value of gamma was overestimated. Unbiased estimation of gamma required 10 data points per patient. The coefficient of variation of the gamma estimates was greater than that of the C50 estimates. Clinical data for sedation with midazolam confirmed the simulation results, showing an overestimate of gamma with sparse data. Conclusion Although accurate estimation of C50 from sparse binary data is possible, estimates of gamma are biased. Data sets with 10 or more observations per patient are necessary for accurate estimation of gamma.
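The C50/gamma relationship referenced here is conventionally modeled with a sigmoid (Hill) function; the sketch below assumes that form and simulates a sparse design with one binary observation per patient. The parameter values and the uniform concentration range are hypothetical.

```python
import numpy as np

def prob_effect(conc, c50, gamma):
    """Sigmoid (Hill) probability of drug effect at concentration `conc`.

    Assumed form: P = conc**gamma / (c50**gamma + conc**gamma), where
    C50 is the concentration giving a 50% probability of effect and
    gamma controls the steepness of the concentration-effect curve.
    """
    conc = np.asarray(conc, dtype=float)
    return conc**gamma / (c50**gamma + conc**gamma)

# Simulate a sparse design: one binary observation per patient
rng = np.random.default_rng(1)
n_patients = 30
conc = rng.uniform(0.1, 5.0, n_patients)       # hypothetical concentrations
p = prob_effect(conc, c50=1.5, gamma=3.0)      # hypothetical "true" parameters
response = rng.binomial(1, p)                  # yes/no drug effect
```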


2019 ◽  
Vol 73 (8) ◽  
pp. 893-901
Author(s):  
Sinead J. Barton ◽  
Bryan M. Hennelly

Cosmic ray artifacts may be present in all photo-electric readout systems. In spectroscopy, they present as random, unidirectional, sharp spikes that distort spectra and may have an effect on post-processing, possibly affecting the results of multivariate statistical classification. A number of methods have previously been proposed to remove cosmic ray artifacts from spectra, but the goal of removing the artifacts while making no other change to the underlying spectrum is challenging. One of the most successful and commonly applied methods for the removal of cosmic ray artifacts involves the capture of two sequential spectra that are compared in order to identify spikes. The disadvantage of this approach is that at least two recordings are necessary, which may be problematic for dynamically changing spectra, and which can reduce the signal-to-noise (S/N) ratio when compared with a single recording of equivalent duration due to the inclusion of two instances of read noise. In this paper, a cosmic ray artifact removal algorithm is proposed that works in a similar way to the double acquisition method but requires only a single capture, provided a data set of similar spectra is available. The method employs normalized covariance in order to identify a similar spectrum in the data set, from which a direct comparison reveals the presence of cosmic ray artifacts, which are then replaced with the corresponding values from the matching spectrum. The advantage of the proposed method over the double acquisition method is investigated in the context of the S/N ratio, and the method is applied to various data sets of Raman spectra recorded from biological cells.
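A hedged sketch of the single-capture idea follows: find the most similar spectrum in the data set via the correlation coefficient (normalized covariance), flag points where the difference is anomalously large, and replace them with the matching spectrum's values. The robust threshold rule is an assumption, and `dataset` is assumed not to contain the target spectrum itself.

```python
import numpy as np

def remove_cosmic_rays(spectrum, dataset, n_sigma=5.0):
    """Sketch of single-capture cosmic ray removal.

    Finds the most similar spectrum in `dataset` (rows are spectra)
    using the correlation coefficient, flags points where the
    difference exceeds n_sigma robust standard deviations, and
    replaces them with the matching spectrum's values. The n_sigma
    threshold is an illustrative assumption.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    dataset = np.asarray(dataset, dtype=float)
    corr = np.array([np.corrcoef(spectrum, s)[0, 1] for s in dataset])
    match = dataset[np.argmax(corr)]
    diff = spectrum - match
    sigma = 1.4826 * np.median(np.abs(diff - np.median(diff)))  # robust scale
    spikes = diff > n_sigma * sigma   # cosmic rays are positive-going spikes
    cleaned = spectrum.copy()
    cleaned[spikes] = match[spikes]
    return cleaned, spikes
```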


2018 ◽  
Vol 11 (2) ◽  
pp. 53-67
Author(s):  
Ajay Kumar ◽  
Shishir Kumar

Several initial center selection algorithms have been proposed in the literature for numerical data, but because the values of categorical data are unordered, these methods are not applicable to categorical data sets. This article investigates the initial center selection process for categorical data and then presents a new support-based initial center selection algorithm. The proposed algorithm measures the weight of the unique data points of an attribute with the help of support and then integrates these weights along the rows to obtain the support of every row. Further, the data object with the largest support is chosen as the first center, followed by the selection of other centers that are at the greatest distance from the initially selected center. The quality of the proposed algorithm is compared with the random initial center selection method, Cao's method, Wu's method, and the method introduced by Khan and Ahmad. Experimental analysis on real data sets shows the effectiveness of the proposed algorithm.
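The following sketch illustrates the support-based selection just described: attribute values are weighted by their relative frequency (support), row supports are summed, the row with the largest support becomes the first center, and further centers are chosen to be far from those already selected. The simple-matching distance and the exact rule for later centers are assumptions, not the paper's specification.

```python
import numpy as np
from collections import Counter

def support_based_centers(data, k):
    """Sketch of support-based initial center selection for categorical data.

    Each attribute value is weighted by its support (relative frequency
    within that attribute); row supports are the sum of these weights.
    The row with the largest support becomes the first center, and
    further centers are rows farthest (simple matching distance) from
    the centers chosen so far.
    """
    data = np.asarray(data, dtype=object)
    n, d = data.shape
    supports = np.zeros((n, d))
    for j in range(d):
        counts = Counter(data[:, j])
        supports[:, j] = [counts[v] / n for v in data[:, j]]
    row_support = supports.sum(axis=1)

    centers = [int(np.argmax(row_support))]
    while len(centers) < k:
        # simple matching distance to the nearest already-chosen center
        dist = np.array([
            min((row != data[c]).sum() for c in centers) for row in data
        ])
        dist[centers] = -1            # never re-select a chosen center
        centers.append(int(np.argmax(dist)))
    return data[centers]

# Toy usage with a small categorical data set
toy = np.array([["red", "A"], ["red", "B"], ["blue", "A"], ["green", "C"]],
               dtype=object)
print(support_based_centers(toy, k=2))
```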


2021 ◽  
pp. gr.273631.120
Author(s):  
Xinhao Liu ◽  
Huw A Ogilvie ◽  
Luay Nakhleh

Coalescent methods are proven and powerful tools for population genetics, phylogenetics, epidemiology, and other fields. A promising avenue for the analysis of large genomic alignments, which are increasingly common, is coalescent hidden Markov model (coalHMM) methods, but these methods have lacked general usability and flexibility. We introduce a novel method for automatically learning a coalHMM and inferring the posterior distributions of evolutionary parameters using black-box variational inference, with the transition rates between local genealogies derived empirically by simulation. This derivation enables our method to work directly with three or four taxa and, through a divide-and-conquer approach, with more taxa. Using a simulated data set resembling a human-chimp-gorilla scenario, we show that our method has accuracy comparable to or better than previous coalHMM methods. Both species divergence times and population sizes were accurately inferred. The method also infers local genealogies, and we report on their accuracy. Furthermore, we discuss a potential direction for scaling the method to larger data sets through a divide-and-conquer approach. This accuracy means our method is useful now, and by deriving transition rates by simulation it is flexible enough to enable future implementations of all kinds of population models.
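As a toy illustration of deriving HMM transition rates empirically by simulation: the paper derives rates between local genealogies from coalescent simulations, whereas the sketch below only shows the counting step on a generic two-state chain with made-up parameters.

```python
import numpy as np

def estimate_transition_matrix(states, n_states):
    """Estimate an HMM transition matrix empirically from a simulated
    state sequence by counting observed transitions (toy stand-in for
    deriving rates between local genealogies by simulation)."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

# Simulate a 2-state chain along a genomic alignment (toy parameters)
rng = np.random.default_rng(2)
true_T = np.array([[0.995, 0.005],
                   [0.010, 0.990]])
states = [0]
for _ in range(100_000):
    states.append(rng.choice(2, p=true_T[states[-1]]))
print(estimate_transition_matrix(np.array(states), 2))
```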


2021 ◽  
Vol 87 (6) ◽  
pp. 445-455
Author(s):  
Yi Ma ◽  
Zezhong Zheng ◽  
Yutang Ma ◽  
Mingcang Zhu ◽  
Ran Huang ◽  
...  

Many manifold learning algorithms conduct an eigenvector analysis on a data-similarity matrix with a size of N×N, where N is the number of data points. Thus, the memory complexity of the analysis is no less than O(N²). We present in this article an incremental manifold learning approach to handle large hyperspectral data sets for land use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained with the training data set. A local curvature variation algorithm is utilized to sample a subset of data points as landmarks. Then a manifold skeleton is identified based on the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k-nearest-neighbor classifier and achieving the second-best performance with a support vector machine.
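A generic landmark-based sketch follows: sample landmarks, learn an Isomap "skeleton" on them, and map the remaining points through the fitted model. Random landmark sampling and the swiss-roll toy data stand in for the paper's local curvature variation criterion and hyperspectral cubes; they are illustrative assumptions only.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

def landmark_embedding(X, n_landmarks=200, n_components=2, n_neighbors=10, seed=0):
    """Generic landmark-based manifold embedding sketch.

    Landmarks are sampled at random here for simplicity; an Isomap
    "skeleton" is learned on the landmarks and the remaining points
    are mapped through the fitted model, avoiding an N x N analysis.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(n_landmarks, len(X)), replace=False)
    skeleton = Isomap(n_neighbors=n_neighbors, n_components=n_components)
    skeleton.fit(X[idx])
    return skeleton.transform(X)          # embed the full data set

# Toy stand-in for a hyperspectral cube: 2000 points on a swiss roll
X, _ = make_swiss_roll(n_samples=2000, random_state=3)
embedded = landmark_embedding(X)
print(embedded.shape)                     # (2000, 2)
```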


2009 ◽  
Vol 72 (2) ◽  
pp. 260-266 ◽  
Author(s):  
JOHN R. RUBY ◽  
STEVEN C. INGHAM

Previous work using a large data set (no. 1, n = 5,355) of carcass sponge samples from three large-volume beef abattoirs highlighted the potential use of binary (present or absent) Enterobacteriaceae results for predicting the absence of Salmonella on carcasses. Specifically, the absence of Enterobacteriaceae was associated with the absence of Salmonella. We tested the accuracy of this predictive approach by using another large data set (no. 2, n = 2,163 carcasses sampled before or after interventions) from the same three abattoirs as data set no. 1 over a later 7-month period. Similarly, the predictive approach was tested on smaller subsets from data set no. 2 (n = 1,087 and n = 405) and on a much smaller data set (no. 3, n = 100 postintervention carcasses) collected at a small-volume abattoir over 4 months. Of Enterobacteriaceae-negative data set no. 2 carcasses, >98% were Salmonella negative. Similarly accurate predictions were obtained in the two data subsets obtained from data set no. 2 and in data set no. 3. Of final postintervention carcass samples in data set nos. 2 and 3, 9 and 70%, respectively, were Enterobacteriaceae positive; mean Enterobacteriaceae values for the two data sets were −0.375 and 0.169 log CFU/100 cm² (detection limit = −0.204 log CFU/100 cm², with Enterobacteriaceae-negative samples assigned a value of −0.505 log CFU/100 cm²). Salmonella contamination rates for final postintervention beef carcasses in data set nos. 2 and 3 were 1.1 and 7.0%, respectively. Binary Enterobacteriaceae results may be useful in evaluating beef abattoir hygiene and intervention treatment efficacy.
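The prediction being tested is essentially a negative predictive value: among Enterobacteriaceae-negative carcasses, the fraction that are also Salmonella-negative. A minimal sketch with made-up binary results:

```python
import numpy as np

def negative_predictive_value(eb_positive, salmonella_positive):
    """Fraction of Enterobacteriaceae-negative samples that are also
    Salmonella-negative, i.e. the accuracy of predicting the absence
    of Salmonella from a negative binary Enterobacteriaceae result."""
    eb_positive = np.asarray(eb_positive, dtype=bool)
    salmonella_positive = np.asarray(salmonella_positive, dtype=bool)
    eb_negative = ~eb_positive
    return (~salmonella_positive[eb_negative]).mean()

# Toy usage with made-up binary results (1 = detected, 0 = not detected)
eb = np.array([0, 0, 1, 0, 1, 0, 0, 1])
salmonella = np.array([0, 0, 1, 0, 0, 0, 0, 0])
print(negative_predictive_value(eb, salmonella))   # 1.0 for this toy data
```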


Fractals ◽  
2001 ◽  
Vol 09 (01) ◽  
pp. 105-128 ◽  
Author(s):  
TAYFUN BABADAGLI ◽  
KAYHAN DEVELI

This paper presents an evaluation of the methods applied to calculate the fractal dimension of fracture surfaces. Variogram (applicable to 1D self-affine sets) and power spectral density analyses (applicable to 2D self-affine sets) are selected to calculate the fractal dimension of synthetic 2D data sets generated using fractional Brownian motion (fBm). The calculated values are then compared with the actual fractal dimensions assigned in the generation of the synthetic surfaces. The main factor considered is the size of the 2D data set (number of data points). The critical sample size that yields the best agreement between the calculated and actual values is defined for each method. Limitations and the proper use of each method are clarified after an extensive analysis. The two methods are also applied to synthetically and naturally developed fracture surfaces of different types of rocks. The methods yield inconsistent fractal dimensions for natural fracture surfaces, and the reasons for this are discussed. The anisotropy of the fractal dimension, which may allow a correlation with the fracturing mechanism, and the multifractality of the fracture surfaces are also addressed.
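For a 1D self-affine profile, the variogram scales as gamma(h) ~ h^(2H) and the fractal dimension is D = 2 − H. The sketch below estimates D this way and checks it on ordinary Brownian motion (H = 0.5, so D ≈ 1.5); the lag range used in the fit is an illustrative choice.

```python
import numpy as np

def variogram_fractal_dimension(profile, max_lag=50):
    """Estimate the fractal dimension of a 1D self-affine profile from
    its variogram: gamma(h) ~ h**(2H), with D = 2 - H. The lag range
    used in the log-log fit is an illustrative choice."""
    profile = np.asarray(profile, dtype=float)
    lags = np.arange(1, max_lag + 1)
    gamma = np.array([np.mean((profile[h:] - profile[:-h]) ** 2) / 2.0
                      for h in lags])
    slope, _ = np.polyfit(np.log(lags), np.log(gamma), 1)
    hurst = slope / 2.0
    return 2.0 - hurst

# Sanity check on ordinary Brownian motion (H = 0.5, so D should be ~1.5)
rng = np.random.default_rng(4)
profile = np.cumsum(rng.normal(size=100_000))
print(variogram_fractal_dimension(profile))
```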


Author(s):  
Ying-Jia Lin ◽  
Ying-Cheng Su ◽  
Paul C.-P. Chao ◽  
Jia-Yu Zhang ◽  
Eka Fitrah Pribadi

Abstract A capacitive sensing circuit, including electrodes, for a 7-inch ultra-thin flexible on-cell touch panel has been designed. Code-division multiple sensing (CDMS) with the Walsh transform is used to scan the Tx electrodes and improve the signal-to-noise ratio (SNR). The algorithm is implemented on a field-programmable gate array (FPGA). The sensing readout algorithm operates on 4 Tx transmitter electrodes and 4 Rx sensing electrodes. A switched-capacitor (SC) circuit is used to prevent parasitic capacitance from disturbing the sampled signal and to enlarge the voltage difference arising from capacitance changes of the touch panel. A 12-bit ADC converts the front-end analog signal to digital code. The digital part adopts a correction algorithm to eliminate the background value of the panel, a moving-average algorithm that provides an adjustable signal-to-noise ratio, and a Walsh-transform demodulation algorithm that improves the touch report rate, achieving an SNR of up to 34 dB.
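A toy model of the code-division idea: all four Tx lines are driven simultaneously with orthogonal Hadamard codes (Walsh codes are a reordering of these rows), each Rx line records a linear mixture, and correlating with the codes separates the per-node capacitances. The linear mixing model, noise level, and capacitance values are assumptions, not the paper's circuit.

```python
import numpy as np
from scipy.linalg import hadamard

# Mutual-capacitance matrix of a 4x4 node touch panel (arbitrary units);
# a touch at node (2, 1) slightly reduces the local capacitance.
C = np.full((4, 4), 1.0)
C[2, 1] = 0.8

W = hadamard(4)                 # 4 orthogonal +/-1 codes, one per Tx line

# Drive all 4 Tx lines simultaneously, one code chip per time slot:
# each Rx line sees a linear mixture of the coded Tx excitations.
rx = W @ C                      # rows: time slots (chips), cols: Rx lines
rx += np.random.default_rng(5).normal(0, 0.01, rx.shape)   # readout noise

# Demodulate by correlating with the codes (W.T @ W = 4*I), which
# separates the contribution of each Tx line at each Rx line.
C_est = (W.T @ rx) / 4
print(np.round(C_est, 2))       # the touched node stands out at (2, 1)
```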


2016 ◽  
Vol 2016 ◽  
pp. 1-7
Author(s):  
Zhizheng Liang

Feature scaling has attracted considerable attention during the past several decades because of its important role in feature selection. In this paper, a novel algorithm for learning the scaling factors of features is proposed. It first assigns a nonnegative scaling factor to each feature of the data and then adopts a generalized performance measure to learn the optimal scaling factors. Notably, the proposed model can be transformed into a convex optimization problem, namely a second-order cone program (SOCP). Thus, the scaling factors of features in our method are globally optimal in some sense. Several experiments on simulated data, UCI data sets, and a gene data set are conducted to demonstrate that the proposed method is more effective than previous methods.
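The abstract does not give the exact performance measure, so the sketch below is only a generic convex instance of the idea: nonnegative per-feature scaling factors learned by minimizing a hinge-style loss under a norm constraint, which is SOCP-representable and solvable with cvxpy. It is not the paper's formulation.

```python
import cvxpy as cp
import numpy as np

# Toy binary-labeled data (hypothetical stand-in for a real data set)
rng = np.random.default_rng(6)
X = rng.normal(size=(100, 10))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=100))   # only feature 0 matters

# Nonnegative scaling factor per feature; a hinge-style performance
# measure with a norm constraint keeps the problem SOCP-representable.
s = cp.Variable(10, nonneg=True)
margins = cp.multiply(y, X @ s)
objective = cp.Minimize(cp.sum(cp.pos(1 - margins)))
problem = cp.Problem(objective, [cp.norm(s, 2) <= 1])
problem.solve()
print(np.round(s.value, 3))     # most weight should land on feature 0
```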


2005 ◽  
Vol 30 (4) ◽  
pp. 369-396 ◽  
Author(s):  
Eisuke Segawa

Multi-indicator growth models were formulated as special three-level hierarchical generalized linear models to analyze the growth of a trait latent variable measured by ordinal items. Items are nested within time points, and time points are nested within subjects. These models are special because they include a factor-analytic structure. The model can analyze not only data with item- and time-level missing observations, but also data with time points freely specified over subjects. Furthermore, features useful for longitudinal analyses, an “autoregressive error degree one” structure for the trait residuals and estimated time scores, were included. The approach is Bayesian, using Markov chain Monte Carlo (MCMC), and the model is implemented in WinBUGS. The models are illustrated with two simulated data sets and one real data set with planned missing items within a scale.
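A toy simulation of the data structure described (ordinal items nested within time points nested within subjects, with a factor-analytic loading per item and subject-specific growth) is sketched below; it omits the autoregressive residual structure and the Bayesian estimation, and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
n_subjects, n_times, n_items = 50, 4, 3
thresholds = np.array([-1.0, 0.0, 1.0])       # 4 ordinal categories
loadings = np.array([1.0, 0.8, 1.2])          # factor-analytic item loadings

# Subject-specific growth of the latent trait: random intercept + slope
intercepts = rng.normal(0.0, 1.0, n_subjects)
slopes = rng.normal(0.5, 0.2, n_subjects)
time_scores = np.arange(n_times)              # could also be estimated

# Ordinal item responses: item score = number of thresholds exceeded
responses = np.zeros((n_subjects, n_times, n_items), dtype=int)
for i in range(n_subjects):
    for t in range(n_times):
        trait = intercepts[i] + slopes[i] * time_scores[t]
        for j in range(n_items):
            latent = loadings[j] * trait + rng.normal(0.0, 1.0)
            responses[i, t, j] = np.sum(latent > thresholds)

print(responses.shape)   # (50, 4, 3): subjects x time points x items
```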

