Getting the model right: an information criterion for spectroscopy

2020 ◽  
Vol 501 (2) ◽  
pp. 2268-2278
Author(s):  
John K Webb ◽  
Chung-Chi Lee ◽  
Robert F Carswell ◽  
Dinko Milaković

ABSTRACT Robust model-fitting to spectroscopic transitions is a requirement across many fields of science. The corrected Akaike and Bayesian information criteria (AICc and BIC) are most frequently used to select the optimal number of fitting parameters. In general, AICc is thought to overfit (too many model parameters) and BIC to underfit. For spectroscopic modelling, both AICc and BIC fall short in two important respects: (a) no penalty distinction is made according to line strength, so that parameters of weak lines close to the detection threshold are treated with the same importance as those of strong lines, and (b) no account is taken of the way in which a narrow spectral line impacts only a very small section of the overall data. In this paper, we introduce a new information criterion that addresses these shortcomings, the Spectral Information Criterion (SpIC). Spectral simulations are used to compare performances. The main findings are: (i) SpIC clearly outperforms AICc for high signal-to-noise data; (ii) SpIC and AICc work equally well for lower signal-to-noise data, although SpIC achieves this with fewer parameters; and (iii) BIC does not perform well (for this application) and should be avoided. The new method should be of broader applicability (beyond spectroscopy) wherever different model parameters influence separated small ranges within a larger data set and/or have widely varying sensitivities.
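For reference, a minimal Python sketch of the two standard criteria the SpIC is benchmarked against, assuming a Gaussian least-squares fit; the SpIC itself is defined in the paper and is not reproduced here.

```python
import numpy as np

def aicc_bic(rss, n, k):
    """Corrected Akaike and Bayesian information criteria for a
    least-squares fit with Gaussian errors.

    rss : residual sum of squares of the fitted model
    n   : number of data points
    k   : number of free parameters
    """
    # Gaussian log-likelihood up to an additive constant
    loglike = -0.5 * n * np.log(rss / n)
    aic = 2 * k - 2 * loglike
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)  # small-sample correction
    bic = k * np.log(n) - 2 * loglike
    return aicc, bic

# The candidate model with the lowest criterion value is preferred.
for k, rss in [(3, 120.0), (6, 95.0), (9, 93.5)]:
    aicc, bic = aicc_bic(rss, n=500, k=k)
    print(f"k={k}: AICc={aicc:.1f}, BIC={bic:.1f}")
```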

2014 ◽  
Vol 41 (4) ◽  
pp. 334-341 ◽  
Author(s):  
Jun Peng ◽  
Zhibao Dong ◽  
Fengqing Han ◽  
Yuanhong Han ◽  
Xueling Dai

Abstract The optically stimulated luminescence (OSL) decay curve is assumed to consist of a number of first-order exponential components. Improper estimation of the number of components leads to under- or over-fitting of the curve under consideration. Hence, correct estimation of the number of components is important to accurately analyze an OSL decay curve. In this study, we investigated the possibility of using the Bayesian Information Criterion to estimate the optimal number of components in an OSL decay curve. We tested the reliability of this method using several hundred measured decay curves and three simulation scenarios. Our results demonstrate that the quality of the identification can be influenced by several factors: the measurement time and the number of channels; the variability of the decay constants; and the signal-to-noise ratio of each decaying component. The results also suggest that the Bayesian Information Criterion has great potential to estimate the number of components in an OSL decay curve with a moderate to high signal-to-noise ratio.
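A minimal sketch of the selection procedure the abstract describes, assuming Gaussian noise and using scipy's curve_fit; the function names, starting values, and synthetic curve are illustrative, not the authors' code.

```python
import numpy as np
from scipy.optimize import curve_fit

def multi_exp(t, *params):
    """Sum of first-order components: sum_i a_i * exp(-lambda_i * t)."""
    y = np.zeros_like(t, dtype=float)
    for a, lam in zip(params[0::2], params[1::2]):
        y += a * np.exp(-lam * t)
    return y

def bic_per_component_count(t, y, max_components=4):
    """Fit 1..max_components exponentials; return {m: BIC} for each fit."""
    n, bics = len(t), {}
    for m in range(1, max_components + 1):
        p0 = []
        for i in range(m):
            p0 += [y.max() / m, 10.0 ** (-i)]  # crude starting values
        try:
            popt, _ = curve_fit(multi_exp, t, y, p0=p0, maxfev=20000)
        except RuntimeError:
            continue  # fit failed to converge; skip this model order
        rss = np.sum((y - multi_exp(t, *popt)) ** 2)
        k = 2 * m  # amplitudes plus decay constants
        bics[m] = n * np.log(rss / n) + k * np.log(n)
    return bics

# Synthetic two-component curve; the minimum-BIC model should recover m = 2.
t = np.linspace(0.0, 40.0, 400)
rng = np.random.default_rng(0)
y = 100 * np.exp(-2.0 * t) + 20 * np.exp(-0.1 * t) + rng.normal(0, 0.5, t.size)
bics = bic_per_component_count(t, y)
print(min(bics, key=bics.get), bics)
```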


2008 ◽  
Vol 06 (02) ◽  
pp. 261-282 ◽  
Author(s):  
AO YUAN ◽  
WENQING HE

Clustering is a major tool for microarray gene expression data analysis. Existing clustering methods fall mainly into two categories: parametric and nonparametric. Parametric methods generally assume a mixture of parametric subdistributions. When the mixture distribution approximately fits the true data-generating mechanism, parametric methods perform well, but not when there is nonnegligible deviation between them. On the other hand, nonparametric methods, which usually make no distributional assumptions, are robust but pay the price of efficiency loss. In an attempt to use the known mixture form to increase efficiency, and to avoid assumptions about the unknown subdistributions to enhance robustness, we propose a semiparametric method for clustering. The proposed approach has the form of a parametric mixture but makes no assumptions about the subdistributions, which are estimated nonparametrically with constraints imposed only on the modes. An expectation-maximization (EM) algorithm along with a classification step is invoked to cluster the data, and a modified Bayesian information criterion (BIC) is employed to guide the determination of the optimal number of clusters. Simulation studies are conducted to assess the performance and robustness of the proposed method. The results show that the proposed method yields a reasonable partition of the data. As an illustration, the proposed method is applied to a real microarray data set to cluster genes.
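The authors' semiparametric EM with a modified BIC is not reproduced here; as a stand-in, the sketch below shows the analogous fully parametric workflow, choosing the number of clusters of a Gaussian mixture by standard BIC with scikit-learn. The toy data are an assumption for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Toy "expression" data: three groups in two dimensions.
X = np.vstack([
    rng.normal(loc, 0.5, size=(60, 2))
    for loc in ([0, 0], [3, 3], [0, 4])
])

# Fit mixtures with 1..6 components and keep the BIC of each.
bics = {}
for k in range(1, 7):
    gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    bics[k] = gm.bic(X)

best_k = min(bics, key=bics.get)  # lowest BIC wins
labels = GaussianMixture(n_components=best_k, n_init=5,
                         random_state=0).fit_predict(X)
print("chosen number of clusters:", best_k)
```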


2014 ◽  
Vol 556-562 ◽  
pp. 6328-6331
Author(s):  
Su Zhen Shi ◽  
Yi Chen Zhao ◽  
Li Biao Yang ◽  
Yao Tang ◽  
Juan Li

LIFT technology has been applied in the denoising stage of 3D coalfield seismic processing to ensure the imaging precision of minor faults and structures. This paper focuses on the denoising workflow in two study areas where LIFT was used. First, signal and noise are separated; denoising is then applied to the noise data. The weak effective signal recovered from the noise data is blended with the original effective signal to reconstruct the denoised data, yielding a result with a high signal-to-noise ratio and preserved amplitudes. These cases show that LIFT is an effective denoising method for 3D coalfield seismic data and could be used widely in other work areas.
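LIFT is a proprietary workflow, so the following is only a schematic sketch of the two-pass idea described above (separate signal from noise, recover weak signal leaked into the noise estimate, blend it back); the median filter is an assumed stand-in for the actual signal-modelling step.

```python
import numpy as np
from scipy.ndimage import median_filter

def lift_style_denoise(traces, size=(1, 9)):
    """Schematic two-pass denoise on a 2-D gather
    (trace index x time sample). Not the actual LIFT algorithm."""
    # Pass 1: crude signal/noise separation.
    signal = median_filter(traces, size=size)
    noise = traces - signal
    # Pass 2: pull weak coherent energy back out of the noise estimate.
    weak_signal = median_filter(noise, size=size)
    random_noise = noise - weak_signal
    # Blend the recovered weak events with the primary signal estimate.
    return signal + weak_signal, random_noise
```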


1993 ◽  
Vol 138 ◽  
pp. 27-41
Author(s):  
Saul J. Adelman

Abstract I review abundance determinations of normal B5-F4 and peculiar stars published since 1984. Several analyses performed with photographic spectrograms indicate interesting stars which should be analyzed with high signal-to-noise data. Studies of stars of known ages which belong to clusters, associations, and moving groups should lead to the most direct confrontations with theory. The increase in signal-to-noise ratio provided by electronic detectors with respect to photographic plates should allow accurate analyses of moderately rotating stars. High-resolution, high signal-to-noise ratio studies have revealed crucial information about the line profiles of Sirius, Vega, and other A stars. It would aid comparison of analyses if we could agree on a standard set of gf-values and line damping constants. A computer bulletin board would be a useful means to provide and maintain such data as well as model atmosphere codes.


Geophysics ◽  
2009 ◽  
Vol 74 (4) ◽  
pp. J35-J48 ◽  
Author(s):  
Bernard Giroux ◽  
Abderrezak Bouchedda ◽  
Michel Chouteau

We introduce two new traveltime picking schemes developed specifically for crosshole ground-penetrating radar (GPR) applications. The main objective is to automate, at least partially, the traveltime picking procedure and to provide first-arrival times that are closer in quality to those of manual picking approaches. The first scheme is an adaptation of a method based on crosscorrelation of radar traces collated in gathers according to their associated transmitter-receiver angle. A detector is added to isolate the first cycle of the radar wave and to suppress secondary arrivals that might be mistaken for first arrivals. To improve the accuracy of the arrival times obtained from the crosscorrelation lags, a time-rescaling scheme is implemented to resize the radar wavelets to a common time-window length. The second method is based on the Akaike information criterion (AIC) and the continuous wavelet transform (CWT). It is not tied to the restrictive criterion of waveform similarity that underlies crosscorrelation approaches, which is not guaranteed for traces sorted in common ray-angle gathers, and it has the advantage of being fully automated. Performances of the new algorithms are tested with synthetic and real data. In all tests, the approach that adds first-cycle isolation to the original crosscorrelation scheme improves the results. In contrast, the time-rescaling approach brings limited benefits, except when strong dispersion is present in the data. In addition, the performance of crosscorrelation picking schemes degrades for data sets with disparate waveforms despite the high signal-to-noise ratio of the data. In general, the AIC-CWT approach is more versatile and performs well on all data sets. Only with data showing low signal-to-noise ratios is the AIC-CWT superseded by the modified crosscorrelation picker.
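The AIC-based onset detection underlying the second scheme is commonly implemented directly on the trace (Maeda's variance form); a minimal single-trace sketch follows, without the CWT preprocessing stage the authors add. The synthetic trace is an illustrative assumption.

```python
import numpy as np

def aic_pick(trace):
    """First-arrival pick at the minimum of
    AIC(k) = k*log(var(x[:k])) + (N-k-1)*log(var(x[k:])).
    Returns the sample index of the estimated onset."""
    x = np.asarray(trace, dtype=float)
    n = len(x)
    aic = np.full(n, np.inf)
    for k in range(2, n - 2):  # leave room for both variance estimates
        v1, v2 = np.var(x[:k]), np.var(x[k:])
        if v1 > 0 and v2 > 0:
            aic[k] = k * np.log(v1) + (n - k - 1) * np.log(v2)
    return int(np.argmin(aic))

# Synthetic trace: noise followed by an arrival at sample 300.
rng = np.random.default_rng(2)
trace = rng.normal(0, 0.1, 600)
trace[300:] += np.sin(0.3 * np.arange(300)) * np.exp(-0.01 * np.arange(300))
print(aic_pick(trace))  # expected to be close to 300
```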


2011 ◽  
Vol 7 (S279) ◽  
pp. 325-326 ◽  
Author(s):  
Franz E. Bauer ◽  
Paula Zelaya ◽  
Alejandro Clocchiatti ◽  
Justyn Maund

Abstract We report results for two epochs of spectropolarimetry on the luminous Type IIn SN 2010jl, taken at ≈36 and 85 days post-explosion with VLT FORS2-PMOS. The high signal-to-noise data demonstrate distinct evolution in the continuum and the broad lines, pointing to a complex origin for the various emission components and to a potentially common polarization signal for the Type IIn class, even over 1-2 orders of magnitude in luminosity output.


2018 ◽  
Author(s):  
Katharina Renner-Martin ◽  
Norbert Brunner ◽  
Manfred Kühleitner ◽  
Werner-Georg Nowak ◽  
Klaus Scheicher

The Bertalanffy-Pütter growth model describes mass m at age t by means of the differential equation dm/dt = p·m^a − q·m^b. The special case using the Bertalanffy exponent pair a = 2/3 and b = 1 is most common (it corresponds to the von Bertalanffy growth function VBGF for length in the fishery literature). For data fitting with general exponents, five model parameters need to be optimized: the pair a < b of non-negative exponents, the non-negative constants p and q, and a positive initial value m0 for the differential equation. For the case b = 1 it is known that for most fish data any exponent a < 1 could be used to model growth without significantly affecting the fit to the data (when the other parameters p, q, m0 were optimized). Data fitting used the method of least squares, minimizing the sum of squared errors (SSE). It was conjectured that optimizing both exponents would result in a significantly better fit of the optimal growth function to the data and thereby reduce SSE. This conjecture was tested on a data set for the mass growth of Walleye (Sander vitreus), a fish from Lake Erie, USA. Compared to the Bertalanffy exponent pair, the optimal exponent pair achieved a reduction of SSE by 10%. However, when the optimization of additional parameters was penalized using the Akaike information criterion (AIC), the optimal exponent-pair model had a higher (worse) AIC than the Bertalanffy exponent pair. SSE and AIC are thus different ways to compare models: SSE is used when predictive power alone is needed, whereas AIC is used when simplicity of the model and explanatory power are also needed.
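A sketch of the comparison described above, under stated assumptions: the age/mass values are hypothetical stand-ins for the Walleye data, and only two candidate exponent pairs are evaluated rather than a full search over (a, b).

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

# Hypothetical age (years) / mass (g) values standing in for the real data.
age = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
mass = np.array([80., 250., 520., 840., 1150., 1420., 1640., 1810.])

def predicted_mass(p, q, m0, a, b, t):
    """Integrate dm/dt = p*m^a - q*m^b from t = 0 with m(0) = m0."""
    sol = solve_ivp(lambda _, m: [p * m[0]**a - q * m[0]**b],
                    (0.0, t.max()), [m0], t_eval=t, rtol=1e-8)
    if not sol.success or sol.y.shape[1] != len(t):
        return None
    return sol.y[0]

def sse(theta, a, b):
    """Sum of squared errors for fixed exponents a, b."""
    p, q, m0 = theta
    if min(p, q, m0) <= 0:
        return np.inf
    pred = predicted_mass(p, q, m0, a, b, age)
    if pred is None or not np.all(np.isfinite(pred)):
        return np.inf
    return float(np.sum((mass - pred) ** 2))

def sse_and_aic(a, b, n_free):
    """Optimize p, q, m0 for fixed exponents; n_free counts all optimized
    parameters, including the exponents when they are fitted too."""
    res = minimize(sse, x0=[1.0, 0.05, 50.0], args=(a, b),
                   method="Nelder-Mead")
    n = len(mass)
    return res.fun, n * np.log(res.fun / n) + 2 * n_free

print(sse_and_aic(2/3, 1.0, n_free=3))  # classic Bertalanffy pair
print(sse_and_aic(0.8, 1.1, n_free=5))  # one candidate general pair
```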


Geophysics ◽  
2016 ◽  
Vol 81 (2) ◽  
pp. KS71-KS91 ◽  
Author(s):  
Jubran Akram ◽  
David W. Eaton

We have evaluated arrival-time picking algorithms for downhole microseismic data. The picking algorithms that we considered may be classified as window-based single-level methods (e.g., energy-ratio [ER] methods), nonwindow-based single-level methods (e.g., Akaike information criterion), multilevel or array-based methods (e.g., crosscorrelation approaches), and hybrid methods that combine a number of single-level methods (e.g., Akazawa's method). We have determined the key parameters for each algorithm and developed recommendations for optimal parameter selection based on our analysis and experience. We evaluated the performance of these algorithms with the use of field examples from a downhole microseismic data set recorded in western Canada as well as with pseudo-synthetic microseismic data generated by adding 100 realizations of Gaussian noise to high signal-to-noise ratio microseismic waveforms. ER-based algorithms were found to be more efficient in terms of computational speed and are therefore recommended for real-time microseismic data processing. Based on performance on the pseudo-synthetic and field data sets, we found statistical, hybrid, and multilevel crosscorrelation methods to be more effective in terms of accuracy and precision. Pick errors for S-waves are reduced significantly when data are preconditioned by applying a transformation into ray-centered coordinates.
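As an illustration of the window-based energy-ratio (ER) family recommended above for real-time use, a minimal single-trace sketch; the window length and synthetic trace are illustrative assumptions.

```python
import numpy as np

def energy_ratio_pick(trace, window=50):
    """Window-based ER picker: the ratio of trailing-window to
    leading-window energy peaks near the signal onset."""
    x = np.asarray(trace, dtype=float) ** 2
    n = len(x)
    er = np.zeros(n)
    for i in range(window, n - window):
        pre = x[i - window:i].sum()
        post = x[i:i + window].sum()
        er[i] = post / (pre + 1e-12)  # guard against division by zero
    return int(np.argmax(er))

# Synthetic trace: noise with an arrival at sample 400.
rng = np.random.default_rng(3)
trace = rng.normal(0, 0.05, 1000)
trace[400:] += np.exp(-0.005 * np.arange(600)) * np.sin(0.25 * np.arange(600))
print(energy_ratio_pick(trace))  # expected near sample 400
```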


2021 ◽  
Author(s):  
C Massiot ◽  
John Townend ◽  
A Nicol ◽  
DD McNamara

Acoustic borehole televiewer (BHTV) logs provide measurements of fracture attributes (orientations, thickness, and spacing) at depth. Orientation, censoring, and truncation sampling biases similar to those described for one-dimensional outcrop scanlines, and other logging or drilling artifacts specific to BHTV logs, can affect the interpretation of fracture attributes from BHTV logs. K-means, fuzzy K-means, and agglomerative clustering methods provide transparent means of separating fracture groups on the basis of their orientation. Fracture spacing is calculated for each of these fracture sets. Maximum likelihood estimation using truncated distributions permits the fitting of several probability distributions to the fracture attribute data sets within truncation limits, which can then be extrapolated over the entire range where they naturally occur. The Akaike Information Criterion (AIC) and Schwarz Bayesian Criterion (SBC) rank the distributions by how well they fit the data. We demonstrate these attribute analysis methods with a data set derived from three BHTV logs acquired from the high-temperature Rotokawa geothermal field, New Zealand. Varying BHTV log quality reduces the number of input data points, but careful selection of the quality levels at which fractures are deemed fully sampled increases the reliability of the analysis. Spacing data sets comprising up to 300 data points and spanning three orders of magnitude can be approximated similarly well (similar AIC rankings) by several distributions. Several clustering configurations and probability distributions can often characterize the data at similar levels of statistical criteria. Thus, several scenarios should be considered when using BHTV log data to constrain numerical fracture models.
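A sketch of the truncated-distribution MLE and AIC/SBC ranking step, under stated assumptions: two illustrative candidate distributions (exponential and lognormal) with the location parameter fixed at zero; the candidate set, starting values, and synthetic spacing data are assumptions, not the authors' choices.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

# Candidate distributions, parameterized without a location shift.
CANDIDATES = {
    "exponential": {
        "x0": lambda d: [np.mean(d)],                       # scale
        "logpdf": lambda x, th: stats.expon.logpdf(x, scale=th[0]),
        "cdf": lambda x, th: stats.expon.cdf(x, scale=th[0]),
    },
    "lognormal": {
        "x0": lambda d: [1.0, np.median(d)],                # shape s, scale
        "logpdf": lambda x, th: stats.lognorm.logpdf(x, th[0], scale=th[1]),
        "cdf": lambda x, th: stats.lognorm.cdf(x, th[0], scale=th[1]),
    },
}

def rank_truncated_fits(data, lo, hi):
    """MLE fit of each candidate within truncation limits [lo, hi];
    return AIC and SBC (BIC) so the distributions can be ranked."""
    n, results = len(data), {}
    for name, c in CANDIDATES.items():
        def nll(theta, c=c):
            if np.any(np.asarray(theta) <= 0):
                return np.inf
            norm = c["cdf"](hi, theta) - c["cdf"](lo, theta)
            if not np.isfinite(norm) or norm <= 0:
                return np.inf
            # Truncated log-likelihood: pdf renormalized over [lo, hi].
            return -np.sum(c["logpdf"](data, theta) - np.log(norm))
        res = minimize(nll, c["x0"](data), method="Nelder-Mead")
        k = len(c["x0"](data))
        results[name] = {"AIC": 2 * res.fun + 2 * k,
                         "SBC": 2 * res.fun + k * np.log(n)}
    return results

# Hypothetical spacing data (meters), observed only between 0.05 and 10 m.
rng = np.random.default_rng(4)
spacing = rng.exponential(0.8, 300)
spacing = spacing[(spacing > 0.05) & (spacing < 10.0)]
print(rank_truncated_fits(spacing, 0.05, 10.0))
```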



