Swords: A Benchmark for Lexical Substitution with Improved Data Coverage and Quality

Author(s):  
Mina Lee ◽  
Chris Donahue ◽  
Robin Jia ◽  
Alexander Iyabor ◽  
Percy Liang
2015 ◽  
Author(s):  
Oren Melamud ◽  
Omer Levy ◽  
Ido Dagan

Author(s):  
Richard C. Kittler

Abstract Analysis of manufacturing data as a tool for failure analysts often meets with roadblocks due to the complex non-linear behaviors of the relationships between failure rates and explanatory variables drawn from process history. The current work describes how the use of a comprehensive engineering database and data mining technology overcomes some of these difficulties and enables new classes of problems to be solved. The characteristics of the database design necessary for adequate data coverage and unit traceability are discussed. Data mining technology is explained and contrasted with traditional statistical approaches as well as those of expert systems, neural nets, and signature analysis. Data mining is applied to a number of common problem scenarios. Finally, future trends in data mining technology relevant to failure analysis are discussed.


2006 ◽  
Vol 14 (4) ◽  
pp. 278-287 ◽  
Author(s):  
Manisa Pipattanasomporn ◽  
Saifur Rahman

2014 ◽  
Vol 7 (7) ◽  
pp. 7053-7084
Author(s):  
M. F. Schibig ◽  
M. Steinbacher ◽  
B. Buchmann ◽  
I. T. van der Laan-Luijkx ◽  
S. van der Laan ◽  
...  

Abstract. Since 2004, atmospheric carbon dioxide (CO2) has been measured at the High Altitude Research Station Jungfraujoch by the division of Climate and Environmental Physics at the University of Bern (KUP) using a nondispersive infrared gas analyzer (NDIR) in combination with a paramagnetic O2 analyzer. In January 2010, CO2 measurements based on cavity ring down spectroscopy (CRDS) were added by the Swiss Federal Laboratories for Materials Science and Technology (Empa) as part of the Swiss National Air Pollution Monitoring Network. To ensure a smooth transition, a prerequisite when merging two datasets (e.g. for trend determinations), the two measurement systems have run in parallel for several years. Such a long-term intercomparison also makes it possible to identify potential offsets between the two datasets and to assess the compatibility of the two systems on different time scales. Good agreement was observed for the seasonality and the short-term variations and, to a lesser extent, for trend calculations, mainly because of the short common period. However, the comparison revealed some issues related to the stability of the calibration gases of the KUP system and their assigned CO2 mole fraction. It was possible to adapt an improved calibration strategy based on standard gas determinations, which led to better agreement between the two datasets. When periods with technical problems and bad calibration gas cylinders are excluded, the average hourly difference (CRDS − NDIR) of the two systems is −0.03 ppm ± 0.25 ppm. Although the difference between the two datasets is in line with the compatibility goal of ±0.1 ppm of the World Meteorological Organization (WMO), the standard deviation is still too high. A significant part of this uncertainty originates from the necessity to switch the KUP system frequently (every 12 min) for 6 min from ambient air to a working gas in order to correct short-term variations of the O2 measurement system. After additionally allowing for signal stabilization after each switch, the KUP system achieves an effective data coverage of only 1/6, whereas the Empa system has nearly complete data coverage. In addition, differences in internal volumes and flow rates between the two systems may contribute to the observed differences.
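
The comparison statistic quoted in the abstract above, a mean hourly difference (CRDS − NDIR) with its standard deviation, can be illustrated with a minimal sketch. The function name, the dictionary layout, and the sample values are assumptions made for illustration only; they are not the authors' processing code or data. Only the ±0.1 ppm compatibility goal is taken from the abstract.

```python
from statistics import mean, stdev

# Compatibility goal quoted in the abstract (ppm).
WMO_COMPATIBILITY_GOAL_PPM = 0.1


def difference_stats(crds, ndir):
    """Mean and sample standard deviation of (CRDS - NDIR) over shared hours.

    crds, ndir: dicts mapping an hour label to a CO2 mole fraction in ppm.
    Hours missing from either system, or set to None, are skipped, which
    mimics excluding periods without valid data.
    """
    shared = sorted(
        h for h in crds.keys() & ndir.keys()
        if crds[h] is not None and ndir[h] is not None
    )
    diffs = [crds[h] - ndir[h] for h in shared]
    if len(diffs) < 2:
        raise ValueError("need at least two shared hours")
    return mean(diffs), stdev(diffs)


# Made-up hourly values, purely illustrative (not data from the study).
crds = {"00": 400.10, "01": 400.30, "02": 400.20, "03": 400.50}
ndir = {"00": 400.00, "01": 400.50, "02": None, "03": 400.40}

bias, spread = difference_stats(crds, ndir)
within_goal = abs(bias) <= WMO_COMPATIBILITY_GOAL_PPM
print(f"mean difference {bias:+.2f} ppm, standard deviation {spread:.2f} ppm")
```

In this toy case the mean difference meets the goal while the spread does not, which is the same pattern the abstract reports: a small bias can coexist with a standard deviation that is still too high.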


2019 ◽  
Author(s):  
Truly Santika ◽  
Michael F. Hutchinson ◽  
Kerrie A. Wilson

ABSTRACT Presence-only data used to develop species distribution models are often biased towards areas that are frequently surveyed. Furthermore, the size of calibration area with respect to the area covered by the species occurrences has been shown to affect model accuracy. However, existing assessments of the effect of data inadequacy and calibration size on model accuracy have predominately been conducted using empirical studies. These studies can give ambiguous results, since the data used to train and test the model can both be biased. These limitations were addressed by applying simulated data to assess how inadequate data coverage and the size of calibration area affect the accuracy of species distribution models generated by MaxEnt and BIOCLIM. The validity of four presence-only performance measures, Contrast Validation Index (CVI), Boyce index, AUC and AUCratio, was also assessed. CVI, AUC and AUCratio ranked the accuracy of univariate models correctly according to the true importance of their defining environmental variable, a desirable property of an accuracy measure. Contrastingly, Boyce index failed to rank the accuracy of univariate models correctly and a high percentage of irrelevant variables produced models with a high Boyce index. Inadequate data coverage and increased calibration area reduced model accuracy by reducing the correct identification of the dominant environmental determinant. BIOCLIM outperformed MaxEnt models in predicting the true distribution of simulated species with a symmetric dominant response. However, MaxEnt outperformed BIOCLIM in predicting the true distribution of simulated species with skew and linear dominant responses. Despite this, the standard performance measures consistently overestimated the performance of MaxEnt models and showed them as always having higher model accuracy than the BIOCLIM models. It has been acknowledged that research should be directed towards testing and improving species distribution modelling tools, particularly how to handle the inevitable bias and scarcity of species occurrence data. Simulated data, as demonstrated here, provides a powerful approach to comprehensively test the performance of modelling tools and to disentangle the effects of data properties and modelling options on model accuracy. This may be impossible to achieve using real-world data.


1998 ◽  
Vol 88 (5) ◽  
pp. 1275-1288 ◽  
Author(s):  
Craig A. Schultz ◽  
Stephen C. Myers ◽  
James Hipp ◽  
Christopher J. Young

Abstract Seismic characterization works to improve the detection, location, and identification of seismic events by correcting for inaccuracies in geophysical models. These inaccuracies are caused by inherent averaging in the model, and, as a result, exact data values cannot be directly recovered at a point in the model. Seismic characterization involves cataloging reference events so that inaccuracies in the model can be mapped at these points and true data values can be retained through a correction. Application of these corrections to a new event requires the accurate prediction of the correction value at a point that is near but not necessarily coincident with the reference events. Given that these reference events can be sparsely distributed geographically, both interpolation and extrapolation of corrections to the new point are required. In this study, we develop a closed-form representation of Bayesian kriging (linear prediction) that incorporates variable spatial damping. The result is a robust nonstationary algorithm for spatially interpolating geophysical corrections. This algorithm extends local trends when data coverage is good and allows for damping (blending) to an a priori background mean when data coverage is poor. Benchmark tests show that the technique gives reliable predictions of the correction value along with an appropriate uncertainty estimate. Tests with travel-time residual data demonstrate that combining variable damping with an azimuthal coverage criterion reduces the large errors that occur with more classical linear prediction techniques, especially when values are extrapolated in poor coverage regions. In the travel-time correction case, this technique generates seismic corrections together with their uncertainties and can properly incorporate model error in the final location estimate. Results favor the applicability of this nonstationary algorithm to other types of seismic corrections such as amplitude and attenuation measures.
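
The blending behaviour described above, where predictions follow the data under good coverage and fall back to an a priori background mean under poor coverage, can be illustrated with ordinary simple kriging. This is a plainly simpler, stationary technique, not the nonstationary closed-form algorithm with variable damping that the abstract describes. The one-dimensional setup, the Gaussian covariance, and all names and parameter values below are assumptions chosen for illustration.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def covariance(x1, x2, sill, length):
    """Gaussian covariance model (an assumed choice for this sketch)."""
    return sill * math.exp(-((x1 - x2) ** 2) / (2 * length ** 2))


def simple_krige(xs, ds, x_new, background=0.0, sill=1.0, length=1.0, noise=0.01):
    """Predict a correction at x_new from reference values ds at locations xs.

    Returns (prediction, variance). Far from all reference points the
    weights go to zero, so the prediction tends to `background` and the
    variance tends to `sill`.
    """
    n = len(xs)
    # Data covariance matrix with measurement noise on the diagonal.
    K = [[covariance(xs[i], xs[j], sill, length) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # Covariance between each reference point and the new point.
    k = [covariance(x, x_new, sill, length) for x in xs]
    w = solve(K, k)  # kriging weights
    pred = background + sum(wi * (d - background) for wi, d in zip(w, ds))
    var = sill - sum(wi * ki for wi, ki in zip(w, k))
    return pred, var


# Hypothetical reference events: locations and correction values.
xs = [0.0, 1.0, 2.0]
ds = [1.0, 1.2, 0.9]

near_pred, near_var = simple_krige(xs, ds, 1.0)   # inside data coverage
far_pred, far_var = simple_krige(xs, ds, 50.0)    # far outside coverage
print(f"near: {near_pred:.3f} (var {near_var:.4f})")
print(f"far:  {far_pred:.3f} (var {far_var:.4f})")
```

Inside the covered region the prediction stays close to the nearby reference values with a small variance; far away it relaxes to the background mean with the full prior variance, which is the qualitative behaviour the abstract attributes to damping.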

