Research on Data Currency Rule and Quality Evaluation

2021 ◽  
Vol 50 (2) ◽  
pp. 247-263
Author(s):  
Xuliang Duan ◽  
Bing Guo ◽  
Yan Shen ◽  
Yuncheng Shen ◽  
Xiangqian Dong ◽  
...  

Data currency is a temporal reference of data; it reflects the degree to which the data is current with the world it models. A currency rule is a formal rule, extracted from the data set, that reflects the currency order of the data tuples; it can be used both for data repairing and for currency quality evaluation. Building on research into data currency repairing, the basic form of the currency rule is extended, and parallel rule extraction and update algorithms are proposed to meet the requirement of running on dynamic data sets. In addition, four data currency quality evaluation models are proposed and verified by experiments. The performance tests show that the efficiency of the parallel algorithms is significantly improved, and the rules compliance mean (CM2) model based on extended currency rules has the highest average precision. The extended currency rules not only improve efficiency and adaptability but also provide more valuable features for data quality evaluation.
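The abstract leaves the CM2 model unspecified; purely as an illustrative reading, a rules-compliance mean can be taken as the average fraction of currency rules that each tuple pair in a data set satisfies. The rule shape, attribute names, and scoring below are assumptions for illustration, not the paper's definitions.

```python
# Illustrative sketch (not the paper's exact CM2 definition): score a data set
# by the mean fraction of currency rules its tuple pairs satisfy. A currency
# rule is modeled here as "if two rows agree on `condition`, the row with the
# larger `ordering` attribute is the more current one".

from dataclasses import dataclass
from itertools import combinations

@dataclass
class CurrencyRule:
    condition: str   # attribute that must match for the rule to apply
    ordering: str    # attribute whose larger value marks the more current tuple

def rule_satisfied(rule, older, newer):
    """True if the pair is consistent with the rule (or the rule does not apply)."""
    if older[rule.condition] != newer[rule.condition]:
        return True                               # rule does not apply to this pair
    return newer[rule.ordering] >= older[rule.ordering]

def compliance_mean(rows, rules):
    """Average, over all tuple pairs (rows assumed in entry order), of the
    fraction of rules satisfied."""
    scores = []
    for older, newer in combinations(rows, 2):
        ok = sum(rule_satisfied(r, older, newer) for r in rules)
        scores.append(ok / len(rules))
    return sum(scores) / len(scores) if scores else 1.0

rows = [
    {"id": 1, "version": 1, "updated": 10},
    {"id": 1, "version": 2, "updated": 9},        # violates the rule below
]
rules = [CurrencyRule(condition="id", ordering="updated")]
print(compliance_mean(rows, rules))               # 0.0 for this single violating pair
```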

Author(s):  
Tomas Grönstedt ◽  
Markus Wallin

Recent work on gas turbine diagnostics based on optimisation techniques advocates two different approaches: 1) stochastic optimisation, including Genetic Algorithm techniques, for its robustness when optimising objective functions with many local optima, and 2) gradient-based methods, mainly for their computational efficiency. For smooth, single-optimum functions, gradient methods are known to provide superior numerical performance. This paper addresses the key issue for method selection, i.e. whether multiple local optima may occur when the optimisation approach is applied to real engine testing. Two performance test data sets for the RM12 low bypass ratio turbofan engine, powering the Swedish Fighter Gripen, have been analysed. One set of data was recorded during performance testing of a highly degraded engine. This engine had been subjected to Accelerated Mission Testing (AMT) cycles corresponding to more than 4000 hours of run time. The other data set was recorded for a development engine with less than 200 hours of operation. The search for multiple optima was performed starting from more than 100 extreme points. Not a single case of multi-modality was encountered, i.e. one unique solution for each of the two data sets was consistently obtained. The RM12 engine cycle is typical of a modern fighter engine, implying that the obtained results can be transferred to, at least, most low bypass ratio turbofan engines. The paper goes on to describe the numerical difficulties that had to be resolved to obtain efficient and robust performance from the gradient solvers. Ill-conditioning and noise may, as illustrated on a model problem, introduce local optima without a correspondence in the gas turbine physics. Numerical methods exploiting the special problem structure represented by a non-linear least squares formulation are given special attention. Finally, a mixed norm allowing for both robustness and numerical efficiency is suggested.
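The closing suggestion of a mixed norm has a classic concrete instance: Huber's loss, which is quadratic for small residuals (numerical efficiency) and linear for large ones (robustness to outliers). A minimal sketch, with a toy two-parameter measurement model standing in for the RM12 performance model, using SciPy's trust-region non-linear least squares solver, which accepts such a loss directly:

```python
# Minimal sketch of the non-linear least squares formulation with a "mixed norm"
# (Huber loss). The two-parameter "engine model" below is a toy stand-in, not
# the RM12 performance model.

import numpy as np
from scipy.optimize import least_squares

def engine_model(health, operating_points):
    """Toy measurement model: predicted sensor values as a function of two
    health parameters (e.g. flow capacity and efficiency deltas)."""
    dflow, deff = health
    return (1.0 + dflow) * operating_points + deff * np.sqrt(operating_points)

rng = np.random.default_rng(0)
points = np.linspace(1.0, 10.0, 50)
truth = np.array([-0.03, 0.02])                      # "degraded" engine state
measured = engine_model(truth, points) + rng.normal(0, 0.01, points.size)
measured[7] += 0.5                                   # one gross measurement outlier

residuals = lambda h: engine_model(h, points) - measured

fit_l2 = least_squares(residuals, x0=[0.0, 0.0])                 # plain 2-norm
fit_huber = least_squares(residuals, x0=[0.0, 0.0],
                          loss="huber", f_scale=0.05)            # mixed norm
print("L2   :", fit_l2.x)      # pulled toward the outlier
print("Huber:", fit_huber.x)   # close to the true health parameters
```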


Nowadays, a huge amount of data is generated due to the growth in technology. There are different tools used to view this massive amount of data, and these tools contain different data mining techniques that can be applied to the obtained data sets. Classification is required to extract useful information or to predict the result from these enormous amounts of data. For this purpose, there are different classification algorithms. In this paper, we have compared the Naive Bayes, K*, and random forest classification algorithms using the Weka tool. To analyze the performance of these three algorithms, we have considered three data sets: the diabetes, supermarket, and weather data sets. In this work, an analysis is made based on the confusion matrix and different performance measures like RMSE, MAE, ROC, etc.
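Weka itself is Java-based; as a hedged Python analogue, scikit-learn can run the same style of comparison, with k-nearest neighbours standing in for the instance-based K* learner (which scikit-learn does not implement) and a synthetic data set standing in for the diabetes, supermarket, and weather sets:

```python
# Hedged analogue of the Weka comparison: three classifiers evaluated with a
# confusion matrix, MAE/RMSE on predicted probabilities (Weka-style), and ROC AUC.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, mean_absolute_error, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=800, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "NaiveBayes": GaussianNB(),
    "kNN (K* stand-in)": KNeighborsClassifier(n_neighbors=5),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
}
for name, model in models.items():
    proba = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    pred = (proba >= 0.5).astype(int)
    print(name)
    print("  confusion matrix:", confusion_matrix(y_te, pred).tolist())
    print("  MAE : %.3f" % mean_absolute_error(y_te, proba))
    print("  RMSE: %.3f" % np.sqrt(np.mean((y_te - proba) ** 2)))
    print("  ROC AUC: %.3f" % roc_auc_score(y_te, proba))
```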


1998 ◽  
Vol 9 ◽  
pp. 247-293 ◽  
Author(s):  
J. M. Wiebe ◽  
T. P. O'Hara ◽  
Thorsten Ohrstrom-Sandgren ◽  
K. J. McKeever

Scheduling dialogs, during which people negotiate the times of appointments, are common in everyday life. This paper reports the results of an in-depth empirical investigation of resolving explicit temporal references in scheduling dialogs. There are four phases of this work: data annotation and evaluation, model development, system implementation and evaluation, and model evaluation and analysis. The system and model were developed primarily on one set of data, and then applied later to a much more complex data set, to assess the generalizability of the model for the task being performed. Many different types of empirical methods are applied to pinpoint the strengths and weaknesses of the approach. Detailed annotation instructions were developed and an intercoder reliability study was performed, showing that naive annotators can reliably perform the targeted annotations. A fully automatic system has been developed and evaluated on unseen test data, with good results on both data sets. We adopt a pure realization of a recency-based focus model to identify precisely when it is and is not adequate for the task being addressed. In addition to system results, an in-depth evaluation of the model itself is presented, based on detailed manual annotations. The results are that few errors occur specifically due to the model of focus being used, and the set of anaphoric relations defined in the model are low in ambiguity for both data sets.
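A minimal sketch of the core idea of a recency-based focus model, in which a partially specified temporal reference inherits its missing fields from the most recently mentioned time that can supply them. The field set and update policy here are simplified assumptions; the paper's model defines a richer set of anaphoric relations and precise update rules.

```python
# Minimal recency-based focus model for explicit temporal references in a
# scheduling dialog: partial times ("how about two?") are resolved against the
# most recent antecedent in the focus list.

from dataclasses import dataclass, replace
from typing import Optional

@dataclass
class TimeRef:
    month: Optional[int] = None
    day: Optional[int] = None
    hour: Optional[int] = None

class FocusModel:
    def __init__(self):
        self.focus = []                              # most recent referent last

    def resolve(self, partial: TimeRef) -> TimeRef:
        resolved = partial
        for antecedent in reversed(self.focus):      # search in recency order
            resolved = replace(
                resolved,
                month=resolved.month if resolved.month is not None else antecedent.month,
                day=resolved.day if resolved.day is not None else antecedent.day,
            )
            if resolved.month is not None and resolved.day is not None:
                break
        self.focus.append(resolved)                  # resolved ref becomes the focus
        return resolved

dialog = FocusModel()
dialog.resolve(TimeRef(month=7, day=12, hour=10))    # "July twelfth at ten"
print(dialog.resolve(TimeRef(hour=14)))              # "how about two?" -> July 12, 14:00
```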


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Chengkun Lu

Through the recognition and analysis of human motion information, the actual motion state of the human body can be obtained. However, multifeature fusion for human behavior recognition has limitations in recognition accuracy and robustness. Combining it with deep reinforcement learning, we study multifeature fusion human behavior recognition and propose a multifeature fusion human behavior recognition algorithm using deep reinforcement learning. Firstly, several typical human behavior data sets are selected from the benchmark data sets as the research data. In the selected data sets, each video contains a single behavior category and carries a category tag. Secondly, the attention model is constructed: in the deep reinforcement learning network, a small sampled area is used as the model input. Finally, the corresponding position of the next visual area is estimated from the time-series information obtained after the input, completing the human behavior recognition algorithm based on deep reinforcement learning with multifeature fusion. The results show that the average accuracy of the algorithm's multifeature fusion is about 95%, the human behavior recognition effect is good, and the recognition accuracy is as high as about 98%; the algorithm passed the camera-movement performance test and the robustness test, and its average time consumption is only 12.7 s, which shows that the algorithm has very broad application prospects.
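A hedged PyTorch sketch of the glimpse loop described above: a small sampled area is encoded, an LSTM accumulates context over glimpses, and a location head estimates the position of the next visual area. Layer sizes are illustrative and the reinforcement learning training step (e.g. REINFORCE on the location policy) is omitted; this is not the paper's exact architecture.

```python
# Glimpse-based attention sketch: small patch in, class scores and next-glimpse
# location out. Training machinery is omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GlimpseAgent(nn.Module):
    def __init__(self, patch=8, hidden=128, n_classes=10):
        super().__init__()
        self.encode = nn.Linear(patch * patch + 2, hidden)  # patch pixels + location
        self.rnn = nn.LSTMCell(hidden, hidden)
        self.loc_head = nn.Linear(hidden, 2)                # next (x, y) in [-1, 1]
        self.cls_head = nn.Linear(hidden, n_classes)
        self.patch = patch

    def extract(self, frames, loc):
        """Sample a small patch around `loc` with grid_sample (differentiable crop)."""
        B = frames.size(0)
        lin = torch.linspace(-0.15, 0.15, self.patch, device=frames.device)
        dy, dx = torch.meshgrid(lin, lin, indexing="ij")
        grid = torch.stack((dx, dy), dim=-1).expand(B, -1, -1, -1) + loc.view(B, 1, 1, 2)
        return F.grid_sample(frames, grid, align_corners=False).flatten(1)

    def forward(self, frames, n_glimpses=6):
        B = frames.size(0)
        h = c = torch.zeros(B, self.rnn.hidden_size, device=frames.device)
        loc = torch.zeros(B, 2, device=frames.device)       # start at the centre
        for _ in range(n_glimpses):
            g = torch.relu(self.encode(torch.cat((self.extract(frames, loc), loc), 1)))
            h, c = self.rnn(g, (h, c))
            loc = torch.tanh(self.loc_head(h))              # estimate next visual area
        return self.cls_head(h)                             # behavior class scores

frames = torch.randn(4, 1, 64, 64)       # a batch of single-channel video frames
print(GlimpseAgent()(frames).shape)      # torch.Size([4, 10])
```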


1993 ◽  
Vol 8 (2) ◽  
pp. 171-206 ◽  
Author(s):  
Sali A. Tagliamonte ◽  
Shana Poplack

This paper examines the past temporal reference system in two data sets representing "early" Black English: Samaná and the Ex-slave Recordings, with a view to discovering the structure underlying variable use of overt verbal morphology. Extrapolating from proposals in the literature on the behavior of past temporal reference structures in known creoles, as well as in black and white vernaculars, we propose and test an analytical model based on quantitative methodology and making use of the stepwise selection procedure incorporated in a variable rule analysis. Competing hypotheses were operationalized as factors in the analysis and systematically tested on the same data set. Perhaps the most striking result of our study is that no matter which way the data are configured, the same three factor effects obtain. These reflect general constraints on language use and language processing rather than specific creole phenomena, such as the patterning expected of a relative tense system sensitive to stativity and anteriority. These findings lead us to suggest not only that an English-like system of absolute tense marking, expressed by both marked and unmarked verbs, prevails in these materials, but also that the temporal organization of these materials is not consistent with what has been posited for creole languages.
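Variable rule analysis is, at its core, logistic regression over competing factor groups with stepwise selection. A toy sketch of that machinery, with made-up factor groups and data rather than the Samaná or Ex-slave materials, might look like this:

```python
# Illustrative stepwise variable-rule-style analysis: a logistic regression of
# overt past marking against competing factor groups, with a crude forward step
# based on likelihood-ratio tests. Factor groups and data are invented.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import chi2

rng = np.random.default_rng(1)
n = 400
phon = rng.choice(["cluster", "vowel"], n)
data = pd.DataFrame({
    "stativity": rng.choice(["stative", "nonstative"], n),
    "anteriority": rng.choice(["anterior", "nonanterior"], n),
    "phonology": phon,
    # in this toy data, overt marking is more likely before a vowel
    "marked": (rng.random(n) < np.where(phon == "vowel", 0.7, 0.4)).astype(int),
})

def forward_step(base_terms, candidates):
    """Return the candidate factor group with the best significant LR improvement."""
    base = smf.logit("marked ~ " + " + ".join(base_terms or ["1"]), data).fit(disp=0)
    best = None
    for term in candidates:
        full = smf.logit("marked ~ " + " + ".join(base_terms + [term]), data).fit(disp=0)
        lr = 2 * (full.llf - base.llf)
        p = chi2.sf(lr, full.df_model - base.df_model)
        if p < 0.05 and (best is None or lr > best[1]):
            best = (term, lr, p)
    return best

print(forward_step([], ["stativity", "anteriority", "phonology"]))  # picks phonology
```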


2018 ◽  
Vol 154 (2) ◽  
pp. 149-155
Author(s):  
Michael Archer

1. Yearly records of worker Vespula germanica (Fabricius) taken in suction traps at Silwood Park (28 years) and at Rothamsted Research (39 years) are examined. 2. Using the autocorrelation function (ACF), a significant negative 1-year lag followed by a lesser non-significant positive 2-year lag was found in all, or parts of, each data set, indicating an underlying population dynamic of a 2-year cycle with a damped waveform. 3. The minimum number of years before the 2-year cycle with damped waveform was shown varied between 17 and 26, or was not found in some data sets. 4. Ecological factors delaying or preventing the occurrence of the 2-year cycle are considered.
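For reference, the ACF diagnostic in point 2 can be reproduced on a synthetic series: an AR(1) process with a negative coefficient shows exactly the alternating, damped pattern described (significant negative lag-1, weaker positive lag-2). The wasp-trap counts themselves are not reproduced here.

```python
# Autocorrelation diagnostic with confidence bounds on a synthetic 39-year series.

import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(3)
years = 39
x = np.zeros(years)
for t in range(1, years):
    x[t] = -0.6 * x[t - 1] + rng.normal()   # negative AR(1) coefficient gives the
                                            # alternating, damped ACF waveform

r, bounds = acf(x, nlags=5, alpha=0.05)
for lag in (1, 2):
    significant = not (bounds[lag][0] <= 0 <= bounds[lag][1])
    print(f"lag {lag}: r = {r[lag]:+.2f}  significant: {significant}")
```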


2018 ◽  
Vol 21 (2) ◽  
pp. 117-124 ◽  
Author(s):  
Bakhtyar Sepehri ◽  
Nematollah Omidikia ◽  
Mohsen Kompany-Zareh ◽  
Raouf Ghavami

Aims & Scope: In this research, 8 variable selection approaches were used to investigate the effect of variable selection on the predictive power and stability of CoMFA models. Materials & Methods: Three data sets, including 36 EPAC antagonists, 79 CD38 inhibitors, and 57 ATAD2 bromodomain inhibitors, were modelled by CoMFA. First of all, for all three data sets, CoMFA models with all CoMFA descriptors were created; then, by applying each variable selection method, a new CoMFA model was developed, so for each data set 9 CoMFA models were built. The obtained results show that noisy and uninformative variables affect CoMFA results. Based on the created models, applying 5 variable selection approaches, including FFD, SRD-FFD, IVE-PLS, SRD-UVE-PLS, and SPA-jackknife, increases the predictive power and stability of CoMFA models significantly. Result & Conclusion: Among them, SPA-jackknife removes most of the variables, while FFD retains most of them. FFD and IVE-PLS are time-consuming processes, while SRD-FFD and SRD-UVE-PLS need only a few seconds to run. Also, applying FFD, SRD-FFD, IVE-PLS, and SRD-UVE-PLS preserves CoMFA contour map information for both fields.
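CoMFA runs inside molecular modelling software, but the jackknife idea behind SPA-jackknife can be sketched on a generic PLS model: refit with each sample left out and discard variables whose coefficients are unstable across refits. The data, component count, and cutoff below are illustrative assumptions, with scikit-learn's PLSRegression standing in for the CoMFA/PLS step.

```python
# Jackknife-style variable selection sketch for a PLS model: variables whose
# leave-one-out coefficient estimates are unstable (|mean| small relative to
# their spread) are discarded.

import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(7)
n, p = 57, 40                               # e.g. 57 inhibitors, 40 field variables
X = rng.normal(size=(n, p))
y = X[:, :5] @ rng.normal(size=5) + 0.5 * rng.normal(size=n)  # 5 informative vars

coefs = np.empty((n, p))
for i in range(n):                          # leave-one-out refits
    keep = np.arange(n) != i
    pls = PLSRegression(n_components=3).fit(X[keep], y[keep])
    coefs[i] = pls.coef_.ravel()

t = np.abs(coefs.mean(0)) / coefs.std(0)    # stability score per variable
selected = np.where(t > np.median(t))[0]    # crude cutoff, for illustration only
print("kept", len(selected), "of", p, "variables:", selected[:10], "...")
```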


Author(s):  
Kyungkoo Jun

Background & Objective: This paper proposes a Fourier-transform-inspired method to classify human activities from time series sensor data. Methods: Our method begins by decomposing the 1D input signal into 2D patterns, motivated by the Fourier conversion. The decomposition is aided by a Long Short-Term Memory (LSTM) network, which captures the temporal dependency in the signal and produces encoded sequences. The sequences, once arranged into a 2D array, can represent the fingerprints of the signals. The benefit of such a transformation is that we can exploit recent advances in deep learning models for image classification, such as the Convolutional Neural Network (CNN). Results: The proposed model, as a result, is a combination of LSTM and CNN. We evaluate the model on two data sets. For the first data set, which is more standardized than the other, our model outperforms previous works or is at least their equal. In the case of the second data set, we devise schemes to generate training and testing data by varying the window size, the sliding size, and the labeling scheme. Conclusion: The evaluation results show that the accuracy is over 95% in some cases. We also analyze the effect of the parameters on the performance.
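A minimal sketch of the LSTM-plus-CNN combination, under the assumption that the LSTM's hidden-state sequence, arranged as a 2D array, serves as the image-like fingerprint; layer sizes and the exact decomposition are assumptions, not the paper's.

```python
# LSTM encodes the 1D sensor signal into a sequence of hidden states; the state
# matrix is treated as a 1-channel image and classified by a small CNN.

import torch
import torch.nn as nn

class LstmCnnClassifier(nn.Module):
    def __init__(self, hidden=64, n_classes=6):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.cnn = nn.Sequential(                 # operates on the 2D fingerprint
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
            nn.Linear(32 * 4 * 4, n_classes),
        )

    def forward(self, signal):                    # signal: (batch, seq_len)
        states, _ = self.lstm(signal.unsqueeze(-1))   # (batch, seq_len, hidden)
        fingerprint = states.unsqueeze(1)             # treat as a 1-channel image
        return self.cnn(fingerprint)

window = torch.randn(8, 64)                # batch of sliding-window sensor segments
print(LstmCnnClassifier()(window).shape)   # torch.Size([8, 6])
```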


2019 ◽  
Vol 73 (8) ◽  
pp. 893-901
Author(s):  
Sinead J. Barton ◽  
Bryan M. Hennelly

Cosmic ray artifacts may be present in all photo-electric readout systems. In spectroscopy, they present as random unidirectional sharp spikes that distort spectra and may have an effect on post-processing, possibly affecting the results of multivariate statistical classification. A number of methods have previously been proposed to remove cosmic ray artifacts from spectra, but the goal of removing the artifacts while making no other change to the underlying spectrum is challenging. One of the most successful and commonly applied methods for the removal of cosmic ray artifacts involves the capture of two sequential spectra that are compared in order to identify spikes. The disadvantage of this approach is that at least two recordings are necessary, which may be problematic for dynamically changing spectra, and which can reduce the signal-to-noise (S/N) ratio when compared with a single recording of equivalent duration, due to the inclusion of two instances of read noise. In this paper, a cosmic ray artifact removal algorithm is proposed that works in a similar way to the double acquisition method but requires only a single capture, so long as a data set of similar spectra is available. The method employs normalized covariance in order to identify a similar spectrum in the data set, from which a direct comparison reveals the presence of cosmic ray artifacts, which are then replaced with the corresponding values from the matching spectrum. The advantage of the proposed method over the double acquisition method is investigated in the context of the S/N ratio, and the method is applied to various data sets of Raman spectra recorded from biological cells.
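The method as described maps naturally onto a few lines of NumPy: normalized covariance (correlation) selects the most similar spectrum in the data set, a threshold on the difference flags the unidirectional spikes, and flagged channels are replaced from the match. The k-sigma threshold rule below is an assumption; the paper's exact criterion may differ.

```python
# Single-capture cosmic ray removal: find the best-correlated reference spectrum,
# flag large positive residuals as spikes, and patch them from the reference.

import numpy as np

def remove_cosmic_rays(target, dataset, k=5.0):
    """Return a despiked copy of `target`, using the most correlated spectrum
    in `dataset` (rows = spectra) as the reference."""
    corr = [np.corrcoef(target, s)[0, 1] for s in dataset]   # normalized covariance
    ref = dataset[int(np.argmax(corr))]
    resid = target - ref
    spikes = resid > resid.mean() + k * resid.std()          # unidirectional spikes
    cleaned = target.copy()
    cleaned[spikes] = ref[spikes]                            # replace from the match
    return cleaned, spikes

rng = np.random.default_rng(5)
axis = np.linspace(0, 10, 500)
base = np.exp(-((axis - 4) ** 2)) + 0.5 * np.exp(-((axis - 7) ** 2) / 0.5)
dataset = base + rng.normal(0, 0.01, (20, 500))              # similar cell spectra
target = base + rng.normal(0, 0.01, 500)
target[123] += 2.0                                           # cosmic ray artifact
cleaned, spikes = remove_cosmic_rays(target, dataset, k=8.0)
print("spike channels:", np.flatnonzero(spikes))             # [123]
```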


2013 ◽  
Vol 756-759 ◽  
pp. 3652-3658
Author(s):  
You Li Lu ◽  
Jun Luo

In the context of kernel methods, this paper puts forward two improved algorithms, called R-SVM and I-SVDD, to cope with imbalanced data sets in closed systems. R-SVM uses the K-means algorithm to cluster the sample space, while I-SVDD improves the performance of the original SVDD through imbalanced sample training. Experiments on two system call data sets show that the two algorithms are more effective and that R-SVM has lower complexity.
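A hedged reading of the R-SVM step, assuming K-means compresses the majority class to its cluster centres so the SVM trains on a rebalanced set; whether this matches the paper's exact procedure is an assumption, and the parameters below are illustrative.

```python
# K-means undersampling of the majority class followed by SVM training, compared
# against a plain SVM on the imbalanced data.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, weights=[0.95], flip_y=0.02,
                           random_state=0)                   # ~95% majority class
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

majority, minority = X_tr[y_tr == 0], X_tr[y_tr == 1]
centres = KMeans(n_clusters=len(minority), n_init=10,
                 random_state=0).fit(majority).cluster_centers_   # compress majority
X_bal = np.vstack([centres, minority])
y_bal = np.array([0] * len(centres) + [1] * len(minority))

plain = SVC().fit(X_tr, y_tr)
reduced = SVC().fit(X_bal, y_bal)
print("plain SVM   minority F1: %.3f" % f1_score(y_te, plain.predict(X_te)))
print("reduced SVM minority F1: %.3f" % f1_score(y_te, reduced.predict(X_te)))
```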

