benchmark data
Recently Published Documents

TOTAL DOCUMENTS: 469 (five years: 170)
H-INDEX: 32 (five years: 7)

PLoS ONE ◽  
2022 ◽  
Vol 17 (1) ◽  
pp. e0262463
Author(s):  
Keisuke Yoshihara ◽  
Kei Takahashi

We propose a simple anomaly detection method that is applicable to unlabeled time series data and is sufficiently tractable, even for non-technical users, using density ratio estimation based on the state space model. Our detection rule is based on the ratio of log-likelihoods estimated by the dynamic linear model, i.e., the ratio of the log-likelihood under our model to that under an over-dispersed model that we call the NULL model. Using the Yahoo S5 and Numenta Anomaly Benchmark data sets, two publicly available and commonly used benchmarks, we find that our method achieves performance better than or comparable to existing methods. This result implies that incorporating information specific to the time series into the model is essential in time series anomaly detection. In addition, we apply the proposed method to unlabeled Web time series data, specifically daily page views and average session duration on an electronic commerce site that sells insurance products, to show its applicability to unlabeled real-world data. We find that increases in page views caused by e-mail newsletter deliveries are unlikely to contribute to completing an insurance contract. This result also suggests the importance of simultaneously monitoring more than one time series.
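The detection rule above (compare a point's log-likelihood under the fitted model with its log-likelihood under an over-dispersed NULL model) can be sketched minimally. This is an illustration, not the authors' state-space implementation: it replaces the dynamic linear model with a rolling Gaussian local model, and the `window` and `inflation` parameters are hypothetical choices.

```python
import math

def gaussian_loglik(x, mean, var):
    """Log-likelihood of x under a Gaussian N(mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def anomaly_scores(series, window=5, inflation=10.0):
    """Score each point by the log-likelihood ratio of an over-dispersed
    NULL model (variance inflated by `inflation`) to a local model fitted
    on the preceding `window` points.  High scores flag anomalies: the
    heavy-tailed NULL model explains outliers better than the local model."""
    scores = []
    for t in range(window, len(series)):
        hist = series[t - window:t]
        mean = sum(hist) / window
        var = sum((v - mean) ** 2 for v in hist) / window + 1e-8
        ll_model = gaussian_loglik(series[t], mean, var)
        ll_null = gaussian_loglik(series[t], mean, var * inflation)
        scores.append(ll_null - ll_model)
    return scores
```

Typical points score negative (the tighter local model wins); a sudden jump scores strongly positive.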


Mathematics ◽  
2021 ◽  
Vol 10 (1) ◽  
pp. 39
Author(s):  
Qihang Huang ◽  
Yulin He ◽  
Zhexue Huang

To provide more external knowledge for training semi-supervised learning (SSL) algorithms, this paper proposes a maximum mean discrepancy-based SSL (MMD-SSL) algorithm, which trains a well-performing classifier by iteratively refining it with highly confident unlabeled samples. The MMD-SSL algorithm performs three main steps. First, a multilayer perceptron (MLP) is trained on the labeled samples and then used to assign labels to unlabeled samples. Second, the unlabeled samples are divided into multiple groups with the k-means clustering algorithm. Third, the maximum mean discrepancy (MMD) criterion is used to measure the distribution consistency between the k-means-clustered samples and the MLP-classified samples. Samples with a consistent distribution are labeled as highly confident and used to retrain the MLP. The MMD-SSL algorithm iterates this training until all unlabeled samples are consistently labeled. We conducted extensive experiments on 29 benchmark data sets to validate the rationality and effectiveness of the MMD-SSL algorithm. Experimental results show that the generalization capability of the MLP gradually improves as labeled samples are added, and statistical analysis demonstrates that MMD-SSL achieves better testing accuracy and kappa values than 10 other self-training and co-training SSL algorithms.
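The MMD criterion at the heart of the third step can be shown in a small self-contained sketch. The kernel choice (RBF) and the `gamma` parameter are assumptions for illustration, not taken from the paper.

```python
import math

def rbf(x, y, gamma=1.0):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd_squared(X, Y, gamma=1.0):
    """Biased empirical estimate of the squared maximum mean discrepancy
    between two sample sets X and Y: near zero when the two sets come
    from the same distribution, large when they differ."""
    m, n = len(X), len(Y)
    kxx = sum(rbf(a, b, gamma) for a in X for b in X) / (m * m)
    kyy = sum(rbf(a, b, gamma) for a in Y for b in Y) / (n * n)
    kxy = sum(rbf(a, b, gamma) for a in X for b in Y) / (m * n)
    return kxx + kyy - 2 * kxy
```

In the algorithm's terms, a low MMD between a k-means cluster and an MLP-predicted class would mark those samples as highly confident.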


Author(s):  
Aditi R. Gupta ◽  
Swapnil Patond

The cephalic index, also called the cranial index or breadth index, is computed as the maximal breadth of the skull divided by its maximal length, multiplied by 100. Assessing variations in the cephalic index across parents, children, and relatives can reveal whether these traits are genetically inherited. The study included 480 medical students (296 male and 184 female). Hrdlicka's method was used to determine the cephalic index. The majority of subjects were mesocephalic (cephalic index 75–79.9): 43.58 percent of males and 42.93 percent of females had a mesocephalic head. Males had a mean cephalic index of 81.24 ± 3.66, while females had a mean cephalic index of 80.31 ± 4.28. Somatometric measurements such as the facial and cephalic indices are used in forensic science, where these indices, together with a person's sex and racial population, help establish individual identity. This study aims to provide benchmark data on cephalic and facial indices in the Central Indian population, comparing these results with past research. The purpose of the study was to investigate the anthropometry of cranial characteristics, using Google Forms circulated in college groups. The data will be valuable to forensic scientists, anatomists, and specialists in clinical, medico-legal, anthropological, and archaeological settings.
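The index itself is a one-line computation. A small sketch follows, with the mesocephalic class bounded at 75–79.9 as in the abstract; the other class boundaries follow standard anthropometric convention and are included for illustration.

```python
def cephalic_index(breadth_mm, length_mm):
    """Cephalic index = (maximum head breadth / maximum head length) * 100."""
    return breadth_mm / length_mm * 100.0

def classify(ci):
    """Conventional head-shape classes; the 75-79.9 mesocephalic
    range matches the one cited in the abstract."""
    if ci < 75.0:
        return "dolichocephalic"
    if ci < 80.0:
        return "mesocephalic"
    if ci < 85.0:
        return "brachycephalic"
    return "hyperbrachycephalic"
```

By this scheme the reported male mean of 81.24 falls in the brachycephalic class even though the mesocephalic class was the most frequent.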


Author(s):  
Z Kok ◽  
J T Duffy ◽  
S Chai ◽  
Y Jin

The demand to increase port throughput has driven container ships to travel relatively fast in shallow water whilst avoiding grounding; hence, there is a need for more accurate high-speed squat predictions. A study has been undertaken to determine the most suitable method for predicting container ship squat at relatively high speeds (Frh ≥ 0.5) in finite water depth (1.1 ≤ h/T ≤ 1.3). The accuracy of two novel self-propelled URANS CFD squat models is compared with that of readily available empirical squat prediction formulae. Comparison of the CFD and empirical predictions with benchmark data demonstrates that for very low water depth (h/T < 1.14) and Frh < 0.46, the Barrass II (1979), ICORELS (1980), and Millward (1992) formulae correlate best with the benchmark data for all cases investigated. However, at relatively high speeds (Frh ≥ 0.5), which are achievable in deeper water (h/T ≥ 1.14), most of the empirical formulae severely underestimate squat (by 7–49%), whereas the quasi-static CFD model presented has the best correlation. The changes in wave patterns and effective wake fraction with respect to h/T are also presented.
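For context, the depth Froude number and one of the cited empirical formulae can be sketched as follows. The ICORELS (1980) bow squat expression below is the commonly quoted form; treat the coefficient and the formula's exact scope as illustrative rather than as the paper's benchmark implementation.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def froude_depth(speed_ms, depth_m):
    """Depth Froude number Frh = V / sqrt(g * h)."""
    return speed_ms / math.sqrt(G * depth_m)

def squat_icorels(displacement_m3, lpp_m, speed_ms, depth_m):
    """ICORELS (1980) bow squat estimate (m), commonly quoted form:
    S = 2.4 * (displacement / Lpp^2) * Frh^2 / sqrt(1 - Frh^2).
    Only meaningful for subcritical speeds, Frh < 1."""
    frh = froude_depth(speed_ms, depth_m)
    return 2.4 * (displacement_m3 / lpp_m ** 2) * frh ** 2 / math.sqrt(1 - frh ** 2)
```

The 1/sqrt(1 - Frh^2) term is what makes such formulae grow steeply (and lose accuracy, per the abstract) as Frh approaches the high-speed regime studied here.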


Sensors ◽  
2021 ◽  
Vol 21 (24) ◽  
pp. 8281
Author(s):  
Rundong Yang ◽  
Kangfeng Zheng ◽  
Bin Wu ◽  
Chunhua Wu ◽  
Xiujuan Wang

Phishing has become one of the biggest and most effective cyber threats, causing hundreds of millions of dollars in losses and millions of data breaches every year. Current anti-phishing techniques require experts to extract phishing site features and rely on third-party services to detect phishing sites. These techniques have some limitations: first, extracting phishing features requires expertise and is time-consuming; second, the use of third-party services delays the detection of phishing sites. Hence, this paper proposes an integrated phishing website detection method based on convolutional neural networks (CNN) and random forests (RF). The method can predict the legitimacy of URLs without accessing the web content or using third-party services. The proposed technique uses character embedding to convert URLs into fixed-size matrices, extracts features at different levels using CNN models, classifies the multi-level features using multiple RF classifiers, and finally outputs prediction results using a winner-take-all approach. On our dataset, the proposed model achieved an accuracy of 99.35%. An accuracy of 99.26% was achieved on the benchmark data, much higher than that of the existing extreme model.
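The first stage, turning a raw URL into a fixed-size input for character embedding, can be sketched as below. The alphabet, `max_len`, and zero-padding scheme are hypothetical choices for illustration, not taken from the paper.

```python
import string

# Characters that commonly appear in URLs; id 0 is reserved for padding
# and for any character outside this alphabet.
ALPHABET = string.ascii_lowercase + string.digits + "-._~:/?#[]@!$&'()*+,;=%"
CHAR_TO_ID = {c: i + 1 for i, c in enumerate(ALPHABET)}

def url_to_ids(url, max_len=64):
    """Map a URL to a fixed-length sequence of character ids, truncating
    long URLs and zero-padding short ones.  A trainable embedding layer
    would then turn each id into a dense vector, giving the fixed-size
    matrix fed to the CNN."""
    ids = [CHAR_TO_ID.get(c, 0) for c in url.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))
```

Because this representation needs only the URL string, it matches the paper's claim of classifying without fetching page content.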


2021 ◽  
Author(s):  
Akinola Oladiran Adepetun ◽  
Bamidele Mustapha Oseni ◽  
Olusola Samuel Makinde ◽  
...  

In recent times, the Bayesian approach to the randomized response technique has been used for estimating the population proportion of respondents possessing sensitive attributes such as induced abortion, tax evasion, and shoplifting. This is done by combining suitable prior information about an unknown population parameter with the sample information to estimate that parameter. In this study, the possibility of using a transmuted Kumaraswamy prior is raised, yielding a new Bayes estimator of the population proportion of a sensitive attribute under Warner's randomized response technique. The proposed Bayes estimator with the transmuted Kumaraswamy prior is compared, in terms of mean square error, with existing Bayes estimators developed with simple beta and Kumaraswamy priors. The proposed estimator competes well with the existing estimators for some values of the population proportion. The performances of the Bayes estimators were also compared using benchmark data.
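For background, Warner's original randomized response estimator, the frequentist baseline that these Bayes estimators refine with prior information, is a short computation; the function names below are illustrative.

```python
def warner_estimate(yes_count, n, p):
    """Warner's randomized response estimator of the population proportion
    pi of a sensitive attribute.  Each respondent answers the sensitive
    statement with design probability p (p != 0.5) and its complement
    otherwise, so only the randomized 'yes' proportion is observed."""
    lam = yes_count / n  # observed proportion of 'yes' answers
    return (lam - (1.0 - p)) / (2.0 * p - 1.0)

def warner_variance(pi, n, p):
    """Sampling variance of Warner's estimator, lam*(1-lam) / (n*(2p-1)^2),
    where lam = pi*p + (1-pi)*(1-p) is the expected 'yes' proportion."""
    lam = pi * p + (1.0 - pi) * (1.0 - p)
    return lam * (1.0 - lam) / (n * (2.0 * p - 1.0) ** 2)
```

A Bayesian treatment replaces this point estimate with a posterior built from a prior (beta, Kumaraswamy, or the proposed transmuted Kumaraswamy) on pi.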


2021 ◽  
Vol 11 (22) ◽  
pp. 10977
Author(s):  
Youngjae Lee ◽  
Hyeyoung Park

In developing a few-shot classification model using deep networks, the limited number of samples in each class makes it difficult to exploit the statistical characteristics of the class distributions. In this paper, we propose a method to address this difficulty by combining a probabilistic similarity based on intra-class statistics with a metric-based few-shot classification model. Noting that the probabilistic similarity estimated from intra-class statistics and the classifier of conventional few-shot classification models share a common assumption about the class distributions, we propose applying the probabilistic similarity both to obtain the loss value for episodic learning of the embedding network and to classify unseen test data. By defining the probabilistic similarity as the probability density of the difference vector between two samples with the same class label, a more reliable estimate of the similarity can be obtained, especially in the case of a large number of classes. Through experiments on various benchmark data, we confirm that the probabilistic similarity improves classification performance, especially when the number of classes is large.
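The probabilistic similarity, i.e., the density of the difference vector between two embedded samples, can be sketched under a zero-mean isotropic Gaussian assumption. The isotropic form and shared variance are simplifications for illustration, not necessarily the paper's exact model.

```python
import math

def similarity(x, y, var):
    """Probabilistic similarity of two embedding vectors: the density of
    the difference vector x - y under a zero-mean isotropic Gaussian whose
    variance `var` would be estimated from intra-class difference vectors
    pooled across training classes."""
    d = len(x)
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * sq / var) / (2 * math.pi * var) ** (d / 2)
```

Pooling difference-vector statistics across all classes is what makes the estimate usable even when each individual class has only a few samples.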


Sensors ◽  
2021 ◽  
Vol 21 (22) ◽  
pp. 7468
Author(s):  
Yui-Kai Weng ◽  
Shih-Hsu Huang ◽  
Hsu-Yu Kao

In a CNN (convolutional neural network) accelerator, there is a need to exploit the sparsity of activation values to reduce memory traffic and power consumption. Accordingly, some research efforts have been devoted to skipping ineffectual computations (i.e., multiplications by zero). Unlike previous works, in this paper we point out the similarity of activation values: (1) within the same layer of a CNN model, most feature maps are either highly dense or highly sparse; (2) within the same layer of a CNN model, feature maps in different channels are often similar. Based on these two observations, we propose a block-based compression approach that utilizes both the sparsity and the similarity of activation values to further reduce the data volume. We also design an encoder, a decoder, and an indexing module to support the proposed approach. The encoder translates output activations into the proposed block-based compression format, while the decoder and the indexing module align nonzero values for effectual computations. Compared with previous works, benchmark data consistently show that the proposed approach greatly reduces both memory traffic and power consumption.
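A block-based format that stores a nonzero bitmask plus the packed nonzero values is one minimal way to picture the idea. This sketch illustrates only the sparsity side of the scheme (not the cross-channel similarity) and is not the paper's hardware format.

```python
def compress_block(block):
    """Compress one activation block into a (bitmask, values) pair:
    bit i of the mask marks a nonzero at position i, and only the
    nonzero values are stored."""
    mask = 0
    values = []
    for i, v in enumerate(block):
        if v != 0:
            mask |= 1 << i
            values.append(v)
    return mask, values

def decompress_block(mask, values, size):
    """Rebuild the dense block; in hardware, an indexing module would
    instead use the mask to route nonzeros to the right multipliers."""
    out = [0] * size
    it = iter(values)
    for i in range(size):
        if mask >> i & 1:
            out[i] = next(it)
    return out
```

For a highly sparse block, the mask plus packed values occupy far less memory than the dense block, which is the source of the traffic and power savings.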


Water ◽  
2021 ◽  
Vol 13 (22) ◽  
pp. 3159
Author(s):  
João Faria Feliciano ◽  
André Marques Arsénio ◽  
Joana Cassidy ◽  
Ana Rita Santos ◽  
Alice Ganhão

Digitalization and knowledge management in the water sector, and their impacts on performance, depend greatly on two factors: human capacity and digital maturity. To understand the link between performance, human capacity, and digital maturity, six AGS water retail utilities were compared with all Portuguese utilities using Portuguese benchmark data (2011–2019). AGS utilities achieved better results, including in compound performance indicators, which are taken as surrogates for digital maturity; these compound indicators were also found to correlate positively with better performance. In fact, AGS utilities show levels of non-revenue water (NRW) (<25%) below the national median (30–40%), with network replacement values similar to the national median (<0.5%). These results seem to imply that higher digital maturity can offset relatively low network replacement levels and keep NRW below the national average. Furthermore, two internally developed indicators, the personnel aging index and digital maturity, both increased over the period; the aging of the staff again raises questions about long-term sustainability. The improving performance and the slight increase in digital maturity can be attributed to group-wide capacity building and digitalization programs that bring together staff from all AGS utilities in year-long activities.

