benchmark data
Recently Published Documents

TOTAL DOCUMENTS: 469 (five years: 170)
H-INDEX: 32 (five years: 7)

PLoS ONE ◽  
2022 ◽  
Vol 17 (1) ◽  
pp. e0262463
Author(s):  
Keisuke Yoshihara ◽  
Kei Takahashi

We propose a simple anomaly detection method that is applicable to unlabeled time series data and is sufficiently tractable, even for non-technical users, using density ratio estimation based on the state space model. Our detection rule is based on the ratio of log-likelihoods estimated by the dynamic linear model, i.e., the ratio of the log-likelihood under our model to that under an over-dispersed model that we call the NULL model. Using the Yahoo S5 and Numenta Anomaly Benchmark data sets, two publicly available and commonly used benchmarks, we find that our method achieves performance better than or comparable to existing methods. This result implies that incorporating information specific to the time series into the model is essential in time series anomaly detection. In addition, we apply the proposed method to unlabeled Web time series data, specifically daily page views and average session duration on an electronic commerce site that sells insurance products, to show its applicability to unlabeled real-world data. We find that increases in page views caused by e-mail newsletter deliveries are unlikely to contribute to completing an insurance contract. This result also suggests the importance of simultaneously monitoring more than one time series.
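The detection rule above (compare a point's log-likelihood under the fitted model with its log-likelihood under an over-dispersed NULL model) can be sketched minimally. This is an illustration, not the authors' state-space implementation: it replaces the dynamic linear model with a rolling Gaussian local model, and the `window` and `inflation` parameters are hypothetical choices.

```python
import math

def gaussian_loglik(x, mean, var):
    """Log-likelihood of x under a Gaussian N(mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def anomaly_scores(series, window=5, inflation=10.0):
    """Score each point by the log-likelihood ratio of an over-dispersed
    NULL model (variance inflated by `inflation`) to a local model fitted
    on the preceding `window` points.  High scores flag anomalies: the
    heavy-tailed NULL model explains outliers better than the local model."""
    scores = []
    for t in range(window, len(series)):
        hist = series[t - window:t]
        mean = sum(hist) / window
        var = sum((v - mean) ** 2 for v in hist) / window + 1e-8
        ll_model = gaussian_loglik(series[t], mean, var)
        ll_null = gaussian_loglik(series[t], mean, var * inflation)
        scores.append(ll_null - ll_model)
    return scores
```

Typical points score negative (the tighter local model wins); a sudden jump scores strongly positive.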


Mathematics ◽  
2021 ◽  
Vol 10 (1) ◽  
pp. 39
Author(s):  
Qihang Huang ◽  
Yulin He ◽  
Zhexue Huang

To provide more external knowledge for training semi-supervised learning (SSL) algorithms, this paper proposes a maximum mean discrepancy-based SSL (MMD-SSL) algorithm, which trains a well-performing classifier by iteratively refining it with highly confident unlabeled samples. The MMD-SSL algorithm performs three main steps. First, a multilayer perceptron (MLP) is trained on the labeled samples and then used to assign labels to unlabeled samples. Second, the unlabeled samples are divided into multiple groups with the k-means clustering algorithm. Third, the maximum mean discrepancy (MMD) criterion is used to measure the distribution consistency between the k-means-clustered samples and the MLP-classified samples. Samples with a consistent distribution are labeled as highly confident and used to retrain the MLP. The MMD-SSL algorithm iterates this training until all unlabeled samples are consistently labeled. We conducted extensive experiments on 29 benchmark data sets to validate the rationality and effectiveness of the MMD-SSL algorithm. Experimental results show that the generalization capability of the MLP gradually improves as labeled samples are added, and statistical analysis demonstrates that MMD-SSL achieves better testing accuracy and kappa values than 10 other self-training and co-training SSL algorithms.
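The MMD criterion at the heart of the third step can be shown in a small self-contained sketch. The kernel choice (RBF) and the `gamma` parameter are assumptions for illustration, not taken from the paper.

```python
import math

def rbf(x, y, gamma=1.0):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd_squared(X, Y, gamma=1.0):
    """Biased empirical estimate of the squared maximum mean discrepancy
    between two sample sets X and Y: near zero when the two sets come
    from the same distribution, large when they differ."""
    m, n = len(X), len(Y)
    kxx = sum(rbf(a, b, gamma) for a in X for b in X) / (m * m)
    kyy = sum(rbf(a, b, gamma) for a in Y for b in Y) / (n * n)
    kxy = sum(rbf(a, b, gamma) for a in X for b in Y) / (m * n)
    return kxx + kyy - 2 * kxy
```

In the algorithm's terms, a low MMD between a k-means cluster and an MLP-predicted class would mark those samples as highly confident.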


Author(s):  
Aditi R. Gupta ◽  
Swapnil Patond

The cephalic index, also called the cranial index or breadth index, is computed as the maximal breadth of the skull divided by its maximal length, multiplied by 100. Assessing variations in the cephalic index across parents, children, and relatives can reveal whether these traits are genetically inherited. The study included 480 medical students (296 male and 184 female). Hrdlicka's method was used to determine the cephalic index. The majority of subjects were mesocephalic (cephalic index 75–79.9): 43.58 percent of males and 42.93 percent of females had a mesocephalic head. Males had a mean cephalic index of 81.24 ± 3.66, while females had a mean cephalic index of 80.31 ± 4.28. Somatometric measurements such as the facial and cephalic indices are used in forensic science, where these indices, together with a person's sex and racial population, help establish individual identity. This study aims to provide benchmark data on cephalic and facial indices in the Central Indian population, comparing these results with past research. The purpose of the study was to investigate the anthropometry of cranial characteristics, using Google Forms circulated in college groups. The data will be valuable to forensic scientists, anatomists, and specialists in clinical, medico-legal, anthropological, and archaeological settings.
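The index itself is a one-line computation. A small sketch follows, with the mesocephalic class bounded at 75–79.9 as in the abstract; the other class boundaries follow standard anthropometric convention and are included for illustration.

```python
def cephalic_index(breadth_mm, length_mm):
    """Cephalic index = (maximum head breadth / maximum head length) * 100."""
    return breadth_mm / length_mm * 100.0

def classify(ci):
    """Conventional head-shape classes; the 75-79.9 mesocephalic
    range matches the one cited in the abstract."""
    if ci < 75.0:
        return "dolichocephalic"
    if ci < 80.0:
        return "mesocephalic"
    if ci < 85.0:
        return "brachycephalic"
    return "hyperbrachycephalic"
```

By this scheme the reported male mean of 81.24 falls in the brachycephalic class even though the mesocephalic class was the most frequent.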


Author(s):  
Z Kok ◽  
J T Duffy ◽  
S Chai ◽  
Y Jin

The demand to increase port throughput has driven container ships to travel relatively fast in shallow water whilst avoiding grounding; hence, there is a need for more accurate high-speed squat predictions. A study has been undertaken to determine the most suitable method for predicting container ship squat at relatively high speeds (Frh ≥ 0.5) in finite water depth (1.1 ≤ h/T ≤ 1.3). The accuracy of two novel self-propelled URANS CFD squat models is compared with that of readily available empirical squat prediction formulae. Comparison of the CFD and empirical predictions with benchmark data demonstrates that for very low water depth (h/T < 1.14) and Frh < 0.46, the Barrass II (1979), ICORELS (1980), and Millward (1992) formulae correlate best with the benchmark data for all cases investigated. However, at relatively high speeds (Frh ≥ 0.5), which are achievable in deeper water (h/T ≥ 1.14), most of the empirical formulae severely underestimate squat (by 7–49%), whereas the quasi-static CFD model presented has the best correlation. The changes in wave patterns and effective wake fraction with respect to h/T are also presented.
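For context, the depth Froude number and one of the cited empirical formulae can be sketched as follows. The ICORELS (1980) bow squat expression below is the commonly quoted form; treat the coefficient and the formula's exact scope as illustrative rather than as the paper's benchmark implementation.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def froude_depth(speed_ms, depth_m):
    """Depth Froude number Frh = V / sqrt(g * h)."""
    return speed_ms / math.sqrt(G * depth_m)

def squat_icorels(displacement_m3, lpp_m, speed_ms, depth_m):
    """ICORELS (1980) bow squat estimate (m), commonly quoted form:
    S = 2.4 * (displacement / Lpp^2) * Frh^2 / sqrt(1 - Frh^2).
    Only meaningful for subcritical speeds, Frh < 1."""
    frh = froude_depth(speed_ms, depth_m)
    return 2.4 * (displacement_m3 / lpp_m ** 2) * frh ** 2 / math.sqrt(1 - frh ** 2)
```

The 1/sqrt(1 - Frh^2) term is what makes such formulae grow steeply (and lose accuracy, per the abstract) as Frh approaches the high-speed regime studied here.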


Sensors ◽  
2021 ◽  
Vol 21 (24) ◽  
pp. 8281
Author(s):  
Rundong Yang ◽  
Kangfeng Zheng ◽  
Bin Wu ◽  
Chunhua Wu ◽  
Xiujuan Wang

Phishing has become one of the biggest and most effective cyber threats, causing hundreds of millions of dollars in losses and millions of data breaches every year. Current anti-phishing techniques require experts to extract phishing site features and rely on third-party services to detect phishing sites. These techniques have some limitations: first, extracting phishing features requires expertise and is time-consuming; second, the use of third-party services delays the detection of phishing sites. Hence, this paper proposes an integrated phishing website detection method based on convolutional neural networks (CNN) and random forests (RF). The method can predict the legitimacy of URLs without accessing the web content or using third-party services. The proposed technique uses character embedding to convert URLs into fixed-size matrices, extracts features at different levels using CNN models, classifies the multi-level features using multiple RF classifiers, and finally outputs prediction results using a winner-take-all approach. On our dataset, the proposed model achieved an accuracy of 99.35%. An accuracy of 99.26% was achieved on the benchmark data, much higher than that of the existing extreme model.
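The first stage, turning a raw URL into a fixed-size input for character embedding, can be sketched as below. The alphabet, `max_len`, and zero-padding scheme are hypothetical choices for illustration, not taken from the paper.

```python
import string

# Characters that commonly appear in URLs; id 0 is reserved for padding
# and for any character outside this alphabet.
ALPHABET = string.ascii_lowercase + string.digits + "-._~:/?#[]@!$&'()*+,;=%"
CHAR_TO_ID = {c: i + 1 for i, c in enumerate(ALPHABET)}

def url_to_ids(url, max_len=64):
    """Map a URL to a fixed-length sequence of character ids, truncating
    long URLs and zero-padding short ones.  A trainable embedding layer
    would then turn each id into a dense vector, giving the fixed-size
    matrix fed to the CNN."""
    ids = [CHAR_TO_ID.get(c, 0) for c in url.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))
```

Because this representation needs only the URL string, it matches the paper's claim of classifying without fetching page content.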


2021 ◽  
Author(s):  
Akinola Oladiran Adepetun ◽  
Bamidele Mustapha Oseni ◽  
Olusola Samuel Makinde ◽  
...  

In recent times, the Bayesian approach to the randomized response technique has been used for estimating the population proportion of respondents possessing sensitive attributes such as induced abortion, tax evasion, and shoplifting. This is done by combining suitable prior information about an unknown population parameter with the sample information to estimate that parameter. In this study, the possibility of using a transmuted Kumaraswamy prior is raised, yielding a new Bayes estimator of the population proportion of a sensitive attribute under Warner's randomized response technique. The proposed Bayes estimator with the transmuted Kumaraswamy prior is compared, in terms of mean square error, with existing Bayes estimators developed with simple beta and Kumaraswamy priors. The proposed estimator competes well with the existing estimators for some values of the population proportion. The performances of the Bayes estimators were also compared using benchmark data.
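For background, Warner's original randomized response estimator, the frequentist baseline that these Bayes estimators refine with prior information, is a short computation; the function names below are illustrative.

```python
def warner_estimate(yes_count, n, p):
    """Warner's randomized response estimator of the population proportion
    pi of a sensitive attribute.  Each respondent answers the sensitive
    statement with design probability p (p != 0.5) and its complement
    otherwise, so only the randomized 'yes' proportion is observed."""
    lam = yes_count / n  # observed proportion of 'yes' answers
    return (lam - (1.0 - p)) / (2.0 * p - 1.0)

def warner_variance(pi, n, p):
    """Sampling variance of Warner's estimator, lam*(1-lam) / (n*(2p-1)^2),
    where lam = pi*p + (1-pi)*(1-p) is the expected 'yes' proportion."""
    lam = pi * p + (1.0 - pi) * (1.0 - p)
    return lam * (1.0 - lam) / (n * (2.0 * p - 1.0) ** 2)
```

A Bayesian treatment replaces this point estimate with a posterior built from a prior (beta, Kumaraswamy, or the proposed transmuted Kumaraswamy) on pi.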


2021 ◽  
Vol 11 (22) ◽  
pp. 10977
Author(s):  
Youngjae Lee ◽  
Hyeyoung Park

In developing a few-shot classification model using deep networks, the limited number of samples in each class makes it difficult to exploit the statistical characteristics of the class distributions. In this paper, we propose a method to address this difficulty by combining a probabilistic similarity based on intra-class statistics with a metric-based few-shot classification model. Noting that the probabilistic similarity estimated from intra-class statistics and the classifier of conventional few-shot classification models share a common assumption about the class distributions, we propose applying the probabilistic similarity both to obtain the loss value for episodic learning of the embedding network and to classify unseen test data. By defining the probabilistic similarity as the probability density of the difference vector between two samples with the same class label, a more reliable estimate of the similarity can be obtained, especially in the case of a large number of classes. Through experiments on various benchmark data, we confirm that the probabilistic similarity improves classification performance, especially when the number of classes is large.
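The probabilistic similarity, i.e., the density of the difference vector between two embedded samples, can be sketched under a zero-mean isotropic Gaussian assumption. The isotropic form and shared variance are simplifications for illustration, not necessarily the paper's exact model.

```python
import math

def similarity(x, y, var):
    """Probabilistic similarity of two embedding vectors: the density of
    the difference vector x - y under a zero-mean isotropic Gaussian whose
    variance `var` would be estimated from intra-class difference vectors
    pooled across training classes."""
    d = len(x)
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * sq / var) / (2 * math.pi * var) ** (d / 2)
```

Pooling difference-vector statistics across all classes is what makes the estimate usable even when each individual class has only a few samples.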


Sensors ◽  
2021 ◽  
Vol 21 (22) ◽  
pp. 7468
Author(s):  
Yui-Kai Weng ◽  
Shih-Hsu Huang ◽  
Hsu-Yu Kao

In a CNN (convolutional neural network) accelerator, there is a need to exploit the sparsity of activation values to reduce memory traffic and power consumption. Accordingly, some research efforts have been devoted to skipping ineffectual computations (i.e., multiplications by zero). Unlike previous works, in this paper we point out the similarity of activation values: (1) within the same layer of a CNN model, most feature maps are either highly dense or highly sparse; (2) within the same layer of a CNN model, feature maps in different channels are often similar. Based on these two observations, we propose a block-based compression approach that utilizes both the sparsity and the similarity of activation values to further reduce the data volume. We also design an encoder, a decoder, and an indexing module to support the proposed approach. The encoder translates output activations into the proposed block-based compression format, while the decoder and the indexing module align nonzero values for effectual computations. Compared with previous works, benchmark data consistently show that the proposed approach greatly reduces both memory traffic and power consumption.
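A block-based format that stores a nonzero bitmask plus the packed nonzero values is one minimal way to picture the idea. This sketch illustrates only the sparsity side of the scheme (not the cross-channel similarity) and is not the paper's hardware format.

```python
def compress_block(block):
    """Compress one activation block into a (bitmask, values) pair:
    bit i of the mask marks a nonzero at position i, and only the
    nonzero values are stored."""
    mask = 0
    values = []
    for i, v in enumerate(block):
        if v != 0:
            mask |= 1 << i
            values.append(v)
    return mask, values

def decompress_block(mask, values, size):
    """Rebuild the dense block; in hardware, an indexing module would
    instead use the mask to route nonzeros to the right multipliers."""
    out = [0] * size
    it = iter(values)
    for i in range(size):
        if mask >> i & 1:
            out[i] = next(it)
    return out
```

For a highly sparse block, the mask plus packed values occupy far less memory than the dense block, which is the source of the traffic and power savings.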


Water ◽  
2021 ◽  
Vol 13 (22) ◽  
pp. 3159
Author(s):  
João Faria Feliciano ◽  
André Marques Arsénio ◽  
Joana Cassidy ◽  
Ana Rita Santos ◽  
Alice Ganhão

Digitalization and knowledge management in the water sector, and their impacts on performance, depend greatly on two factors: human capacity and digital maturity. To understand the link between performance, human capacity, and digital maturity, six AGS water retail utilities were compared with all Portuguese utilities using Portuguese benchmark data (2011–2019). AGS utilities achieved better results, including in compound performance indicators, which are taken as surrogates for digital maturity; these compound indicators were also found to correlate positively with better performance. In fact, AGS utilities show levels of non-revenue water (NRW) (<25%) below the national median (30–40%), with network replacement values similar to the national median (<0.5%). These results seem to imply that higher digital maturity can offset relatively low network replacement levels and keep NRW below the national average. Furthermore, two internally developed indicators, the personnel aging index and digital maturity, both increased over the period; the aging of the staff again raises questions about long-term sustainability. The improving performance and the slight increase in digital maturity can be attributed to group-wide capacity building and digitalization programs that bring together staff from all AGS utilities in year-long activities.

