The Coal Production Anomaly Detection Based on Data Mining

2012 ◽  
Vol 239-240 ◽  
pp. 744-748
Author(s):  
Guang Hui Wang ◽  
Ya Li Kuang ◽  
Zhang Guo Wang

Data mining is applied to anomaly detection in coal preparation. Five attributes from the production database serve as sample features for the anomaly detection model: raw coal ash, rapid ash, the yield of raw coal with density below 1.45, and the ash and actual yield of fine coal. Box-plot analysis is used to determine the normal value range of each of these five attributes. On this basis, identification models for anomaly detection in coal preparation are established using SVM and KNN. Receiver Operating Characteristic curve analysis shows that the SVM model judges abnormal production conditions in coal preparation more accurately.
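
A minimal sketch of the pipeline described above, assuming the five attributes live in a pandas DataFrame with a binary abnormal label; the column names, the 1.5 × IQR box-plot fence, and the model settings are illustrative assumptions rather than the paper's exact procedure:

```python
# Sketch: box-plot (IQR) ranges for the five attributes, then SVM vs. KNN anomaly
# classifiers compared by ROC AUC. Column names and the 1.5*IQR fence are assumed.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import roc_auc_score

ATTRS = ["raw_ash", "rapid_ash", "yield_below_1p45", "fine_ash", "fine_yield"]  # assumed names

def boxplot_ranges(df: pd.DataFrame) -> dict:
    """Normal value range per attribute from the box-plot (1.5 * IQR) rule."""
    ranges = {}
    for col in ATTRS:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        ranges[col] = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return ranges

def compare_models(df: pd.DataFrame) -> dict:
    """Train SVM and KNN on the five attributes and compare ROC AUC."""
    X, y = df[ATTRS].values, df["abnormal"].values   # 1 = abnormal production, 0 = normal
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
    svm = SVC(probability=True).fit(X_tr, y_tr)
    knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
    return {
        "SVM": roc_auc_score(y_te, svm.predict_proba(X_te)[:, 1]),
        "KNN": roc_auc_score(y_te, knn.predict_proba(X_te)[:, 1]),
    }
```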

2003 ◽  
Vol 17 (1) ◽  
pp. 109-114 ◽  
Author(s):  
S.A. Gansky

Knowledge Discovery and Data Mining (KDD) have become popular buzzwords. But what exactly is data mining? What are its strengths and limitations? Classic regression, artificial neural network (ANN), and classification and regression tree (CART) models are common KDD tools. Some recent reports (e.g., Kattan et al., 1998) show that ANN and CART models can perform better than classic regression models: CART models excel at covariate interactions, while ANN models excel at nonlinear covariates. Model prediction performance is examined with validation procedures and by evaluating concordance, sensitivity, specificity, and likelihood ratios. To aid interpretation, various plots of predicted probabilities are utilized, such as lift charts, receiver operating characteristic curves, and cumulative captured-response plots. A dental caries study is used as an illustrative example. This paper compares the performance of logistic regression with the KDD methods of CART and ANN in analyzing data from the Rochester caries study. With careful analysis, such as validation with sufficient sample size and the use of proper competitors, the problems of naïve KDD analyses (Schwarzer et al., 2000) can be avoided.
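
A hedged sketch of the kind of comparison described above, with scikit-learn's DecisionTreeClassifier and MLPClassifier standing in for CART and ANN; the caries feature matrix X and outcome y are assumed, and cross-validated ROC AUC plays the role of the validation step:

```python
# Sketch: logistic regression vs. CART-style and ANN-style models, compared by
# 5-fold cross-validated ROC AUC to guard against the over-optimism of naive
# KDD analyses. The data frame X and binary outcome y are assumed.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

def compare_kdd_models(X: pd.DataFrame, y: pd.Series) -> dict:
    models = {
        "logistic": LogisticRegression(max_iter=1000),
        "cart": DecisionTreeClassifier(max_depth=4),
        "ann": make_pipeline(StandardScaler(),
                             MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000)),
    }
    return {name: cross_val_score(m, X, y, cv=5, scoring="roc_auc").mean()
            for name, m in models.items()}
```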


Author(s):  
Ahmed Moumena

The receiver operating characteristic (ROC) curve is an important technique for organizing classifiers and visualizing their performance in tactical systems in the presence of a jamming signal. ROC curves are commonly used to evaluate the performance of classifiers for anomaly detection. This paper surveys ROC analysis for classifier-based anomaly detection and its use in research; in recent years, ROC curves have been increasingly adopted in the machine learning and data mining research communities. The survey defines the underlying anomaly detection theory and explains what a ROC curve is, how to use one, and when ROC curves should be used.
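
For concreteness, a small self-contained sketch of the basic object the survey discusses: a ROC curve built by sweeping the decision threshold over anomaly scores. The labels and scores below are toy values, not data from the paper.

```python
# Sketch: build a ROC curve from anomaly scores by sweeping the threshold.
# Score ties are ignored for brevity; labels and scores are toy values.
import numpy as np

def roc_curve_points(labels: np.ndarray, scores: np.ndarray):
    """Return (FPR, TPR) arrays obtained by lowering the decision threshold."""
    order = np.argsort(-scores)                 # most anomalous (highest score) first
    labels = labels[order]
    tps = np.cumsum(labels)                     # true positives at each cut-off
    fps = np.cumsum(1 - labels)                 # false positives at each cut-off
    tpr = np.concatenate(([0.0], tps / labels.sum()))
    fpr = np.concatenate(([0.0], fps / (len(labels) - labels.sum())))
    return fpr, tpr

labels = np.array([1, 0, 1, 1, 0, 0, 0, 1])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2])
fpr, tpr = roc_curve_points(labels, scores)
# Area under the curve by the trapezoidal rule.
auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
```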


Author(s):  
Gunjan Saraogi ◽  
Deepa Gupta ◽  
Lavanya Sharma ◽  
Ajay Rana

Background: Backorders are a common anomaly affecting the supply chain and logistics, sales, customer service, and manufacturing, and they often lead to low sales and low customer satisfaction. A predictive model can identify which products are most likely to go on backorder, giving the organization information and time to adjust and thereby act to maximize its profit. Objective: To address the issue of predicting backorders, this paper proposes an unsupervised approach to backorder prediction using a deep autoencoder. Method: Artificial intelligence paradigms are investigated in order to introduce a predictive model for this imbalanced-data problem, where the number of products going on backorder is rare. Result: Unsupervised anomaly detection using deep autoencoders shows better area under the Receiver Operating Characteristic and precision-recall curves than supervised classification techniques combined with resampling techniques for imbalanced data. Conclusion: We demonstrate that unsupervised anomaly detection methods, specifically deep autoencoders, can be used to learn a good representation of the data. The method can serve as a predictive model for inventory management and help reduce the bullwhip effect, raise customer satisfaction, and improve operational management in the organization. This technology is expected to create the sentient supply chain of the future, able to feel, perceive, and react to situations at an extraordinarily granular level.
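
A hedged Keras sketch of the general approach: a deep autoencoder trained on (mostly) normal inventory records whose reconstruction error serves as the backorder anomaly score. Layer sizes, training settings, and the data itself are assumptions, not the paper's configuration.

```python
# Sketch: unsupervised backorder detection via deep autoencoder reconstruction error.
# Architecture and hyperparameters are illustrative assumptions.
import numpy as np
import tensorflow as tf
from sklearn.metrics import roc_auc_score, average_precision_score

def build_autoencoder(n_features: int) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(8, activation="relu"),      # bottleneck representation
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(n_features, activation=None),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

def backorder_scores(X_train_normal: np.ndarray, X_test: np.ndarray) -> np.ndarray:
    """High reconstruction error on a record flags a likely backorder."""
    ae = build_autoencoder(X_train_normal.shape[1])
    ae.fit(X_train_normal, X_train_normal, epochs=20, batch_size=256, verbose=0)
    recon = ae.predict(X_test, verbose=0)
    return np.mean((X_test - recon) ** 2, axis=1)

# Evaluation mirrors the reported metrics: ROC AUC and area under the precision-recall curve.
# scores = backorder_scores(X_normal, X_test)
# print(roc_auc_score(y_test, scores), average_precision_score(y_test, scores))
```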


Sensors ◽  
2021 ◽  
Vol 21 (12) ◽  
pp. 4237
Author(s):  
Hoon Ko ◽  
Kwangcheol Rim ◽  
Isabel Praça

The biggest problem with conventional feature-based anomaly signal detection is that it is difficult to use in real time and requires processing of network signals. Furthermore, analyzing network signals in real time requires a vast amount of processing for each signal, as each protocol contains various pieces of information. This paper proposes anomaly detection that analyzes the relationship of each feature to the anomaly detection model. The model analyzes the anomaly of network signals based on anomaly feature detection. The selected features for anomaly detection do not require constant network signal updates or real-time processing of these signals. When the selected features are found in a received signal, the signal is registered as a potential anomaly signal and is then steadily monitored until it is determined to be either an anomalous or a normal signal. In terms of results, the model determined anomalies with 99.7% (0.997) accuracy for f(4)(S0), and in the case of f(4)(REJ) it received 11,233 signals and judged them normal or anomalous (171 anomalies) with an accuracy of 98.7% (0.987).
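
A minimal sketch of the register-and-monitor idea described above; the watched connection flags (such as S0 and REJ), the decision window, and the thresholds are assumptions inspired by the abstract, not the authors' implementation.

```python
# Sketch: a signal whose selected features match a watch-list is registered as a
# potential anomaly and monitored until enough observations support a decision.
# Flags, window size, and ratio are assumed values.
from collections import defaultdict

WATCHED_FLAGS = {"S0", "REJ"}     # selected features that trigger registration
DECIDE_AFTER = 4                  # observations needed before a verdict
ANOMALY_RATIO = 0.5               # fraction of suspicious observations to call it an anomaly

class SignalMonitor:
    def __init__(self):
        self.history = defaultdict(list)          # source id -> list of 0/1 suspicion marks

    def observe(self, src: str, flag: str) -> str:
        suspicious = int(flag in WATCHED_FLAGS)
        if not suspicious and src not in self.history:
            return "normal"                       # never registered, nothing to monitor
        marks = self.history[src]
        marks.append(suspicious)
        if len(marks) < DECIDE_AFTER:
            return "monitoring"                   # registered as a potential anomaly
        verdict = "anomaly" if sum(marks) / len(marks) >= ANOMALY_RATIO else "normal"
        del self.history[src]                     # decision made; stop monitoring
        return verdict
```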


2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Lei Xi ◽  
Chunqing Yang

Objectives: The main aim of the present study was to assess the diagnostic value of alpha-l-fucosidase (AFU) for hepatocellular carcinoma (HCC). Methods: Studies that explored the diagnostic value of AFU in HCC were searched in EMBASE, SCI, and PUBMED. The sensitivity, specificity, and diagnostic odds ratio (DOR) describing the accuracy of serum AFU in the diagnosis of HCC were pooled. The methodological quality of each article was evaluated with QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies 2). Receiver operating characteristic (ROC) curve analysis was performed. Statistical analysis was conducted using Review Manager 5 and Open Meta-analyst. Results: Eighteen studies were selected. The pooled estimates for AFU vs. α-fetoprotein (AFP) in the diagnosis of HCC across the 18 studies were as follows: sensitivity of 0.7352 (0.6827, 0.7818) vs. 0.7501 (0.6725, 0.8144), specificity of 0.7681 (0.6946, 0.8283) vs. 0.8208 (0.7586, 0.8697), DOR of 7.974 (5.302, 11.993) vs. 13.401 (8.359, 21.483), and area under the curve (AUC) of 0.7968 vs. 0.8451, respectively. Conclusions: AFU is comparable to AFP for the diagnosis of HCC.
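
As a simplified illustration of the pooled quantities reported above (sensitivity, specificity, DOR), the sketch below pools hypothetical per-study 2×2 counts by simple summation; the actual analysis used Review Manager 5 and Open Meta-analyst, and the counts are invented solely for the example.

```python
# Sketch: pooled sensitivity, specificity, and diagnostic odds ratio from per-study
# 2x2 tables. Counts are purely hypothetical; pooling by summation is a simplification.
import numpy as np

# Each row: (true positives, false negatives, false positives, true negatives) for AFU.
studies = np.array([
    [45, 15, 12, 60],
    [80, 30, 20, 90],
    [55, 20, 18, 75],
])

tp, fn, fp, tn = studies.sum(axis=0)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
dor = (tp * tn) / (fp * fn)            # diagnostic odds ratio
print(f"pooled sensitivity={sensitivity:.3f}, specificity={specificity:.3f}, DOR={dor:.2f}")
```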


2021 ◽  
pp. 096228022199595
Author(s):  
Yalda Zarnegarnia ◽  
Shari Messinger

Receiver operating characteristic curves are widely used in medical research to illustrate biomarker performance in binary classification, particularly with respect to disease or health status. Study designs that include related subjects, such as siblings, usually have common environmental or genetic factors giving rise to correlated biomarker data. The design could be used to improve detection of biomarkers informative of increased risk, allowing initiation of treatment to stop or slow disease progression. Available methods for receiver operating characteristic construction do not take advantage of correlation inherent in this design to improve biomarker performance. This paper will briefly review some developed methods for receiver operating characteristic curve estimation in settings with correlated data from case–control designs and will discuss the limitations of current methods for analyzing correlated familial paired data. An alternative approach using conditional receiver operating characteristic curves will be demonstrated. The proposed approach will use information about correlation among biomarker values, producing conditional receiver operating characteristic curves that evaluate the ability of a biomarker to discriminate between affected and unaffected subjects in a familial paired design.
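
A hedged sketch of the conditional construction under a simple bivariate-normal working model, where the proband's biomarker is correlated with a sibling's value; the means, standard deviation, and correlations are hypothetical, and this illustrates the idea of a conditional ROC curve rather than the proposed estimator.

```python
# Sketch: conditional ROC curve for a proband biomarker given a sibling's value,
# under an assumed bivariate-normal working model. All parameter values are
# hypothetical; different correlations in affected and unaffected probands make
# the conditional curve genuinely depend on the sibling value.
import numpy as np
from scipy.stats import norm

def conditional_roc(z_sibling, mu_aff=2.0, mu_unaff=0.0, mu_sib=0.5, sigma=1.0,
                    rho_aff=0.6, rho_unaff=0.3):
    """FPR/TPR for thresholding the proband biomarker, conditional on a sibling's value."""
    thresholds = np.linspace(-4.0, 6.0, 200)
    mean_aff = mu_aff + rho_aff * (z_sibling - mu_sib)        # conditional mean, affected
    mean_unaff = mu_unaff + rho_unaff * (z_sibling - mu_sib)  # conditional mean, unaffected
    sd_aff = sigma * np.sqrt(1 - rho_aff**2)
    sd_unaff = sigma * np.sqrt(1 - rho_unaff**2)
    tpr = 1 - norm.cdf(thresholds, loc=mean_aff, scale=sd_aff)
    fpr = 1 - norm.cdf(thresholds, loc=mean_unaff, scale=sd_unaff)
    return fpr, tpr

fpr, tpr = conditional_roc(z_sibling=1.2)
order = np.argsort(fpr)                                       # ascending FPR for integration
auc = float(np.sum(np.diff(fpr[order]) * (tpr[order][1:] + tpr[order][:-1]) / 2))
```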


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Joffrey L. Leevy ◽  
John Hancock ◽  
Richard Zuech ◽  
Taghi M. Khoshgoftaar

Machine learning algorithms efficiently trained on intrusion detection datasets can detect network traffic capable of jeopardizing an information system. In this study, we use the CSE-CIC-IDS2018 dataset to investigate ensemble feature selection on the performance of seven classifiers. CSE-CIC-IDS2018 is big data (about 16,000,000 instances), publicly available, modern, and covers a wide range of realistic attack types. Our contribution is centered around answers to three research questions. The first question is, “Does feature selection impact performance of classifiers in terms of Area Under the Receiver Operating Characteristic Curve (AUC) and F1-score?” The second question is, “Does including the Destination_Port categorical feature significantly impact performance of LightGBM and Catboost in terms of AUC and F1-score?” The third question is, “Does the choice of classifier: Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), Logistic Regression (LR), Catboost, LightGBM, or XGBoost, significantly impact performance in terms of AUC and F1-score?” These research questions are all answered in the affirmative and provide valuable, practical information for the development of an efficient intrusion detection model. To the best of our knowledge, we are the first to use an ensemble feature selection technique with the CSE-CIC-IDS2018 dataset.
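
A hedged sketch of ensemble feature selection followed by classifier comparison on AUC and F1-score, in the spirit of the study; the two rankers (mutual information and random-forest importance), the cut-off k, and the model choices are illustrative assumptions, and loading CSE-CIC-IDS2018 itself is omitted.

```python
# Sketch: ensemble feature selection (average rank over two importance measures),
# then classifier comparison by ROC AUC and F1. Rankers, k, and models are assumed.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

def ensemble_select(X: pd.DataFrame, y: np.ndarray, k: int = 20) -> list:
    """Keep the k features with the best average rank over two importance measures."""
    mi_rank = pd.Series(mutual_info_classif(X, y), index=X.columns).rank(ascending=False)
    rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0).fit(X, y)
    rf_rank = pd.Series(rf.feature_importances_, index=X.columns).rank(ascending=False)
    return ((mi_rank + rf_rank) / 2).nsmallest(k).index.tolist()

def compare_classifiers(X: pd.DataFrame, y: np.ndarray) -> dict:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
    feats = ensemble_select(X_tr, y_tr)            # select features on the training split only
    models = {"DT": DecisionTreeClassifier(), "RF": RandomForestClassifier(n_estimators=100),
              "NB": GaussianNB(), "LR": LogisticRegression(max_iter=1000)}
    results = {}
    for name, model in models.items():
        model.fit(X_tr[feats], y_tr)
        scores = model.predict_proba(X_te[feats])[:, 1]
        results[name] = {"AUC": roc_auc_score(y_te, scores),
                         "F1": f1_score(y_te, model.predict(X_te[feats]))}
    return results
```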

