The Coal Production Anomaly Detection Based on Data Mining

2012 ◽  
Vol 239-240 ◽  
pp. 744-748
Author(s):  
Guang Hui Wang ◽  
Ya Li Kuang ◽  
Zhang Guo Wang

Data mining is applied to anomaly detection in coal preparation. Five attributes from the production database serve as sample features for the anomaly detection model: raw coal ash, rapid ash, the yield of raw coal with density below 1.45, and the ash and actual yield of fine coal. Box-plot analysis is used to determine the normal value range of each of these five attributes. On this basis, identification models for anomaly detection in coal preparation are established using SVM and KNN. Receiver Operating Characteristic curve analysis shows that the SVM model judges abnormal production conditions in coal preparation more accurately.
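
A minimal sketch of the pipeline described above, assuming the five attributes live in a pandas DataFrame with a binary abnormal label; the column names, the 1.5 × IQR box-plot fence, and the model settings are illustrative assumptions rather than the paper's exact procedure:

```python
# Sketch: box-plot (IQR) ranges for the five attributes, then SVM vs. KNN anomaly
# classifiers compared by ROC AUC. Column names and the 1.5*IQR fence are assumed.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import roc_auc_score

ATTRS = ["raw_ash", "rapid_ash", "yield_below_1p45", "fine_ash", "fine_yield"]  # assumed names

def boxplot_ranges(df: pd.DataFrame) -> dict:
    """Normal value range per attribute from the box-plot (1.5 * IQR) rule."""
    ranges = {}
    for col in ATTRS:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        ranges[col] = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return ranges

def compare_models(df: pd.DataFrame) -> dict:
    """Train SVM and KNN on the five attributes and compare ROC AUC."""
    X, y = df[ATTRS].values, df["abnormal"].values   # 1 = abnormal production, 0 = normal
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
    svm = SVC(probability=True).fit(X_tr, y_tr)
    knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
    return {
        "SVM": roc_auc_score(y_te, svm.predict_proba(X_te)[:, 1]),
        "KNN": roc_auc_score(y_te, knn.predict_proba(X_te)[:, 1]),
    }
```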

2003 ◽  
Vol 17 (1) ◽  
pp. 109-114 ◽  
Author(s):  
S.A. Gansky

Knowledge Discovery and Data Mining (KDD) have become popular buzzwords. But what exactly is data mining? What are its strengths and limitations? Classic regression, artificial neural network (ANN), and classification and regression tree (CART) models are common KDD tools. Some recent reports (e.g., Kattan et al., 1998) show that ANN and CART models can perform better than classic regression models: CART models excel at covariate interactions, while ANN models excel at nonlinear covariates. Model prediction performance is examined with validation procedures and by evaluating concordance, sensitivity, specificity, and likelihood ratios. To aid interpretation, various plots of predicted probabilities are utilized, such as lift charts, receiver operating characteristic curves, and cumulative captured-response plots. A dental caries study is used as an illustrative example. This paper compares the performance of logistic regression with the KDD methods of CART and ANN in analyzing data from the Rochester caries study. With careful analysis, such as validation with sufficient sample size and the use of proper competitors, the problems of naïve KDD analyses (Schwarzer et al., 2000) can be avoided.
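
A hedged sketch of the kind of comparison described above, with scikit-learn's DecisionTreeClassifier and MLPClassifier standing in for CART and ANN; the caries feature matrix X and outcome y are assumed, and cross-validated ROC AUC plays the role of the validation step:

```python
# Sketch: logistic regression vs. CART-style and ANN-style models, compared by
# 5-fold cross-validated ROC AUC to guard against the over-optimism of naive
# KDD analyses. The data frame X and binary outcome y are assumed.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

def compare_kdd_models(X: pd.DataFrame, y: pd.Series) -> dict:
    models = {
        "logistic": LogisticRegression(max_iter=1000),
        "cart": DecisionTreeClassifier(max_depth=4),
        "ann": make_pipeline(StandardScaler(),
                             MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000)),
    }
    return {name: cross_val_score(m, X, y, cv=5, scoring="roc_auc").mean()
            for name, m in models.items()}
```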


Author(s):  
Ahmed Moumena

The receiver operating characteristic (ROC) curve is an important technique for organizing classifiers and visualizing their performance in tactical systems in the presence of a jamming signal. ROC curves are commonly used to evaluate the performance of classifiers for anomaly detection. This paper surveys ROC analysis for classifier-based anomaly detection and its use in research; in recent years, ROC curves have been increasingly adopted in the machine learning and data mining research communities. The survey defines the underlying anomaly detection theory and explains what a ROC curve is, how to use one, and when ROC curves should be used.
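
For concreteness, a small self-contained sketch of the basic object the survey discusses: a ROC curve built by sweeping the decision threshold over anomaly scores. The labels and scores below are toy values, not data from the paper.

```python
# Sketch: build a ROC curve from anomaly scores by sweeping the threshold.
# Score ties are ignored for brevity; labels and scores are toy values.
import numpy as np

def roc_curve_points(labels: np.ndarray, scores: np.ndarray):
    """Return (FPR, TPR) arrays obtained by lowering the decision threshold."""
    order = np.argsort(-scores)                 # most anomalous (highest score) first
    labels = labels[order]
    tps = np.cumsum(labels)                     # true positives at each cut-off
    fps = np.cumsum(1 - labels)                 # false positives at each cut-off
    tpr = np.concatenate(([0.0], tps / labels.sum()))
    fpr = np.concatenate(([0.0], fps / (len(labels) - labels.sum())))
    return fpr, tpr

labels = np.array([1, 0, 1, 1, 0, 0, 0, 1])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2])
fpr, tpr = roc_curve_points(labels, scores)
# Area under the curve by the trapezoidal rule.
auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
```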


Author(s):  
Gunjan Saraogi ◽  
Deepa Gupta ◽  
Lavanya Sharma ◽  
Ajay Rana

Background: Backorders are a common anomaly affecting the supply chain and logistics, sales, customer service, and manufacturing, and they often lead to low sales and low customer satisfaction. A predictive model can identify which products are most likely to go on backorder, giving the organization information and time to adjust and thereby act to maximize its profit. Objective: To address the issue of predicting backorders, this paper proposes an unsupervised approach to backorder prediction using a deep autoencoder. Method: Artificial intelligence paradigms are investigated in order to introduce a predictive model for this imbalanced-data problem, where the number of products going on backorder is rare. Result: Unsupervised anomaly detection using deep autoencoders shows better area under the Receiver Operating Characteristic and precision-recall curves than supervised classification techniques combined with resampling techniques for imbalanced data. Conclusion: We demonstrate that unsupervised anomaly detection methods, specifically deep autoencoders, can be used to learn a good representation of the data. The method can serve as a predictive model for inventory management and help reduce the bullwhip effect, raise customer satisfaction, and improve operational management in the organization. This technology is expected to create the sentient supply chain of the future, able to feel, perceive, and react to situations at an extraordinarily granular level.
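
A hedged Keras sketch of the general approach: a deep autoencoder trained on (mostly) normal inventory records whose reconstruction error serves as the backorder anomaly score. Layer sizes, training settings, and the data itself are assumptions, not the paper's configuration.

```python
# Sketch: unsupervised backorder detection via deep autoencoder reconstruction error.
# Architecture and hyperparameters are illustrative assumptions.
import numpy as np
import tensorflow as tf
from sklearn.metrics import roc_auc_score, average_precision_score

def build_autoencoder(n_features: int) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(8, activation="relu"),      # bottleneck representation
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(n_features, activation=None),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

def backorder_scores(X_train_normal: np.ndarray, X_test: np.ndarray) -> np.ndarray:
    """High reconstruction error on a record flags a likely backorder."""
    ae = build_autoencoder(X_train_normal.shape[1])
    ae.fit(X_train_normal, X_train_normal, epochs=20, batch_size=256, verbose=0)
    recon = ae.predict(X_test, verbose=0)
    return np.mean((X_test - recon) ** 2, axis=1)

# Evaluation mirrors the reported metrics: ROC AUC and area under the precision-recall curve.
# scores = backorder_scores(X_normal, X_test)
# print(roc_auc_score(y_test, scores), average_precision_score(y_test, scores))
```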


Sensors ◽  
2021 ◽  
Vol 21 (12) ◽  
pp. 4237
Author(s):  
Hoon Ko ◽  
Kwangcheol Rim ◽  
Isabel Praça

The biggest problem with conventional feature-based anomaly signal detection is that it is difficult to use in real time and requires processing of network signals. Furthermore, analyzing network signals in real time requires a vast amount of processing for each signal, as each protocol contains various pieces of information. This paper proposes anomaly detection that analyzes the relationship of each feature to the anomaly detection model. The model analyzes the anomaly of network signals based on anomaly feature detection. The selected features for anomaly detection do not require constant network signal updates or real-time processing of these signals. When the selected features are found in a received signal, the signal is registered as a potential anomaly signal and is then steadily monitored until it is determined to be either an anomalous or a normal signal. In terms of results, the model determined anomalies with 99.7% (0.997) accuracy for f(4)(S0), and in the case of f(4)(REJ) it received 11,233 signals and judged them normal or anomalous (171 anomalies) with an accuracy of 98.7% (0.987).
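
A minimal sketch of the register-and-monitor idea described above; the watched connection flags (such as S0 and REJ), the decision window, and the thresholds are assumptions inspired by the abstract, not the authors' implementation.

```python
# Sketch: a signal whose selected features match a watch-list is registered as a
# potential anomaly and monitored until enough observations support a decision.
# Flags, window size, and ratio are assumed values.
from collections import defaultdict

WATCHED_FLAGS = {"S0", "REJ"}     # selected features that trigger registration
DECIDE_AFTER = 4                  # observations needed before a verdict
ANOMALY_RATIO = 0.5               # fraction of suspicious observations to call it an anomaly

class SignalMonitor:
    def __init__(self):
        self.history = defaultdict(list)          # source id -> list of 0/1 suspicion marks

    def observe(self, src: str, flag: str) -> str:
        suspicious = int(flag in WATCHED_FLAGS)
        if not suspicious and src not in self.history:
            return "normal"                       # never registered, nothing to monitor
        marks = self.history[src]
        marks.append(suspicious)
        if len(marks) < DECIDE_AFTER:
            return "monitoring"                   # registered as a potential anomaly
        verdict = "anomaly" if sum(marks) / len(marks) >= ANOMALY_RATIO else "normal"
        del self.history[src]                     # decision made; stop monitoring
        return verdict
```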


2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Lei Xi ◽  
Chunqing Yang

Objectives: The main aim of the present study was to assess the diagnostic value of alpha-l-fucosidase (AFU) for hepatocellular carcinoma (HCC). Methods: Studies that explored the diagnostic value of AFU in HCC were searched in EMBASE, SCI, and PUBMED. The sensitivity, specificity, and diagnostic odds ratio (DOR) describing the accuracy of serum AFU in the diagnosis of HCC were pooled. The methodological quality of each article was evaluated with QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies 2). Receiver operating characteristic (ROC) curve analysis was performed. Statistical analysis was conducted using Review Manager 5 and Open Meta-analyst. Results: Eighteen studies were selected. The pooled estimates for AFU vs. α-fetoprotein (AFP) in the diagnosis of HCC across the 18 studies were as follows: sensitivity of 0.7352 (0.6827, 0.7818) vs. 0.7501 (0.6725, 0.8144), specificity of 0.7681 (0.6946, 0.8283) vs. 0.8208 (0.7586, 0.8697), DOR of 7.974 (5.302, 11.993) vs. 13.401 (8.359, 21.483), and area under the curve (AUC) of 0.7968 vs. 0.8451, respectively. Conclusions: AFU is comparable to AFP for the diagnosis of HCC.
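
As a simplified illustration of the pooled quantities reported above (sensitivity, specificity, DOR), the sketch below pools hypothetical per-study 2×2 counts by simple summation; the actual analysis used Review Manager 5 and Open Meta-analyst, and the counts are invented solely for the example.

```python
# Sketch: pooled sensitivity, specificity, and diagnostic odds ratio from per-study
# 2x2 tables. Counts are purely hypothetical; pooling by summation is a simplification.
import numpy as np

# Each row: (true positives, false negatives, false positives, true negatives) for AFU.
studies = np.array([
    [45, 15, 12, 60],
    [80, 30, 20, 90],
    [55, 20, 18, 75],
])

tp, fn, fp, tn = studies.sum(axis=0)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
dor = (tp * tn) / (fp * fn)            # diagnostic odds ratio
print(f"pooled sensitivity={sensitivity:.3f}, specificity={specificity:.3f}, DOR={dor:.2f}")
```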


2021 ◽  
pp. 096228022199595
Author(s):  
Yalda Zarnegarnia ◽  
Shari Messinger

Receiver operating characteristic curves are widely used in medical research to illustrate biomarker performance in binary classification, particularly with respect to disease or health status. Study designs that include related subjects, such as siblings, usually have common environmental or genetic factors giving rise to correlated biomarker data. The design could be used to improve detection of biomarkers informative of increased risk, allowing initiation of treatment to stop or slow disease progression. Available methods for receiver operating characteristic construction do not take advantage of correlation inherent in this design to improve biomarker performance. This paper will briefly review some developed methods for receiver operating characteristic curve estimation in settings with correlated data from case–control designs and will discuss the limitations of current methods for analyzing correlated familial paired data. An alternative approach using conditional receiver operating characteristic curves will be demonstrated. The proposed approach will use information about correlation among biomarker values, producing conditional receiver operating characteristic curves that evaluate the ability of a biomarker to discriminate between affected and unaffected subjects in a familial paired design.
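
A hedged sketch of the conditional construction under a simple bivariate-normal working model, where the proband's biomarker is correlated with a sibling's value; the means, standard deviation, and correlations are hypothetical, and this illustrates the idea of a conditional ROC curve rather than the proposed estimator.

```python
# Sketch: conditional ROC curve for a proband biomarker given a sibling's value,
# under an assumed bivariate-normal working model. All parameter values are
# hypothetical; different correlations in affected and unaffected probands make
# the conditional curve genuinely depend on the sibling value.
import numpy as np
from scipy.stats import norm

def conditional_roc(z_sibling, mu_aff=2.0, mu_unaff=0.0, mu_sib=0.5, sigma=1.0,
                    rho_aff=0.6, rho_unaff=0.3):
    """FPR/TPR for thresholding the proband biomarker, conditional on a sibling's value."""
    thresholds = np.linspace(-4.0, 6.0, 200)
    mean_aff = mu_aff + rho_aff * (z_sibling - mu_sib)        # conditional mean, affected
    mean_unaff = mu_unaff + rho_unaff * (z_sibling - mu_sib)  # conditional mean, unaffected
    sd_aff = sigma * np.sqrt(1 - rho_aff**2)
    sd_unaff = sigma * np.sqrt(1 - rho_unaff**2)
    tpr = 1 - norm.cdf(thresholds, loc=mean_aff, scale=sd_aff)
    fpr = 1 - norm.cdf(thresholds, loc=mean_unaff, scale=sd_unaff)
    return fpr, tpr

fpr, tpr = conditional_roc(z_sibling=1.2)
order = np.argsort(fpr)                                       # ascending FPR for integration
auc = float(np.sum(np.diff(fpr[order]) * (tpr[order][1:] + tpr[order][:-1]) / 2))
```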


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Joffrey L. Leevy ◽  
John Hancock ◽  
Richard Zuech ◽  
Taghi M. Khoshgoftaar

Machine learning algorithms efficiently trained on intrusion detection datasets can detect network traffic capable of jeopardizing an information system. In this study, we use the CSE-CIC-IDS2018 dataset to investigate ensemble feature selection on the performance of seven classifiers. CSE-CIC-IDS2018 is big data (about 16,000,000 instances), publicly available, modern, and covers a wide range of realistic attack types. Our contribution is centered around answers to three research questions. The first question is, “Does feature selection impact performance of classifiers in terms of Area Under the Receiver Operating Characteristic Curve (AUC) and F1-score?” The second question is, “Does including the Destination_Port categorical feature significantly impact performance of LightGBM and Catboost in terms of AUC and F1-score?” The third question is, “Does the choice of classifier: Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), Logistic Regression (LR), Catboost, LightGBM, or XGBoost, significantly impact performance in terms of AUC and F1-score?” These research questions are all answered in the affirmative and provide valuable, practical information for the development of an efficient intrusion detection model. To the best of our knowledge, we are the first to use an ensemble feature selection technique with the CSE-CIC-IDS2018 dataset.
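
A hedged sketch of ensemble feature selection followed by classifier comparison on AUC and F1-score, in the spirit of the study; the two rankers (mutual information and random-forest importance), the cut-off k, and the model choices are illustrative assumptions, and loading CSE-CIC-IDS2018 itself is omitted.

```python
# Sketch: ensemble feature selection (average rank over two importance measures),
# then classifier comparison by ROC AUC and F1. Rankers, k, and models are assumed.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

def ensemble_select(X: pd.DataFrame, y: np.ndarray, k: int = 20) -> list:
    """Keep the k features with the best average rank over two importance measures."""
    mi_rank = pd.Series(mutual_info_classif(X, y), index=X.columns).rank(ascending=False)
    rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0).fit(X, y)
    rf_rank = pd.Series(rf.feature_importances_, index=X.columns).rank(ascending=False)
    return ((mi_rank + rf_rank) / 2).nsmallest(k).index.tolist()

def compare_classifiers(X: pd.DataFrame, y: np.ndarray) -> dict:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
    feats = ensemble_select(X_tr, y_tr)            # select features on the training split only
    models = {"DT": DecisionTreeClassifier(), "RF": RandomForestClassifier(n_estimators=100),
              "NB": GaussianNB(), "LR": LogisticRegression(max_iter=1000)}
    results = {}
    for name, model in models.items():
        model.fit(X_tr[feats], y_tr)
        scores = model.predict_proba(X_te[feats])[:, 1]
        results[name] = {"AUC": roc_auc_score(y_te, scores),
                         "F1": f1_score(y_te, model.predict(X_te[feats]))}
    return results
```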

