Data Mining to Identify Anomalies in Public Procurement Rating Parameters

Electronics ◽  
2021 ◽  
Vol 10 (22) ◽  
pp. 2873
Author(s):  
Yeferson Torres-Berru ◽  
Vivian F. López Batista

The awarding of public procurement contracts is one of the main sources of corruption in government, because in many cases contracts are awarded to suppliers agreed upon in advance (favouritism). The qualification parameters of a process play a fundamental role in this selection: when they are manipulated, bidders with high prices win, to the detriment of the state. This study identifies processes with anomalies and builds a model for detecting possible corruption in the assignment of process qualification parameters in public procurement. A multi-phase model was used (identification of anomalies, then generation of the detection model) that combines several algorithms: K-Means clustering, Self-Organizing Maps (SOM), Support Vector Machines (SVM) and Principal Component Analysis (PCA). SOM was used to determine the level of influence of each rating parameter, K-Means to group the processes by clustering, and semi-supervised learning with SVM and PCA to generate a model that detects anomalies in the processes. A case study yielded four groups of processes, notably a "null economic offer" group in which the values assigned to the economic offer do not exceed 1% and greater weight is given to other qualification parameters, which include direct contracting. The processes in this cluster are considered anomalous. Following this methodology, a semi-supervised learning model for anomaly detection is built that achieves an accuracy of 95%, making it possible to detect procedures intended to benefit a particular supplier through the assignment of qualification parameters.
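The clustering phase can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the three rating parameters, their weights and the process ids are hypothetical, k = 2 is used for the toy data (the case study itself found four groups), and only the K-Means step plus the 1% economic-offer rule are shown; the SOM, PCA and SVM stages are omitted.

```python
import math

# Hypothetical rating-parameter weights per process:
# (economic offer, experience, technical proposal); each row sums to 1.
PROCESSES = {
    "P1": (0.70, 0.20, 0.10),
    "P2": (0.60, 0.25, 0.15),
    "P3": (0.65, 0.20, 0.15),
    "P4": (0.005, 0.60, 0.395),
    "P5": (0.01, 0.50, 0.49),
    "P6": (0.55, 0.30, 0.15),
}


def init_centroids(points, k):
    """Deterministic farthest-point initialisation (avoids depending on a random seed)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, iters=100):
    """Lloyd's K-Means: alternate nearest-centroid assignment and mean update."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def flag_null_economic_offer(processes, k=2, threshold=0.01):
    """Return ids of processes in clusters whose mean economic-offer weight is <= threshold."""
    ids = list(processes)
    points = [processes[i] for i in ids]
    centroids, labels = kmeans(points, k)
    flagged = {c for c in range(k) if centroids[c][0] <= threshold}
    return [i for i, lab in zip(ids, labels) if lab in flagged]


print(flag_null_economic_offer(PROCESSES))  # ['P4', 'P5']
```

Farthest-point initialisation is used only to make the toy run reproducible; random or k-means++ seeding would be the more usual choice on real data.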

2020 ◽  
Vol 15 ◽  
Author(s):  
Shuwen Zhang ◽  
Qiang Su ◽  
Qin Chen

Abstract: Major animal diseases pose a great threat to animal husbandry and to human beings. With deepening globalization and increasingly abundant data resources, predicting and analyzing animal diseases with big data is becoming ever more important. Machine learning focuses on enabling computers to learn from data and to use what they have learned to analyze and predict. This paper first introduces the animal epidemic situation and machine learning, then briefly describes applications of machine learning to the analysis and prediction of animal diseases. Machine learning is mainly divided into supervised and unsupervised learning. Supervised methods include support vector machines, naive Bayes, decision trees, random forests, logistic regression, artificial neural networks, deep learning and AdaBoost. Unsupervised methods include the expectation-maximization (EM) algorithm, principal component analysis, hierarchical clustering and maxent. The discussion aims to give readers a clearer understanding of machine learning and of its prospects for application to animal diseases.
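The expectation-maximization (EM) algorithm listed among the unsupervised methods alternates an E-step, which computes how strongly each latent component explains each observation, with an M-step, which re-estimates the model parameters from those soft assignments. Below is a minimal sketch for a two-component, one-dimensional Gaussian mixture. The data and the initialization (means at the data extremes, unit variances) are illustrative assumptions, and nothing here is taken from the reviewed paper.

```python
import math


def normal_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(xs, iters=50):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximization."""
    n = len(xs)
    weights, mus, vars_ = [0.5, 0.5], [min(xs), max(xs)], [1.0, 1.0]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [weights[k] * normal_pdf(x, mus[k], vars_[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate mixing weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            mus[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, xs)) / nk
            vars_[k] = max(var, 1e-6)  # guard against a collapsing component
    return weights, mus, vars_


data = [0.8, 0.9, 1.0, 1.1, 1.2, 4.8, 4.9, 5.0, 5.1, 5.2]
weights, mus, vars_ = em_gmm_1d(data)
print([round(m, 2) for m in mus])  # [1.0, 5.0]
```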


2021 ◽  
Vol 15 (4) ◽  
pp. 18-30
Author(s):  
Om Prakash Samantray ◽  
Satya Narayan Tripathy

Many malware detection techniques are signature-based. This approach detects known malware very effectively but may fail to detect unknown or zero-day attacks. In this article, the authors propose a malware detection model that uses the operation codes (opcodes) of malicious and benign executables as features. The model uses the opcode extract and count (OPEC) algorithm to prepare the opcode feature vectors for the experiment. The most relevant features are selected with an extra-trees classifier and then passed to several supervised learning algorithms, namely support vector machine, naive Bayes, decision tree, random forest, logistic regression and k-nearest neighbours, to build classification models for malware detection. The proposed model achieves a detection accuracy of 98.7%, outperforming many similar works discussed in the literature.
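The abstract does not spell out OPEC, so what follows is only a hedged sketch of the general idea its name suggests. It takes a disassembly listing, keeps each instruction's mnemonic, and counts occurrences over a fixed opcode vocabulary, producing one feature vector per executable. The vocabulary, the listing format and the function name `opcode_feature_vector` are hypothetical. Feature selection and the classifiers are not shown.

```python
from collections import Counter

# Hypothetical opcode vocabulary; in practice it would be built from the training corpus.
VOCAB = ["mov", "push", "pop", "call", "jmp", "xor", "add", "ret"]


def opcode_feature_vector(disassembly, vocab=VOCAB):
    """Extract the mnemonic of each disassembled instruction and count it per vocabulary entry."""
    mnemonics = (line.split()[0].lower() for line in disassembly if line.strip())
    counts = Counter(mnemonics)
    return [counts[op] for op in vocab]  # Counter returns 0 for unseen opcodes


listing = [
    "push ebp",
    "mov ebp, esp",
    "xor eax, eax",
    "mov ecx, 4",
    "call sub_401000",
    "pop ebp",
    "ret",
]
print(opcode_feature_vector(listing))  # [2, 1, 1, 1, 0, 1, 0, 1]
```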


2013 ◽  
Vol 2013 ◽  
pp. 1-10 ◽  
Author(s):  
Shih-Ting Yang ◽  
Jiann-Der Lee ◽  
Tzyh-Chyang Chang ◽  
Chung-Hsien Huang ◽  
Jiun-Jie Wang ◽  
...  

In this study, an MRI-based classification framework was proposed to distinguish patients with Alzheimer's disease (AD) and mild cognitive impairment (MCI) from normal participants using multiple features and different classifiers. First, we extracted volume and shape features from MRI data through a series of image processing steps. We then applied principal component analysis (PCA) to convert the set of possibly correlated features into a smaller set of linearly uncorrelated variables, reducing the dimensionality of the feature space. Finally, we developed a novel data mining framework combining a support vector machine (SVM) with particle swarm optimization (PSO) for AD/MCI classification. To compare this hybrid method with traditional classifiers, two other classifiers, an SVM and a self-organizing map (SOM), were trained for patient classification. With the proposed framework, classification accuracy improved to up to 82.35% for AD and 77.78% for MCI; combining the volumetric and shape features and applying PCA raised it to up to 94.12% for AD and 88.89% for MCI. These results suggest that novel multivariate pattern-matching methods reach clinically relevant accuracy for the a priori prediction of progression from MCI to AD.
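The PCA step can be sketched for the leading component only. The procedure centres the features, forms the sample covariance matrix, finds its dominant eigenvector by power iteration, and projects each sample onto it. This is a hedged illustration, not the study's code. The two-feature toy data stand in for the volume and shape features. A real pipeline would keep several components (for example via a full eigendecomposition) before passing them to the SVM/PSO classifier.

```python
import math


def first_principal_component(rows, iters=100):
    """Return (mean, unit eigenvector) for the largest-variance direction, via power iteration."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centred features.
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def project(rows, mean, v):
    """Projection onto the component: each d-dimensional row becomes a single score."""
    return [sum((r[j] - mean[j]) * v[j] for j in range(len(v))) for r in rows]


rows = [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
        [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]]
mean, v = first_principal_component(rows)
print([round(x, 3) for x in v])  # [0.678, 0.735]
```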


Author(s):  
N. A. Correa-Muñoz ◽  
C. A. Murillo-Feo

Abstract. SAR polarimetry (PolSAR) is a method that can be used to investigate landslides. Polarimetric scattering power decomposition separates the total power received by the SAR antenna into surface scattering, double-bounce scattering and volume scattering power. Polarimetric indices are expected to help recognize landslides, because the scattering properties of landslides differ from those of the surrounding forested areas. The surface scattering mechanism is mainly caused by rough surfaces such as bare soil and agricultural fields, so we expect it to be the predominant scattering mechanism in landslides. In a study area in south-western Colombia, we used dual-pol data from ESA's Sentinel-1 satellites and quad-pol data from NASA's UAVSAR airborne platform. Using C-band and L-band radar images, we analysed the interaction between radar signals and landslides. First, with the dual-pol data we obtained calibrated backscatter coefficients from four GRD radar images acquired between 2015 and 2017. The analysis gave an average backscatter value of −14.47 dB for VH polarisation and −8.40 dB for VV polarisation. Then, using the H-α decomposition of the quad-pol data, we confirmed the strong relationship between entropy and the alpha parameter, which has the highest contribution to the first axis of a principal component analysis. These results were used to obtain an unsupervised classification of landslides that separated the Colombian Geological Service landslide inventory into three classes characterized by their scattering mechanism. These results will be combined with InSAR parameters, morphometric parameters and optical spectral indices to obtain a local landslide detection model.
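For reference, two quantities used in this abstract can be computed as follows. The first is the conversion of a backscatter value from dB to linear power. The second is the Cloude-Pottier entropy H and mean alpha angle, derived from the eigenvalues of the 3×3 coherency matrix using p_i = λ_i / Σλ_j, H = −Σ p_i log₃ p_i and ᾱ = Σ p_i α_i. The sketch assumes the eigenvalues and per-eigenvector alpha angles have already been obtained from the quad-pol data (for example with a SAR toolbox). The input values below are illustrative, not results from the study.

```python
import math


def db_to_linear(db):
    """Convert a backscatter value in decibels to linear power (sigma0)."""
    return 10 ** (db / 10)


def entropy_alpha(eigenvalues, alphas_deg):
    """Cloude-Pottier entropy H and mean alpha from the coherency-matrix eigen-decomposition.

    eigenvalues: the three eigenvalues of the 3x3 coherency matrix (assumed precomputed).
    alphas_deg: the alpha angle in degrees associated with each eigenvector (assumed precomputed).
    """
    total = sum(eigenvalues)
    probs = [lam / total for lam in eigenvalues]  # pseudo-probabilities p_i
    # Base-3 logarithm keeps H in [0, 1] for three scattering mechanisms.
    h = -sum(p * math.log(p, 3) for p in probs if p > 0)
    mean_alpha = sum(p * a for p, a in zip(probs, alphas_deg))
    return h, mean_alpha


print(round(db_to_linear(-14.47), 4))  # 0.0357 (the average VH value quoted above)
print(entropy_alpha([1.0, 1.0, 1.0], [0.0, 45.0, 90.0]))  # fully random scattering: H close to 1
```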


2021 ◽  
Vol 11 (16) ◽  
pp. 7376
Author(s):  
Oscar Serradilla ◽  
Ekhi Zugasti ◽  
Julian Ramirez de Okariz ◽  
Jon Rodriguez ◽  
Urko Zurutuza

Predictive maintenance (PdM) has the potential to reduce industrial costs by anticipating failures and extending the working life of components. Nowadays, factories monitor their assets, and most of the collected data correspond to correct working conditions. Semi-supervised data-driven models, which learn from such asset data, are therefore relevant for enabling PdM. However, their main challenges for industrial application are achieving high anomaly detection accuracy, diagnosing novel failures, and adapting to changing environmental and operational conditions (EOC). This article tackles these challenges by experimenting with algorithms on press-machine data from a production line. First, the anomaly detection performance of state-of-the-art and classic data-driven models is compared, including a 2D autoencoder, null-space, principal component analysis (PCA), one-class support vector machine (OC-SVM) and extreme learning machine (ELM) algorithms. Then, diagnosis tools are developed based on the autoencoder's latent-space feature vector, including clustering and projection algorithms that group data from synthetic failure types in a semi-supervised way. In addition, explainable artificial intelligence techniques make it possible to trace the autoencoder's loss back to the input data in order to detect anomalous signals. Finally, transfer learning is applied to adapt the autoencoders to data from the same process under changing EOC. The data-driven techniques used in this work can be adapted to other industrial use cases, helping stakeholders gain trust and thus promoting the adoption of data-driven PdM systems in smart factories.
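The semi-supervised setting described here can be illustrated with a deliberately simple baseline: train only on data from correct working conditions, then flag deviations. This sketch swaps in a per-feature z-score model for the autoencoder, null-space, PCA, OC-SVM and ELM detectors compared in the article. The feature names, the data and the max-score threshold rule are hypothetical choices for illustration.

```python
import math


def fit_detector(healthy):
    """Learn a 'normal' profile from healthy data only and return an anomaly test.

    Stand-in for an autoencoder: the score is the mean squared z-score, i.e. how far a
    sample lies from the per-feature statistics of correct working conditions.
    """
    n, d = len(healthy), len(healthy[0])
    mean = [sum(r[j] for r in healthy) / n for j in range(d)]
    # Population std; fall back to 1.0 for constant features to avoid division by zero.
    std = [math.sqrt(sum((r[j] - mean[j]) ** 2 for r in healthy) / n) or 1.0 for j in range(d)]

    def score(x):
        return sum(((xi - m) / s) ** 2 for xi, m, s in zip(x, mean, std)) / d

    # Threshold = worst score seen on healthy training data (no failure labels needed).
    threshold = max(score(r) for r in healthy)
    return lambda x: score(x) > threshold


# Hypothetical press-machine readings: (peak force in kN, cycle time in s).
healthy = [[10.0, 1.0], [10.5, 1.1], [9.5, 0.9], [10.2, 1.05], [9.8, 0.95]]
is_anomalous = fit_detector(healthy)
print(is_anomalous([10.1, 1.0]), is_anomalous([12.0, 1.6]))  # False True
```

Using the maximum healthy score as the threshold is the simplest label-free choice. A percentile of the training scores would trade some sensitivity for fewer false alarms.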


2011 ◽  
Vol 189-193 ◽  
pp. 3243-3248
Author(s):  
Yu Quan Cui ◽  
Le Jun Shi ◽  
Yu Wei Fang

Using a time series model, an isometric transformation time series model and the ARTAFIT model, we process acoustic signals and obtain a different set of parameters for each acoustic signal. We then use a support vector machine (SVM) to recognize the acoustic signals by analyzing these parameter sets. When a parameter set is too large, we first reduce its dimensionality with principal component analysis (PCA) and then classify it with the SVM. Finally, a case study shows that the results of applying our models are satisfactory.
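The abstract does not define its time series models. As a hedged illustration, assume the simplest case, an autoregressive AR(2) model. Its two coefficients, estimated from each recorded signal with the Yule-Walker equations, form the parameter set that would be passed to an SVM (after PCA, if the set were larger). The AR order, the synthetic signal and the function names are assumptions. The isometric transformation and ARTAFIT models are not reproduced.

```python
import random


def autocov(x, lag):
    """Biased sample autocovariance at the given lag."""
    n = len(x)
    m = sum(x) / n
    return sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag)) / n


def yule_walker_ar2(x):
    """Estimate AR(2) coefficients by solving [[r0, r1], [r1, r0]] @ phi = [r1, r2]."""
    r0, r1, r2 = (autocov(x, k) for k in range(3))
    det = r0 * r0 - r1 * r1
    phi1 = r1 * (r0 - r2) / det
    phi2 = (r0 * r2 - r1 * r1) / det
    return phi1, phi2


def simulate_ar2(phi1, phi2, n, seed=0):
    """Synthetic signal x_t = phi1 * x_{t-1} + phi2 * x_{t-2} + unit-variance white noise."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(phi1 * x[-1] + phi2 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]


signal = simulate_ar2(0.5, -0.3, 20000, seed=42)
features = yule_walker_ar2(signal)  # this 2-element vector would be one SVM input
print([round(f, 2) for f in features])  # close to the true (0.5, -0.3)
```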


2021 ◽  
Vol 5 (2 (113)) ◽  
pp. 37-43
Author(s):  
Abdul Azis Abdillah ◽  
Azwardi Azwardi ◽  
Sulaksana Permana ◽  
Iwan Susanto ◽  
Fuad Zainuri ◽  
...  

Hospitals are currently places where Covid-19 is easily transmitted, so giving birth in a hospital is very risky. In addition, hospitals currently accept only caesarean deliveries, while mothers who can give birth vaginally are advised to give birth at a midwife's practice, where the chance of exposure to Covid-19 is much lower. In general, this study examines the performance of the LDA-SVM method in predicting whether an expectant mother needs to undergo a C-section or can give birth normally. Its aims are: 1) to determine the best parameters for building the detection model; 2) to determine the best accuracy of the model; and 3) to compare this accuracy with that of other methods. The data used is a caesarean section dataset containing the outcomes of 80 pregnant women, described by the characteristics of delivery problems considered most important in clinical practice. The experiments yielded the parameter values that give the best results for the detection model: σ = −5.9 for 70% training data; σ = 4, −6.1 and −6.6 for 80% training data; and σ = 4 and 16 for 90% training data. The results also show that, in this case study, LDA-SVM classifies the delivery method correctly with an accuracy of up to 100%, surpassing the methods of previous studies. The method therefore has great potential as an early-detection tool to help doctors determine whether a mother needs a C-section or can give birth vaginally, so that mothers can avoid Covid-19 transmission in hospital.
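The abstract tunes a parameter σ but does not state which kernel it belongs to. Assuming it is the width of the common Gaussian (RBF) kernel used with SVMs, the sketch below shows how σ changes the similarity the SVM sees between two samples. This is an illustrative assumption, not the authors' configuration. The LDA step is not shown.

```python
import math


def rbf_kernel(x, z, sigma):
    """Gaussian (RBF) kernel K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2 * sigma ** 2))  # only sigma**2 enters the formula


x, z = [0.0, 0.0], [1.0, 0.0]
for sigma in (0.5, 1.0, 4.0):
    # Larger sigma -> wider kernel -> distant points look more similar.
    print(sigma, round(rbf_kernel(x, z, sigma), 4))
```

A small σ makes the decision boundary follow individual training points closely, while a large σ smooths it. This is why σ is typically chosen separately for each training split, as in the experiments above.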


2015 ◽  
Vol 2015 ◽  
pp. 1-9
Author(s):  
Hai Guo ◽  
Jinghua Yin ◽  
Jingying Zhao ◽  
Yuanyuan Liu ◽  
Lei Yao ◽  
...  

This paper proposes an automatic detection model based on pattern recognition technology that can identify the element contained in a nanocomposite film. Gray-level co-occurrence matrix (GLCM) features are extracted from images of different types of film surface morphology, and their dimensionality is then reduced by principal component analysis (PCA). The film element is identified with the AdaBoost.M1 algorithm, which combines ten decision-tree classifiers into a strong classifier. Experimental results show that this model outperforms SVM (support vector machine), NN and BayesNet classifiers. The proposed method can be widely applied to the automatic detection not only of nanocomposite film elements but also of the elements of other nanocomposite materials.
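The GLCM feature-extraction step can be sketched in a few lines. It counts how often grey level i occurs next to grey level j at a fixed offset, normalises the counts to probabilities, and summarises the matrix with texture statistics such as contrast and angular second moment. The offset, the two statistics and the toy 2×2 images are illustrative assumptions, since the abstract does not list which GLCM features were used. The PCA and AdaBoost.M1 stages are not shown.

```python
def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalised grey-level co-occurrence matrix for one pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # also count the reverse pair (symmetric GLCM)
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Sum of p(i, j) * (i - j)^2: large when neighbouring grey levels differ."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def angular_second_moment(p):
    """Sum of p(i, j)^2: large for uniform, repetitive textures."""
    return sum(v * v for row in p for v in row)


smooth = [[1, 1], [1, 1]]
checker = [[0, 1], [1, 0]]
for img in (smooth, checker):
    p = glcm(img, levels=2)
    print(contrast(p), angular_second_moment(p))  # prints "0.0 1.0", then "1.0 0.5"
```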

