Abnormal Financial Transaction Detection via AI Technology

2021 ◽  
Vol 12 (2) ◽  
pp. 24-34
Author(s):  
Zhuo Wang

Financial supervision plays an important role in anti-corruption and integrity efforts, but financial data are non-stationary, nonlinear, and have a low signal-to-noise ratio, and no dedicated training set exists for identifying abnormal financial data. This paper generates time series of financial transaction data with a weekly time span and selects the total transaction amount, the transaction dispersion coefficient, and the number of transfers as the features of financial account data. The features are then input into a weighted one-class support vector machine (WOC-SVM) model to determine whether a transaction is abnormal. Because abnormal transactions are difficult to collect, the WOC-SVM is trained on a set consisting of a large number of normal transactions. The parameters of the WOC-SVM are tuned by cross-validation. Experiments on simulated data demonstrate the effectiveness of the WOC-SVM model, trained on the selected features, in detecting suspicious values.
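The weekly feature construction described above can be sketched as follows. This is only an illustration, not the paper's implementation: the function name `weekly_features` is hypothetical, grouping by ISO calendar week is an assumption, and the dispersion coefficient is assumed to be the population standard deviation divided by the mean, since the abstract does not define it.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev


def weekly_features(transactions):
    """Aggregate (date, amount) pairs into per-week feature rows.

    Returns a dict mapping (ISO year, ISO week) to a tuple
    (total_amount, dispersion_coefficient, n_transfers).
    """
    by_week = defaultdict(list)
    for day, amount in transactions:
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        by_week[(iso[0], iso[1])].append(amount)

    features = {}
    for week, amounts in sorted(by_week.items()):
        total = sum(amounts)
        avg = mean(amounts)
        # Assumed definition: population standard deviation over the mean.
        dispersion = pstdev(amounts) / avg if avg else 0.0
        features[week] = (total, dispersion, len(amounts))
    return features


# Hypothetical sample: two transfers in one week, one in the next.
sample = [
    (date(2021, 1, 4), 100.0),
    (date(2021, 1, 6), 300.0),
    (date(2021, 1, 12), 50.0),
]
rows = weekly_features(sample)
```

Each resulting row is a three-element feature vector of the kind that could be passed to a one-class classifier trained on normal accounts only.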

2018 ◽  
Vol 1 (1) ◽  
pp. 120-130 ◽  
Author(s):  
Chunxiang Qian ◽  
Wence Kang ◽  
Hao Ling ◽  
Hua Dong ◽  
Chengyao Liang ◽  
...  

A Support Vector Machine (SVM) model optimized by K-fold cross-validation was built to predict and evaluate the degradation of concrete strength in a complicated marine environment. Several other models, such as Artificial Neural Network (ANN) and Decision Tree (DT), were also built and compared with the SVM to determine which made the most accurate predictions. Both the material factors and the environmental factors that influence the results were considered. The material factors mainly involved the original concrete strength and the amounts of cement replaced by fly ash and slag. The environmental factors consisted of the concentrations of Mg²⁺, SO₄²⁻, and Cl⁻, the temperature, and the exposure time. The prediction results showed that the optimized SVM model performed better than the other models in predicting concrete strength. Based on the SVM model, a simulation method of variable limitation was used to determine the sensitivity of the various factors and their degree of influence on the degradation of concrete strength.
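As a rough illustration of the K-fold cross-validation mentioned above, the sketch below splits sample indices into K contiguous folds, each used once for validation. The function name `k_fold_indices` is hypothetical, and the absence of shuffling is a simplifying assumption; the abstract does not say how the folds were formed.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


folds = list(k_fold_indices(10, 3))
```

A candidate set of SVM hyperparameters would be scored by training on each `train` split, evaluating on the matching `validation` split, and averaging the K scores.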


2012 ◽  
Vol 2012 ◽  
pp. 1-10
Author(s):  
Pijush Samui

The main objective of site characterization is to predict in situ soil properties at any half-space point at a site from a limited number of tests. In this study, the Support Vector Machine (SVM) is used to develop a three-dimensional site characterization model for Bangalore, India, based on a large number of Standard Penetration Test (SPT) results. SVM is a novel type of learning machine based on statistical learning theory that performs regression by introducing an ε-insensitive loss function. The database consists of 766 boreholes, with more than 2700 field SPT values spread over a 220 sq km area of Bangalore. The model is applied to the corrected SPT values. The three input variables of the SVM model were the spatial coordinates of each measurement point, and its output was the corrected SPT value. The results presented in this paper clearly show that the SVM is a robust tool for site characterization. A sensitivity analysis of three SVM parameters, including σ and ε, is also presented.
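The ε-insensitive loss mentioned above ignores errors smaller than ε and grows linearly beyond that band. A minimal sketch, with a hypothetical function name and made-up example values:

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Return max(0, |y_true - y_pred| - epsilon)."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# An error of 0.5 falls inside the band of width 1.0 and costs nothing.
inside_band = epsilon_insensitive_loss(10.0, 10.5, 1.0)
# An error of 3.0 exceeds the band by 2.0.
outside_band = epsilon_insensitive_loss(10.0, 13.0, 1.0)
```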


2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Hongbo Zhao ◽  
Zenghui Huang ◽  
Zhengsheng Zou

The stress-strain relationship of geomaterials is important to numerical analysis in geotechnical engineering, but it is difficult to represent accurately with conventional constitutive models. Artificial neural networks (ANN) have been proposed as a more effective approach to representing this complex, nonlinear relationship, but ANN still has some limitations that restrict its applicability. In this paper, an alternative method, the support vector machine (SVM), is proposed to simulate this type of complex constitutive relationship. The SVM model can overcome the limitations of the ANN model while still possessing its advantages over the traditional model. The application examples show that it is an effective and accurate approach to modeling the stress-strain relationship of geomaterials.


2020 ◽  
Vol 14 (1) ◽  
pp. 41-50 ◽  
Author(s):  
Hai-Bang Ly ◽  
Binh Thai Pham

Background: The shear strength of soil, the magnitude of shear stress that a soil can sustain, is an important factor in geotechnical engineering. Objective: The main objective of this study is to develop a machine learning algorithm, namely Support Vector Machine (SVM), to predict the shear strength of soil from six input variables: clay content, moisture content, specific gravity, void ratio, liquid limit, and plastic limit. Methods: A large number of experimental measurements, comprising more than 500 samples, was gathered from the technical reports of the Long Phu 1 power plant project. The accuracy of the proposed SVM was evaluated using statistical indicators, namely the coefficient of correlation (R), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE), over 200 simulations to account for the effect of random sampling. Finally, the most accurate SVM model was used to interpret the prediction results by means of Partial Dependence Plots (PDP). Results: Validation results showed that the SVM model performed well in predicting soil shear strength (R = 0.9 to 0.95), and the moisture content, liquid limit, and plastic limit were found to be the three features that most affect the prediction. Conclusion: This study might help in the quick and accurate prediction of soil shear strength for practical purposes in civil engineering.
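The three indicators named above can be computed as in the sketch below. The function name `regression_metrics` and the example values are hypothetical; R is taken to be the Pearson correlation between measured and predicted values, which is the usual reading but an assumption here.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return (R, RMSE, MAE) for paired measured and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    # Undefined (division by zero) if either series is constant.
    r = cov / sqrt(var_t * var_p)
    return r, rmse, mae


# Hypothetical values: every prediction is exactly 1.0 too high.
r, rmse, mae = regression_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

The example shows why R alone is not enough: a constant offset leaves R at 1 while RMSE and MAE both equal the offset.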


2018 ◽  
Vol 57 (05/06) ◽  
pp. 253-260 ◽  
Author(s):  
J. Patel ◽  
Z. Siddiqui ◽  
A. Krishnan ◽  
T. Thyvalikakath

Background Smoking is an established risk factor for oral diseases and, therefore, dental clinicians routinely assess and record their patients' detailed smoking status. Researchers have successfully extracted smoking history from electronic health records (EHRs) using text mining methods. However, they could not retrieve patients' smoking intensity due to its limited availability in the EHR. The presence of detailed smoking information in the electronic dental record (EDR), often under a separate section, allows this information to be retrieved with less preprocessing. Objective To determine patients' detailed smoking status based on smoking intensity from the EDR. Methods First, the authors created a reference standard of 3,296 unique patients' smoking histories from the EDR that classified patients based on their smoking intensity. Next, they trained three machine learning classifiers (support vector machine, random forest, and naïve Bayes) using the training set (2,176) and evaluated their performance on the test set (1,120) using precision (P), recall (R), and F-measure (F). Finally, they applied the best classifier to classify smoking status from an additional 3,114 patients' smoking histories. Results Support vector machine performed best in classifying patients into smokers, nonsmokers, and unknowns (P, R, F: 98%); intermittent smoker (P: 95%, R: 98%, F: 96%); past smoker (P, R, F: 89%); light smoker (P, R, F: 87%); smokers with unknown intensity (P: 76%, R: 86%, F: 81%), and intermediate smoker (P: 90%, R: 88%, F: 89%). It performed moderately in differentiating heavy smokers (P: 90%, R: 44%, F: 60%). EDR could be a valuable source for obtaining patients' detailed smoking information. Conclusion EDR data could serve as a valuable source for obtaining patients' detailed smoking information based on their smoking intensity that may not be readily available in the EHR.
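The relation between the reported precision, recall, and F-measure can be checked with a short sketch. It assumes that F is the balanced F1 score (the harmonic mean of P and R), which the abstract does not state explicitly; the function names and the raw counts in the last line are hypothetical.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(tp, fp, fn):
    """Compute (P, R, F) from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_measure(precision, recall)


# Reported P and R for two classes, taken from the abstract.
intermittent_f = f_measure(0.95, 0.98)
heavy_f = f_measure(0.90, 0.44)

# Hypothetical raw counts, to show the count-based form.
p, r, f = precision_recall_f(tp=8, fp=2, fn=2)
```

With the rounded inputs, the intermittent-smoker class gives about 0.96, matching the reported value, while the heavy-smoker class gives about 0.59 against the reported 60%. A gap of that size is what rounding of the published P and R would plausibly produce.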


Molecules ◽  
2020 ◽  
Vol 25 (6) ◽  
pp. 1442 ◽  
Author(s):  
Tao Shen ◽  
Hong Yu ◽  
Yuan-Zhong Wang

Gentiana is one of the largest genera of Gentianoideae; most of its species have potential pharmaceutical value and are used in local traditional medicine. Because of the phytochemical diversity and the differences in bioactive compounds among species, it is crucial to accurately identify authentic Gentiana species. In this paper, the feasibility of using infrared spectroscopy combined with chemometric analysis to identify Gentiana and its related species was studied. A total of 180 batches of raw spectral fingerprints were obtained from 18 species of Gentiana and Tripterospermum by near-infrared (NIR: 10,000–4000 cm⁻¹) and Fourier transform mid-infrared (MIR: 4000–600 cm⁻¹) spectroscopy. First, principal component analysis (PCA) was used to explore the natural grouping of the 180 samples. Second, random forests (RF), support vector machine (SVM), and K-nearest neighbors (KNN) models were built using the full spectra (1487 NIR variables and 1214 FT-MIR variables, respectively). Based on the results for the calibration and prediction sets, the MIR-SVM model had a higher classification accuracy than the other models. Five feature selection strategies, VIP (variable importance in the projection), Boruta, GARF (genetic algorithm combined with random forest), GASVM (genetic algorithm combined with support vector machine), and Venn diagram calculation, were then used to reduce the number of variables for modeling, and 101 NIR and 73 FT-MIR bands were finally selected as the feature variables. Third, stacking models were built on the optimal spectral dataset. Most of the stacking models performed better than the full-spectra models. RF and SVM as base learners, combined with an SVM meta-classifier, formed the optimal stacked generalization strategy. For the SG-Ven-MIR-SVM model, the accuracy (ACC) of the calibration set and of the validation set were both 100%. Sensitivity (SE), specificity (SP), efficiency (EFF), Matthews correlation coefficient (MCC), and Cohen's kappa coefficient (K) were all 1, showing that this model had the best authenticity identification performance. These results indicate that stacked generalization combined with feature selection is probably an important technique for improving the predictive accuracy of classification models and avoiding overfitting. The study can provide a valuable reference for the safety and effectiveness of the clinical application of medicinal Gentiana.
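Several of the indicators listed above can be computed from a confusion matrix, as in the sketch below. It is deliberately simplified to the two-class case, whereas the study classifies 18 species, so it illustrates the formulas rather than the paper's evaluation. The function name `binary_scores` and the counts are hypothetical, and EFF is omitted because the abstract does not define it.

```python
from math import sqrt


def binary_scores(tp, fp, fn, tn):
    """Return ACC, SE, SP, MCC and Cohen's kappa for a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0

    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0

    # Agreement expected by chance, from the row and column totals.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0

    return {
        "ACC": accuracy,
        "SE": sensitivity,
        "SP": specificity,
        "MCC": mcc,
        "K": kappa,
    }


# A perfect classifier: every indicator equals 1, as reported in the abstract.
perfect = binary_scores(tp=10, fp=0, fn=0, tn=10)
# A classifier with a few errors, for contrast.
imperfect = binary_scores(tp=8, fp=1, fn=2, tn=9)
```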


2020 ◽  
Vol 92 (3) ◽  
pp. 502-518 ◽  
Author(s):  
Seyed Amin Bagherzadeh

Purpose This paper aims to propose a nonlinear model for aeroelastic aircraft that can predict the flight parameters throughout the investigated flight envelopes. Design/methodology/approach A system identification method based on the support vector machine (SVM) is developed and applied to the nonlinear dynamics of an aeroelastic aircraft. In the proposed non-parametric gray-box method, force and moment coefficients are estimated based on the state variables, flight conditions and control commands. Then, flight parameters are estimated using the aircraft equations of motion. Nonlinear system identification is performed using the SVM network by minimizing the errors between the calculated and estimated force and moment coefficients. To that end, a least squares algorithm is used as the training rule to optimize the generalization bound given for the regression. Findings The results confirm that the SVM is successful at aircraft system identification. The precision of the SVM model is preserved when the models are excited by input commands different from the training ones. Also, the generalization of the SVM model is acceptable at untrained flight conditions lying within the range of the trained ones. Considering the precision and generalization of the model, the results indicate that the SVM is more successful than well-known methods such as artificial neural networks. Practical implications In this paper, both simulated and real flight data of the F/A-18 aircraft are used to provide aeroelastic models for its lateral-directional dynamics. Originality/value This paper proposes, for the first time, a non-parametric system identification method for aeroelastic aircraft based on the SVM. To the best of the author's knowledge, the SVM has not previously been used for aircraft system identification or aircraft parameter estimation.


2010 ◽  
Vol 20-23 ◽  
pp. 147-153 ◽  
Author(s):  
Zhi Wei Huang ◽  
Jian Zhong Zhou ◽  
Li Xiang Song ◽  
Yong Chuan Zhang

Given the complex and uncertain relationships between the indexes and grades in flood hazard evaluation, as well as the shortage of measured samples, an improved support vector machine (SVM) model was established to improve the accuracy and efficiency of calculation. The model comprehensively evaluates the indexes of a multi-dimensional disaster situation in a one-dimensional continuous space, which effectively resolves the incompatibility among evaluation results obtained from single indexes. The results showed that the model based on the improved SVM had better generalization ability and calculation speed because it reduces the constraint conditions. It is considered to have good application prospects in multi-index comprehensive evaluation.


2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Xiaoyong Liu ◽  
Hui Fu

Disease diagnosis is conducted with a machine learning method. We propose a novel machine learning method that hybridizes support vector machine (SVM), particle swarm optimization (PSO), and cuckoo search (CS). The new method consists of two stages: first, a CS-based approach to SVM parameter optimization is used to find better initial parameters for the kernel function; then PSO is applied to continue SVM training and find the best SVM parameters. Experimental results indicate that the proposed CS-PSO-SVM model achieves better classification accuracy and F-measure than PSO-SVM and GA-SVM. We therefore conclude that the proposed method is more effective than the previously reported algorithms.

