scholarly journals Identification of Insider Trading Using Extreme Gradient Boosting and Multi-Objective Optimization

Information ◽  
2019 ◽  
Vol 10 (12) ◽  
pp. 367 ◽  
Author(s):  
Shangkun Deng ◽  
Chenguang Wang ◽  
Jie Li ◽  
Haoran Yu ◽  
Hongyu Tian ◽  
...  

Illegal insider trading identification presents a challenging task that attracts great interest from researchers due to the serious harm of insider trading activities to the investors’ confidence and the sustainable development of security markets. In this study, we proposed an identification approach which integrates XGboost (eXtreme Gradient Boosting) and NSGA-II (Non-dominated Sorting Genetic Algorithm II) for insider trading regulation. First, the insider trading cases that occurred in the Chinese security market were automatically derived, and their relevant indicators were calculated and obtained. Then, the proposed method trained the XGboost model and it employed the NSGA-II for optimizing the parameters of XGboost by using multiple objective functions. Finally, the testing samples were identified using the XGboost with optimized parameters. Its performances were empirically measured by both identification accuracy and efficiency over multiple time window lengths. Results of experiments showed that the proposed approach successfully achieved the best accuracy under the time window length of 90-days, demonstrating that relevant features calculated within the 90-days time window length could be extremely beneficial for insider trading regulation. Additionally, the proposed approach outperformed all benchmark methods in terms of both identification accuracy and efficiency, indicating that it could be used as an alternative approach for insider trading regulation in the Chinese security market. The proposed approach and results in this research is of great significance for market regulators to improve their supervision efficiency and accuracy on illegal insider trading identification.

2021 ◽  
Vol 8 ◽  
Author(s):  
Ruixia Cui ◽  
Wenbo Hua ◽  
Kai Qu ◽  
Heran Yang ◽  
Yingmu Tong ◽  
...  

Sepsis-associated coagulation dysfunction greatly increases the mortality of sepsis. Irregular clinical time-series data remains a major challenge for AI medical applications. To early detect and manage sepsis-induced coagulopathy (SIC) and sepsis-associated disseminated intravascular coagulation (DIC), we developed an interpretable real-time sequential warning model toward real-world irregular data. Eight machine learning models including novel algorithms were devised to detect SIC and sepsis-associated DIC 8n (1 ≤ n ≤ 6) hours prior to its onset. Models were developed on Xi'an Jiaotong University Medical College (XJTUMC) and verified on Beth Israel Deaconess Medical Center (BIDMC). A total of 12,154 SIC and 7,878 International Society on Thrombosis and Haemostasis (ISTH) overt-DIC labels were annotated according to the SIC and ISTH overt-DIC scoring systems in train set. The area under the receiver operating characteristic curve (AUROC) were used as model evaluation metrics. The eXtreme Gradient Boosting (XGBoost) model can predict SIC and sepsis-associated DIC events up to 48 h earlier with an AUROC of 0.929 and 0.910, respectively, and even reached 0.973 and 0.955 at 8 h earlier, achieving the highest performance to date. The novel ODE-RNN model achieved continuous prediction at arbitrary time points, and with an AUROC of 0.962 and 0.936 for SIC and DIC predicted 8 h earlier, respectively. In conclusion, our model can predict the sepsis-associated SIC and DIC onset up to 48 h in advance, which helps maximize the time window for early management by physicians.


Sensors ◽  
2019 ◽  
Vol 19 (3) ◽  
pp. 734 ◽  
Author(s):  
Hao-Xiang Chen ◽  
Ying Nan ◽  
Yi Yang

This paper considers a reconnaissance task assignment problem for multiple unmanned aerial vehicles (UAVs) with different sensor capacities. A modified Multi-Objective Symbiotic Organisms Search algorithm (MOSOS) is adopted to optimize UAVs’ task sequence. A time-window based task model is built for heterogeneous targets. Then, the basic task assignment problem is formulated as a Multiple Time-Window based Dubins Travelling Salesmen Problem (MTWDTSP). Double-chain encoding rules and several criteria are established for the task assignment problem under logical and physical constraints. Pareto dominance determination and global adaptive scaling factors is introduced to improve the performance of original MOSOS. Numerical simulation and Monte-Carlo simulation results for the task assignment problem are also presented in this paper, whereas comparisons with non-dominated sorting genetic algorithm (NSGA-II) and original MOSOS are made to verify the superiority of the proposed method. The simulation results demonstrate that modified SOS outperforms the original MOSOS and NSGA-II in terms of optimality and efficiency of the assignment results in MTWDTSP.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Yang Gao ◽  
Yangyang Li ◽  
Yaojun Wang

PurposeThis paper aims to explore the interaction between investor attention and green security markets, including green bonds and stocks.Design/methodology/approachThis study takes the Baidu index of “green finance” as the proxy for investor attention and constructs several generalized prediction error variance decomposition models to investigate the interdependence. It further analyzes the dynamic interaction between investor attention and the return and volatility of green security markets using the rolling time window.FindingsThe empirical analysis and robustness test results reveal that the spillovers between investor attention and the return and volatility of the green bond market are relatively stable. In contrast, the spillover level between investor attention and the green stock market displays significant time-varying and asymmetric effects. Moreover, the volatility spillover between investor attention and green securities is vulnerable to major financial events, while the return spillover is extremely sensitive to market performance.Originality/valueThe conclusion further expands the practical application and theoretical framework of behavioral finance in green finance and provides a new reference for investors and regulators. Besides, this study also lays a theoretical basis for investors to focus on the practical application of volatility prediction and risk management in green securities.


Algorithms ◽  
2021 ◽  
Vol 14 (10) ◽  
pp. 283
Author(s):  
Xiao Wang ◽  
Xi Lin ◽  
Rong Wang ◽  
Kai-Qi Fan ◽  
Li-Jun Han ◽  
...  

DNA N4-methylcytosine(4mC) plays an important role in numerous biological functions and is a mechanism of particular epigenetic importance. Therefore, accurate identification of the 4mC sites in DNA sequences is necessary to understand the functional mechanism. Although some effective calculation tools have been proposed to identifying DNA 4mC sites, it is still challenging to improve identification accuracy and generalization ability. Therefore, there is a great need to build a computational tool to accurately identify the position of DNA 4mC sites. Hence, this study proposed a novel predictor XGB4mcPred, a predictor for the identification of 4mC sites trained using an extreme gradient boosting algorithm (XGBoost) and DNA sequence information. Firstly, we used the One-Hot encoding on adjacent and spaced nucleotides, dinucleotides, and trinucleotides of the original 4mC site sequences as feature vectors. Then, the importance values of the feature vectors pre-trained by the XGBoost algorithm were used as a threshold to filter redundant features, resulting in a significant improvement in the identification accuracy of the constructed XGB4mcPred predictor to identify 4mC sites. The analysis shows that there is a clear preference for nucleotide sequences between 4mC sites and non-4mC site sequences in six datasets from multiple species, and the optimized features can better distinguish 4mC sites from non-4mC sites. The experimental results of cross-validation and independent tests from six different species show that our proposed predictor XGB4mcPred significantly outperformed other state-of-the-art predictors and was improved to varying degrees compared with other state-of-the-art predictors. Additionally, the user-friendly webserver we used to developed the XGB4mcPred predictor was made freely accessible.


Electronics ◽  
2021 ◽  
Vol 10 (5) ◽  
pp. 613
Author(s):  
Jaakko Tervonen ◽  
Kati Pettersson ◽  
Jani Mäntyjärvi

Human cognitive capabilities are under constant pressure in the modern information society. Cognitive load detection would be beneficial in several applications of human–computer interaction, including attention management and user interface adaptation. However, current research into accurate and real-time biosignal-based cognitive load detection lacks understanding of the optimal and minimal window length in data segmentation which would allow for more timely, continuous state detection. This study presents a comparative analysis of ultra-short (30 s or less) window lengths in cognitive load detection with a wearable device. Heart rate, heart rate variability, galvanic skin response, and skin temperature features are extracted at six different window lengths and used to train an Extreme Gradient Boosting classifier to detect between cognitive load and rest. A 25 s window showed the highest accury (67.6%), which is similar to earlier studies using the same dataset. Overall, model accuracy tended to decrease as the window length decreased, and lowest performance (60.0%) was observed with a 5 s window. The contribution of different physiological features to the classification performance and the most useful features that react in short windows are also discussed. The analysis provides a promising basis for future real-time applications with wearable sensors.


2019 ◽  
Author(s):  
Kasper Van Mens ◽  
Joran Lokkerbol ◽  
Richard Janssen ◽  
Robert de Lange ◽  
Bea Tiemens

BACKGROUND It remains a challenge to predict which treatment will work for which patient in mental healthcare. OBJECTIVE In this study we compare machine algorithms to predict during treatment which patients will not benefit from brief mental health treatment and present trade-offs that must be considered before an algorithm can be used in clinical practice. METHODS Using an anonymized dataset containing routine outcome monitoring data from a mental healthcare organization in the Netherlands (n = 2,655), we applied three machine learning algorithms to predict treatment outcome. The algorithms were internally validated with cross-validation on a training sample (n = 1,860) and externally validated on an unseen test sample (n = 795). RESULTS The performance of the three algorithms did not significantly differ on the test set. With a default classification cut-off at 0.5 predicted probability, the extreme gradient boosting algorithm showed the highest positive predictive value (ppv) of 0.71(0.61 – 0.77) with a sensitivity of 0.35 (0.29 – 0.41) and area under the curve of 0.78. A trade-off can be made between ppv and sensitivity by choosing different cut-off probabilities. With a cut-off at 0.63, the ppv increased to 0.87 and the sensitivity dropped to 0.17. With a cut-off of at 0.38, the ppv decreased to 0.61 and the sensitivity increased to 0.57. CONCLUSIONS Machine learning can be used to predict treatment outcomes based on routine monitoring data.This allows practitioners to choose their own trade-off between being selective and more certain versus inclusive and less certain.


Author(s):  
Mohammad Hamim Zajuli Al Faroby ◽  
Mohammad Isa Irawan ◽  
Ni Nyoman Tri Puspaningsih

Protein Interaction Analysis (PPI) can be used to identify proteins that have a supporting function on the main protein, especially in the synthesis process. Insulin is synthesized by proteins that have the same molecular function covering different but mutually supportive roles. To identify this function, the translation of Gene Ontology (GO) gives certain characteristics to each protein. This study purpose to predict proteins that interact with insulin using the centrality method as a feature extractor and extreme gradient boosting as a classification algorithm. Characteristics using the centralized method produces  features as a central function of protein. Classification results are measured using measurements, precision, recall and ROC scores. Optimizing the model by finding the right parameters produces an accuracy of  and a ROC score of . The prediction model produced by XGBoost has capabilities above the average of other machine learning methods.


2021 ◽  
Vol 13 (5) ◽  
pp. 1021
Author(s):  
Hu Ding ◽  
Jiaming Na ◽  
Shangjing Jiang ◽  
Jie Zhu ◽  
Kai Liu ◽  
...  

Artificial terraces are of great importance for agricultural production and soil and water conservation. Automatic high-accuracy mapping of artificial terraces is the basis of monitoring and related studies. Previous research achieved artificial terrace mapping based on high-resolution digital elevation models (DEMs) or imagery. As a result of the importance of the contextual information for terrace mapping, object-based image analysis (OBIA) combined with machine learning (ML) technologies are widely used. However, the selection of an appropriate classifier is of great importance for the terrace mapping task. In this study, the performance of an integrated framework using OBIA and ML for terrace mapping was tested. A catchment, Zhifanggou, in the Loess Plateau, China, was used as the study area. First, optimized image segmentation was conducted. Then, features from the DEMs and imagery were extracted, and the correlations between the features were analyzed and ranked for classification. Finally, three different commonly-used ML classifiers, namely, extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (KNN), were used for terrace mapping. The comparison with the ground truth, as delineated by field survey, indicated that random forest performed best, with a 95.60% overall accuracy (followed by 94.16% and 92.33% for XGBoost and KNN, respectively). The influence of class imbalance and feature selection is discussed. This work provides a credible framework for mapping artificial terraces.


Author(s):  
Irfan Ullah Khan ◽  
Nida Aslam ◽  
Malak Aljabri ◽  
Sumayh S. Aljameel ◽  
Mariam Moataz Aly Kamaleldin ◽  
...  

The COVID-19 outbreak is currently one of the biggest challenges facing countries around the world. Millions of people have lost their lives due to COVID-19. Therefore, the accurate early detection and identification of severe COVID-19 cases can reduce the mortality rate and the likelihood of further complications. Machine Learning (ML) and Deep Learning (DL) models have been shown to be effective in the detection and diagnosis of several diseases, including COVID-19. This study used ML algorithms, such as Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and K-Nearest Neighbor (KNN) and DL model (containing six layers with ReLU and output layer with sigmoid activation), to predict the mortality rate in COVID-19 cases. Models were trained using confirmed COVID-19 patients from 146 countries. Comparative analysis was performed among ML and DL models using a reduced feature set. The best results were achieved using the proposed DL model, with an accuracy of 0.97. Experimental results reveal the significance of the proposed model over the baseline study in the literature with the reduced feature set.


2021 ◽  
Vol 13 (6) ◽  
pp. 1147
Author(s):  
Xiangqian Li ◽  
Wenping Yuan ◽  
Wenjie Dong

To forecast the terrestrial carbon cycle and monitor food security, vegetation growth must be accurately predicted; however, current process-based ecosystem and crop-growth models are limited in their effectiveness. This study developed a machine learning model using the extreme gradient boosting method to predict vegetation growth throughout the growing season in China from 2001 to 2018. The model used satellite-derived vegetation data for the first month of each growing season, CO2 concentration, and several meteorological factors as data sources for the explanatory variables. Results showed that the model could reproduce the spatiotemporal distribution of vegetation growth as represented by the satellite-derived normalized difference vegetation index (NDVI). The predictive error for the growing season NDVI was less than 5% for more than 98% of vegetated areas in China; the model represented seasonal variations in NDVI well. The coefficient of determination (R2) between the monthly observed and predicted NDVI was 0.83, and more than 69% of vegetated areas had an R2 > 0.8. The effectiveness of the model was examined for a severe drought year (2009), and results showed that the model could reproduce the spatiotemporal distribution of NDVI even under extreme conditions. This model provides an alternative method for predicting vegetation growth and has great potential for monitoring vegetation dynamics and crop growth.


Sign in / Sign up

Export Citation Format

Share Document