Identification of Insider Trading Using Extreme Gradient Boosting and Multi-Objective Optimization

Identifying illegal insider trading is a challenging task that attracts great interest from researchers because insider trading seriously harms investors’ confidence and the sustainable development of security markets. In this study, we propose an identification approach that integrates XGBoost (eXtreme Gradient Boosting) and NSGA-II (Non-dominated Sorting Genetic Algorithm II) for insider trading regulation. First, the insider trading cases that occurred in the Chinese security market were automatically derived, and their relevant indicators were calculated. Then, the proposed method trained the XGBoost model and employed NSGA-II to optimize the XGBoost parameters using multiple objective functions. Finally, the testing samples were identified using XGBoost with the optimized parameters. Performance was measured empirically by both identification accuracy and efficiency over multiple time window lengths. The experimental results showed that the proposed approach achieved its best accuracy with a time window length of 90 days, demonstrating that relevant features calculated within the 90-day window could be highly beneficial for insider trading regulation. Additionally, the proposed approach outperformed all benchmark methods in terms of both identification accuracy and efficiency, indicating that it could be used as an alternative approach for insider trading regulation in the Chinese security market. The proposed approach and the results of this research are of great significance for market regulators seeking to improve the efficiency and accuracy of their supervision of illegal insider trading.


An Interpretable Early Dynamic Sequential Predictor for Sepsis-Induced Coagulopathy Progression in the Real-World Using Machine Learning

Frontiers in Medicine ◽

10.3389/fmed.2021.775047 ◽

2021 ◽

Vol 8 ◽

Author(s):

Ruixia Cui ◽

Wenbo Hua ◽

Kai Qu ◽

Heran Yang ◽

Yingmu Tong ◽

...

Keyword(s):

Machine Learning ◽

Real World ◽

Time Series Data ◽

Time Window ◽

Medical Center ◽

Characteristic Curve ◽

Series Data ◽

Gradient Boosting ◽

Early Management ◽

Extreme Gradient Boosting

Sepsis-associated coagulation dysfunction greatly increases the mortality of sepsis. Irregular clinical time-series data remain a major challenge for AI medical applications. To detect and manage sepsis-induced coagulopathy (SIC) and sepsis-associated disseminated intravascular coagulation (DIC) early, we developed an interpretable real-time sequential warning model for real-world irregular data. Eight machine learning models, including novel algorithms, were devised to detect SIC and sepsis-associated DIC 8n (1 ≤ n ≤ 6) hours prior to onset. Models were developed on data from Xi'an Jiaotong University Medical College (XJTUMC) and verified on data from Beth Israel Deaconess Medical Center (BIDMC). A total of 12,154 SIC and 7,878 International Society on Thrombosis and Haemostasis (ISTH) overt-DIC labels were annotated according to the SIC and ISTH overt-DIC scoring systems in the training set. The area under the receiver operating characteristic curve (AUROC) was used as the model evaluation metric. The eXtreme Gradient Boosting (XGBoost) model can predict SIC and sepsis-associated DIC events up to 48 h earlier with an AUROC of 0.929 and 0.910, respectively, and even reached 0.973 and 0.955 at 8 h earlier, achieving the highest performance to date. The novel ODE-RNN model achieved continuous prediction at arbitrary time points, with an AUROC of 0.962 and 0.936 for SIC and DIC predicted 8 h earlier, respectively. In conclusion, our model can predict the onset of SIC and sepsis-associated DIC up to 48 h in advance, which helps maximize the time window for early management by physicians.


Multi-UAV Reconnaissance Task Assignment for Heterogeneous Targets Based on Modified Symbiotic Organisms Search Algorithm

Sensors ◽

10.3390/s19030734 ◽

2019 ◽

Vol 19 (3) ◽

pp. 734 ◽

Cited By ~ 9

Author(s):

Hao-Xiang Chen ◽

Ying Nan ◽

Yi Yang

Keyword(s):

Assignment Problem ◽

Time Window ◽

Search Algorithm ◽

Task Assignment ◽

Multiple Time ◽

Nsga Ii ◽

Symbiotic Organisms Search ◽

Simulation Results ◽

Symbiotic Organisms ◽

Task Assignment Problem

This paper considers a reconnaissance task assignment problem for multiple unmanned aerial vehicles (UAVs) with different sensor capacities. A modified Multi-Objective Symbiotic Organisms Search algorithm (MOSOS) is adopted to optimize the UAVs’ task sequence. A time-window based task model is built for heterogeneous targets. Then, the basic task assignment problem is formulated as a Multiple Time-Window based Dubins Travelling Salesmen Problem (MTWDTSP). Double-chain encoding rules and several criteria are established for the task assignment problem under logical and physical constraints. Pareto dominance determination and global adaptive scaling factors are introduced to improve the performance of the original MOSOS. Numerical simulation and Monte-Carlo simulation results for the task assignment problem are also presented, and comparisons with the non-dominated sorting genetic algorithm (NSGA-II) and the original MOSOS are made to verify the superiority of the proposed method. The simulation results demonstrate that the modified MOSOS outperforms the original MOSOS and NSGA-II in terms of optimality and efficiency of the assignment results in MTWDTSP.
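The Pareto dominance test that both MOSOS and NSGA-II rely on can be illustrated with a minimal sketch. The function names `dominates` and `non_dominated`, the minimization convention, and the toy objective vectors are assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if solution a Pareto-dominates solution b.

    Assumes every objective is minimized: a must be no worse than b
    in all objectives and strictly better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the points that no other point dominates (the first front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical objective vectors, e.g. (total flight time, number of UAVs used).
candidates = [(1, 5), (2, 2), (3, 3), (5, 1)]
front = non_dominated(candidates)
```

Here `(3, 3)` is removed because `(2, 2)` is better in both objectives, while the remaining three points trade one objective against the other and therefore all stay on the front.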


The dynamic interaction between investor attention and green security market: an empirical study based on Baidu index

China Finance Review International ◽

10.1108/cfri-06-2021-0136 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Yang Gao ◽

Yangyang Li ◽

Yaojun Wang

Keyword(s):

Behavioral Finance ◽

Time Window ◽

Error Variance ◽

Dynamic Interaction ◽

Security Market ◽

Practical Application ◽

Investor Attention ◽

Content Type ◽

Security Markets ◽

Green Finance

Purpose: This paper aims to explore the interaction between investor attention and green security markets, including green bonds and stocks. Design/methodology/approach: This study takes the Baidu index of “green finance” as the proxy for investor attention and constructs several generalized prediction error variance decomposition models to investigate the interdependence. It further analyzes the dynamic interaction between investor attention and the return and volatility of green security markets using the rolling time window. Findings: The empirical analysis and robustness test results reveal that the spillovers between investor attention and the return and volatility of the green bond market are relatively stable. In contrast, the spillover level between investor attention and the green stock market displays significant time-varying and asymmetric effects. Moreover, the volatility spillover between investor attention and green securities is vulnerable to major financial events, while the return spillover is extremely sensitive to market performance. Originality/value: The conclusion further expands the practical application and theoretical framework of behavioral finance in green finance and provides a new reference for investors and regulators. Besides, this study also lays a theoretical basis for investors to focus on the practical application of volatility prediction and risk management in green securities.


XGB4mcPred: Identification of DNA N4-Methylcytosine Sites in Multiple Species Based on an eXtreme Gradient Boosting Algorithm and DNA Sequence Information

Algorithms ◽

10.3390/a14100283 ◽

2021 ◽

Vol 14 (10) ◽

pp. 283

Author(s):

Xiao Wang ◽

Xi Lin ◽

Rong Wang ◽

Kai-Qi Fan ◽

Li-Jun Han ◽

...

Keyword(s):

Dna Sequence ◽

Dna Sequences ◽

State Of The Art ◽

Identification Accuracy ◽

Gradient Boosting ◽

Sequence Information ◽

Feature Vectors ◽

Extreme Gradient Boosting ◽

Boosting Algorithm ◽

Multiple Species

DNA N4-methylcytosine (4mC) plays an important role in numerous biological functions and is a mechanism of particular epigenetic importance. Therefore, accurate identification of the 4mC sites in DNA sequences is necessary to understand the functional mechanism. Although some effective computational tools have been proposed to identify DNA 4mC sites, it is still challenging to improve identification accuracy and generalization ability. Hence, this study proposed XGB4mcPred, a novel predictor for the identification of 4mC sites trained using an extreme gradient boosting algorithm (XGBoost) and DNA sequence information. First, we used One-Hot encoding on adjacent and spaced nucleotides, dinucleotides, and trinucleotides of the original 4mC site sequences as feature vectors. Then, the importance values of the feature vectors pre-trained by the XGBoost algorithm were used as a threshold to filter redundant features, resulting in a significant improvement in the identification accuracy of the constructed XGB4mcPred predictor. The analysis shows a clear difference in nucleotide sequence preference between 4mC site and non-4mC site sequences in six datasets from multiple species, and the optimized features can better distinguish 4mC sites from non-4mC sites. The experimental results of cross-validation and independent tests from six different species show that our proposed predictor XGB4mcPred significantly outperformed other state-of-the-art predictors, with improvements of varying degrees. Additionally, a user-friendly webserver for the XGB4mcPred predictor was developed and made freely accessible.
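As a rough illustration of the One-Hot encoding step, the sketch below encodes single nucleotides only. It does not reproduce the paper's encoding of spaced nucleotides, dinucleotides, or trinucleotides. The function name `one_hot`, the A/C/G/T column order, and the all-zero row for unknown bases are assumptions made for illustration.

```python
from typing import List

# Assumed column order for the encoding; the paper does not state one.
NUCLEOTIDES = "ACGT"


def one_hot(sequence: str) -> List[List[int]]:
    """Encode a DNA sequence as one 4-element 0/1 row per base.

    Bases outside A/C/G/T (for example N) become an all-zero row,
    which is an assumption of this sketch.
    """
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    rows = []
    for base in sequence.upper():
        row = [0] * len(NUCLEOTIDES)
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


def flatten(rows: List[List[int]]) -> List[int]:
    """Concatenate the per-base rows into a single feature vector."""
    return [value for row in rows for value in row]


# A made-up 4-base fragment; real inputs would be fixed-length site windows.
features = flatten(one_hot("ACgT"))
```

A fixed-length sequence of n bases yields a vector of 4n binary features, which is the kind of input a tree-based classifier can consume directly.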


Ultra-Short Window Length and Feature Importance Analysis for Cognitive Load Detection from Wearable Sensors

Electronics ◽

10.3390/electronics10050613 ◽

2021 ◽

Vol 10 (5) ◽

pp. 613

Author(s):

Jaakko Tervonen ◽

Kati Pettersson ◽

Jani Mäntyjärvi

Keyword(s):

Heart Rate ◽

Cognitive Load ◽

Real Time ◽

Wearable Sensors ◽

Classification Performance ◽

Gradient Boosting ◽

Window Length ◽

Load Detection ◽

Continuous State ◽

Extreme Gradient Boosting

Human cognitive capabilities are under constant pressure in the modern information society. Cognitive load detection would be beneficial in several applications of human–computer interaction, including attention management and user interface adaptation. However, current research into accurate and real-time biosignal-based cognitive load detection lacks understanding of the optimal and minimal window length in data segmentation which would allow for more timely, continuous state detection. This study presents a comparative analysis of ultra-short (30 s or less) window lengths in cognitive load detection with a wearable device. Heart rate, heart rate variability, galvanic skin response, and skin temperature features are extracted at six different window lengths and used to train an Extreme Gradient Boosting classifier to distinguish between cognitive load and rest. A 25 s window showed the highest accuracy (67.6%), which is similar to earlier studies using the same dataset. Overall, model accuracy tended to decrease as the window length decreased, and the lowest performance (60.0%) was observed with a 5 s window. The contribution of different physiological features to the classification performance and the most useful features that react in short windows are also discussed. The analysis provides a promising basis for future real-time applications with wearable sensors.
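The window-based segmentation that precedes feature extraction can be sketched as follows. The function names, the use of non-overlapping windows, the sampling rate, and the sample values are all assumptions made for illustration; the abstract does not specify how the windows were cut.

```python
from typing import List, Sequence


def segment(samples: Sequence[float], window_s: float, rate_hz: float) -> List[List[float]]:
    """Split a signal into consecutive, non-overlapping windows.

    window_s is the window length in seconds and rate_hz the sampling
    rate. A trailing partial window is dropped (an assumption here).
    """
    size = int(window_s * rate_hz)
    if size <= 0:
        raise ValueError("window must contain at least one sample")
    last_start = len(samples) - size
    return [list(samples[i:i + size]) for i in range(0, last_start + 1, size)]


def window_means(windows: List[List[float]]) -> List[float]:
    """One very simple per-window feature: the mean of the samples."""
    return [sum(w) / len(w) for w in windows]


# Made-up heart-rate samples at an assumed 2 Hz, cut into 2 s windows.
signal = [60, 62, 61, 63, 70, 72, 71, 73, 65, 66]
windows = segment(signal, window_s=2, rate_hz=2)
means = window_means(windows)
```

Shorter windows give more frequent decisions but fewer samples per feature, which is one plausible reading of why the reported accuracy falls as the window shrinks.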


Predicting Undesired Treatment Outcome in Mental Healthcare: Machine Learning Study (Preprint)

10.2196/preprints.17235 ◽

2019 ◽

Author(s):

Kasper Van Mens ◽

Joran Lokkerbol ◽

Richard Janssen ◽

Robert de Lange ◽

Bea Tiemens

Keyword(s):

Machine Learning ◽

Treatment Outcome ◽

Mental Health Treatment ◽

Mental Healthcare ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Trade Off ◽

Trade Offs ◽

Outcome Monitoring ◽

Extreme Gradient Boosting

BACKGROUND It remains a challenge to predict which treatment will work for which patient in mental healthcare. OBJECTIVE In this study we compare machine learning algorithms to predict during treatment which patients will not benefit from brief mental health treatment and present trade-offs that must be considered before an algorithm can be used in clinical practice. METHODS Using an anonymized dataset containing routine outcome monitoring data from a mental healthcare organization in the Netherlands (n = 2,655), we applied three machine learning algorithms to predict treatment outcome. The algorithms were internally validated with cross-validation on a training sample (n = 1,860) and externally validated on an unseen test sample (n = 795). RESULTS The performance of the three algorithms did not significantly differ on the test set. With a default classification cut-off at 0.5 predicted probability, the extreme gradient boosting algorithm showed the highest positive predictive value (ppv) of 0.71 (0.61 – 0.77) with a sensitivity of 0.35 (0.29 – 0.41) and area under the curve of 0.78. A trade-off can be made between ppv and sensitivity by choosing different cut-off probabilities. With a cut-off at 0.63, the ppv increased to 0.87 and the sensitivity dropped to 0.17. With a cut-off at 0.38, the ppv decreased to 0.61 and the sensitivity increased to 0.57. CONCLUSIONS Machine learning can be used to predict treatment outcomes based on routine monitoring data. This allows practitioners to choose their own trade-off between being selective and more certain versus inclusive and less certain.
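The trade-off between ppv and sensitivity at different cut-offs can be made concrete with a small sketch. The function name `ppv_and_sensitivity` and the toy probabilities and labels are made up for illustration; they are not the study's data and will not reproduce its figures.

```python
from typing import Sequence, Tuple


def ppv_and_sensitivity(
    probs: Sequence[float], labels: Sequence[int], cutoff: float
) -> Tuple[float, float]:
    """Compute positive predictive value and sensitivity at a cut-off.

    A case is predicted positive when its probability is >= cutoff.
    Labels are 1 for the undesired outcome and 0 otherwise.
    """
    pairs = list(zip(probs, labels))
    tp = sum(1 for p, y in pairs if p >= cutoff and y == 1)
    fp = sum(1 for p, y in pairs if p >= cutoff and y == 0)
    fn = sum(1 for p, y in pairs if p < cutoff and y == 1)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, sensitivity


# Hypothetical predicted probabilities and true outcomes for six patients.
probs = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]

strict = ppv_and_sensitivity(probs, labels, cutoff=0.8)
lenient = ppv_and_sensitivity(probs, labels, cutoff=0.35)
```

On this toy data the stricter cut-off flags fewer patients but is right more often, while the lenient cut-off catches every true case at the cost of a false alarm, mirroring the direction of the trade-off described in the abstract.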


XGBoost and Network Analysis for Prediction of Proteins Affecting Insulin based on Protein Protein Interactions

Kinetik Game Technology Information System Computer Network Computing Electronics and Control ◽

10.22219/kinetik.v5i4.1076 ◽

2020 ◽

pp. 253-262

Author(s):

Mohammad Hamim Zajuli Al Faroby ◽

Mohammad Isa Irawan ◽

Ni Nyoman Tri Puspaningsih

Keyword(s):

Protein Interactions ◽

Interaction Analysis ◽

Synthesis Process ◽

Gradient Boosting ◽

Protein Protein Interactions ◽

Central Function ◽

Extreme Gradient Boosting ◽

Main Protein ◽

The Right ◽

Roc Score

Protein-protein interaction (PPI) analysis can be used to identify proteins that have a supporting function on the main protein, especially in the synthesis process. Insulin is synthesized by proteins that have the same molecular function covering different but mutually supportive roles. To identify this function, the translation of Gene Ontology (GO) gives certain characteristics to each protein. This study aims to predict proteins that interact with insulin using the centrality method as a feature extractor and extreme gradient boosting as a classification algorithm. Characterization using the centrality method produces features that describe the central function of each protein. Classification results are evaluated using precision, recall, and ROC scores. Optimizing the model by finding the right parameters produces an accuracy of and a ROC score of . The prediction model produced by XGBoost has capabilities above the average of other machine learning methods.


Evaluation of Three Different Machine Learning Methods for Object-Based Artificial Terrace Mapping—A Case Study of the Loess Plateau, China

Remote Sensing ◽

10.3390/rs13051021 ◽

2021 ◽

Vol 13 (5) ◽

pp. 1021

Author(s):

Hu Ding ◽

Jiaming Na ◽

Shangjing Jiang ◽

Jie Zhu ◽

Kai Liu ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Loess Plateau ◽

Water Conservation ◽

Nearest Neighbor ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

The Loess Plateau ◽

Object Based ◽

Extreme Gradient Boosting

Artificial terraces are of great importance for agricultural production and soil and water conservation. Automatic high-accuracy mapping of artificial terraces is the basis of monitoring and related studies. Previous research achieved artificial terrace mapping based on high-resolution digital elevation models (DEMs) or imagery. Because contextual information is important for terrace mapping, object-based image analysis (OBIA) combined with machine learning (ML) technologies is widely used. However, the selection of an appropriate classifier is of great importance for the terrace mapping task. In this study, the performance of an integrated framework using OBIA and ML for terrace mapping was tested. A catchment, Zhifanggou, in the Loess Plateau, China, was used as the study area. First, optimized image segmentation was conducted. Then, features from the DEMs and imagery were extracted, and the correlations between the features were analyzed and ranked for classification. Finally, three commonly used ML classifiers, namely, extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (KNN), were used for terrace mapping. The comparison with the ground truth, as delineated by field survey, indicated that random forest performed best, with a 95.60% overall accuracy (followed by 94.16% and 92.33% for XGBoost and KNN, respectively). The influence of class imbalance and feature selection is discussed. This work provides a credible framework for mapping artificial terraces.


Computational Intelligence-Based Model for Mortality Rate Prediction in COVID-19 Patients

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18126429 ◽

2021 ◽

Vol 18 (12) ◽

pp. 6429

Author(s):

Irfan Ullah Khan ◽

Nida Aslam ◽

Malak Aljabri ◽

Sumayh S. Aljameel ◽

Mariam Moataz Aly Kamaleldin ◽

...

Keyword(s):

Mortality Rate ◽

Computational Intelligence ◽

Nearest Neighbor ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

Detection And Identification ◽

Proposed Model ◽

Extreme Gradient Boosting ◽

The World ◽

Detection And Diagnosis

The COVID-19 outbreak is currently one of the biggest challenges facing countries around the world. Millions of people have lost their lives due to COVID-19. Therefore, the accurate early detection and identification of severe COVID-19 cases can reduce the mortality rate and the likelihood of further complications. Machine Learning (ML) and Deep Learning (DL) models have been shown to be effective in the detection and diagnosis of several diseases, including COVID-19. This study used ML algorithms, such as Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and K-Nearest Neighbor (KNN), and a DL model (containing six layers with ReLU and an output layer with sigmoid activation) to predict the mortality rate in COVID-19 cases. Models were trained using data on confirmed COVID-19 patients from 146 countries. Comparative analysis was performed among the ML and DL models using a reduced feature set. The best results were achieved using the proposed DL model, with an accuracy of 0.97. Experimental results reveal the significance of the proposed model over the baseline study in the literature with the reduced feature set.


A Machine Learning Method for Predicting Vegetation Indices in China

Remote Sensing ◽

10.3390/rs13061147 ◽

2021 ◽

Vol 13 (6) ◽

pp. 1147

Author(s):

Xiangqian Li ◽

Wenping Yuan ◽

Wenjie Dong

Keyword(s):

Machine Learning ◽

Growing Season ◽

Crop Growth ◽

Spatiotemporal Distribution ◽

Coefficient Of Determination ◽

Gradient Boosting ◽

Severe Drought ◽

Vegetation Growth ◽

Extreme Gradient Boosting ◽

Boosting Method

To forecast the terrestrial carbon cycle and monitor food security, vegetation growth must be accurately predicted; however, current process-based ecosystem and crop-growth models are limited in their effectiveness. This study developed a machine learning model using the extreme gradient boosting method to predict vegetation growth throughout the growing season in China from 2001 to 2018. The model used satellite-derived vegetation data for the first month of each growing season, CO2 concentration, and several meteorological factors as data sources for the explanatory variables. Results showed that the model could reproduce the spatiotemporal distribution of vegetation growth as represented by the satellite-derived normalized difference vegetation index (NDVI). The predictive error for the growing season NDVI was less than 5% for more than 98% of vegetated areas in China; the model represented seasonal variations in NDVI well. The coefficient of determination (R2) between the monthly observed and predicted NDVI was 0.83, and more than 69% of vegetated areas had an R2 > 0.8. The effectiveness of the model was examined for a severe drought year (2009), and results showed that the model could reproduce the spatiotemporal distribution of NDVI even under extreme conditions. This model provides an alternative method for predicting vegetation growth and has great potential for monitoring vegetation dynamics and crop growth.
