An Innovative AI-Based System for Corruption Risks Assessment Among Corporate Managers to Support Open Source Analysis

Author(s): Emanuele Morra, Roberto Revetria, Danilo Pecorino, Matteo Giudici, Gabriele Galli

The paper focuses on the creation of an innovative Natural Language Processing system for searching available information and analyzing the resulting data, aimed at reconstructing the corporate chain and monitoring the risk of corruption among people in command positions. Today, the greatest opportunity for finding information is offered by the Internet and other open sources, where content related to corporate managers is continuously posted and updated. Given the sheer volume of this information, an intelligent analysis system capable of independently finding, analyzing and synthesizing information about a set of target subjects is remarkably advantageous. The aim of this document is to describe a forecasting model, based on Machine Learning and Artificial Intelligence techniques, capable of determining whether a news item related to an individual (sought during a due diligence process) contains information about crime, investigation, conviction, fraud, corruption or sanctions relating to that subject. Methods based on Artificial Neural Networks and Support Vector Machines are introduced, applied to this task and compared with one another. The results show that the architecture based on an SVM with a TF-IDF matrix and text pre-processing outperforms the other approaches discussed in this paper, demonstrating high accuracy and precision when predicting new data as well.
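The TF-IDF plus SVM pipeline named above can be sketched in standard-library Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the pre-processing steps, kernel or hyperparameters, so the stop-word list, the Pegasos-style linear SVM (stochastic subgradient descent on the hinge loss, without a bias term), the regularisation constant `lam`, the helper names (`preprocess`, `fit_idf`, `tfidf`, `train_linear_svm`, `predict`) and the toy headlines are all hypothetical.

```python
import math
import random
import re
from collections import Counter

# Hypothetical stop-word list; the paper's actual pre-processing is not specified.
STOP_WORDS = {"the", "a", "an", "of", "for", "in", "on", "and", "was", "is", "to", "by", "with"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """L2-normalised sparse TF-IDF vector (dict); terms unseen in training are ignored."""
    tf = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm > 0 else {}


def dot(w, x):
    """Dot product of a weight dict and a sparse feature dict."""
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_linear_svm(xs, ys, lam=0.01, epochs=200, seed=0):
    """Pegasos-style linear SVM: stochastic subgradient descent on the
    L2-regularised hinge loss. Labels must be +1/-1; no bias term, for brevity."""
    rng = random.Random(seed)
    w, step = {}, 0
    for _ in range(epochs):
        order = list(range(len(xs)))
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)
            margin = ys[i] * dot(w, xs[i])  # margin under the current weights
            for k in w:  # shrink weights: gradient of the regulariser
                w[k] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge-loss subgradient for margin violations
                for k, v in xs[i].items():
                    w[k] = w.get(k, 0.0) + eta * ys[i] * v
    return w


def predict(w, idf, text):
    """+1 = adverse news (crime, fraud, sanction, ...), -1 = not adverse."""
    return 1 if dot(w, tfidf(preprocess(text), idf)) > 0 else -1


# Invented toy headlines, for illustration only (+1 adverse, -1 not adverse).
TRAIN = [
    ("Manager convicted of fraud and bribery", 1),
    ("Prosecutors open corruption investigation into director", 1),
    ("CEO sanctioned for fraud by regulator", 1),
    ("Executive under investigation for bribery", 1),
    ("Company opens new factory in Turin", -1),
    ("Director presents quarterly results to investors", -1),
    ("Manager wins industry award for innovation", -1),
    ("CEO announces new product line", -1),
]
DOCS = [preprocess(text) for text, _ in TRAIN]
IDF = fit_idf(DOCS)
W = train_linear_svm([tfidf(d, IDF) for d in DOCS], [y for _, y in TRAIN])

if __name__ == "__main__":
    print(predict(W, IDF, "Prosecutors say executive was convicted and sanctioned"))  # prints 1
    print(predict(W, IDF, "Company announces new factory"))  # prints -1
```

A linear model is used here because TF-IDF vectors are high-dimensional and sparse, a setting where linear SVMs are a common default; whether the authors used a linear or a non-linear kernel is not stated in the abstract.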

2018, Vol 25 (10), pp. 1339-1350
Author(s): Justin Mower, Devika Subramanian, Trevor Cohen

Abstract
Objective: The aim of this work is to leverage relational information extracted from biomedical literature using a novel synthesis of unsupervised pretraining, representational composition, and supervised machine learning for drug safety monitoring.
Methods: Using ≈80 million concept-relationship-concept triples extracted from the literature using the SemRep Natural Language Processing system, distributed vector representations (embeddings) were generated for concepts as functions of their relationships utilizing two unsupervised representational approaches. Embeddings for drugs and side effects of interest from two widely used reference standards were then composed to generate embeddings of drug/side-effect pairs, which were used as input for supervised machine learning. This methodology was developed and evaluated using cross-validation strategies and compared to contemporary approaches. To qualitatively assess generalization, models trained on the Observational Medical Outcomes Partnership (OMOP) drug/side-effect reference set were evaluated against a list of ≈1100 drugs from an online database.
Results: The employed method improved performance over previous approaches. Cross-validation results advance the state of the art (AUC 0.96; F1 0.90 and AUC 0.95; F1 0.84 across the two sets), outperforming methods utilizing literature and/or spontaneous reporting system data. Examination of predictions for unseen drug/side-effect pairs indicates the ability of these methods to generalize, with over tenfold label support enrichment in the top 100 predictions versus the bottom 100 predictions.
Discussion and Conclusion: Our methods can assist the pharmacovigilance process using information from the biomedical literature. Unsupervised pretraining generates a rich relationship-based representational foundation for machine learning techniques to classify drugs in the context of a putative side effect, given known examples.
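The pair-composition and evaluation steps described in the Mower et al. abstract can be illustrated with a small standard-library sketch. The abstract does not say how drug and side-effect embeddings are composed, so the element-wise (Hadamard) product and concatenation below are assumptions, as are all function names (`compose_pair`, `roc_auc`, `f1`, `top_bottom_enrichment`). The evaluation helpers compute ROC AUC as the probability that a random positive outranks a random negative, the F1 score, and one plausible reading of the top-k versus bottom-k label enrichment ratio.

```python
from typing import List, Sequence

Vector = List[float]


def compose_pair(drug: Vector, effect: Vector, mode: str = "hadamard") -> Vector:
    """Combine a drug embedding and a side-effect embedding into one pair vector.

    Both operators are assumptions; the abstract does not name the one used.
    """
    if mode == "hadamard":
        if len(drug) != len(effect):
            raise ValueError("Hadamard product needs equal dimensions")
        return [a * b for a, b in zip(drug, effect)]  # element-wise product
    if mode == "concat":
        return list(drug) + list(effect)  # simple concatenation
    raise ValueError(f"unknown mode: {mode}")


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC = P(random positive scores above random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for binary 0/1 predictions."""
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def top_bottom_enrichment(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Ratio of supported pairs (label 1) among the k highest- vs k lowest-scored pairs."""
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda sy: sy[0], reverse=True)]
    top, bottom = sum(ranked[:k]), sum(ranked[-k:])
    return float("inf") if bottom == 0 else top / bottom


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    print(compose_pair([0.5, -1.0, 2.0], [2.0, 0.5, 1.0]))  # [1.0, -0.5, 2.0]
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
    print(f1([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

In the pipeline the abstract describes, composed pair vectors would be fed to a supervised classifier whose scores are then evaluated with metrics of this kind; the classifier itself is omitted from this sketch.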
