Discovering bank risk factors from financial statements based on a new semi‐supervised text mining algorithm

2019 ◽  
Vol 59 (3) ◽  
pp. 1519-1552 ◽  
Author(s):  
Lu Wei ◽  
Guowen Li ◽  
Xiaoqian Zhu ◽  
Jianping Li
2020 ◽  
Author(s):  
Emmanuel Bonnet ◽  
Daurès Jean-Pierre ◽  
Landais Paul

Abstract Background: Literature search is challenging when thousands of articles are potentially involved. To facilitate literature search, we created TEMAS, a Text Mining Algorithm-assisted Search tool, which we compared to a PubMed reference search (RS) in the context of etiological epidemiology. Methods: The 4 steps of TEMAS are: 1) a classic PubMed global search; 2) a first sort removing articles without abstracts or containing off-topic terms; 3) a clustering step with a descending hierarchical classification regrouping articles in independent classes; 4) a final sort extracting from the targeted class the abstracts containing the terms of interest, with a link to the corresponding PubMed articles. Validation was performed for risk factors of breast cancer. We estimated the precision and recall rate compared to RS. Average precision and discounted cumulative gain (DCG) were also computed to perform a ranking-based evaluation. We also compared TEMAS results with articles selected in two meta-analyses. Results: For risk factors of breast cancer, breastfeeding, mammographic density, oral contraceptive, and menarche were explored. TEMAS consistently increased precision vs RS (from 23% to 32%), with a recall rate from 95% to 97%, and reduced the number of selected articles to read by a factor of 2.3 to 4.8. Mean average precision for 100 articles was 47.4% for TEMAS vs 20.9% for PubMed ranked by best match, and DCG showed a consistent improvement for TEMAS compared to PubMed best match. Discussion: TEMAS divided the results of a literature search by 3.2, and improved the precision rate, the average precision, and the DCG compared to RS for epidemiological studies. Reducing the number of selected articles inevitably impacted the recall rate. However, it remained satisfactory and did not bias the corpus of information. Moreover, the recall rate was 100% for the two meta-analyses we analyzed, which suggests that the loss of recall rate observed above concerned articles not relevant enough to be included in the meta-analyses. Conclusion: TEMAS provides a user-friendly interface for non-specialists of literature search confronted with thousands of articles and appeared useful for meta-analyses.
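
The evaluation measures named in this abstract (precision, recall, average precision, DCG) can be illustrated with a minimal Python sketch. It is not the TEMAS implementation: the function names, the binary relevance gains, and the choice to normalise average precision by the total number of relevant articles are all assumptions made for illustration.

```python
import math


def precision_recall(retrieved, relevant):
    """Set-based precision and recall of a retrieved list against a reference set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k where a relevant article appears.

    Assumption: normalised by the total number of relevant articles.
    """
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def dcg(gains):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


# Hypothetical ranked result list and reference set (placeholder identifiers).
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
gains = [1 if doc in relevant else 0 for doc in ranked]

print(precision_recall(ranked, relevant))
print(average_precision(ranked, relevant))
print(dcg(gains))
```

Precision and recall ignore ordering, whereas average precision and DCG reward placing relevant articles near the top of the list, which is why a ranking-based evaluation uses the latter two.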


2021 ◽  
Vol 11 (15) ◽  
pp. 6834
Author(s):  
Pradeepa Sampath ◽  
Nithya Shree Sridhar ◽  
Vimal Shanmuganathan ◽  
Yangsun Lee

Tuberculosis (TB) is one of the top causes of death in the world. Though TB is known as the world’s most infectious killer, it can be treated with a combination of TB drugs. Some of these drugs can be active against other infective agents, in addition to TB. We propose a framework called TREASURE (Text mining algoRithm basEd on Affinity analysis and Set intersection to find the action of tUberculosis dRugs against other pathogEns), which focuses on extracting drug–pathogen relationships for eight different TB drugs, namely pyrazinamide, moxifloxacin, ethambutol, isoniazid, rifampicin, linezolid, streptomycin and amikacin. More than 1500 research papers from PubMed are collected for each drug. The collected data are first preprocessed, and various relation records are generated for each drug using affinity analysis. These records are then filtered based on the maximum co-occurrence value and the set intersection property to obtain the required inferences. The inferences produced by this framework can help medical researchers find cures for other bacterial diseases. Additionally, the analysis presented in this model can be used by medical experts in their disease and drug experiments.
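
One possible reading of the filtering step described above (co-occurrence counting, keeping the maximum, then intersecting sets across drugs) is sketched below in Python. This is an illustrative assumption, not the TREASURE code: the function names, the substring matching, and the placeholder drug and pathogen names are all invented for the example.

```python
from collections import Counter


def cooccurrence_counts(abstracts, pathogens):
    """Count, per pathogen term, how many abstracts of one drug mention it."""
    counts = Counter()
    for text in abstracts:
        lowered = text.lower()
        for pathogen in pathogens:
            if pathogen in lowered:
                counts[pathogen] += 1
    return counts


def top_pathogens(counts):
    """Keep only the pathogens that reach the maximum co-occurrence value."""
    if not counts:
        return set()
    best = max(counts.values())
    return {p for p, c in counts.items() if c == best}


def shared_pathogens(per_drug):
    """Set intersection: pathogens retained for every drug."""
    sets = list(per_drug.values())
    return set.intersection(*sets) if sets else set()


# Hypothetical toy corpus; the names are placeholders, not findings.
corpus = {
    "drug_a": ["drug_a acts on pathogen_x", "pathogen_x and drug_a", "drug_a, pathogen_y"],
    "drug_b": ["drug_b versus pathogen_x", "pathogen_x again", "pathogen_y once"],
}
pathogens = ["pathogen_x", "pathogen_y"]

retained = {
    drug: top_pathogens(cooccurrence_counts(texts, pathogens))
    for drug, texts in corpus.items()
}
print(retained)
print(shared_pathogens(retained))
```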


2021 ◽  
Vol 138 ◽  
pp. 105216
Author(s):  
Na XU ◽  
Ling MA ◽  
Qing Liu ◽  
Li WANG ◽  
Yongliang Deng

Author(s):  
Muhammad Sajjad Hussain ◽  
Muhammad Muhaizam Bin Musa Musa ◽  
Abdelnaser Omran Ali

The financial crisis of 2007-09 shifted the focus of researchers and regulators toward bank risk-taking, and this study analyzes the impact of private ownership structure on the risk-taking of Pakistani banks. The study covers all Pakistani private banks, with data collected from financial statements from 2005 to 2016. Most past studies found a negative impact of private ownership structure on bank risk-taking, and this study also indicates a negative relationship between private ownership and bank risk-taking. On the other hand, non-performing loans are double the international standards, which highlights owners' attention toward high-risk investments for high returns. Thus, this study suggests examining this relationship together with other factors that drive owners' behavior toward risk.


2019 ◽  
Vol 3 (Supplement_1) ◽  
pp. S480-S480
Author(s):  
Robert Lucero ◽  
Ragnhildur Bjarnadottir

Abstract Two hundred and fifty thousand older adults die annually in United States hospitals because of iatrogenic conditions (ICs). Clinicians, aging experts, patient advocates and federal policy makers agree that there is a need to enhance the safety of hospitalized older adults through improved identification and prevention of ICs. To this end, we are building a research program with the goal of enhancing the safety of hospitalized older adults by reducing ICs through an effective learning health system. Leveraging unique electronic data and healthcare system and human resources at the University of Florida, we are applying a state-of-the-art practice-based data science approach to identify risk factors of ICs (e.g., falls) from structured (i.e., nursing, clinical, administrative) and unstructured or text (i.e., registered nurse’s progress notes) data. Our interdisciplinary academic-clinical partnership includes scientific and clinical experts in patient safety, care quality, health outcomes, nursing and health informatics, natural language processing, data science, aging, standardized terminology, clinical decision support, statistics, machine learning, and hospital operations. Results to date have uncovered previously unknown fall risk factors within nursing (i.e., physical therapy initiation), clinical (i.e., number of fall risk increasing drugs, hemoglobin level), and administrative (i.e., Charlson Comorbidity Index, nurse skill mix, and registered nurse staffing ratio) structured data as well as patient cognitive, environmental, workflow, and communication factors in text data. The application of data science methods (i.e., machine learning and text-mining) and findings from this research will be used to develop text-mining pipelines to support sustained data-driven interdisciplinary aging studies to reduce ICs.


2020 ◽  
Vol 70 (3) ◽  
pp. 203-206
Author(s):  
L Uronen ◽  
H Moen ◽  
S Teperi ◽  
K-P Martimo ◽  
J Hartiala ◽  
...  

Abstract Background: Psychosocial risk factors influence early retirement and absence from work. Health checks by occupational health nurses (OHNs) may prevent deterioration of work ability. Health checks are documented electronically mostly as free text, and therefore the effect of psychological risk factors on working capacity is difficult to detect. Aims: To evaluate the potential of text mining for automated early detection of psychosocial risk factors by examining health check free-text documentation, which may indicate medical statements recommending early retirement, prolonged sick leave or rehabilitation. Psychosocial risk factors were extracted from OHN documentation in a nationwide occupational health care registry. Methods: Analysis of health check documentation and medical statements regarding pension, sick leave and rehabilitation. Annotations of 13 psychosocial factors based on the Prima-EF standard (PAS 1010) were used with a combination of unsupervised machine learning, a document search engine and manual filtering. Results: Health check documentation was analysed for 7078 employees. In 83% of their health checks, psychosocial risk factors were mentioned. All of these occurred more frequently in the group that received medical statements for pension, rehabilitation or sick leave than in the group that did not receive a medical statement. Documentation of career development and work control indicated future loss of work ability. Conclusions: This study showed that it was possible to detect risk factors for sick leave, rehabilitation and pension from free-text documentation of health checks. It is suggested to develop a text mining tool to automate the detection of psychosocial risk factors at an early stage.


2015 ◽  
Vol 6 (4) ◽  
pp. 35-49 ◽  
Author(s):  
Laurent Issertial ◽  
Hiroshi Tsuji

This paper proposes a system called CFP Manager, specialized in the IT field and designed to ease the process of searching for conferences suited to one's needs. At present, the handling of CFPs faces two problems: for emails, the huge quantity of CFPs received tends to be merely skimmed through; for websites, a review of some of the main CFP aggregators available online points to a lack of usable criteria. The system addresses these problems through an architecture consisting of three components. The first is an Information Extraction module that extracts relevant information (such as date, location, etc.) from CFPs using a rule-based text mining algorithm. The second component enriches the extracted data with external data from ontology models. The last one displays the data and allows the end user to perform complex queries on the CFP dataset, and thus to access only the CFPs suitable for them. To validate the proposal, the authors compute the well-known precision and recall metrics on the information extraction component, obtaining an average of 0.95 for precision and 0.91 for recall on three different datasets of 100 CFPs. The paper finally discusses the validity of the approach by comparing the system, for different queries, with two systems already available online (WikiCFP and IEEE Conference Search) and with a basic text searching approach standing in for searching in an email box. On a dataset of 100 CFPs, with its wide variety of usable data and the possibility of performing complex queries, the system surpasses the basic text searching method and WikiCFP by not returning the false positives they usually return, and obtains a result close to that of the IEEE system.
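
As a rough illustration of what rule-based extraction of fields such as date and location from a CFP might look like, here is a minimal Python sketch. The rules, field names, and sample text are hypothetical and are not taken from CFP Manager; a real CFP corpus would need many more patterns.

```python
import re

# Hypothetical rules: each field is captured from a labelled line such as
# "Location: ..." or "Submission deadline: ...".
FLAGS = re.IGNORECASE | re.MULTILINE
RULES = {
    "deadline": re.compile(r"^\s*(?:submission\s+)?deadline\s*:\s*(.+?)\s*$", FLAGS),
    "location": re.compile(r"^\s*(?:location|venue)\s*:\s*(.+?)\s*$", FLAGS),
}


def extract_fields(text):
    """Apply every rule to the CFP text; a field with no match maps to None."""
    fields = {}
    for name, pattern in RULES.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


# Invented sample CFP used only to exercise the rules.
sample = """Call for Papers: Example Workshop
Location: Osaka, Japan
Submission deadline: 15 March 2015
"""

print(extract_fields(sample))
```

Keeping the rules in a dictionary makes it easy to add a field or an alternative label without touching the extraction loop, at the cost of only handling CFPs that follow a labelled-line layout.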

