Prediction and risk stratification from hospital discharge records based on Hierarchical sLDA

2022 ◽  
Vol 22 (1) ◽  
Author(s):  
Guanglei Yu ◽  
Linlin Zhang ◽  
Ying Zhang ◽  
Jiaqi Zhou ◽  
Tao Zhang ◽  
...  

Abstract Background Rapid advances in information technology have made risk stratification easier to adopt, which benefits both patients and clinicians. Risk stratification supports accurate, individualized prevention and therapeutic decision making. Hospital discharge records (HDRs) routinely include accurate conclusions about patients' diagnoses. We therefore propose an improved supervised model for risk stratification based on HDRs for coronary heart disease (CHD). Methods We introduce an improved four-layer supervised latent Dirichlet allocation (sLDA) approach, the Hierarchical sLDA model, which encodes patient features in HDRs as one-hot patient feature-value pairs according to clinical guidelines for CHD lab tests. Random forests (RFs) and SMOTE are used to address missing data and class imbalance, respectively. After TF-IDF processing of the datasets, a variational Bayes expectation-maximization method and a generalized linear model are used to recognize the latent clinical state of a patient (that is, the risk stratification) and to predict CHD. The model is evaluated on accuracy, macro-F1, and training and testing time. Results Based on the characteristics of our datasets, namely patient feature-value pairs, we construct a supervised topic model by adding one more Dirichlet distribution hyperparameter to sLDA. Compared with the established supervised Multi-class sLDA model, our approach improves training time by 59.74% and testing time by 25.58% with almost no loss of average prediction accuracy on our datasets. Conclusions We propose a model for risk stratification and prediction of CHD based on sLDA. Experimental results show that the proposed Hierarchical sLDA model is competitive in both time performance and accuracy. Hierarchical processing of patient features can significantly reduce the inefficiency of the sLDA model and its time-consuming Gibbs sampling.
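The feature-value encoding and TF-IDF step named in the Methods can be illustrated with a minimal sketch. The abstract does not give the feature set or the TF-IDF variant, so the records, the `feature=value` token format, and the formula (term frequency times log inverse document frequency) below are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical discharge records: feature -> categorical value.
# Names and values are illustrative, not taken from the paper's dataset.
records = [
    {"LDL": "high", "HDL": "low", "smoker": "yes"},
    {"LDL": "normal", "HDL": "low", "smoker": "no"},
    {"LDL": "high", "HDL": "normal", "smoker": "no"},
]

def to_tokens(record):
    """Turn a record into 'feature=value' tokens, one token per pair."""
    return [f"{feature}={value}" for feature, value in sorted(record.items())]

def tf_idf(docs):
    """Weight tokens by tf = count / len(doc) and idf = log(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter(token for doc in docs for token in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            token: (count / len(doc)) * math.log(n_docs / doc_freq[token])
            for token, count in counts.items()
        })
    return weights

docs = [to_tokens(r) for r in records]
weights = tf_idf(docs)
```

Under this weighting, a pair that occurs in only one record (such as `smoker=yes`) receives a larger weight than a pair shared by several records (such as `LDL=high`).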

2021 ◽  
pp. 1-13
Author(s):  
Dangguo Shao ◽  
Chengyao Li ◽  
Chusheng Huang ◽  
Qing An ◽  
Yan Xiang ◽  
...  

To address the low effectiveness of feature extraction from short texts, this paper proposes a short-text classification model based on an improved Wasserstein-Latent Dirichlet Allocation (W-LDA), a neural network topic model built on the Wasserstein Auto-Encoder (WAE) framework. W-LDA is improved in four steps. First, the Bag of Words (BOW) input to W-LDA is preprocessed with Term Frequency–Inverse Document Frequency (TF-IDF). Second, the prior distribution of latent topics in W-LDA is changed from the Dirichlet distribution to a Gaussian mixture distribution based on variational Bayesian inference. Third, a sparsemax function layer is introduced after the hidden layer inferred by the encoder network to generate a sparse document-topic distribution with better topic relevance; the improved W-LDA is named the Sparse Wasserstein-Variational Bayesian Gaussian mixture model (SW-VBGMM). Finally, the document-topic distribution generated by SW-VBGMM is fed to a BiGRU (Bidirectional Gated Recurrent Unit) for deep feature extraction and short-text classification. Experiments on three Chinese short-text datasets and one English dataset show that our model outperforms some common topic models and neural network models on four text classification metrics (accuracy, precision, recall, and F1).
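For readers unfamiliar with sparsemax, the following is a minimal standalone sketch of the function itself: a Euclidean projection of a score vector onto the probability simplex that, unlike softmax, can assign exactly zero to low-scoring entries. It is not the authors' network layer; the function name and the example scores are assumptions for illustration.

```python
def sparsemax(scores):
    """Project a score vector onto the probability simplex (sparse output)."""
    sorted_scores = sorted(scores, reverse=True)
    cumulative = 0.0
    support_size = 0
    support_sum = 0.0
    # Find the largest k such that 1 + k * z_(k) > sum of the k largest scores.
    for k, value in enumerate(sorted_scores, start=1):
        cumulative += value
        if 1.0 + k * value > cumulative:
            support_size = k
            support_sum = cumulative
    # Threshold subtracted from every score; entries at or below it become zero.
    threshold = (support_sum - 1.0) / support_size
    return [max(score - threshold, 0.0) for score in scores]

# Hypothetical document-topic scores: the third topic is zeroed out entirely.
topic_weights = sparsemax([1.0, 0.8, -1.0])
```

The output still sums to one, so it can be read as a document-topic distribution in which irrelevant topics carry no mass at all.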


Author(s):  
Xi Liu ◽  
Yongfeng Yin ◽  
Haifeng Li ◽  
Jiabin Chen ◽  
Chang Liu ◽  
...  

Abstract Existing intelligent software defect classification approaches do not consider radar characteristics and prior statistical information. Thus, when these approaches are applied to radar software testing and validation, the precision and recall of defect classification are poor, which affects the reuse effectiveness of software defects. To solve this problem, a new intelligent defect classification approach based on the latent Dirichlet allocation (LDA) topic model is proposed for radar software in this paper. The proposed approach includes the defect text segmentation algorithm based on the dictionary of radar domain, the modified LDA model combining radar software requirement, and the top acquisition and classification approach of radar software defect based on the modified LDA model. The proposed approach is applied to typical radar software defects to validate its effectiveness and applicability. The application results show that the prediction precision and recall of the proposed approach are improved by up to 15–20% compared with the other defect classification approaches. Thus, the proposed approach can be applied effectively to the segmentation and classification of radar software defects to improve the adequacy of defect identification in radar software.


2012 ◽  
Vol 17 (5) ◽  
pp. 869-878 ◽  
Author(s):  
Heather B. Clayton ◽  
William M. Sappenfield ◽  
Elizabeth Gulitz ◽  
Charles S. Mahan ◽  
Donna J. Petersen ◽  
...  

2018 ◽  
Vol 14 (2) ◽  
pp. 159-166 ◽  
Author(s):  
Kumar Mukherjee ◽  
Khalid M Kamal

Background Atrial fibrillation is a significant risk factor for ischemic stroke and increases cost of treatment. Aims To estimate the incremental inpatient cost and length of stay due to atrial fibrillation among adults hospitalized with a primary diagnosis of ischemic stroke, after controlling for sociodemographic, clinical, and hospital characteristics, in nationally representative discharge records of the US population. Methods Hospital discharge records with a primary diagnosis of ischemic stroke were identified from the National Inpatient Sample data for the years 2010–2013. A generalized linear model with log link and least-squares means were used to estimate the incremental inpatient cost and length of stay in ischemic stroke due to atrial fibrillation after controlling for sociodemographic, clinical, and hospital characteristics. Results Among 434,544 hospital discharge records with a primary diagnosis of ischemic stroke, 90,190 (20.76%) discharge records had a secondary diagnosis of atrial fibrillation. The average inpatient cost for all discharge records with a primary diagnosis of ischemic stroke (mean = $13,072, median = $9270.87) was significantly (p < 0.0001) higher compared to all discharge records without ischemic stroke (mean = $12,543.07, median = $7517.13). The mean length of stay for all records was 4.55 days (95% CI = 4.53–4.56). Among those identified with ischemic stroke, adjusted mean inpatient cost was higher by $2829 (95% CI = $2708–$2949) and mean length of stay was greater by 0.85 days (95% CI = 0.81–0.89) for those with atrial fibrillation compared to those without. Conclusions The presence of atrial fibrillation was associated with increased inpatient cost and length of stay among patients diagnosed with ischemic stroke. Increased inpatient cost and length of stay call for a more comprehensive patient care approach, including targeted interventions among adults diagnosed with ischemic stroke and atrial fibrillation, which could potentially reduce the overall cost in this population.
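The reported share of records with atrial fibrillation can be checked directly from the two counts given in the Results. The short sketch below uses only those figures; the variable names are illustrative.

```python
# Counts as stated in the abstract's Results.
total_records = 434_544   # discharge records with a primary diagnosis of ischemic stroke
with_af = 90_190          # of those, records with a secondary diagnosis of atrial fibrillation

share_with_af = 100 * with_af / total_records
share_text = f"{share_with_af:.2f}%"
print(share_text)
```

The result rounds to the 20.76% reported in the abstract.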


Symmetry ◽  
2021 ◽  
Vol 13 (3) ◽  
pp. 415
Author(s):  
Jinli Wang ◽  
Yong Fan ◽  
Hui Zhang ◽  
Libo Feng

Tracking scientific and technological (S&T) research hotspots can help scholars grasp the status of current research and the development patterns of a field over time. It contributes to the generation of new ideas and plays an important role in supporting the writing of scientific research projects and scientific papers. Patents are important S&T resources that reflect the development status of a field. In this paper, we use topic modeling, topic intensity, and evolutionary computing models to discover research hotspots and development trends in the field of blockchain patents. First, we propose a time-based dynamic latent Dirichlet allocation (TDLDA) modeling method based on a probabilistic graph model and knowledge representation learning for patent text mining. Second, we present a computational model, topic intensity (TI), that expresses topic strength and evolution. Finally, the point-wise mutual information (PMI) value is used to evaluate topic quality. We obtain 20 hot topics through TDLDA experiments and rank them according to the strength calculation model. The topic evolution model is used to analyze whether each topic is rising, falling, or stable. In the experiments, 8 topics showed an upward trend, 6 topics showed a downward trend, and 6 topics became stable or fluctuated. Compared with the baseline method, TDLDA performs best when K is 40 or less. TDLDA is an effective topic model that can extract hot topics and evolution trends from blockchain patent texts, which helps researchers grasp the research direction more accurately and improves the quality of project applications and paper writing in the blockchain technology domain.
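As an illustration of PMI-based topic evaluation, here is a minimal sketch that averages the pairwise PMI of a topic's top words, estimating probabilities from document-level co-occurrence. The abstract does not state how the authors compute PMI, so the averaging scheme, the handling of pairs that never co-occur, and the toy documents are assumptions.

```python
import math
from itertools import combinations

def topic_pmi(top_words, documents):
    """Average pairwise PMI of a topic's top words, using document co-occurrence."""
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain all of the given words.
        return sum(all(w in doc for w in words) for doc in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        joint = prob(w1, w2)
        if joint == 0:
            continue  # assumption: skip pairs that never co-occur (PMI undefined)
        scores.append(math.log(joint / (prob(w1) * prob(w2))))
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical tokenized patent texts.
documents = [
    ["blockchain", "consensus", "node"],
    ["blockchain", "consensus", "ledger"],
    ["payment", "wallet", "ledger"],
    ["payment", "wallet", "node"],
]

coherent = topic_pmi(["blockchain", "consensus"], documents)
unrelated = topic_pmi(["blockchain", "node"], documents)
```

Words that tend to appear together score above zero, while words that co-occur only as often as chance predicts score zero, so a higher average suggests a more coherent topic.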


2018 ◽  
Vol 251 ◽  
pp. 06020 ◽  
Author(s):  
David Passmore ◽  
Chungil Chae ◽  
Yulia Kustikova ◽  
Rose Baker ◽  
Jeong-Ha Yim

A topic model was explored using unsupervised machine learning to summarize free-text narrative reports of 77,215 injuries that occurred in coal mines in the USA between 2000 and 2015. Latent Dirichlet Allocation modeling processes identified six topics from the free-text data. One topic, a theme describing primarily injury incidents resulting in strains and sprains of musculoskeletal systems, revealed differences in topic emphasis by the location of the mine property at which injuries occurred, the degree of injury, and the year of injury occurrence. Text narratives clustered around this topic refer most frequently to surface or other locations rather than underground locations that resulted in disability and that, also, increased secularly over time. The modeling success enjoyed in this exploratory effort suggests that additional topic mining of these injury text narratives is justified, especially using a broad set of covariates to explain variations in topic emphasis and for comparison of surface mining injuries with injuries occurring during site preparation for construction.

