What’s in a Trauma? Using Machine Learning to Unpack What Makes an Event Traumatic

2021 ◽  
Author(s):  
Payton J. Jones

What differentiates a trauma from an event that is merely upsetting? Wildly different definitions of trauma have been used across various settings. Yet there is a dearth of empirical work examining the features that individuals use to define an event as a ‘trauma’. First, a group of qualitative coders classified features (e.g., actual physical injury, loss of possessions) of 600 event descriptions (e.g., “was verbally harassed by a boss”, “watched a video of an adult being shot and killed”). Next, across two studies, machine learning was used to predict whether individuals rated event descriptions as ‘trauma’ or ‘traumatic’ in over 100,000 judgment tasks. In Study 1, examining continuous ratings, a cross-validated LASSO regression with interaction terms provided the best out-of-sample predictions (r² = 0.76), outperforming ridge regression, support vector regression, and linear regression. In Study 2, using binary judgments, a random forest model accurately predicted out-of-sample individual responses (AUC = 0.96), outperforming a neural network and an AdaBoost ensemble classifier. The most important event features across the two studies were actual death, threat of death, and the presence of a human perpetrator. The most important human features in predicting judgments were political orientation and gender.
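The two out-of-sample metrics quoted in this abstract can be sketched in a few lines. This is a minimal illustration on made-up toy data, not the author's code. It assumes that "r²" means the squared Pearson correlation between observed and predicted ratings (the abstract does not say which definition was used) and it computes AUC as the share of positive/negative pairs ranked correctly. The function names are hypothetical.

```python
def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values.

    Assumption: this is one common reading of "r2"; the abstract does not
    state which definition was used.
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data only (hypothetical judgments and model scores).
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(auc(toy_labels, toy_scores))
```

The pairwise form is quadratic in the number of cases, which is fine for illustration; a rank-based computation would be preferable for 100,000 judgments.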

2019 ◽  
Vol 45 (10) ◽  
pp. 3193-3201 ◽  
Author(s):  
Yajuan Li ◽  
Xialing Huang ◽  
Yuwei Xia ◽  
Liling Long

Abstract Purpose To explore the value of CT-enhanced quantitative features combined with machine learning for differential diagnosis of renal chromophobe cell carcinoma (chRCC) and renal oncocytoma (RO). Methods Sixty-one cases of renal tumors (chRCC = 44; RO = 17) that were pathologically confirmed at our hospital between 2008 and 2018 were retrospectively analyzed. All patients had undergone preoperative enhanced CT scans including the corticomedullary (CMP), nephrographic (NP), and excretory phases (EP) of contrast enhancement. Volumes of interest (VOIs), including lesions on the images, were manually delineated using the RadCloud platform. A LASSO regression algorithm was used to screen the image features extracted from all VOIs. Five machine learning classifiers were trained to distinguish chRCC from RO by using a fivefold cross-validation strategy. Classifier performance was evaluated mainly by the area under the receiver operating characteristic (ROC) curve and by accuracy. Results In total, 1029 features were extracted from CMP, NP, and EP. The LASSO regression algorithm was used to screen out the four, four, and six best features, respectively, and eight features were selected when CMP and NP were combined. All five classifiers had good diagnostic performance, with area under the curve (AUC) values greater than 0.850, and the support vector machine (SVM) classifier performed best, with a diagnostic accuracy of 0.945 (AUC 0.964 ± 0.054; sensitivity 0.999; specificity 0.800). Conclusions Accurate preoperative differential diagnosis of chRCC and RO can be facilitated by a combination of CT-enhanced quantitative features and machine learning.


Electronics ◽  
2020 ◽  
Vol 9 (2) ◽  
pp. 374 ◽  
Author(s):  
Sudhanshu Kumar ◽  
Monika Gahalawat ◽  
Partha Pratim Roy ◽  
Debi Prosad Dogra ◽  
Byung-Gyu Kim

Sentiment analysis is a rapidly growing field of research due to the explosive growth in digital information. In the modern world of artificial intelligence, sentiment analysis is one of the essential tools to extract emotion information from massive data. Sentiment analysis is applied to a variety of user data, from customer reviews to social network posts. To the best of our knowledge, there is little work on sentiment analysis based on the categorization of users by demographics. Demographics play an important role in deciding the marketing strategies for different products. In this study, we explore the impact of age and gender in sentiment analysis, as this can help e-commerce retailers to market their products based on specific demographics. The dataset is created by collecting reviews on books from Facebook users by asking them to answer a questionnaire containing questions about their preferences in books, along with their age groups and gender information. Next, the paper analyzes the segmented data for sentiments based on each age group and gender. Finally, sentiment analysis is done using different Machine Learning (ML) approaches, including maximum entropy, support vector machine, convolutional neural network, and long short-term memory, to study the impact of age and gender on user reviews. Experiments have been conducted to identify new insights into the effect of age and gender for sentiment analysis.


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. 2032-2032
Author(s):  
Protiva Rahman ◽  
Michele LeNoue-Newton ◽  
Sandip Chaugai ◽  
Marilyn Holt ◽  
Neha M Jain ◽  
...  

2032 Background: 30-50% of patients with non-early NSCLC will eventually develop BM, with a median survival of less than one year from BM diagnosis. There are no widely accepted clinical risk models for development of BM in patients without them at baseline. We predicted the binary risk of BM using clinical and genetic factors from a large multi-institutional cohort. Methods: Stage II-IV NSCLC patients from the AACR Project GENIE Biopharma Consortium dataset were eligible. This consisted of 4 academic institutions who curated clinical data of patients who had somatic next-generation tumor sequencing (NGS) between 2015-2017. We excluded patients who had BM at baseline, died within 30 days of NSCLC diagnosis, or did not undergo brain imaging. Covariates included demographics, anticancer therapies (received up to 90 days prior to BM development and within 5 years from NSCLC diagnosis), and NGS data; radiotherapy (RT) data were not available. NGS features included mutations and copy number alterations. These features were restricted to those classified as oncogenic by OncoKB. Univariate feature selection with Fisher’s test (p<.1) was performed on medication and genetic features. We compared 5 different machine learning models for prediction: random forest (RF), support vector machine (SVM), lasso regression, ridge regression, and an ensemble classifier. We split our data into training and test sets. 10-fold cross-validation was done on the training set for parameter tuning. The area under the receiver-operating curve (AUC) is reported on the test set. Results: 956 patients were included, 192 (20%) in the test set. Univariate features associated with BM were treatment with etoposide, Asian race, presence of bone metastases at NSCLC diagnosis, mutations in TP53 and EGFR, amplifications of ERBB2 and EGFR, and deletions of RB1, CDKN2A and CDKN2B. 
Univariate features inversely associated with BM were older age, treatment with nivolumab, vinorelbine, alectinib, pembrolizumab, atezolizumab, and gemcitabine, as well as mutations in NOTCH1 and KRAS. Ridge regression had the best AUC, 0.73 (Table). Conclusions: We achieved reasonable prediction performance using commonly obtained clinical and genomic information in non-early NSCLC. The biologic role of the associated alterations deserves further scrutiny; this study replicates similar findings for EGFR and KRAS in a much smaller cohort. Certain subsets of NSCLC patients may benefit from increased surveillance for BM and transition to drug therapies known to effectively cross the blood-brain barrier, e.g., nivolumab and alectinib. Inclusion of additional covariates, e.g., brain RT, may further improve model performance. [Table: see text]


2020 ◽  
Vol 13 (3) ◽  
pp. 48 ◽  
Author(s):  
Yuchen Zhang ◽  
Shigeyuki Hamori

In 1983, Meese and Rogoff showed that traditional economic models developed since the 1970s do not perform better than the random walk in predicting out-of-sample exchange rates when using data obtained after the beginning of the floating rate system. Subsequently, whether traditional economic models can ever outperform the random walk in forecasting out-of-sample exchange rates has received scholarly attention. Recently, a combination of fundamental models with machine learning methodologies was found to outperform the random walk in predictive accuracy (Amat et al. 2018). This paper focuses on combining modern machine learning methodologies with traditional economic models and examines whether such combinations can outperform the prediction performance of the random walk without drift. More specifically, this paper applies the random forest, support vector machine, and neural network models to four fundamental theories (uncovered interest rate parity, purchasing power parity, the monetary model, and the Taylor rule models). We performed a thorough robustness check using six government bonds with different maturities and four price indexes, which demonstrated the superior performance of fundamental models combined with modern machine learning in predicting future exchange rates in comparison with the results of the random walk. These results were examined using the root mean squared error (RMSE) and a Diebold–Mariano (DM) test. The main findings are as follows. First, when comparing the performance of fundamental models combined with machine learning with the performance of the random walk, the RMSE results show that the fundamental models with machine learning outperform the random walk. In the DM test, the results are mixed as most of the results show significantly different predictive accuracies compared with the random walk.
Second, when comparing the performance of fundamental models combined with machine learning, the models using the producer price index (PPI) consistently show good predictability. Meanwhile, the consumer price index (CPI) appears to be comparatively poor in predicting exchange rates, based on its poor results in the RMSE test and the DM test.
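The two evaluation tools named in this abstract, RMSE and the Diebold–Mariano (DM) test, can be sketched as follows. This is an illustrative sketch, not the authors' implementation. It assumes one-step-ahead forecasts, squared-error loss, and no autocorrelation in the loss differential, so the plain sample variance is used in place of a long-run variance estimate. The function names and the toy numbers are hypothetical.

```python
from math import sqrt


def rmse(actual, forecast):
    """Root mean squared error between realised values and forecasts."""
    n = len(actual)
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


def diebold_mariano(actual, forecast_a, forecast_b):
    """DM statistic under squared-error loss for one-step-ahead forecasts.

    d_t = e_a,t^2 - e_b,t^2 is the loss differential. A negative statistic
    favours forecast_a. Assumption: d_t is serially uncorrelated, so the
    sample variance stands in for the long-run variance. The statistic is
    compared with standard normal critical values. It is undefined when all
    d_t are identical (zero variance).
    """
    d = [(a - fa) ** 2 - (a - fb) ** 2
         for a, fa, fb in zip(actual, forecast_a, forecast_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / n
    return mean_d / sqrt(var_d / n)


# Toy series only: a hypothetical model forecast versus a driftless random
# walk, whose forecast for each period is the previous observed value.
rates = [1.10, 1.12, 1.11, 1.15, 1.14, 1.17]
actual = rates[1:]
random_walk = rates[:-1]
model = [1.12, 1.12, 1.14, 1.13, 1.16]
print(rmse(actual, model), rmse(actual, random_walk))
print(diebold_mariano(actual, model, random_walk))
```

For multi-step horizons the variance term would need an autocorrelation-consistent estimator, which this sketch deliberately omits.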


2019 ◽  
Vol 20 (S2) ◽  
Author(s):  
Varun Khanna ◽  
Lei Li ◽  
Johnson Fung ◽  
Shoba Ranganathan ◽  
Nikolai Petrovsky

Abstract Background Toll-like receptor 9 is a key innate immune receptor involved in detecting infectious diseases and cancer. TLR9 activates the innate immune system following the recognition of single-stranded DNA oligonucleotides (ODN) containing unmethylated cytosine-guanine (CpG) motifs. Due to the considerable number of rotatable bonds in ODNs, high-throughput in silico screening of CpG ODNs for potential TLR9 activity via traditional structure-based virtual screening approaches is challenging. In the current study, we present a machine learning based method for predicting novel mouse TLR9 (mTLR9) agonists based on features including count and position of motifs, the distance between the motifs and graphically derived features such as the radius of gyration and moment of inertia. We employed an in-house experimentally validated dataset of 396 single-stranded synthetic ODNs to compare the results of five machine learning algorithms. Since the dataset was highly imbalanced, we used an ensemble learning approach based on repeated random down-sampling. Results Using in-house experimental TLR9 activity data, we found that the random forest algorithm outperformed other algorithms for our dataset for TLR9 activity prediction. Therefore, we developed a cross-validated ensemble classifier of 20 random forest models. The average Matthews correlation coefficient and balanced accuracy of our ensemble classifier in test samples were 0.61 and 80.0%, respectively, with the maximum balanced accuracy and Matthews correlation coefficient of 87.0% and 0.75, respectively. We confirmed common sequence motifs including ‘CC’, ‘GG’, ‘AG’, ‘CCCG’ and ‘CGGC’ were overrepresented in mTLR9 agonists. Predictions on 6000 randomly generated ODNs were ranked and the top 100 ODNs were synthesized and experimentally tested for activity in an mTLR9 reporter cell assay, with 91 of the 100 selected ODNs showing high activity, confirming the accuracy of the model in predicting mTLR9 activity.
Conclusion We combined repeated random down-sampling with random forest to overcome the class imbalance problem and achieved promising results. Overall, we showed that the random forest algorithm outperformed other machine learning algorithms including support vector machines, shrinkage discriminant analysis, gradient boosting machine and neural networks. Due to its predictive performance and simplicity, the random forest technique is a useful method for prediction of mTLR9 ODN agonists.
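The repeated random down-sampling idea and the two reported metrics can be illustrated with a short sketch. It is not the authors' pipeline: it only builds the balanced index subsets on which each ensemble member could be trained (model fitting is omitted) and evaluates the Matthews correlation coefficient and balanced accuracy from confusion-matrix counts. All names, the seed, and the toy labels are hypothetical.

```python
import random
from math import sqrt


def downsample_indices(labels, n_repeats, seed=0):
    """Build n_repeats balanced index lists for a binary-labelled dataset.

    Each list keeps every minority-class sample and adds an equally sized
    random draw (without replacement) from the majority class. One ensemble
    member would be trained per list.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return [sorted(minority + rng.sample(majority, len(minority)))
            for _ in range(n_repeats)]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity and specificity."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


# Toy imbalanced labels: 3 positives, 9 negatives (hypothetical).
toy_labels = [1] * 3 + [0] * 9
subsets = downsample_indices(toy_labels, n_repeats=20)
print(len(subsets), len(subsets[0]))
```

Using 20 repeats mirrors the ensemble size mentioned in the abstract; the predictions of the members would then be combined, for example by majority vote, which is an assumption here since the abstract does not describe the aggregation rule.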


2020 ◽  
Vol 5 (2) ◽  
Author(s):  
Adinda miftahul Ilmi Habiba ◽  
Agi Prasetiadi ◽  
Cepi Ramdani

This study assesses the health of coral reefs in regions of Indonesia using several factors: visiting tourists, latitude, longitude, temperature, year, resident population, number of youths, and number of industries. The method used is machine learning with the K-Nearest Neighbor, Support Vector Machine, and Ensemble Classifier algorithms; the ensemble uses a random forest to select the tree branches, or decision features, most relevant to the output. The study is intended to serve as a reference so that regions whose coral reefs are still in poor condition can learn from regions whose reefs are in good condition by seeing which factors lead a region's coral reefs to be categorized as good. The results show that, for the K-Nearest Neighbor algorithm, the factors influencing coral reef health are visiting tourists, latitude, longitude, temperature, year, and resident population; for the Support Vector Machine algorithm, the influential factors are visiting tourists, latitude, temperature, and year; and for the Ensemble Classifier algorithm, they are visiting tourists, latitude, longitude, temperature, and number of industries. In this case, the Support Vector Machine algorithm performed better than K-Nearest Neighbor and the Ensemble Classifier. Keywords: Ecosystem, Ensemble Classifier, K-Nearest Neighbor, Machine Learning, Support Vector Machine


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Shi-yi Liu ◽  
Rong-hui Zhu ◽  
Zi-tao Wang ◽  
Wei Tan ◽  
Li Zhang ◽  
...  

Background. Epithelial ovarian cancer (EOC) is an extremely lethal gynecological malignancy and has the potential to benefit from immune checkpoint blockade (ICB) therapy, whose efficacy highly depends on the complex tumor microenvironment (TME). Method and Result. We comprehensively analyzed the landscape of the TME and its prognostic value through immune infiltration analysis, somatic mutation analysis, and survival analysis. The results showed that high infiltration of immune cells predicts favorable clinical outcomes in EOC. Then, the detailed TME landscape of EOC was investigated through the “xCell” algorithm, gene set variation analysis (GSVA), cytokine expression analysis, and correlation analysis. We observed that EOC patients with high immune cell infiltration have an antitumor phenotype that is highly correlated with immune checkpoints. We further found that dendritic cells (DCs) may play a dominant role in promoting the infiltration of immune cells into the TME and forming an antitumor immune phenotype. Finally, we applied machine-learning Lasso regression, support vector machines (SVMs), and random forest, identifying six DC-related prognostic genes (CXCL9, VSIG4, ALOX5AP, TGFBI, UBD, and CXCL11). A DC-related risk stratification model was then established and validated. Conclusion. High infiltration of immune cells predicted a better outcome and an antitumor phenotype in EOC, and the DCs might play a dominant role in the initiation of antitumor immune cells. The established risk model can be used for prognostic prediction in EOC.


10.2196/23938 ◽  
2021 ◽  
Vol 9 (8) ◽  
pp. e23938
Author(s):  
Ruairi O'Driscoll ◽  
Jake Turicchi ◽  
Mark Hopkins ◽  
Cristiana Duarte ◽  
Graham W Horgan ◽  
...  

Background Accurate solutions for the estimation of physical activity and energy expenditure at scale are needed for a range of medical and health research fields. Machine learning techniques show promise in research-grade accelerometers, and some evidence indicates that these techniques can be applied to more scalable commercial devices. Objective This study aims to test the validity and out-of-sample generalizability of algorithms for the prediction of energy expenditure in several wearables (ie, Fitbit Charge 2, ActiGraph GT3-x, SenseWear Armband Mini, and Polar H7) using two laboratory data sets comprising different activities. Methods Two laboratory studies (study 1: n=59, age 44.4 years, weight 75.7 kg; study 2: n=30, age 31.9 years, weight 70.6 kg), in which adult participants performed a sequential lab-based activity protocol consisting of resting, household, ambulatory, and nonambulatory tasks, were combined in this study. In both studies, accelerometer and physiological data were collected from the wearables alongside energy expenditure using indirect calorimetry. Three regression algorithms were used to predict metabolic equivalents (METs; ie, random forest, gradient boosting, and neural networks), and five classification algorithms (ie, k-nearest neighbor, support vector machine, random forest, gradient boosting, and neural networks) were used for physical activity intensity classification as sedentary, light, or moderate to vigorous. Algorithms were evaluated using leave-one-subject-out cross-validations and out-of-sample validations. Results The root mean square error (RMSE) was lowest for gradient boosting applied to SenseWear and Polar H7 data (0.91 METs), and in the classification task, gradient boosting applied to SenseWear and Polar H7 was the most accurate (85.5%). Fitbit models achieved an RMSE of 1.36 METs and 78.2% accuracy for classification.
Errors tended to increase in out-of-sample validations, with the SenseWear neural network achieving RMSE values of 1.22 METs in the regression tasks and the SenseWear gradient boosting and random forest models achieving an accuracy of 80% in classification tasks. Conclusions Algorithms trained on combined data sets demonstrated high predictive accuracy, with a tendency for superior performance of random forests and gradient boosting for most but not all wearable devices. Predictions were poorer in the between-study validations, which creates uncertainty regarding the generalizability of the tested algorithms.
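Leave-one-subject-out cross-validation, the evaluation scheme named in this abstract, holds out every record from one participant per fold so that no participant appears in both the training and the test data. A minimal sketch of the splitting step follows; it is an illustration under that definition, not the study's code, and the function name and subject identifiers are hypothetical.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per subject.

    subject_ids[i] names the participant who produced record i. Each fold
    tests on all records of one participant and trains on everyone else.
    """
    for subject in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == subject]
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train, test


# Toy example: six records from three hypothetical participants.
ids = ["a", "a", "b", "c", "c", "c"]
for subject, train, test in leave_one_subject_out(ids):
    print(subject, train, test)
```

Splitting by participant rather than by record matters for wearable data because consecutive samples from the same person are strongly correlated, so a record-level split would tend to overstate accuracy.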


2021 ◽  
Author(s):  
Ying Ma ◽  
Jianli Wang ◽  
Jingying Wu ◽  
Chuxuan Tong ◽  
Ting Zhang

Abstract Background: Because graphene is currently incorporated into various consumer products and numerous new applications, determining the relationships between the physicochemical properties of graphene and its toxicity is a prominent concern for environmental and health risk analysis. Data from the literature suggest that graphene exposure may result in cytotoxicity; however, the toxicity data on graphene are still insufficient to characterize its adverse effects because of the complexity and heterogeneity of the available data on the potential risks of graphene. Methods and Results: Here, we developed a meta-analysis approach for assembling published evidence on cytotoxicity based on 792 related publications, 986 cell survival rate samples, 762 IC50 samples, and 100 LDH release samples. Among the corresponding attributes, we showed that the cytotoxicity of graphene, assessed as cell viability, IC50, and LDH release, can be primarily predicted from exposure dose and detection method; diameter and surface modification; and detection method and organ source, respectively. Furthermore, this paper provides guidance regarding three optional data sets for the above-mentioned three endpoints, which are chiefly related to cellular toxicity, for future studies. Cross-validation studies based on machine learning tools, including Random Forests (RFs), Support Vector Machine (SVM), LASSO regression, and Elastic Net, were conducted to verify the results. Conclusions: In summary, our study indicates that following rigorous experimental and data-extraction approaches, accompanied by suitable machine learning tools and the continuous addition of reliable data to the data set developed using our meta-analysis approach, will offer higher predictive power and accuracy and will also help to provide effective information for designing safe graphene.


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5022
Author(s):  
Francesco Asci ◽  
Giovanni Costantini ◽  
Pietro Di Leo ◽  
Alessandro Zampogna ◽  
Giovanni Ruoppolo ◽  
...  

Background: Experimental studies using qualitative or quantitative analysis have demonstrated that the human voice progressively worsens with ageing. These studies, however, have mostly focused on specific voice features without examining their dynamic interaction. To examine the complexity of age-related changes in voice, more advanced techniques based on machine learning have been recently applied to voice recordings but only in a laboratory setting. We here recorded voice samples in a large sample of healthy subjects. To improve the ecological value of our analysis, we collected voice samples directly at home using smartphones. Methods: 138 younger adults (65 males and 73 females, age range: 15–30) and 123 older adults (47 males and 76 females, age range: 40–85) produced a sustained emission of a vowel and a sentence. The recorded voice samples underwent a machine learning analysis through a support vector machine algorithm. Results: The machine learning analysis of voice samples from both speech tasks discriminated between younger and older adults, and between males and females, with high statistical accuracy. Conclusions: By recording voice samples through smartphones in an ecological setting, we demonstrated the combined effect of age and gender on voice. Our machine learning analysis demonstrates the effect of ageing on voice.

