Disclosure Sentiment: Machine Learning vs. Dictionary Methods

2021 ◽  
Author(s):  
Richard Frankel ◽  
Jared Jennings ◽  
Joshua Lee

We compare the ability of dictionary-based and machine-learning methods to capture disclosure sentiment at 10-K filing and conference-call dates. Like Loughran and McDonald [Loughran T, McDonald B (2011) When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. J. Finance 66(1):35–65.], we use returns to assess sentiment. We find that measures based on machine learning offer a significant improvement in explanatory power over dictionary-based measures. Specifically, machine-learning measures explain returns at 10-K filing dates, whereas measures based on the Loughran and McDonald dictionary only explain returns at 10-K filing dates during the time period of their study. Moreover, at conference-call dates, machine-learning methods offer an improvement over the Loughran and McDonald dictionary method of a greater magnitude than the improvement of the Loughran and McDonald dictionary over the Harvard Psychosociological Dictionary. We further find that the random-forest-regression-tree method better captures disclosure sentiment than alternative algorithms, simplifying the application of the machine-learning approach. Overall, our results suggest that machine-learning methods offer an easily implementable, more powerful, and reliable measure of disclosure sentiment than dictionary-based methods. This paper was accepted by Brian Bushee, management science.
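As a rough illustration of what a dictionary-based sentiment measure looks like, the sketch below scores a text by net positive word share. The word lists, the function name `dictionary_tone`, and the scaling by total word count are all assumptions made for illustration; they are not the Loughran and McDonald dictionary or the measure used in the paper.

```python
import re

# Hypothetical toy word lists for illustration only.
# These are NOT the Loughran and McDonald (2011) word lists.
POSITIVE = {"improve", "gain", "strong"}
NEGATIVE = {"loss", "decline", "impairment"}


def dictionary_tone(text):
    """Return (positive count - negative count) / total word count.

    The scaling choice is an assumption; other normalizations are common.
    """
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)
```

A machine-learning measure would instead learn word or phrase weights from the data (for example, from filing-date returns) rather than fixing the lists in advance.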

2007 ◽  
Vol 33 (3) ◽  
pp. 397-427 ◽  
Author(s):  
Raquel Fernández ◽  
Jonathan Ginzburg ◽  
Shalom Lappin

In this article we use well-known machine learning methods to tackle a novel task, namely the classification of non-sentential utterances (NSUs) in dialogue. We introduce a fine-grained taxonomy of NSU classes based on corpus work, and then report on the results of several machine learning experiments. First, we present a pilot study focused on one of the NSU classes in the taxonomy—bare wh-phrases or “sluices”—and explore the task of disambiguating between the different readings that sluices can convey. We then extend the approach to classify the full range of NSU classes, obtaining results of around an 87% weighted F-score. Thus our experiments show that, for the taxonomy adopted, the task of identifying the right NSU class can be successfully learned, and hence provide a very encouraging basis for the more general enterprise of fully processing NSUs.


2020 ◽  
Author(s):  
Matthew David Nemesure ◽  
Michael Heinz ◽  
Raphael Huang ◽  
Nicholas C. Jacobson

Background: Generalized anxiety disorder (GAD) and major depressive disorder (MDD) are highly prevalent and impairing problems, but frequently go undetected, leading to substantial treatment delays. Electronic medical records (EMRs) collect many biometric markers and patient characteristics that could foster the detection of GAD and MDD in primary care settings. Methods: We approach the problem of predicting MDD and GAD using a novel machine learning pipeline. The pipeline consists of an ensemble of algorithmically distinct machine learning methods, including deep learning. A sample of 4,184 undergraduate students completed the study, undergoing a general health screening and completing a psychiatric assessment for MDD and GAD. Using 59 biomedical and demographic features from the general health survey and an additional set of engineered features, we trained the model to predict GAD and MDD. Results: We assessed the model's performance on a held-out test set and found AUCs of 0.72 and 0.66 for GAD and MDD, respectively. Additionally, we used Shapley values to identify which features had the greatest impact on prediction for each disease. The top predictive features for MDD were “difficulty memorizing lessons”, “financial difficulties” and “alcohol consumption”. The top predictive features for GAD were the necessity for a control examination, being overweight/obese, and irregular meal consumption. Conclusions: Our results indicate a successful application of machine learning methods in detection of GAD and MDD based on EMR-like data. By identifying biomarkers of GAD and MDD, these results may be used in future research to aid in the early detection of MDD and GAD.
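The AUC reported above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the function name `auc` and the example data are illustrative assumptions and have no connection to the study's pipeline or data.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 ground-truth values.
    scores: iterable of predicted scores (higher means more likely positive).
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

This quadratic-time version is meant for clarity; for large test sets a rank-based computation would be preferable.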


2008 ◽  
Vol 17 (2) ◽  
pp. 121-142 ◽  
Author(s):  
Guido Heumer ◽  
Heni Ben Amor ◽  
Bernhard Jung

This paper presents a comparison of various machine learning methods applied to the problem of recognizing grasp types involved in object manipulations performed with a data glove. Conventional wisdom holds that data gloves need calibration in order to obtain accurate results. However, calibration is a time-consuming process, inherently user-specific, and its results are often not perfect. In contrast, the present study aims at evaluating recognition methods that do not require prior calibration of the data glove. Instead, raw sensor readings are used as input features that are directly mapped to different categories of hand shapes. An experiment was carried out in which test persons wearing a data glove had to grasp physical objects of different shapes corresponding to the various grasp types of the Schlesinger taxonomy. The collected data was comprehensively analyzed using numerous classification techniques provided in an open-source machine learning toolbox. Evaluated machine learning methods are composed of (a) 38 classifiers including different types of function learners, decision trees, rule-based learners, Bayes nets, and lazy learners; (b) data preprocessing using principal component analysis (PCA) with varying degrees of dimensionality reduction; and (c) five meta-learning algorithms under various configurations where selection of suitable base classifier combinations was informed by the results of the foregoing classifier evaluation. Classification performance was analyzed in six different settings, representing various application scenarios with differing generalization demands. The results of this work are twofold: (1) We show that a reasonably good to highly reliable recognition of grasp types can be achieved—depending on whether or not the glove user is among those training the classifier—even with uncalibrated data gloves. (2) We identify the best performing classification methods for the recognition of various grasp types. 
To conclude, cumbersome calibration processes before productive usage of data gloves can be spared in many situations.


2020 ◽  
Author(s):  
Toni Lange ◽  
Guido Schwarzer ◽  
Thomas Datzmann ◽  
Harald Binder

Background: Updating systematic reviews is often a time-consuming process involving a lot of human effort and is therefore not carried out as often as it should be. Our aim was therefore to explore the potential of machine learning methods to reduce the human workload and, in particular, to gauge the performance of deep learning methods as compared to more established machine learning methods. Methods: We used three available reviews of diagnostic test studies as the data basis. In order to identify relevant publications we used typical text pre-processing methods. The reference standard for the evaluation was the human-consensus-based binary classification (inclusion, exclusion). For the evaluation of models, various scenarios were generated using a grid of combinations of data preprocessing steps. Furthermore, we evaluated each machine learning approach with an approach-specific predefined grid of tuning parameters using the Brier score metric. Results: The best performance was obtained with an ensemble method for two of the reviews, and by a deep learning approach for the other review. Yet, the final performance of approaches is seen to strongly depend on data preparation. Overall, machine learning methods provided reasonable classification. Conclusion: It seems possible to reduce the human workload in updating systematic reviews by using machine learning methods. Yet, as the influence of data preprocessing on the final performance seems to be at least as important as choosing the specific machine learning approach, users should not blindly expect good performance just by using approaches from a popular class, such as deep learning.
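For a binary inclusion/exclusion decision, the Brier score mentioned above is the mean squared difference between the predicted probability and the 0/1 outcome, so lower is better. A minimal sketch, with the function name `brier_score` and the example values chosen purely for illustration:

```python
def brier_score(labels, probs):
    """Mean squared error between predicted probabilities and 0/1 labels.

    labels: sequence of 0/1 outcomes (e.g. 1 = include, 0 = exclude).
    probs: sequence of predicted probabilities for the positive class.
    """
    if len(labels) != len(probs) or len(labels) == 0:
        raise ValueError("labels and probs must be non-empty and equal length")
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)
```

A model that always predicts 0.5 scores 0.25 on any data set, which gives a simple reference point when comparing tuning configurations.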


Author(s):  
Andrius Daranda ◽  
Gintautas Dzemyda

Machine learning is compelling in solving various applied problems. Nevertheless, machine learning methods lack contextual reasoning capabilities and cannot readily be adapted to use additional information about circumstances, environments, backgrounds, etc. Such information provides essential knowledge about possible reasons for particular actions, but this knowledge cannot be processed directly by machine learning methods. This paper presents a context-aware machine learning approach for contextual reasoning about actor behavior and context-based prediction for threat assessment. Moreover, the proposed approach uses context-aware prediction to handle the interaction between actors. The technique relies on the cooperative use of two classification methods: the first predicts an actor’s behavior, and the second flags predicted actions (behaviors) that are non-typical or unusual. This integration of two methods allows an actor to make a self-aware threat assessment based on relations between different actors, where multidimensional numerical data define the connections. The approach predicts the possible future situation and assesses its threat without waiting for future actions. The suggested approach is based on the Decision Tree and Support Vector Method algorithms. Due to the complexity of context, marine traffic data was chosen to demonstrate the capability of the proposed approach. This technique could support an end-to-end approach to safe vessel navigation in maritime traffic with considerable ship congestion.


2018 ◽  
Vol 226 (4) ◽  
pp. 259-273 ◽  
Author(s):  
Ranjith Vijayakumar ◽  
Mike W.-L. Cheung

Machine learning tools are increasingly used in social sciences and policy fields due to their increase in predictive accuracy. However, little research has been done on how well the models of machine learning methods replicate across samples. We compare machine learning methods with regression on the replicability of variable selection, along with predictive accuracy, using an empirical dataset as well as simulated data with additive, interaction, and non-linear squared terms added as predictors. Methods analyzed include support vector machines (SVM), random forests (RF), multivariate adaptive regression splines (MARS), and the regularized regression variants, least absolute shrinkage and selection operator (LASSO), and elastic net. In simulations with additive and linear interactions, machine learning methods performed similarly to regression in replicating predictors; they also performed mostly equal or below regression on measures of predictive accuracy. In simulations with square terms, machine learning methods SVM, RF, and MARS improved predictive accuracy and replicated predictors better than regression. Thus, in simulated datasets, the gap between machine learning methods and regression on predictive measures foreshadowed the gap in variable selection. In replications on the empirical dataset, however, improved prediction by machine learning methods was not accompanied by a visible improvement in replicability in variable selection. This disparity is explained by the overall explanatory power of the models. When predictors have small effects and noise predominates, improved global measures of prediction in a sample by machine learning methods may not lead to the robust selection of predictors; thus, in the presence of weak predictors and noise, regression remains a useful tool for model building and replication.


2020 ◽  
pp. 1-16
Author(s):  
Yuwen Tao ◽  
Yizhang Jiang ◽  
Kaijian Xia ◽  
Jing Xue ◽  
Leyuan Zhou ◽  
...  

The use of machine learning technology to recognize electrical signals of the brain is becoming increasingly popular. Compared with doctors’ manual judgment, machine learning methods are faster. However, they can be used in practice only when their recognition accuracy reaches a high level. Due to the difference in the data distributions of the training dataset and the test dataset and the lack of training samples, the classification accuracies of general machine learning algorithms are not satisfactory. In addition, among the many machine learning methods used to process epilepsy electroencephalogram (EEG) signals, most are black box methods; however, in medicine, methods with explanatory power are needed. In response to these three challenges, this paper proposes a novel technique based on domain adaptation learning, semi-supervised learning and a fuzzy system. In detail, we use domain adaptation learning to reduce deviation from the data distribution, semi-supervised learning to compensate for the lack of training samples, and the Takagi-Sugeno-Kang (TSK) fuzzy system model to improve interpretability. Our experimental results show that the performance of the new method is better than those of most advanced epilepsy classification methods.

