Inducing stock market lexicons from disparate Chinese texts

2019 ◽  
Vol 120 (3) ◽  
pp. 508-525
Author(s):  
Futao Zhao ◽  
Zhong Yao ◽  
Jing Luan ◽  
Hao Liu

Purpose The purpose of this paper is to propose a methodology to construct a stock market sentiment lexicon by incorporating domain-specific knowledge extracted from diverse Chinese media outlets. Design/methodology/approach This paper presents a novel method to automatically generate financial lexicons using a unique data set that comprises news articles, analyst reports and social media. Specifically, a novel method based on keyword extraction is used to build a high-quality seed lexicon and an ensemble mechanism is developed to integrate the knowledge derived from distinct language sources. Meanwhile, two different methods, Pointwise Mutual Information and Word2vec, are applied to capture word associations. Finally, an evaluation procedure is performed to validate the effectiveness of the method compared with four traditional lexicons. Findings The experimental results from the three real-world testing data sets show that the ensemble lexicons can significantly improve sentiment classification performance compared with the four baseline lexicons, suggesting the usefulness of leveraging knowledge derived from diverse media in domain-specific lexicon generation and corresponding sentiment analysis tasks. Originality/value This work appears to be the first to construct financial sentiment lexicons from over 2m posts and headlines collected from more than one language source. Furthermore, the authors believe that the data set established in this study is one of the largest corpora used for Chinese stock market lexicon acquisition. This work is valuable to extract collective sentiment from multiple media sources and provide decision-making support for stock market participants.
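As an illustration of how Pointwise Mutual Information can capture word associations for lexicon induction, the following is a minimal sketch and not the authors' implementation. It assumes texts are already segmented into tokens, counts co-occurrence at document level, and scores each word by its PMI with positive seeds minus its PMI with negative seeds. The function name `so_pmi`, the `smoothing` parameter and the toy English tokens are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def so_pmi(docs, pos_seeds, neg_seeds, smoothing=0.5):
    """Score words by PMI with positive seeds minus PMI with negative seeds.

    docs is a list of token lists; two words co-occur when they appear
    in the same document. `smoothing` (an assumption of this sketch) is
    added to joint counts so that unseen pairs do not produce log(0).
    """
    n_docs = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing each word pair
    for tokens in docs:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))

    def pmi(w, s):
        a, b = sorted((w, s))
        joint = pair_df[(a, b)] + smoothing
        return math.log2(joint * n_docs / (word_df[w] * word_df[s]))

    # Ignore seeds that never occur in the corpus.
    pos = [s for s in pos_seeds if word_df[s] > 0]
    neg = [s for s in neg_seeds if word_df[s] > 0]
    seeds = set(pos) | set(neg)
    return {
        w: sum(pmi(w, s) for s in pos) - sum(pmi(w, s) for s in neg)
        for w in word_df
        if w not in seeds
    }


# Toy corpus with one positive and one negative seed word.
docs = [
    ["profit", "surge", "rally"],
    ["rally", "surge", "gain"],
    ["loss", "plunge", "slump"],
    ["slump", "plunge", "drop"],
]
scores = so_pmi(docs, pos_seeds=["surge"], neg_seeds=["plunge"])
```

Words with a positive score lean toward the positive seeds and words with a negative score toward the negative seeds. An ensemble over several media sources could, for example, average such scores per source, although the abstract does not specify the exact mechanism.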

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
P. Padmavathy ◽  
S. Pakkir Mohideen ◽  
Zameer Gulzar

Purpose The purpose of this paper is to initially perform Senti-WordNet (SWN)- and pointwise mutual information (PMI)-based polarity computation and based polarity updation. When the SWN polarity and polarity mismatched, the vote flipping algorithm (VFA) is employed. Design/methodology/approach Recently, in domains like social media (SM), healthcare, hotel, car, product data, etc., research on sentiment analysis (SA) has massively increased. In addition, there is no approach for analyzing the positive or negative orientations of every single aspect in a document (a tweet, a review, as well as a piece of news, among others). For SA as well as polarity classification, several researchers have used SWN as a lexical resource. Nevertheless, these lexicons show lower-level performance for sentiment classification (SC) than domain-specific lexicons (DSL). Likewise, in some scenarios, the same term is utilized differently between domain and general knowledge lexicons. While concerning different domains, most words have one sentiment class in SWN, and in the annotated data set, their occurrence signifies a strong inclination with the other sentiment class. Hence, this paper chiefly concentrates on the drawbacks of adapting domain-dependent sentiment lexicon (DDSL) from a collection of labeled user reviews and domain-independent lexicon (DIL) for proposing a framework centered on the information theory that could predict the correct polarity of the words (positive, neutral and negative). The proposed work initially performs SWN- and PMI-based polarity computation and based polarity updation. When the SWN polarity and polarity mismatched, the vote flipping algorithm (VFA) is employed. Finally, the predicted polarity is inputted to the mtf-idf-based SVM-NN classifier for the SC of reviews.
The outcomes are examined and contrasted to the other existing techniques to verify that the proposed work has predicted the class of the reviews more effectually for different datasets. Findings There is no approach for analyzing the positive or negative orientations of every single aspect in a document (a tweet, a review, as well as a piece of news, among others). For SA as well as polarity classification, several researchers have used SWN as a lexical resource. Nevertheless, these lexicons show lower-level performance for sentiment classification (SC) than domain-specific lexicons (DSL). Likewise, in some scenarios, the same term is utilized differently between domain and general knowledge lexicons. While concerning different domains, most words have one sentiment class in SWN, and in the annotated data set their occurrence signifies a strong inclination with the other sentiment class. Originality/value The proposed work initially performs SWN- and PMI-based polarity computation, and based polarity updation. When the SWN polarity and polarity mismatched, the vote flipping algorithm (VFA) is employed.


2019 ◽  
Vol 16 (2) ◽  
pp. 168-180
Author(s):  
Heng-Yu Chang ◽  
Chun-Ai Ma

Purpose As the capital market in China is still developing, several constraints on a Chinese-listed firm’s financing strategy have a direct impact on its financial flexibility. The purpose of this paper is to reconstruct the traditional financial flexibility index (FFI) derived from the western context, provide empirical evidence within the eastern context using a modified FFI and examine how the managerial efficiency of Chinese-listed firms is demonstrated with the modified FFI to support the corporate life cycle hypothesis. Design/methodology/approach By tailoring the FFI to fit the contemporary operations of Chinese-listed firms, this study investigates how managerial efficiency varies across different life stages to demonstrate its moderating power in the firm performance of financially flexible firms. Findings It is found that financially flexible firms in the Chinese stock market generally experience good firm performance, yet managerial efficiency could gradually diminish at their mature stage even if firms’ financial flexibility remains consistent with the agency theory. This paper sheds light on the necessity to reexamine the components of financial flexibility based on the eastern context, and provides an avenue to further understand the managerial behavior of Chinese-listed firms when considering firm life cycles. Research limitations/implications Although it is difficult for the current study to offer precise weights for each factor in calculating financial flexibility, the judgment matrix method is adopted to at least provide reliable estimates in accordance with Chinese business contexts, with less than 10 percent error relative to the actual weights. Practical implications This modified FFI is particularly suitable for Chinese-listed firms under certain unique financial reporting regulations by adjusting a number of weights and factors. This study may help practitioners understand the managerial conduct of publicly listed firms in China.
Originality/value The paper constructs a modified FFI with Chinese stock market characteristics embedded, and provides insightful evidence to explain the new pecking order theory by considering the life cycle stage of Chinese-listed companies.


Kybernetes ◽  
2019 ◽  
Vol 48 (9) ◽  
pp. 2006-2029
Author(s):  
Hongshan Xiao ◽  
Yu Wang

Purpose Feature space heterogeneity exists widely in various application fields of classification techniques, such as customs inspection decision, credit scoring and medical diagnosis. This paper aims to study the relationship between feature space heterogeneity and classification performance. Design/methodology/approach A measurement is first developed for measuring and identifying any significant heterogeneity that exists in the feature space of a data set. The main idea of this measurement is derived from meta-analysis. For a data set with significant feature space heterogeneity, a classification algorithm based on factor analysis and clustering is proposed to learn the data patterns, which, in turn, are used for data classification. Findings The proposed approach has two main advantages over the previous methods. The first advantage lies in the feature transform using orthogonal factor analysis, which results in new features without redundancy and irrelevance. The second advantage rests on sample partitioning to capture the feature space heterogeneity reflected by differences in factor scores. The validity and effectiveness of the proposed approach are verified on a number of benchmarking data sets. Research limitations/implications The measurement should be used to guide the heterogeneity elimination process, which is an interesting topic for future research. In addition, developing a classification algorithm that enables scalable and incremental learning for large data sets with significant feature space heterogeneity is also an important issue. Practical implications Measuring and eliminating the feature space heterogeneity possibly existing in the data are important for accurate classification. This study provides a systematic approach to feature space heterogeneity measurement and elimination for better classification performance, which is favorable for applications of classification techniques in real-world problems.
Originality/value A measurement based on meta-analysis for measuring and identifying any significant feature space heterogeneity in a classification problem is developed, and an ensemble classification framework is proposed to deal with the feature space heterogeneity and improve the classification accuracy.


Kybernetes ◽  
2018 ◽  
Vol 47 (6) ◽  
pp. 1242-1261 ◽  
Author(s):  
Can Zhong Yao ◽  
Peng Cheng Kuang ◽  
Ji Nan Lin

Purpose The purpose of this study is to reveal the lead–lag structure between international crude oil prices and stock markets. Design/methodology/approach The methods used for this study are as follows: empirical mode decomposition, shift-window-based Pearson coefficient and thermal causal path method. Findings The fluctuation characteristics of the Chinese stock market before 2010 are very similar to those of international crude oil prices. After 2010, their fluctuation patterns are significantly different from each other. The two stock markets significantly led international crude oil prices, revealing varying lead–lag orders among stock markets. Between 2000 and 2004, the stock markets significantly led international crude oil prices, but they are less distinct from the lead–lag orders. After 2004, the effects changed so that the leading effect of the Shanghai composite index is no longer significant, and after 2012, the S&P index significantly lagged behind international crude oil prices. Originality/value The Chinese and US stock markets developed different patterns in response to crude oil price fluctuations after the financial crisis in 1998.
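A shift-window Pearson analysis of the kind named above can be sketched as follows. This is an illustrative reading of the technique rather than the authors' code: within each sliding window, one series is shifted against the other, and the shift with the largest absolute Pearson coefficient is reported. The function names, the sign convention (a positive lag means the first series leads) and the synthetic sine series are assumptions of this sketch.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def best_lag(x, y, max_lag):
    """Return (lag, r) maximizing |corr(x[t], y[t + lag])|.

    A positive lag means x leads y by that many steps.
    Both series must have the same length.
    """
    best = (0, 0.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[: len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[: len(y) + lag]
        r = pearson(a, b)
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best


def rolling_lead_lag(x, y, window, max_lag):
    """Apply best_lag to every window of the given length."""
    return [
        best_lag(x[s : s + window], y[s : s + window], max_lag)
        for s in range(len(x) - window + 1)
    ]


# Synthetic check: y is x delayed by two steps, so x should lead by 2.
x = [math.sin(0.3 * t) for t in range(60)]
y = [0.0, 0.0] + x[:-2]
lag, r = best_lag(x, y, max_lag=5)
windows = rolling_lead_lag(x, y, window=30, max_lag=5)
```

Tracking how the best lag changes from window to window gives a simple picture of a time-varying lead–lag order; the empirical mode decomposition and thermal path steps listed in the abstract are not covered here.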


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Zhengxun Tan ◽  
Yao Fu ◽  
Hong Cheng ◽  
Juan Liu

Purpose This study aims to examine the long memory as well as the effect of structural breaks in the US and the Chinese stock markets. More importantly, it further explores possible causes of the differences in long memory between these two stock markets. Design/methodology/approach The authors employ various methods to estimate the memory parameters, including the modified R/S, averaged periodogram, Lagrange multiplier, local Whittle and exact local Whittle estimations. Findings China's two stock markets exhibit long memory, whereas the two US markets do not. Furthermore, long memory is robust in Chinese markets even when we test break-adjusted data. The Chinese stock market does not meet the efficient market hypothesis (EMH), including the efficiency of information disclosure, regulations and supervision, investors' behavior, and trading mechanisms. Therefore, its stock prices' sluggish response to information leads to momentum effects and long memory. Originality/value The authors elaborately illustrate how long memory develops by analyzing not only stock market indices but also typical individual stocks in both the emerging China and the developed US, which diversifies the EMH with wider international stylized facts and findings when compared with previous literature. A couple of tests conducted to analyze structural break effects and spurious long memory demonstrate the reliability of the results. The authors’ findings have significant implications for investors and policymakers worldwide.
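For readers unfamiliar with the modified R/S statistic mentioned among the estimation methods, a minimal sketch follows. It assumes the standard form of the modified rescaled range: the range of cumulative deviations from the mean, divided by a long-run standard deviation that adds Bartlett-weighted autocovariances up to lag `q`, then normalized by the square root of the sample size. The function name `modified_rs` and the toy inputs are hypothetical, and the abstract does not state which lag choice the authors used.

```python
import math


def modified_rs(x, q=0):
    """Modified rescaled-range statistic V_n(q) of a series x.

    With q = 0 this reduces to the classical R/S statistic divided by
    sqrt(n). For q > 0 the variance is augmented with Bartlett-weighted
    autocovariances to discount short-range dependence.
    """
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]

    # Range of the cumulative sums of deviations from the mean.
    cum, running = [], 0.0
    for d in dev:
        running += d
        cum.append(running)
    value_range = max(cum) - min(cum)

    # Long-run variance estimate with weights 1 - j / (q + 1).
    var = sum(d * d for d in dev) / n
    for j in range(1, q + 1):
        gamma = sum(dev[i] * dev[i - j] for i in range(j, n)) / n
        var += 2.0 * (1.0 - j / (q + 1)) * gamma

    if var <= 0:
        raise ValueError("variance estimate is not positive")
    return value_range / math.sqrt(var * n)


# Toy inputs: an alternating series and a series with one long run.
v_alternating = modified_rs([1, -1, 1, -1])
v_persistent = modified_rs([1, 1, 1, 1, -1, -1, -1, -1])
v_alternating_q1 = modified_rs([1, -1, 1, -1], q=1)
```

Larger values of the statistic indicate that cumulative deviations wander further than short-memory behavior would suggest. In practice the value is compared against tabulated critical values, which this sketch does not reproduce.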


2015 ◽  
Vol 41 (6) ◽  
pp. 600-614 ◽  
Author(s):  
Liu Liu Kong ◽  
Min Bai ◽  
Peiming Wang

Purpose – The purpose of this paper is to examine whether the framework of Prospect Theory and Mental Accounting proposed by Grinblatt and Han (2005) can be applied to analyzing the relationship between the disposition effect and momentum in the Chinese stock market. Design/methodology/approach – The paper applies the methodology proposed by Grinblatt and Han (2005). Findings – Using firm-level data, with a sample period from January 1998 to June 2013, the authors find evidence that the momentum effect in the Chinese stock market is not driven by the disposition effect, contradicting the findings of Grinblatt and Han (2005) concerning the US stock market. The discrepancies in the findings between the Chinese and US stock markets are robust and independent of sample periods. Research limitations/implications – The findings suggest that Grinblatt and Han’s model may not be applicable to the Chinese stock market. This is possibly because of the regulatory differences between the two stock markets and cross-national variation in investor behavior; in particular, the short-selling prohibition in the Chinese stock market and greater reference point adaptation to unrealized gains/losses among Chinese compared to Americans. Originality/value – This study provides evidence of the inapplicability of Grinblatt and Han’s model for the Chinese stock market, and shows the differences in the relationship between disposition effect and momentum between the Chinese and US stock markets.


2017 ◽  
Vol 43 (5) ◽  
pp. 545-566 ◽  
Author(s):  
Muhammad Zubair Tauni ◽  
Zia-ur-Rehman Rao ◽  
Hong-Xing Fang ◽  
Minghao Gao

Purpose The purpose of this paper is to investigate the impact of the key sources of information, namely, financial advice, word-of-mouth communication and specialized press, on trading behavior of Chinese stock investors. The study also analyzed if the association between the key sources of information and trading behavior is influenced by investor personality. Design/methodology/approach The authors adopted the Big Five personality framework and examined the survey results of individual stock investors (n=541) in China. Personality traits of investors were measured by the NEO-Five Factor Inventory (Costa and McCrae, 1989). The authors performed probit regression analysis to evaluate the moderating influence of investor personality traits on the association between sources of information and stock trading behavior. Findings The results of the study confirm the previous findings that the key sources of information used by investors as a foundation of their financial choices have a significant influence on their trading behavior. The study also provides empirical evidence that investor personality traits moderate the relationship between the key sources of information and trading behavior. Financial advisors tend to increase the frequency of trading in investors with openness, extraversion, neuroticism and agreeableness personality traits, and tend to decrease the intensity of trading in investors with conscientiousness trait. On the other hand, financial information acquired from word-of-mouth communication is more likely to enhance trading frequency in extraverted and agreeable investors, and is more likely to reduce trading frequency in investors with openness, conscientiousness and neuroticism traits. Finally, the use of specialized press leads to more adjustment in portfolios of the investors with openness and conscientiousness traits than those with other personality traits. An alternative mediated model was not supported. 
Originality/value This research contributes to information search literature and behavioral finance literature and provides empirical evidence that the psychological characteristics of investors are significant predictors of the variations in information-trading link. The study offers new theoretical insights of investors’ behavior due to the characteristics of Chinese stock market which are unique from other stock markets in the world. To the authors’ best knowledge, no previous study has been conducted so far in Chinese stock market to explore variations with regards to the impact of the key sources of information on trading behavior by the Big Five investor personality and this paper seeks to fill this gap.


2015 ◽  
Vol 39 (3) ◽  
pp. 326-345 ◽  
Author(s):  
David Martín-Moncunill ◽  
Miguel-Ángel Sicilia-Urban ◽  
Elena García-Barriocanal ◽  
Salvador Sánchez-Alonso

Purpose – Large terminologies usually contain a mix of terms that are either generic or domain specific, which makes the use of the terminology itself a difficult task that may limit the positive effects of these systems. The purpose of this paper is to systematically evaluate the degree of domain specificity of the AGROVOC controlled vocabulary terms as a representative of a large terminology in the agricultural domain and discuss the generic/specific boundaries across its hierarchy. Design/methodology/approach – A user-oriented study with domain-experts in conjunction with quantitative and systematic analysis. First, an in-depth analysis of AGROVOC was carried out to make a proper selection of terms for the experiment. Then domain-experts were asked to classify the terms according to their domain specificity. An evaluation was conducted to analyse the domain-experts’ results. Finally, the resulting data set was automatically compared with the terms in SUMO, an upper ontology, and MILO, a mid-level ontology, to analyse the coincidences. Findings – Results show the existence of a high number of generic terms. The motivation for several of the unclear cases is also depicted. The automatic evaluation showed that there is no direct way to assess the specificity degree of a term by using the SUMO and MILO ontologies; however, it provided additional validation of the results gathered from the domain-experts. Research limitations/implications – The “domain-analysis” concept has long been discussed and it could be addressed from different perspectives. A summary of these perspectives and an explanation of the approach followed in this experiment are included in the background section. Originality/value – The authors propose an approach to identify the domain specificity of terms in large domain-specific terminologies and a criterion to measure the overall domain specificity of a knowledge organisation system, based on domain-experts analysis.
The authors also provide a first insight about using automated measures to determine the degree to which a given term can be considered domain specific. The resulting data set from the domain-experts’ evaluation can be reused as a gold standard for further research about these automatic measures.


2016 ◽  
Vol 12 (1) ◽  
pp. 71-91 ◽  
Author(s):  
Xiaoming Xu ◽  
Vikash Ramiah ◽  
Imad Moosa ◽  
Sinclair Davidson

Purpose – The purpose of this paper is to: first, test if the information-adjusted noise model (IANM) can be applied in China; second, quantify noise trader risk, overreaction, underreaction and information pricing errors in that market; and third, explain the relationship between noise trader risk and return. Design/methodology/approach – The authors use a behavioural asset pricing model (BAPM), CAPM, the information-adjusted noise model and the model proposed by Ramiah and Davidson (2010). Findings – The findings show that noise traders are active 99.7 per cent of the time on the Shenzhen A-share market. Furthermore, our results suggest that the Shenzhen market overreacts 41 per cent of the time, underreacts 18 per cent of the time and information pricing errors occur 40 per cent of the time. Originality/value – Various methods have been applied to the Chinese stock market in an effort to measure noise trading activities and all of them failed to account for information arrival. Our study uses a superior and alternative model to detect noise trader risk, overreaction and underreaction in China.


2018 ◽  
Vol 14 (2) ◽  
pp. 233-258 ◽  
Author(s):  
Efthimia Mavridou ◽  
Konstantinos M. Giannoutakis ◽  
Dionysios Kehagias ◽  
Dimitrios Tzovaras ◽  
George Hassapis

Purpose Semantic categorization of Web services comprises a fundamental requirement for enabling more efficient and accurate search and discovery of services in the semantic Web era. However, to efficiently deal with the growing presence of Web services, more automated mechanisms are required. This paper aims to introduce an automatic Web service categorization mechanism, by exploiting various techniques that aim to increase the overall prediction accuracy. Design/methodology/approach The paper proposes the use of Error Correcting Output Codes on top of a Logistic Model Trees-based classifier, in conjunction with a data pre-processing technique that reduces the original feature-space dimension without affecting data integrity. The proposed technique is generalized so as to adhere to all Web services with a description file. A semantic matchmaking scheme is also proposed for enabling the semantic annotation of the input and output parameters of each operation. Findings The proposed Web service categorization framework was tested with the OWLS-TC v4.0, as well as a synthetic data set with a systematic evaluation procedure that enables comparison with well-known approaches. After conducting exhaustive evaluation experiments, categorization efficiency in terms of accuracy, precision, recall and F-measure was measured. The presented Web service categorization framework outperformed the other benchmark techniques, which comprise different variations of it and also third-party implementations. Originality/value The proposed three-level categorization approach is a significant contribution to the Web service community, as it allows the automatic semantic categorization of all functional elements of Web services that are equipped with a service description file.

