Sector categorization using gradient boosted trees trained on fundamental firm data

2021 ◽  
Vol 8 (3-4) ◽  
pp. 91-99
Author(s):  
Ming Fang ◽  
Lilian Kuo ◽  
Frank Shih ◽  
Stephen Taylor

We examine to what extent the GICS sector categorization of equity securities may be systematically reconstructed from historical quarterly firm fundamental data using gradient boosted tree classification. Model complexity and performance tradeoffs are examined and relative feature importance is described. Potential extensions are outlined, including ideas to improve feature engineering, validate internal consistency, and integrate additional data sources to further improve classification accuracy.

2020 ◽  
Vol 19 (1) ◽  
pp. 24-36
Author(s):  
Sebastian Wenninger ◽  
Daniel Link ◽  
Martin Lames

Abstract. Driven by the increased availability of position and performance data, automated analyses are becoming the daily routine in many top-level sports. Methods from the domains of data mining and machine learning are more frequently used to generate new insights from massive amounts of data. This study evaluates the performance of four current models (multi-layer perceptron, convolutional network, recurrent network, gradient boosted tree) in classifying tactical behaviors on a beach volleyball dataset consisting of 1,356 top-level games. A three-way between-subjects analysis of variance was conducted to determine the effects of model, input features and target behavior on classification accuracy. Results show significant differences in classification accuracy between models as well as significant interaction effects between factors. Our models achieve classification performance similar to previous work in other sports. Nonetheless, they are not yet at the level to warrant practical application in day to day performance analysis in beach volleyball.


PLoS ONE ◽  
2020 ◽  
Vol 15 (4) ◽  
pp. e0231300
Author(s):  
Kenneth D. Roe ◽  
Vibhu Jawa ◽  
Xiaohan Zhang ◽  
Christopher G. Chute ◽  
Jeremy A. Epstein ◽  
...  

2021 ◽  
Vol 13 (14) ◽  
pp. 2785
Author(s):  
Emilie Beriaux ◽  
Alban Jago ◽  
Cozmin Lucau-Danila ◽  
Viviane Planchon ◽  
Pierre Defourny

In the upcoming Common Agricultural Policy (CAP) reform, the use of satellite imagery is taking an increasing role for improving the Integrated Administration and Control System (IACS). Considering the operational aspect of the CAP monitoring process, the use of Sentinel-1 SAR (Synthetic Aperture Radar) images is highly relevant, especially in regions with a frequent cloud cover, such as Belgium. Indeed, SAR imagery does not depend on sunlight and is barely affected by the presence of clouds. Moreover, the SAR signal is particularly sensitive to the geometry and the water content of the target. Crop identification is often a prerequisite to monitor agriculture at parcel level (ploughing, harvest, grassland mowing, intercropping, etc.). The main goal of this study is to assess the performance and constraints of a SAR-based crop classification in an operational large-scale application. The Random Forest object-oriented classification model is built on Sentinel-1 time series from January to August 2020 only. It can identify crops in the Walloon Region (south part of Belgium) with high performance: 93.4% of well-classified area, representing 88.4% of the parcels. Among the 48 crop groups, the six most represented ones achieve an F1-score higher than or equal to 84%. Additionally, this research documents how the classification performance is affected by different parameters: the SAR orbit, the size of the training dataset, the use of different internal buffers on parcel polygons before signal extraction, the set of explanatory variables, and the period of the time series. In an operational context, this makes it possible to choose the right balance between classification accuracy and model complexity. A key result is that using a training dataset containing only 3.2% of the total number of parcels makes it possible to correctly classify 91.7% of the agricultural area. The impact of rain and snow is also discussed.
Finally, this research analyses how the classification accuracy depends on some characteristics of the parcels, such as their shape or size. This makes it possible to assess the relevance of the classification depending on those characteristics, as well as to identify a subset of parcels for which the global accuracy is higher.


Epidemiologia ◽  
2021 ◽  
Vol 2 (3) ◽  
pp. 315-324
Author(s):  
Juan M. Banda ◽  
Ramya Tekumalla ◽  
Guanyu Wang ◽  
Jingyuan Yu ◽  
Tuo Liu ◽  
...  

As the COVID-19 pandemic continues to spread worldwide, an unprecedented amount of open data is being generated for medical, genetics, and epidemiological research. The unparalleled rate at which many research groups around the world are releasing data and publications on the ongoing pandemic is allowing other scientists to learn from local experiences and data generated on the front lines of the COVID-19 pandemic. However, there is a need to integrate additional data sources that map and measure the role of social dynamics of such a unique worldwide event in biomedical, biological, and epidemiological analyses. For this purpose, we present a large-scale curated dataset of over 1.12 billion tweets, growing daily, related to COVID-19 chatter generated from 1 January 2020 to 27 June 2021 at the time of writing. This data source provides a freely available additional data source for researchers worldwide to conduct a wide and diverse number of research projects, such as epidemiological analyses, emotional and mental responses to social distancing measures, the identification of sources of misinformation, stratified measurement of sentiment towards the pandemic in near real time, among many others.


2021 ◽  
Vol 11 (5) ◽  
pp. 2039
Author(s):  
Hyunseok Shin ◽  
Sejong Oh

In machine learning applications, classification schemes have been widely used for prediction tasks. Typically, to develop a prediction model, the given dataset is divided into training and test sets; the training set is used to build the model and the test set is used to evaluate the model. Furthermore, random sampling is traditionally used to divide datasets. The problem, however, is that the performance of the model is evaluated differently depending on how we divide the training and test sets. Therefore, in this study, we proposed an improved sampling method for the accurate evaluation of a classification model. We first generated numerous candidate cases of train/test sets using the R-value-based sampling method. We evaluated the similarity of distributions of the candidate cases with the whole dataset, and the case with the smallest distribution–difference was selected as the final train/test set. Histograms and feature importance were used to evaluate the similarity of distributions. The proposed method produces more proper training and test sets than previous sampling methods, including random and non-random sampling.
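The selection step described in this abstract (generate many candidate train/test splits, compare each with the whole dataset, keep the closest) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: candidate splits are drawn by plain random shuffling rather than the R-value-based sampling method, similarity is measured only with per-feature histograms (feature importance is omitted), and the names `histogram`, `distribution_difference`, and `select_split` are hypothetical.

```python
import random


def histogram(values, lo, hi, bins):
    """Return normalized bin frequencies of `values` over the range [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant features
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def distribution_difference(subset, full, bins=5):
    """Sum, over features, of the L1 distance between subset and full histograms."""
    diff = 0.0
    for j in range(len(full[0])):
        col_full = [row[j] for row in full]
        col_sub = [row[j] for row in subset]
        lo, hi = min(col_full), max(col_full)
        h_full = histogram(col_full, lo, hi, bins)
        h_sub = histogram(col_sub, lo, hi, bins)
        diff += sum(abs(a - b) for a, b in zip(h_full, h_sub))
    return diff


def select_split(data, test_fraction=0.3, n_candidates=200, seed=0):
    """Return (score, train, test) for the candidate split closest to `data`.

    Assumption: candidates come from random shuffling, standing in for the
    R-value-based sampling named in the abstract.
    """
    rng = random.Random(seed)
    n_test = max(1, int(len(data) * test_fraction))
    best = None
    for _ in range(n_candidates):
        idx = list(range(len(data)))
        rng.shuffle(idx)
        test = [data[i] for i in idx[:n_test]]
        train = [data[i] for i in idx[n_test:]]
        score = (distribution_difference(train, data)
                 + distribution_difference(test, data))
        if best is None or score < best[0]:
            best = (score, train, test)
    return best


# Toy dataset with two numeric features (illustrative values only).
data = [[float(i), float(i % 7)] for i in range(50)]
score, train, test = select_split(data)
print(len(train), len(test))
```

Scoring both the training and the test set against the whole dataset is a design choice made here for symmetry; the abstract only states that the candidate with the smallest distribution difference is selected.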


2021 ◽  
Vol 65 (1) ◽  
pp. 11-22
Author(s):  
Mengyao Lu ◽  
Shuwen Jiang ◽  
Cong Wang ◽  
Dong Chen ◽  
Tian’en Chen

Highlights: A classification model for the front and back sides of tobacco leaves was developed for application in industry. A tobacco leaf grading method that combines a CNN with double-branch integration was proposed. The A-ResNet network was proposed and compared with other classic CNN networks. The grading accuracy of eight different grades was 91.30% and the testing time was 82.180 ms, showing a relatively high classification accuracy and efficiency. Abstract. Flue-cured tobacco leaf grading is a key step in the production and processing of Chinese-style cigarette raw materials, directly affecting cigarette blend and quality stability. At present, manual grading of tobacco leaves is dominant in China, resulting in unsatisfactory grading quality and consuming considerable material and financial resources. In this study, for fast, accurate, and non-destructive tobacco leaf grading, 2,791 flue-cured tobacco leaves of eight different grades in south Anhui Province, China, were chosen as the study sample, and a tobacco leaf grading method that combines convolutional neural networks and double-branch integration was proposed. First, a classification model for the front and back sides of tobacco leaves was trained by transfer learning. Second, two processing methods (equal-scaled resizing and cropping) were used to obtain global images and local patches from the front sides of tobacco leaves. A global image-based tobacco leaf grading model was then developed using the proposed A-ResNet-65 network, and a local patch-based tobacco leaf grading model was developed using the ResNet-34 network. These two networks were compared with classic deep learning networks, such as VGGNet, GoogLeNet-V3, and ResNet. Finally, the grading results of the two grading models were integrated to realize tobacco leaf grading. The tobacco leaf classification accuracy of the final model, for eight different grades, was 91.30%, and grading of a single tobacco leaf required 82.180 ms.
The proposed method achieved a relatively high grading accuracy and efficiency. It provides a method for industrial implementation of the tobacco leaf grading and offers a new approach for the quality grading of other agricultural products. Keywords: Convolutional neural network, Deep learning, Image classification, Transfer learning, Tobacco leaf grading


2019 ◽  
Vol 10 (3) ◽  
pp. 743-766
Author(s):  
Anete Petrusch ◽  
Guilherme Luís Roehe Vaccaro ◽  
Juliane Luchese

Purpose: Although discussed for more than 20 years, information about Lean adoption in higher education institutions (HEIs) is scarce, especially in developing countries. This research aims to investigate the degree of Lean thinking adoption in administrative services of Brazilian private HEIs. The results are compared to studies from the USA and the UK, highlighting the maturity of enablers, principles, tools and performance measures related to Lean. Design/methodology/approach: A quantitative survey was carried out. The instrument is adapted for HEIs from the proposal of Malmbrandt and Åhlström (2013) for Lean services. Cronbach’s alpha and factor analysis were used to validate the adapted instrument. Additional data analysis was based on non-parametric tests. Findings: No evidence of broad implementation of Lean thinking in administrative processes of Brazilian private HEIs was found, with the adoption being incipient. The results are consistent with those presented by other studies in the USA and the UK. There is a gap between the existing knowledge about Lean in the academic sphere of the HEIs and its application on their academic processes. Research limitations/implications: The effective sample size was 47, despite contacts being sent to 2,090 institutions. This sample allows exploratory research, although further research is required. Results are consistent with those found in research from other countries. Originality/value: The research presents descriptive and exploratory results regarding the adoption of Lean in Brazilian HEIs. No previous similar research was found in the literature.


2020 ◽  
pp. 1-2
Author(s):  
Zhang- sensen

Mild cognitive impairment (MCI) is a condition between healthy aging and Alzheimer's disease (AD). At present, brain network analysis based on machine learning methods can help diagnose MCI. In this paper, the brain network is divided into several subnets based on the shortest path, and the feature vectors of each subnet are extracted and classified. In order to make full use of subnet information, this paper adopts an integrated classification model. Each base classification model predicts the class of one subnet, and the classification results of all subnets are combined to give the classification result for the brain network. In order to verify the effectiveness of this method, a brain network of 66 people was constructed and a comparative experiment was carried out. The experimental results show that the classification accuracy of the integrated classification model proposed in this paper is 19% higher than that of SVM, which effectively improves the classification accuracy.


Sensors ◽  
2018 ◽  
Vol 18 (9) ◽  
pp. 2929 ◽  
Author(s):  
Yuanyuan Wang ◽  
Chao Wang ◽  
Hong Zhang

With the capability to automatically learn discriminative features, deep learning has experienced great success in natural images but has rarely been explored for ship classification in high-resolution SAR images due to the training bottleneck caused by small datasets. In this paper, convolutional neural networks (CNNs) are applied to ship classification using SAR images with small datasets. First, ship chips are constructed from high-resolution SAR images and split into training and validation datasets. Second, a ship classification model is constructed based on very deep convolutional networks (VGG). Then, VGG is pretrained via ImageNet, and fine tuning is utilized to train our model. Six scenes of COSMO-SkyMed images are used to evaluate our proposed model with regard to the classification accuracy. The experimental results reveal that (1) our proposed ship classification model trained by fine tuning achieves more than 95% average classification accuracy, even with 5-fold cross-validation; (2) compared with other models, the ship classification model based on VGG16 achieves at least 2% higher classification accuracy. These experimental results reveal the effectiveness of our proposed method.

