Performance of machine learning models in application to beach volleyball data.

Driven by the increased availability of position and performance data, automated analyses are becoming the daily routine in many top-level sports. Methods from the domains of data mining and machine learning are more frequently used to generate new insights from massive amounts of data. This study evaluates the performance of four current models (multi-layer perceptron, convolutional network, recurrent network, gradient boosted tree) in classifying tactical behaviors on a beach volleyball dataset consisting of 1,356 top-level games. A three-way between-subjects analysis of variance was conducted to determine the effects of model, input features and target behavior on classification accuracy. Results show significant differences in classification accuracy between models as well as significant interaction effects between factors. Our models achieve classification performance similar to previous work in other sports. Nonetheless, they are not yet at the level to warrant practical application in day-to-day performance analysis in beach volleyball.


Sector categorization using gradient boosted trees trained on fundamental firm data

Algorithmic Finance ◽

10.3233/af-200308 ◽

2021 ◽

Vol 8 (3-4) ◽

pp. 91-99

Author(s):

Ming Fang ◽

Lilian Kuo ◽

Frank Shih ◽

Stephen Taylor

Keyword(s):

Classification Accuracy ◽

Additional Data ◽

Data Sources ◽

Classification Model ◽

Model Complexity ◽

Feature Engineering ◽

Feature Importance ◽

Boosted Tree ◽

Firm Data ◽

And Performance

We examine to what extent the GICS sector categorization of equity securities may be systematically reconstructed from historical quarterly firm fundamental data using gradient boosted tree classification. Model complexity and performance tradeoffs are examined and relative feature importance is described. Potential extensions are outlined including ideas to improve feature engineering, validating internal consistency and integrating additional data sources to further improve classification accuracy.


Binary Spectrum Feature for Improved Classifier Performance

10.36227/techrxiv.12993122 ◽

2020 ◽

Author(s):

Nalika Ulapane ◽

Karthick Thiyagarajan ◽

Sarath Kodagoda

Keyword(s):

Machine Learning ◽

Classification Performance ◽

Feature Reduction ◽

Sensor Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Svm Classifier ◽

Monitoring Task ◽

Classifier Performance ◽

Spectrum Feature

Classification has become a vital task in modern machine learning and Artificial Intelligence applications, including smart sensing. Numerous machine learning techniques are available to perform classification. Similarly, numerous practices, such as feature selection (i.e., selection of a subset of descriptor variables that optimally describe the output), are available to improve classifier performance. In this paper, we consider the case of a given supervised learning classification task that has to be performed making use of continuous-valued features. It is assumed that an optimal subset of features has already been selected. Therefore, no further feature reduction, or feature addition, is to be carried out. Then, we attempt to improve the classification performance by passing the given feature set through a transformation that produces a new feature set which we have named the “Binary Spectrum”. Via a case study example done on some Pulsed Eddy Current sensor data captured from an infrastructure monitoring task, we demonstrate how the classification accuracy of a Support Vector Machine (SVM) classifier increases through the use of this Binary Spectrum feature, indicating the feature transformation’s potential for broader usage.


Toward Generating a new Video Education Lectures Dataset and Performance Comparison with Various Machine Learning Algorithms

10.36295/asro.2019.221230 ◽

2019 ◽

Vol 22 (12) ◽

pp. 279-298

Author(s):

M Maysaa H. Abdulameer ◽

Mahmood Z. Abdullah

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Performance Comparison ◽

Machine Learning Algorithms ◽

And Performance ◽

Video Education


Development and Performance Assessment of Novel Machine Learning Models to Predict Postoperative Pneumonia After Liver Transplantation

SSRN Electronic Journal ◽

10.2139/ssrn.3667645 ◽

2020 ◽

Author(s):

Chaojin Chen ◽

Dong Yang ◽

Shilong Gao ◽

Yihan Zhang ◽

Liubing Chen ◽

...

Keyword(s):

Machine Learning ◽

Liver Transplantation ◽

Performance Assessment ◽

Postoperative Pneumonia ◽

Learning Models ◽

And Performance ◽

Machine Learning Models


An Optimized Approach for Breast Cancer Classification for Histopathological Images Based on Hybrid Feature Set

Current Medical Imaging Formerly Current Medical Imaging Reviews ◽

10.2174/1573405616666200423085826 ◽

2020 ◽

Vol 16 ◽

Cited By ~ 1

Author(s):

Inzamam Mashood Nasir ◽

Muhammad Rashid ◽

Jamal Hussain Shah ◽

Muhammad Sharif ◽

Muhammad Yahiya Haider Awan ◽

...

Keyword(s):

Breast Cancer ◽

Cancer Detection ◽

State Of The Art ◽

Hybrid Approach ◽

Classification Performance ◽

Diagnose Breast Cancer ◽

Histopathological Images ◽

And Performance ◽

Learned Features ◽

Intelligent Healthcare

Background: Breast cancer is considered the most dangerous disease among women worldwide, and the rate of new cases is increasing yearly. Many researchers have proposed efficient algorithms to diagnose breast cancer at early stages, which have increased efficiency and performance by utilizing the learned features of gold-standard histopathological images. Objective: Most of these systems have used either traditional handcrafted features or deep features, which carry a lot of noise and redundancy that ultimately decreases the performance of the system. Methods: A hybrid approach is proposed that fuses and optimizes the properties of handcrafted and deep features to classify breast cancer images. HOG and LBP features are serially fused with the pretrained models VGG19 and InceptionV3. PCR and ICR are used to evaluate the classification performance of the proposed method. Results: The method concentrates on histopathological images to classify breast cancer. The performance is compared with state-of-the-art techniques, and an overall patient-level accuracy of 97.2% and an image-level accuracy of 96.7% are recorded. Conclusion: The proposed hybrid method achieves the best performance compared to previous methods and can be used for intelligent healthcare systems and early breast cancer detection.
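
The abstract reports a patient-level and an image-level accuracy without defining them. The sketch below illustrates one common way such figures are computed, assuming that image-level accuracy pools all images while patient-level accuracy averages the per-patient scores; this is an assumption, not the authors' stated definition. The function names, patient IDs and labels are hypothetical.

```python
from collections import defaultdict

def image_level_accuracy(records):
    """Fraction of all images classified correctly (assumed definition)."""
    correct = sum(1 for _, true, pred in records if true == pred)
    return correct / len(records)

def patient_level_accuracy(records):
    """Mean of per-patient accuracies, so every patient counts equally
    regardless of how many images they contribute (assumed definition)."""
    per_patient = defaultdict(lambda: [0, 0])  # patient -> [correct, total]
    for patient, true, pred in records:
        per_patient[patient][0] += int(true == pred)
        per_patient[patient][1] += 1
    scores = [correct / total for correct, total in per_patient.values()]
    return sum(scores) / len(scores)

# Hypothetical records: (patient id, true label, predicted label).
records = [
    ("p1", "malignant", "malignant"),
    ("p1", "malignant", "malignant"),
    ("p1", "malignant", "benign"),
    ("p1", "malignant", "malignant"),
    ("p2", "benign", "benign"),
    ("p2", "benign", "malignant"),
]

print(image_level_accuracy(records))    # pooled over 6 images
print(patient_level_accuracy(records))  # average of p1 and p2 scores
```

The two figures differ whenever patients contribute unequal numbers of images, which is why abstracts of this kind tend to report both.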


Document Preprocessing with TF-IDF to Improve the Polarity Classification Performance of Unstructured Sentiment Analysis

Kinetik Game Technology Information System Computer Network Computing Electronics and Control ◽

10.22219/kinetik.v5i3.1066 ◽

2020 ◽

pp. 235-242

Author(s):

Farrikh Alzami ◽

Erika Devi Udayanti ◽

Dwi Puji Prabowo ◽

Rama Aria Megantara

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Random Forest ◽

Sentiment Analysis ◽

Classification Performance ◽

Document Preparation ◽

Learning Models ◽

Polarity Classification ◽

Negative Sentiment ◽

Machine Learning Models

Sentiment analysis in the form of polarity classification matters in everyday life: knowing whether a given document carries positive or negative sentiment helps people choose and make decisions. Sentiment analysis is usually done manually, so an automatic classification process is needed. However, few studies discuss which feature extraction methods and learning models suit unstructured sentiment analysis for the Amazon food review case. This research explores feature extraction methods such as bag of words, TF-IDF, Word2Vec, and a combination of TF-IDF and Word2Vec, together with several machine learning models (Random Forest, SVM, KNN and Naïve Bayes), to find a combination of feature extraction and learning model that can help add variety to the analysis of polarity sentiments. With document preparation that handles HTML tags, punctuation and special characters, and with Snowball stemming, TF-IDF combined with SVM proves suitable for polarity classification in unstructured sentiment analysis of Amazon food reviews, with a performance of 87.3 percent.
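
For readers unfamiliar with TF-IDF, the following is a minimal sketch of the weighting on a few made-up review snippets. It assumes the plain textbook variant (relative term frequency times the unsmoothed logarithm of inverse document frequency); the paper does not say which variant or library it used, and the function name and sample reviews are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses tf = count / document length and idf = ln(N / df), an assumed
    textbook variant without smoothing.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical, already preprocessed review snippets.
reviews = [
    "great taste great price".split(),
    "bad taste".split(),
    "great coffee".split(),
]

for doc_weights in tf_idf(reviews):
    print(doc_weights)
```

Terms that appear in few documents, such as "bad", receive higher weights than terms shared across reviews, such as "taste". The resulting vectors are what a classifier like an SVM would consume.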


The future is coming: promising perspectives regarding the use of machine learning in renal transplantation

Brazilian Journal of Nephrology ◽

10.1590/2175-8239-jbn-2018-0047 ◽

2019 ◽

Vol 41 (2) ◽

pp. 284-287

Author(s):

Pedro Guilherme Coelho Hannun ◽

Luis Gustavo Modelli de Andrade

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Graft Function ◽

Statistical Technique ◽

Daily Routine ◽

Learning Approaches ◽

Post Transplantation ◽

Chronic Allograft Rejection ◽

Institutional Experience ◽

Learning Principles

Introduction: The prediction of post transplantation outcomes is clinically important and involves several problems. The current prediction models based on standard statistics are very complex, difficult to validate and do not provide accurate prediction. Machine learning, a statistical technique that allows the computer to make future predictions using previous experiences, is beginning to be used in order to solve these issues. In the field of kidney transplantation, the use of computational forecasting has been reported in the prediction of chronic allograft rejection, delayed graft function, and graft survival. This paper describes machine learning principles and the steps to make a prediction, and performs a brief analysis of its most recent applications in the literature. Discussion: There is compelling evidence that machine learning approaches based on donor and recipient data provide a better prognosis of graft outcomes than traditional analysis. The immediate expectations that emerge from this new prediction modelling technique are that it will generate better clinical decisions based on dynamic and local practice data and optimize organ allocation as well as post transplantation care management. Despite the promising results, there is not yet a substantial number of studies to determine the feasibility of its application in a clinical setting. Conclusion: The way we deal with data stored in electronic health records will radically change in the coming years, and machine learning will be part of the clinical daily routine, whether to predict clinical outcomes or to suggest a diagnosis based on institutional experience.


Transformer Oil Quality Assessment Using Random Forest with Feature Engineering

Energies ◽

10.3390/en14071809 ◽

2021 ◽

Vol 14 (7) ◽

pp. 1809

Author(s):

Mohammed El Amine Senoussaoui ◽

Mostefa Brahami ◽

Issouf Fofana

Keyword(s):

Machine Learning ◽

Random Forest ◽

Oil Quality ◽

Principal Component ◽

Condition Assessment ◽

Classification Performance ◽

Transformer Oil ◽

Classification Model ◽

Insulation Degradation ◽

Transformer Oils

Machine learning is widely used as a panacea in many engineering applications including the condition assessment of power transformers. Most statistics attribute the main cause of transformer failure to insulation degradation. Thus, a new, simple, and effective machine-learning approach was proposed to monitor the condition of transformer oils based on some aging indicators. The proposed approach was used to compare the performance of two machine-learning classifiers: J48 decision tree and random forest. The service-aged transformer oils were classified into four groups: the oils that can be maintained in service, the oils that should be reconditioned or filtered, the oils that should be reclaimed, and the oils that must be discarded. Of the two algorithms, random forest exhibited better performance and high accuracy with only a small amount of data. Good performance was achieved through not only the application of the proposed algorithm but also the approach to data preprocessing. Before being fed to the classification model, the available data were transformed using the simple k-means method. Subsequently, the obtained data were filtered through correlation-based feature selection (CFsSubset). The resulting features were transformed again by principal component analysis and passed through the CFsSubset filter once more. The transformation and filtration of the data improved the classification performance of the adopted algorithms, especially random forest. Another advantage of the proposed method is the decrease in the number of datasets required for the condition assessment of transformer oils, which is valuable for transformer condition monitoring.


Use of Machine Learning to Investigate the Quantitative Checklist for Autism in Toddlers (Q-CHAT) towards Early Autism Screening

Diagnostics ◽

10.3390/diagnostics11030574 ◽

2021 ◽

Vol 11 (3) ◽

pp. 574

Author(s):

Gennaro Tartarisco ◽

Giovanni Cicceri ◽

Davide Di Pietro ◽

Elisa Leonardi ◽

Stefania Aiello ◽

...

Keyword(s):

Machine Learning ◽

High Performance ◽

Behavioral Science ◽

Autistic Traits ◽

Classification Performance ◽

Recursive Feature Elimination ◽

Diagnostic Tools ◽

Support Vector ◽

K Nearest Neighbors ◽

Autism Screening

In the past two decades, several screening instruments were developed to detect toddlers who may be autistic both in clinical and unselected samples. Among others, the Quantitative CHecklist for Autism in Toddlers (Q-CHAT) is a quantitative and normally distributed measure of autistic traits that demonstrates good psychometric properties in different settings and cultures. Recently, machine learning (ML) has been applied to behavioral science to improve the classification performance of autism screening and diagnostic tools, but mainly in children, adolescents, and adults. In this study, we used ML to investigate the accuracy and reliability of the Q-CHAT in discriminating young autistic children from those without. Five different ML algorithms (random forest (RF), naïve Bayes (NB), support vector machine (SVM), logistic regression (LR), and K-nearest neighbors (KNN)) were applied to investigate the complete set of Q-CHAT items. Our results showed that ML achieved an overall accuracy of 90%, and the SVM was the most effective, being able to classify autism with 95% accuracy. Furthermore, using the SVM–recursive feature elimination (RFE) approach, we selected a subset of 14 items ensuring 91% accuracy, while 83% accuracy was obtained from the 3 best discriminating items in common to ours and the previously reported Q-CHAT-10. This evidence confirms the high performance and cross-cultural validity of the Q-CHAT, and supports the application of ML to create shorter and faster versions of the instrument, maintaining high classification accuracy, to be used as a quick, easy, and high-performance tool in primary-care settings.


The Classification of Medicinal Plant Leaves Based on Multispectral and Texture Feature Using Machine Learning Approach

Agronomy ◽

10.3390/agronomy11020263 ◽

2021 ◽

Vol 11 (2) ◽

pp. 263

Author(s):

Samreen Naeem ◽

Aqib Ali ◽

Christophe Chesneau ◽

Muhammad H. Tahir ◽

Farrukh Jamal ◽

...

Keyword(s):

Machine Learning ◽

Medicinal Plant ◽

Texture Feature ◽

Stevia Rebaudiana ◽

Ocimum Sanctum ◽

Multi Layer Perceptron ◽

Plant Leaves ◽

Chi Square ◽

Lemon Balm

This study proposes machine learning based classification of medicinal plant leaves. A dataset of six varieties of medicinal plant leaves was collected from the Department of Agriculture, The Islamia University of Bahawalpur, Pakistan. These plants are commonly named in English as (herbal) Tulsi, Peppermint, Bael, Lemon balm, Catnip, and Stevia and scientifically named in Latin as Ocimum sanctum, Mentha balsamea, Aegle marmelos, Melissa officinalis, Nepeta cataria, and Stevia rebaudiana, respectively. The multispectral and digital image datasets are collected via a computer vision laboratory setup. For the preprocessing step, we crop the region of the leaf and transform it into a gray-level format. Secondly, we perform seed intensity-based edge/line detection utilizing a Sobel filter and draw five regions of observation. A fused dataset of 65 features is extracted, combining texture, run-length matrix, and multispectral features. For the feature optimization process, we employ a chi-square feature selection approach and select 14 optimized features. Finally, five machine learning classifiers (multi-layer perceptron, logit-boost, bagging, random forest, and simple logistic) are deployed on the optimized medicinal plant leaves dataset, and the multi-layer perceptron classifier shows a promising accuracy of 99.01% compared to the competition. The per-class classification accuracies of the multi-layer perceptron classifier on the six medicinal plant leaves are 99.10% for Tulsi, 99.80% for Peppermint, 98.40% for Bael, 99.90% for Lemon balm, 98.40% for Catnip, and 99.20% for Stevia.
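
The chi-square feature selection step mentioned above can be sketched as follows. This is an illustration only: it assumes each feature has already been discretized into bins and ranks features by the Pearson chi-square statistic against the class label, which may differ from the exact procedure or tool the authors used. The feature names, bin labels and toy data are hypothetical.

```python
from collections import Counter

def chi_square_score(feature_bins, labels):
    """Pearson chi-square statistic between a binned feature and class labels."""
    n = len(labels)
    observed = Counter(zip(feature_bins, labels))
    bin_totals = Counter(feature_bins)
    label_totals = Counter(labels)
    score = 0.0
    for b, b_count in bin_totals.items():
        for y, y_count in label_totals.items():
            # Expected count if the feature and the label were independent.
            expected = b_count * y_count / n
            score += (observed[(b, y)] - expected) ** 2 / expected
    return score

def select_top_k(features, labels, k):
    """Names of the k features with the highest chi-square score."""
    scores = {name: chi_square_score(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy data: two classes, three binned features.
labels = ["tulsi"] * 4 + ["stevia"] * 4
features = {
    "texture": ["low", "low", "low", "low", "high", "high", "high", "high"],
    "partial": ["low", "low", "low", "high", "low", "high", "high", "high"],
    "noise":   ["low", "high", "low", "high", "low", "high", "low", "high"],
}

print(select_top_k(features, labels, 2))
```

In the study the same idea reduces 65 fused features to 14; here the feature that separates the classes perfectly ranks first and the uninformative one is dropped.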
