Machine Learning Support for EU Funding Project Categorization

Abstract: The European Union reallocates its money to its member states through different kinds of funding. EU member states categorize EU funding projects using their own categorization systems. Although the EU has prepared an integrated European categorization system, many EU members do not use it in their reports, which hinders straightforward fiscal analysis. This article aims to provide automatic support for the categorization of EU funding projects using machine learning. The experiments showed that the Support Vector Machine (SVM) is the top-performing machine learning algorithm for this task. We experimented with the SVM classifier, and the results showed that this approach can classify EU funding projects from a lexical description better than a baseline (i.e. assigning every project to the majority class). Further, we found that the approach using a natural language translator outperforms the approach using word sense disambiguation. Finally, we investigated the influence of the length of the project description on the performance of the classifier. The results showed that while there was a positive correlation between description length and classifier performance for project descriptions in English, for descriptions in non-English languages the classifier performed better on shorter descriptions. In the future, we plan to build an online application that uses the classifier on the back end and gives the user a category recommendation on the front end through a visualization of the EU categorization system.
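
The baseline mentioned in the abstract, assigning every project to the most frequent class, can be sketched as follows. This is a minimal illustration and not the authors' code: the function name and the category labels are hypothetical, invented only to show how such a baseline accuracy is obtained.

```python
from collections import Counter

def majority_class_baseline(train_labels, test_labels):
    """Return the most common training label and the accuracy of always predicting it."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for y in test_labels if y == majority_label)
    return majority_label, hits / len(test_labels)

# Hypothetical category labels, invented for illustration only.
train = ["transport", "transport", "energy", "transport", "education"]
test = ["transport", "energy", "transport", "education"]
label, acc = majority_class_baseline(train, test)
```

Any trained classifier is then judged by whether its accuracy on the same test set exceeds `acc`.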

Smart Face Detection and Recognition in Low Resolution Images Using Alexnet CNN Compare Accuracy with SVM

Alinteri Journal of Agricultural Sciences ◽ 10.47059/alinteri/v36i1/ajas21101 ◽ 2021 ◽ Vol 36 (1) ◽ pp. 721-726

Author(s): S. Mahesh ◽ Dr. G. Ramkumar

Keyword(s): Machine Learning ◽ Learning Algorithm ◽ Recognition Rate ◽ Vital Role ◽ Machine Learning Algorithms ◽ Support Vector ◽ SVM Classifier ◽ Low Resolution ◽ Accuracy Ratio ◽ Low Resolution Images

Aim: Machine learning algorithms play a vital role in various biometric applications because of their strong results in detection, recognition and classification. The main objective of this work is a comparative analysis of two machine learning algorithms for recognizing a person from low resolution images with high accuracy. Materials & Methods: AlexNet Convolutional Neural Network (ACNN) and Support Vector Machine (SVM) classifiers are implemented to recognize faces in a low resolution image dataset with 20 samples each. Results: Simulation results show that ACNN achieves a recognition accuracy of 98%, compared with 89% for SVM. A significant accuracy ratio (p=0.002) was also attained in the SPSS statistical analysis. Conclusion: For the considered low resolution images, the ACNN classifier provides better accuracy than the SVM classifier.

Using Exponential Kernel for Semi-Supervised Word Sense Disambiguation

Journal of Computational and Theoretical Nanoscience ◽ 10.1166/jctn.2016.5649 ◽ 2016 ◽ Vol 13 (10) ◽ pp. 6929-6934

Author(s): Junting Chen ◽ Liyun Zhong ◽ Caiyun Cai

Keyword(s): Natural Language ◽ Language Processing ◽ Word Sense Disambiguation ◽ Training Data ◽ Support Vector ◽ SVM Classifier ◽ Data Sets ◽ Word Sense ◽ Exponential Kernel ◽ Sense Disambiguation

Word sense disambiguation (WSD) in natural language text is a fundamental semantic understanding task at the lexical level in natural language processing (NLP) applications. Kernel methods such as support vector machine (SVM) have been successfully applied to WSD. This is mainly due to their relatively high classification accuracy as well as their ability to handle high dimensional and sparse data. A significant challenge in WSD is to reduce the need for labeled training data while maintaining an acceptable performance. In this paper, we present a semi-supervised technique using the exponential kernel for WSD. Specifically, the semantic similarities between terms are first determined with both labeled and unlabeled training data by means of a diffusion process on a graph defined by lexicon and co-occurrence information, and the exponential kernel is then constructed based on the learned semantic similarity. Finally, the SVM classifier trains a model for each class during the training phase and this model is then applied to all test examples in the test phase. The main feature of this approach is that it takes advantage of the exponential kernel to reveal the semantic similarities between terms in an unsupervised manner, which provides a kernel framework for semi-supervised learning. Experiments on several SENSEVAL benchmark data sets demonstrate the proposed approach is sound and effective.
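
To make the idea of an exponential kernel built from a diffusion process on a graph more concrete, here is a minimal sketch. It assumes the common diffusion-kernel form K = exp(beta * H), where H = A - D is the negative graph Laplacian of a term graph. The abstract does not give the exact construction, so this form, the value of `beta`, and the three-term chain graph are all assumptions for illustration. The matrix exponential is approximated with a truncated Taylor series.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def exponential_kernel(adjacency, beta=0.5, terms=30):
    """Approximate K = exp(beta * H), with H = A - D, by a truncated Taylor series."""
    n = len(adjacency)
    # H is the negative graph Laplacian: edge weights off the diagonal,
    # minus the node degree on the diagonal.
    h = [[adjacency[i][j] - (sum(adjacency[i]) if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    kernel = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in kernel]
    for k in range(1, terms):
        # term_k = term_{k-1} * (beta * H) / k, which equals (beta * H)^k / k!
        term = [[beta * v / k for v in row] for row in mat_mul(term, h)]
        kernel = [[kernel[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return kernel

# Hypothetical 3-term chain graph: term 0 - term 1 - term 2.
chain = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
K = exponential_kernel(chain)
```

In this sketch, terms 0 and 2 share no edge but still receive a positive similarity through term 1, which is the kind of indirect semantic relatedness a diffusion process is meant to expose.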

Healthy Fruits Image Label Categorization through Color Shape and Texture Features Based on Machine Learning Algorithm

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽ 10.35940/ijitee.b7740.019320 ◽ 2020 ◽ Vol 9 (3) ◽ pp. 34-40

Keyword(s): Machine Learning ◽ Feature Fusion ◽ Learning Algorithm ◽ Texture Features ◽ Classification Model ◽ Support Vector ◽ SVM Classifier ◽ Paper Machine ◽ Textual Features ◽ Occurrence Matrix

Fruit categorization according to visual quality has recently experienced tremendous growth in the field of agriculture and food products. Because of post-harvest losses during handling and processing, there is an increasing demand for quality products in the agro-industry, which requires accurate fruit prediction. Various machine learning techniques have been successfully applied to binary fruit classification. In this paper, a machine learning technique is used to automate the categorization process and to improve accuracy across different types of fruit through feature selection. To categorize images, domain-specific features such as color, shape and texture are considered. Statistical color features are extracted from the image, a bounding box provides the shape features, and the gray-level co-occurrence matrix (GLCM) is used to extract the texture features of an image. These features are combined into a single fused feature vector. A support vector machine (SVM) classification model is trained on features from the training set of the fruit360 dataset, which includes six fruit categories (classes) with two sub-categories (sub-classes), forming a multiclass classification task. We present a one-vs-one coding design of error-correcting output codes (ECOC) and apply it to the SVM classifier; validation followed a fivefold cross-validation strategy. The results show that texture features combined with color and shape features improved fruit classification accuracy.
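
As an illustration of the GLCM step, the sketch below counts how often a gray level i is followed by a gray level j at a fixed pixel offset and derives one classic texture statistic (contrast) from the counts. The function names, the offset, and the tiny 3x3 image are hypothetical. The abstract does not say which GLCM statistics or offsets were used.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count co-occurrences of gray levels (i, j) for pixel pairs separated by (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts

def glcm_contrast(counts):
    """Contrast: squared gray-level difference weighted by normalized co-occurrence."""
    total = sum(map(sum, counts))
    return sum((i - j) ** 2 * value / total
               for i, row in enumerate(counts)
               for j, value in enumerate(row))

# Hypothetical 3x3 image quantized to 3 gray levels (0, 1, 2).
image = [[0, 0, 1],
         [1, 2, 2],
         [0, 1, 2]]
counts = glcm(image, levels=3)      # horizontal neighbor, one pixel to the right
contrast = glcm_contrast(counts)
```

Statistics like `contrast` would then be concatenated with the color and shape descriptors to form the fused feature vector.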

The effect of gamma value on support vector machine performance with different kernels

International Journal of Electrical and Computer Engineering (IJECE) ◽ 10.11591/ijece.v10i5.pp5497-5506 ◽ 2020 ◽ Vol 10 (5) ◽ pp. 5497

Author(s): Intisar Shadeed Al-Mejibli ◽ Jwan K. Alwan ◽ Dhafar Hamed Abd

Keyword(s): Machine Learning ◽ Support Vector Machine ◽ Learning Algorithm ◽ Kernel Functions ◽ Supervised Machine Learning ◽ Support Vector ◽ SVM Classifier ◽ Machine Performance ◽ RBF Kernel ◽ The Impact

Currently, the support vector machine (SVM) is regarded as one of the supervised machine learning algorithms that provide data analysis for classification and regression. This technique is applied in many fields, such as bioinformatics, face recognition, text and hypertext categorization, generalized predictive control and many other areas. The performance of SVM is affected by parameters used in the training phase, and their settings can have a profound impact on the resulting engine’s implementation. This paper investigated SVM performance as a function of the gamma parameter for the kernels used. It studied the impact of the gamma value on the efficiency of the SVM classifier using different kernels on datasets with various descriptions. The SVM classifier was implemented in Python. The kernel functions investigated are polynomial, radial basis function (RBF) and sigmoid. The UC Irvine Machine Learning Repository is the source of all datasets used. In general, the results show an uneven effect on the classification accuracy of the three kernels across the datasets. Changing the gamma value influences the polynomial and sigmoid kernels, depending on the dataset used, while the performance of the RBF kernel is more stable across gamma values, with only slight changes in accuracy.
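
To show where gamma enters each of the three kernels, the sketch below writes them out in the parametrization commonly used by SVM libraries. The abstract does not state which library or exact formulas were used, so these forms, the `degree` and `coef0` defaults, and the sample points are assumptions for illustration.

```python
import math

def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def rbf_kernel(x, y, gamma):
    # gamma scales the squared distance: larger gamma means a narrower kernel.
    return math.exp(-gamma * squared_distance(x, y))

def polynomial_kernel(x, y, gamma, degree=3, coef0=0.0):
    # gamma scales the inner product before it is raised to the given degree.
    return (gamma * dot(x, y) + coef0) ** degree

def sigmoid_kernel(x, y, gamma, coef0=0.0):
    # gamma scales the inner product inside the hyperbolic tangent.
    return math.tanh(gamma * dot(x, y) + coef0)

# Hypothetical points; the RBF similarity shrinks as gamma grows.
x, y = (0.0, 0.0), (1.0, 1.0)
rbf_values = {gamma: rbf_kernel(x, y, gamma) for gamma in (0.01, 0.5, 10.0)}
```

The RBF value always stays in (0, 1], whereas the polynomial kernel grows with gamma raised to the power of the degree. That is one plausible reason the kernels can respond differently to the same change in gamma.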

Optical Biopsy for Prostate Cancer Diagnosis Using Fluorescence Spectroscopy

International Journal of High Speed Electronics and Systems ◽ 10.1142/s0129156418400268 ◽ 2018 ◽ Vol 27 (03n04) ◽ pp. 1840026

Author(s): Binlin Wu ◽ Xin Gao ◽ Jason Smith

Keyword(s): Machine Learning ◽ Learning Algorithm ◽ Supervised Machine Learning ◽ Optical Biopsy ◽ Support Vector ◽ SVM Classifier ◽ Machine Learning Algorithm ◽ Redox Ratio ◽ Tissue Samples ◽ Statistical Measures

Native fluorescence spectra are acquired from fresh normal and cancerous human prostate tissues. The fluorescence data are analyzed using an unsupervised machine learning algorithm such as non-negative matrix factorization. The nonnegative spectral components are retrieved and attributed to the native fluorophores such as collagen, reduced nicotinamide adenine dinucleotide (NADH), and flavin adenine dinucleotide (FAD) in tissue. The retrieved scores of the components are used to estimate the relative concentrations of the native fluorophores such as NADH and FAD and the redox ratio. A supervised machine learning algorithm such as support vector machine (SVM) is used to classify normal and cancerous tissue samples based on either the relative concentrations of NADH and FAD or the redox ratio alone. Various statistical measures such as sensitivity, specificity, and accuracy, along with the area under receiver operating characteristic (ROC) curve are used to show the classification performance. A cross validation method such as leave-one-out is used to further evaluate the predictive performance of the SVM classifier to avoid bias due to overfitting, and the accuracy was found to be 93.3%.
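
The leave-one-out procedure mentioned in the abstract can be sketched as below: each sample is held out once, a model is fitted on the rest, and the held-out prediction is scored. To keep the sketch dependency-free, a simple nearest-class-mean rule stands in for the SVM. That substitution, the function names, and the six feature values are assumptions. The numbers are invented and carry no physical meaning.

```python
def nearest_mean_predict(train_x, train_y, x):
    """Predict the label whose training mean is closest to x (stand-in for an SVM)."""
    means = {}
    for label in set(train_y):
        values = [v for v, y in zip(train_x, train_y) if y == label]
        means[label] = sum(values) / len(values)
    return min(means, key=lambda label: abs(x - means[label]))

def leave_one_out_accuracy(xs, ys, predict):
    """Hold out each sample once, train on the rest, and average the hits."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            hits += 1
    return hits / len(xs)

# Hypothetical one-dimensional feature values, invented for illustration.
features = [0.20, 0.30, 0.25, 0.70, 0.80, 0.75]
labels = ["normal"] * 3 + ["cancer"] * 3
loo_acc = leave_one_out_accuracy(features, labels, nearest_mean_predict)
```

Because the held-out sample never influences the model that classifies it, the resulting accuracy is a less optimistic estimate than accuracy measured on the training data.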

NP Animacy Identification for Anaphora Resolution

Journal of Artificial Intelligence Research ◽ 10.1613/jair.2179 ◽ 2007 ◽ Vol 29 ◽ pp. 79-103 ◽ Cited By ~ 11

Author(s): C. Orasan ◽ R. J. Evans

Keyword(s): Machine Learning ◽ Learning Algorithm ◽ Word Sense Disambiguation ◽ Anaphora Resolution ◽ Machine Learning Algorithm ◽ Machine Learning Method ◽ Word Sense ◽ Rule Based ◽ Sense Disambiguation ◽ Integral Role

In anaphora resolution for English, animacy identification can play an integral role in the application of agreement restrictions between pronouns and candidates, and as a result, can improve the accuracy of anaphora resolution systems. In this paper, two methods for animacy identification are proposed and evaluated using intrinsic and extrinsic measures. The first method is a rule-based one which uses information about the unique beginners in WordNet to classify NPs on the basis of their animacy. The second method relies on a machine learning algorithm which exploits a WordNet enriched with animacy information for each sense. The effect of word sense disambiguation on the two methods is also assessed. The intrinsic evaluation reveals that the machine learning method reaches human levels of performance. The extrinsic evaluation demonstrates that animacy identification can be beneficial in anaphora resolution, especially in the cases where animate entities are identified with high precision.

Binary Spectrum Feature for Improved Classifier Performance

10.36227/techrxiv.12993122 ◽ 2020

Author(s): Nalika Ulapane ◽ Karthick Thiyagarajan ◽ Sarath Kodagoda

Keyword(s): Machine Learning ◽ Classification Performance ◽ Feature Reduction ◽ Sensor Data ◽ Machine Learning Techniques ◽ Support Vector ◽ SVM Classifier ◽ Monitoring Task ◽ Classifier Performance ◽ Spectrum Feature

Classification has become a vital task in modern machine learning and Artificial Intelligence applications, including smart sensing. Numerous machine learning techniques are available to perform classification. Similarly, numerous practices, such as feature selection (i.e., selection of a subset of descriptor variables that optimally describe the output), are available to improve classifier performance. In this paper, we consider the case of a given supervised learning classification task that has to be performed making use of continuous-valued features. It is assumed that an optimal subset of features has already been selected. Therefore, no further feature reduction, or feature addition, is to be carried out. Then, we attempt to improve the classification performance by passing the given feature set through a transformation that produces a new feature set which we have named the “Binary Spectrum”. Via a case study example done on some Pulsed Eddy Current sensor data captured from an infrastructure monitoring task, we demonstrate how the classification accuracy of a Support Vector Machine (SVM) classifier increases through the use of this Binary Spectrum feature, indicating the feature transformation’s potential for broader usage.

Predictive Modelling of Employee Turnover in Indian IT Industry Using Machine Learning Techniques

Vision The Journal of Business Perspective ◽ 10.1177/0972262918821221 ◽ 2019 ◽ Vol 23 (1) ◽ pp. 12-21 ◽ Cited By ~ 2

Author(s): Shikha N. Khera ◽ Divya

Keyword(s): Machine Learning ◽ Learning Algorithm ◽ Confusion Matrix ◽ Predictive Modelling ◽ Supervised Machine Learning ◽ Machine Learning Techniques ◽ Support Vector ◽ IT Industry ◽ Knowledge Based ◽ Employee Attrition

The information technology (IT) industry in India has faced a systemic issue of high attrition in the past few years, resulting in monetary and knowledge-based losses to companies. The aim of this research is to develop a model that predicts employee attrition and gives organizations opportunities to address issues and improve retention. A predictive model was developed using a supervised machine learning algorithm, the support vector machine (SVM). Archival employee data (consisting of 22 input features) were collected from the Human Resource databases of three IT companies in India, including employment status (the response variable) at the time of collection. Results from the confusion matrix showed that the SVM model has an accuracy of 85 per cent. The results also show that the model performs better at predicting who will leave the firm than at predicting who will stay.

Early Prediction of Seven-Day Mortality in Intensive Care Unit Using a Machine Learning Model: Results from the SPIN-UTI Project

Journal of Clinical Medicine ◽ 10.3390/jcm10050992 ◽ 2021 ◽ Vol 10 (5) ◽ pp. 992

Author(s): Martina Barchitta ◽ Andrea Maugeri ◽ Giuliana Favara ◽ Paolo Marco Riela ◽ Giovanni Gallo ◽ ...

Keyword(s): Machine Learning ◽ Intensive Care ◽ Intensive Care Units ◽ Learning Algorithm ◽ Area Under The Curve ◽ Support Vector ◽ ICU Admission ◽ Risk Of Death ◽ SAPS II ◽ SVM Algorithm

Patients in intensive care units (ICUs) are at higher risk of worse prognosis and mortality. Here, we aimed to evaluate the ability of the Simplified Acute Physiology Score (SAPS II) to predict the risk of 7-day mortality, and to test a machine learning algorithm that combines the SAPS II with additional patient characteristics at ICU admission. We used data from the “Italian Nosocomial Infections Surveillance in Intensive Care Units” network. A Support Vector Machine (SVM) algorithm was used to classify 3782 patients according to sex, patient’s origin, type of ICU admission, non-surgical treatment for acute coronary disease, surgical intervention, SAPS II, presence of invasive devices, trauma, impaired immunity, antibiotic therapy and onset of HAI. The accuracy of SAPS II in distinguishing patients who died from those who did not was 69.3%, with an Area Under the Curve (AUC) of 0.678. Using the SVM algorithm, instead, we achieved an accuracy of 83.5% and an AUC of 0.896. Notably, SAPS II was the variable that weighed most in the model, and its removal resulted in an AUC of 0.653 and an accuracy of 68.4%. Overall, these findings suggest that the present SVM model is a useful tool for early prediction of patients at higher risk of death at ICU admission.
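
The AUC figures quoted above can be read as the probability that a randomly chosen patient who died receives a higher risk score than a randomly chosen survivor. The sketch below computes AUC directly from that pairwise definition. The function name, the labels, and the scores are hypothetical and are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes (1 = died within 7 days) and model risk scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why AUC is often reported alongside accuracy when outcome classes are unevenly sized.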

Machine learning-based ability to classify psychosis and early stages of disease through parenting and attachment-related variables is associated with social cognition

BMC Psychology ◽ 10.1186/s40359-021-00552-3 ◽ 2021 ◽ Vol 9 (1)

Author(s): Linda A. Antonucci ◽ Alessandra Raio ◽ Giulio Pergola ◽ Barbara Gelao ◽ Marco Papalino ◽ ...

Keyword(s): Machine Learning ◽ Social Cognition ◽ Attachment Style ◽ Learning Algorithm ◽ Early Recognition ◽ Emotion Perception ◽ First Episode ◽ Support Vector ◽ Significant Group ◽ Early Stages

Background: Recent views posited that negative parenting and attachment insecurity can be considered as general environmental factors of vulnerability for psychosis, specifically for individuals diagnosed with psychosis (PSY). Furthermore, evidence highlighted a tight relationship between attachment style and social cognition abilities, a key PSY behavioral phenotype. The aim of this study is to generate a machine learning algorithm based on the perceived quality of parenting and attachment style-related features to discriminate between PSY and healthy controls (HC) and to investigate its ability to track PSY early stages and risk conditions, as well as its association with social cognition performance. Methods: Perceived maternal and paternal parenting, as well as attachment anxiety and avoidance scores, were trained to separate 71 HC from 34 PSY (20 individuals diagnosed with schizophrenia + 14 diagnosed with bipolar disorder with psychotic manifestations) using support vector classification and repeated nested cross-validation. We then validated this model on independent datasets including individuals at the early stages of disease (ESD, i.e. first episode of psychosis or depression, or at-risk mental state for psychosis) and with familial high risk for PSY (FHR, i.e. having a first-degree relative suffering from psychosis). Then, we performed factorial analyses to test the group x classification rate interaction on emotion perception, social inference and managing of emotions abilities. Results: The perceived parenting and attachment-based machine learning model discriminated PSY from HC with a Balanced Accuracy (BAC) of 72.2%. Slightly lower classification performance was measured in the ESD sample (HC-ESD BAC = 63.5%), while the model could not discriminate between FHR and HC (BAC = 44.2%). We observed a significant group x classification interaction in PSY and HC from the discovery sample on emotion perception and on the ability to manage emotions (both p = 0.02). The interaction on managing of emotion abilities was replicated in the ESD and HC validation sample (p = 0.03). Conclusion: Our results suggest that parenting and attachment-related variables bear significant classification power when applied to both PSY and its early stages and are associated with variability in emotion processing. These variables could therefore be useful in psychosis early recognition programs aimed at softening the psychosis-associated disability.
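
Balanced Accuracy (BAC) is the mean of sensitivity and specificity, which makes it informative when groups are unequal in size, as with the 71 HC and 34 PSY in the discovery sample. The sketch below contrasts it with plain accuracy for a hypothetical classifier that labels every participant as HC. Only the two group sizes come from the abstract. The function names and the degenerate classifier are assumptions for illustration.

```python
def accuracy(tp, fn, tn, fp):
    """Share of all samples classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)

def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

# Group sizes from the abstract: 34 PSY (positives) and 71 HC (negatives).
# The classifier that labels everyone HC is hypothetical.
always_hc = dict(tp=0, fn=34, tn=71, fp=0)
plain = accuracy(**always_hc)
bac = balanced_accuracy(**always_hc)
```

Here plain accuracy is about 67.6% even though no PSY participant is detected, while BAC is exactly 50%, the chance level. Against that reference, the reported BAC of 44.2% for FHR versus HC indicates no discrimination.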
