GENERALIZATION BOUNDS OF REGULARIZATION ALGORITHMS DERIVED SIMULTANEOUSLY THROUGH HYPOTHESIS SPACE COMPLEXITY, ALGORITHMIC STABILITY AND DATA QUALITY

A main issue in machine learning research is to analyze the generalization performance of a learning machine. Most classical results on the generalization performance of regularization algorithms are derived merely with the complexity of hypothesis space or the stability property of a learning algorithm. However, in practical applications, the performance of a learning algorithm is not actually affected only by an unitary factor just like the complexity of hypothesis space, stability of the algorithm and data quality. Therefore, in this paper, we develop a framework of evaluating the generalization performance of regularization algorithms combinatively in terms of hypothesis space complexity, algorithmic stability and data quality. We establish new bounds on the learning rate of regularization algorithms based on the measure of uniform stability and empirical covering number for general type of loss functions. As applications of the generic results, we evaluate the learning rates of support vector machines and regularization networks, and propose a new strategy for regularization parameter setting.

Download Full-text

Fast Linear SVM Validation Based on Early Stopping in Iterative Learning

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001415510131 ◽

2015 ◽

Vol 29 (08) ◽

pp. 1551013 ◽

Cited By ~ 9

Author(s):

Mahmoud Famouri ◽

Mohammad Taheri ◽

Zohreh Azimifar

Keyword(s):

Cross Validation ◽

Nearest Neighbor ◽

Learning Algorithm ◽

Regularization Parameter ◽

The Other ◽

Support Vector ◽

Svm Classifier ◽

Training Phase ◽

Early Stopping ◽

Novel Approach

Classification is an important field in machine learning and pattern recognition. Amongst various types of classifiers such as nearest neighbor, neural network and Bayesian classifiers, support vector machine (SVM) is known as a very powerful classifier. One of the advantages of SVM in comparison with the other methods, is its efficient and adjustable generalization capability. The performance of SVM classifier depends on its parameters, specially regularization parameter C, that is usually selected by cross-validation. Despite its generalization, SVM suffers from some limitations such as its considerable low speed training phase. Cross-validation is a very time consuming part of training phase, because for any candidate value of the parameter C, the entire process of training and validating must be repeated completely. In this paper, we propose a novel approach for early stopping of the SVM learning algorithm. The proposed early stopping occurs by integrating the validation part into the optimization part of the SVM training without losing any generality or degrading performance of the classifier. Moreover, this method can be considered in conjunction with the other available accelerator methods since there is not any dependency between our proposed method and the other accelerator ones, thus no redundancy will happen. Our method was tested and verified on various UCI repository datasets and the results indicate that this method speeds up the learning phase of SVM without losing any generality or affecting the final model of classifier.

Download Full-text

The generalization performance of learning algorithms derived simultaneously through algorithmic stability and space complexity

2011 Seventh International Conference on Natural Computation ◽

10.1109/icnc.2011.6022044 ◽

2011 ◽

Author(s):

Jie Xu ◽

Bin Zou

Keyword(s):

Learning Algorithms ◽

Space Complexity ◽

Generalization Performance ◽

Algorithmic Stability

Download Full-text

Predictive Modelling of Employee Turnover in Indian IT Industry Using Machine Learning Techniques

Vision The Journal of Business Perspective ◽

10.1177/0972262918821221 ◽

2019 ◽

Vol 23 (1) ◽

pp. 12-21 ◽

Cited By ~ 2

Author(s):

Shikha N. Khera ◽

Divya

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Confusion Matrix ◽

Predictive Modelling ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

It Industry ◽

Knowledge Based ◽

Employee Attrition

Information technology (IT) industry in India has been facing a systemic issue of high attrition in the past few years, resulting in monetary and knowledge-based loses to the companies. The aim of this research is to develop a model to predict employee attrition and provide the organizations opportunities to address any issue and improve retention. Predictive model was developed based on supervised machine learning algorithm, support vector machine (SVM). Archival employee data (consisting of 22 input features) were collected from Human Resource databases of three IT companies in India, including their employment status (response variable) at the time of collection. Accuracy results from the confusion matrix for the SVM model showed that the model has an accuracy of 85 per cent. Also, results show that the model performs better in predicting who will leave the firm as compared to predicting who will not leave the company.

Download Full-text

Early Prediction of Seven-Day Mortality in Intensive Care Unit Using a Machine Learning Model: Results from the SPIN-UTI Project

Journal of Clinical Medicine ◽

10.3390/jcm10050992 ◽

2021 ◽

Vol 10 (5) ◽

pp. 992

Author(s):

Martina Barchitta ◽

Andrea Maugeri ◽

Giuliana Favara ◽

Paolo Marco Riela ◽

Giovanni Gallo ◽

...

Keyword(s):

Machine Learning ◽

Intensive Care ◽

Intensive Care Units ◽

Learning Algorithm ◽

Area Under The Curve ◽

Support Vector ◽

Icu Admission ◽

Risk Of Death ◽

Saps Ii ◽

Svm Algorithm

Patients in intensive care units (ICUs) were at higher risk of worsen prognosis and mortality. Here, we aimed to evaluate the ability of the Simplified Acute Physiology Score (SAPS II) to predict the risk of 7-day mortality, and to test a machine learning algorithm which combines the SAPS II with additional patients’ characteristics at ICU admission. We used data from the “Italian Nosocomial Infections Surveillance in Intensive Care Units” network. Support Vector Machines (SVM) algorithm was used to classify 3782 patients according to sex, patient’s origin, type of ICU admission, non-surgical treatment for acute coronary disease, surgical intervention, SAPS II, presence of invasive devices, trauma, impaired immunity, antibiotic therapy and onset of HAI. The accuracy of SAPS II for predicting patients who died from those who did not was 69.3%, with an Area Under the Curve (AUC) of 0.678. Using the SVM algorithm, instead, we achieved an accuracy of 83.5% and AUC of 0.896. Notably, SAPS II was the variable that weighted more on the model and its removal resulted in an AUC of 0.653 and an accuracy of 68.4%. Overall, these findings suggest the present SVM model as a useful tool to early predict patients at higher risk of death at ICU admission.

Download Full-text

Machine learning-based ability to classify psychosis and early stages of disease through parenting and attachment-related variables is associated with social cognition

BMC Psychology ◽

10.1186/s40359-021-00552-3 ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

Linda A. Antonucci ◽

Alessandra Raio ◽

Giulio Pergola ◽

Barbara Gelao ◽

Marco Papalino ◽

...

Keyword(s):

Machine Learning ◽

Social Cognition ◽

Attachment Style ◽

Learning Algorithm ◽

Early Recognition ◽

Emotion Perception ◽

First Episode ◽

Support Vector ◽

Significant Group ◽

Early Stages

Abstract Background Recent views posited that negative parenting and attachment insecurity can be considered as general environmental factors of vulnerability for psychosis, specifically for individuals diagnosed with psychosis (PSY). Furthermore, evidence highlighted a tight relationship between attachment style and social cognition abilities, a key PSY behavioral phenotype. The aim of this study is to generate a machine learning algorithm based on the perceived quality of parenting and attachment style-related features to discriminate between PSY and healthy controls (HC) and to investigate its ability to track PSY early stages and risk conditions, as well as its association with social cognition performance. Methods Perceived maternal and paternal parenting, as well as attachment anxiety and avoidance scores, were trained to separate 71 HC from 34 PSY (20 individuals diagnosed with schizophrenia + 14 diagnosed with bipolar disorder with psychotic manifestations) using support vector classification and repeated nested cross-validation. We then validated this model on independent datasets including individuals at the early stages of disease (ESD, i.e. first episode of psychosis or depression, or at-risk mental state for psychosis) and with familial high risk for PSY (FHR, i.e. having a first-degree relative suffering from psychosis). Then, we performed factorial analyses to test the group x classification rate interaction on emotion perception, social inference and managing of emotions abilities. Results The perceived parenting and attachment-based machine learning model discriminated PSY from HC with a Balanced Accuracy (BAC) of 72.2%. Slightly lower classification performance was measured in the ESD sample (HC-ESD BAC = 63.5%), while the model could not discriminate between FHR and HC (BAC = 44.2%). We observed a significant group x classification interaction in PSY and HC from the discovery sample on emotion perception and on the ability to manage emotions (both p = 0.02). The interaction on managing of emotion abilities was replicated in the ESD and HC validation sample (p = 0.03). Conclusion Our results suggest that parenting and attachment-related variables bear significant classification power when applied to both PSY and its early stages and are associated with variability in emotion processing. These variables could therefore be useful in psychosis early recognition programs aimed at softening the psychosis-associated disability.

Download Full-text

Methods for detecting and correcting contextual data quality problems

Intelligent Data Analysis ◽

10.3233/ida-205282 ◽

2021 ◽

Vol 25 (4) ◽

pp. 763-787

Author(s):

Alladoumbaye Ngueilbaye ◽

Hongzhi Wang ◽

Daouda Ahmat Mahamat ◽

Ibrahim A. Elgendy ◽

Sahalu B. Junaidu

Keyword(s):

Data Quality ◽

Quality Evaluation ◽

Web Applications ◽

Support Vector ◽

Similarity Metrics ◽

Distributed Data ◽

Contextual Data ◽

Formal Taxonomy ◽

E Learning ◽

High Scalability

Knowledge extraction, data mining, e-learning or web applications platforms use heterogeneous and distributed data. The proliferation of these multifaceted platforms faces many challenges such as high scalability, the coexistence of complex similarity metrics, and the requirement of data quality evaluation. In this study, an extended complete formal taxonomy and some algorithms that utilize in achieving the detection and correction of contextual data quality anomalies were developed and implemented on structured data. Our methods were effective in detecting and correcting more data anomalies than existing taxonomy techniques, and also highlighted the demerit of Support Vector Machine (SVM). These proposed techniques, therefore, will be of relevance in detection and correction of errors in large contextual data (Big data).

Download Full-text

Efficient detection of hacker community based on twitter data using complex networks and machine learning algorithm

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-210458 ◽

2021 ◽

pp. 1-17

Author(s):

Ahmed Al-Tarawneh ◽

Ja’afer Al-Saraireh

Keyword(s):

Machine Learning ◽

Complex Networks ◽

Nearest Neighbor ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbor ◽

Efficient Detection ◽

Suggested Keywords

Twitter is one of the most popular platforms used to share and post ideas. Hackers and anonymous attackers use these platforms maliciously, and their behavior can be used to predict the risk of future attacks, by gathering and classifying hackers’ tweets using machine-learning techniques. Previous approaches for detecting infected tweets are based on human efforts or text analysis, thus they are limited to capturing the hidden text between tweet lines. The main aim of this research paper is to enhance the efficiency of hacker detection for the Twitter platform using the complex networks technique with adapted machine learning algorithms. This work presents a methodology that collects a list of users with their followers who are sharing their posts that have similar interests from a hackers’ community on Twitter. The list is built based on a set of suggested keywords that are the commonly used terms by hackers in their tweets. After that, a complex network is generated for all users to find relations among them in terms of network centrality, closeness, and betweenness. After extracting these values, a dataset of the most influential users in the hacker community is assembled. Subsequently, tweets belonging to users in the extracted dataset are gathered and classified into positive and negative classes. The output of this process is utilized with a machine learning process by applying different algorithms. This research build and investigate an accurate dataset containing real users who belong to a hackers’ community. Correctly, classified instances were measured for accuracy using the average values of K-nearest neighbor, Naive Bayes, Random Tree, and the support vector machine techniques, demonstrating about 90% and 88% accuracy for cross-validation and percentage split respectively. Consequently, the proposed network cyber Twitter model is able to detect hackers, and determine if tweets pose a risk to future institutions and individuals to provide early warning of possible attacks.

Download Full-text

Deep learning-based framework for the distinction of membranous nephropathy: a new approach through hyperspectral imagery

BMC Nephrology ◽

10.1186/s12882-021-02421-y ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Tianqi Tu ◽

Xueling Wei ◽

Yue Yang ◽

Nianrong Zhang ◽

Wei Li ◽

...

Keyword(s):

Deep Learning ◽

Renal Biopsy ◽

Membranous Nephropathy ◽

Learning Algorithm ◽

Hyperspectral Imagery ◽

Chinese Patients ◽

Support Vector ◽

Deep Learning Algorithm ◽

The Difference ◽

Complex Deposition

Abstract Background Common subtypes seen in Chinese patients with membranous nephropathy (MN) include idiopathic membranous nephropathy (IMN) and hepatitis B virus-related membranous nephropathy (HBV-MN). However, the morphologic differences are not visible under the light microscope in certain renal biopsy tissues. Methods We propose here a deep learning-based framework for processing hyperspectral images of renal biopsy tissue to define the difference between IMN and HBV-MN based on the component of their immune complex deposition. Results The proposed framework can achieve an overall accuracy of 95.04% in classification, which also leads to better performance than support vector machine (SVM)-based algorithms. Conclusion IMN and HBV-MN can be correctly separated via the deep learning framework using hyperspectral imagery. Our results suggest the potential of the deep learning algorithm as a new method to aid in the diagnosis of MN.

Download Full-text

Analysis of Garnet by Laser-Induced Breakdown Spectroscopy—Two Practical Applications

Minerals ◽

10.3390/min11070705 ◽

2021 ◽

Vol 11 (7) ◽

pp. 705

Author(s):

Peter A. Defnet ◽

Michael A. Wise ◽

Russell S. Harmon ◽

Richard R. Hark ◽

Keith Hilferding

Keyword(s):

Optical Emission ◽

Atomic Emission ◽

Laser Induced Breakdown Spectroscopy ◽

Support Vector ◽

Success Rates ◽

Practical Applications ◽

Chemical Complexity ◽

Geochemical Fingerprinting ◽

Breakdown Spectroscopy ◽

Laser Induced Breakdown

Laser-induced breakdown spectroscopy (LIBS) is a simple and straightforward technique of atomic emission spectroscopy that can provide multi-element detection and quantification in any material, in-situ and in real time because all elements emit in the 200–900 nm spectral range of the LIBS optical emission. This study evaluated two practical applications of LIBS—validation of labels assigned to garnets in museum collections and discrimination of LCT (lithium-cesium-tantalum) and NYF (niobium, yttrium and fluorine) pegmatites based on garnet geochemical fingerprinting, both of which could be implemented on site in a museum or field setting with a handheld LIBS analyzer. Major element compositions were determined using electron microprobe analysis for a suite of 208 garnets from 24 countries to determine garnet type. Both commercial laboratory and handheld analyzers were then used to acquire LIBS broadband spectra that were chemometrically processed by partial least squares discriminant analysis (PLSDA) and linear support vector machine classification (SVM). High attribution success rates (>98%) were obtained using PLSDA and SVM for the handheld data suggesting that LIBS could be used in a museum setting to assign garnet type quickly and accurately. LIBS also identifies changes in garnet composition associated with increasing mineral and chemical complexity of LCT and NYF pegmatites.

Download Full-text

CAFD: Context-Aware Fault Diagnostic Scheme towards Sensor Faults Utilizing Machine Learning

Sensors ◽

10.3390/s21020617 ◽

2021 ◽

Vol 21 (2) ◽

pp. 617

Author(s):

Umer Saeed ◽

Young-Doo Lee ◽

Sana Ullah Jan ◽

Insoo Koo

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Diagnostic System ◽

Machine Learning Algorithms ◽

Support Vector ◽

Context Aware ◽

Sensor Faults ◽

Training Time ◽

Low Intensity ◽

Fault Diagnostic

Sensors’ existence as a key component of Cyber-Physical Systems makes it susceptible to failures due to complex environments, low-quality production, and aging. When defective, sensors either stop communicating or convey incorrect information. These unsteady situations threaten the safety, economy, and reliability of a system. The objective of this study is to construct a lightweight machine learning-based fault detection and diagnostic system within the limited energy resources, memory, and computation of a Wireless Sensor Network (WSN). In this paper, a Context-Aware Fault Diagnostic (CAFD) scheme is proposed based on an ensemble learning algorithm called Extra-Trees. To evaluate the performance of the proposed scheme, a realistic WSN scenario composed of humidity and temperature sensor observations is replicated with extreme low-intensity faults. Six commonly occurring types of sensor fault are considered: drift, hard-over/bias, spike, erratic/precision degradation, stuck, and data-loss. The proposed CAFD scheme reveals the ability to accurately detect and diagnose low-intensity sensor faults in a timely manner. Moreover, the efficiency of the Extra-Trees algorithm in terms of diagnostic accuracy, F1-score, ROC-AUC, and training time is demonstrated by comparison with cutting-edge machine learning algorithms: a Support Vector Machine and a Neural Network.

Download Full-text