GENERALIZATION BOUNDS OF REGULARIZATION ALGORITHMS DERIVED SIMULTANEOUSLY THROUGH HYPOTHESIS SPACE COMPLEXITY, ALGORITHMIC STABILITY AND DATA QUALITY

Author(s):  
XIANGYU CHANG ◽  
ZONGBEN XU ◽  
BIN ZOU ◽  
HAI ZHANG

A central issue in machine learning research is the analysis of the generalization performance of a learning machine. Most classical results on the generalization performance of regularization algorithms are derived from either the complexity of the hypothesis space or the stability of the learning algorithm alone. In practical applications, however, the performance of a learning algorithm is not determined by any single factor; hypothesis space complexity, algorithmic stability and data quality all play a role. Therefore, in this paper, we develop a framework for evaluating the generalization performance of regularization algorithms jointly in terms of hypothesis space complexity, algorithmic stability and data quality. We establish new bounds on the learning rate of regularization algorithms based on uniform stability and the empirical covering number for general types of loss functions. As applications of the generic results, we evaluate the learning rates of support vector machines and regularization networks, and propose a new strategy for setting the regularization parameter.

Author(s):  
Mahmoud Famouri ◽  
Mohammad Taheri ◽  
Zohreh Azimifar

Classification is an important field in machine learning and pattern recognition. Among the various types of classifiers, such as nearest neighbor, neural network and Bayesian classifiers, the support vector machine (SVM) is known as a very powerful classifier. One advantage of the SVM over other methods is its efficient and adjustable generalization capability. The performance of an SVM classifier depends on its parameters, especially the regularization parameter C, which is usually selected by cross-validation. Despite its good generalization, the SVM suffers from some limitations, such as a considerably slow training phase. Cross-validation is a very time-consuming part of the training phase because, for every candidate value of the parameter C, the entire process of training and validating must be repeated. In this paper, we propose a novel approach for early stopping of the SVM learning algorithm. The proposed early stopping integrates the validation step into the optimization step of SVM training without losing any generality or degrading the performance of the classifier. Moreover, this method can be combined with other available acceleration methods, since it does not depend on them and therefore introduces no redundancy. Our method was tested and verified on various UCI repository datasets, and the results indicate that it speeds up the learning phase of the SVM without losing any generality or affecting the final classifier model.
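The cost described above can be made concrete with a minimal sketch. It only counts how many full training runs a plain grid search over C with k-fold cross-validation would need; it does not implement the authors' early-stopping method. The function names, the grid of C values and the fold count are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def kfold_indices(n_samples: int, n_folds: int) -> List[Tuple[List[int], List[int]]]:
    """Return one (train_indices, validation_indices) pair per fold."""
    # Assign sample i to fold i % n_folds (a simple, deterministic split).
    folds = [list(range(k, n_samples, n_folds)) for k in range(n_folds)]
    splits = []
    for k in range(n_folds):
        validation = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((train, validation))
    return splits


def count_training_runs(c_candidates: Sequence[float], n_samples: int, n_folds: int) -> int:
    """Count the full SVM fits needed by a plain grid search with k-fold CV."""
    runs = 0
    for _c in c_candidates:
        for _train, _validation in kfold_indices(n_samples, n_folds):
            # A complete SVM training on _train and scoring on _validation
            # would happen here for the candidate value _c.
            runs += 1
    return runs


# Hypothetical grid and fold count, chosen only for illustration.
C_GRID = [0.01, 0.1, 1.0, 10.0, 100.0]
total_runs = count_training_runs(C_GRID, n_samples=100, n_folds=5)
```

With five candidate values and five folds, the loop visits 25 (candidate, fold) pairs, each of which stands for one complete training run. This multiplicative cost is what makes cross-validated selection of C slow.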


2019 ◽  
Vol 23 (1) ◽  
pp. 12-21 ◽  
Author(s):  
Shikha N. Khera ◽  
Divya

The information technology (IT) industry in India has been facing a systemic issue of high attrition in the past few years, resulting in monetary and knowledge-based losses to companies. The aim of this research is to develop a model that predicts employee attrition and gives organizations the opportunity to address any issues and improve retention. A predictive model was developed based on a supervised machine learning algorithm, the support vector machine (SVM). Archival employee data (consisting of 22 input features) were collected from the Human Resource databases of three IT companies in India, including the employees' employment status (the response variable) at the time of collection. The confusion matrix for the SVM model showed that the model has an accuracy of 85 per cent. The results also show that the model performs better at predicting who will leave the firm than at predicting who will not.


2021 ◽  
Vol 10 (5) ◽  
pp. 992
Author(s):  
Martina Barchitta ◽  
Andrea Maugeri ◽  
Giuliana Favara ◽  
Paolo Marco Riela ◽  
Giovanni Gallo ◽  
...  

Patients in intensive care units (ICUs) are at higher risk of worse prognosis and mortality. Here, we aimed to evaluate the ability of the Simplified Acute Physiology Score (SAPS II) to predict the risk of 7-day mortality, and to test a machine learning algorithm which combines the SAPS II with additional patients’ characteristics at ICU admission. We used data from the “Italian Nosocomial Infections Surveillance in Intensive Care Units” network. A Support Vector Machines (SVM) algorithm was used to classify 3782 patients according to sex, patient’s origin, type of ICU admission, non-surgical treatment for acute coronary disease, surgical intervention, SAPS II, presence of invasive devices, trauma, impaired immunity, antibiotic therapy and onset of HAI. The accuracy of SAPS II for distinguishing patients who died from those who did not was 69.3%, with an Area Under the Curve (AUC) of 0.678. Using the SVM algorithm, instead, we achieved an accuracy of 83.5% and an AUC of 0.896. Notably, SAPS II was the variable that weighed most in the model, and its removal resulted in an AUC of 0.653 and an accuracy of 68.4%. Overall, these findings suggest that the present SVM model is a useful tool for the early prediction of patients at higher risk of death at ICU admission.
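The two metrics reported above, accuracy and AUC, can be illustrated with a short sketch. It computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores are made-up toy values and the 0.5 threshold is an assumption for illustration; none of this is data from the study.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative scores."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = died, 0 = survived; scores are hypothetical risk estimates.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

toy_accuracy = accuracy(y_true, y_pred)
toy_auc = roc_auc(y_true, scores)
```

Accuracy depends on the chosen decision threshold, while AUC summarizes ranking quality across all thresholds, which is why studies such as this one usually report both.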


2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Linda A. Antonucci ◽  
Alessandra Raio ◽  
Giulio Pergola ◽  
Barbara Gelao ◽  
Marco Papalino ◽  
...  

Background: Recent views posited that negative parenting and attachment insecurity can be considered as general environmental factors of vulnerability for psychosis, specifically for individuals diagnosed with psychosis (PSY). Furthermore, evidence highlighted a tight relationship between attachment style and social cognition abilities, a key PSY behavioral phenotype. The aim of this study is to generate a machine learning algorithm based on the perceived quality of parenting and attachment style-related features to discriminate between PSY and healthy controls (HC) and to investigate its ability to track PSY early stages and risk conditions, as well as its association with social cognition performance. Methods: Perceived maternal and paternal parenting, as well as attachment anxiety and avoidance scores, were trained to separate 71 HC from 34 PSY (20 individuals diagnosed with schizophrenia + 14 diagnosed with bipolar disorder with psychotic manifestations) using support vector classification and repeated nested cross-validation. We then validated this model on independent datasets including individuals at the early stages of disease (ESD, i.e. first episode of psychosis or depression, or at-risk mental state for psychosis) and with familial high risk for PSY (FHR, i.e. having a first-degree relative suffering from psychosis). Then, we performed factorial analyses to test the group x classification rate interaction on emotion perception, social inference and managing of emotions abilities. Results: The perceived parenting and attachment-based machine learning model discriminated PSY from HC with a Balanced Accuracy (BAC) of 72.2%. Slightly lower classification performance was measured in the ESD sample (HC-ESD BAC = 63.5%), while the model could not discriminate between FHR and HC (BAC = 44.2%). We observed a significant group x classification interaction in PSY and HC from the discovery sample on emotion perception and on the ability to manage emotions (both p = 0.02). The interaction on managing of emotion abilities was replicated in the ESD and HC validation sample (p = 0.03). Conclusion: Our results suggest that parenting and attachment-related variables bear significant classification power when applied to both PSY and its early stages and are associated with variability in emotion processing. These variables could therefore be useful in psychosis early recognition programs aimed at softening the psychosis-associated disability.
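Balanced accuracy (BAC), the metric reported above, is the mean of sensitivity and specificity, which makes it informative when group sizes differ as they do here (71 HC versus 34 PSY). The following minimal sketch uses those two group sizes from the abstract; the predictions are hypothetical and describe a trivial classifier that labels everyone as HC, not the study's model.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Group sizes from the abstract: 71 HC (label 0) and 34 PSY (label 1).
y_true = [0] * 71 + [1] * 34

# Hypothetical trivial classifier that predicts HC for every participant.
y_pred_all_hc = [0] * 105

plain_accuracy = sum(1 for t, p in zip(y_true, y_pred_all_hc) if t == p) / len(y_true)
bac_all_hc = balanced_accuracy(y_true, y_pred_all_hc)
```

The trivial classifier reaches a plain accuracy of 71/105 (about 67.6%) yet a balanced accuracy of exactly 50%, the chance level. This is why a BAC of 44.2% for FHR versus HC is read as no discrimination.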


2021 ◽  
Vol 25 (4) ◽  
pp. 763-787
Author(s):  
Alladoumbaye Ngueilbaye ◽  
Hongzhi Wang ◽  
Daouda Ahmat Mahamat ◽  
Ibrahim A. Elgendy ◽  
Sahalu B. Junaidu

Knowledge extraction, data mining, e-learning and web application platforms use heterogeneous and distributed data. The proliferation of these multifaceted platforms brings many challenges, such as high scalability, the coexistence of complex similarity metrics, and the requirement of data quality evaluation. In this study, an extended complete formal taxonomy and algorithms for detecting and correcting contextual data quality anomalies were developed and implemented on structured data. Our methods detected and corrected more data anomalies than existing taxonomy techniques, and also highlighted the limitations of the Support Vector Machine (SVM). The proposed techniques will therefore be relevant to the detection and correction of errors in large contextual data (Big data).


2021 ◽  
pp. 1-17
Author(s):  
Ahmed Al-Tarawneh ◽  
Ja’afer Al-Saraireh

Twitter is one of the most popular platforms used to share and post ideas. Hackers and anonymous attackers use such platforms maliciously, and their behavior can be used to predict the risk of future attacks by gathering and classifying hackers’ tweets using machine-learning techniques. Previous approaches for detecting infected tweets are based on human effort or text analysis, and are therefore limited in capturing the hidden text between tweet lines. The main aim of this research paper is to enhance the efficiency of hacker detection for the Twitter platform using the complex networks technique with adapted machine learning algorithms. This work presents a methodology that collects a list of users, together with their followers, who share posts with similar interests from a hackers’ community on Twitter. The list is built from a set of suggested keywords that hackers commonly use in their tweets. After that, a complex network is generated for all users to find relations among them in terms of network centrality, closeness, and betweenness. After extracting these values, a dataset of the most influential users in the hacker community is assembled. Subsequently, tweets belonging to users in the extracted dataset are gathered and classified into positive and negative classes. The output of this process is then fed to a machine learning process that applies different algorithms. This research builds and investigates an accurate dataset containing real users who belong to a hackers’ community. Correctly classified instances were measured for accuracy using the average values of the K-nearest neighbor, Naive Bayes, Random Tree, and support vector machine techniques, demonstrating about 90% and 88% accuracy for cross-validation and percentage split respectively. Consequently, the proposed network cyber Twitter model is able to detect hackers and determine whether tweets pose a future risk to institutions and individuals, providing early warning of possible attacks.
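The network measures mentioned above can be illustrated on a toy graph. This minimal sketch computes normalized degree centrality and closeness centrality with a breadth-first search on an undirected, unweighted graph; betweenness is omitted for brevity. The user names and edges are invented for illustration and are not from the study's dataset.

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def bfs_distances(graph: Graph, source: str) -> Dict[str, int]:
    """Shortest hop counts from source to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def degree_centrality(graph: Graph) -> Dict[str, float]:
    """Number of neighbors divided by the maximum possible (n - 1)."""
    n = len(graph)
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def closeness_centrality(graph: Graph) -> Dict[str, float]:
    """Reachable node count divided by the sum of distances to those nodes."""
    result = {}
    for node in graph:
        distances = bfs_distances(graph, node)
        total = sum(distances.values())
        reachable = len(distances) - 1
        result[node] = reachable / total if total > 0 else 0.0
    return result


# Hypothetical follower network: one hub account linked to three others.
toy_graph: Graph = {
    "hub": {"user_b", "user_c", "user_d"},
    "user_b": {"hub"},
    "user_c": {"hub"},
    "user_d": {"hub"},
}

degree = degree_centrality(toy_graph)
closeness = closeness_centrality(toy_graph)
```

In this star-shaped example the hub scores 1.0 on both measures, while each peripheral account scores 1/3 on degree and 0.6 on closeness. Rankings of this kind are one plausible way to pick out the most influential users in a community.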


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Tianqi Tu ◽  
Xueling Wei ◽  
Yue Yang ◽  
Nianrong Zhang ◽  
Wei Li ◽  
...  

Background: Common subtypes seen in Chinese patients with membranous nephropathy (MN) include idiopathic membranous nephropathy (IMN) and hepatitis B virus-related membranous nephropathy (HBV-MN). However, the morphologic differences are not visible under the light microscope in certain renal biopsy tissues. Methods: We propose here a deep learning-based framework for processing hyperspectral images of renal biopsy tissue to define the difference between IMN and HBV-MN based on the component of their immune complex deposition. Results: The proposed framework can achieve an overall accuracy of 95.04% in classification, which also leads to better performance than support vector machine (SVM)-based algorithms. Conclusion: IMN and HBV-MN can be correctly separated via the deep learning framework using hyperspectral imagery. Our results suggest the potential of the deep learning algorithm as a new method to aid in the diagnosis of MN.


Minerals ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 705
Author(s):  
Peter A. Defnet ◽  
Michael A. Wise ◽  
Russell S. Harmon ◽  
Richard R. Hark ◽  
Keith Hilferding

Laser-induced breakdown spectroscopy (LIBS) is a simple and straightforward technique of atomic emission spectroscopy that can provide multi-element detection and quantification in any material, in-situ and in real time because all elements emit in the 200–900 nm spectral range of the LIBS optical emission. This study evaluated two practical applications of LIBS—validation of labels assigned to garnets in museum collections and discrimination of LCT (lithium-cesium-tantalum) and NYF (niobium, yttrium and fluorine) pegmatites based on garnet geochemical fingerprinting, both of which could be implemented on site in a museum or field setting with a handheld LIBS analyzer. Major element compositions were determined using electron microprobe analysis for a suite of 208 garnets from 24 countries to determine garnet type. Both commercial laboratory and handheld analyzers were then used to acquire LIBS broadband spectra that were chemometrically processed by partial least squares discriminant analysis (PLSDA) and linear support vector machine classification (SVM). High attribution success rates (>98%) were obtained using PLSDA and SVM for the handheld data suggesting that LIBS could be used in a museum setting to assign garnet type quickly and accurately. LIBS also identifies changes in garnet composition associated with increasing mineral and chemical complexity of LCT and NYF pegmatites.


Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 617
Author(s):  
Umer Saeed ◽  
Young-Doo Lee ◽  
Sana Ullah Jan ◽  
Insoo Koo

As a key component of Cyber-Physical Systems, sensors are susceptible to failures due to complex environments, low-quality production, and aging. When defective, sensors either stop communicating or convey incorrect information. These unsteady situations threaten the safety, economy, and reliability of a system. The objective of this study is to construct a lightweight machine learning-based fault detection and diagnostic system within the limited energy resources, memory, and computation of a Wireless Sensor Network (WSN). In this paper, a Context-Aware Fault Diagnostic (CAFD) scheme is proposed based on an ensemble learning algorithm called Extra-Trees. To evaluate the performance of the proposed scheme, a realistic WSN scenario composed of humidity and temperature sensor observations is replicated with extremely low-intensity faults. Six commonly occurring types of sensor fault are considered: drift, hard-over/bias, spike, erratic/precision degradation, stuck, and data-loss. The proposed CAFD scheme is able to accurately detect and diagnose low-intensity sensor faults in a timely manner. Moreover, the efficiency of the Extra-Trees algorithm in terms of diagnostic accuracy, F1-score, ROC-AUC, and training time is demonstrated by comparison with cutting-edge machine learning algorithms: a Support Vector Machine and a Neural Network.

