target class
Recently Published Documents

TOTAL DOCUMENTS: 198 (five years: 93)
H-INDEX: 23 (five years: 4)

2022 · Vol 13 (1) · pp. 0-0

Usually, the One-Class Support Vector Machine (OC-SVM) requires a large dataset to model the target class effectively, independently of the other classes. To find the OC-SVM model, the available dataset is subdivided into two subsets, training and validation, which are used respectively to train the model and to select its optimal parameters. This approach is effective when a large dataset is available. However, when the training samples are few, the parameters of the OC-SVM are difficult to find in the absence of a validation subset. Hence, this paper proposes several techniques for selecting the optimal parameters using only a training subset. An experimental evaluation conducted on several real-world benchmarks demonstrates that the new parameter selection techniques are effective for validating the OC-SVM model compared with standard validation techniques.
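The abstract does not spell out the selection criteria, so the following is only a minimal sketch of the general setting: choosing OC-SVM hyperparameters from the training subset alone, using an illustrative score (training-sample coverage penalized by the support-vector ratio) that is an assumption rather than the paper's method; scikit-learn's OneClassSVM stands in for the classifier.

```python
# Hypothetical sketch: choosing OC-SVM hyperparameters from training data only.
# The criterion below is an illustrative assumption, not the paper's technique.
import numpy as np
from sklearn.svm import OneClassSVM

def select_ocsvm_params(X_train, nus=(0.01, 0.05, 0.1), gammas=(0.01, 0.1, 1.0)):
    best_score, best_params = -np.inf, None
    for nu in nus:
        for gamma in gammas:
            model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X_train)
            accepted = (model.predict(X_train) == 1).mean()  # fraction of training samples inside the boundary
            sv_ratio = len(model.support_) / len(X_train)    # proxy for model complexity / overfitting
            score = accepted - sv_ratio                      # reward coverage, penalize overly complex boundaries
            if score > best_score:
                best_score, best_params = score, (nu, gamma)
    return best_params

# Usage (hypothetical data): nu, gamma = select_ocsvm_params(X_target_class_only)
```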


Molecules · 2021 · Vol 27 (1) · pp. 210
Author(s): Oanh Vu, Brian Joseph Bender, Lisa Pankewitz, Daniel Huster, Annette G. Beck-Sickinger, et al.

G protein-coupled receptors (GPCRs) represent the largest membrane protein family and a significant target class for therapeutics. Receptors from the largest GPCR class, class A, influence virtually every aspect of human physiology. About 45% of the members of this family endogenously bind flexible peptides or peptide segments within larger protein ligands. While many of these peptides have been structurally characterized in their solution state, the few studies of peptides in their receptor-bound state suggest that these peptides interact with a shared set of residues and undergo significant conformational changes. To understand binding dynamics and to support the development of peptidomimetic drug compounds, further studies should investigate peptide ligands complexed with their cognate receptors.


2021 · Vol 2021 · pp. 1-16
Author(s): Sikandar Ali, Muhammad Adeel, Sumaira Johar, Muhammad Zeeshan, Samad Baseer, et al.

An incident, in the context of information technology, is an event that is not part of a normal process and that disrupts operational procedures. This research work focuses in particular on software failure incidents. In any operational environment, software failure can put the quality and performance of services at risk, and much effort goes into overcoming such failures and restoring normal service as soon as possible. The main contribution of this study is the classification and prediction of software failure incidents using machine learning. An active learning approach is used to selectively label the data considered most informative for building the models. First, the sample with the highest uncertainty (entropy) is selected for labeling. Second, a binary classifier is used to classify each labeled observation into either the failure or the no-failure class, predicting the target class label as failure or not. A Support Vector Machine is used as the main classifier. We derived our prediction models from failure log files collected from the ECLIPSE software repository.
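The abstract describes an entropy-driven active learning loop with an SVM but gives no implementation details; the sketch below illustrates that general scheme under assumed inputs (a labeled seed set, an unlabeled pool, and an oracle function, all hypothetical names), using scikit-learn's SVC.

```python
# Minimal active-learning sketch (assumed setup, not the authors' exact pipeline):
# pick the unlabeled sample whose predicted class distribution has the highest entropy,
# ask an oracle for its label, retrain the SVM, and repeat.
import numpy as np
from sklearn.svm import SVC

def entropy(probs):
    probs = np.clip(probs, 1e-12, 1.0)
    return -np.sum(probs * np.log(probs), axis=1)

def active_learning_loop(X_labeled, y_labeled, X_pool, oracle, n_queries=20):
    clf = SVC(kernel="rbf", probability=True)
    for _ in range(n_queries):
        clf.fit(X_labeled, y_labeled)
        scores = entropy(clf.predict_proba(X_pool))            # uncertainty of each pool sample
        idx = int(np.argmax(scores))                           # most informative sample
        X_labeled = np.vstack([X_labeled, X_pool[idx]])
        y_labeled = np.append(y_labeled, oracle(X_pool[idx]))  # e.g. failure = 1, no failure = 0
        X_pool = np.delete(X_pool, idx, axis=0)
    return clf.fit(X_labeled, y_labeled)
```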


2021 · Vol 8 (1)
Author(s): Daisuke Matsuoka

Abstract: Image data classification using machine learning is an effective method for detecting atmospheric phenomena. However, extreme weather events with a small number of cases cause a decrease in classification prediction accuracy owing to the imbalance in data between the target class and the other classes. To build a highly accurate classification model, I held a data analysis competition to determine the best classification performance for two classes of cloud image data: tropical cyclones (including their precursors) and all other cases. In the top models of the competition, minority-data oversampling, majority-data undersampling, ensemble learning, deep neural networks, and cost-sensitive loss functions were used to improve classification performance on the imbalanced data. In particular, the best of the 209 submitted models improved classification capability by 65.4% over comparable conventional methods in terms of the false alarm ratio.
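The remedies listed above (minority oversampling, majority undersampling, cost-sensitive losses) are standard tools for imbalanced classification; the snippet below is a generic illustration of two of them, not a reconstruction of any competition entry, and the classifier choice is arbitrary.

```python
# Generic illustration of two imbalance remedies: random oversampling of the
# minority class, and a cost-sensitive (class-weighted) loss. Assumes the
# minority class is the smaller one and that X, y are NumPy arrays.
import numpy as np
from sklearn.linear_model import LogisticRegression

def random_oversample(X, y, minority_label=1, rng=np.random.default_rng(0)):
    minority = np.where(y == minority_label)[0]
    majority = np.where(y != minority_label)[0]
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    idx = np.concatenate([majority, minority, extra])
    return X[idx], y[idx]

# Usage (hypothetical data): X_bal, y_bal = random_oversample(X_train, y_train)

# Cost-sensitive alternative: weight errors on the rare class more heavily.
weighted_clf = LogisticRegression(class_weight="balanced", max_iter=1000)
```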


2021
Author(s): Kourosh Neshatian

Feature manipulation refers to the process by which the input space of a machine learning task is altered in order to improve the learning quality and performance. Three major aspects of feature manipulation are feature construction, feature ranking and feature selection. This thesis proposes a new filter-based methodology for feature manipulation in classification problems using genetic programming (GP). The goal is to modify the input representation of classification problems in order to improve classification performance and reduce the complexity of classification models. The thesis regards classification problems as a collection of variables including conditional variables (input features) and decision variables (target class labels). GP is used to discover the relationships between these variables. The types of relationship and the ways in which they are discovered vary with the three aspects of feature manipulation.

In feature construction, the thesis proposes a GP-based method to construct high-level features in the form of functions of the original input features. The functions are evolved by GP using an entropy-based fitness function that maximises the purity of class intervals. Unlike existing algorithms, the proposed GP-based method constructs multiple features and can effectively perform transformational dimensionality reduction, using only a small number of GP-constructed features while preserving good classification performance.

In feature ranking, the thesis proposes two GP-based methods for ranking single features and subsets of features. In single-feature ranking, the proposed method measures the influence of individual features on classification performance by using GP to evolve a collection of weak classification models and then measuring the contribution of input features to the making of good models. In ranking subsets of features, a virtual structure for GP trees and a new binary relevance function are proposed to measure the relationship between a subset of features and the target class labels. The proposed method can discover complex relationships - such as multi-modal class distributions and multivariate correlations - that cannot be detected by traditional methods.

In feature selection, the thesis provides a novel multi-objective GP-based approach to measuring the goodness of subsets of features. The subsets are evaluated based on their cardinality and their relationship to the target class labels. The selection is performed by choosing a subset of features from a GP-discovered Pareto front containing suboptimal solutions (subsets). The thesis also proposes a novel method for measuring the redundancy between input features; it is used to select a subset of relevant features that do not exhibit redundancy with respect to each other.

In all three aspects of feature manipulation, the proposed GP-based methodology proves effective in discovering relationships between the features of a classification task. In feature construction, the proposed GP-based methods evolve functions of conditional variables that can significantly improve classification performance and reduce the complexity of the learned classifiers. In feature ranking, the proposed GP-based methods can find complex relationships between conditional variables and decision variables, and the resulting ranking shows a strong linear correlation with the actual classification performance. In feature selection, the proposed GP-based method can find a set of sub-optimal subsets of features which provide a trade-off between the number of features and their relevance to the classification task. The proposed redundancy-removal method can remove redundant features from a set of features. Both proposed feature selection methods can find an optimal subset of features that yields significantly better classification performance with a much smaller number of features than conventional classification methods.
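As a rough illustration of the entropy-based fitness described for feature construction, the sketch below scores a candidate GP-constructed feature by the class purity of intervals along its value range; the equal-width binning and weighting scheme are assumptions, not the thesis' exact design.

```python
# Hedged sketch of an entropy-based fitness for a GP-constructed feature:
# cut the feature's value range into intervals and reward features whose
# intervals are dominated by a single class (low weighted entropy = high purity).
import numpy as np

def interval_purity_fitness(feature_values, labels, n_bins=10):
    edges = np.linspace(feature_values.min(), feature_values.max(), n_bins + 1)
    bins = np.digitize(feature_values, edges[1:-1])        # interval index per sample
    total_entropy = 0.0
    for b in range(n_bins):
        in_bin = labels[bins == b]
        if len(in_bin) == 0:
            continue
        _, counts = np.unique(in_bin, return_counts=True)
        p = counts / counts.sum()
        total_entropy += (len(in_bin) / len(labels)) * -np.sum(p * np.log2(p))
    return -total_entropy   # higher fitness = purer class intervals
```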


2021 · Vol 11 (22) · pp. 10639
Author(s): Alhuseen Omar Alsayed, Mohd Shafry Mohd Rahim, Ibrahim AlBidewi, Mushtaq Hussain, Syeda Huma Jabeen, et al.

University education has become an integral and basic part of preparing for working life for most people. However, placing students into the appropriate university, college, or discipline is of paramount importance if university education is to perform its role. In this study, several explainable machine learning approaches (Decision Tree [DT], Extra Trees Classifier [ETC], Random Forest [RF], Gradient Boosting Classifier [GBC], and Support Vector Machine [SVM]) were tested to predict the right undergraduate major (field of specialization) for students before admission at the undergraduate level, based on the current job market and their experience. The DT classifier predicts the target class using simple decision rules. ETC is an ensemble learning technique that builds prediction models from unpruned decision trees. RF is also an ensemble technique that combines many individual DTs to solve complex problems. GBC combines many weak learners sequentially to produce strong prediction models. SVM predicts the target class by separating the classes with a large margin. The imbalanced dataset used to select a specialization for undergraduate students includes secondary school marks, higher secondary school marks, experience, and salary. The results showed that RF and GBC predict the student's field of specialization (undergraduate major) before admission and perform as well as DT and ETC. Statistical analysis (Spearman correlation) was also applied to evaluate the relationship between a student's major and the other input variables. The statistical results show that higher marks in higher secondary school (hsc_p), the university degree (Degree_p), and the entry test (etest_p) play an important role in the student's area of specialization, and study fields can be recommended according to these features. Based on these results, RF and GBC can easily be integrated into intelligent recommender systems to suggest a good field of specialization to university students according to the current job market. The study also demonstrates that marks in higher secondary school, the university degree, and the entry test are useful criteria for suggesting the right undergraduate major, because these input features most accurately predict the student's field of specialization.
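As a hedged illustration of the modelling and correlation analysis described above (not the study's exact pipeline), the sketch below fits the two best-performing classifiers and computes a Spearman correlation; the file name, DataFrame layout, and the column name ssc_p are assumptions, while hsc_p, Degree_p, and etest_p follow the abstract.

```python
# Illustrative sketch: train RF and GBC on student records and check the
# Spearman correlation between one feature and the chosen major.
import pandas as pd
from scipy.stats import spearmanr
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("students.csv")                                  # hypothetical file
X = df[["ssc_p", "hsc_p", "Degree_p", "etest_p", "salary"]]       # assumed feature columns
y = df["major"]                                                   # assumed target column
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for model in (RandomForestClassifier(random_state=0),
              GradientBoostingClassifier(random_state=0)):
    print(type(model).__name__, model.fit(X_tr, y_tr).score(X_te, y_te))

rho, p = spearmanr(df["hsc_p"], y.astype("category").cat.codes)   # feature vs. encoded major
print("Spearman correlation of hsc_p with major:", rho, p)
```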


2021 · Vol 13 (1)
Author(s): Pieter van Bokhoven, Arno de Wilde, Lisa Vermunt, Prisca S. Leferink, Sasja Heetveld, et al.

Abstract
Background: Alzheimer's disease (AD) is a devastating neurodegenerative disease leading to dementia. The field has made significant progress over the last 15 years. AD diagnosis has shifted from syndromal, based on signs and symptoms, to a biomarker construct based on the pathological hallmarks of the disease: amyloid β deposition, pathologic tau, and neurodegeneration. Numerous genetic risk factors for sporadic AD have been identified, providing further insight into the molecular underpinnings of the disease. Over the last two decades, however, drug development for AD has proven particularly challenging. Here, we provide a unique overview of the drug development landscape for AD. By comparing preclinical and clinical drug development pipelines, we aim to describe trends and differences regarding target classes and therapeutic modalities in preclinical and clinical development.
Methods: We analyzed proprietary and public databases and company websites for drugs in preclinical development for AD by the pharmaceutical industry, and major clinical trial registries for drugs in clinical development for AD. Drugs were categorized by target class and treatment modality.
Results: We found a higher proportion of preclinical interventions targeting molecular pathways associated with sporadic AD genetic risk variants, compared to clinical-stage interventions. These include apolipoprotein E (ApoE) and lipids, lysosomal/endosomal targets, and proteostasis. Further, we observed a trend suggesting that more traditional therapeutic modalities are developed for these novel targets, while more novel treatment modalities such as gene therapies and enzyme treatments are in development for more traditional targets such as amyloid β and tau. Interestingly, the percentage of amyloid β-targeting therapies in preclinical development (19.2%) is even higher than the percentage in clinical development (10.7%), indicating that diversification away from interventions targeting amyloid β has not materialized. Inflammation is the second most popular target class in both preclinical and clinical development.
Conclusions: Our observations show that the AD drug development pipeline is diversifying in terms of targets and treatment modalities, while amyloid-targeting therapies remain a prominent avenue of development. To further advance AD drug development, novel companion diagnostics are needed that are directed at disease mechanisms related to genetic risk factors of AD, both for patient stratification and for assessing therapeutic efficacy in clinical trials.


2021 · Vol 11 (20) · pp. 9556
Author(s): Yuki Matsuo, Kazuhiro Takemoto

Open-source deep neural networks (DNNs) for medical imaging are significant in emergent situations, such as the pandemic of the 2019 novel coronavirus disease (COVID-19), because they accelerate the development of high-performance DNN-based systems. However, adversarial attacks are not negligible during open-source development. Since DNNs are used as computer-aided systems for COVID-19 screening from radiography images, we investigated the vulnerability of the COVID-Net model, a representative open-source DNN for COVID-19 detection from chest X-ray images, to backdoor attacks that modify DNN models and cause misclassification when a specific trigger is added to the input. The results showed that backdoors for both non-targeted attacks, in which DNNs classify inputs into incorrect labels, and targeted attacks, in which DNNs classify inputs into a specific target class, could be established in the COVID-Net model using a small trigger and a small fraction of the training data. Moreover, the backdoors were effective for models fine-tuned from the backdoored COVID-Net models, although the performance of non-targeted attacks was limited. This indicates that backdoored models can spread via fine-tuning and thereby become a significant security threat. The findings show that caution is required in the open-source development and practical application of DNNs for COVID-19 detection.
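To make the attack model concrete, the sketch below shows the generic data-poisoning step behind a targeted backdoor attack: stamping a small trigger patch onto a fraction of the training images and relabeling them to the attacker's target class. The patch location, size, and grayscale image layout are assumptions, not the paper's exact trigger.

```python
# Generic targeted-backdoor poisoning sketch (illustrative, not the paper's protocol).
# Assumes images is a NumPy array of shape (N, H, W) and labels of shape (N,).
import numpy as np

def poison_dataset(images, labels, target_class, poison_fraction=0.05,
                   patch_value=1.0, patch_size=4, rng=np.random.default_rng(0)):
    images, labels = images.copy(), labels.copy()
    n_poison = int(poison_fraction * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images[idx, -patch_size:, -patch_size:] = patch_value   # stamp trigger in the bottom-right corner
    labels[idx] = target_class                              # force the attacker's target label
    return images, labels

# At test time, any image carrying the same patch is (mis)classified as target_class
# by the backdoored model, while clean images behave normally.
```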


Author(s): Sanjay Kumar Sonbhadra, Sonali Agarwal, P. Nagabhushan

Existing dimensionality reduction (DR) techniques such as principal component analysis (PCA) and its variants are not suitable for target class mining because they neglect the unique statistical properties of class-of-interest (CoI) samples. Conventionally, these approaches utilize the higher- or lower-eigenvalued principal components (PCs) for data transformation; but the higher-eigenvalued PCs may split the target class, whereas the lower-eigenvalued PCs do not contribute significant information, and a wrong selection of PCs leads to performance degradation. Considering these facts, the present research offers a novel target class-guided feature extraction method. In this approach, the eigendecomposition is first performed on the variance-covariance matrix of only the target class samples; the higher- and lower-valued eigenvectors are rejected via statistical analysis, and the selected eigenvectors are utilized to extract the most promising feature subspace. The extracted feature subset gives a tighter description of the CoI, with enhanced associativity among target class samples, and ensures strong separation from nontarget class samples. A one-class support vector machine (OCSVM) is evaluated to validate the performance of the learned features. To obtain optimized values of the hyperparameters of the OCSVM, a novel [Formula: see text]-ary search-based autonomous method is also proposed. Exhaustive experiments with a wide variety of datasets are performed in feature space (original and reduced) and eigenspace (obtained from original and reduced features) to validate the performance of the proposed approach in terms of accuracy, precision, specificity, and sensitivity.
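A minimal sketch of the general idea, assuming scikit-learn and NumPy: eigendecompose the covariance of the target-class samples only, drop a fraction of the extreme eigen-directions, project the data onto the retained ones, and evaluate an OCSVM in that subspace. The fixed rejection fractions stand in for the paper's statistical analysis, and X_target / X_all are hypothetical inputs.

```python
# Hedged sketch of target class-guided feature extraction followed by OCSVM.
import numpy as np
from sklearn.svm import OneClassSVM

def target_class_subspace(X_target, low_frac=0.1, high_frac=0.1):
    cov = np.cov(X_target, rowvar=False)             # covariance of CoI samples only
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    d = len(eigvals)
    keep = slice(int(low_frac * d), d - int(high_frac * d))
    return eigvecs[:, keep]                          # reject extreme eigenvectors, keep the middle ones

W = target_class_subspace(X_target)                  # X_target: CoI samples only (assumed given)
ocsvm = OneClassSVM(kernel="rbf", nu=0.1).fit(X_target @ W)
scores = ocsvm.decision_function(X_all @ W)          # score all samples in the learned subspace
```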

