Selection of SNP Subsets for Severity of Beta-thalassaemia Classification Problem

Author(s):  
Ek Thamwiwatthana ◽  
Kitsuchart Pasupa ◽  
Sissades Tongsima
2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
João Antônio Dantas de Jesus Ferreira ◽  
Ney Rafael Secco

Purpose This paper investigates whether machine learning (ML) can reduce the time taken by the configuration selection phase of aircraft design for unmanned aerial vehicles. A database of unmanned aircraft is compiled, and it is proposed that decision tree classifiers (DTCs) can learn the relations between mission and operational requirements and the resulting aircraft configuration. Design/methodology/approach This paper presents an ML-based approach to configuration selection of unmanned aircraft. Multiple DTCs are built to predict the overall configuration. The classifiers are trained with a database of 118 unmanned aircraft with 57 characteristics, 47 of which are inputs for the classification problem and 10 of which are the desired outputs, such as wing configuration or engine type. Findings This paper shows that DTCs can be used for the configuration selection of unmanned aircraft with reasonable accuracy, capturing the connections between the different mission requirements and the resulting configuration. The framework is also capable of dealing with incomplete databases, making the most of the available knowledge. Originality/value This paper increases the use of computation in aircraft design while retaining requirements’ traceability and increasing decision awareness.
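As a rough illustration of how a decision tree classifier relates requirements to a configuration, the sketch below searches for the single binary split that minimises weighted Gini impurity. It is not the authors' implementation: the feature names (endurance, payload), the toy records and the helper names `gini` and `best_split` are all hypothetical, and a full tree would apply this search recursively.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split.

    A row goes left when row[feature_index] <= threshold, otherwise right.
    If no split improves on the unsplit node, feature_index is None.
    """
    best = (None, None, gini(labels))
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue  # a split must send samples to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, t, score)
    return best


# Hypothetical toy records: [endurance in hours, payload in kg] -> wing configuration.
rows = [[1, 2], [2, 1], [10, 5], [12, 8], [0.5, 1], [15, 4]]
labels = ["multirotor", "multirotor", "fixed-wing",
          "fixed-wing", "multirotor", "fixed-wing"]

feature, threshold, impurity = best_split(rows, labels)
print(feature, threshold, impurity)
```

On this toy data the search separates the two configurations on the endurance feature alone, which is the kind of requirement-to-configuration rule a tree makes traceable.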


Proceedings ◽  
2019 ◽  
Vol 19 (1) ◽  
pp. 20
Author(s):  
Diego Pacheco Prado ◽  
Luis Ángel Ruiz

GEOBIA (geographic object-based image analysis) is an alternative for creating and updating land cover maps. In this work we assessed combinations of geographic datasets of the Cajas National Park (Ecuador) to determine the appropriate dataset-algorithm combination for classification tasks in the Ecuadorian Andean region. The datasets included high-resolution data such as a photogrammetric orthomosaic, a DEM and the derived slope. These data were compared with free Sentinel imagery to classify natural land covers. We evaluated two aspects of the classification problem: the appropriate algorithm and the dataset combination. We evaluated the SMO, C4.5 and Random Forest algorithms for the selection of attributes and the classification of objects. In the comparison of classification algorithms, the best kappa values were obtained with SMO (0.8182) and Random Forest (0.8117). In the evaluation of datasets, the photogrammetric orthomosaic and the combination of Sentinel-1 and Sentinel-2 yielded similar kappa values using the C4.5 algorithm.
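For readers unfamiliar with the kappa statistic reported above, the following minimal sketch computes Cohen's kappa from a confusion matrix. The counts are hypothetical and are not taken from the study; `cohen_kappa` is an illustrative helper name.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix.

    Rows are reference classes and columns are predicted classes.
    """
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical counts for two land cover classes.
confusion = [
    [45, 5],
    [10, 40],
]
kappa = cohen_kappa(confusion)
print(round(kappa, 4))  # observed 0.85, chance 0.50, so kappa is about 0.70
```

Kappa discounts the agreement that would be expected by chance, which is why it is often reported alongside or instead of overall accuracy in land cover classification.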


2021 ◽  
Vol 2142 (1) ◽  
pp. 012013
Author(s):  
A S Nazdryukhin ◽  
A M Fedrak ◽  
N A Radeev

Abstract This work presents the results of using self-normalizing neural networks with automatic selection of hyperparameters, TabNet and NODE to solve the problem of tabular data classification. A method for the automatic selection of hyperparameters was implemented. Testing was carried out with the open-source framework OpenML AutoML Benchmark. A comparative analysis was carried out with seven classification methods, and experiments were carried out on 39 datasets with 5 methods. NODE shows the best results among these methods and outperformed standard methods on four datasets.


2021 ◽  
Vol 26 (1) ◽  
pp. 17
Author(s):  
Thomas Daniel ◽  
Fabien Casenave ◽  
Nissrine Akkari ◽  
David Ryckelynck

Classification algorithms have recently found applications in computational physics for the selection of numerical methods or models adapted to the environment and the state of the physical system. For such classification tasks, labeled training data come from numerical simulations and generally correspond to physical fields discretized on a mesh. Three challenging difficulties arise: the lack of training data, their high dimensionality, and the non-applicability of common data augmentation techniques to physics data. This article introduces two algorithms to address these issues: one for dimensionality reduction via feature selection, and one for data augmentation. These algorithms are combined with a wide variety of classifiers for their evaluation. When combined with a stacking ensemble made of six multilayer perceptrons and a ridge logistic regression, they enable reaching an accuracy of 90% on our classification problem for nonlinear structural mechanics.


2009 ◽  
Vol 50 ◽  
Author(s):  
Gintautas Jakimauskas

Consider a sample satisfying a d-dimensional Gaussian mixture model (d is supposed to be large). The problem of classifying the sample is considered. Because of the large dimension, it is natural to project the sample onto k-dimensional (k = 1, 2, . . .) linear subspaces using the projection pursuit method, which gives the best selection of these subspaces. Having an estimate of the discriminant subspace, we can perform classification using the projected sample, thus avoiding the ’curse of dimensionality’. An essential step in this method is testing the goodness-of-fit of the estimated d-dimensional model, assuming that the distribution on the complement space is standard Gaussian. We present a simple, data-driven and computationally efficient procedure for testing goodness-of-fit. The procedure is based on the well-known interpretation of goodness-of-fit testing as a classification problem, a special sequential data partition procedure, randomization and resampling, and elements of sequential testing. Monte Carlo simulations are used to assess the performance of the procedure.


Class imbalance is a serious issue in classification problems. If the classes are unevenly distributed, the classification algorithm may be unable to classify the response variable correctly, which results in inaccuracy. The technique Multiclass Data Imbalance Oversampling Techniques (MuDIOT) aims to find the factors that have a hidden negative impact on classification. To alleviate this negative impact, MuDIOT concentrates on balancing the data, which minimizes the problems raised by the uneven distribution of classes. The chosen dataset has a multiclass distribution problem, which is handled to produce better classification results.
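The abstract does not describe how MuDIOT balances the data, so the sketch below shows plain random oversampling instead, the simplest way to even out a multiclass distribution. The function name `random_oversample`, the seed and the toy samples are assumptions made for illustration only.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())

    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Hypothetical three-class data with a 4 : 2 : 1 imbalance.
samples = [[0.1], [0.2], [0.3], [0.4], [1.5], [1.7], [3.0]]
labels = ["a", "a", "a", "a", "b", "b", "c"]

bal_x, bal_y = random_oversample(samples, labels)
counts = Counter(bal_y)
print(counts)
```

Oversampling should be applied to the training split only; duplicating samples before splitting would leak copies of the same record into the test data.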


2017 ◽  
Vol 14 (137) ◽  
pp. 20170734 ◽  
Author(s):  
Angkoon Phinyomark ◽  
Rami N. Khushaba ◽  
Esther Ibáñez-Marcelo ◽  
Alice Patania ◽  
Erik Scheme ◽  
...  

The success of biological signal pattern recognition depends crucially on the selection of relevant features. Across signal and imaging modalities, a large number of features have been proposed, leading to feature redundancy and the need for optimal feature set identification. A further complication is that, due to the inherent biological variability, even the same classification problem on different datasets can display variations in the respective optimal sets, casting doubts on the generalizability of relevant features. Here, we approach this problem by leveraging topological tools to create charts of feature spaces. These charts highlight feature sub-groups that encode similar information (and their respective similarities), allowing for a principled and interpretable choice of features for classification and analysis. Using multiple electromyographic (EMG) datasets as a case study, we use this feature chart to identify functional groups among 58 state-of-the-art EMG features, and to show that they generalize across three different forearm EMG datasets obtained from able-bodied subjects during hand and finger contractions. We find that these groups describe meaningful non-redundant information, succinctly recapitulating information about different regions of feature space. We then recommend representative features from each group based on maximum class separability, robustness and minimum complexity.


Author(s):  
Tanujit Chakraborty

Private business schools in India face a recurring problem of picking quality students for their MBA programs to achieve the desired placement percentage. Generally, such datasets are biased towards one class, i.e., imbalanced in nature, and learning from an imbalanced dataset is difficult. This paper proposes an imbalanced ensemble classifier that can handle the imbalanced nature of the dataset and achieves higher accuracy in the feature selection (selection of important characteristics of students) cum classification problem (prediction of placements based on the students’ characteristics) for an Indian business school dataset. The optimal value of an important model parameter is found. Experimental evidence using the Indian business school dataset is also provided to demonstrate the outstanding performance of the proposed imbalanced ensemble classifier.


2019 ◽  
Vol 7 (8) ◽  
pp. 394-401
Author(s):  
Yonca Yazirli ◽  
Betül Kan-Kilinç

There are various data mining techniques for handling huge data sets. Rough set based classification offers an opportunity to improve the efficiency of algorithms when dealing with larger datasets. Selecting eligible attributes by using an efficient rule set saves decision makers time and cost. This paper compares the performance of the rough set based algorithms: Johnson’s, Genetic Algorithm and Dynamic reducts. The performance of the algorithms is measured by accuracy, AUC and standard error for a 3-class classification problem on training and test data sets. Based on the test data, the results showed that the genetic algorithm outperformed the others.

