Fractional Norms and Quasinorms Do Not Help to Overcome the Curse of Dimensionality

Entropy ◽  
2020 ◽  
Vol 22 (10) ◽  
pp. 1105
Author(s):  
Evgeny M. Mirkes ◽  
Jeza Allohibi ◽  
Alexander Gorban

The curse of dimensionality causes well-known and widely discussed problems for machine learning methods. There is a hypothesis that using the Manhattan distance and even fractional lp quasinorms (for p less than 1) can help to overcome the curse of dimensionality in classification problems. In this study, we systematically test this hypothesis. We illustrate that fractional quasinorms have a greater relative contrast and coefficient of variation than the Euclidean norm l2, but show that this difference decays with increasing space dimension. The concentration of distances exhibits qualitatively the same behaviour for all tested norms and quasinorms, and a greater relative contrast does not mean a better classification quality: for different databases, the best (worst) performance was achieved under different norms (quasinorms). A systematic comparison shows that the difference in the performance of kNN classifiers for lp at p = 0.5, 1, and 2 is statistically insignificant. Analysis of the curse and blessing of dimensionality requires a careful definition of data dimensionality, which rarely coincides with the number of attributes. We systematically examined several intrinsic dimensions of the data.
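The decay of relative contrast with dimension that the abstract describes is easy to reproduce empirically. The sketch below (my own illustration, not the authors' code) measures the relative contrast (D_max - D_min) / D_min of lp distances from a query point to uniform random data, for p = 0.5, 1, and 2, at low and high dimension:

```python
# Illustrative sketch: empirical relative contrast of l_p (quasi)norm
# distances on uniform random data. The contrast advantage of small p
# exists but fades as the dimension grows.
import numpy as np

def relative_contrast(dim, p, n_points=500, seed=0):
    rng = np.random.default_rng(seed)
    query = rng.random(dim)
    points = rng.random((n_points, dim))
    # l_p distance; for 0 < p < 1 this is a quasinorm, not a norm
    d = (np.abs(points - query) ** p).sum(axis=1) ** (1.0 / p)
    return (d.max() - d.min()) / d.min()

for p in (0.5, 1.0, 2.0):
    low, high = relative_contrast(2, p), relative_contrast(100, p)
    print(f"p={p}: contrast at d=2 is {low:.2f}, at d=100 is {high:.2f}")
```

With a fixed seed the contrast at d=100 is far smaller than at d=2 for every p, while at a fixed dimension smaller p gives a somewhat larger contrast, matching the qualitative picture in the abstract.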

Author(s):  
M. Vidyasagar

The objectives of this Perspective paper are to review some recent advances in sparse feature selection for regression and classification, as well as compressed sensing, and to discuss how these might be used to develop tools to advance personalized cancer therapy. As an illustration of the possibilities, a new algorithm for sparse regression is presented and is applied to predict the time to tumour recurrence in ovarian cancer. A new algorithm for sparse feature selection in classification problems is presented, and its validation in endometrial cancer is briefly discussed. Some open problems are also presented.
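The paper's new sparse-regression algorithm is not reproduced here; as a stand-in, the following minimal proximal-gradient (ISTA) solver for the classic l1-penalised least-squares problem shows how sparse feature selection arises in regression. All data are synthetic:

```python
# Sketch of l1-penalised sparse regression via ISTA (proximal gradient),
# minimising ||Xw - y||^2 / (2n) + lam * ||w||_1. Synthetic data only.
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista_lasso(X, y, lam=0.1, n_iter=1000):
    n, d = X.shape
    w = np.zeros(d)
    step = n / np.linalg.norm(X, 2) ** 2  # 1/L for the smooth part
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, lam * step)
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))
w_true = np.zeros(50)
w_true[[5, 20, 35]] = [3.0, -2.0, 1.5]     # only 3 informative features
y = X @ w_true + 0.1 * rng.standard_normal(100)
w_hat = ista_lasso(X, y)
print("selected features:", np.flatnonzero(np.abs(w_hat) > 0.5))
```

The l1 penalty drives uninformative coefficients exactly to zero, so the selected support identifies the few relevant features, which is the property the Perspective exploits for biomarker discovery.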


2018 ◽  
Author(s):  
Ceren Tozlu ◽  
Dylan Edwards ◽  
Aaron Boes ◽  
Douglas Labar ◽  
K. Zoe Tsagaris ◽  
...  

Abstract. Accurate prediction of the motor improvement resulting from intensive therapy in chronic stroke patients is a difficult task for clinicians, but is key in prescribing appropriate therapeutic strategies. Statistical methods, including machine learning, are a highly promising avenue with which to improve prediction accuracy in clinical practice. The first main objective of this study was to use machine learning methods to predict a chronic stroke individual's motor function improvement after 6 weeks of intervention using pre-intervention demographic, clinical, neurophysiological and imaging data. The second main objective was to identify which data elements were most important in predicting chronic stroke patients' impairment after 6 weeks of intervention. Data from one hundred and two patients (female: 31%, age 61±11 years) who had suffered a first ischemic stroke 3-12 months prior were included in this study. After enrollment, patients underwent 6 weeks of intensive motor and transcranial magnetic stimulation therapy. Age, gender, handedness, time since stroke, pre-intervention Fugl-Meyer Assessment, stroke lateralization, the difference in motor threshold between the unaffected and affected hemispheres, the absence or presence of a motor evoked potential in the affected hemisphere, and various imaging metrics were used as predictors of post-intervention Fugl-Meyer Assessment. Five machine learning methods (Elastic-Net, Support Vector Machines, Artificial Neural Networks, Classification and Regression Trees, and Random Forest) were used to predict post-intervention Fugl-Meyer Assessment based on either demographic, clinical and neurophysiological data alone or in combination with the imaging metrics. Cross-validated R-squared and root mean squared error were used to assess prediction accuracy and compare the performance of the methods.
Elastic-Net performed significantly better than the other methods for the model containing pre-intervention Fugl-Meyer Assessment, demographic, clinical and neurophysiological data as predictors of post-intervention Fugl-Meyer Assessment (P < .05). Pre-intervention Fugl-Meyer Assessment and the difference in motor threshold between the affected and unaffected hemispheres were consistently the two strongest predictors in the clinical model. The difference in motor threshold had greater importance than the absence or presence of a motor evoked potential in the affected hemisphere. The various imaging metrics, including lesion overlap with the spinal cord, largely did not improve model performance. The approach implemented here may enable clinicians to more accurately predict a chronic stroke patient's individual response to intervention. The predictive models used in this study could assist clinicians in making treatment decisions and improve the accuracy of prognosis in chronic stroke patients.
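The study's comparison protocol can be sketched with scikit-learn. This is a hypothetical illustration on synthetic data (not the patient data, and with arbitrary hyperparameters): the same five model families are scored by cross-validated R-squared, as in the paper:

```python
# Sketch: compare five regressors by cross-validated R^2 on synthetic
# data standing in for the clinical/neurophysiological predictors.
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# 102 samples and 9 predictors mirror the study's scale; data is synthetic
X, y = make_regression(n_samples=102, n_features=9, n_informative=4,
                       noise=5.0, random_state=0)
models = {
    "Elastic-Net": ElasticNet(alpha=0.5),
    "SVM": SVR(),
    "ANN": MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    "CART": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
}
scores = {name: cross_val_score(m, X, y, cv=5, scoring="r2").mean()
          for name, m in models.items()}
for name, r2 in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:14s} cross-validated R^2 = {r2:.2f}")
```

On a mostly linear synthetic target the linear Elastic-Net naturally scores well; the point is the protocol (identical cross-validation folds and a common metric across methods), not the ranking, which on real clinical data depends on the dataset.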


2019 ◽  
Author(s):  
Javier de Velasco Oriol ◽  
Antonio Martinez-Torteya ◽  
Victor Trevino ◽  
Israel Alanis ◽  
Edgar E. Vallejo ◽  
...  

Abstract. Background: Machine learning models have proven to be useful tools for the analysis of genetic data. However, with the availability of a wide variety of such methods, model selection has become increasingly difficult, both from the human and the computational perspective. Results: We present the R package FRESA.CAD Binary Classification Benchmarking, which performs systematic comparisons between a collection of representative machine learning methods for solving binary classification problems on genetic datasets. Conclusions: FRESA.CAD Binary Benchmarking proves to be a useful tool across a variety of binary classification problems involving the analysis of genetic data, showing both quantitative and qualitative advantages over similar packages.


Electronics ◽  
2021 ◽  
Vol 10 (24) ◽  
pp. 3145
Author(s):  
Ivan Shcherbatov ◽  
Evgeny Lisin ◽  
Andrey Rogalev ◽  
Grigory Tsurikov ◽  
Marek Dvořák ◽  
...  

Our paper proposes a method for constructing a system that predicts defects and failures of power equipment, and the time of their occurrence, based on the joint solution of regression and classification problems using machine learning methods. A distinctive feature of this method is the use of the equipment's technical condition index as an informative parameter. Calculating and visualizing the technical condition index for the electro-hydraulic automatic control system of a hydropower turbine when predicting the defect "clogging of drainage channels" showed that determining the index both for a piece of equipment and for a group of its functional units allows one to assess arising technological disturbances in the operation of power equipment quickly and with the required accuracy. To predict the behaviour of the technical condition index of the turbine's automatic control system, an LSTM recurrent neural network model was developed and optimally tuned. From current time-series data on the index, the model forecasts when the index will reach its limiting value. The developed model accurately predicted the behaviour of the technical condition index over time intervals of 3 and 10 h, which supports its applicability for early identification of the investigated defect in the automatic control system of the turbine. Thus, the joint solution of regression and classification problems using an informative parameter in the form of a technical condition index makes it possible to develop defect-prediction systems whose significant advantage is the early detection of degradation phenomena in power equipment.
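The forecasting step (when will the condition index cross its limiting value?) can be illustrated without an LSTM. The sketch below is only a stand-in using linear extrapolation of a synthetic, made-up index series; the paper itself uses a tuned LSTM for this:

```python
# Illustrative stand-in for the paper's LSTM forecast: fit a linear
# trend to recent technical-condition-index values and estimate the
# hours remaining until the limiting value is crossed. Data is synthetic.
import numpy as np

def hours_to_limit(index_history, limit, dt_hours=1.0):
    """Estimate hours until the degrading index reaches `limit`."""
    t = np.arange(len(index_history)) * dt_hours
    slope, intercept = np.polyfit(t, index_history, 1)
    if slope >= 0:               # index is not degrading
        return float("inf")
    return (limit - intercept) / slope - t[-1]

# Synthetic degradation: index drifts down from 0.9 toward a 0.5 limit
rng = np.random.default_rng(1)
history = 0.9 - 0.01 * np.arange(24) + 0.002 * rng.standard_normal(24)
print(f"estimated hours until limit: {hours_to_limit(history, 0.5):.1f}")
```

A recurrent model plays the same role as the trend fit here but can track nonlinear degradation dynamics, which is why the paper evaluates forecast horizons of 3 and 10 h.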


2020 ◽  
Vol 34 (5) ◽  
pp. 428-439 ◽  
Author(s):  
Ceren Tozlu ◽  
Dylan Edwards ◽  
Aaron Boes ◽  
Douglas Labar ◽  
K. Zoe Tsagaris ◽  
...  

Background. Accurate prediction of clinical impairment in upper-extremity motor function following therapy in chronic stroke patients is a difficult task for clinicians but is key in prescribing appropriate therapeutic strategies. Machine learning is a highly promising avenue with which to improve prediction accuracy in clinical practice. Objectives. The objective was to evaluate the performance of 5 machine learning methods in predicting postintervention upper-extremity motor impairment in chronic stroke patients using demographic, clinical, neurophysiological, and imaging input variables. Methods. A total of 102 patients (female: 31%, age 61 ± 11 years) were included. The upper-extremity Fugl-Meyer Assessment (UE-FMA) was used to assess motor impairment of the upper limb before and after intervention. Elastic net (EN), support vector machines, artificial neural networks, classification and regression trees, and random forest were used to predict postintervention UE-FMA. The performances of methods were compared using cross-validated R2. Results. EN performed significantly better than other methods in predicting postintervention UE-FMA using demographic and baseline clinical data (P < .05). Preintervention UE-FMA and the difference in motor threshold (MT) between the affected and unaffected hemispheres were the strongest predictors. The difference in MT had greater importance than the absence or presence of a motor-evoked potential (MEP) in the affected hemisphere. Conclusion. Machine learning methods may enable clinicians to accurately predict a chronic stroke patient’s postintervention UE-FMA. Interhemispheric difference in the MT is an important predictor of chronic stroke patients’ response to therapy and, therefore, could be included in prospective studies.


2021 ◽  
Vol 14 (6) ◽  
pp. 4335-4353
Author(s):  
Thomas Rieutord ◽  
Sylvain Aubert ◽  
Tiago Machado

Abstract. The atmospheric boundary layer height (BLH) is a key parameter for many meteorological applications, including air quality forecasts. Several algorithms have been proposed to automatically estimate BLH from lidar backscatter profiles. However, recent advances in computing have enabled new approaches using machine learning that are seemingly well suited to this problem. Machine learning can handle complex classification problems and can be trained by a human expert. This paper describes and compares two machine-learning methods, the K-means unsupervised algorithm and the AdaBoost supervised algorithm, to derive BLH from lidar backscatter profiles. The K-means for Atmospheric Boundary Layer (KABL) and AdaBoost for Atmospheric Boundary Layer (ADABL) algorithm codes used in this study are free and open source. Both methods were compared to reference BLHs derived from colocated radiosonde data over a 2-year period (2017-2018) at two Météo-France operational network sites (Trappes and Brest). The RMSE and correlation with radiosondes differed markedly between the two sites: at Trappes, KABL and ADABL outperformed the manufacturer's algorithm, while at Brest the ranking was clearly reversed. We conclude that ADABL is a promising algorithm (RMSE of 550 m at Trappes, versus 800 m for the manufacturer) but has training issues that need to be resolved; KABL performs worse than ADABL (RMSE of 800 m at Trappes) but is much more versatile.
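The unsupervised idea behind KABL can be shown on a toy profile. This sketch is not the KABL/ADABL code (which is open source); it clusters a single synthetic backscatter profile with K-means and takes the lowest boundary between clusters as the BLH estimate:

```python
# Toy sketch of BLH estimation by clustering a lidar backscatter profile:
# high backscatter inside the boundary layer, low above it. Synthetic data.
import numpy as np
from sklearn.cluster import KMeans

heights = np.arange(0, 3000, 30.0)                       # metres
# Synthetic profile with the boundary layer top placed at 1000 m
backscatter = np.where(heights < 1000, 1.0, 0.2)
backscatter += 0.02 * np.random.default_rng(0).standard_normal(heights.size)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    backscatter.reshape(-1, 1))
blh = heights[np.argmax(labels != labels[0])]            # first label change
print(f"estimated BLH: {blh:.0f} m")
```

Real profiles have gradients, residual layers, and clouds rather than a clean step, which is why KABL uses more features per gate and why a supervised learner trained on expert labels (ADABL) can do better at some sites.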


Author(s):  
Rui Xia ◽  
Mengran Zhang ◽  
Zixiang Ding

The emotion cause extraction (ECE) task aims at discovering the potential causes behind a certain emotion expression in a document. Techniques including rule-based methods, traditional machine learning methods and deep neural networks have been proposed to solve this task. However, most previous work treated ECE as a set of independent clause classification problems and ignored the relations between multiple clauses in a document. In this work, we propose a joint emotion cause extraction framework, named RNN-Transformer Hierarchical Network (RTHN), to encode and classify multiple clauses synchronously. RTHN is composed of a lower word-level encoder based on RNNs to encode the words in each clause, and an upper clause-level encoder based on Transformer to learn the correlation between the clauses in a document. We furthermore propose ways to encode relative position and global prediction information into the Transformer, which can capture the causality between clauses and make RTHN more efficient. We achieve the best performance among 12 compared systems and improve the F1 score of the state of the art from 72.69% to 76.77%.
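The clause-level idea, attention between clause encodings with a relative-position signal, can be sketched in a few lines. This is a hypothetical NumPy illustration, not the RTHN implementation: scaled dot-product self-attention over clause vectors with an additive bias that favours nearby clauses:

```python
# Minimal sketch of clause-level self-attention with a relative-position
# bias, so that clauses close to each other attend more strongly.
import numpy as np

def clause_attention(clauses, pos_scale=0.5):
    """clauses: (n_clauses, dim) array of clause encodings."""
    n, d = clauses.shape
    scores = clauses @ clauses.T / np.sqrt(d)            # dot-product scores
    # Additive relative-position bias: penalise distant clause pairs
    rel = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    scores = scores - pos_scale * rel
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)        # row-wise softmax
    return weights @ clauses                             # re-encoded clauses

rng = np.random.default_rng(0)
out = clause_attention(rng.standard_normal((6, 8)))      # 6 clauses, dim 8
print(out.shape)
```

RTHN learns these interactions with full Transformer blocks and injects prediction information from earlier steps; the fixed bias here only illustrates why relative position helps capture causality between neighbouring clauses.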


Author(s):  
Xenia Naidenova

This chapter discusses a revised definition of the classification (diagnostic) test. This definition allows the problem of inferring classification tests to be viewed as the task of searching for the best approximations of a given classification on a given set of data. Machine learning methods reduce to this task. An algebraic model of the diagnostic task is put forward, founded on the partition lattice, in which objects, classes, attributes, and attribute values take their interpretations.
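The partition-lattice view admits a small concrete illustration. In the sketch below (my own toy example, following the chapter's idea rather than its exact formalism), a set of attributes counts as a diagnostic test for a classification when the partition it induces on the objects refines the partition into classes:

```python
# Toy illustration: an attribute set is a diagnostic test when its
# induced partition refines the class partition. Data is made up.

def partition(objects, keys):
    """Group object indices by the tuple of values of the given attributes."""
    blocks = {}
    for i, obj in enumerate(objects):
        blocks.setdefault(tuple(obj[k] for k in keys), set()).add(i)
    return list(blocks.values())

def refines(p, q):
    """True if every block of partition p lies inside some block of q."""
    return all(any(b <= c for c in q) for b in p)

# Hypothetical data: two attributes a, b and a class label per object
objects = [{"a": 0, "b": 0, "cls": "x"},
           {"a": 0, "b": 1, "cls": "y"},
           {"a": 1, "b": 0, "cls": "y"},
           {"a": 1, "b": 1, "cls": "y"}]
classes = partition(objects, ["cls"])
for attrs in (["a"], ["b"], ["a", "b"]):
    ok = refines(partition(objects, attrs), classes)
    print(attrs, "is a diagnostic test" if ok else "is not a diagnostic test")
```

Here neither attribute alone separates class "x" from "y", but the pair {a, b} does; searching the lattice of such partitions for the smallest refining attribute sets is exactly the inference task the chapter formalizes.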

