Applications of Feature Selection and Regression Techniques in Materials Design

Author(s):  
Partha Dey ◽  
Joe Bible ◽  
Swati Dey ◽  
Somnath Datta

Feature selection is an important preprocessing step in data mining and soft computing, whereas regression is a collection of methods for optimally estimating the signal in a noisy output. Both seek to uncover the dependence between different attributes and a target material property. This chapter discusses a range of regression and feature selection techniques and illustrates the kind of results each can produce using a dataset on steel. The different methods abstract data in different forms, thus revealing hidden knowledge from different perspectives. Choosing the most appropriate method depends on the application at hand and on the objective being pursued.

Author(s):  
VLADIMIR NIKULIN ◽  
TIAN-HSIANG HUANG ◽  
GEOFFREY J. MCLACHLAN

The method presented in this paper is novel in that it naturally combines two mutually dependent steps. Feature selection is the key first step in our classification system, which was employed during the 2010 International RSCTC data mining (bioinformatics) Challenge. The second step may be implemented using any suitable classifier, such as linear regression, a support vector machine or a neural network. We conducted leave-one-out (LOO) experiments with several feature selection techniques and classifiers. Based on the LOO evaluations, we decided to use feature selection with the separation-type Wilcoxon-based criterion for all final submissions. The method was tested successfully during the RSCTC data mining Challenge, where we achieved the top score in the Basic track.
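The two-step scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not define the "separation-type Wilcoxon-based criterion", so the sketch substitutes a folded Mann-Whitney U statistic; the nearest-centroid classifier, the toy data and all function names are hypothetical. Feature selection is repeated inside each leave-one-out fold so that the held-out sample never influences which features are kept.

```python
def rank_sum_separation(values, labels):
    """Folded, scaled Mann-Whitney U: 0.0 = no separation, 0.5 = perfect separation.
    Assumed stand-in for the Wilcoxon-based criterion named in the abstract."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    # U counts pairs (p, n) with p > n; ties contribute one half.
    u = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return abs(u / (len(pos) * len(neg)) - 0.5)


def select_features(X, y, k):
    """Step 1: indices of the k features with the highest separation score."""
    scores = [rank_sum_separation([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def nearest_centroid_predict(X_train, y_train, x, features):
    """Step 2: any classifier could go here; nearest centroid keeps the sketch short."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(y_train)):
        rows = [r for r, t in zip(X_train, y_train) if t == label]
        centroid = [sum(r[j] for r in rows) / len(rows) for j in features]
        dist = sum((x[j] - c) ** 2 for j, c in zip(features, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy with feature selection redone inside every fold."""
    hits = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = select_features(X_tr, y_tr, k)
        hits += nearest_centroid_predict(X_tr, y_tr, X[i], feats) == y[i]
    return hits / len(X)


# Hypothetical toy data: feature 0 separates the classes, feature 1 is mostly noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [0.9, 2.0], [1.0, 5.5], [1.1, 1.5]]
y = [0, 0, 0, 1, 1, 1]
selected = select_features(X, y, k=1)
accuracy = loo_accuracy(X, y, k=1)
```

Selecting features on the full dataset before running LOO would leak information from the held-out sample into the ranking, which is why the selection call sits inside the loop.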


In this era of the Internet, information security is a pressing concern. One of the main threats in the cyber world is the phishing attack, an email or website fraud method that targets a genuine webpage or email and hacks it without the consent of the end user. Various techniques help to classify whether a website or an email is legitimate or fake. The major contributors to the detection of phishing fraud include classification algorithms, feature selection techniques or dataset preparation methods, and feature extraction, which plays an important role in both detecting and preventing these attacks. This survey paper studies the effect of all these contributors and the approaches applied in recent papers. The classification algorithms implemented include Decision Tree, Random Forest, Support Vector Machines, Logistic Regression, Lazy K Star, Naive Bayes and J48.


2013 ◽  
Vol 22 (05) ◽  
pp. 1360011 ◽  
Author(s):  
RANDALL WALD ◽  
TAGHI M. KHOSHGOFTAAR ◽  
JOHN C. SLOAN

One of the most important types of signal found in the area of machine condition monitoring/prognostic health monitoring (MCM/PHM) is the vibration signal, a type of waveform. Many time-frequency domain techniques have been proposed to interpret such signals, including wavelet packet decomposition (WPD). Previous work has shown how to extend the WPD algorithm to operate on streaming signals, but the number of output variables becomes exponential in the number of levels of decomposition, hindering data mining in limited-memory environments. Feature selection techniques, well understood in other areas of data mining, can be used to greatly reduce the number of output variables and speed up the machine learning algorithms. This paper presents a case study comparing two versions of WPD both with and without feature selection, demonstrating that removing most of the features produced by the WPD does not impair its performance within the context of MCM/PHM.
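The exponential growth in output variables mentioned above can be seen in a minimal sketch of a full wavelet packet tree. The sketch assumes the Haar wavelet and a batch (non-streaming) signal purely for brevity; the abstract names neither the wavelet nor the features used, so the per-node energy features, the test signal and all function names are hypothetical.

```python
import math


def haar_split(signal):
    """One Haar analysis step: (approximation, detail), each half the input length."""
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def wavelet_packet_leaves(signal, levels):
    """Full packet tree: every node is split at each level, not only the
    approximation branch, so the number of leaf nodes is 2 ** levels."""
    nodes = [list(signal)]
    for _ in range(levels):
        nodes = [half for node in nodes for half in haar_split(node)]
    return nodes


def node_energies(signal, levels):
    """One candidate feature per leaf node: the sum of squared coefficients."""
    return [sum(c * c for c in node) for node in wavelet_packet_leaves(signal, levels)]


# Hypothetical test signal: 64 samples of a sine wave (length is a power of two).
signal = [math.sin(2 * math.pi * 3 * t / 64) for t in range(64)]
feature_counts = {levels: len(node_energies(signal, levels)) for levels in (1, 2, 3, 4)}
```

Each extra level doubles the number of leaf nodes, so any per-node feature set doubles with it; this is the growth that a feature selection step is meant to cut back.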


2021 ◽  
pp. 171-176
Author(s):  
Ю.И. Нечаев ◽  
Д.В. Никущенко

The construction and analysis of interpretation functions for models of the unsteady dynamics of new-generation underwater vehicles (UV) are considered, based on the functional spaces of modern catastrophe theory (MCT) [1]–[7]. The formal apparatus of conceptual solutions and principles for constructing interpretation functions is implemented in a non-stationary dynamic environment within the framework of the competition principle. The procedures of the interpretation functions use different interaction models depending on the level of the acting disturbances. The uncertainty and incompleteness of the initial information on the dynamics of UV interaction in a non-stationary environment determined the approach to constructing interpretation functions: the mathematical description of problems of unsteady UV dynamics is built on the concept of soft computing [7] and the identification of "hidden" knowledge (data mining) [1]. The developed models and algorithms for interpreting the unsteady dynamics of UVs are implemented in the modeling functional block of a multifunctional software complex (MSC) for dynamic visualization of unsteady UV dynamics in urgent computing (UC) mode [6].


2016 ◽  
Vol 55 (03) ◽  
pp. 234-241 ◽  
Author(s):  
Félix Martín-González ◽  
Javier González-Robledo ◽  
Fernando Sánchez-Hernández ◽  
María Moreno-García

Summary. Objectives: This paper addresses the problem of decision-making in relation to the administration of noninvasive mechanical ventilation (NIMV) in intensive care units. Methods: Data mining methods were employed to identify the factors influencing the success/failure of NIMV and to predict its results in future patients. These artificial intelligence-based methods have not been applied in this field in spite of the good results obtained in other medical areas. Results: Feature selection methods provided the most influential variables in the success/failure of NIMV, such as NIMV hours, PaCO2 at the start, PaO2/FiO2 ratio at the start, hematocrit at the start or PaO2/FiO2 ratio after two hours. These methods were also used in the preprocessing step with the aim of improving the results of the classifiers. The algorithms provided the best results when the input dataset was the one containing the attributes selected with the CFS method. Conclusions: Data mining methods can be successfully applied to determine the most influential factors in the success/failure of NIMV and also to predict NIMV results in future patients. The results provided by classifiers can be improved by preprocessing the data with feature selection techniques.
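The abstract does not expand "CFS". Assuming it refers to correlation-based feature selection, the idea can be sketched as follows: a subset of k features is scored by the merit k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-target correlation and r_ff the mean feature-feature correlation, so that relevant but mutually redundant features are penalized. The sketch uses absolute Pearson correlation and a greedy forward search as simplifying assumptions; the toy columns and all function names are hypothetical and unrelated to the NIMV data.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation; assumes neither column is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def cfs_merit(columns, target, subset):
    """Merit of a feature subset: high relevance to the target, low redundancy."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[j], target)) for j in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(columns, target):
    """Forward search: add the feature that raises the merit most; stop when none does."""
    selected, best = [], 0.0
    candidates = list(range(len(columns)))
    while candidates:
        scored = [(cfs_merit(columns, target, selected + [j]), j) for j in candidates]
        merit, j = max(scored, key=lambda s: s[0])  # first maximum wins on ties
        if merit <= best:
            break
        selected.append(j)
        candidates.remove(j)
        best = merit
    return selected, best


# Hypothetical toy data: column 1 duplicates column 0 (up to scale), column 2 is independent.
columns = [[1, 1, -1, -1], [2, 2, -2, -2], [1, -1, 1, -1]]
target = [2, 0, 0, -2]  # equals column 0 + column 2
selected, merit = greedy_cfs(columns, target)
```

On this toy data the search keeps columns 0 and 2 and rejects column 1, because adding a feature that is perfectly correlated with one already selected lowers the merit.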

