A Feature Selection-Based Method for an Ontological Enrichment Process in Geographic Knowledge Modelling

Author(s):  
Mohamed Farah ◽  
Hafedh Nefzi ◽  
Imed Riadh Farah

Geographic information has become so complex and abundant that recent research projects have sought to make it manageable and exploitable. Ontologies are considered a valuable support for representing geographic information. Building geographic ontologies can be viewed as an enrichment process. Aligning concepts from different ontologies is central to this process and strongly affects the quality of the resulting ontology. Ontology alignment relies on similarity measures. The literature offers many alignment models, which differ mainly in the similarity measures they use and in how those measures are combined. Most alignment methods do not address the problem of correlation between similarity measures. In this chapter, we address this issue in order to decide which similarity measures should be considered to better assess the true similarity between concepts. Our proposal uses feature selection methods to select a reduced set of relevant similarity measures.
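To make the idea of pruning correlated similarity measures concrete, the following is a minimal sketch of one possible filter: a greedy pass that drops any measure whose scores are strongly correlated with a measure already kept. The abstract does not say which feature selection methods the chapter uses, so this correlation filter, the measure names, the scores, and the `max_corr` cut-off are all illustrative assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant measure carries no correlation information
    return cov / sqrt(vx * vy)


def select_measures(scores, max_corr=0.9):
    """Greedily keep measures that are not strongly correlated with kept ones.

    scores maps a (hypothetical) measure name to its similarity scores,
    all computed over the same list of concept pairs.
    """
    selected = []
    for name, values in scores.items():
        redundant = any(
            abs(pearson(values, scores[kept])) > max_corr for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


# Hypothetical scores of three measures over four concept pairs.
scores = {
    "edit_distance": [0.9, 0.2, 0.5, 0.7],
    "jaro": [0.88, 0.25, 0.5, 0.72],      # nearly duplicates edit_distance
    "structural": [0.1, 0.8, 0.3, 0.6],
}
kept_measures = select_measures(scores)
print(kept_measures)
```

The result depends on the order in which measures are visited, so a real pipeline would normally rank the measures by relevance first; that step is omitted here.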


2013 ◽  
Vol 18 (10) ◽  
pp. 1284-1297 ◽  
Author(s):  
Felix Reisen ◽  
Xian Zhang ◽  
Daniela Gabriel ◽  
Paul Selzer

High-content screening (HCS) is a powerful tool for drug discovery, capable of measuring cellular responses to chemical disturbance in a high-throughput manner. HCS provides an image-based readout of cellular phenotypes, including features such as shape, intensity, or texture, in a highly multiplexed and quantitative manner. The corresponding feature vectors can be used to characterize phenotypes and are thus defined as HCS fingerprints. Systematic analyses of HCS fingerprints allow for objective computational comparisons of cellular responses. Such comparisons facilitate the detection of compounds with different phenotypic outcomes in high-throughput HCS campaigns. Feature selection methods and similarity measures, as a basis for phenotype identification and clustering, are critical for the quality of such computational analyses. We systematically evaluated 16 different similarity measures in combination with linear and nonlinear feature selection methods for their potential to capture biologically relevant image features. Nonlinear correlation-based similarity measures such as Kendall’s τ and Spearman’s ρ perform well in most evaluation scenarios, outperforming other frequently used metrics (such as the Euclidean distance). We also present four novel modifications of the connectivity map similarity that surpass the original version in our experiments. This study provides a basis for generic phenotypic analysis in future HCS campaigns.
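As an illustration of why rank-based measures behave differently from the Euclidean distance, here is a small sketch that compares two feature vectors with Spearman's ρ and Kendall's τ. The fingerprints are invented numbers, not HCS data, and the Kendall variant shown is the simple τ-a, which does not correct for ties; the study's exact definitions are not given in the abstract.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx and vy else 0.0


def spearman_rho(a, b):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(a), average_ranks(b))


def kendall_tau_a(a, b):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(a)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (a[i] - a[j]) * (b[i] - b[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two hypothetical fingerprints with the same feature ordering but
# very different magnitudes (e.g. a weaker and a stronger response).
fp_weak = [1.0, 2.0, 3.0, 4.0]
fp_strong = [1.0, 4.0, 9.0, 16.0]
print(spearman_rho(fp_weak, fp_strong), kendall_tau_a(fp_weak, fp_strong))
print(euclidean(fp_weak, fp_strong))
```

Both rank correlations treat the two vectors as identical in shape, while the Euclidean distance between them is large; which behaviour is preferable depends on the analysis.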


Hardware constraints and resource-efficient routing are now very important when constructing a Mobile Ad hoc Network (MANET). Previous works describe ABC- and WO-based energy efficiency. The proposed paradigm aims to reach a particular quality requirement for MANETs. Novel Swarm Intelligence and feature selection methods are developed into a new routing algorithm for managing high energy. The protocols are grouped according to their structure, energy, computational complexity, and path establishment. MANET routing protocols for wireless sensor networks are evaluated and compared. Swarm intelligence is found to perform well not only in energy but also in routing and in the quality of packet delivery. The feature selection method is used to find the optimal route among the feasible routes. This method aims to improve the quality of the MANET and to find intrinsic properties of energy management. The newly proposed Improved Swarm Intelligence Routing Algorithm (ISRA) is assessed with standard simulation and performance metrics, comparing different protocols in an NS2-based simulator to determine its efficiency.


2020 ◽  
Vol 20 (1) ◽  
pp. 232-247
Author(s):  
Mariusz Kubus

Research background: The successful learning of classifiers depends on the quality of data. Modeling is especially difficult when the data are unbalanced or contain many irrelevant variables. This is the case in many applications. The classification of rare events is the overarching goal, e.g. in bankruptcy prediction, churn analysis or fraud detection. The problem of irrelevant variables accompanies situations where the specification of the model is not known a priori, thus in typical conditions for data mining analysts. Purpose: The purpose of this paper is to compare the combinations of the most popular strategies of handling unbalanced data with feature selection methods that represent filters, wrappers and embedded methods. Research methodology: In the empirical study, we use real datasets with additionally introduced irrelevant variables. In this way, we are able to recognize which method correctly eliminates irrelevant variables. Results: Having carried out the experiment we conclude that over-sampling does not work in connection with feature selection. Some recommendations of the most promising methods also are given. Novelty: There are many solutions proposed in the literature concerning unbalanced data as well as feature selection. The innovative field of our interests is to examine their interactions.


Author(s):  
Thị Minh Phương Hà ◽  
Thi My Hanh Le ◽  
Thanh Binh Nguyen

The rapid growth of data has become a huge challenge for software systems. The quality of a fault prediction model depends on the quality of the software dataset. High-dimensional data is the major problem affecting the performance of fault prediction models. To deal with the dimensionality problem, various researchers have proposed feature selection. Feature selection provides an effective solution by eliminating irrelevant and redundant features, reducing computation time, and improving the accuracy of the machine learning model. In this study, we focus on the research and synthesis of filter-based feature selection with several search methods and algorithms. In addition, five filter-based feature selection methods are analyzed using five different classifiers over datasets obtained from the National Aeronautics and Space Administration (NASA) repository. The experimental results show that the Chi-Square and Information Gain methods had the best influence on the results of predictive models compared with the other filter ranking methods.
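A filter ranking method such as Information Gain scores each feature independently of any classifier and keeps the top-ranked ones. The sketch below shows that calculation for discrete features; the metric names and values are made up for illustration and are not taken from the NASA datasets, and continuous metrics would need to be discretized first.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Feature names sorted from highest to lowest information gain."""
    gains = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)


# Hypothetical module metrics (already discretized) and fault labels.
faulty = [1, 1, 0, 0]
columns = {
    "noise_metric": [0, 1, 0, 1],      # unrelated to the label
    "high_complexity": [1, 1, 0, 0],   # perfectly separates the classes
}
ranking = rank_features(columns, faulty)
print(ranking)
```

Because each feature is scored on its own, a filter of this kind cannot detect redundancy between two features that are both informative.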


2020 ◽  
Vol 2020 ◽  
pp. 1-8
Author(s):  
Yikun Huang ◽  
Xingsi Xue ◽  
Chao Jiang

Artificial Internet of Things (AIoT) integrates Artificial Intelligence (AI) with the Internet of Things (IoT) to create a sensor network that can communicate and process data. To implement communication and cooperation among intelligent systems on AIoT, it is necessary to annotate sensor data with semantic meanings to overcome the heterogeneity problem among different sensors, which requires the use of a sensor ontology. A sensor ontology formally models the knowledge on AIoT by defining the concepts, the properties describing a concept, and the relationships between two concepts. Owing to human subjectivity, a concept in different sensor ontologies could be defined with different terminologies and contexts, yielding the ontology heterogeneity problem. Thus, before using these ontologies, it is necessary to integrate their knowledge by finding the correspondences between their concepts, i.e., the so-called ontology matching. In this work, a novel sensor ontology matching framework is proposed, which aggregates three kinds of Concept Similarity Measures (CSMs) and uses an alignment extraction approach to determine the sensor ontology alignment. To ensure the quality of the alignments, we further propose a compact Particle Swarm Optimization algorithm (cPSO) to optimize the aggregating weights for the CSMs and a threshold for filtering the alignment. The experiment uses the Ontology Alignment Evaluation Initiative (OAEI)’s conference track and two pairs of real sensor ontologies to test cPSO’s performance. The experimental results show that the quality of the alignments obtained by cPSO statistically outperforms other state-of-the-art sensor ontology matching techniques.
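The two quantities that cPSO is said to tune, the aggregating weights and the filtering threshold, can be pictured with a short sketch. It shows only a weighted average of three similarity scores followed by a threshold filter with a greedy best-match rule; the optimizer itself is not shown, and the concept names, scores, weights, threshold, and extraction rule are all assumptions made for illustration rather than the paper's actual procedure.

```python
def aggregate(similarities, weights):
    """Weighted average of several similarity scores for one concept pair."""
    return sum(w * s for w, s in zip(weights, similarities)) / sum(weights)


def extract_alignment(candidates, weights, threshold):
    """Keep, for each source concept, its best target scoring >= threshold.

    candidates maps (source, target) to a list of similarity scores,
    one per measure, in the same order as weights.
    """
    best = {}
    for (source, target), sims in candidates.items():
        score = aggregate(sims, weights)
        if score < threshold:
            continue  # filtered out by the threshold
        if score > best.get(source, (None, float("-inf")))[1]:
            best[source] = (target, score)
    return {source: target for source, (target, _) in best.items()}


# Hypothetical candidate pairs with three similarity scores each.
candidates = {
    ("Sensor", "Device"): [0.4, 0.6, 0.5],
    ("Sensor", "SensingDevice"): [0.9, 0.8, 0.7],
    ("Observation", "Platform"): [0.1, 0.2, 0.1],
}
weights = [0.5, 0.3, 0.2]   # assumed values; the paper optimizes these
threshold = 0.6             # assumed value; the paper optimizes this too
alignment = extract_alignment(candidates, weights, threshold)
print(alignment)
```

An optimizer would repeatedly call a function like `extract_alignment` with different weights and thresholds and score each resulting alignment against a quality measure.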


Information ◽  
2021 ◽  
Vol 12 (11) ◽  
pp. 451
Author(s):  
Okechinyere J. Achilonu ◽  
Victor Olago ◽  
Elvira Singh ◽  
René M. J. C. Eijkemans ◽  
Gideon Nimako ◽  
...  

A cancer pathology report is a valuable medical document that provides information for clinical management of the patient and evaluation of health care. However, there are variations in the quality of reporting in free-text style formats, ranging from comprehensive to incomplete reporting. Moreover, the increasing incidence of cancer has generated a high throughput of pathology reports. Hence, manual extraction and classification of information from these reports can be intrinsically complex and resource-intensive. This study aimed to (i) evaluate the quality of over 80,000 breast, colorectal, and prostate cancer free-text pathology reports and (ii) assess the effectiveness of random forest (RF) and variants of support vector machine (SVM) in the classification of reports into benign and malignant classes. The study approach comprises data preprocessing, visualisation, feature selections, text classification, and evaluation of performance metrics. The performance of the classifiers was evaluated across various feature sizes, which were jointly selected by four filter feature selection methods. The feature selection methods identified established clinical terms, which are synonymous with each of the three cancers. Uni-gram tokenisation using the classifiers showed that the predictive power of RF model was consistent across various feature sizes, with overall F-scores of 95.2%, 94.0%, and 95.3% for breast, colorectal, and prostate cancer classification, respectively. The radial SVM achieved better classification performance compared with its linear variant for most of the feature sizes. The classifiers also achieved high precision, recall, and accuracy. This study supports a nationally agreed standard in pathology reporting and the use of text mining for encoding, classifying, and production of high-quality information abstractions for cancer prognosis and research.


Author(s):  
Fatemeh Alighardashi ◽  
Mohammad Ali Zare Chahooki

Improving software product quality before release through periodic testing is one of the most expensive activities in software projects. Because the resources available for testing modules are limited, it is important to identify fault-prone modules and devote testing resources to them. Software fault predictors based on machine learning algorithms are effective tools for identifying fault-prone modules. Extensive studies are being carried out in this field to find the connection between the features of software modules and their fault-proneness. Some of the features used by predictive algorithms are ineffective and reduce the accuracy of the prediction process. Feature selection methods are therefore widely used to increase the performance of prediction models for fault-prone modules. In this study, we propose a feature selection method that combines several filter feature selection methods, presented as a fused weighted filter method. The proposed method improves both the convergence rate of feature selection and the prediction accuracy. Results obtained on ten datasets from NASA and PROMISE indicate the effectiveness of the proposed method in improving the accuracy and convergence of software fault prediction.

