Selection of data set with fuzzy entropy function

2004 ◽  
Vol 14 (5) ◽  
pp. 655-659
Author(s):  
Sang-Hyuk Lee ◽  
Seong-Pyo Cheon ◽  
Sung shin Kim
2018 ◽  
Vol 143 (5) ◽  
pp. 587-592 ◽  
Author(s):  
Pieter J. Slootweg ◽  
Edward W. Odell ◽  
Daniel Baumhoer ◽  
Roman Carlos ◽  
Keith D. Hunter ◽  
...  

A data set has been developed for the reporting of excisional biopsies and resection specimens for malignant odontogenic tumors by members of an expert panel working on behalf of the International Collaboration on Cancer Reporting, an international organization established to unify and standardize reporting of cancers. Odontogenic tumors are rare, which limits evidence-based support for designing a scientifically sound data set for reporting them. Thus, the selection of reportable elements within the data set and considering them as either core or noncore is principally based on evidence from malignancies affecting other organ systems, limited case series, expert opinions, and/or anecdotal reports. Nevertheless, this data set serves as the initial step toward standardized reporting on malignant odontogenic tumors that should evolve over time as more evidence becomes available and functions as a prompt for further research to provide such evidence.


2019 ◽  
Vol 2 (4) ◽  
pp. 530
Author(s):  
Amr Hassan Yassin ◽  
Hany Hamdy Hussien

Due to the exponential growth of e-business and of pay-for-use computing capabilities over the web, the risk factors regarding security issues are also increasing rapidly. As usage increases, it becomes very difficult to identify malicious attacks because the attack patterns change. Therefore, host machines in the network must continually be monitored for intrusions, since they are the final endpoint of any network. The purpose of this work is to introduce a generalized neural network model that is able to detect network intrusions. Two recent heuristic algorithms inspired by the behavior of natural phenomena, namely the particle swarm optimization (PSO) and gravitational search (GSA) algorithms, are introduced. These algorithms are combined to train a feedforward neural network (FNN), with the aim of reducing the problems of getting stuck in local minima and of slow convergence. Dimension reduction uses information obtained from the NSL-KDD Cup 99 data set to select features for discovering the type of attack. Attack detection and the performance of the proposed model are evaluated under different patterns of network data.


Genetika ◽  
2014 ◽  
Vol 46 (2) ◽  
pp. 545-559 ◽  
Author(s):  
Mirjana Jankulovska ◽  
Sonja Ivanovska ◽  
Ana Marjanovic-Jeromela ◽  
Snjezana Bolaric ◽  
Ljupcho Jankuloski ◽  
...  

In this study, the use of different multivariate approaches to classify rapeseed genotypes based on quantitative traits is presented. Tree regression analysis, PCA and two-way cluster analysis were applied in order to describe and understand the extent of genetic variability in spring rapeseed genotype-by-trait data. The traits that highly influenced seed and oil yield in rapeseed were successfully identified by the tree regression analysis. The principal predictor for both response variables was the number of pods per plant (NP). NP and 1000 seed weight could help in the selection of high yielding genotypes. High values for both traits and oil content could lead to high oil yielding genotypes. These traits may serve as indirect selection criteria and can lead to improvement of seed and oil yield in rapeseed. Quantitative traits that explained most of the variability in the studied germplasm were classified using principal component analysis. In this data set, five PCs were identified, out of which the first three PCs explained 63% of the total variance. This helped in choosing the variables on which the genotypes' clustering could be performed. The two-way cluster analysis simultaneously clustered genotypes and quantitative traits. The final number of clusters was determined using a bootstrapping technique. This approach provided a clear overview of the variability of the analyzed genotypes. The genotypes that have similar performance regarding the traits included in this study can be easily detected on the heatmap. Genotypes grouped in clusters 1 and 8 had high values for seed and oil yield and a relatively short vegetative growth duration, while those in cluster 9 combined moderate to low values for vegetative growth duration with moderate to high seed and oil yield. These genotypes should be further exploited and implemented in the rapeseed breeding program.
The combined application of these multivariate methods can assist in deciding how, and based on which traits to select the genotypes, especially in early generations, at the beginning of a breeding program.


2018 ◽  
Vol 2018 ◽  
pp. 1-8 ◽  
Author(s):  
Guanghui Liang ◽  
Jianmin Pang ◽  
Zheng Shan ◽  
Runqing Yang ◽  
Yihang Chen

To address emerging security threats, various malware detection methods are proposed every year. A small but representative set of malware samples is therefore usually needed for a detection model, especially for machine-learning-based malware detection models. However, the current manual selection of representative samples from a large collection of unknown files is labor intensive and not scalable. In this paper, we first propose a framework that can automatically generate a small data set for malware detection. With this framework, we extract behavior features from a large initial data set and then use a hierarchical clustering technique to identify different types of malware. An improved genetic algorithm based on roulette wheel sampling is implemented to generate the final test data set. The final data set is only one-eighteenth the volume of the initial data set, and evaluations show that the data set selected by the proposed framework is much smaller than the original one but loses almost none of its semantics.
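
The roulette wheel sampling named in the abstract above can be illustrated with a minimal Python sketch. This is plain fitness-proportionate selection, not the authors' improved genetic algorithm; the function name `roulette_wheel_select` and the example fitness values are assumptions made for illustration.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def roulette_wheel_select(fitness, n_picks, rng):
    """Return n_picks indices, each drawn with probability proportional to fitness.

    Hypothetical helper: the abstract does not specify how fitness is defined.
    """
    if not fitness or any(f < 0 for f in fitness):
        raise ValueError("fitness must be a non-empty list of non-negative values")
    cumulative = list(accumulate(fitness))  # running totals form the wheel slots
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("at least one fitness value must be positive")
    picks = []
    for _ in range(n_picks):
        spin = rng.random() * total  # position of the pointer on the wheel
        index = bisect_right(cumulative, spin)
        picks.append(min(index, len(fitness) - 1))  # guard against float rounding
    return picks


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed fitness scores for three candidate samples.
    print(roulette_wheel_select([0.0, 1.0, 3.0], 5, rng))
```

A candidate with zero fitness is never drawn, and a candidate with three times the fitness of another is drawn about three times as often.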


Author(s):  
Ernesto Escobedo ◽  
Liliana Arguello ◽  
Marzia Sepe ◽  
Ilaria Parrella ◽  
Stefano Cioncolini ◽  
...  

The monitoring and diagnostics of industrial systems are increasing in complexity as larger volumes of data are collected and as more methods and analytics become able to correlate data and events. The setup and training of these methods and analytics are among the main factors in selecting the most appropriate solution for an efficient and effective service, because they require the selection of the most suitable data set for training the models, with a consequent need for time and knowledge. The study and the related experiences presented in this paper describe a methodology for tracking features, detecting outliers and deriving, in a probabilistic way, diagnostic thresholds to be applied by means of hierarchical models that simplify or remove the selection of the proper training data set by a subject matter expert at each deployment. This method applies to industrial systems employing a large number of similar machines connected to a remote data center, with the purpose of alerting one or more operators when a feature exceeds the healthy distribution. Some relevant use cases are presented for an aeroderivative gas turbine, also covering its auxiliary equipment, with a deep dive on the hydraulic starting system. The results, in terms of early anomaly detection and reduced model training effort, are compared with traditional monitoring approaches such as fixed thresholds. Moreover, this study explains the advantages of this probabilistic approach in a business application such as advanced fleet monitoring and diagnostic services.
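
As a rough illustration of deriving an alert threshold from a healthy distribution rather than fixing it by hand, the sketch below fits a Gaussian to pooled healthy readings and takes an upper quantile. The Gaussian assumption, the function names and the sample readings are all assumptions for illustration; the hierarchical models described in the abstract are not reproduced here.

```python
from statistics import NormalDist


def healthy_threshold(healthy_values, false_alarm_rate=0.001):
    """Upper alert threshold from healthy readings (assumes a Gaussian fit).

    false_alarm_rate is the assumed probability that a healthy reading
    exceeds the threshold.
    """
    dist = NormalDist.from_samples(healthy_values)
    return dist.inv_cdf(1.0 - false_alarm_rate)


def exceeds(value, threshold):
    """True when a reading falls above the healthy distribution's threshold."""
    return value > threshold


if __name__ == "__main__":
    # Hypothetical readings of one feature pooled from similar healthy machines.
    readings = [10.0, 10.2, 9.8, 10.1, 9.9]
    limit = healthy_threshold(readings)
    print(round(limit, 2), exceeds(11.0, limit), exceeds(10.1, limit))
```

The threshold moves with the data, so adding machines or readings updates it without an expert choosing a new fixed limit.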


Author(s):  
Yonghao Xiao ◽  
Weiyu Yu ◽  
Jing Tian

Image thresholding segmentation based on the Bee Colony Algorithm (BCA) and fuzzy entropy is presented in this chapter. The fuzzy entropy function is simplified with a single parameter. The BCA is applied to search for the minimum value of the fuzzy entropy function. The optimal image threshold is obtained from this minimum function value. Experimental results are provided to demonstrate the superior performance of the proposed approach.
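
To make the idea of choosing a threshold by minimizing a fuzzy entropy concrete, here is a minimal Python sketch. It is not the chapter's method: the membership function (closeness of a pixel to its class mean) and the Shannon-function entropy are assumed choices, and an exhaustive search over gray levels stands in for the Bee Colony Algorithm. All function names are hypothetical.

```python
import math


def shannon(mu):
    """Shannon function of a membership value; zero at mu = 0 or mu = 1."""
    if mu <= 0.0 or mu >= 1.0:
        return 0.0
    return -mu * math.log(mu) - (1.0 - mu) * math.log(1.0 - mu)


def fuzzy_entropy(pixels, t):
    """Average fuzziness of the two-class split 'pixel <= t' versus 'pixel > t'."""
    low = [p for p in pixels if p <= t]
    high = [p for p in pixels if p > t]
    if not low or not high:
        return math.inf  # a one-class split is not a valid segmentation
    span = max(pixels) - min(pixels)
    m_low = sum(low) / len(low)
    m_high = sum(high) / len(high)
    total = 0.0
    for p in pixels:
        mean = m_low if p <= t else m_high
        mu = 1.0 / (1.0 + abs(p - mean) / span)  # assumed membership, in [0.5, 1]
        total += shannon(mu)
    return total / (len(pixels) * math.log(2))  # normalised to [0, 1]


def best_threshold(pixels):
    """Exhaustive search for the gray level with minimum fuzzy entropy."""
    if min(pixels) == max(pixels):
        raise ValueError("need at least two distinct gray levels")
    return min(range(min(pixels), max(pixels)), key=lambda t: fuzzy_entropy(pixels, t))


if __name__ == "__main__":
    # Hypothetical image with a dark cluster and a bright cluster of gray levels.
    image = list(range(10, 21)) * 5 + list(range(200, 211)) * 5
    print(best_threshold(image))
```

A metaheuristic such as the BCA becomes useful when the search space is larger than a single gray-level axis, for example with several thresholds at once.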


2021 ◽  
Author(s):  
Dat Q. Duong ◽  
Quang M. Le ◽  
Tan-Loc Nguyen-Tai ◽  
Hien D. Nguyen ◽  
Minh-Son Dao ◽  
...  

Accurately assessing the air quality index (AQI) values and levels has become an attractive research topic during the last decades. It is a crucial aspect when studying the possible adverse health effects associated with current air quality conditions. This paper aims to utilize machine learning and an appropriate selection of attributes for the air quality estimation problem using various features, including sensor data (humidity, temperature), timestamp features, location features, and public weather data. We evaluated the performance of different learning models and features to study the problem using the data set “MNR-HCM II”. The experimental results show that adopting TLPW features with Stacking generalization yields higher overall performance than other techniques and features in RMSE, accuracy, and F1-score.


2019 ◽  
Vol 48 (4) ◽  
pp. 475-483
Author(s):  
Matthew N. Green

In the U.S. House of Representatives, the majority party constitutes an organizational cartel that monopolizes the selection of chamber leaders. But in state legislatures, that cartel power is sometimes circumvented by a bipartisan bloc that outvotes the leadership preferences of a majority of the majority party. Drawing from an original data set of instances of cross-party organizational coalitions at the state level, I use statistical analysis to test various hypotheses for when these coalitions are more likely to form. The analysis reveals that party ideology does not adequately explain the violation of these cartels; rather, violations depend on the costs associated with keeping the party unified and the benefits that come from selecting the chamber’s top leadership post. This finding underscores the potential vulnerability of organizational cartels and suggests that governing parties are strategic when deciding how fiercely to defend their cartel power.


2019 ◽  
Vol 78 (8) ◽  
pp. 1025-1032 ◽  
Author(s):  
Marco Gattorno ◽  
Michael Hofer ◽  
Silvia Federici ◽  
Federica Vanoni ◽  
Francesca Bovis ◽  
...  

Background: Different diagnostic and classification criteria are available for hereditary recurrent fevers (HRF), namely familial Mediterranean fever (FMF), tumour necrosis factor receptor-associated periodic fever syndrome (TRAPS), mevalonate kinase deficiency (MKD) and cryopyrin-associated periodic syndromes (CAPS), and for the non-hereditary periodic fever, aphthosis, pharyngitis and adenitis (PFAPA). We aimed to develop and validate new evidence-based classification criteria for HRF/PFAPA. Methods: Step 1: selection of clinical, laboratory and genetic candidate variables; step 2: classification of 360 random patients from the Eurofever Registry by a panel of 25 clinicians and 8 geneticists blinded to patients’ diagnosis (consensus ≥80%); step 3: statistical analysis for the selection of the best candidate classification criteria; step 4: nominal group technique consensus conference with 33 panellists for the discussion and selection of the final classification criteria; step 5: cross-sectional validation of the novel criteria. Results: The panellists achieved consensus to classify 281 of 360 (78%) patients (32 CAPS, 36 FMF, 56 MKD, 37 PFAPA, 39 TRAPS, 81 undefined recurrent fever). Consensus was reached for two sets of criteria for each HRF, one including genetic and clinical variables, the other with clinical variables only, plus new criteria for PFAPA. The four HRF criteria demonstrated sensitivity of 0.94–1 and specificity of 0.95–1; for PFAPA, criteria sensitivity and specificity were 0.97 and 0.93, respectively. Validation of these criteria in an independent data set of 1018 patients shows a high accuracy (from 0.81 to 0.98). Conclusion: Eurofever proposes a novel set of validated classification criteria for HRF and PFAPA with high sensitivity and specificity.
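
For readers less familiar with the metrics reported above, the following Python sketch shows how sensitivity, specificity and accuracy are computed from confusion-matrix counts. The counts in the example are hypothetical and are not taken from the study; only the 281 of 360 consensus figure comes from the abstract.

```python
def sensitivity(tp, fn):
    """Share of true cases that the criteria classify as positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of non-cases that the criteria classify as negative."""
    return tn / (tn + fp)


def accuracy(tp, tn, fp, fn):
    """Share of all patients classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


if __name__ == "__main__":
    # Hypothetical counts for one set of criteria.
    tp, fn, tn, fp = 94, 6, 95, 5
    print(sensitivity(tp, fn), specificity(tn, fp), accuracy(tp, tn, fp, fn))
    # Consensus rate quoted in the abstract: 281 of 360 patients.
    print(round(100 * 281 / 360))
```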


Author(s):  
Antonia J. Jones ◽  
Dafydd Evans ◽  
Steve Margetts ◽  
Peter J. Durrant

The Gamma Test is a non-linear modelling analysis tool that allows us to quantify the extent to which a numerical input/output data set can be expressed as a smooth relationship. In essence, it allows us to efficiently calculate that part of the variance of the output that cannot be accounted for by the existence of any smooth model based on the inputs, even though this model is unknown. A key aspect of this tool is its speed: the Gamma Test has time complexity O(M log M), where M is the number of datapoints. For data sets consisting of a few thousand points and a reasonable number of attributes, a single run of the Gamma Test typically takes a few seconds. In this chapter we will show how the Gamma Test can be used in the construction of predictive models and classifiers for numerical data. In doing so, we will demonstrate the use of this technique for feature selection, and for the selection of embedding dimension when dealing with a time-series.
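
The sketch below illustrates the usual near-neighbour formulation of the Gamma Test in plain Python: for k = 1..p it averages the squared input distance to the k-th nearest neighbour (delta) and half the squared output difference (gamma), then takes the intercept of the regression line of gamma on delta as the noise-variance estimate. It uses a brute-force neighbour search, so it does not achieve the O(M log M) complexity quoted above, which would need a structure such as a kd-tree. The function name, the default p = 10 and the synthetic data are assumptions.

```python
import math
import random


def gamma_test(xs, ys, p=10):
    """Return (intercept, slope) of the gamma-versus-delta regression line.

    xs is a list of input tuples, ys the matching outputs. The intercept
    estimates the output variance that no smooth model can explain.
    """
    m = len(xs)
    if m <= p:
        raise ValueError("need more data points than neighbours")
    delta = [0.0] * p
    gamma = [0.0] * p
    for i in range(m):
        # Brute force: squared distance from point i to every other point.
        neighbours = sorted(
            (sum((a - b) ** 2 for a, b in zip(xs[i], xs[j])), j)
            for j in range(m) if j != i
        )[:p]
        for k, (dist2, j) in enumerate(neighbours):
            delta[k] += dist2 / m
            gamma[k] += (ys[j] - ys[i]) ** 2 / (2 * m)
    # Ordinary least squares fit of gamma = intercept + slope * delta.
    mean_d = sum(delta) / p
    mean_g = sum(gamma) / p
    sxx = sum((d - mean_d) ** 2 for d in delta)
    sxy = sum((d - mean_d) * (g - mean_g) for d, g in zip(delta, gamma))
    slope = sxy / sxx
    return mean_g - slope * mean_d, slope


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [(rng.uniform(0.0, 2.0 * math.pi),) for _ in range(400)]
    ys = [math.sin(x[0]) + rng.gauss(0.0, 0.1) for x in xs]  # noise variance 0.01
    print(gamma_test(xs, ys)[0])
```

For feature selection, the same function can be run on different subsets of input columns, preferring subsets that give a smaller intercept.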

