Discretization and Fuzzification of Numerical Attributes in Attribute-Based Learning

Author(s):  
Ivan Bruha ◽  
Petr Berka

Genuine symbolic machine learning (ML) algorithms can process only symbolic (categorical) data. However, real-world problems, e.g. in medicine or finance, involve both symbolic and numerical attributes. Discretizing (categorizing) numerical attributes is therefore an important issue in ML, and quite a few discretization procedures already exist in the field. This paper describes two newer algorithms for categorization (discretization) of numerical attributes. The first is implemented in KEX (Knowledge EXplorer) as its preprocessing procedure; its idea is to discretize the numerical attributes so that the resulting categorization corresponds to the KEX knowledge acquisition algorithm. Since the categorization for KEX is done "off-line", before the KEX machine learning algorithm runs, it can also serve as a preprocessing step for other machine learning algorithms. The other discretization procedure is implemented in CN4, a large extension of the well-known CN2 machine learning algorithm; here the range of a numerical attribute is divided into intervals that may form a complex generated by the algorithm as part of the class description. Experimental results compare the performance of KEX and CN4 on several well-known ML databases. To make the comparison more informative, we also used the discretization procedure of the MLC++ library, and other ML algorithms such as ID3 and C4.5 were run in our experiments as well. The results are then compared and discussed.
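The abstract does not spell out the KEX or CN4 cut-point criteria, but the general idea of off-line discretization can be illustrated with a minimal equal-frequency binning sketch; the function name and the choice of equal-frequency cuts are illustrative assumptions, not the papers' actual procedures.

```python
def equal_frequency_bins(values, k):
    """Split numeric values into k bins holding roughly equal counts.

    A generic off-line discretization sketch; the KEX and CN4 procedures
    described above use their own class-driven criteria instead.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Cut points taken at the boundaries between equally sized chunks.
    cuts = [ordered[(i * n) // k] for i in range(1, k)]

    def discretize(x):
        # Number of cut points at or below x gives the interval label.
        return sum(c <= x for c in cuts)

    return cuts, discretize
```

Because the cuts are computed once, before any learner runs, the same labels can be fed to any downstream algorithm, mirroring how the KEX preprocessing is reused for other learners.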


2019 ◽  
Vol 8 (12) ◽  
pp. 529
Author(s):  
Noa Binski ◽  
Asya Natapov ◽  
Sagi Dalyot

Landmarks are important for assisting wayfinding and navigation and for enriching user experience. Although many user-generated geotagged sources exist, landmark entities are still mostly retrieved from authoritative geographic sources. Wikipedia, the world’s largest free encyclopedia, stores geotagged information on many geospatial entities, including a very large and well-founded volume of landmark information. However, not all Wikipedia geotagged landmark entities can be considered valuable and instructive. This research introduces an integrated ranking model for mining landmarks from Wikipedia, predicated on estimating and weighting their salience. Besides location, the model is based on the entries’ category and attribute data. A preliminary ranking is formulated on the basis of three spatial descriptors associated with landmark salience, namely permanence, visibility, and uniqueness. This ranking is integrated with a score derived from a set of numerical attributes associated with public interest in the Wikipedia page, including the number of redirects and the date of the latest edit. The methodology is comparatively evaluated for various areas in different cities. Results show that the developed integrated ranking model is robust in identifying landmark salience, paving the way for incorporating Wikipedia’s content into navigation systems.
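The integration of a spatial-salience ranking with a public-interest score can be sketched as a weighted combination. The weights, the averaging of the three descriptors, and the log/recency normalizations below are all illustrative assumptions; the paper's actual weighting scheme is not given in the abstract.

```python
import math


def landmark_score(permanence, visibility, uniqueness,
                   redirects, days_since_edit,
                   w_spatial=0.5, w_interest=0.5):
    """Toy integrated landmark score, assuming descriptors in [0, 1].

    Combines the three spatial salience descriptors with a crude
    public-interest proxy built from redirect count and edit recency.
    """
    spatial = (permanence + visibility + uniqueness) / 3.0
    # More redirects and a more recent edit suggest higher public interest.
    interest = math.log1p(redirects) / (1.0 + days_since_edit / 365.0)
    return w_spatial * spatial + w_interest * interest
```

The score is monotone in each input, so a landmark with more redirects or a fresher last edit always ranks at least as high, all else equal.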


Symmetry ◽  
2020 ◽  
Vol 12 (4) ◽  
pp. 530
Author(s):  
Haitao Ding ◽  
Chu Sun ◽  
Jianqiu Zeng

To improve the mining of numerical attributes in communication big data, the clustering of their feature information must be optimized; a big data clustering algorithm based on cloud computing is therefore proposed. A cloud-extended distributed feature fitting method is used for linear programming of the numerical attributes of communication big data, and the mutual information features of these attributes are extracted. Combined with fuzzy C-means clustering and linear regression analysis, a statistical analysis of the numerical attribute features is carried out, and a sample set of associated attributes for the cloud grid distribution of the numerical attributes is constructed. Cloud computing and adaptive quantitative recurrent classifiers are used for data classification, and block template matching is combined with multi-sensor information fusion to search for the clustering centers automatically, improving the convergence of clustering. Simulation results show that, with this method, information fusion during clustering performs better, the clustering centers are found automatically and more reliably, frequency-domain equalization is well controlled, the bit error rate and energy consumption are low, and the fuzzy weighted clustering retrieval of numerical attributes of communication big data is effectively improved.


2017 ◽  
Vol 10 (3) ◽  
pp. 1-21
Author(s):  
Zekri Lougmiri

Skyline queries are important in many fields, especially for decision making. In this context, objects or tuples of databases are defined by both numerical and non-numerical attributes, and the skyline operator acts on the numerical ones. Algorithms implementing the skyline operator are generally either progressive or non-progressive: progressive algorithms return skyline points during execution, while non-progressive algorithms return the result only at the end. This paper presents a new progressive algorithm for computing skyline points. The algorithm sorts the input as a preprocessing step, and the authors present new theorems for promptly deducing the first skyline points and reducing the candidate space. A new version of the divide-and-conquer algorithm is used for computing the final skyline. Intensive experiments on both real and synthetic datasets show that our algorithm performs best compared to other methods.
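The benefit of sorting as a preprocessing step can be seen in the classic sort-filter-skyline pattern sketched below (minimization in every dimension). This is a generic illustration, not the paper's divide-and-conquer algorithm.

```python
def skyline(points):
    """Sort-filter skyline sketch: every dimension is minimized.

    Sorting by coordinate sum guarantees no later point can dominate an
    earlier one, so the first sorted point is a skyline point immediately
    (progressive output) and each candidate is checked only against
    points already accepted into the skyline.
    """
    def dominates(p, q):
        # p dominates q: no worse in all dimensions, strictly better in one.
        return (all(a <= b for a, b in zip(p, q))
                and any(a < b for a, b in zip(p, q)))

    result = []
    for p in sorted(points, key=sum):
        if not any(dominates(s, p) for s in result):
            result.append(p)
    return result
```

Because accepted points can be emitted as soon as they pass the filter, this scheme is progressive in the abstract's sense: results stream out during execution rather than only at the end.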

