Evolutionary-based feature construction with substitution for data summarization using DARA

Author(s):  
Florence Sia ◽  
Rayner Alfred
2018 ◽  
Vol 24 (3) ◽  
pp. 1623-1627
Author(s):  
Chung Seng Kheau ◽  
Rayner Alfred ◽  
Lau Hui Keng ◽  
Hoe Tung Yew ◽  
Renee Ka Yin Chin

Author(s):  
Kai Han ◽  
Shuang Cui ◽  
Tianshuai Zhu ◽  
Enpei Zhang ◽  
Benwei Wu ◽  
...  

Data summarization, i.e., selecting representative subsets of manageable size out of massive data, is often modeled as a submodular optimization problem. Although there exist extensive algorithms for submodular optimization, many of them incur large computational overheads and hence are not suitable for mining big data. In this work, we consider the fundamental problem of (non-monotone) submodular function maximization with a knapsack constraint, and propose simple yet effective and efficient algorithms for it. Specifically, we propose a deterministic algorithm with approximation ratio 6 and a randomized algorithm with approximation ratio 4, and show that both of them can be accelerated to achieve nearly linear running time at the cost of weakening the approximation ratio by an additive factor of ε. We then consider a more restrictive setting without full access to the whole dataset, and propose streaming algorithms with approximation ratios of 8+ε and 6+ε that make one pass and two passes over the data stream, respectively. As a by-product, we also propose a two-pass streaming algorithm with an approximation ratio of 2+ε when the considered submodular function is monotone. To the best of our knowledge, our algorithms achieve the best performance bounds compared to the state-of-the-art approximation algorithms with efficient implementation for the same problem. Finally, we evaluate our algorithms in two concrete submodular data summarization applications for revenue maximization in social networks and image summarization, and the empirical results show that our algorithms outperform the existing ones in terms of both effectiveness and efficiency.


2021 ◽  
Vol 13 (9) ◽  
pp. 1623
Author(s):  
João E. Batista ◽  
Ana I. R. Cabral ◽  
Maria J. P. Vasconcelos ◽  
Leonardo Vanneschi ◽  
Sara Silva

Genetic programming (GP) is a powerful machine learning (ML) algorithm that can produce readable white-box models. Although successfully used for solving an array of problems in different scientific areas, GP is still not well known in the field of remote sensing. The M3GP algorithm, a variant of the standard GP algorithm, performs feature construction by evolving hyperfeatures from the original ones. In this work, we use the M3GP algorithm on several sets of satellite images over different countries to create hyperfeatures from satellite bands to improve the classification of land cover types. We add the evolved hyperfeatures to the reference datasets and observe a significant improvement of the performance of three state-of-the-art ML algorithms (decision trees, random forests, and XGBoost) on multiclass classifications and no significant effect on the binary classifications. We show that adding the M3GP hyperfeatures to the reference datasets brings better results than adding the well-known spectral indices NDVI, NDWI, and NBR. We also compare the performance of the M3GP hyperfeatures in the binary classification problems with those created by other feature construction methods such as FFX and EFS.


2021 ◽  
Vol 563 ◽  
pp. 342-357
Author(s):  
Nitin Kumar Mishra ◽  
Pramod Kumar Singh
Keyword(s):  

Sign in / Sign up

Export Citation Format

Share Document