Hierarchical Feature Selection with Recursive Regularization

Author(s):  
Hong Zhao ◽  
Pengfei Zhu ◽  
Ping Wang ◽  
Qinghua Hu

In the big data era, datasets have grown dramatically in the number of samples, features, and classes. In particular, there usually exists a hierarchical structure among the classes, and classification over such a structure is called hierarchical classification. Various algorithms have been developed to select informative features for flat classification. However, these algorithms ignore the semantic hyponymy in the directory of hierarchical classes and select a single uniform subset of features for all classes. In this paper, we propose a new technique for hierarchical feature selection based on recursive regularization. The algorithm takes the hierarchical information of the class structure into account. As opposed to flat feature selection, we select a different feature subset for each node in a hierarchical tree structure, using the parent-children relationships and the sibling relationships for hierarchical regularization. By imposing $\ell_{2,1}$-norm regularization on different parts of the hierarchical classes, we can learn a sparse matrix for the feature ranking of each node. Extensive experiments on public datasets demonstrate the effectiveness of the proposed algorithm.
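To make the role of the $\ell_{2,1}$-norm concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical weight matrix `W` for a single node, stored with one row per feature and one column per child class; the abstract does not specify this layout or the full objective. The $\ell_{2,1}$-norm sums the Euclidean norms of the rows, so penalizing it pushes entire rows toward zero, and the surviving row norms can be used to rank features.

```python
import math

def row_norms(W):
    """Euclidean norm of each row of W (one row per feature)."""
    return [math.sqrt(sum(w * w for w in row)) for row in W]

def l21_norm(W):
    """l2,1-norm: the sum of the Euclidean norms of the rows of W."""
    return sum(row_norms(W))

def rank_features(W):
    """Feature indices ordered from largest to smallest row norm."""
    norms = row_norms(W)
    return sorted(range(len(W)), key=lambda i: norms[i], reverse=True)

# Hypothetical weights for one node: 3 features, 2 child classes.
W = [
    [3.0, 4.0],  # feature 0: row norm 5
    [0.0, 0.0],  # feature 1: row norm 0 (effectively discarded)
    [1.0, 0.0],  # feature 2: row norm 1
]

print(l21_norm(W))
print(rank_features(W))
```

In a hierarchical setting, one such matrix would be learned per internal node, which is what allows each node to end up with its own feature subset.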

2021 ◽  
Author(s):  
Combiz Khozoie ◽  
Nurun Fancy ◽  
Mahdi Moradi Marjaneh ◽  
Alan E. Murphy ◽  
Paul M. Matthews ◽  
...  

Advances in single-cell RNA-sequencing technology over the last decade have enabled exponential increases in throughput: datasets with over a million cells are becoming commonplace. The burgeoning scale of data generation, combined with the proliferation of alternative analysis methods, led us to develop the scFlow toolkit and the nf-core/scflow pipeline for reproducible, efficient, and scalable analyses of single-cell and single-nuclei RNA-sequencing data. The scFlow toolkit provides a higher level of abstraction on top of popular single-cell packages within an R ecosystem, while the nf-core/scflow Nextflow pipeline is built within the nf-core framework to enable compute infrastructure-independent deployment across all institutions and research facilities. Here we present our flexible pipeline, which leverages the advantages of containerization and the potential of cloud computing for easy orchestration and scaling of the analysis of large case/control datasets, even by non-expert users. We demonstrate the functionality of the analysis pipeline, from sparse-matrix quality control through to insight discovery, with example analyses of four recently published public datasets, and we describe the extensibility of scFlow as a modular, open-source tool for single-cell and single-nuclei bioinformatic analyses.


Author(s):  
Gerald Schaefer

As image databases grow, efficient and effective methods for managing such large collections are highly sought after. Content-based approaches have shown great potential in this area, as they do not require textual annotation of images. However, while query-by-example is currently the most commonly adopted retrieval method for image databases, it is of only limited practical use. Techniques that allow human-centred navigation and visualization of complete image collections therefore provide an interesting alternative. In this chapter we present an effective and efficient approach for user-centred navigation of large image databases. Image thumbnails are projected onto a spherical surface so that visually similar images are located close to each other in the visualization space. To avoid overlapping and occlusion effects, images are placed on a regular grid structure, while large databases are handled through a clustering technique paired with a hierarchical tree structure, which allows for an intuitive real-time browsing experience.


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Shahidul Islam Khan ◽  
Abu Sayed Md Latiful Hoque

In data analytics, missing data is a factor that degrades performance. Incorrect imputation of missing values can lead to wrong predictions. In this era of big data, when a massive volume of data is generated every second and the utilization of these data is a major concern to stakeholders, handling missing values efficiently becomes more important. In this paper, we propose a new technique for missing data imputation, which is a hybrid of single and multiple imputation techniques. We propose an extension of the popular Multivariate Imputation by Chained Equations (MICE) algorithm in two variations to impute categorical and numeric data. We have also implemented twelve existing algorithms to impute binary, ordinal, and numeric missing values. We collected sixty-five thousand real health records from different hospitals and diagnostic centers of Bangladesh, maintaining the privacy of the data. We also collected three public datasets from the UCI Machine Learning Repository, ETH Zurich, and Kaggle. We compared the performance of our proposed algorithms with existing algorithms using these datasets. Experimental results show that our proposed algorithm achieves a 20% higher F-measure for binary data imputation and 11% lower error for numeric data imputation than its competitors, with similar execution time.


2012 ◽  
Vol 155-156 ◽  
pp. 375-380 ◽  
Author(s):  
Wu Ling Ren ◽  
Jin Ju Guo

To make word similarity results more reasonable and accurate, a new word similarity algorithm is proposed. It uses the hierarchical tree structure of HowNet primitives and calculates the distance between two primitives with a method for computing WordNet node distance, which considers tree depth, density, path, connection strength, and other factors. Moreover, the algorithm improves the method that converts distance into similarity. Finally, the algorithm is compared with related algorithms experimentally. The results show that the proposed algorithm effectively improves the precision and accuracy of word similarity calculation.
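The abstract does not give the distance or conversion formulas, so the following is only an illustrative sketch of the general idea, not the proposed algorithm. It measures the plain edge-count distance between two nodes of a tree (ignoring the depth, density, and connection-strength terms mentioned above) and converts it to a similarity with the form `alpha / (d + alpha)`, a conversion often seen in taxonomy-based similarity measures. The toy taxonomy, the function names, and the value of `alpha` are all assumptions made for illustration.

```python
def ancestors(node, parent):
    """Path from node up to the root, starting with node itself."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def path_distance(a, b, parent):
    """Number of edges between a and b via their lowest common ancestor."""
    steps_from_b = {n: i for i, n in enumerate(ancestors(b, parent))}
    for steps_from_a, n in enumerate(ancestors(a, parent)):
        if n in steps_from_b:
            return steps_from_a + steps_from_b[n]
    raise ValueError("nodes are not in the same tree")

def similarity(a, b, parent, alpha=1.6):
    """Map distance d to (0, 1]; alpha is an illustrative tuning constant."""
    return alpha / (path_distance(a, b, parent) + alpha)

# Hypothetical toy taxonomy given as child -> parent links.
parent = {
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
}

print(path_distance("dog", "cat", parent))
print(similarity("dog", "cat", parent))
print(similarity("dog", "plant", parent))
```

With this conversion, identical nodes get similarity 1 and the value decays smoothly as the path grows, so siblings such as `dog` and `cat` score higher than nodes that only meet at the root.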


2017 ◽  
Vol 21 (4) ◽  
pp. 945-962 ◽  
Author(s):  
Sebastián Maldonado ◽  
Guillermo Armelini ◽  
C. Angelo Guevara
