sparse data sets
Recently Published Documents


Total documents: 45 (five years: 5)
H-index: 13 (five years: 1)

2021
Author(s): Lisa Maria Steinheuer, Sebastian Canzler, Jörg Hackermüller

Abstract Gene correlation network inference from single-cell transcriptomics data could provide unprecedented insights into cell type-specific regulatory programs. scRNA-seq data are severely affected by dropout, which significantly hampers current downstream analysis. Although newly developed tools are capable of dealing with sparse data, no appropriate single-cell network inference workflow has been established. A potential way to break this deadlock is the application of data imputation methods, which have already proved useful in specific contexts of single-cell data analysis, e.g., recovering cell clusters. Inferring cell type-specific networks requires two prerequisites: the identification of cluster-specific cell types and the network inference itself. Here, we propose a benchmarking framework to investigate both objectives. Using suitable reference data with an inherent correlation structure, six representative imputation tools, and appropriate evaluation measures, we systematically assessed the impact of data imputation on network inference. Major network structures were preserved in low-dropout data sets. For moderately sparse data sets, DCA was able to recover gene correlation structures, although it systematically introduced higher correlation values. No imputation tool was able to recover true signals from high-dropout data. However, using an additional biological data set, we showed that cell-cell correlation based on specific marker gene expression was not compromised by data imputation. Our analysis shows that network inference is feasible for low and moderately sparse data sets using the unimputed and DCA-prepared data, respectively. Highly sparse data, on the other hand, still pose a major problem, since current imputation techniques do not make network inference possible on them.
Data imputation does not hamper the annotation of cluster-specific cell types, a prerequisite for this workflow, but its power to restore deeply hidden correlation structures is still insufficient.
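The abstract does not spell out how gene correlation networks are built or scored, so the following is only a toy sketch of why dropout hurts correlation-based network inference. It assumes Pearson correlation with a hypothetical edge threshold of |r| >= 0.5, toy Gaussian expression values (not scRNA-seq counts), dropout simulated by zeroing entries at random, and precision/recall as the evaluation measure. The helper names (`correlation_network`, `simulate_dropout`, `edge_recovery`, `toy_expression`) are hypothetical, and no imputation tool such as DCA is included.

```python
import numpy as np

# Hypothetical edge threshold on |Pearson r|; not taken from the study.
EDGE_THRESHOLD = 0.5


def correlation_network(expr: np.ndarray, threshold: float = EDGE_THRESHOLD) -> set:
    """Infer an undirected gene network: edge (i, j), i < j, if |r_ij| >= threshold.

    `expr` is a cells-by-genes matrix, so genes are the columns.
    """
    corr = np.corrcoef(expr, rowvar=False)  # gene-by-gene correlation matrix
    n_genes = corr.shape[0]
    return {
        (i, j)
        for i in range(n_genes)
        for j in range(i + 1, n_genes)
        if abs(corr[i, j]) >= threshold
    }


def simulate_dropout(expr: np.ndarray, rate: float, rng: np.random.Generator) -> np.ndarray:
    """Set each entry to zero independently with probability `rate` (toy dropout model)."""
    dropped = rng.random(expr.shape) < rate
    return np.where(dropped, 0.0, expr)


def edge_recovery(true_edges: set, inferred: set) -> tuple:
    """Precision and recall of the inferred edges against the reference network."""
    hits = len(true_edges & inferred)
    precision = hits / len(inferred) if inferred else 0.0
    recall = hits / len(true_edges) if true_edges else 0.0
    return precision, recall


def toy_expression(n_cells: int, rng: np.random.Generator) -> np.ndarray:
    """Genes 0-4 share a latent factor (a co-expressed module); genes 5-9 are independent."""
    latent = rng.normal(size=(n_cells, 1))
    module = latent + 0.3 * rng.normal(size=(n_cells, 5))
    background = rng.normal(size=(n_cells, 5))
    return 2.0 + np.hstack([module, background])  # shift so zeros stand out as dropouts


rng = np.random.default_rng(0)
expr = toy_expression(500, rng)
true_edges = {(i, j) for i in range(5) for j in range(i + 1, 5)}  # the 10 module pairs

for rate in (0.0, 0.1, 0.5):
    inferred = correlation_network(simulate_dropout(expr, rate, rng))
    precision, recall = edge_recovery(true_edges, inferred)
    print(f"dropout={rate:.1f} precision={precision:.2f} recall={recall:.2f}")
```

On this toy data all ten module edges are recovered without dropout, while at a 50% dropout rate the injected zeros dilute the module correlations far below the threshold. Precision and recall are just one simple choice of evaluation measure; they are not necessarily the measures used in the study.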


Author(s): David W. Bullock

Abstract The relative importance of key state-level outcomes for U.S. national corn and soybean production was examined using correlated component regression, a recently developed regression technique for multicollinear and sparse data sets. Standardized coefficients were used to rank the states' relative importance, and a Herfindahl-Hirschman Index was used to measure the degree of concentration among the top-ranked states. The empirical analysis covered two time periods: pre-genetic modification (1975–1995) and post-genetic modification (1996–2017). The results indicate that U.S. corn production is becoming less geographically concentrated in terms of state-level importance, while the opposite holds for soybean production.
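The Herfindahl-Hirschman Index is the sum of squared shares, HHI = Σ s_i², ranging from 1/n (equal importance) to 1 (a single dominant unit). The abstract does not say how shares were derived from the standardized coefficients. The sketch below assumes each state's share is its absolute standardized coefficient divided by the sum over the states considered, with an optional hypothetical `top_k` cutoff standing in for "top-ranked states". The function name and the coefficient values are made up for illustration and are not results from the study.

```python
def hhi(coefficients, top_k=None):
    """Herfindahl-Hirschman Index of absolute coefficients (assumed share definition).

    Shares are |b_i| / sum(|b_j|); the index is the sum of squared shares.
    If top_k is given, only the top_k largest |b_i| are kept before computing shares.
    """
    magnitudes = sorted((abs(b) for b in coefficients), reverse=True)
    if top_k is not None:
        magnitudes = magnitudes[:top_k]
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("at least one coefficient must be non-zero")
    return sum((m / total) ** 2 for m in magnitudes)


# Hypothetical standardized coefficients for five states in two periods (not from the study).
period_a = {"A": 0.45, "B": 0.30, "C": 0.15, "D": 0.05, "E": 0.05}
period_b = {"A": 0.25, "B": 0.22, "C": 0.20, "D": 0.18, "E": 0.15}

for label, coefs in (("period A", period_a), ("period B", period_b)):
    # Rank states by the magnitude of their standardized coefficient.
    ranking = sorted(coefs, key=lambda s: abs(coefs[s]), reverse=True)
    print(label, ranking, round(hhi(coefs.values()), 3))
```

Under these assumptions, a lower index means importance is spread more evenly across states, so a drop in the index from one period to the next would read as declining geographic concentration.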


Author(s): Masahiko Gosho, Tomohiro Ohigashi, Kengo Nagashima, Yuri Ito, Kazushi Maruo

Author(s): Anindita Borah, Bhabesh Nath

Abstract Pattern mining has emerged as a compelling field of data mining over the years. The literature contains a large body of work in this field, ranging from frequent pattern mining to rare pattern mining. A precise and impartial analysis of existing pattern mining techniques has therefore become essential to widen the scope of data analysis using pattern mining. This paper attempts to provide a comparative examination of the fundamental pattern mining algorithms through a performance analysis based on several decisive parameters. It provides a structural classification of widely referenced techniques into four pattern mining categories: frequent, maximal frequent, closed frequent, and rare. It then compares these techniques analytically in terms of computational time and memory consumption on benchmark real and synthetic data sets. The results show that tree-based approaches perform markedly better than level-wise approaches on dense data sets in all categories, whereas level-wise approaches perform better on sparse data sets. The study aims to analyze the pros and cons of well-known pattern mining techniques in the different categories. Through this empirical study, it also aims to help researchers identify fruitful and promising research directions in pattern mining, one of the most remarkable areas of research.
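Apriori is the classic level-wise approach: it grows itemsets one item at a time, pruning any candidate that has an infrequent subset. The sketch below implements a plain Apriori and then derives the closed and maximal frequent itemsets from its output. It is not one of the implementations benchmarked in the paper, tree-based methods such as FP-growth are not shown, and the basket data and `min_support` value (an absolute transaction count) are illustrative assumptions. One commonly cited reason level-wise methods struggle on dense data is that the number of candidates and support-counting passes grows with the length of the longest frequent itemset, which tends to be larger in dense data.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise frequent itemset mining (Apriori).

    Returns {frozenset(itemset): support_count} for every itemset contained in at
    least `min_support` transactions (an absolute count, not a fraction).
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Candidate generation: join pairs of frequent (k-1)-itemsets, keeping a
        # union only if all of its (k-1)-subsets are frequent (the Apriori property).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count each candidate's support by scanning the transactions.
        current = {}
        for cand in candidates:
            count = sum(1 for t in transactions if cand <= t)
            if count >= min_support:
                current[cand] = count
        frequent.update(current)
        k += 1
    return frequent


def closed_and_maximal(frequent):
    """Closed: no frequent proper superset has the same support.
    Maximal: no frequent proper superset exists at all."""
    closed, maximal = set(), set()
    for s, c in frequent.items():
        supersets = [t for t in frequent if s < t]
        if all(frequent[t] < c for t in supersets):
            closed.add(s)
        if not supersets:
            maximal.add(s)
    return closed, maximal


# Toy market-basket data (illustrative only).
baskets = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
frequent = apriori(baskets, min_support=3)
closed, maximal = closed_and_maximal(frequent)
print(len(frequent), len(closed), len(maximal))  # prints: 8 7 4
```

Here {beer} is frequent but not closed, because its superset {diaper, beer} has the same support of 3; the four frequent pairs are the maximal itemsets.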


2018, Vol 10 (1)
Author(s): Alexander Kensert, Jonathan Alvarsson, Ulf Norinder, Ola Spjuth

2016, Vol 11 (2), pp. 148-161
Author(s): Dijana Oreški, Mario Konecki
