LDRD 99-ERI-010 Final Report: Sapphire: Scalable Pattern Recognition for Large-Scale Scientific Data Mining

2002 ◽  
Author(s):  
C Kamath
2002 ◽  
Vol 14 (4) ◽  
pp. 731-749 ◽  
Author(s):  
Xiong Wang ◽  
J.T.L. Wang ◽  
D. Shasha ◽  
B.A. Shapiro ◽  
I. Rigoutsos ◽  
...  

Author(s):  
Rahul Ramachandran ◽  
Sara Graves ◽  
John Rushing ◽  
Ken Keizer ◽  
Manil Maskey ◽  
...  

2017 ◽  
Author(s):  
Joon-Yong Lee ◽  
Grant M. Fujimoto ◽  
Ryan Wilson ◽  
H. Steven Wiley ◽  
Samuel H. Payne

Abstract: Identifying similarities between datasets is a fundamental task in data mining and has become an integral part of modern scientific investigation. Whether the task is to identify co-expressed genes in large-scale expression surveys or to predict combinations of gene knockouts that would elicit a similar phenotype, the underlying computational task is often a multi-dimensional similarity test. As datasets continue to grow, improvements to the efficiency, sensitivity, or specificity of such computation will have broad impact, as they allow scientists to explore the wealth of scientific data more completely. A significant practical drawback of large-scale data mining is that the vast majority of pairwise comparisons are unlikely to be relevant, meaning that the compared datasets do not share a signature of interest. It is therefore essential to identify these unproductive comparisons as rapidly as possible and exclude them from more time-intensive similarity calculations. The Blazing Signature Filter (BSF) is a highly efficient pairwise similarity algorithm that enables extensive data mining within a reasonable amount of time. The algorithm transforms datasets into binary metrics, allowing it to use computationally efficient bit operators and provide a coarse measure of similarity. As a result, the BSF can scale to high dimensionality and rapidly filter out unproductive pairwise comparisons. Two bioinformatics applications of the tool are presented to demonstrate its ability to scale to billions of pairwise comparisons and the usefulness of this approach.
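The general idea of a binary pre-filter can be illustrated with a minimal Python sketch. It is not the BSF implementation: the abstract does not say how datasets are binarized or how the coarse similarity is scored, so the thresholding rule, the shared-bit count, and every name here (`binarize`, `shared_bits`, `filter_pairs`, `threshold`, `min_shared`) are assumptions made for illustration only.

```python
from itertools import combinations


def binarize(values, threshold):
    """Pack a numeric signature into one int; bit i is set when values[i] > threshold.

    The thresholding rule is an assumption, not taken from the abstract.
    """
    bits = 0
    for i, value in enumerate(values):
        if value > threshold:
            bits |= 1 << i
    return bits


def shared_bits(a, b):
    """Coarse similarity: number of positions set in both packed signatures."""
    return bin(a & b).count("1")


def filter_pairs(signatures, threshold, min_shared):
    """Return only the pairs that share at least min_shared set bits.

    Pairs below the cutoff are dropped before any costlier similarity
    calculation would be run on them.
    """
    packed = {name: binarize(vals, threshold) for name, vals in signatures.items()}
    return [
        (x, y)
        for x, y in combinations(sorted(packed), 2)
        if shared_bits(packed[x], packed[y]) >= min_shared
    ]


# Hypothetical toy data: three signatures over four dimensions.
signatures = {
    "a": [2.0, 0.1, 3.0, 0.0],
    "b": [1.5, 0.2, 2.5, 0.1],
    "c": [0.0, 4.0, 0.0, 0.2],
}
candidates = filter_pairs(signatures, threshold=1.0, min_shared=2)
```

Packing each signature into a single integer lets one AND plus a bit count stand in for a per-dimension loop, which is the kind of saving the abstract attributes to bit operators.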


2021 ◽  
Vol 284 ◽  
pp. 04018
Author(s):  
Akhram Nishanov ◽  
Bakhtiyorjon Akbaraliev ◽  
Rasul Beglerbekov ◽  
Oybek Akhmedov ◽  
Shukhrat Tajibaev ◽  
...  

Feature selection is one of the most important issues in Data Mining and Pattern Recognition. A correctly selected feature, or set of features, in the final report determines the success of further work, in particular the solution of the classification and forecasting problem. This work is devoted to the development and study of an analytical method for determining informative attribute sets (IAS), taking into account the resource for criteria based on the use of the scattering measure of classified objects. The areas in which a solution exists are determined. Statements and properties are proved for the Fisher-type informativeness criterion, with which the proposed analytical method for determining IAS guarantees optimal results in the sense of maximizing the selected functional. The relevance of choosing this type of informativeness criterion is substantiated. The universality of the method with respect to the type of features is shown. An algorithm for implementing this method is presented. In addition, the paper discusses the dynamics of the growth of information in the world, problems associated with big data, and the problems and tasks of data preprocessing. The relevance of reducing the dimension of the attribute space, so that data processing and visualization can be carried out without unnecessary difficulties, is substantiated. The disadvantages of existing methods and algorithms for choosing an informative set of attributes are shown.
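As a rough illustration of what a Fisher-type informativeness score looks like, the Python sketch below ranks features for a two-class problem by squared difference of class means divided by summed within-class variance. This is one common textbook form and only a stand-in: the abstract does not give the paper's functional or its analytical selection method, and the names `fisher_scores` and `top_features` as well as the sample data are hypothetical.

```python
from statistics import mean, pvariance


def fisher_scores(class_a, class_b):
    """Per-feature score: (mean_a - mean_b)^2 / (var_a + var_b).

    class_a and class_b are lists of equal-length feature rows.
    This form of the criterion is an assumption, not the paper's definition.
    """
    scores = []
    for j in range(len(class_a[0])):
        col_a = [row[j] for row in class_a]
        col_b = [row[j] for row in class_b]
        between = (mean(col_a) - mean(col_b)) ** 2
        within = pvariance(col_a) + pvariance(col_b)
        if within > 0:
            scores.append(between / within)
        else:
            # No scatter inside either class: perfectly separating or useless.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def top_features(class_a, class_b, k):
    """Indices of the k highest-scoring features, best first."""
    scores = fisher_scores(class_a, class_b)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical toy data: feature 1 separates the classes, feature 0 barely does.
class_a = [[1.0, 5.0], [2.0, 6.0], [3.0, 7.0]]
class_b = [[1.5, 15.0], [2.5, 16.0], [3.5, 17.0]]
scores = fisher_scores(class_a, class_b)
best = top_features(class_a, class_b, k=1)
```

Ranking features one at a time is a greedy simplification; it does not by itself guarantee the optimality of the selected set that the abstract claims for the analytical method.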

