mining methods and algorithms
Recently Published Documents


TOTAL DOCUMENTS

3
(FIVE YEARS 1)

H-INDEX

1
(FIVE YEARS 0)

2021 ◽  
Vol 12 (5-2021) ◽  
pp. 91-103
Author(s):  
Olga V. Fridman ◽  

The article gives a brief overview of Data Mining methods and algorithms used to solve tasks in which both quantitative and qualitative data have to be processed. The purpose of the review is to describe these methods and algorithms briefly and to list the sources in which they are described in detail. The features of existing approaches to such problems are considered, and modern methods for solving Data Mining problems are analyzed.


Author(s):  
L. V. Rudikova

The evolution of approaches to accumulating data in a warehouse for subsequent Data Mining is a promising topic because the Belarusian segment of such IT developments is still taking shape. The article describes a general concept for creating a system for the storage and analysis of practice-oriented data, based on data warehousing technology. The main design idea at the storage and data-handling layer is an extended data warehouse built on a universal platform of stored data. It provides access to the storage and subsequent analysis of data that differ in structure and subject domain but share common points (nodes), and it offers extended functionality, including a choice of data structure for storage and subsequent intrasystem integration. The general architecture of the universal system and its structural elements are described. The main components of the universal system for storing and processing practice-oriented data are online data sources, the ETL process, the data warehouse, the analysis subsystem, and users. Analytical data processing, information search, document storage, and a software interface for accessing the system's functionality from outside occupy an important place in the system. A universal system based on the described concept will make it possible to collect information from different subject domains, obtain analytical summaries, process data, and apply appropriate Data Mining methods and algorithms.


Blood ◽  
2010 ◽  
Vol 116 (21) ◽  
pp. 2973-2973
Author(s):  
Brian Van Ness ◽  
Majda Haznadar ◽  
Gang Fang ◽  
Wen Wang ◽  
Vanja Paunic ◽  
...  

Abstract 2973 Disease risk and therapeutic outcomes are impacted by both tumor heterogeneity and germline variations found in the population. Multiple myeloma (MM) shows significant heterogeneity in genetic aberrations in tumor cells, which, together with inherited polymorphisms, affects disease risk and therapeutic response. In order to identify the impact of genetic variations (SNPs) on MM, we have developed a Bank On A Cure (BOAC) platform for examining 3404 SNPs, selected in 983 genes associated with pathways affecting cellular functions important in cancer. Using SNP data sets, we sought to identify genetic interactions beyond single univariate association analysis. The challenge was to use data mining methods that take into account relatively small cohorts of patients, in which false discovery rates typically exceed the power of the study. We report results from using novel computational approaches that efficiently identify higher order SNP interactions associated with disease risk as well as survival outcomes, while minimizing the false discovery rate. The BOAC SNP panel was used to develop a database on 143 patients selected for short (<1yr) versus long (>3yr) survival in ECOG 9486 and SWOG 9321, as well as 247 newly diagnosed patients and an equal number of controls for disease risk analysis. One algorithm developed employs a discriminative pattern mining approach in which defined pathway sets of SNPs are used in combination testing. A second algorithm identified SNPs that had some association with outcome (survival or disease status) but demonstrated a significant increase in association when examined in combinations; we refer to this as a p-value jump association.
Variations in genes associated with cell cycle, apoptosis, drug metabolism, stress response and immunity reached very low p-values and survived multiple comparison testing when analyzed in combinations associated with both survival (PFS) predictions and case-control disease risk. Some of the key genetic variations identified in various combinations included: PTRB, PTEN, CDK5, XRCC4, GSTA4, GPX, DYPD, PCNA, CYP4F2, VEGF, PON1, ALK, and BAG3. The data mining methods and algorithms used, and specific combinations associated with risk and survival, will be presented. These results are being further validated in new cohorts, and functional implications of identified genetic variants are being investigated in HapMap cell lines. Disclosures: No relevant conflicts of interest to declare.
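
The abstract does not define how a "p-value jump" is computed, so the following is only an illustrative sketch of one possible reading, not the authors' algorithm. It assumes binary carrier status per SNP, a case-control outcome, a two-sided Fisher exact test, and a "jump" defined as the ratio of the best single-SNP p-value to the p-value of the two-SNP combination. All function names and the toy data are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def association_p(carrier, is_case):
    """P-value for carrier status versus case/control status."""
    counts = [[0, 0], [0, 0]]  # rows: carrier / non-carrier, columns: case / control
    for x, y in zip(carrier, is_case):
        counts[0 if x else 1][0 if y else 1] += 1
    (a, b), (c, d) = counts
    return fisher_exact_two_sided(a, b, c, d)


def p_value_jump(snp1, snp2, is_case):
    """Assumed definition: best single-SNP p-value divided by the pair's p-value."""
    p1 = association_p(snp1, is_case)
    p2 = association_p(snp2, is_case)
    both = [x and y for x, y in zip(snp1, snp2)]  # carriers of both variants
    p12 = association_p(both, is_case)
    return min(p1, p2) / p12, (p1, p2, p12)


# Toy data (invented for illustration): 20 cases followed by 20 controls.
is_case = [True] * 20 + [False] * 20
snp1 = [True] * 10 + [False] * 10 + [True] * 5 + [False] * 15
snp2 = [True] * 10 + [False] * 10 + [False] * 5 + [True] * 5 + [False] * 10

jump, (p1, p2, p12) = p_value_jump(snp1, snp2, is_case)
print(f"p(SNP1)={p1:.3g}  p(SNP2)={p2:.3g}  p(both)={p12:.3g}  jump={jump:.3g}")
```

In the toy data each SNP alone is only weakly associated with case status, while carrying both variants occurs only in cases, so the combination p-value is far smaller than either single-SNP p-value. A real analysis of this kind would also need a multiple-comparison correction across all tested combinations, which this sketch omits.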

