Identification of Single Nucleotide Polymorphism Interactions Associated with Survival and Risk In Multiple Myeloma Using Novel Data Mining Methods

Blood ◽  
2010 ◽  
Vol 116 (21) ◽  
pp. 2973-2973
Author(s):  
Brian Van Ness ◽  
Majda Haznadar ◽  
Gang Fang ◽  
Wen Wang ◽  
Vanja Paunic ◽  
...  

Abstract 2973. Disease risk and therapeutic outcomes are affected by both tumor heterogeneity and germline variations found in the population. Multiple myeloma (MM) shows significant heterogeneity in genetic aberrations in tumor cells that, together with inherited polymorphisms, affect disease risk and therapeutic response. To identify the impact of genetic variations (SNPs) on MM, we developed a Bank On A Cure (BOAC) platform for examining 3404 SNPs, selected in 983 genes associated with pathways affecting cellular functions important in cancer. Using SNP data sets, we sought to identify genetic interactions beyond single univariate association analysis. The challenge was to use data mining methods that take into account relatively small cohorts of patients, in which false discovery rates typically exceed the power of the study. We report results from novel computational approaches that efficiently identify higher-order SNP interactions associated with disease risk as well as survival outcomes, while minimizing the false discovery rate. The BOAC SNP panel was used to develop a database on 143 patients selected for short (<1yr) versus long (>3yr) survival in ECOG 9486 and SWOG 9321, as well as 247 newly diagnosed patients and an equal number of controls for disease risk analysis. One algorithm employs a discriminative pattern mining approach in which defined pathway sets of SNPs are used in combination testing. A second algorithm identified SNPs that had some association with outcome (survival or disease status) on their own but showed a significant increase in association when examined in combinations; we refer to this as a p-value jump association.
Variations in genes associated with cell cycle, apoptosis, drug metabolism, stress response and immunity reached very low p-values and survived multiple comparison testing when analyzed in combinations, both for survival (PFS) prediction and for case-control analysis of disease risk. Key genetic variations identified in various combinations included PTRB, PTEN, CDK5, XRCC4, GSTA4, GPX, DYPD, PCNA, CYP4F2, VEGF, PON1, ALK, and BAG3. The data mining methods and algorithms used, and the specific combinations associated with risk and survival, will be presented. These results are being further validated in new cohorts, and the functional implications of the identified genetic variants are being investigated in HapMap cell lines. Disclosures: No relevant conflicts of interest to declare.
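
The idea of a p-value jump can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes a one-sided Fisher exact test as the association measure, and the SNP labels (`snpA`, `snpB`), the toy cohort, and all function names are hypothetical. It only shows how a pair of SNPs can be far more strongly associated with an outcome than either SNP alone.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are carriers / non-carriers, columns are outcome group 1 / group 2.
    Returns P(X >= a) under the hypergeometric null of no association.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


def pattern_p_value(patients, pattern):
    """p-value for enrichment of carriers of every SNP in `pattern` in group 1."""
    a = b = c = d = 0
    for carried_snps, in_group1 in patients:
        carrier = pattern <= carried_snps  # carries all SNPs in the pattern
        if carrier and in_group1:
            a += 1
        elif carrier:
            b += 1
        elif in_group1:
            c += 1
        else:
            d += 1
    return fisher_one_sided(a, b, c, d)


# Hypothetical toy cohort: (set of variant SNPs carried, short survival?)
patients = (
    [({"snpA", "snpB"}, True)] * 6
    + [({"snpA"}, True)] * 2
    + [({"snpB"}, True)] * 2
    + [({"snpA"}, False)] * 5
    + [({"snpB"}, False)] * 5
)

p_a = pattern_p_value(patients, {"snpA"})
p_b = pattern_p_value(patients, {"snpB"})
p_pair = pattern_p_value(patients, {"snpA", "snpB"})
# Ratio of the best single-SNP p-value to the pair p-value: the "jump".
jump = min(p_a, p_b) / p_pair
print(f"snpA: {p_a:.4f}  snpB: {p_b:.4f}  pair: {p_pair:.4f}  jump: {jump:.1f}x")
```

In a real analysis, the number of candidate combinations grows quickly, so any such score would still need correction for multiple comparisons, as the abstract notes.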

Author(s):  
Manish Gupta ◽  
Jiawei Han

Sequential pattern mining methods have been found to be applicable in a large number of domains. Sequential data is omnipresent. Sequential pattern mining methods have been used to analyze this data and identify patterns. Such patterns have been used to implement efficient systems that can recommend based on previously observed patterns, help in making predictions, improve usability of systems, detect events, and in general help in making strategic product decisions. In this chapter, we discuss the applications of sequential data mining in a variety of domains like healthcare, education, Web usage mining, text mining, bioinformatics, telecommunications, intrusion detection, et cetera. We conclude with a summary of the work.


2011 ◽  
Vol 109 ◽  
pp. 729-733
Author(s):  
Jiang Yin ◽  
Yun Li ◽  
Cen Cheng Shen ◽  
Bo Liu

Multi-relational sequential pattern mining is an area of data mining that has developed rapidly in recent years. However, the performance of traditional mining methods is not ideal. To mine patterns effectively, we propose an algorithm based on the Iceberg concept lattice that adopts partition and merge optimizations so that only the frequent sequences are mined. Experimental results show that this algorithm effectively reduces the time complexity of multi-relational sequential pattern mining.
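
For readers unfamiliar with the term, the sketch below shows what a frequent sequence is. It is a naive level-wise search over a single relation, not the Iceberg concept lattice algorithm described above. The toy event log and all names are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in the same order."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator until the item is found.
    return all(item in remaining for item in pattern)


def support(pattern, database):
    """Number of sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def frequent_sequences(database, min_support, max_length=3):
    """Level-wise search: extend each frequent pattern by one item at a time."""
    items = sorted({item for seq in database for item in seq})
    frequent = {}
    level = [(item,) for item in items]
    for _ in range(max_length):
        survivors = []
        for pattern in level:
            count = support(pattern, database)
            if count >= min_support:
                frequent[pattern] = count
                survivors.append(pattern)
        # Only frequent patterns are extended; infrequent ones cannot
        # become frequent by growing longer.
        level = [p + (item,) for p in survivors for item in items]
    return frequent


# Hypothetical event log, one sequence per user session.
sessions = [
    ["login", "search", "buy"],
    ["login", "buy"],
    ["search", "login", "buy"],
    ["login", "search"],
]

result = frequent_sequences(sessions, min_support=3)
for pattern, count in sorted(result.items()):
    print(pattern, count)
```

Every candidate is counted with a full pass over the database, which is the kind of cost that more elaborate methods aim to reduce.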


Author(s):  
Kun-Ming Yu ◽  
Sheng-Hui Liu ◽  
Li-Wei Zhou ◽  
Shu-Hao Wu

Frequent pattern mining plays an essential role in knowledge discovery and data mining tasks that try to find usable patterns in databases. Efficiency is especially crucial for an algorithm that must find frequent itemsets in a large database. Numerous methods have been proposed to solve this problem, such as Apriori and FP-growth, which are regarded as fundamental frequent pattern mining methods. In addition, parallel computing architectures, such as cloud platforms, grid systems, and multi-core and GPU platforms, have become popular in data mining. However, most of the algorithms have been proposed without considering the prevalent multi-core architectures. In this study, multi-core architectures were used together with two high-efficiency, load-balancing parallel data mining methods based on the Apriori algorithm. The main goal of the proposed algorithms was to reduce the massive number of duplicate candidates generated by previous methods. This goal was achieved: in the detailed experimental study, the algorithms performed better than the previous methods. The experimental results demonstrated that the proposed algorithms dramatically reduced computation time when using more threads. Moreover, the observations showed that the workload was equally balanced among the computing units.
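
As background, a minimal serial Apriori can be sketched as follows. It is the textbook join, prune, and count loop, not the parallel load-balanced variants proposed in the study, and the toy transactions are hypothetical. Building the candidates as a set is one simple way to avoid counting the same candidate twice.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        # The set comprehension removes duplicate candidates.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: one pass over the transactions per level.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

itemsets = apriori(baskets, min_support=3)
for itemset, count in sorted(itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The count step is independent per candidate, which is what makes it a natural target for splitting across threads.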


2014 ◽  
Vol 519-520 ◽  
pp. 189-192
Author(s):  
Zhuo Shi Li ◽  
Ran Shi Jiang ◽  
Jian Li

Honeypots are a new type of active-defense security technology. This paper attempts to use data mining methods to mine and analyze the information collected on a honeypot system. A research honeynet is built on Windows systems using virtual machine technology. The collected data are standardized and subjected to sequential pattern mining. The analysis looks for correlations between different data records and for frequent time-based sequences in the audit data, from which valuable attack patterns are identified.


2014 ◽  
Vol 490-491 ◽  
pp. 1361-1367
Author(s):  
Xin Huang ◽  
Hui Juan Chen ◽  
Mao Gong Zheng ◽  
Ping Liu ◽  
Jing Qian

With the advent of location-based social media and location-acquisition technologies, trajectory data are becoming more and more ubiquitous in the real world. A lot of data mining algorithms have been successfully applied to trajectory data sets. Trajectory pattern mining has received a lot of attention in recent years. In this paper, we review the most influential methods as well as typical applications within the context of trajectory pattern mining.


2018 ◽  
Vol 28 ◽  
pp. 01027
Author(s):  
Leszek Ośródka ◽  
Ewa Krajny ◽  
Marek Wojtylak

The paper presents an attempt to use selected data mining methods to determine the influence of a complex of meteorological conditions on the concentrations of PM10 (PM2.5), using the regions of Silesia and Northern Moravia as an example. The collection of standard meteorological data has been supplemented by increments and derivatives of measurable weather elements, such as the vertical pseudo-gradient of air temperature. The main objective was to develop a universal methodology for assessing these impacts, i.e. one that would be independent of the pollutant analysed. The probability of occurrence (at a given location) of the assumed concentration level, defined as exceeding the value of the specified distributional quintile, was adopted as the discriminant of the incidence. As a result of the analyses conducted, incidences of elevated PM10 concentrations have been identified, and the types of weather responsible for such situations have been determined.


2020 ◽  
Vol 15 (2) ◽  
pp. 124-139
Author(s):  
Amela Omerašević ◽  
Jasmina Selimović

Abstract: This paper investigates the impact of risk classification on life insurance ratemaking with particular reference to Bosnia and Herzegovina (BiH). The research is based on a sample of over eighteen thousand insurance policies for passenger vehicles collected over the period 2015-2020. In our empirical investigation we develop a standard risk model based on Poisson generalized linear models (GLM) for estimating claim frequency and Gamma GLM for estimating claim severity. The analysis reveals that GLM does not provide reliable parameter estimates for multi-level factor (MLF) categorical predictors. Although GLM is a widely used method to determine insurance premiums, improving GLM with the data mining methods identified in this paper may solve practical challenges for the risk models. The popularity of data mining methods in the actuarial community has been growing in recent years due to their efficiency and precision. These models are recommended for consideration in BiH and the South East European region in general.
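
The frequency-severity structure mentioned above can be sketched for the simplest case of a single categorical rating factor. The sketch relies on a standard property of these models: with one categorical predictor, a Poisson GLM with a log-exposure offset fits each level's frequency as total claims divided by total exposure, and a Gamma GLM on individual claim amounts fits each level's severity as the mean claim amount. No iterative fitting is needed in that special case. The rating factor, the records, and all names are hypothetical and are not taken from the paper's data.

```python
from collections import defaultdict


def fit_by_level(policies):
    """Per-level claim frequency, severity and pure premium.

    `policies` holds tuples (level, exposure_years, n_claims, total_claim_amount).
    """
    totals = defaultdict(lambda: [0.0, 0, 0.0])  # exposure, claims, amount
    for level, exposure, n_claims, amount in policies:
        totals[level][0] += exposure
        totals[level][1] += n_claims
        totals[level][2] += amount

    fitted = {}
    for level, (exposure, n_claims, amount) in totals.items():
        frequency = n_claims / exposure  # claims per policy-year
        # Severity is undefined for a level with no observed claims.
        severity = amount / n_claims if n_claims else None
        premium = frequency * severity if severity is not None else None
        fitted[level] = {
            "frequency": frequency,
            "severity": severity,
            "pure_premium": premium,
        }
    return fitted


# Hypothetical policy records for one rating factor with two levels.
policies = [
    ("urban", 1.0, 1, 1000.0),
    ("urban", 1.0, 0, 0.0),
    ("urban", 0.5, 1, 3000.0),
    ("rural", 1.0, 0, 0.0),
    ("rural", 1.0, 1, 500.0),
    ("rural", 2.0, 0, 0.0),
]

fitted = fit_by_level(policies)
for level, values in sorted(fitted.items()):
    print(level, values)
```

A level backed by only a handful of policies would get estimates driven by one or two claims, which is one plausible reading of why factors with many levels are hard to estimate reliably.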


2018 ◽  
Vol 232 ◽  
pp. 02049
Author(s):  
Dalin Xu ◽  
Yingmei Wei

Sequential pattern mining has long been an important branch of time series data mining. Mining patterns by visual means makes it possible to extract knowledge from time series data more intuitively. This paper analyzes sequential pattern mining methods from different aspects, along with their combination with visualization technology. We further discuss and summarize the advantages of different visualization methods in discovering potential patterns in time series data. Different systems and models each emphasize different information in what they display. By comparing the characteristics of these models, the development and evolution of visualization technology for discovering potential patterns in time series data can be summarized. Finally, the paper discusses the development trend of the field and how it can play a greater role in the era of big data.


2021 ◽  
Vol 12 (5-2021) ◽  
pp. 91-103
Author(s):  
Olga V. Fridman ◽  

The article provides a brief overview of Data Mining methods and algorithms used in solving various tasks where both quantitative and qualitative data have to be processed. The purpose of the review is to briefly describe the methods and algorithms and to list sources in which they are described in detail. The features of existing approaches to solving such problems are considered, and modern methods for solving Data Mining problems are analyzed.

