A Highly Scalable and Adaptable Co-Learning Framework on Multimodal Data Mining in a Multimedia Database

Data Mining ◽  
2013 ◽  
pp. 567-586
Author(s):  
Zhongfei (Mark) Zhang ◽  
Zhen Guo ◽  
Christos Faloutsos ◽  
Jia-Yu Pan

This chapter presents a highly scalable and adaptable co-learning framework for multimodal data mining in a multimedia database. The co-learning framework is grounded in multiple-instance learning theory. The framework offers strong scalability in the sense that the query time complexity is constant, independent of the database scale, and the mining effectiveness is likewise independent of the database scale, which makes multimodal querying feasible for very large multimedia databases. At the same time, the framework offers strong adaptability in the sense that the database index can be updated incrementally, with a constant amount of work, whenever new information is added to the database. In these respects the framework surpasses many existing multimodal data mining methods in the literature, which are neither scalable nor adaptable. Theoretical analysis and empirical evaluations demonstrate the advantages of this scalability and adaptability. While the framework is general and applies to multimodal data mining in any specific domain, the authors evaluate it on the Berkeley Drosophila ISH embryo image database to measure mining performance. They compare the framework with a state-of-the-art multimodal data mining method to demonstrate its effectiveness and promise.
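
To make the scalability and adaptability claims concrete, here is a minimal sketch (not the authors' actual construction) of how an index with constant query and update cost can behave as the abstract describes: if each item is mapped to one of a fixed set of learned prototypes, both querying and incremental insertion cost a single assignment plus a single hash operation, independent of the database size. The assign_prototype function and the class name below are hypothetical stand-ins for the MIL-based co-learning step.

# Minimal sketch, in Python, of a hash-based index whose query and
# update costs do not grow with the database size. The prototype
# assignment function is a hypothetical stand-in for the chapter's
# MIL-based co-learning step.
from collections import defaultdict

class ConstantTimeIndex:
    def __init__(self, assign_prototype):
        # assign_prototype maps an item (image region, keyword, ...) to a
        # prototype id; assumed to run in time independent of the DB size.
        self.assign_prototype = assign_prototype
        self.buckets = defaultdict(list)  # prototype id -> item ids

    def add(self, item_id, item):
        # Adaptability: one assignment and one append per new item,
        # a constant amount of work regardless of the database size.
        self.buckets[self.assign_prototype(item)].append(item_id)

    def query(self, query_item):
        # Scalability: one assignment plus one hash lookup, so the
        # query cost does not depend on the number of indexed items.
        return self.buckets[self.assign_prototype(query_item)]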

Author(s):  
Zhongfei (Mark) Zhang ◽  
Zhen Guo ◽  
Jia-Yu Pan

This paper presents a multiple-instance-learning-based approach to multimodal data mining in a multimedia database. The approach is a highly scalable and adaptable framework that the authors call co-learning. Theoretical analysis and empirical evaluations demonstrate the advantages of this strong scalability and adaptability. Although the framework is general and applies to multimodal data mining in any specific domain, the authors evaluate it on the Berkeley Drosophila ISH embryo image database, comparing its mining performance with that of a state-of-the-art multimodal data mining method to showcase the promise of the co-learning framework.


2016 ◽  
Vol 10 (3) ◽  
pp. 1-30 ◽  
Author(s):  
Zhen Guo ◽  
Zhongfei (Mark) Zhang ◽  
Eric P. Xing ◽  
Christos Faloutsos

2020 ◽  
Author(s):  
Jordan Anaya ◽  
John-William Sidhom ◽  
Craig A. Cummings ◽  
Alexander S. Baras

Deep learning has the ability to extract meaningful features from data given enough training examples. Large-scale genomic data are well suited to this class of machine learning algorithms; however, for many of these data the labels are available at the level of the sample rather than at the level of the individual genomic measures. To leverage the power of deep learning for these types of data, we turn to a multiple instance learning framework and present an easily extensible tool built with TensorFlow and Keras. We show how this tool can be applied to somatic variants (featurizing genomic position and sequence context) to accurately classify samples according to whether they contain a specific variant (hotspot or tumor suppressor) or a type of variant (microsatellite instability). We then apply our model to the calibration of tumor mutational burden (TMB), an increasingly important metric in the field of immunotherapy, across a variety of commonly used gene panels. Regardless of the panel, we observed improvements in regression to the gold-standard whole-exome-derived value for this metric, with additional performance benefits as more data were provided to the model (such as noncoding variants from panel assays). Our results suggest this framework could lead to improvements in a range of tasks where a sample-level metric is determined by aggregating a set of genomic measures, such as the somatic mutations we focused on in this study.
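
As a concrete illustration of the multiple instance pattern the abstract describes (a shared per-instance encoder followed by permutation-invariant pooling into a sample-level prediction), here is a minimal Keras sketch. The layer sizes, input shapes, and variable names are illustrative assumptions, not the authors' released architecture.

# Minimal multiple-instance sketch in TensorFlow/Keras: each variant in
# a sample is embedded by a shared encoder, the bag of embeddings is
# pooled into one sample-level vector, and a head predicts the
# sample-level label. Sizes are illustrative, not the released tool.
import tensorflow as tf

n_instances, n_features = 100, 16  # variants per sample, features per variant

inputs = tf.keras.Input(shape=(n_instances, n_features))
# Shared encoder applied independently to every variant (Dense acts on
# the last axis of the 3-D tensor).
h = tf.keras.layers.Dense(32, activation="relu")(inputs)
# Permutation-invariant aggregation over the variants in the sample.
pooled = tf.keras.layers.GlobalAveragePooling1D()(h)
# Sample-level prediction, e.g. presence of a variant class.
output = tf.keras.layers.Dense(1, activation="sigmoid")(pooled)

model = tf.keras.Model(inputs, output)
model.compile(optimizer="adam", loss="binary_crossentropy")

For the TMB calibration task the abstract mentions, the sigmoid head would instead be a linear unit trained with a regression loss against the whole-exome-derived value.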

