A Highly Scalable and Adaptable Co-Learning Framework on Multimodal Data Mining in a Multimedia Database

Data Mining ◽  
2013 ◽  
pp. 567-586
Author(s):  
Zhongfei (Mark) Zhang ◽  
Zhen Guo ◽  
Christos Faloutsos ◽  
Jia-Yu Pan

This chapter presents a highly scalable and adaptable co-learning framework for multimodal data mining in a multimedia database. The co-learning framework is grounded in multiple-instance learning theory. The framework offers strong scalability in the sense that the query time complexity is constant, independent of the database scale, and the mining effectiveness is likewise independent of the database scale, which makes multimodal querying feasible for very large multimedia databases. At the same time, the framework offers strong adaptability in the sense that the database index can be updated incrementally, with a constant amount of work, whenever new information is added to the database. In these respects the framework surpasses many existing multimodal data mining methods in the literature, which are neither scalable nor adaptable. Theoretical analysis and empirical evaluations demonstrate the advantages of this scalability and adaptability. While the framework is general and applies to multimodal data mining in any specific domain, the authors evaluate it on the Berkeley Drosophila ISH embryo image database to measure mining performance. They compare the framework with a state-of-the-art multimodal data mining method to demonstrate its effectiveness and promise.
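
To make the scalability and adaptability claims concrete, here is a minimal sketch (not the authors' actual construction) of how an index with constant query and update cost can behave as the abstract describes: if each item is mapped to one of a fixed set of learned prototypes, both querying and incremental insertion cost a single assignment plus a single hash operation, independent of the database size. The assign_prototype function and the class name below are hypothetical stand-ins for the MIL-based co-learning step.

# Minimal sketch, in Python, of a hash-based index whose query and
# update costs do not grow with the database size. The prototype
# assignment function is a hypothetical stand-in for the chapter's
# MIL-based co-learning step.
from collections import defaultdict

class ConstantTimeIndex:
    def __init__(self, assign_prototype):
        # assign_prototype maps an item (image region, keyword, ...) to a
        # prototype id; assumed to run in time independent of the DB size.
        self.assign_prototype = assign_prototype
        self.buckets = defaultdict(list)  # prototype id -> item ids

    def add(self, item_id, item):
        # Adaptability: one assignment and one append per new item,
        # a constant amount of work regardless of the database size.
        self.buckets[self.assign_prototype(item)].append(item_id)

    def query(self, query_item):
        # Scalability: one assignment plus one hash lookup, so the
        # query cost does not depend on the number of indexed items.
        return self.buckets[self.assign_prototype(query_item)]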

Author(s):  
Zhongfei (Mark) Zhang ◽  
Zhen Guo ◽  
Jia-Yu Pan

This paper presents a multiple-instance-learning-based approach to multimodal data mining in a multimedia database. The approach is a highly scalable and adaptable framework that the authors call co-learning. Theoretical analysis and empirical evaluations demonstrate the advantages of this strong scalability and adaptability. Although the framework is general and applies to multimodal data mining in any specific domain, the authors evaluate it on the Berkeley Drosophila ISH embryo image database, comparing its mining performance with that of a state-of-the-art multimodal data mining method to showcase the promise of the co-learning framework.


2016 ◽  
Vol 10 (3) ◽  
pp. 1-30 ◽  
Author(s):  
Zhen Guo ◽  
Zhongfei (Mark) Zhang ◽  
Eric P. Xing ◽  
Christos Faloutsos

2020 ◽  
Author(s):  
Jordan Anaya ◽  
John-William Sidhom ◽  
Craig A. Cummings ◽  
Alexander S. Baras

Deep learning has the ability to extract meaningful features from data given enough training examples. Large-scale genomic data are well suited to this class of machine learning algorithms; however, for many of these data the labels are available at the level of the sample rather than at the level of the individual genomic measures. To leverage the power of deep learning for these types of data, we turn to a multiple instance learning framework and present an easily extensible tool built with TensorFlow and Keras. We show how this tool can be applied to somatic variants (featurizing genomic position and sequence context) to accurately classify samples according to whether they contain a specific variant (hotspot or tumor suppressor) or a type of variant (microsatellite instability). We then apply our model to the calibration of tumor mutational burden (TMB), an increasingly important metric in the field of immunotherapy, across a variety of commonly used gene panels. Regardless of the panel, we observed improvements in regression to the gold-standard whole-exome-derived value for this metric, with additional performance benefits as more data were provided to the model (such as noncoding variants from panel assays). Our results suggest this framework could lead to improvements in a range of tasks where a sample-level metric is determined by aggregating a set of genomic measures, such as the somatic mutations we focused on in this study.
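
As a concrete illustration of the multiple instance pattern the abstract describes (a shared per-instance encoder followed by permutation-invariant pooling into a sample-level prediction), here is a minimal Keras sketch. The layer sizes, input shapes, and variable names are illustrative assumptions, not the authors' released architecture.

# Minimal multiple-instance sketch in TensorFlow/Keras: each variant in
# a sample is embedded by a shared encoder, the bag of embeddings is
# pooled into one sample-level vector, and a head predicts the
# sample-level label. Sizes are illustrative, not the released tool.
import tensorflow as tf

n_instances, n_features = 100, 16  # variants per sample, features per variant

inputs = tf.keras.Input(shape=(n_instances, n_features))
# Shared encoder applied independently to every variant (Dense acts on
# the last axis of the 3-D tensor).
h = tf.keras.layers.Dense(32, activation="relu")(inputs)
# Permutation-invariant aggregation over the variants in the sample.
pooled = tf.keras.layers.GlobalAveragePooling1D()(h)
# Sample-level prediction, e.g. presence of a variant class.
output = tf.keras.layers.Dense(1, activation="sigmoid")(pooled)

model = tf.keras.Model(inputs, output)
model.compile(optimizer="adam", loss="binary_crossentropy")

For the TMB calibration task the abstract mentions, the sigmoid head would instead be a linear unit trained with a regression loss against the whole-exome-derived value.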

