Multilevel Clustering on Very Large Scale of Web Data

Author(s):  
Amine Chemchem ◽  
Habiba Drias
Author(s):  
Juan Li ◽  
Ranjana Sharma ◽  
Yan Bai

Drug discovery is a lengthy, expensive and difficult process. Identifying and understanding the hidden relationships among drugs, genes, proteins, and diseases will expedite the process of drug discovery. In this paper, we propose an effective methodology to discover drug-related semantic relationships over large-scale distributed web data in medicine, pharmacology and biotechnology. By utilizing semantic web and distributed system technologies, we developed a novel hierarchical knowledge abstraction and an efficient relation discovery protocol. Our approach helps realize the full potential of the drug-related knowledge scattered over the Internet by harnessing its collective power.


Author(s):  
Shen Yi ◽  
Shengsheng Shi ◽  
Haitao Wang ◽  
Wu Wei ◽  
Chunfeng Yuan ◽  
...  

Author(s):  
Jia Li ◽  
Yafei Song ◽  
Jianfeng Zhu ◽  
Lele Cheng ◽  
Ying Su ◽  
...  

2011 ◽  
pp. 2206-2249
Author(s):  
Aidan Hogan ◽  
Andreas Harth ◽  
Axel Polleres

In this article the authors discuss the challenges of performing reasoning on large-scale RDF datasets from the Web. Using ter Horst's pD* fragment of OWL as a base, the authors compose a rule-based framework for application to web data: they argue their decisions using observations of undesirable examples taken directly from the Web. The authors further temper their OWL fragment through consideration of "authoritative sources", which counteracts an observed behaviour which they term "ontology hijacking": new ontologies published on the Web re-defining the semantics of existing entities resident in other ontologies. They then present their system for performing rule-based forward-chaining reasoning, which they call SAOR: Scalable Authoritative OWL Reasoner. Based upon observed characteristics of web data and reasoning in general, they design their system to scale: the system is based upon a separation of terminological data from assertional data and comprises a lightweight in-memory index, on-disk sorts and file-scans. The authors evaluate their methods on a dataset in the order of a hundred million statements collected from real-world Web sources and present scale-up experiments on a dataset in the order of a billion statements collected from the Web.


Author(s):  
Xiaoxiao Sun ◽  
Liyi Chen ◽  
Jufeng Yang

Fine-grained classification aims to recognize the subordinate categories of one field, which requires a large number of labeled images that are expensive to obtain. Utilizing web data has been an attractive option to meet the demands of training data for convolutional neural networks (CNNs), especially when well-labeled data is scarce. However, directly training on such easily obtained images often leads to unsatisfactory performance due to factors such as noisy labels. This has been conventionally addressed by reducing the noise level of web data. In this paper, we take a fundamentally different view and propose an adversarial discriminative loss to advocate representation coherence between standard and web data. This is further encapsulated in a simple, scalable and end-to-end trainable multi-task learning framework. We experiment on three public datasets using large-scale web data to evaluate the effectiveness and generalizability of the proposed approach. Extensive experiments demonstrate that our approach performs favorably against the state-of-the-art methods.


Author(s):  
Long Cheng ◽  
Spyros Kotoulas ◽  
Tomas E. Ward ◽  
Georgios Theodoropoulos

2009 ◽  
Author(s):  
Eyal Oren ◽  
Spyros Kotoulas ◽  
George Anadiotis ◽  
Ronny Siebes ◽  
Annette ten Teije ◽  
...  

10.28945/2966 ◽  
2006 ◽  
Author(s):  
Samuel Sambasivam ◽  
Nick Theodosopoulos

The aim of this paper is to evaluate, propose and improve the use of advanced web data clustering techniques, allowing data analysts to execute large-scale web data searches more efficiently. Increasing the efficiency of this search process requires a detailed knowledge of abstract categories, pattern matching techniques, and their relationship to search engine speed. In this paper we compare several alternative advanced techniques of data clustering in the creation of abstract categories for these algorithms. These algorithms will be submitted to a side-by-side speed test to determine the effectiveness of their design. In effect, this paper serves to evaluate and improve upon the effectiveness of current web data search clustering techniques.

