Large scale ontology for semantic web using clustering method over Hadoop

Author(s):  
Surbhi Gopal Atal ◽  
P. N. Chatur
PLoS ONE ◽  
2022 ◽  
Vol 17 (1) ◽  
pp. e0262499
Author(s):  
Negin Alisoltani ◽  
Mostafa Ameli ◽  
Mahdi Zargayouna ◽  
Ludovic Leclercq

Real-time ride-sharing has become popular in recent years. However, the underlying optimization problem for this service is highly complex. The most critical challenges when solving it are solution quality and computation time, especially in large-scale problems where the number of received requests is huge. In this paper, we rely on an exact solving method to ensure the quality of the solution, while using AI-based techniques to limit the number of requests that we feed to the solver. More precisely, we propose a clustering method based on a new shareability function to put the most shareable trips inside separate clusters. Previous studies consider only spatio-temporal dependencies when clustering mobility service requests, which is not efficient for finding shareable trips. Here, we define the shareability function to consider all the different sharing states for each pair of trips. Each cluster is then managed with a proposed heuristic framework in order to solve the matching problem inside each cluster. As the method favors sharing, we present the number of sharing constraints to allow the service to choose the number of shared trips. To validate our proposal, we apply the method to the network of the city of Lyon, France, with half a million requests in the morning peak from 6 to 10 AM. The results demonstrate that the algorithm can provide high-quality solutions in a short time for large-scale problems. The proposed clustering method can also be used for other mobility service problems such as car-sharing and bike-sharing.
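The abstract does not give the shareability function itself, so the following is only a toy sketch of the general idea of scoring a pair of trips over its possible sharing states. Everything here is an assumption for illustration: trips are (origin, destination) pairs of planar points, distance is Euclidean, time windows are ignored, and the names `route_length` and `shareability` are hypothetical. It is not the authors' definition.

```python
from math import dist


def route_length(points):
    """Total length of a route that visits the points in the given order."""
    return sum(dist(a, b) for a, b in zip(points, points[1:]))


def shareability(trip_a, trip_b):
    """Toy pairwise score in [0, 1]: relative distance saved by the best
    shared route compared with serving both trips separately.

    Assumption: each trip is ((ox, oy), (dx, dy)); time is ignored.
    """
    (oa, da), (ob, db) = trip_a, trip_b
    separate = dist(oa, da) + dist(ob, db)
    if separate == 0:
        return 0.0
    # The four sharing states in which both passengers are on board together:
    # either trip is picked up first, and either trip is dropped off first.
    sharing_states = [
        (oa, ob, da, db),
        (oa, ob, db, da),
        (ob, oa, da, db),
        (ob, oa, db, da),
    ]
    best_shared = min(route_length(state) for state in sharing_states)
    return max(0.0, (separate - best_shared) / separate)


# Two overlapping trips in the same direction versus two opposite trips.
same_direction = shareability(((0, 0), (10, 0)), ((1, 0), (9, 0)))
opposite = shareability(((0, 0), (10, 0)), ((10, 0), (0, 0)))
```

A pairwise score of this kind could then serve as the similarity measure for a clustering step, so that trips with high mutual scores end up in the same cluster before any exact matching is attempted.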


Author(s):  
Ming Cao ◽  
Qinke Peng ◽  
Ze-Gang Wei ◽  
Fei Liu ◽  
Yi-Fan Hou

The development of high-throughput technologies has produced increasing amounts of sequence data and an increasing need for efficient clustering algorithms that can process massive volumes of sequencing data for downstream analysis. Heuristic clustering methods are widely applied for sequence clustering because of their low computational complexity. Although numerous heuristic clustering methods have been developed, they suffer from two limitations: overestimation of inferred clusters and low clustering sensitivity. To address these issues, we present a new sequence clustering method (edClust) based on Edlib, a C/C++ library for fast, exact semi-global sequence alignment, to group similar sequences. The new method edClust was tested on three large-scale sequence databases, and we compared edClust to several classic heuristic clustering methods, such as UCLUST, CD-HIT, and VSEARCH. Evaluations based on the metrics of cluster number and seed sensitivity (SS) demonstrate that edClust can produce fewer clusters than other methods and that its SS is higher than that of other methods. The source code of edClust is available from https://github.com/zhang134/EdClust.git under the GNU GPL license.
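As a rough illustration of the greedy, seed-based style of heuristic clustering that tools such as UCLUST and CD-HIT are known for, here is a minimal sketch. It is not edClust's implementation: it swaps Edlib's semi-global alignment for a plain global Levenshtein distance written in pure Python, and the function names and the identity definition (1 minus distance over the longer length) are assumptions made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Global Levenshtein distance using a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def greedy_cluster(sequences, threshold=0.9):
    """Assign each sequence to the first seed it is similar enough to.

    Sequences are processed longest first; a sequence that matches no
    existing seed becomes the seed (first element) of a new cluster.
    """
    clusters = []
    for seq in sorted(sequences, key=len, reverse=True):
        for cluster in clusters:
            seed = cluster[0]
            longest = max(len(seq), len(seed)) or 1  # guard empty strings
            identity = 1 - edit_distance(seq, seed) / longest
            if identity >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


reads = ["ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT", "ACGTACGTAC"]
clusters = greedy_cluster(reads, threshold=0.8)
```

Because each sequence is compared only against cluster seeds, the number of alignments stays far below an all-against-all comparison, which is where this family of methods gets its low computational cost.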


Author(s):  
Juan Li ◽  
Ranjana Sharma ◽  
Yan Bai

Drug discovery is a lengthy, expensive and difficult process. Identifying and understanding the hidden relationships among drugs, genes, proteins, and diseases will expedite the process of drug discovery. In this paper, we propose an effective methodology to discover drug-related semantic relationships over large-scale distributed web data in medicine, pharmacology and biotechnology. By utilizing semantic web and distributed system technologies, we developed a novel hierarchical knowledge abstraction and an efficient relation discovery protocol. Our approach helps realize the full potential of the drug-related knowledge scattered over the Internet by harnessing its collective power.


Author(s):  
Christopher Walton

At the start of this book we outlined the challenges of automatic computer-based processing of information on the Web. These numerous challenges are generally referred to as the ‘vision’ of the Semantic Web. From the outset, we have attempted to take a realistic and pragmatic view of this vision. Our opinion is that the vision may never be fully realized, but that it is a useful goal on which to focus. Each step towards the vision has provided new insights on classical problems in knowledge representation, MASs, and Web-based techniques. Thus, we are presently in a significantly better position as a result of these efforts. It is sometimes difficult to see the purpose of the Semantic Web vision behind all of the different technologies and acronyms. However, the fundamental purpose of the Semantic Web is essentially large-scale and automated data integration. The Semantic Web is not just about providing a more intelligent kind of Web search, but also about taking the results of these searches and combining them in interesting and useful ways. As stated in Chapter 1, the possible applications for the Semantic Web include: automated data mining, e-science experiments, e-learning systems, personalized newspapers and journals, and intelligent devices. The current state of progress towards the Semantic Web vision is summarized in Figure 8.1. This figure shows a pyramid with the human-centric Web at the bottom, sometimes termed the Syntactic Web, and the envisioned Semantic Web at the top. Throughout this book, we have been moving upwards on this pyramid, and it should be clear that a great deal of progress has been made towards the goal. This progress is indicated by the various stages of the pyramid, which can be summarized as follows:
• The lowest stage on the pyramid is the basic Web that should be familiar to everyone. This Web of information is human-centric and contains very little automation. Nonetheless, the Web provides the basic protocols and technologies on which the Semantic Web is founded. Furthermore, the information which is represented on the Web will ultimately be the source of knowledge for the Semantic Web.


Author(s):  
José Manuel Gómez-Pérez ◽  
Víctor Méndez

Since the use of electronic invoicing in business transactions was approved by the EU back in 2002, its application in Europe has grown considerably. However, despite the existence of standards like EDIFACT or UBL, widespread take up of electronic invoicing has been hindered by the enormous heterogeneity of proprietary solutions. In this chapter, the authors present an approach towards addressing the interoperability problem in electronic invoice exchange, based on ontologies and Semantic Web technologies. The authors propose methods and provide usable tools that leverage the knowledge of users of electronic invoicing systems by empowering them to define correspondences between sample electronic invoice data and a formal model of electronic invoicing represented as networked ontologies. The chapter follows a learn-by-example approach where, based on such correspondences, networked ontologies serve as a semantic hub for large-scale transformation of e-invoice data between heterogeneous e-invoicing formats and models. The approach has been evaluated through the development of a reference implementation and its deployment in the pharmaceutical sector.


2011 ◽  
Vol 20 (01) ◽  
pp. 30-32
Author(s):  
P. Ruch ◽  

Summary: To summarize current advances of the so-called Web 3.0 and emerging trends of the semantic web. We provide a synopsis of the articles selected for the IMIA Yearbook 2011, from which we attempt to derive a synthetic overview of today’s and future activities in the field. While the state of the research in the field is illustrated by a set of fairly heterogeneous studies, it is possible to identify significant clusters. While the most salient challenge and obsessional target of the semantic web remains its ambition to simply interconnect all available information, it is interesting to observe the developments of complementary research fields such as information sciences and text analytics. The combined expressive power and virtually unlimited data aggregation skills of Web 3.0 technologies make it a disruptive instrument to discover new biomedical knowledge. In parallel, such an unprecedented situation creates new threats for patients participating in large-scale genetic studies, as Wjst demonstrates how various data sets can be coupled to re-identify anonymous genetic information. The best paper selection of articles on decision support shows examples of excellent research on methods concerning original development of core semantic web techniques as well as transdisciplinary achievements as exemplified with literature-based analytics. This selected set of scientific investigations also demonstrates the need for computerized applications to transform the biomedical data overflow into more operational clinical knowledge, with potential threats for confidentiality directly associated with such advances. Altogether these papers support the idea that more elaborate computer tools, likely to combine heterogeneous text and data contents, should soon emerge for the benefit of both experimentalists and hopefully clinicians.


2016 ◽  
Vol 25 (01) ◽  
pp. 184-187
Author(s):  
J. Charlet ◽  
L. F. Soualmia ◽  

Summary Objectives: To summarize excellent current research in the field of Knowledge Representation and Management (KRM) within the health and medical care domain. Method: We provide a synopsis of the 2016 IMIA selected articles as well as a related synthetic overview of the current and future field activities. A first step of the selection was performed through MEDLINE querying with a list of MeSH descriptors completed by a list of terms adapted to the KRM section. The second step of the selection was completed by the two section editors who separately evaluated the set of 1,432 articles. The third step of the selection consisted of a collective work that merged the evaluation results to retain 15 articles for peer-review. Results: The selection and evaluation process of this Yearbook’s section on Knowledge Representation and Management has yielded four excellent and interesting articles regarding semantic interoperability for health care by gathering heterogeneous sources (knowledge and data) and auditing ontologies. In the first article, the authors present a solution based on standards and Semantic Web technologies to access distributed and heterogeneous datasets in the domain of breast cancer clinical trials. The second article describes a knowledge-based recommendation system that relies on ontologies and Semantic Web rules in the context of chronic diseases dietary. The third article is related to concept recognition and text mining to derive a model of common human diseases and a phenotypic network of common diseases. In the fourth article, the authors highlight the need for auditing SNOMED CT. They propose to use a crowd-based method for ontology engineering. Conclusions: The current research activities further illustrate the continuous convergence of Knowledge Representation and Medical Informatics, with a focus this year on dedicated tools and methods to advance clinical care by proposing solutions to cope with the problem of semantic interoperability. Indeed, there is a need for powerful tools able to manage and interpret complex, large-scale and distributed datasets and knowledge bases, but also a need for user-friendly tools developed for clinicians in their daily practice.


Author(s):  
Xu Yin ◽  
Hong Xingyong ◽  
Zhou Wenjiang ◽  
Wang Lunwen ◽  
Zhang Ling ◽  
...  
