Extracting Ontology Properties from the Web-Tables

Author(s):  
Song-il Cha ◽  
Z. M. Ma

Web-tables are ubiquitous in Web pages. Since tables are organized both structurally and semantically, they are a good resource from which ontology can be extracted. However, most Web-tables are designed for intuitive human perception, so interpreting table content using only the structural information of the table has its limits. This paper therefore focuses on a method for interpreting table content based on the semantic characteristics of the table. To obtain the many property elements used for ontology inference, the authors discuss how to extract ontology properties from Web-tables. The extracted properties include the following elements: the is-a relationship, the class-instance relationship, triples, property domain, property range, symmetric properties, transitive properties, functional properties, inverse functional properties, and properties defining super-sub relationships. Through experiments, the authors show that their method can effectively extract property elements from Web-tables.
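
To make the table-to-ontology idea concrete, here is a minimal sketch (not the authors' algorithm) of turning a simple attribute-value Web-table into RDF-style triples: the header row is read as property names and each body row as one instance, so cell (row i, column j) yields the triple (subject_i, property_j, value), plus a class-instance triple from the first column's header. The sample table and the use of BeautifulSoup are illustrative assumptions.

```python
# Sketch: extracting triples and class-instance pairs from a Web-table.
# Requires beautifulsoup4 (pip install beautifulsoup4).
from bs4 import BeautifulSoup

HTML = """
<table>
  <tr><th>City</th><th>Country</th><th>Population</th></tr>
  <tr><td>Paris</td><td>France</td><td>2.1M</td></tr>
  <tr><td>Lyon</td><td>France</td><td>0.5M</td></tr>
</table>
"""

def table_to_triples(html: str):
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find_all("tr")
    header = [c.get_text(strip=True) for c in rows[0].find_all(["th", "td"])]
    triples = []
    for row in rows[1:]:
        cells = [c.get_text(strip=True) for c in row.find_all(["th", "td"])]
        subject = cells[0]                      # first column names the instance
        for prop, value in zip(header[1:], cells[1:]):
            triples.append((subject, prop, value))
        # the first column's header acts as the class of each instance,
        # giving a class-instance relationship as well
        triples.append((subject, "rdf:type", header[0]))
    return triples

for t in table_to_triples(HTML):
    print(t)
```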

2003 ◽  
Vol 18 ◽  
pp. 149-181 ◽  
Author(s):  
K. Lerman ◽  
S. N. Minton ◽  
C. A. Knoblock

The proliferation of online information sources has led to an increased use of wrappers for extracting data from Web sources. While most of the previous research has focused on quick and efficient generation of wrappers, the development of tools for wrapper maintenance has received less attention. This is an important research problem because Web sources often change in ways that prevent the wrappers from extracting data correctly. We present an efficient algorithm that learns structural information about data from positive examples alone. We describe how this information can be used for two wrapper maintenance applications: wrapper verification and reinduction. The wrapper verification system detects when a wrapper is not extracting correct data, usually because the Web source has changed its format. The reinduction algorithm automatically recovers from changes in the Web source by identifying data on Web pages so that a new wrapper may be generated for this source. To validate our approach, we monitored 27 wrappers over a period of a year. The verification algorithm correctly discovered 35 of the 37 wrapper changes, and made 16 mistakes, resulting in precision of 0.73 and recall of 0.95. We validated the reinduction algorithm on ten Web sources. We were able to successfully reinduce the wrappers, obtaining precision and recall values of 0.90 and 0.80 on the data extraction task.
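
A highly simplified illustration (a stand-in, not Lerman et al.'s DataPro algorithm) of verifying wrapper output by learning structural patterns from positive examples alone: each extracted value is reduced to a coarse token-type signature, and new extractions whose signatures never appeared in training are flagged as suspect, signalling a possible format change at the source.

```python
# Sketch: structure-based wrapper verification from positive examples only.
import re

def signature(value: str) -> tuple:
    # map each token to a coarse type: number, capitalized word,
    # lower-case word, or punctuation
    types = []
    for tok in re.findall(r"\d+|[A-Za-z]+|[^\w\s]", value):
        if tok.isdigit():
            types.append("NUM")
        elif tok[0].isupper():
            types.append("CAP")
        elif tok.isalpha():
            types.append("LOW")
        else:
            types.append("PUNCT")
    return tuple(types)

def learn(positive_examples):
    # the learned "structural model" is just the set of seen signatures
    return {signature(v) for v in positive_examples}

def verify(model, extracted_values):
    # fraction of extractions matching a learned pattern;
    # a sharp drop suggests the Web source changed its format
    ok = sum(1 for v in extracted_values if signature(v) in model)
    return ok / max(len(extracted_values), 1)

model = learn(["$12.99", "$7.50", "$120.00"])
print(verify(model, ["$9.99", "$15.25"]))        # 1.0 -> wrapper looks fine
print(verify(model, ["Out of stock", "N/A"]))    # 0.0 -> likely format change
```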


2013 ◽  
Vol 7 (2) ◽  
pp. 574-579 ◽  
Author(s):  
Dr Sunitha Abburu ◽  
G. Suresh Babu

Day by day, the volume of information available on the Web is growing significantly. Information on the Web exists in several forms: structured, semi-structured and unstructured. The majority of this information is presented in web pages, and the information in web pages is semi-structured. However, the information required for a given context is scattered across different web documents. It is difficult to analyze the large volumes of semi-structured information presented in web pages and to make decisions based on that analysis. The current research work proposes a framework for a system that extracts information from various sources and prepares reports based on the knowledge built from the analysis. This simplifies data extraction, data consolidation, data analysis and decision making based on the information presented in web pages. The proposed framework integrates web crawling, information extraction and data mining technologies for better information analysis, which helps in effective decision making. It enables people and organizations to extract information from various sources on the Web and to perform an effective analysis of the extracted data. The proposed framework is applicable to any application domain; manufacturing, sales, tourism and e-learning are a few examples. The framework has been implemented and tested for effectiveness, and the results are promising.
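
A minimal pipeline sketch in the spirit of the proposed framework (crawl, then extract, then analyze for decision making); the URL list, the price pattern and the aggregation step are placeholder assumptions, not the authors' implementation.

```python
# Sketch: crawl -> extract -> analyze, using only the standard library.
import re
import statistics
from urllib.request import urlopen

SEED_URLS = ["https://example.com/products/1",
             "https://example.com/products/2"]   # hypothetical sources
PRICE_RE = re.compile(r"\$(\d+(?:\.\d{2})?)")

def crawl(urls):
    for url in urls:
        with urlopen(url) as resp:               # crawling step
            yield resp.read().decode("utf-8", errors="replace")

def extract(pages):
    for html in pages:                            # extraction step
        for match in PRICE_RE.finditer(html):
            yield float(match.group(1))

def analyze(values):                              # analysis/report step
    values = list(values)
    return {"count": len(values),
            "mean": statistics.mean(values) if values else None}

report = analyze(extract(crawl(SEED_URLS)))
print(report)
```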


Think India ◽  
2019 ◽  
Vol 22 (2) ◽  
pp. 174-187
Author(s):  
Harmandeep Singh ◽  
Arwinder Singh

Nowadays, the internet provides people with services related to many different fields. Both profit and non-profit organizations use the internet for various business purposes, a major one being the communication of financial as well as non-financial information on their websites. This study was conducted on the top 30 BSE-listed public sector companies to measure the extent of governance disclosure (non-financial information) on their web pages. The disclosure index approach was used to examine the extent of governance disclosure on the internet. The governance index was constructed and broadly categorized into three dimensions, i.e., organization and structure; strategy and planning; and accountability, compliance, philosophy and risk management. The empirical evidence of the study reveals that all the Indian public sector companies have a website and that, on average, 67% of companies disclose some kind of governance information directly on their websites. Further, we found extreme variations in web disclosure between the three categories, i.e., the Maharatnas, the Navratnas and the Miniratnas. However, the Kruskal-Wallis test indicates that there is no significant difference between the three categories. The study provides valuable insights into the Indian economy. It shows that Indian public sector companies use the internet for governance disclosure to some extent, but that the disclosure lacks uniformity because there is no regulation for web disclosure. The study therefore recommends a regulatory framework for web disclosure so that stakeholders can be assured of the transparency and reliability of the information.
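
A hedged sketch of the methodology described above: each company is scored on a binary disclosure checklist, a disclosure index is computed as the fraction of items disclosed, and the three PSU categories are compared with a Kruskal-Wallis test. The scores below are made-up placeholders, not the study's data; SciPy is assumed available.

```python
# Sketch: disclosure index plus Kruskal-Wallis comparison of groups.
from scipy.stats import kruskal

def disclosure_index(item_scores):
    # fraction of checklist items (0/1) disclosed on the company website
    return sum(item_scores) / len(item_scores)

print(disclosure_index([1, 1, 0, 1]))   # 0.75 for a 4-item checklist

# hypothetical per-company indices for the three categories
maharatna = [0.80, 0.75, 0.70]
navratna  = [0.65, 0.60, 0.72]
miniratna = [0.55, 0.68, 0.58]

stat, p = kruskal(maharatna, navratna, miniratna)
print(f"H = {stat:.2f}, p = {p:.3f}")   # p > 0.05 -> no significant difference
```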


2013 ◽  
Vol 347-350 ◽  
pp. 2758-2762
Author(s):  
Zhi Juan Wang

Negative Internet information is harmful to social stability and national unity. Opinion tendency analysis can find such negative Internet information. Here, a method based on regular expressions is introduced that does not need complex semantic technologies. The method includes building a negative information bank, designing regular expressions, and realizing the program. The results obtained with this method verify that it works well for judging the opinion of web pages.
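
A minimal sketch of the regular-expression approach described above: a small bank of negative terms is compiled into one pattern, and each page's text is scored by counting matches. The term list and threshold are illustrative placeholders, not the paper's actual information bank.

```python
# Sketch: flagging negative pages with a compiled regular expression.
import re

NEGATIVE_TERMS = ["violence", "riot", "terror", "hate speech"]  # assumed bank
PATTERN = re.compile("|".join(re.escape(t) for t in NEGATIVE_TERMS),
                     re.IGNORECASE)

def is_negative(page_text: str, threshold: int = 2) -> bool:
    # flag the page when enough negative terms occur
    return len(PATTERN.findall(page_text)) >= threshold

print(is_negative("Reports of riot and terror spread quickly."))  # True
print(is_negative("A peaceful community event was held."))        # False
```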


Author(s):  
Carmen Domínguez-Falcón ◽  
Domingo Verano-Tacoronte ◽  
Marta Suárez-Fuentes

Purpose
The strong regulation of the Spanish pharmaceutical sector encourages pharmacies to modify their business model, giving the customer a more relevant role by integrating 2.0 tools. However, the study of the implementation of these tools is still quite limited, especially in terms of customer-oriented web page design. This paper aims to analyze the online presence of Spanish community pharmacies by studying the profile of their web pages to classify them by their degree of customer orientation.
Design/methodology/approach
In total, 710 community pharmacies were analyzed, of which 160 had web pages. Using items drawn from the literature, a content analysis was performed to evaluate the presence of these items on the web pages. Then, after analyzing the scores on the items, a cluster analysis was conducted to classify the pharmacies according to the degree of development of their online customer orientation strategy.
Findings
The number of pharmacies with a web page is quite low. The development of these websites is limited, and they play a more informational than relational role. The statistical analysis allows the pharmacies to be classified into four groups according to their level of development.
Practical implications
Pharmacists should make greater use of their websites, incorporating Web 2.0 and social media (SM) platforms, to facilitate real two-way communication with customers and other stakeholders and to maintain relationships with them.
Originality/value
This study analyses, from a marketing perspective, the degree of Web 2.0 adoption and the characteristics of the websites in terms of aiding communication and interaction with customers in the Spanish pharmaceutical sector.
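
An illustrative sketch (not the paper's exact procedure) of the cluster-analysis step: each pharmacy is described by its content-analysis item scores and grouped into four clusters by customer-orientation level. The score matrix and item names are made-up placeholders; scikit-learn is assumed available.

```python
# Sketch: clustering pharmacies by web-page item scores with k-means.
import numpy as np
from sklearn.cluster import KMeans

# rows = pharmacies, columns = presence scores for web-page items
# (e.g. contact form, online catalogue, social-media links, blog)
scores = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
])

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(scores)
print(kmeans.labels_)   # cluster id per pharmacy, by customer orientation
```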


Information ◽  
2018 ◽  
Vol 9 (9) ◽  
pp. 228 ◽  
Author(s):  
Zuping Zhang ◽  
Jing Zhao ◽  
Xiping Yan

Web page clustering is an important technology for organizing network resources. By extracting features and clustering based on the similarity of Web pages, a large amount of information on the Web can be organized effectively. In this paper, after describing the extraction of Web feature words, calculation methods for weighting the feature words are studied in depth. Taking Web pages as objects and Web feature words as attributes, a formal context is constructed for formal concept analysis. An algorithm for constructing a concept lattice based on cross data links is proposed and successfully applied. This method can cluster Web pages using the concept lattice hierarchy. Experimental results indicate that the proposed algorithm outperforms previous competitors with regard to time consumption and clustering quality.
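
A compact sketch of the formal-concept-analysis setting: Web pages are objects, feature words are attributes, and every formal concept (extent, intent) of the binary context is enumerated; concepts whose extents contain several pages act as clusters in the lattice hierarchy. This naive brute-force construction is an illustration, not the paper's cross-data-link algorithm, and the toy context is a placeholder.

```python
# Sketch: enumerating the formal concepts of a small page/word context.
from itertools import combinations

# toy formal context: page -> set of feature words
context = {
    "page1": {"python", "web", "cluster"},
    "page2": {"python", "web"},
    "page3": {"web", "lattice"},
}

objects = list(context)
all_attrs = set().union(*context.values())

def intent(objs):              # attributes shared by all pages in objs
    if not objs:
        return set(all_attrs)
    return set.intersection(*(context[o] for o in objs))

def extent(attrs):             # pages that have every attribute in attrs
    return {o for o in objects if attrs <= context[o]}

concepts = set()
for r in range(len(objects) + 1):
    for group in combinations(objects, r):
        i = intent(set(group))
        e = extent(i)          # closing the pair yields a formal concept
        concepts.add((frozenset(e), frozenset(i)))

for e, i in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(e), "<->", sorted(i))
```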


2003 ◽  
Vol 9 (1) ◽  
pp. 17-22 ◽  
Author(s):  
E D Lemaire ◽  
G Greene

We produced continuing education material in physical rehabilitation using a variety of electronic media. We compared four methods of delivering the learning modules: in person with a computer projector, desktop videoconferencing, Web pages and CD-ROM. Health-care workers at eight community hospitals and two nursing homes were asked to participate in the project. A total of 394 questionnaires were received for all modalities: 73 for in-person sessions, 50 for desktop conferencing, 227 for Web pages and 44 for CD-ROM. This represents a 100% response rate from the in-person, desktop conferencing and CD-ROM groups; the response rate for the Web group is unknown, since the questionnaires were completed online. Almost all participants found the modules to be helpful in their work. The CD-ROM group gave significantly higher ratings than the Web page group, although all four learning modalities received high ratings. A combination of all four modalities would be required to provide the best possible learning opportunity.


Author(s):  
Satinder Kaur ◽  
Sunil Gupta

Information plays a very important role in life, and nowadays the world largely depends on the World Wide Web to obtain it. The Web comprises websites from every discipline, and websites consist of web pages interlinked with each other by hyperlinks. The success of a website largely depends on the design of its web pages. Researchers have done a great deal of work to appraise web pages quantitatively. Keeping in mind the importance of the design aspects of a web page, this paper presents an automated evaluation tool which evaluates these aspects for any web page. The tool takes the HTML code of the web page as input, then extracts the HTML tags and checks them for uniformity. The tool comprises normalized modules which quantify the measures of the design aspects. For validation, the tool was applied to four web pages from distinct sites, and their design aspects are reported for comparison. The tool will benefit web developers, who can predict the design quality of web pages and enhance it before and after implementation of a website, without user interaction.
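
A hedged sketch of the kind of check such a tool performs: parse a page's HTML, count start and end tags, and report tags whose counts do not match, a simple uniformity measure. This is an illustration of one module, not the authors' full set of normalized modules.

```python
# Sketch: auditing HTML tag uniformity with the standard-library parser.
from html.parser import HTMLParser
from collections import Counter

VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}  # no end tag expected

class TagAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.starts, self.ends = Counter(), Counter()
    def handle_starttag(self, tag, attrs):
        self.starts[tag] += 1
    def handle_endtag(self, tag):
        self.ends[tag] += 1

def uniformity_report(html: str) -> dict:
    # return {tag: (start_count, end_count)} for every mismatched tag
    audit = TagAudit()
    audit.feed(html)
    return {t: (audit.starts[t], audit.ends[t])
            for t in audit.starts
            if t not in VOID_TAGS and audit.starts[t] != audit.ends[t]}

print(uniformity_report("<html><body><p>one<p>two</p></body></html>"))
# {'p': (2, 1)} -> one unclosed <p>
```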


2021 ◽  
Author(s):  
Sumit Bala ◽  
Ambarnil Ghosh ◽  
Subhra Pradhan

High mutation rates and structural flexibility in viral proteins quickly make them resistant to the host immune system and to existing antiviral strategies. For most pathogenic viruses, the key survival strategy lies in the ability to evolve rapidly through mutations that affect protein structure and function. Along with experimental research on antiviral development, computational data mining plays an important role in deciphering the molecular and genomic signatures of viral adaptability. Uncovering conserved regions in viral proteins with diverse chemical and biological properties is an important area of research for developing antiviral therapeutics, though assigning those regions is not a trivial task. Advances in protein structure databases and repositories, driven by experimental research, have accelerated the in-silico mining of these data to generate more integrative information. Despite the huge effort put into correlating protein structural information with sequence, it remains a challenge to defeat the high mutability and adaptability of viral genomes. In the current study, the authors have developed a user-friendly web application interface that allows users to study and visualize protein segment variability in viral proteins and may help in finding antiviral strategies. The web application allows thorough mining of the surface properties and variabilities of viral proteins, which, in combination with immunogenicity and evolutionary properties, makes the visualization robust. In combination with previous research on a 20-dimensional Euclidean-geometry-based sequence variability characterization algorithm, four other parameters have been considered for this platform: (1) predicted solvent accessibility, (2) B-cell epitopic potential, (3) T-cell epitopic potential and (4) coevolving regions of the viral protein. The uniqueness of this study lies in the fact that protein sequence stretches are characterized rather than single residues, which helps to compare the properties of protein segments with their variability. In the current work, besides presenting the web application platform, five proteins of SARS-CoV-2 are presented as an example, with the focus kept on protein S. The current web application database contains 29 proteins from 7 viruses, and a GitHub repository of the raw data used in this study is included. The web application is up and running at the following address: http://www.protsegvar.com.
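
A simplified sketch of segment-based variability (a stand-in, not the authors' 20-dimensional Euclidean-geometry algorithm): given aligned protein sequences, the per-position Shannon entropy is averaged over a sliding window, so whole segments rather than single residues are characterized, in the spirit of the segment-level analysis described above. The toy alignment is a placeholder.

```python
# Sketch: sliding-window variability over an alignment via Shannon entropy.
import math
from collections import Counter

def column_entropy(column: str) -> float:
    # Shannon entropy of the residue distribution at one alignment column
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def segment_variability(alignment, window: int = 5):
    # mean column entropy over each window of consecutive positions
    length = len(alignment[0])
    cols = ["".join(seq[i] for seq in alignment) for i in range(length)]
    ent = [column_entropy(c) for c in cols]
    return [sum(ent[i:i + window]) / window
            for i in range(length - window + 1)]

aligned = ["MKTAYIAKQR",
           "MKTAHIAKQR",
           "MRTAYIGKQR"]   # hypothetical aligned segments
print(segment_variability(aligned, window=5))
```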

