The Problem of Reference Rot in Spatial Metadata Catalogues

2021 ◽  
Vol 11 (1) ◽  
pp. 27
Author(s):  
Sergio Martin-Segura ◽  
Francisco Javier Lopez-Pellicer ◽  
Javier Nogueras-Iso ◽  
Javier Lacasta ◽  
Francisco Javier Zarazaga-Soria

The content at the end of any hyperlink is subject to two phenomena: the link may break (Link Rot) or the content at the end of the link may no longer be the same as it was when the link was created (Content Drift). Reference Rot denotes the combination of both effects. Spatial metadata records rely on hyperlinks to indicate the location of the resources they describe; therefore, they are also subject to Reference Rot. This paper evaluates the presence of Reference Rot and its impact on the 22,738 distribution URIs of 18,054 metadata records from 26 European INSPIRE spatial data catalogues. Our Link Rot checking method detects broken links while considering the specific requirements of spatial data services. Our Content Drift checking method uses the data format as an indicator: it compares the data formats declared in the metadata with the actual data types returned by the hyperlinks. Findings show that 10.41% of the distribution URIs suffer from Link Rot and at least 6.21% of records suffer from Content Drift (i.e., they do not declare their distribution types correctly). Additionally, 14.94% of metadata records contain only intermediate HTML web pages as distribution URIs and 31.37% contain at least one HTML web page; thus, the described resources cannot be accessed or checked directly.
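The Content Drift heuristic the abstract describes, comparing the format declared in the metadata against the media type a distribution URI actually returns, can be sketched roughly as follows. The mapping table, function name, and classification labels are illustrative assumptions, not the authors' implementation:

```python
# Illustrative mapping from formats declared in spatial metadata to the
# media types a healthy distribution URI might return (a hypothetical subset).
DECLARED_TO_MIME = {
    "GML": {"application/gml+xml", "application/xml"},
    "GeoJSON": {"application/geo+json", "application/json"},
    "Shapefile": {"application/zip", "application/x-shapefile"},
}

def check_drift(declared_format, returned_mime):
    """Classify one distribution URI as 'ok', 'drift', or 'intermediate'.

    An HTML response signals an intermediate landing page rather than the
    data itself, so the resource cannot be checked directly.
    """
    mime = returned_mime.split(";")[0].strip().lower()
    if mime == "text/html":
        return "intermediate"
    expected = {m.lower() for m in DECLARED_TO_MIME.get(declared_format, set())}
    return "ok" if mime in expected else "drift"
```

A real checker would obtain `returned_mime` from the Content-Type header of an HTTP request and would also need service-specific handling (e.g., for OGC web services), as the paper notes.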

2020 ◽  
Vol 14 ◽  
Author(s):  
Shefali Singhal ◽  
Poonam Tanwar

Abstract: Nowadays, when everything is going digital, the internet and the web play a vital role in everyone's life. Whenever one has a question or an online task to perform, one uses the internet to access the relevant web pages. These web pages are mainly designed for large-screen terminals, but for reasons of mobility, convenience and cost, most people use small-screen terminals (SSTs) such as mobile phones, palmtops, pagers and tablet computers. Reading a web page designed for a large screen on a small screen is time-consuming and cumbersome, because many irrelevant content parts (advertisements, etc.) must be scrolled past. The main concern here is e-business users. To overcome these issues, the source code of a web page is organized into a tree data structure: each main heading becomes a root node, and all the content under that heading becomes child nodes in the logical structure. Using this structure, a web page is regenerated automatically according to the SST's screen size. Background: The DOM and VIPS algorithms are the main background techniques supporting the current research. Objective: To restructure a web page into a more user-friendly, content-centred format. Method: Backtracking. Results: Web page heading queue generation. Conclusion: The concept of a logical structure supports every SST.
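The heading-as-root restructuring idea can be sketched with a toy parser that groups page text under its nearest preceding heading. This is a minimal stand-in, assuming plain `h1`–`h6` headings; real pages would need DOM/VIPS-style block detection as the abstract indicates:

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

class HeadingTree(HTMLParser):
    """Build a heading -> content mapping: each heading acts as a root node,
    and the text that follows it becomes that root's child content."""

    def __init__(self):
        super().__init__()
        self.tree = {}            # heading text -> list of content strings
        self._current = None      # heading currently open as "root"
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_heading:
            self._current = text
            self.tree[text] = []
        elif self._current is not None:
            self.tree[self._current].append(text)
```

Regenerating a page for a small screen would then amount to emitting one root node (heading) at a time with its children, instead of the full large-screen layout.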


Author(s):  
B Sathiya ◽  
T.V. Geetha

The prime textual sources used for ontology learning are a domain corpus and dynamic large-scale text from web pages. The first source is limited and possibly outdated, while the second is uncertain. To overcome these shortcomings, a novel ontology learning methodology is proposed that utilizes different sources of text, namely a corpus, web pages and the massive probabilistic knowledge base Probase, for effective automated construction of an ontology. Specifically, to discover taxonomical relations among the concepts of the ontology, a new web-page-based two-level semantic query formation methodology using lexical syntactic patterns (LSP) and a novel scoring measure, Fitness, built on Probase are proposed. Also, a syntactic and statistical measure called COS (Co-occurrence Strength) scoring and the Domain and Range-NTRD (Non-Taxonomical Relation Discovery) algorithms are proposed to accurately identify non-taxonomical relations (NTR) among concepts, using evidence from the corpus and web pages.
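Lexical syntactic patterns of the kind mentioned above are typically Hearst-style templates matched against text to yield is-a (taxonomical) candidates. A minimal sketch follows; the two patterns shown are classic examples, not the paper's actual pattern set, and the Probase lookup and Fitness scoring are omitted:

```python
import re

# Two classic Hearst-style lexico-syntactic patterns (illustrative only):
# "X such as A, B" and "X including A, B" both suggest A is-a X, B is-a X.
PATTERNS = [
    re.compile(r"(\w+(?: \w+)?) such as ((?:\w+(?:, )?)+)"),
    re.compile(r"(\w+(?: \w+)?) including ((?:\w+(?:, )?)+)"),
]

def extract_is_a(sentence):
    """Return (hyponym, hypernym) candidate pairs found by the patterns."""
    pairs = []
    for pattern in PATTERNS:
        for match in pattern.finditer(sentence):
            hypernym = match.group(1).strip()
            for hyponym in match.group(2).split(","):
                if hyponym.strip():
                    pairs.append((hyponym.strip(), hypernym))
    return pairs
```

In the paper's pipeline, candidates like these would be validated and ranked against Probase rather than accepted directly.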


Author(s):  
He Hu ◽  
Xiaoyong Du

Online tagging is crucial for the acquisition and organization of web knowledge. We present TYG (Tag-as-You-Go) in this paper, a web browser extension for online tagging of personal knowledge on standard web pages. We investigate an approach to combine a K-Medoid-style clustering algorithm with the user input to achieve semi-automatic web page annotation. The annotation process supports user-defined tagging schema and comprises an automatic mechanism that is built upon clustering techniques, which can automatically group similar HTML DOM nodes into clusters corresponding to the user specification. TYG is a prototype system illustrating the proposed approach. Experiments with TYG show that our approach can achieve both efficiency and effectiveness in real world annotation scenarios.
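A plain K-Medoids routine of the style TYG combines with user input can be sketched as follows; here the "points" stand in for feature vectors of HTML DOM nodes, and all names and the distance function are illustrative assumptions:

```python
import random

def k_medoids(points, k, dist, iters=20, seed=0):
    """Basic K-Medoids clustering: alternate between assigning points to the
    nearest medoid and re-electing each cluster's most central member."""
    rng = random.Random(seed)
    medoids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest medoid's cluster.
        clusters = {m: [] for m in medoids}
        for p in points:
            nearest = min(medoids, key=lambda m: dist(p, m))
            clusters[nearest].append(p)
        # Update step: the new medoid minimizes total in-cluster distance.
        new_medoids = []
        for members in clusters.values():
            best = min(members, key=lambda c: sum(dist(c, p) for p in members))
            new_medoids.append(best)
        if set(new_medoids) == set(medoids):
            break  # converged
        medoids = new_medoids
    return clusters
```

In TYG's setting, the user-defined tagging schema would steer how clusters of similar DOM nodes map onto tags; this sketch covers only the unsupervised half.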


2002 ◽  
Vol 7 (1) ◽  
pp. 9-25 ◽  
Author(s):  
Moses Boudourides ◽  
Gerasimos Antypas

In this paper we present a simple simulation of the World-Wide Web, in which one observes the appearance of web pages belonging to different web sites, covering a number of different thematic topics and possessing links to other web pages. The goal of our simulation is to reproduce the form of the observed World-Wide Web and of its growth using a small number of simple assumptions. In our simulation, existing web pages may generate new ones as follows: first, each web page is equipped with a topic concerning its contents; second, links between web pages are established according to common topics; next, new web pages may be randomly generated and subsequently equipped with a topic and assigned to web sites. By repeated iteration of these rules, our simulation appears to exhibit the observed structure of the World-Wide Web and, in particular, a power-law type of growth. In order to visualise the network of web pages, we have followed N. Gilbert's (1997) methodology of scientometric simulation, assuming that web pages can be represented by points in the plane. Furthermore, the simulated graph is found to possess the small-world property, as is the case with a large number of other complex networks.
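A toy version of a topic-driven growth model like the one described can be sketched as below. It is not the authors' simulation: it pairs the paper's topic-based linking rule with preferential attachment by in-degree, a standard recipe for producing power-law-like degree growth; all parameters are illustrative:

```python
import random

def simulate_web(steps, topics=5, seed=42):
    """Grow a toy web: each new page receives a random topic and links to an
    existing page of the same topic, chosen with probability proportional to
    that page's current in-degree (rich-get-richer)."""
    rng = random.Random(seed)
    # One seed page per topic; in_deg starts at 1 so every page can be chosen.
    pages = [{"topic": t, "in_deg": 1} for t in range(topics)]
    for _ in range(steps):
        topic = rng.randrange(topics)
        same = [p for p in pages if p["topic"] == topic]
        target = rng.choices(same, weights=[p["in_deg"] for p in same])[0]
        target["in_deg"] += 1
        pages.append({"topic": topic, "in_deg": 1})
    return pages
```

After enough steps, a few early pages per topic accumulate most of the links, which is the heavy-tailed shape the paper reports for the real web.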


Author(s):  
Carmen Domínguez-Falcón ◽  
Domingo Verano-Tacoronte ◽  
Marta Suárez-Fuentes

Purpose The strong regulation of the Spanish pharmaceutical sector encourages pharmacies to modify their business model, giving the customer a more relevant role by integrating 2.0 tools. However, the study of the implementation of these tools is still quite limited, especially in terms of customer-oriented web page design. This paper aims to analyze the online presence of Spanish community pharmacies by studying the profile of their web pages to classify them by their degree of customer orientation. Design/methodology/approach In total, 710 community pharmacies were analyzed, of which 160 had web pages. Using items drawn from the literature, content analysis was performed to evaluate the presence of these items on the web pages. Then, after analyzing the scores on the items, a cluster analysis was conducted to classify the pharmacies according to the degree of development of their online customer orientation strategy. Findings The number of pharmacies with a web page is quite low. The development of these websites is limited, and they play a more informational than relational role. The statistical analysis allows us to classify the pharmacies into four groups according to their level of development. Practical implications Pharmacists should make incremental use of their websites to facilitate real two-way communication with customers and other stakeholders, maintaining a relationship with them by incorporating Web 2.0 and social media (SM) platforms. Originality/value This study analyses, from a marketing perspective, the degree of Web 2.0 adoption and the characteristics of the websites, in terms of aiding communication and interaction with customers, in the Spanish pharmaceutical sector.


Information ◽  
2018 ◽  
Vol 9 (9) ◽  
pp. 228 ◽  
Author(s):  
Zuping Zhang ◽  
Jing Zhao ◽  
Xiping Yan

Web page clustering is an important technology for sorting network resources. By extraction and clustering based on the similarity of web pages, a large amount of information on the web can be organized effectively. In this paper, after describing the extraction of web feature words, calculation methods for weighting feature words are studied in depth. Taking web pages as objects and web feature words as attributes, a formal context is constructed for use in formal concept analysis. An algorithm for constructing a concept lattice based on cross data links is proposed and successfully applied. This method can be used to cluster web pages using the concept lattice hierarchy. Experimental results indicate that the proposed algorithm outperforms previous competitors with regard to time consumption and clustering effect.
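The formal-context idea above can be made concrete with a brute-force concept enumerator: pages are objects, feature words are attributes, and a formal concept is a maximal (pages, words) pair where every page carries every word. This is a minimal illustration of formal concept analysis, not the paper's cross-data-link lattice algorithm:

```python
from itertools import combinations

def concepts(context):
    """Enumerate all formal concepts of a pages x feature-words context.

    context maps each web page to its set of feature words. For each candidate
    extent we derive its common intent, then close the extent; the closure
    guarantees each (extent, intent) pair is a genuine formal concept.
    """
    pages = list(context)
    found = set()
    for r in range(len(pages) + 1):
        for extent in combinations(pages, r):
            if extent:
                intent = set.intersection(*(context[p] for p in extent))
            else:
                # Empty extent: the intent is every attribute in the context.
                intent = set.union(*context.values()) if context else set()
            closed = frozenset(p for p in pages if intent <= context[p])
            found.add((closed, frozenset(intent)))
    return found
```

Ordering these concepts by extent inclusion yields the concept lattice whose hierarchy the paper uses for clustering; brute-force enumeration is exponential, which is exactly why efficient construction algorithms matter.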


2002 ◽  
Vol 13 (04) ◽  
pp. 521-530 ◽  
Author(s):  
WEN GAO ◽  
SHI WANG ◽  
BIN LIU

This paper presents a new real-time, dynamic web page recommendation system based on web-log mining. The visit sequences of previous visitors are used to train a classifier for web page recommendation. The recommendation engine identifies the current active user and submits its visit sequence as input to the classifier. The output of the recommendation engine is a set of recommended web pages, whose links are attached to the bottom of the requested page. Our experiments show that the proposed approach is effective: the predictive accuracy is quite high (over 90%), and the time required for recommendation is quite small.
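The train-on-past-sequences, predict-for-active-user loop can be sketched with a first-order transition model; the paper's classifier is richer, so treat this as the gist rather than the method, with all names hypothetical:

```python
from collections import Counter, defaultdict

class PageRecommender:
    """Learn page-to-page transitions from logged visit sequences, then
    recommend likely next pages given the active user's visit sequence."""

    def __init__(self):
        self.next_counts = defaultdict(Counter)  # page -> Counter of next pages

    def train(self, sessions):
        """Count observed transitions in previous visitors' sessions."""
        for session in sessions:
            for current, nxt in zip(session, session[1:]):
                self.next_counts[current][nxt] += 1

    def recommend(self, active_sequence, k=2):
        """Return up to k pages most often visited after the current page;
        their links would be attached to the bottom of the requested page."""
        if not active_sequence:
            return []
        counts = self.next_counts[active_sequence[-1]]
        return [page for page, _ in counts.most_common(k)]
```

Because training reduces to counting and recommendation to a dictionary lookup, even this naive version meets the real-time constraint the paper emphasizes.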


Author(s):  
Satinder Kaur ◽  
Sunil Gupta

Information plays a very important role in life, and nowadays the world largely depends on the World Wide Web to obtain any information. The web comprises many websites from every discipline, and websites consist of web pages interlinked with each other by hyperlinks. The success of a website largely depends on the design aspects of its web pages, and researchers have done a lot of work to appraise web pages quantitatively. Keeping in mind the importance of the design aspects of a web page, this paper presents the design of an automated evaluation tool which evaluates these aspects for any web page. The tool takes the HTML code of the web page as input, then extracts and checks the HTML tags for uniformity. The tool comprises normalized modules which quantify the measures of the design aspects. For validation, the tool has been applied to four web pages from distinct sites and the design aspects have been reported for comparison. The tool offers various advantages for web developers, who can predict the design quality of web pages and enhance it before and after implementation of a website, without user interaction.
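One ingredient of such a tool, checking HTML tags for uniformity, can be sketched with the standard-library parser below. It reports tags opened but never closed and close tags with no matching open; the tool's actual normalized design-aspect modules are not reproduced, and this checker is an assumption about one plausible uniformity test:

```python
from html.parser import HTMLParser

# Void elements are self-contained and legitimately have no closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link", "area", "base",
             "col", "embed", "source", "track", "wbr"}

class TagChecker(HTMLParser):
    """Track open tags on a stack and report non-uniform (unbalanced) tags."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open, non-void tags
        self.problems = []  # human-readable uniformity violations

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag in self.stack:
            # Pop back to the matching open tag; anything above it on the
            # stack was opened inside it and never closed.
            while self.stack:
                open_tag = self.stack.pop()
                if open_tag == tag:
                    break
                self.problems.append(f"<{open_tag}> never closed")
        else:
            self.problems.append(f"unexpected </{tag}>")

    def unclosed(self):
        return list(self.stack)
```

A score for the uniformity aspect could then be derived from the ratio of problems to total tags, which is the kind of quantified measure the paper's modules produce.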


2020 ◽  
Author(s):  
A. E. Sullivan ◽  
S. J. Tappan ◽  
P. J. Angstman ◽  
A. Rodriguez ◽  
G. C. Thomas ◽  
...  

Abstract: With advances in microscopy and computer science, the technique of digitally reconstructing, modeling, and quantifying microscopic anatomies has become central to many fields of biological research. MBF Bioscience has chosen to openly document its digital reconstruction file format, the Neuromorphological File Specification (4.0), available at www.mbfbioscience.com/filespecification (Angstman et al. 2020). The format, created and maintained by MBF Bioscience, is broadly utilized by the neuroscience community. The data format's structure and capabilities have evolved since its inception, with modifications made to keep pace with advancements in microscopy and the scientific questions raised by worldwide experts in the field. More recent modifications to the neuromorphological data format ensure it abides by the Findable, Accessible, Interoperable, and Reusable (FAIR) data standards promoted by the International Neuroinformatics Coordinating Facility (INCF; Wilkinson et al. 2016). The incorporated metadata make it easy to identify and repurpose these data types for downstream application and investigation. This publication describes key elements of the file format and details their relevant structural advantages in an effort to encourage the reuse of these rich data files for alternative analysis or reproduction of derived conclusions.

