An Automatic Extraction of Academia-Industry Collaborative Research and Development Documents on the Web

Author(s): Kei Kurakawa, Yuan Sun, Nagayoshi Yamashita, Yasumasa Baba
2020, Vol 5 (4), pp. 43-55
Author(s): Gianpiero Bianchi, Renato Bruni, Cinzia Daraio, Antonio Laureti Palma, Giulio Perani, ...

Abstract

Purpose: The main objective of this work is to show the potential of recently developed approaches for automatic knowledge extraction directly from universities’ websites. The information automatically extracted can potentially be updated more frequently than once per year, and is safe from manipulation or misinterpretation. Moreover, this approach gives us flexibility in collecting indicators about the efficiency of universities’ websites and their effectiveness in disseminating key contents. These new indicators can complement traditional indicators of scientific research (e.g. number of articles and number of citations) and teaching (e.g. number of students and graduates) by introducing further dimensions that allow new insights for “profiling” the analyzed universities.

Design/methodology/approach: Webometrics relies on web mining methods and techniques to perform quantitative analyses of the web. This study implements an advanced application of the webometric approach, exploiting all three categories of web mining: web content mining, web structure mining, and web usage mining. The information to compute our indicators has been extracted from the universities’ websites using web scraping and text mining techniques. The scraped information has been stored in a NoSQL database in a semi-structured form, so that it can be retrieved efficiently by text mining techniques. This provides increased flexibility in the design of new indicators, opening the door to new types of analyses. Some data have also been collected by means of batch interrogations of search engines (Bing, www.bing.com) or from a leading provider of web analytics (SimilarWeb, http://www.similarweb.com). The information extracted from the web has been combined with university structural information taken from the European Tertiary Education Register (https://eter.joanneum.at/#/home), a database collecting information on Higher Education Institutions (HEIs) at the European level. All the above was used to cluster 79 Italian universities based on structural and digital indicators.

Findings: The main findings of this study concern the evaluation of universities’ potential for digitalization, in particular by presenting techniques for the automatic extraction of information from the web to build indicators of the quality and impact of universities’ websites. These indicators can complement traditional indicators, and can be used to identify groups of universities with common features by applying clustering techniques to them.

Research limitations: The results reported in this study refer to Italian universities only, but the approach could be extended to other university systems abroad.

Practical implications: The approach proposed in this study, and its illustration on Italian universities, shows the usefulness of recently introduced automatic data extraction and web scraping approaches, and their practical relevance for characterizing and profiling the activities of universities on the basis of their websites. The approach could be applied to other university systems.

Originality/value: This work applies to university websites, for the first time, some recently introduced techniques for automatic knowledge extraction based on web scraping, optical character recognition and nontrivial text mining operations (Bruni & Bianchi, 2020).
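The pipeline the abstract describes (scrape pages, store semi-structured records, derive indicators, cluster institutions) can be illustrated with a minimal stdlib-only sketch. The HTML snippets, indicator definitions, and the tiny k-means below are all invented stand-ins, not the authors' actual data, metrics, or clustering method.

```python
# Minimal sketch of a scrape -> indicators -> clustering pipeline.
# Inline HTML strings stand in for scraped pages; the "indicators"
# (word count, link count) are hypothetical, not the paper's.
from html.parser import HTMLParser
import math
import random

class TextExtractor(HTMLParser):
    """Web content mining step: strip tags, keep visible text and link count."""
    def __init__(self):
        super().__init__()
        self.words = []
        self.links = 0
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += 1  # crude web *structure* signal: outgoing links
    def handle_data(self, data):
        self.words.extend(data.split())

def indicators(html):
    """Turn one page into a small indicator vector (hypothetical metrics)."""
    p = TextExtractor()
    p.feed(html)
    return [len(p.words), p.links]

# Stand-ins for scraped pages, keyed like semi-structured NoSQL records.
pages = {
    "uni_a": "<html><body><p>Research teaching</p><a href='x'>lab</a></body></html>",
    "uni_b": "<html><body><p>One two three four five six</p></body></html>",
    "uni_c": "<html><body><a href='x'>a</a><a href='y'>b</a><p>News</p></body></html>",
}
vectors = {u: indicators(h) for u, h in pages.items()}

def kmeans(points, k=2, iters=10, seed=0):
    """Tiny k-means stand-in for the clustering step (not the paper's method)."""
    rng = random.Random(seed)
    cents = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, cents[c]))
            groups[nearest].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster emptied out
                cents[i] = [sum(col) / len(g) for col in zip(*g)]
    return groups
```

A real implementation would fetch pages over HTTP, persist the records in a document store, and use a proper clustering library; the point here is only the shape of the pipeline.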


Author(s):  
Rafael Moreno-Sanchez

The Semantic Web (SW) and Geospatial Semantic Web (GSW) are considered the next step in the evolution of the Web. For most non-Web specialists, geospatial information professionals, and non-computer-science students these concepts and their impacts on the way we use the Web are not clearly understood. The purpose of this chapter is to provide this broad audience of non-specialists with a basic understanding of: the needs and visions driving the evolution toward the SW and GSW; the principles and technologies involved in their implementation; the state of the art in the efforts to create the GSW; the impacts of the GSW on the way we use the Web to discover, evaluate, and integrate geospatial data and services; and the needs for future research and development to make the GSW a reality. A background on the SW is first presented to serve as a basis for more specific discussions on the GSW.
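The core Semantic Web idea sketched above, publishing data as subject-predicate-object triples and querying it by meaning rather than by page layout, can be shown with a toy example. The vocabulary terms and dataset names below are invented for illustration and do not come from any real GSW ontology.

```python
# Toy triple store: geospatial metadata as subject-predicate-object triples
# (invented terms, loosely echoing RDF/Dublin Core naming conventions).
triples = [
    ("dataset:rivers",  "rdf:type",    "geo:FeatureCollection"),
    ("dataset:rivers",  "dc:coverage", "region:Colorado"),
    ("dataset:landuse", "rdf:type",    "geo:FeatureCollection"),
    ("dataset:landuse", "dc:coverage", "region:Utah"),
]

def query(s=None, p=None, o=None):
    """SPARQL-like triple pattern matching: None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "Find every dataset covering Colorado" -- answered by semantics, not markup.
hits = query(p="dc:coverage", o="region:Colorado")
```

A real GSW stack would use RDF serializations, shared ontologies, and a SPARQL endpoint; the pattern-matching query is the part this sketch preserves.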


2019, Vol 5 (1)
Author(s): Peter L Gorski, Emanuel von Zezschwitz, Luigi Lo Iacono, Matthew Smith

Abstract We present a systematization of usable security principles, guidelines and patterns to facilitate the transfer of existing knowledge to researchers and practitioners. Based on a literature review, we extracted 23 principles, 11 guidelines and 47 patterns for usable security and identified their interconnections. The results indicate that current research tends to focus on only a subset of important principles. The fact that some principles are not yet addressed by any design patterns suggests that further work on refining these patterns is needed. We developed an online repository, which stores the harmonized principles, guidelines and patterns. The tool enables users to search for relevant guidance and explore it in an interactive and programmatic manner. We argue that both the insights presented in this article and the web-based repository will be highly valuable for students seeking a good overview, for practitioners implementing usable security, and for researchers identifying areas of future research.
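A repository of interconnected principles, guidelines and patterns with keyword search, as described above, can be sketched in a few lines. The entry names and summaries below are invented examples; the real repository's schema and contents are not reproduced here.

```python
# Minimal sketch of a searchable principles/guidelines/patterns repository
# with cross-links between entries (all entries are invented examples).
from dataclasses import dataclass, field

@dataclass
class Entry:
    kind: str      # "principle", "guideline", or "pattern"
    name: str
    summary: str
    links: list = field(default_factory=list)  # names of related entries

repo = [
    Entry("principle", "least-surprise",
          "Security actions should match user expectations.",
          links=["safe-defaults-pattern"]),
    Entry("pattern", "safe-defaults-pattern",
          "Ship with the most secure configuration enabled.",
          links=["least-surprise"]),
    Entry("guideline", "visible-state",
          "Make the current security state visible to the user."),
]

def search(term):
    """Keyword search over names and summaries (the 'programmatic' access)."""
    term = term.lower()
    return [e for e in repo
            if term in e.name.lower() or term in e.summary.lower()]

def related(name):
    """Follow the interconnections between principles, guidelines and patterns."""
    by_name = {e.name: e for e in repo}
    return [by_name[n] for n in by_name[name].links if n in by_name]
```

The cross-link structure is what lets such a tool surface, for instance, which principles are not yet addressed by any pattern: entries whose `links` contain no pattern-kind neighbor.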

