Automatic Extraction of Semantically-Meaningful Information from the Web

An Automatic Extraction of Academia-Industry Collaborative Research and Development Documents on the Web

Studies in Classification, Data Analysis, and Knowledge Organization - German-Japanese Interchange of Data Analysis Results

10.1007/978-3-319-01264-3_18

2013

pp. 203-211

Author(s): Kei Kurakawa, Yuan Sun, Nagayoshi Yamashita, Yasumasa Baba

Keyword(s): Research And Development, Collaborative Research, Automatic Extraction, The Web

Exploring the Potentialities of Automatic Extraction of University Webometric Information

Journal of Data and Information Science

10.2478/jdis-2020-0040

2020

Vol 5 (4)

pp. 43-55

Author(s): Gianpiero Bianchi, Renato Bruni, Cinzia Daraio, Antonio Laureti Palma, Giulio Perani, ...

Keyword(s): Text Mining, Web Mining, Knowledge Extraction, Automatic Extraction, Mining Operations, Automatic Data, Link Type, Web Scraping, University Systems, The Web

Abstract

Purpose: The main objective of this work is to show the potential of recently developed approaches for automatic knowledge extraction directly from universities’ websites. The automatically extracted information can potentially be updated more often than once per year and is safe from manipulation or misinterpretation. Moreover, this approach allows flexibility in collecting indicators of the efficiency of universities’ websites and of their effectiveness in disseminating key contents. These new indicators can complement traditional indicators of scientific research (e.g. number of articles and number of citations) and teaching (e.g. number of students and graduates) by introducing further dimensions that allow new insights for “profiling” the analyzed universities.

Design/methodology/approach: Webometrics relies on web mining methods and techniques to perform quantitative analyses of the web. This study implements an advanced application of the webometric approach, exploiting all three categories of web mining: web content mining, web structure mining, and web usage mining. The information used to compute the indicators has been extracted from the universities’ websites using web scraping and text mining techniques. The scraped information has been stored in a NoSQL DB in a semi-structured form so that text mining techniques can retrieve it efficiently. This provides increased flexibility in the design of new indicators, opening the door to new types of analyses. Some data have also been collected by means of batch interrogations of search engines (Bing, www.bing.com) or from a leading provider of Web analytics (SimilarWeb, http://www.similarweb.com). The information extracted from the Web has been combined with the university structural information taken from the European Tertiary Education Register (https://eter.joanneum.at/#/home), a database collecting information on Higher Education Institutions (HEIs) at European level. All of the above was used to cluster 79 Italian universities based on structural and digital indicators.

Findings: The main findings of this study concern the evaluation of the digitalization potential of universities, in particular by presenting techniques for the automatic extraction of information from the web to build indicators of the quality and impact of universities’ websites. These indicators can complement traditional indicators and can be used with clustering techniques to identify groups of universities with common features.

Research limitations: The results reported in this study refer to Italian universities only, but the approach could be extended to other university systems abroad.

Practical implications: The approach proposed in this study and its illustration on Italian universities show the usefulness of recently introduced automatic data extraction and web scraping approaches and their practical relevance for characterizing and profiling the activities of universities on the basis of their websites. The approach could be applied to other university systems.

Originality/value: This work applies to university websites, for the first time, some recently introduced techniques for automatic knowledge extraction based on web scraping, optical character recognition, and nontrivial text mining operations (Bruni & Bianchi, 2020).
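
As a rough illustration of the web structure mining mentioned in the abstract above, the following Python sketch counts internal and external links on a single page. It is not the authors' pipeline: the class `LinkCollector`, the function `link_indicators`, the sample HTML, and the `example.edu` URLs are all hypothetical, and only the Python standard library is assumed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href targets of all <a> tags on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def link_indicators(html, page_url):
    """Count internal and external links found in one HTML page."""
    parser = LinkCollector()
    parser.feed(html)
    site = urlparse(page_url).netloc
    counts = {"internal": 0, "external": 0}
    for href in parser.hrefs:
        # Resolve relative links against the page URL before comparing hosts.
        target = urlparse(urljoin(page_url, href))
        if target.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar schemes
        kind = "internal" if target.netloc == site else "external"
        counts[kind] += 1
    return counts


# Hypothetical page content, used only for illustration.
SAMPLE_HTML = """
<a href="/research">Research</a>
<a href="courses/index.html">Courses</a>
<a href="https://www.example.org/partner">Partner</a>
<a href="mailto:info@example.edu">Contact</a>
"""
SAMPLE_COUNTS = link_indicators(SAMPLE_HTML, "https://www.example.edu/home")
```

Counts of this kind could serve as one simple digital indicator per website; the indicators actually used in the study are described in the paper itself.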

Web Usage Mining Issues in Big Data

Impacts and Challenges of Cloud Business Intelligence - Advances in Systems Analysis, Software Engineering, and High Performance Computing

10.4018/978-1-7998-5040-3.ch007

2021

pp. 102-112

Author(s): Sunny Sharma, Manisha Malhotra

Keyword(s): Data Mining, Big Data, User Behavior, Web Usage Mining, Web Personalization, Data Mining Techniques, Meaningful Information, Web Usage, Use Of Data, The Web

Web usage mining is the use of data mining techniques to analyze user behavior in order to better serve the needs of the user. This process of personalization uses a set of techniques and methods for discovering the linking structure of information on the web. The goal of web personalization is to improve the user experience by mining meaningful information and presenting the retrieved information in the way the user intends. The arrival of big data has raised novel issues for the personalization community. This chapter provides an overview of personalization and big data, and identifies challenges related to web personalization with respect to big data. It also presents some approaches and models to fill the gap between big data and web personalization. Further, this research identifies additional opportunities for web personalization from the perspective of big data.
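
To make the idea of analyzing user behavior concrete, the sketch below groups page requests into per-user sessions, a common preprocessing step in web usage mining. It is an assumption for illustration and is not taken from the chapter: the function `sessionize`, the 30-minute inactivity threshold, and the sample log entries are all hypothetical.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity threshold


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Group (user, timestamp, page) events into per-user sessions.

    A new session starts whenever the time since the user's previous
    request exceeds `gap` seconds.
    """
    by_user = defaultdict(list)
    for user, timestamp, page in sorted(events, key=lambda e: (e[0], e[1])):
        by_user[user].append((timestamp, page))

    sessions = defaultdict(list)
    for user, visits in by_user.items():
        current = [visits[0][1]]
        # Compare each request with the one before it.
        for (prev_ts, _), (ts, page) in zip(visits, visits[1:]):
            if ts - prev_ts > gap:
                sessions[user].append(current)
                current = []
            current.append(page)
        sessions[user].append(current)
    return dict(sessions)


# Hypothetical log entries: (user id, Unix timestamp, requested page).
SAMPLE_EVENTS = [
    ("u1", 0, "/home"),
    ("u1", 120, "/products"),
    ("u1", 4000, "/home"),
    ("u2", 50, "/blog"),
]
SAMPLE_SESSIONS = sessionize(SAMPLE_EVENTS)
```

Sessions built this way are a typical input for later mining steps such as frequent-path discovery or recommendation.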

A Decision Tree Framework for Semi-Automatic Extraction of Product Attributes from the Web

Studies in Computational Intelligence - Advances in Web Intelligence and Data Mining

10.1007/3-540-33880-2_21

2006

pp. 201-210

Author(s): Lior Rokach, Roni Romano, Barak Chizi, Oded Maimon

Keyword(s): Decision Tree, Automatic Extraction, Product Attributes, The Web

Extraction of Meaningful Information from the Web: a Brief Survey

International Journal of Engineering & Technology

10.14419/ijet.v7i4.19.28283

2018

Vol 7 (4.19)

pp. 1041

Author(s): Santosh V. Chobe, Dr. Shirish S. Sane

Keyword(s): Information Extraction, Relevant Information, Unstructured Data, Web Pages, Extraction Techniques, Web Documents, Meaningful Information, The Web

The explosive growth of information on the Internet makes extracting relevant data from various sources a difficult task for users. Information Extraction (IE) systems are therefore needed to transform Web pages into databases. Relevant information in Web documents can be extracted using information extraction and presented in a structured format. By applying information extraction techniques, information can be extracted from structured, semi-structured, and unstructured data. This paper presents some of the major information extraction tools and discusses their advantages and limitations from a user’s perspective.
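
The step from a Web page to a structured record can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not one of the tools surveyed in the paper: the function `extract_record`, the regular expressions, and the sample page are hypothetical, and real IE systems handle far more varied markup.

```python
import re

# Assumed patterns for two illustrative field types.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PRICE_RE = re.compile(r"\$(\d+(?:\.\d{2})?)")
TAG_RE = re.compile(r"<[^>]+>")


def extract_record(html):
    """Turn a fragment of semi-structured HTML into a structured record."""
    # Replace tags with spaces so that only the visible text remains.
    text = TAG_RE.sub(" ", html)
    return {
        "emails": EMAIL_RE.findall(text),
        "prices": [float(p) for p in PRICE_RE.findall(text)],
    }


# Hypothetical page fragment, used only for illustration.
SAMPLE_PAGE = (
    "<div><b>Widget</b> costs $19.99. "
    "Contact <i>sales@example.com</i> for bulk orders at $15.</div>"
)
SAMPLE_RECORD = extract_record(SAMPLE_PAGE)
```

The resulting dictionary can be stored as a database row or document, which is the "structured format" the abstract refers to.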

Automatic Extraction of Meaning from the Web

2006 IEEE International Symposium on Information Theory

10.1109/isit.2006.261979

2006

Cited By ~ 15

Author(s): Rudi Cilibrasi, Paul Vitanyi

Keyword(s): Automatic Extraction, The Web

Automatic Extraction for Product Feature Words from Comments on the Web

Information Retrieval Technology - Lecture Notes in Computer Science

10.1007/978-3-642-04769-5_10

2009

pp. 112-123

Cited By ~ 8

Author(s): Zhichao Li, Min Zhang, Shaoping Ma, Bo Zhou, Yu Sun

Keyword(s): Automatic Extraction, The Web

A Survey of Automatic Extraction of Personal Name Alias from the Web

International Journal of Signal Processing Image Processing and Pattern Recognition

10.14257/ijsip.2014.7.6.07

2014

Vol 7 (6)

pp. 75-84

Author(s): A. Muthusamy, A. Subramani

Keyword(s): Automatic Extraction, The Web

Automatic extraction of acronym definitions from the Web

Applied Intelligence

10.1007/s10489-009-0197-4

2009

Vol 34 (2)

pp. 311-327

Cited By ~ 29

Author(s): David Sánchez, David Isern

Keyword(s): Automatic Extraction, The Web

A Survey on Aspect-Based Sentiment Classification

ACM Computing Surveys

10.1145/3503044

2021

Author(s): Gianni Brauwers, Flavius Frasincar

Keyword(s): State Of The Art, Knowledge Bases, Sentiment Classification, Automatic Extraction, Learning Models, Text Documents, Fine Grained, Knowledge Based, Transformer Model, The Web

With the constantly growing number of reviews and other sentiment-bearing texts on the Web, the demand for automatic sentiment analysis algorithms continues to expand. Aspect-based sentiment classification (ABSC) allows for the automatic extraction of highly fine-grained sentiment information from text documents or sentences. In this survey, the rapidly evolving state of the research on ABSC is reviewed. A novel taxonomy is proposed that categorizes the ABSC models into three major categories: knowledge-based, machine learning, and hybrid models. This taxonomy is accompanied by summarizing overviews of the reported model performances, and both technical and intuitive explanations of the various ABSC models. State-of-the-art ABSC models are discussed, such as models based on the transformer model, and hybrid deep learning models that incorporate knowledge bases. Additionally, various techniques for representing the model inputs and evaluating the model outputs are reviewed. Furthermore, trends in the research on ABSC are identified and a discussion is provided on the ways in which the field of ABSC can be advanced in the future.
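
As a toy instance of the knowledge-based category in the taxonomy above, the following Python sketch labels the sentiment toward an aspect by summing lexicon scores of the words near it. It is a deliberately simplified assumption, not a model from the survey: the lexicon `LEXICON`, the function `aspect_sentiment`, the window size, and the sample sentence are all hypothetical.

```python
# Tiny hypothetical sentiment lexicon; real systems use much larger resources.
LEXICON = {
    "great": 1, "tasty": 1, "friendly": 1,
    "slow": -1, "bland": -1, "rude": -1,
}


def aspect_sentiment(sentence, aspect, window=2):
    """Label the sentiment toward `aspect` using lexicon words near it.

    Returns "positive", "negative", "neutral", or None when the aspect
    word does not occur in the sentence.
    """
    tokens = [t.strip(".,!?").lower() for t in sentence.split()]
    if aspect not in tokens:
        return None
    pos = tokens.index(aspect)
    # Only words within `window` positions of the aspect contribute.
    nearby = tokens[max(0, pos - window): pos + window + 1]
    score = sum(LEXICON.get(t, 0) for t in nearby)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical review sentence, used only for illustration.
SAMPLE = "The pizza was tasty but the service was slow."
PIZZA_LABEL = aspect_sentiment(SAMPLE, "pizza")
SERVICE_LABEL = aspect_sentiment(SAMPLE, "service")
```

The same sentence yields different labels for different aspects, which is the fine-grained behavior that distinguishes ABSC from sentence-level sentiment analysis.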
