Exploring the Potentialities of Automatic Extraction of University Webometric Information

2020 ◽  
Vol 5 (4) ◽  
pp. 43-55
Author(s):  
Gianpiero Bianchi ◽  
Renato Bruni ◽  
Cinzia Daraio ◽  
Antonio Laureti Palma ◽  
Giulio Perani ◽  
...  

Abstract

Purpose: The main objective of this work is to show the potential of recently developed approaches for automatic knowledge extraction directly from university websites. The information extracted automatically can be updated more frequently than once per year and is safe from manipulation or misinterpretation. Moreover, this approach gives us flexibility in collecting indicators of the efficiency of university websites and of their effectiveness in disseminating key content. These new indicators can complement traditional indicators of scientific research (e.g. number of articles and number of citations) and teaching (e.g. number of students and graduates) by introducing further dimensions that allow new insights for “profiling” the analyzed universities.

Design/methodology/approach: Webometrics relies on web mining methods and techniques to perform quantitative analyses of the web. This study implements an advanced application of the webometric approach, exploiting all three categories of web mining: web content mining, web structure mining and web usage mining. The information needed to compute our indicators was extracted from the universities’ websites using web scraping and text mining techniques. The scraped information was stored in a NoSQL database in semi-structured form, allowing it to be retrieved efficiently by text mining techniques. This provides increased flexibility in the design of new indicators, opening the door to new types of analyses. Some data were also collected by means of batch interrogations of search engines (Bing, www.bing.com) or from a leading provider of web analytics (SimilarWeb, http://www.similarweb.com). The information extracted from the web was combined with university structural information taken from the European Tertiary Education Register (https://eter.joanneum.at/#/home), a database collecting information on Higher Education Institutions (HEIs) at the European level. All of the above was used to cluster 79 Italian universities on the basis of structural and digital indicators.

Findings: The main findings of this study concern the evaluation of universities’ potential for digitalization, in particular by presenting techniques for the automatic extraction of information from the web to build indicators of the quality and impact of universities’ websites. These indicators can complement traditional indicators and can be used to identify groups of universities with common features by applying clustering techniques to them.

Research limitations: The results reported in this study refer to Italian universities only, but the approach could be extended to other university systems abroad.

Practical implications: The approach proposed in this study, and its illustration on Italian universities, shows the usefulness of recently introduced automatic data extraction and web scraping approaches and their practical relevance for characterizing and profiling the activities of universities on the basis of their websites. The approach could be applied to other university systems.

Originality/value: This work applies to university websites, for the first time, some recently introduced techniques for automatic knowledge extraction based on web scraping, optical character recognition and nontrivial text mining operations (Bruni & Bianchi, 2020).
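The content-mining step described above can be sketched with the Python standard library alone. This is a minimal illustration, not the authors' pipeline: the URL and HTML fragment are hypothetical stand-ins for a scraped university page, and a real system would fetch live pages with an HTTP client and persist the records in a NoSQL store such as MongoDB.

```python
import json
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text from a page, skipping <script> and <style>."""
    def __init__(self):
        super().__init__()
        self.skip = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = False

    def handle_data(self, data):
        if not self.skip and data.strip():
            self.chunks.append(data.strip())

# Hypothetical fragment standing in for a scraped university page.
html = ("<html><body><h1>Department of Physics</h1>"
        "<p>PhD admissions are open.</p>"
        "<script>trackVisit();</script></body></html>")

parser = TextExtractor()
parser.feed(html)

# Semi-structured record, ready for insertion into a document store.
record = {"url": "https://example-university.edu/physics", "text": parser.chunks}
doc = json.dumps(record)
```

Keeping the record semi-structured (rather than flattening it to plain text) is what lets new indicators be defined later over the same stored documents.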

2021 ◽  
Vol 7 (3) ◽  
pp. a1en
Author(s):  
Marcello Tenorio de Farias ◽  
Alan César Belo Angeluci ◽  
Brasilina Passarelli

With the spread of access to and use of information through the web and social networks, retrieving information from large volumes of data has become infeasible by manual methods. This applied study reports the development and use of a prototype tool, Discovery Stars, for automatically scraping online reviews posted on Google Maps. The retrieved data allowed us to investigate how these reviews can influence the behavior of the platform's users. Among the results, it was observed that reading and posting reviews affect the opinion formation and motivations of Google Maps users.
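Once reviews are scraped, the kind of aggregation that shapes user perception is straightforward to compute. The sketch below uses invented records; the field names are illustrative and are not Discovery Stars' actual schema.

```python
from collections import Counter

# Hypothetical records, shaped like the output of a review scraper;
# field names are illustrative, not Discovery Stars' actual schema.
reviews = [
    {"place": "Cafe A", "stars": 5, "text": "Great coffee"},
    {"place": "Cafe A", "stars": 4, "text": "Friendly staff"},
    {"place": "Cafe A", "stars": 2, "text": "Slow service"},
]

distribution = Counter(r["stars"] for r in reviews)        # how many of each rating
average = sum(r["stars"] for r in reviews) / len(reviews)  # the headline score users see
```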


Author(s):  
Harmandeep Kaur ◽  
Kamaljit Kaur Dhillon

This article examines the use of the Naive Bayes (NB) classifier. It shows that the NB algorithm improves web mining tasks as measured by classification accuracy, and compares the performance of Naive Bayes with other classification techniques. The study explores the possibility of building a classification model for assigning scholarships to students: focusing on the accuracy of the system, many factors were analyzed, and several of them were found to be effective when accuracy was considered.
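A minimal sketch of the technique described: a categorical Naive Bayes classifier with add-one (Laplace) smoothing, built from the standard library. The student features and labels below are invented for illustration and are not taken from the article.

```python
from collections import Counter, defaultdict

def train_nb(rows):
    """rows: list of (feature_tuple, label). Counts label priors and,
    per feature position, how often each value co-occurs with each label."""
    priors = Counter(label for _, label in rows)
    cond = defaultdict(Counter)  # (feature_index, label) -> value counts
    for features, label in rows:
        for i, value in enumerate(features):
            cond[(i, label)][value] += 1
    return priors, cond

def predict(priors, cond, features):
    """Pick the label maximizing P(label) * prod_i P(value_i | label),
    with add-one (Laplace) smoothing for unseen values."""
    total = sum(priors.values())
    best_label, best_p = None, -1.0
    for label, n in priors.items():
        p = n / total
        for i, value in enumerate(features):
            counts = cond[(i, label)]
            n_values = len(set(counts) | {value})
            p *= (counts[value] + 1) / (n + n_values)
        if p > best_p:
            best_label, best_p = label, p
    return best_label

# Invented training data: (GPA band, family income band) -> decision.
data = [
    (("high_gpa", "low_income"), "award"),
    (("high_gpa", "mid_income"), "award"),
    (("low_gpa", "low_income"), "no_award"),
    (("low_gpa", "high_income"), "no_award"),
]
priors, cond = train_nb(data)
decision = predict(priors, cond, ("high_gpa", "low_income"))
```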


2021 ◽  
Vol 2089 (1) ◽  
pp. 012048
Author(s):  
Kishor Kumar Reddy C ◽  
P R Anisha ◽  
Nhu Gia Nguyen ◽  
G Sreelatha

Abstract

This research uses machine learning and Natural Language Processing (NLP) together with the Natural Language Toolkit (NLTK) to develop a text summarization tool that follows the extractive approach to generate an accurate and fluent summary. The aim of the tool is to efficiently extract a concise and coherent version of a long text or input document, keeping only the main points and avoiding any repetition of text or information already mentioned earlier in the text. The text to be summarized can be obtained from the web through web scraping or entered manually on the platform, i.e. the tool. The summarization process can be quite beneficial for users, as shortening long texts helps them refer to the input quickly and understand points that might otherwise be beyond their grasp.
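The extractive approach described above can be sketched with a simple word-frequency heuristic: score each sentence by how frequent its content words are in the whole document, then keep the top-scoring sentences in their original order. This is a generic standard-library illustration, not the authors' NLTK-based implementation, and the stopword list is deliberately tiny.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "it", "that"}

def summarize(text, k=2):
    """Score each sentence by the average corpus frequency of its
    non-stopword tokens, then return the top-k sentences in order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    tokens = [w for w in re.findall(r"[a-z']+", text.lower())
              if w not in STOPWORDS]
    freq = Counter(tokens)
    scored = []
    for idx, sentence in enumerate(sentences):
        words = [w for w in re.findall(r"[a-z']+", sentence.lower())
                 if w not in STOPWORDS]
        score = sum(freq[w] for w in words) / (len(words) or 1)
        scored.append((score, idx, sentence))
    top = sorted(scored, reverse=True)[:k]
    return " ".join(s for _, _, s in sorted(top, key=lambda t: t[1]))

text = ("Web mining studies the web. Web mining extracts knowledge. "
        "Cats sleep a lot. Web data grows fast.")
summary = summarize(text, k=2)
```

The off-topic sentence about cats scores lowest and is dropped, which is the basic mechanism by which extractive summarizers keep only the main points.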


Author(s):  
Mohamed Atef Mosa

Due to the great growth of data on the web, mining to extract the most informative data as a conceptual brief would be beneficial for many users. There is therefore great enthusiasm for developing automatic text summarization approaches. In this chapter, the authors highlight the use of swarm intelligence (SI) optimization techniques, for the first time, in solving the text summarization problem. In addition, they offer a convincing justification of why nature-inspired heuristic algorithms, especially ant colony optimization (ACO), are well suited to complicated optimization tasks. Moreover, they observe that text summarization had not previously been formalized as a multi-objective optimization (MOO) task, despite the many conflicting objectives that need to be achieved, and that SI had not been employed before to support real-time tasks. A novel framework for short text summarization is therefore proposed to address these issues. Ultimately, this chapter should encourage researchers to give further consideration to SI algorithms for summarization tasks.
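The conflicting objectives behind the MOO framing (cover the relevant content, yet avoid redundancy) can be illustrated with a simple greedy trade-off. This is a generic maximal-marginal-relevance-style sketch, not the ACO formulation the chapter proposes, and the sentences and query below are invented.

```python
def jaccard(a, b):
    """Word-overlap similarity between two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def greedy_select(sentences, query, k=2, lam=0.5):
    """Greedily balance relevance to the query (objective 1) against
    redundancy with already-selected sentences (objective 2)."""
    chosen, candidates = [], list(sentences)
    while candidates and len(chosen) < k:
        def score(s):
            relevance = jaccard(s, query)
            redundancy = max((jaccard(s, c) for c in chosen), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        chosen.append(best)
        candidates.remove(best)
    return chosen

sentences = [
    "ant colony optimization finds good paths",
    "ant colony optimization finds good routes",
    "swarm intelligence inspires new heuristics",
]
picked = greedy_select(sentences, query="ant colony optimization", k=2)
```

The second, near-duplicate sentence is penalized for redundancy, so the more diverse third sentence is chosen instead; an SI approach such as ACO would search combinations of sentences against the same competing objectives rather than picking greedily.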


2018 ◽  
Vol 69 (12) ◽  
pp. 1446-1459 ◽  
Author(s):  
Tharindu Rukshan Bandaragoda ◽  
Daswin De Silva ◽  
Damminda Alahakoon ◽  
Weranja Ranasinghe ◽  
Damien Bolton

2017 ◽  
Author(s):  
Morgan N. Price ◽  
Adam P. Arkin

Abstract

Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources that link protein sequences to scientific articles (Swiss-Prot, GeneRIF, and EcoCyc). PaperBLAST’s database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. PaperBLAST is available at http://papers.genomics.lbl.gov/.


Data Mining ◽  
2013 ◽  
pp. 1312-1319
Author(s):  
Marco Scarnò

CASPUR allows many academic Italian institutions located in the centre-south of Italy to access more than 7 million articles through a digital library platform. The behaviour of its users was analyzed by considering their “traces”, which are stored in the web server log files. Using several web mining and data mining techniques, the author discovered a gradual and dynamic change in the way articles are accessed. In particular, there is evidence of an increase in journal browsing in comparison with the searching mode. This phenomenon is interpreted through the idea that browsing better meets users' needs when they want to keep abreast of the latest advances in their scientific field, compared with a more generic search inside the digital library.
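The “traces” referred to above are web server log entries. The sketch below parses one line in the widely used Common Log Format and applies a toy heuristic to label the access as browsing or searching; the log line and URL patterns are illustrative, not CASPUR's actual paths.

```python
import re

# Common Log Format: ip ident user [timestamp] "method path protocol" status
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d+)'
)

def access_mode(path):
    """Toy heuristic: query-driven URLs count as 'search', everything
    else (e.g. journal/issue paths) as 'browse'. Patterns are illustrative."""
    if "search" in path or "?q=" in path:
        return "search"
    return "browse"

line = ('10.0.0.7 - - [12/Mar/2013:10:15:32 +0100] '
        '"GET /journals/physics/vol12/issue3 HTTP/1.1" 200')
m = LOG_RE.match(line)
mode = access_mode(m.group("path"))
```

Classifying each request this way and counting the two modes over time is the basic ingredient for observing the browsing-versus-searching shift the study reports.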


F1000Research ◽  
2014 ◽  
Vol 3 ◽  
pp. 49 ◽  
Author(s):  
Fabian Schreiber

Summary: Phylogenetic trees are widely used to represent the evolution of gene families. As the history of a gene family can be complex (including many gene duplications), its visualisation can become a difficult task. A good, accurate visualisation of phylogenetic trees, especially on the web, makes trees easier to understand and interpret, helping to reveal the mechanisms that shape the evolution of a specific set of genes or species. Here, I present treeWidget, a modular BioJS component to visualise phylogenetic trees on the web. Through its modularity, treeWidget can be easily customized to display sequence information, e.g. protein domains and alignment conservation patterns.

Availability: http://github.com/biojs/biojs; http://dx.doi.org/10.5281/zenodo.7707

