Data Text Mining Based on Swarm Intelligence Techniques

Author(s):  
Mohamed Atef Mosa

Given the rapid growth of data on the web, mining it to extract the most informative content as a concise summary would benefit many users. Consequently, there is great interest in developing automatic text summarization approaches. In this chapter, the authors highlight the use of swarm intelligence (SI) optimization techniques, for the first time, in solving the text summarization problem. In addition, a justification is given for why nature-inspired heuristic algorithms, especially ant colony optimization (ACO), are the best algorithms for solving complicated optimization tasks. Moreover, the text summarization problem had not previously been formalized as a multi-objective optimization (MOO) task, despite the many conflicting objectives that need to be achieved. Nor had SI been employed before to support real-time tasks. Therefore, a novel framework for short text summarization is proposed to address this gap. Ultimately, this chapter aims to encourage researchers to give further consideration to SI algorithms for solving summarization tasks.

2020 ◽  
Vol 5 (4) ◽  
pp. 43-55
Author(s):  
Gianpiero Bianchi ◽  
Renato Bruni ◽  
Cinzia Daraio ◽  
Antonio Laureti Palma ◽  
Giulio Perani ◽  
...  

Abstract

Purpose: The main objective of this work is to show the potential of recently developed approaches for automatic knowledge extraction directly from universities’ websites. The automatically extracted information can potentially be updated more often than once per year, and is safe from manipulation or misinterpretation. Moreover, this approach allows flexibility in collecting indicators about the efficiency of universities’ websites and their effectiveness in disseminating key contents. These new indicators can complement traditional indicators of scientific research (e.g. number of articles and number of citations) and teaching (e.g. number of students and graduates) by introducing further dimensions that allow new insights for “profiling” the analyzed universities.

Design/methodology/approach: Webometrics relies on web mining methods and techniques to perform quantitative analyses of the web. This study implements an advanced application of the webometric approach, exploiting all three categories of web mining: web content mining, web structure mining, and web usage mining. The information used to compute our indicators has been extracted from the universities’ websites by using web scraping and text mining techniques. The scraped information has been stored in a NoSQL DB in a semi-structured form so that it can be retrieved efficiently by text mining techniques. This provides increased flexibility in the design of new indicators, opening the door to new types of analyses. Some data have also been collected by means of batch interrogations of search engines (Bing, www.bing.com) or from a leading provider of Web analytics (SimilarWeb, http://www.similarweb.com). The information extracted from the Web has been combined with the University structural information taken from the European Tertiary Education Register (https://eter.joanneum.at/#/home), a database collecting information on Higher Education Institutions (HEIs) at European level. All of the above was used to cluster 79 Italian universities based on structural and digital indicators.

Findings: The main findings of this study concern the evaluation of the digitalization potential of universities, in particular by presenting techniques for the automatic extraction of information from the web to build indicators of the quality and impact of universities’ websites. These indicators can complement traditional indicators and can be used, together with clustering techniques, to identify groups of universities with common features.

Research limitations: The results reported in this study refer to Italian universities only, but the approach could be extended to other university systems abroad.

Practical implications: The approach proposed in this study and its illustration on Italian universities show the usefulness of recently introduced automatic data extraction and web scraping approaches and their practical relevance for characterizing and profiling the activities of universities on the basis of their websites. The approach could be applied to other university systems.

Originality/value: This work applies, for the first time to university websites, some recently introduced techniques for automatic knowledge extraction based on web scraping, optical character recognition and nontrivial text mining operations (Bruni & Bianchi, 2020).


Nowadays, the internet has become the easiest way to obtain information, and millions of users search it to find what they need. The continuous growth in web pages, together with users' interest in finding more information about various topics, increases the complexity of recommendation. User behavior is extracted by applying web mining techniques to web server logs. The main aim of this research study is to identify users' navigation patterns from the log files. There are three major steps in the web mining process: data pre-processing, pattern classification, and user discovery. Recently, researchers have classified web page articles before recommending the requested page to users. However, each category is often too large, or manual labor is needed for the classification task. Some existing clustering methods suffer from high time complexity, or they compute iteratively from initial parameters, which can lead to unsatisfactory results. To address these issues, a web page recommendation method is developed that initializes the margin parameters of the classification technique, taking both effectiveness and efficiency into account. This research work initializes the margin parameters of the Random Forest (RF) by using the FireFly Algorithm (FFA) to reduce processing time. A large volume of user-interest data is processed with these margin parameters, which provides better recommendations than existing techniques. The experimental results show that the RF-FFA method achieved accuracy and recall values of 41.89% when compared with other heuristic algorithms.


Biomarkers ◽  
2021 ◽  
pp. 1-22
Author(s):  
Fábio Trindade ◽  
Luís Perpétuo ◽  
Rita Ferreira ◽  
Adelino Leite-Moreira ◽  
Inês Falcão-Pires ◽  
...  

2021 ◽  
Author(s):  
Bo Galle ◽  

We present a detailed global data-set of volcanic sulphur dioxide (SO2) emissions during the period 2005-2017. Measurements were obtained by scanning-DOAS instruments of the NOVAC network at 32 volcanoes, and processed using a standardized procedure. We reveal the daily statistics of volcanic gas emissions under a variety of volcanological and meteorological conditions. Data from several volcanoes are presented for the first time. Our results are compared with yearly averages derived from measurements from space by the Aura/OMI instrument and with historical inventories of GEIA. This comparison shows some interesting differences, the reasons for which are briefly discussed. Data is openly available through the web repository at https://novac.chalmers.se/.


Author(s):  
Horacio Saggion

Over the past decades, information has been made available to a broad audience thanks to the availability of texts on the Web. However, understanding the wealth of information contained in texts can pose difficulties for a number of people, including those with poor literacy, cognitive or linguistic impairment, or limited knowledge of the language of the text. Text simplification was initially conceived as a technology to simplify sentences so that they would be easier to process by natural-language processing components such as parsers. Nowadays, however, automatic text simplification is conceived as a technology to transform a text into an equivalent which is easier for a target user to read and understand. Text simplification concerns both the modification of the vocabulary of the text (lexical simplification) and the modification of the structure of the sentences (syntactic simplification). In this chapter, after briefly introducing the topic of text readability, we give an overview of past and recent methods to address these two problems. We also describe simplification applications and full systems, and outline language resources and evaluation approaches.


Data Mining ◽  
2013 ◽  
pp. 1312-1319
Author(s):  
Marco Scarnò

CASPUR allows many Italian academic institutions located in the Centre-South of Italy to access more than 7 million articles through a digital library platform. The behaviour of its users was analyzed by considering their “traces”, which are stored in the web server log file. Using several web mining and data mining techniques, the author discovered a gradual and dynamic change in the way articles are accessed. In particular, there is evidence of an increase in journal browsing relative to searching. This phenomenon was interpreted using the idea that browsing better meets the needs of users who want to keep abreast of the latest advances in their scientific field, in comparison to more generic searching inside the digital library.


Author(s):  
P. Matrenin ◽  
V. Myasnichenko ◽  
N. Sdobnyakov ◽  
D. Sokolov ◽  
S. Fidanova ◽  
...  

In recent years, hybrid approaches built on population-based algorithms have been applied more often in industrial settings. In this paper, we present an approach that combines universal, problem-independent Swarm Intelligence (SI) algorithms with simple deterministic domain-specific heuristic algorithms. The approach focuses on improving efficiency by combining the advantages of domain-specific heuristics and swarm algorithms. The heuristic algorithm helps take into account the specifics of the problem and effectively translates the positions of agents (particle, ant, bee) into a solution of the problem. The swarm algorithm, in turn, increases the adaptability and efficiency of the approach thanks to its stochastic and self-organizing properties. We demonstrate this approach on two non-trivial optimization tasks: a scheduling problem and finding the minimum distance between 3D isomers.


Author(s):  
Ricardo Baeza-Yates ◽  
Roi Blanco ◽  
Malú Castellanos

Web search has become a ubiquitous commodity for Internet users. This fact puts a large number of documents with plenty of text content at our fingertips. To make good use of this data, we need to mine web text. This triggers the two problems covered here: sentiment analysis and entity retrieval in the context of the Web. The first problem answers the question of what people think about a given product or a topic, in particular sentiment analysis in social media. The second problem addresses the issue of solving certain enquiries precisely by returning a particular object: for instance, where the next concert of my favourite band will be or who the best cooks are in a particular region. Where to find these objects and how to retrieve, rank, and display them are tasks related to the entity retrieval problem.


Web Mining ◽  
2011 ◽  
pp. 27-49
Author(s):  
Penelope Markellou ◽  
Maria Rigou ◽  
Spiros Sirmakessis

The Web has become a huge repository of information and keeps growing exponentially under no editorial control, while the human capability to find, read and understand content remains constant. Providing people with access to information is not the problem; the problem is that people with varying needs and preferences navigate through large Web structures, missing the goal of their inquiry. Web personalization is one of the most promising approaches for alleviating this information overload, providing tailored Web experiences. This chapter explores the different faces of personalization, traces back its roots and follows its progress. It describes the modules typically comprising a personalization process, demonstrates its close relation to Web mining, depicts the technical issues that arise, recommends solutions when possible, and discusses the effectiveness of personalization and the related concerns. Moreover, the chapter illustrates current trends in the field suggesting directions that may lead to new scientific results.

